Exploratory Data Analysis with ggplot2

What is ggplot2?
An implementation of the Grammar of Graphics by Leland Wilkinson
Written by Hadley Wickham (while he was a graduate student at Iowa State)
A third graphics system for R (along with base and lattice)
Available from CRAN via install.packages()
Web site: http://ggplot2.org (better documentation)

What is ggplot2?
The grammar of graphics represents an abstraction of graphics ideas/objects
Think verb, noun, adjective for graphics
Allows for a theory of graphics on which to build new graphics and graphics objects
Shorten the distance from mind to page

Grammar of Graphics
"In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system" (from the ggplot2 book)
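
The quoted definition can be read off a single plot call. The following is a minimal sketch, assuming the built-in mpg data set that the later slides use; the choice of a linear fit as the statistical transformation is only an illustration, not something the quote prescribes.

```r
library(ggplot2)

# One plot, with each element of the grammar written out explicitly.
p <- ggplot(data = mpg,                                      # data
            mapping = aes(x = displ, y = hwy)) +             # aesthetic mapping
  geom_point() +                                             # geometric object
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # statistical transformation
  coord_cartesian()                                          # coordinate system

print(p)
```

Dropping coord_cartesian() gives the same picture, because Cartesian coordinates are the default; it is spelled out here only to make the last element of the definition visible.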

Example Dataset
Factor label information is important for annotation
[Figure: scatterplot of hwy against displ; callouts mark the x coord, the y coord and the data frame]

Modifying aesthetics
[Figure: scatterplot of hwy against displ with points coloured by drv; callouts mark the color aesthetic and the automatic legend placement]
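
A minimal sketch of the plot described on this slide, written with ggplot() rather than qplot() (qplot() is deprecated in recent ggplot2 releases). It assumes the built-in mpg data; mapping drv to colour is what triggers the automatic legend.

```r
library(ggplot2)

# Mapping a data variable to colour inside aes() colours the points by group
# and adds a legend without any extra code.
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point()

print(p)
```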

Adding a geom
[Figure: scatterplot of hwy against displ with an added smoother]
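
The slide's code did not survive, so the following is a sketch under the assumption that the added geom is a smoother drawn over the points, again using the built-in mpg data.

```r
library(ggplot2)

# Two geoms on the same data: the points, then a smoother with its
# confidence band (geom_smooth() picks a default method and reports it).
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()

print(p)
```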

Histograms
[Figure: histogram of hwy (count on the y axis), bars filled by drv (4, f, r)]
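
A sketch of a histogram like the one in the figure, assuming the built-in mpg data. The binwidth of 2 is an assumption chosen for illustration; the slide does not show which value was used.

```r
library(ggplot2)

# With a single variable mapped to x, geom_histogram() counts observations per bin.
# fill = drv stacks the bars by drive type and adds a legend.
p <- ggplot(mpg, aes(x = hwy, fill = drv)) +
  geom_histogram(binwidth = 2)   # binwidth is an assumed value

print(p)
```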

Facets
qplot(displ, hwy, data = mpg, facets = . ~ drv)
[Figure: scatterplots of hwy against displ, one panel per drv level]
[Figure: histograms of hwy (count on the y axis), one panel per drv level]
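
In the layered interface the facets argument becomes facet_grid(). The sketch below assumes the built-in mpg data; the row-wise histogram layout (drv ~ .) and its binwidth are assumptions read off the figure residue, not code taken from the slide.

```r
library(ggplot2)

# Columns of panels: the right-hand side of the formula is the column variable.
scatter_panels <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(. ~ drv)

# Rows of panels: the left-hand side of the formula is the row variable.
hist_panels <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +   # assumed binwidth
  facet_grid(drv ~ .)

print(scatter_panels)
print(hist_panels)
```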

MAACS Cohort
Mouse Allergen and Asthma Cohort Study
Baltimore children (aged 5 to 17)
Persistent asthma, exacerbation in past year
Study indoor environment and its relationship with asthma morbidity
Recent publication: http://goo.gl/WqE9j8

Example: MAACS
Variables: exhaled nitric oxide (eno), fine particulate matter (pm25), sensitized to mouse allergen (mopos)

Histogram of eNO
[Figure: histogram of log(eno)]

Histogram by Group
[Figure: histogram of log(eno), bars filled by mopos (no/yes)]

Density Smooth
[Figures: density of log(eno), overall and by mopos (no/yes)]

[Figures: scatterplots of log(eno) against log(pm25), with points distinguished by mopos (no/yes)]
qplot(log(pm25), log(eno), data = maacs, color = mopos, geom = c("point", "smooth"), method = "lm")
[Figure: log(eno) against log(pm25), paneled by mopos]
qplot(log(pm25), log(eno), data = maacs, geom = c("point", "smooth"), method = "lm", facets = . ~ mopos)
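
The two qplot() calls above can also be written in layered form. This is a sketch only: the maacs data are not included here, so maacs_sim is a hypothetical simulated stand-in with the same column names (eno, pm25, mopos), and the numbers it contains mean nothing.

```r
library(ggplot2)

# Hypothetical stand-in for maacs; values are simulated, not real study data.
set.seed(2)
n <- 300
maacs_sim <- data.frame(
  pm25  = rlnorm(n, meanlog = 3, sdlog = 0.6),
  mopos = factor(sample(c("no", "yes"), n, replace = TRUE))
)
maacs_sim$eno <- exp(2 + 0.2 * log(maacs_sim$pm25) + rnorm(n, sd = 0.7))

# Groups separated by colour: one linear fit per mopos level in a single panel.
by_colour <- ggplot(maacs_sim, aes(log(pm25), log(eno), colour = mopos)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# Groups separated by panel: one linear fit per facet.
by_panel <- ggplot(maacs_sim, aes(log(pm25), log(eno))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_grid(. ~ mopos)

print(by_colour)
print(by_panel)
```

Colour keeps both groups on one set of axes for direct comparison; facets give each group its own panel, which is easier to read when the point clouds overlap.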

Summary of qplot()
The qplot() function is the analog to plot() but with many built-in features
Syntax somewhere in between base/lattice
Produces very nice graphics, essentially publication ready (if you like the design)
Difficult to go against the grain/customize (don't bother; use full ggplot2 power in that case)

Resources
The ggplot2 book by Hadley Wickham
The R Graphics Cookbook by Winston Chang (examples in base plots and in ggplot2)
ggplot2 web site (http://ggplot2.org)
ggplot2 mailing list (http://goo.gl/OdW3uB), primarily for developers

What is ggplot2?
An implementation of the Grammar of Graphics by Leland Wilkinson
The grammar of graphics represents an abstraction of graphics ideas/objects
Think verb, noun, adjective for graphics
Allows for a theory of graphics on which to build new graphics and graphics objects

Basic Plot
[Figure: NocturnalSympt against logpm25 in two panels, normal weight and overweight]

Building Up in Layers
> head(maacs)
    logpm25        bmicat NocturnalSympt
2 1.5361795 normal weight              1
3 1.5905409 normal weight              0
4 1.5217786 normal weight              0
5 1.4323277 normal weight              0
6 1.2762320    overweight              8
8 0.7139103    overweight              0

Initial call to ggplot (data frame first, then aesthetics):
> g <- ggplot(maacs, aes(logpm25, NocturnalSympt))

Summary of ggplot object:
> summary(g)
data: logpm25, bmicat, NocturnalSympt [554x3]
mapping: x = logpm25, y = NocturnalSympt
faceting: facet_null()

No Plot Yet!
> g <- ggplot(maacs, aes(logpm25, NocturnalSympt))
> print(g)
Error: No layers in plot

Explicitly save and print ggplot object:
> p <- g + geom_point()
> print(p)

Auto-print plot object without saving:
> g + geom_point()
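
The same steps as a self-contained sketch. The maacs data are not included here, so maacs_sim is a hypothetical simulated data frame with the same three columns. In recent ggplot2 releases, printing a plot with no layers appears to draw an empty panel rather than raise the error shown above, so that step is left out.

```r
library(ggplot2)

# Hypothetical stand-in for maacs; values are simulated, not real study data.
set.seed(1)
n <- 200
maacs_sim <- data.frame(
  logpm25        = rnorm(n, mean = 1.2, sd = 0.5),
  bmicat         = factor(sample(c("normal weight", "overweight"), n, replace = TRUE)),
  NocturnalSympt = rpois(n, lambda = 2)
)

# Data and aesthetics only: an object describing a plot, with no layer yet.
g <- ggplot(maacs_sim, aes(logpm25, NocturnalSympt))
summary(g)

# Adding a geom gives the plot something to draw.
p <- g + geom_point()
print(p)            # explicit print of a saved object

g + geom_point()    # at top level the result is auto-printed without saving
```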

[Figure: NocturnalSympt against logpm25, points only]

[Figure: NocturnalSympt against logpm25, points with a smoother]
g + geom_point() + geom_smooth()

[Figure: NocturnalSympt against logpm25, points with a linear-model smoother]
g + geom_point() + geom_smooth(method = "lm")
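
A sketch contrasting the two smoothers, assuming a hypothetical simulated data frame maacs_sim in place of maacs. The only difference between the plots is the method argument: the default is a flexible local fit for data of this size, while "lm" forces a straight line.

```r
library(ggplot2)

# Hypothetical stand-in for maacs; values are simulated, not real study data.
set.seed(1)
n <- 200
maacs_sim <- data.frame(
  logpm25        = rnorm(n, mean = 1.2, sd = 0.5),
  NocturnalSympt = rpois(n, lambda = 2)
)

g <- ggplot(maacs_sim, aes(logpm25, NocturnalSympt))

# Default smoother: ggplot2 picks the method and reports its choice in a message.
p_default <- g + geom_point() + geom_smooth()

# Linear regression line with its confidence band.
p_lm <- g + geom_point() + geom_smooth(method = "lm", formula = y ~ x)

print(p_default)
print(p_lm)
```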

Add facets
[Figure: NocturnalSympt against logpm25 in two panels, normal weight and overweight; the panel labels come from the facet variable]
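
The code for this slide did not survive, so the following is a sketch of one way to get such a figure: facet_grid() with bmicat as the column variable. maacs_sim is a hypothetical simulated stand-in for maacs.

```r
library(ggplot2)

# Hypothetical stand-in for maacs; values are simulated, not real study data.
set.seed(1)
n <- 200
maacs_sim <- data.frame(
  logpm25        = rnorm(n, mean = 1.2, sd = 0.5),
  bmicat         = factor(sample(c("normal weight", "overweight"), n, replace = TRUE)),
  NocturnalSympt = rpois(n, lambda = 2)
)

# One panel per level of bmicat; the factor labels become the panel titles,
# which is why well-labelled factors matter.
p <- ggplot(maacs_sim, aes(logpm25, NocturnalSympt)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_grid(. ~ bmicat)

print(p)
```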

Annotation
Labels: xlab(), ylab(), labs(), ggtitle()
Each of the geom functions has options to modify
For things that only make sense globally, use theme()
Example: theme(legend.position = "none")

Modifying Aesthetics
[Figure: NocturnalSympt against logpm25, all points in steelblue]
g + geom_point(color = "steelblue", size = 4, alpha = 1/2)
Constant values

[Figure: NocturnalSympt against logpm25, points coloured by bmicat (normal weight, overweight)]
g + geom_point(aes(color = bmicat), size = 4, alpha = 1/2)
Data variable
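
The difference between the two calls is whether color sits inside aes(). A minimal sketch, assuming a hypothetical simulated data frame maacs_sim in place of maacs:

```r
library(ggplot2)

# Hypothetical stand-in for maacs; values are simulated, not real study data.
set.seed(1)
n <- 200
maacs_sim <- data.frame(
  logpm25        = rnorm(n, mean = 1.2, sd = 0.5),
  bmicat         = factor(sample(c("normal weight", "overweight"), n, replace = TRUE)),
  NocturnalSympt = rpois(n, lambda = 2)
)

g <- ggplot(maacs_sim, aes(logpm25, NocturnalSympt))

# Constant values: set outside aes(), applied to every point, no legend.
p_constant <- g + geom_point(color = "steelblue", size = 4, alpha = 1/2)

# Data variable: mapped inside aes(), one colour per level, legend added.
p_mapped <- g + geom_point(aes(color = bmicat), size = 4, alpha = 1/2)

print(p_constant)
print(p_mapped)
```

Writing aes(color = "steelblue") by mistake treats the string as a one-level data variable, so the points come out in a default colour with a legend instead of in steelblue.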

Modifying Labels
[Figure: titled "MAACS Cohort"; Nocturnal Symptoms against log PM2.5, points coloured by bmicat (normal weight, overweight)]
g + geom_point(aes(color = bmicat)) + labs(title = "MAACS Cohort") + labs(x = expression("log " * PM[2.5]), y = "Nocturnal Symptoms")

Modified smoother
[Figure: NocturnalSympt against logpm25 with a customized smoother]
[Figure: NocturnalSympt against logpm25, points coloured by bmicat (normal weight, overweight)]
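
The code behind this slide is missing, so what follows is only a guess at the kind of change meant: the smoother's own options (line type, confidence band, method) are set inside geom_smooth(). maacs_sim is a hypothetical simulated stand-in for maacs, and the particular option values are assumptions.

```r
library(ggplot2)

# Hypothetical stand-in for maacs; values are simulated, not real study data.
set.seed(1)
n <- 200
maacs_sim <- data.frame(
  logpm25        = rnorm(n, mean = 1.2, sd = 0.5),
  bmicat         = factor(sample(c("normal weight", "overweight"), n, replace = TRUE)),
  NocturnalSympt = rpois(n, lambda = 2)
)

# Smoother options are arguments of the geom, like the point options before:
# a dotted straight line with the confidence band switched off.
p <- ggplot(maacs_sim, aes(logpm25, NocturnalSympt)) +
  geom_point(aes(color = bmicat), size = 2, alpha = 1/2) +
  geom_smooth(method = "lm", formula = y ~ x, linetype = 3, se = FALSE)

print(p)
```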

[Figure: line plot of testdat$y against testdat$x]
testdat <- data.frame(x = 1:100, y = rnorm(100))
testdat[50,2] <- 100  ## Outlier!
plot(testdat$x, testdat$y, type = "l", ylim = c(-3,3))

Axis Limits
[Figure, first panel: line plot of y against x with the outlier missing]
[Figure, second panel: line plot of y against x with the outlier included]
g + geom_line() + coord_cartesian(ylim = c(-3, 3))
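
Only the coord_cartesian() call survives on the slide. The sketch below assumes the "outlier missing" panel came from setting limits with ylim(), which is the usual way to reproduce that behaviour, and that g here is a ggplot object built on testdat.

```r
library(ggplot2)

set.seed(3)
testdat <- data.frame(x = 1:100, y = rnorm(100))
testdat[50, 2] <- 100  # outlier

g <- ggplot(testdat, aes(x = x, y = y))

# ylim() sets the scale limits: values outside them are treated as missing,
# so the line has a gap where the outlier was.
p_ylim <- g + geom_line() + ylim(-3, 3)

# coord_cartesian() only zooms the view: the outlier stays in the data,
# so the line shoots off the top of the panel at x = 50.
p_coord <- g + geom_line() + coord_cartesian(ylim = c(-3, 3))

print(p_ylim)
print(p_coord)
```

This differs from base plot(), where ylim clips the view but still draws the line toward the outlier; coord_cartesian() is the ggplot2 call that matches that behaviour.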

Final Plot
[Figure callouts: non-default font, multiple panels, transparent points, smoother, labels/title]

Summary
ggplot2 is very powerful and flexible if you learn the grammar and the various elements that can be tuned/modified
Many more types of plots can be made; explore and mess around with the package (references mentioned in Part 1 are useful)