We will examine data concerning college enrollment. First, read the data (provided via a csv file)
into R.
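A minimal sketch of the import step follows. The assignment later names the file colleges.csv, but its location is not given, so the sketch writes a tiny synthetic stand-in to a temporary file and reads that instead. The three columns are a subset of the attribute names used in this assignment, and the rows are made up.

```r
# Synthetic stand-in for colleges.csv (made-up rows, not the real data).
csv.path <- tempfile(fileext = ".csv")
writeLines(c("graduation.rate,application.count,private",
             "50,1000,Yes",
             "65,2500,Yes",
             "40,8000,No"), csv.path)

# read.csv() returns a data frame; stringsAsFactors = TRUE stores text columns as factors.
colleges.df <- read.csv(csv.path, stringsAsFactors = TRUE)

str(colleges.df)   # column types and a preview of the values
dim(colleges.df)   # number of rows, then number of columns
```

With the real file, csv.path would simply point at colleges.csv.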
1 of 14 10/4/2016 12:07 PM
qplot from the ggplot2 package file:///E:/Kelly/Homework2_qplot.html
str(colleges.df)
dim(colleges.df)
## [1] 777 15
Added to the data is a graduation rate category, with graduation rates from 0 to 33% labeled 'low',
34 to 70% labeled 'medium', and 81 and over labeled 'high'.
colleges.df$graduation.category
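One way such a category could be derived is with cut(); this is a sketch, not necessarily how the attribute was actually built. The stated ranges leave rates from 71 to 80 unassigned, so the sketch assumes that everything above 70 counts as 'high'. The rates below are made up for illustration.

```r
# Made-up graduation rates used only to illustrate the binning.
graduation.rate <- c(10, 33, 34, 70, 85)

# cut() uses right-closed intervals by default: (-Inf, 33], (33, 70], (70, Inf].
# Assumption: rates above 70 count as 'high' (the text leaves 71 to 80 unassigned).
graduation.category <- cut(graduation.rate,
                           breaks = c(-Inf, 33, 70, Inf),
                           labels = c("low", "medium", "high"))

table(graduation.category)
```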
Q1) Create a histogram using the attribute graduation rate (graduation.rate) and map the
graduation category to the fill aesthetic; the breaks should appear in 5 unit increments.
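A possible qplot call for this kind of histogram is sketched below on synthetic data (the data frame is a made-up stand-in, and the category cut points are an assumption). "Breaks in 5 unit increments" is read here as a bin width of 5; if axis tick marks were meant instead, a scale_x_continuous(breaks = ...) layer would be the place to set them.

```r
library(ggplot2)

# Synthetic stand-in for colleges.df (made-up values, not the real data).
set.seed(1)
colleges.df <- data.frame(graduation.rate = round(runif(200, 10, 100)))
colleges.df$graduation.category <- cut(colleges.df$graduation.rate,
                                       breaks = c(-Inf, 33, 70, Inf),
                                       labels = c("low", "medium", "high"))

# fill maps the category to the bar colour; binwidth = 5 makes each bin 5 units wide.
p <- qplot(graduation.rate, data = colleges.df, geom = "histogram",
           fill = graduation.category, binwidth = 5)
print(p)
```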
Q2) Create a density plot for alumni donation percentage (perc.alumni); smooth the curve (adjust
factor to 10) and fill it with the 'private' attribute (set the transparency to 40%).
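The density plot could be sketched as follows, again on made-up data. Two readings are assumed: adjust = 10 is passed through to the density estimate as a bandwidth multiplier, and "transparency 40%" is taken to mean alpha = 0.4 (use 0.6 instead if 40% see-through is intended).

```r
library(ggplot2)

# Synthetic stand-in for colleges.df (made-up values, not the real data).
set.seed(2)
colleges.df <- data.frame(
  perc.alumni = round(runif(200, 0, 60)),
  private     = factor(sample(c("No", "Yes"), 200, replace = TRUE))
)

# adjust = 10 multiplies the default bandwidth by 10, giving a much smoother curve.
# alpha = I(0.4) sets a constant value rather than mapping alpha to a variable.
p <- qplot(perc.alumni, data = colleges.df, geom = "density",
           fill = private, adjust = 10, alpha = I(0.4))
print(p)
```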
Let's start with the assumption that students wish to graduate. Accordingly, demand
(application.count) for colleges with high graduation rates (graduation.rate) should be relatively
high.
Q3) Create a scatterplot examining this assumption about graduation rate and demand. Include a
line (colored as darkgoldenrod3) representing the relationship in the plot (do not include the
confidence interval). Label the x axis 'Graduation Rate' and the y axis 'Applications'. Only show
colleges with 25,000 or fewer applications (by manipulating the graph presentation, not by
manipulating the dataset). Color the glyphs skyblue.
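A sketch of one way to build this plot is shown below on synthetic data. Because a colour set inside qplot applies to every geom it draws, the trend line is added as a separate geom_smooth layer. The straight-line (lm) fit is an assumption, since the question does not name a smoothing method, and coord_cartesian is used so the view is limited to 25,000 applications without removing rows from the fit.

```r
library(ggplot2)

# Synthetic stand-in for colleges.df (made-up values, not the real data).
set.seed(3)
n <- 200
colleges.df <- data.frame(graduation.rate = round(runif(n, 10, 100)))
colleges.df$application.count <-
  round(colleges.df$graduation.rate * 40 + rexp(n, 1 / 2000))

p <- qplot(graduation.rate, application.count, data = colleges.df,
           geom = "point", colour = I("skyblue"),
           xlab = "Graduation Rate", ylab = "Applications") +
  # Assumption: a straight-line (lm) fit; se = FALSE hides the confidence band.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE,
              colour = "darkgoldenrod3") +
  # Zooms the view to 0..25,000 applications without dropping rows from the fit.
  coord_cartesian(ylim = c(0, 25000))
print(p)
```

Passing ylim = c(0, 25000) to qplot instead would set scale limits, which discards out-of-range points before the line is fitted.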
In a sentence, how would you interpret the relationship between the two variables?
As the scatterplot shows, the darkgoldenrod3 trend line indicates that graduation
rate and application count are positively correlated.
The relationship between graduation rates and applications may be better clarified by looking at
segments of graduation rates. In other words, applicants do not respond to slight differences in
graduation rates (10% vs. 12%) but to material differences (10% vs. 50%).
Q4) Create a single graph with three boxplots, one for each graduation category (the data frame
derived attribute), for applications, restricting the y axis to 7500.
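The boxplots could be sketched as below on synthetic data (the category cut points are an assumption). The sketch restricts the y axis with coord_cartesian, which only zooms the view. The ylim argument of qplot would also work visually, but it drops values above 7500 before the box statistics are computed, so the medians would shift.

```r
library(ggplot2)

# Synthetic stand-in for colleges.df (made-up values, not the real data).
set.seed(4)
n <- 150
colleges.df <- data.frame(graduation.rate   = round(runif(n, 10, 100)),
                          application.count = round(rexp(n, 1 / 3000)))
colleges.df$graduation.category <- cut(colleges.df$graduation.rate,
                                       breaks = c(-Inf, 33, 70, Inf),
                                       labels = c("low", "medium", "high"))

# One box per category; the view is limited to 0..7500 without dropping data.
p <- qplot(graduation.category, application.count, data = colleges.df,
           geom = "boxplot") +
  coord_cartesian(ylim = c(0, 7500))
print(p)
```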
In a sentence, how
would you interpret the relationship as evidenced by the boxplots?
As the boxplots show, the median application count for the high graduation category
is higher than the medians for the other two categories, so colleges with high
graduation rates tend to receive more applications.
It seems reasonable to assert that students performing better in high school (as measured by
finishing in the top 10%) would seek 'better' schools, all else equal (still, there may be selection
biases whereby better-performing students earn admittance to more difficult schools, which would
mitigate the relationship empirically). We will define 'better'
In a sentence, how
would you interpret the relationship as evidenced by the scatterplot?
We can see in the scatterplot that there is a negative linear relationship between
high school performance and school selectiveness.
In a sentence, how
would you interpret the relationship as evidenced by the scatterplot?
It is evident from the scatterplot that students are more attracted to public
colleges than to private colleges.
To see whether a similar relationship exists between selectiveness, attractiveness, and the
percent of the student body in the top 25% of their high school, we simply need to establish
that the top 10% and top 25% attributes are correlated.
Q7) Create a scatterplot using the top 10% and top 25% attributes.
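A minimal sketch of this check follows. The column names top10.perc and top25.perc are hypothetical, since the assignment text does not show the real attribute names, and the values are synthetic. A Pearson correlation is computed alongside the plot as a numeric companion to the visual impression.

```r
library(ggplot2)

# Synthetic stand-in data; top10.perc and top25.perc are hypothetical column names.
set.seed(7)
n <- 150
colleges.df <- data.frame(top10.perc = round(runif(n, 1, 70)))
# The top-25% share is built to be at least the top-10% share, capped at 100.
colleges.df$top25.perc <- pmin(100, colleges.df$top10.perc + round(runif(n, 5, 30)))

# With two numeric arguments, qplot draws a scatterplot by default.
p <- qplot(top10.perc, top25.perc, data = colleges.df)
print(p)

# Pearson correlation between the two attributes.
r <- cor(colleges.df$top10.perc, colleges.df$top25.perc)
r
```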
In a sentence, how
would you interpret the relationship as evidenced by the scatterplot?
In a sentence,
describe how the school that is below .5 on the y axis and above 2.5 on the x axis is
different than the overall trend represented in the graph.
As we can see in the bubblechart, public and private schools are scattered
throughout. However, the school that is below .5 on the y axis and above 2.5 on
the x axis is a public school.
Why are the public schools clustered where they are on the graph?
Public schools are clustered where y is between 0.4 and 0.6 and x is between
0.25 and 1.
Q9) The code for a scatterplot showing applications (count) to acceptance (count) by
college is below. Note that the axis ranges have been set the same so that both numerical
and relative selectiveness can be read from the graph.
A pocket of exclusive schools appears in the range of applications over 10,000 with
acceptance less than 5,000. Create the same scatterplot but with an x axis between 10000
and 20000 and a y axis between 0 and 5000. Include in this graph the names of the
schools close to the data points (black text, size aesthetic set to 4). NOTE: We have not
covered the text labels in class. This question is included to have you research on your own
some of the qplot functions.
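One way to approach the labels is sketched below: a geom_text layer is added to the qplot scatterplot, with the label aesthetic mapped to the school name. The column names college.name and acceptance.count are hypothetical, and the four colleges are made up; only application.count appears in the assignment text.

```r
library(ggplot2)

# Four made-up colleges; all names and counts are hypothetical.
colleges.df <- data.frame(
  college.name      = c("College A", "College B", "College C", "College D"),
  application.count = c(11000, 13500, 16000, 19000),
  acceptance.count  = c(2500, 4200, 1800, 3600)
)

p <- qplot(application.count, acceptance.count, data = colleges.df,
           xlim = c(10000, 20000), ylim = c(0, 5000)) +
  # vjust = -1 lifts each label just above its point.
  geom_text(aes(label = college.name), colour = "black", size = 4, vjust = -1)
print(p)
```

Setting colour and size outside aes() keeps them as fixed values for the text layer only, so the points keep their defaults.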
Q10) Propose your own hypothesis of data relationships (only using the data available in
the colleges.csv file) by editing the rmd file below with your proposition. Explore your
proposition with the creation of a graph that you have defined.
Hypothesis: Colleges having high expenditures have high donations from alumni.
As we can see in the scatterplot, colleges with higher expenditures have a
higher percentage of alumni donating to the school.