Abstract
In this paper we test a claim made about the quality of American
universities by performing statistical analysis on data recorded in 2005[1].
Introduction
Objectives
3 Test
3.1 Preparing the data for use with R
Before testing, we needed to make the dataset compatible with the R environment. We first wrote a script to insert comma (,) delimiters into the raw
textual data; we then used spreadsheet software to parse the data into tabular
form. Next, we used R's read.xlsx2() function (from the xlsx package) to import
the tabular data directly as a variable.
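The delimiter-insertion step can be sketched in base R. This is only an illustration under stated assumptions: the raw record layout (whitespace-separated fields), the sample records, and the names raw_lines, csv_lines, colleges, state and grad_rate are all hypothetical, and read.csv() is swapped in for the spreadsheet and read.xlsx2() steps so that the example stays self-contained.

```r
# Hypothetical raw records: whitespace-separated fields (state, graduation rate).
# The real file layout is not shown in this paper; this is an assumed stand-in.
raw_lines <- c("NY 67", "TX  53", "MA   80")

# Replace each run of whitespace with a single comma delimiter.
csv_lines <- gsub("[[:space:]]+", ",", trimws(raw_lines))

# Parse the comma-delimited text into a data frame (base R, no spreadsheet step).
colleges <- read.csv(text = csv_lines, header = FALSE,
                     col.names = c("state", "grad_rate"),
                     stringsAsFactors = FALSE)

print(colleges)
```

Reading the delimited text directly avoids the manual spreadsheet step, at the cost of assuming that no field contains embedded whitespace.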
3.2
3.3 Histograms
Next, we generated histograms of the north and south graduation rates. Each reveals an approximately normal distribution (though the first
dataset appears to be slightly skewed).
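A minimal sketch of how such histograms might be produced with base R's hist() follows. The vectors graten_demo and grates_demo are simulated placeholders, not the paper's data, and the bin width of 10 is an assumption chosen for illustration.

```r
set.seed(1)

# Simulated placeholder rates, clamped to the valid range [0, 100].
# These are NOT the paper's samples; means and spreads are illustrative only.
graten_demo <- pmin(pmax(rnorm(400, mean = 67, sd = 18), 0), 100)
grates_demo <- pmin(pmax(rnorm(400, mean = 54, sd = 19), 0), 100)

# Draw the two histograms side by side with identical bins for comparison.
op <- par(mfrow = c(1, 2))
h_n <- hist(graten_demo, breaks = seq(0, 100, by = 10),
            main = "North", xlab = "Graduation rate (%)")
h_s <- hist(grates_demo, breaks = seq(0, 100, by = 10),
            main = "South", xlab = "Graduation rate (%)")
par(op)
```

Using the same breaks for both samples makes the shapes and centers directly comparable.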
3.4 Summary Statistics
We then used R to compute all relevant summary statistics for both samples.
To do this, we typed the following:
> summary(graten)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  16.00   55.00   67.00   66.86   80.00  100.00
> summary(grates)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   8.00   40.25   53.00   54.17   66.00  100.00
Thus, the five-number summary for graduation rates of colleges sampled in the
north is [16, 55, 67, 80, 100], while the five-number summary for graduation rates
of colleges sampled in the south is [8, 40.25, 53, 66, 100]. We also see that the
mean graduation rate of colleges sampled in the north is $\bar{X}_N = 66.86$, while the
mean graduation rate of colleges sampled in the south is $\bar{X}_S = 54.17$.
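To illustrate how a five-number summary and a mean are read off in R, here is a small sketch on a toy vector. The vector x is hypothetical (it is not either of the paper's samples), and quantile() is used with its default interpolation, which is what summary() reports.

```r
# Toy data, chosen only for illustration.
x <- c(16, 55, 67, 80, 100)

# Five-number summary: minimum, quartiles, and maximum.
five_num <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))

# Sample mean, reported separately from the five-number summary.
x_bar <- mean(x)

print(five_num)
print(x_bar)
```

Note that summary() prints six values, so the mean has to be dropped to obtain the five-number summary.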
We also used R to compute the sample variances:
> var(graten)
[1] 317.0165
> var(grates)
[1] 348.3213
Thus $s_N^2 = 317.0165$ and $s_S^2 = 348.3213$.
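For reference, var() computes the unbiased sample variance, which divides by n - 1 rather than n. The sketch below checks this by hand on a toy vector; x is hypothetical and is not one of the paper's samples.

```r
# Toy data, chosen only for illustration.
x <- c(16, 55, 67, 80, 100)
n <- length(x)

# Sample variance by hand: sum of squared deviations divided by (n - 1).
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Built-in sample variance; should agree with the manual computation.
builtin_var <- var(x)

print(c(manual = manual_var, builtin = builtin_var))
```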
3.5 Inferential Statistics
We test the null hypothesis of equal mean graduation rates against the one-sided alternative $H_1 : \mu_N > \mu_S$.
We note that the sample variances are slightly different, and not similar enough
to be considered approximately equal. Thus, we apply a one-sided, two-sample
Welch t-test to test our claim. To do this in R, we used the t.test() function.
Choosing a significance level of $\alpha = 0.01$, we obtained
> t.test(graten, grates, conf.level = 0.99, alternative = "greater")

        Welch Two Sample t-test

data:  graten and grates
t = 10.355, df = 825.02, p-value < 2.2e-16
alternative hypothesis: true difference in means is greater than 0
99 percent confidence interval:
 9.834299      Inf
sample estimates:
mean of x mean of y
 66.86355  54.17259
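The quantities in this kind of output can be recomputed by hand, which may help in reading it. The sketch below uses simulated placeholder samples x and y (not the paper's data, whose sample sizes are not stated here) and compares a manual Welch statistic, Welch-Satterthwaite degrees of freedom, one-sided p-value, and lower confidence bound against t.test().

```r
set.seed(42)

# Simulated placeholder samples; sizes, means, and spreads are assumptions.
x <- rnorm(60, mean = 67, sd = 18)
y <- rnorm(50, mean = 54, sd = 19)

# Squared standard errors of each sample mean.
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)
se_diff <- sqrt(se2_x + se2_y)

# Welch t statistic for H1: mean(x) > mean(y).
t_stat <- (mean(x) - mean(y)) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom.
df <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# One-sided p-value and lower bound of the one-sided 99% interval.
p_value <- pt(t_stat, df, lower.tail = FALSE)
lower <- (mean(x) - mean(y)) - qt(0.99, df) * se_diff

# Built-in test; var.equal = FALSE (the default) gives the Welch variant.
res <- t.test(x, y, alternative = "greater", conf.level = 0.99)
print(c(t = t_stat, df = df, p = p_value, lower = lower))
```

Because the alternative is one-sided, the interval has only a finite lower bound; the upper bound is reported as Inf.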
Thus, our p-value is less than $2.2 \times 10^{-16}$ and our one-sided 99% confidence interval for the difference in means is $[9.834299, \infty)$.
4 Conclusion
Since our p-value is less than $2.2 \times 10^{-16}$, and hence far below $\alpha = 0.01$, we reject the null hypothesis and conclude that, in the year 2005,
the success of colleges located in the northern US, as measured by mean graduation rate, was significantly greater than
that of colleges located in the southern US. Furthermore, we are 99% confident
that the mean graduation rate is at least 9.83 percentage points greater (approximately) for colleges in
the north.
References
[1] U.S. News & World Report's Guide to America's Best Colleges, 1995.
Also listed at Amstat at http://www.amstat.org/publications/jse/datasets/usnews.txt
[2] U.S. Department of Education, Institute of Education Sciences, National Center for Education Statistics. Digest of Education Statistics.
Degree-granting Postsecondary Institutions, by Control and Level of
Institution: Selected Years, 1949-50 through 2012-13. Retrieved from
http://nces.ed.gov/programs/digest/d13/tables/dt13_317.10.asp