

Final Project Math 1040


Chelsey Busk
Salt Lake Community College

[FINAL PROJECT MATH 1040] August 3, 2014


Term Project
As a group, we decided to do a data analysis project on body measurements.
We will analyze data collected on 508 adults, consisting of 259 females and 247
males. During the project we will create charts based on one categorical variable,
gender, and one quantitative variable, shoulder girth. For each variable we will use
two sampling methods: a simple random sample and a systematic sample. For the
quantitative variable, we will also find the mean, standard deviation, and five-number
summary, and use this information to create frequency histograms and box plots.
Finally, we will construct confidence intervals and use a significance level to
complete hypothesis tests for the population proportion and the population mean.



PART 1: Analysis of categorical variable
Step 1: Graphing
The categorical variable for this data set is gender. In this section we construct a
pie chart and a Pareto chart for the entire population. The population consists of
51.19% females and 48.81% males. In these charts, and throughout the project,
females are coded as 0 and males as 1.

Pie Chart- Population Gender
The pie chart shows that there are slightly more females than males in the
population of people who participated in the study.


Pareto Chart- Population Gender
Here we see the same data as in the pie chart above, but in a Pareto chart, which
makes it easier to compare categories. It is much easier to see the exact number of
people of each gender who participated in the study.
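The percentages in Step 1 follow directly from the counts. As a minimal Python sketch, assuming only the two counts given in the introduction (259 females coded 0 and 247 males coded 1) and dividing by their sum, the hypothetical helper `pareto_table` computes each category's percentage and sorts the categories from most to least frequent, which is the order a Pareto chart uses.

```python
def pareto_table(counts):
    """Return (category, count, percent) rows sorted by count, largest first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(cat, n, 100 * n / total) for cat, n in ordered]


# Counts from the introduction; 0 = female, 1 = male (the project's coding).
labels = {0: "Female", 1: "Male"}
for code, n, pct in pareto_table({0: 259, 1: 247}):
    print(f"{labels[code]} (code {code}): {n} people, {pct:.2f}%")
# → Female (code 0): 259 people, 51.19%
# → Male (code 1): 247 people, 48.81%
```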


Step 2: Graphing with Sampling Methods
Simple Random Sample
The first sampling method I used on the gender population was a simple random
sample. To obtain it, I entered the data into a program called StatCrunch, which
randomly selected forty of the five hundred eight people. Once the forty were
selected, I created a pie chart and a Pareto chart to show how many people of each
gender were chosen.
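StatCrunch handled the random selection in the project. As a rough Python equivalent, assuming a hypothetical list of gender codes rebuilt from the stated counts (the actual data file is not reproduced here), `random.sample` draws forty people without replacement, so every group of forty is equally likely. The seed is arbitrary and only makes the draw repeatable; a different seed gives a different sample, and `simple_random_sample` is a hypothetical helper name.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical stand-in for the data set: 0 = female, 1 = male.
population = [0] * 259 + [1] * 247

sample = simple_random_sample(population, 40, seed=1040)
print("females:", sample.count(0), "males:", sample.count(1))
```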

Pie Chart
As the pie chart shows, slightly more
males than females were chosen in this
sample, the opposite of the entire
population, which had slightly more
females.




Pareto Chart
This chart shows the same sample but
is a little easier to read. It clearly
shows the number of people of each
gender.




Systematic Sampling Method
With this sampling method, I chose a starting number, which happened to be 5, and
then counted every tenth participant after that and recorded their gender. While this
sampling method is not completely random, it lets me choose people without bias,
because the starting number and the interval determine exactly who is selected.
Again, I will show a pie chart along with a Pareto chart.
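The selection rule can be written as a list slice. This is a minimal sketch, assuming positions are counted from 1 (so the picks are persons 5, 15, 25, and so on) and assuming the sample stops once it reaches forty people, the sample size used in Part 3. The participant numbers 1 to 508 are placeholders for the rows of the data set, and `systematic_sample` is a hypothetical helper name.

```python
def systematic_sample(population, start, k, n=None):
    """Take the start-th member (counting from 1), then every k-th member after it.

    If n is given, stop after n members; otherwise continue to the end of the list.
    """
    picks = population[start - 1::k]
    return picks if n is None else picks[:n]


# Placeholder participant numbers 1..508 standing in for the rows of the data set.
participants = list(range(1, 509))

chosen = systematic_sample(participants, start=5, k=10, n=40)
print(chosen[:4], "...", chosen[-1])  # → [5, 15, 25, 35] ... 395
```

Under the same reading, `start=6, k=11` describes the shoulder girth sample in Part 2 (persons 6, 17, 28, and so on).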

Pie Chart
The pie chart shows that this sampling
method also resulted in more males
than females, which is odd in
comparison to the entire population,
which has more females than males.
Again, females are 0 and males are 1.



Pareto Chart
Showing the same sample as the pie chart
above, this chart gives us a numerical
value for each gender within the sample.



Part 2: Analysis of quantitative variable
Step 1: Graphing population
For the quantitative variable, I chose to analyze shoulder girth. I calculated the
population mean, standard deviation, and five-number summary. A five-number
summary consists of the minimum, the first quartile (25th percentile), the median,
the third quartile (75th percentile), and the maximum.
The table below summarizes these values. A frequency histogram and a box plot
will be constructed to show the results.
Summary statistics:
Variable                          Mean       Std. dev.   Median   Min    Max     Q1     Q3
Shoulder girth                    108.19507  10.374834   108.2    85.9   134.8   99.4   116.6
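The summary statistics above come from StatCrunch. As a minimal Python sketch of the same quantities, this assumes the sample standard deviation (divisor n − 1) and computes quartiles as medians of the lower and upper halves, one common textbook rule; StatCrunch may use a different quartile rule, so its Q1 and Q3 can differ slightly. The toy list is hypothetical, not the shoulder girth data, and the helper names are illustrative.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted data,
    leaving the overall median out of both halves when n is odd. Other software
    may use a different quartile rule, so its Q1 and Q3 can differ slightly.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]
    upper = xs[(n + 1) // 2:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])


def summarize(data):
    return {
        "mean": statistics.mean(data),
        # Sample standard deviation (divides by n - 1); statistics.pstdev divides by n.
        "std_dev": statistics.stdev(data),
        "five_number": five_number_summary(data),
    }


# Hypothetical toy values, not shoulder girth measurements.
print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```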

Frequency Histogram
The histogram splits the population into classes according to shoulder girth
measurement and then plots the frequency of each class. This allows us to see how
many people fall into each class. Though the distribution is not normal, it is not
clearly skewed to one side or the other either.
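A frequency histogram starts from a frequency table. This minimal sketch assumes equal-width classes that include their lower limit and exclude their upper limit; the class start, width, and toy values are hypothetical choices, and `frequency_table` is an illustrative helper name. Printing one `*` per person gives a sideways text histogram.

```python
import math


def frequency_table(data, lower_limit, width):
    """Count values in equal-width classes [lower, lower + width).

    lower_limit must not exceed the smallest value; classes are added until the
    largest value is covered.
    """
    if min(data) < lower_limit:
        raise ValueError("lower_limit must be at or below the smallest value")
    n_classes = math.floor((max(data) - lower_limit) / width) + 1
    counts = [0] * n_classes
    for x in data:
        counts[math.floor((x - lower_limit) / width)] += 1
    return [(lower_limit + i * width, lower_limit + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical toy values; one * per value in each class.
for low, high, count in frequency_table([1, 2, 2, 3, 7, 8, 11], lower_limit=0, width=5):
    print(f"{low:>2} to under {high:>2}: {'*' * count}")
```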

Box Plot
The box plot is useful because it shows the minimum and maximum values and
draws a box from the first quartile (25th percentile) to the third quartile
(75th percentile), with a line at the median.
This lets us see the five-number summary easily in one chart.




Step 2: Graphing with Sampling Methods
Having made the graphs for the entire population, I will now construct similar
graphs using samples of the population. This is helpful because I can then compare
the samples with the population, looking for similarities and differences.
Simple Random Sample Method
Again, I started with the simple random sample, allowing the computer to
randomly choose forty participants for me. First I will show a summary of the
statistics, then the frequency histogram and the box plot.
Summary statistics:
Variable                                Mean       Std. dev.   Median   Min    Max     Q1     Q3
Simple random sample (shoulder girth)   107.2075   8.992458    107.35   90     123.1   99.8   115.35

Frequency Histogram
This histogram uses the shoulder girth data from
a sample of forty. Compared with the population,
this histogram is not nearly as smooth, but the
measurements are also not spread as far (90 to
123.1, compared with 85.9 to 134.8 for the population).



Box Plot
Again, the box plot just shows us the five-number
summary, allowing us to see the range of our
sample. This data is recorded from a simple random
sampling of the entire population, focusing on
shoulder girth.


Systematic Sampling Method
As before, I started with a chosen number; this time I chose six. I began with the
sixth person and then took the measurement of every eleventh participant after
that. Once the calculations on the sample were done, I created a summary of the
statistics, a frequency histogram, and a box plot, all of which are shown below.
Summary statistics:
Variable                                Mean       Std. dev.   Median   Min    Max     Q1     Q3
Systematic sample (shoulder girth)      107.2675   8.6913275   107.85   90.5   123.4   99     113.85

Frequency Histogram
This histogram is a little different from the
others: it is slightly closer to a normal
distribution than the last sample, with the
bulk of the people in the middle.





Box Plot
The box plot shows the data from the five-
number summary, acquired from the
systematic sampling of shoulder girth.


Comparison of Categorical Data and Quantitative Data
Gender Variable (Categorical Data)
With the categorical data, I created both a pie chart and a Pareto chart for the
entire population, a simple random sample, and a systematic sample. After
finishing these charts, I compared all the data and found:
1. The original population had slightly more females than males; the Pareto
chart shows the exact numbers. Both sampling methods, however, resulted in
more males than females.
2. The simple random sample shows more males than females, which differs
from the entire population, but the number of males is not much larger than
the number of females. Because a simple random sample can turn out in many
different ways, some samples are bound to contain more males than females.
3. The systematic sample shows that males greatly outnumber females. This
also differs from the entire population, but the result was expected. I took
every tenth person and stopped after forty people, so the sample never
reached the end of the list; since the data set lists all the male results
before the females, it is not surprising that I got more males than females.
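The ordering effect in item 3 can be checked directly. This short sketch assumes, as item 3 describes, that the data set lists all 247 males (code 1) before the 259 females (code 0), and that the systematic sample starts at position 5, takes every tenth person, and stops after forty people, the sample size used in Part 3.

```python
# Ordered as described in item 3: all males (1) first, then all females (0).
ordered = [1] * 247 + [0] * 259

# Positions 5, 15, 25, ... (counting from 1), stopping after 40 people.
systematic = ordered[4::10][:40]

print("males:", systematic.count(1), "females:", systematic.count(0))
# → males: 25 females: 15
```

The last pick is position 395, so the 25 picks at positions 5 through 245 all fall among the males; the 15 females agree with Sample 2 in Part 3.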
Shoulder Girth Variable (Quantitative Data)
With the quantitative data, I used the shoulder girth measurements to create a
summary of statistics, a frequency histogram, and a box plot for each of the
following: the entire population, the simple random sample, and the systematic
sample. Comparing these results, I found:
1. The frequency histogram and box plot for the entire population show a
fairly large range of shoulder girth measurements. The histogram is not
normally distributed: from left to right, the frequencies increase, dip slightly
in the middle, and then steadily decrease.

2. The simple random sample's histogram looks nothing like the histogram for
the entire population. It shows no normal distribution, just random peaks.
Even so, comparing the box plots shows that the five-number summaries differ
only slightly.
3. The systematic sample's histogram is closer to a normal distribution,
except that its highest point is not in the center. As before, the box plot
shows little difference from the original population, although the part of the
box between the first quartile and the median (99 to 107.85) is wider than the
part between the median and the third quartile (107.85 to 113.85).

Part 3: Confidence Level
Section 1: Population
Categorical variable: Gender
All values of the categorical variable: 1 = Male, 0 = Female
Choose one of the above values to use in Part 4 and Part 5 of the project: Female

Population count of females: 259
Sample 1: $n = 40$, $x = 19$, $\hat{p} = .475$
Sample 2: $n = 40$, $x = 15$, $\hat{p} = .375$

Section 2: Population
Quantitative variable: Shoulder Girth
$\mu = 108.195$
$\sigma = 10.375$

Sample 1: $n = 40$, $\bar{x} = 107.208$, $s = 8.992$
Sample 2: $n = 40$, $\bar{x} = 107.268$, $s = 8.691$



Step 1: Confidence Intervals for Categorical Variable
Using the worksheet above, I will find a confidence interval for each of the gender
samples. First I must find the margin of error; to do so, I will use a confidence level
of 95%, which is the most common choice, and insert my values into the following
formulas.
Margin of error: $E = 1.96\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
Confidence interval: $\hat{p} - E < p < \hat{p} + E$
Sample 1
$E = 1.96\sqrt{\dfrac{(.475)(.525)}{40}}$
$E = .15476$
$.475 - .15476 < p < .475 + .15476$
$.32024 < p < .62976$

Sample 2
$E = 1.96\sqrt{\dfrac{(.375)(.625)}{40}}$
$E = .15003$
$.375 - .15003 < p < .375 + .15003$
$.22497 < p < .52503$
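The same interval can be computed in a few lines of Python. This minimal sketch assumes the normal critical value is computed with `statistics.NormalDist().inv_cdf` rather than read from a table; it is 1.95996..., which the hand calculation rounds to 1.96, so the results can differ in the fifth decimal place. `proportion_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Interval p_hat - E < p < p_hat + E with E = z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


for label, x in [("Sample 1", 19), ("Sample 2", 15)]:
    lo, hi, e = proportion_ci(x, 40)
    print(f"{label}: E = {e:.4f}, {lo:.4f} < p < {hi:.4f}")
```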





Step 2: Confidence Intervals for Quantitative Variable
As with the categorical variable, I will find a confidence interval for the mean of the
quantitative variable, using the same 95% confidence level. With $n - 1 = 39$
degrees of freedom, the critical value from the t table is $t_{\alpha/2} = 2.023$. I will
insert my values from the worksheet into the following formulas.
Margin of error: $E = t_{\alpha/2}\left(\dfrac{s}{\sqrt{n}}\right) = 2.023\left(\dfrac{s}{\sqrt{n}}\right)$
Confidence interval: $\bar{x} - E < \mu < \bar{x} + E$
Sample 1
$E = 2.023\left(\dfrac{8.992}{\sqrt{40}}\right)$
$E = 2.87622$
Confidence interval:
$107.208 - 2.87622 < \mu < 107.208 + 2.87622$
$104.332 < \mu < 110.084$
Sample 2
$E = 2.023\left(\dfrac{8.691}{\sqrt{40}}\right)$
$E = 2.77994$
Confidence interval:
$107.268 - 2.77994 < \mu < 107.268 + 2.77994$
$104.488 < \mu < 110.048$

Confidence Intervals for Quantitative Variable (Continued)
As with the mean, I will find a confidence interval for the standard deviation of the
quantitative variable, using the same 95% confidence level. With 39 degrees of
freedom, the chi-square table gives $\chi^2_R = 58.120$ (area .025 to the right) and
$\chi^2_L = 23.654$ (area .975 to the right). I will insert my values from the
worksheet into the following formula.
Confidence interval: $\sqrt{\dfrac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\dfrac{(n-1)s^2}{\chi^2_L}}$
Sample 1
Confidence interval:
$\sqrt{\dfrac{(40-1)(8.992)^2}{58.120}} < \sigma < \sqrt{\dfrac{(40-1)(8.992)^2}{23.654}}$
$7.366 < \sigma < 11.546$
Sample 2
Confidence interval:
$\sqrt{\dfrac{(40-1)(8.691)^2}{58.120}} < \sigma < \sqrt{\dfrac{(40-1)(8.691)^2}{23.654}}$
$7.119 < \sigma < 11.160$

Confidence intervals are a range of values used to estimate the true value of a
population parameter. I used a 95% confidence level for both the categorical data
and the quantitative data, so I can say that I am 95% confident that each interval
contains the true value of the parameter it estimates. For example, for the first
sample of quantitative data, I am 95% confident that the population mean shoulder
girth lies between 104.332 and 110.084.
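Both interval formulas for the quantitative variable can be checked in a few lines of Python. The standard library has no t or chi-square quantile function, so this sketch assumes the 95%, 39-degree-of-freedom table values t = 2.023, chi-square right = 58.120, and chi-square left = 23.654 are typed in by hand; `mean_ci` and `std_dev_ci` are hypothetical helper names.

```python
import math

# Table critical values for 95% confidence with n - 1 = 39 degrees of freedom.
T_CRIT = 2.023        # t with area .025 in the right tail
CHI2_RIGHT = 58.120   # chi-square with area .025 to the right
CHI2_LEFT = 23.654    # chi-square with area .975 to the right


def mean_ci(x_bar, s, n, t_crit=T_CRIT):
    """x_bar - E < mu < x_bar + E with E = t * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin, margin


def std_dev_ci(s, n, chi2_right=CHI2_RIGHT, chi2_left=CHI2_LEFT):
    """sqrt((n-1)s^2 / chi2_R) < sigma < sqrt((n-1)s^2 / chi2_L)."""
    ss = (n - 1) * s ** 2
    return math.sqrt(ss / chi2_right), math.sqrt(ss / chi2_left)


for label, x_bar, s in [("Sample 1", 107.208, 8.992), ("Sample 2", 107.268, 8.691)]:
    lo, hi, e = mean_ci(x_bar, s, 40)
    s_lo, s_hi = std_dev_ci(s, 40)
    print(f"{label}: E = {e:.5f}, {lo:.3f} < mu < {hi:.3f}, "
          f"{s_lo:.3f} < sigma < {s_hi:.3f}")
```

If SciPy is installed, `scipy.stats.t.ppf(0.975, 39)`, `scipy.stats.chi2.ppf(0.975, 39)`, and `scipy.stats.chi2.ppf(0.025, 39)` return these critical values without a table.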

PART 4: Significance Level
Step 1: Hypothesis Testing
In this section I will conduct a series of hypothesis tests for both the categorical
data and the quantitative data. Hypothesis testing works by stating a claim and then
checking whether the sample provides enough evidence to reject the null hypothesis
in support of that claim. To do this, one must find the test statistic, obtain the
P-value, and state a conclusion about whether the claim is supported. For the
categorical data I will test the population proportion, and for the quantitative data
I will test the population mean.

$H_0$: Null hypothesis; states that the value of the population parameter is equal to
the claimed value.
$H_1$: Alternative hypothesis; states that the value of the parameter somehow
differs from the null hypothesis.

Test statistic for $p$: $z = \dfrac{\hat{p} - p}{\sqrt{\dfrac{pq}{n}}}$

Test statistic for $\mu$: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

P-value: If P-value $\le \alpha$, reject $H_0$.
If P-value $> \alpha$, fail to reject $H_0$.





Hypothesis testing of categorical data
Sample 1

$H_0$: $p = .510$
$H_1$: $p \ne .510$
Test statistic: $z = \dfrac{.475 - .510}{\sqrt{\dfrac{(.510)(.490)}{40}}}$
$z = -.44$
P-value: .66
$\alpha = .05$
$.66 > .05$; fail to reject the null hypothesis.
Conclusion: There is insufficient evidence to support the claim that the population
proportion is not equal to .510.

Sample 2

$H_0$: $p = .510$
$H_1$: $p \ne .510$
Test statistic: $z = \dfrac{.375 - .510}{\sqrt{\dfrac{(.510)(.490)}{40}}}$
$z = -1.71$
P-value: .0872
$\alpha = .05$
$.0872 > .05$; fail to reject the null hypothesis.
Conclusion: There is insufficient evidence to support the claim that the population
proportion is not equal to .510.
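The two z tests can be reproduced with `statistics.NormalDist` supplying the normal CDF. In this minimal sketch (the helper name `one_proportion_z_test` is hypothetical), z is not rounded before the two-tailed P-value is computed, so the P-values can differ slightly from values read off a z table with a rounded z.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(x, n, p0):
    """Two-tailed z test of H0: p = p0 against H1: p != p0."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


ALPHA = 0.05
for label, x in [("Sample 1", 19), ("Sample 2", 15)]:
    z, p = one_proportion_z_test(x, 40, 0.510)
    decision = "reject H0" if p <= ALPHA else "fail to reject H0"
    print(f"{label}: z = {z:.2f}, P-value = {p:.4f}, {decision}")
```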


Hypothesis testing of quantitative data
Sample 1

$H_0$: $\mu = 108.195$
$H_1$: $\mu \ne 108.195$
Test statistic: $t = \dfrac{107.208 - 108.195}{8.992/\sqrt{40}}$
$t = -.6942$
P-value = .4917
$\alpha = .05$
$.4917 > .05$; fail to reject the null hypothesis.
Conclusion: There is insufficient evidence to support the claim that the mean is
not equal to 108.195.
Sample 2

$H_0$: $\mu = 108.195$
$H_1$: $\mu \ne 108.195$
Test statistic: $t = \dfrac{107.268 - 108.195}{8.691/\sqrt{40}}$
$t = -.6746$
P-value = .5039
$\alpha = .05$
$.5039 > .05$; fail to reject the null hypothesis.
Conclusion: There is insufficient evidence to support the claim that the mean is
not equal to 108.195.
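The t statistics can be checked the same way. Python's standard library has no t distribution, so this sketch assumes the two-tailed table critical value 2.023 for α = .05 and 39 degrees of freedom and compares |t| with it; for a two-tailed test this gives the same decision as comparing the P-value with α. The helper name `one_mean_t_statistic` is hypothetical.

```python
import math

T_CRIT = 2.023  # two-tailed critical value for alpha = .05, df = 39 (t table)


def one_mean_t_statistic(x_bar, s, n, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n))."""
    return (x_bar - mu0) / (s / math.sqrt(n))


for label, x_bar, s in [("Sample 1", 107.208, 8.992), ("Sample 2", 107.268, 8.691)]:
    t = one_mean_t_statistic(x_bar, s, 40, 108.195)
    decision = "reject H0" if abs(t) >= T_CRIT else "fail to reject H0"
    print(f"{label}: t = {t:.4f}, {decision}")
```

With SciPy available, `2 * scipy.stats.t.sf(abs(t), 39)` gives the two-tailed P-value directly.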


Comparing Hypothesis tests
Hypothesis testing lets us check whether the values we obtain from our samples
are consistent with the population. In all four tests (the proportion and the mean,
each tested with two samples), I failed to reject the null hypothesis. There was
insufficient evidence to support the claims that the population proportion is not
equal to the actual proportion (.510) or that the population mean is not equal to
the actual mean (108.195).
With every hypothesis test there is a possibility of making what is called a Type I
error: rejecting the null hypothesis when the null hypothesis is actually true. I did
not make this mistake. In every test I failed to reject the null hypothesis, and
because each null hypothesis used the actual population value, the null hypothesis
is true.








Reflection
Though I have always enjoyed math, I had never taken a math class that I
could apply to my everyday life. Statistics is different. While going through the
course and learning new material, I found that most of the examples we did could
easily be related to daily activities. In this project, for example, I performed tests
and studies on a data set of body measurements. At the end of this project, I am
now able to tell you the population proportion of females, as well as the mean and
standard deviation of shoulder girth!
One of the greatest things I will take away from this is that I can now
determine whether a certain value is usual or unusual. I am going into the nursing
field, where I will be dealing with numbers and test results all day, every day. It
will be helpful to have a way of determining whether test results are unusual, which
can help determine whether a patient is in an emergency state.
Hopefully I can take all that I have learned this semester and apply it
throughout my life, now in my college studies and later in my career!
