
Megan Garza

MATH 1040
Tues/Thurs @ 10
9/23/14

Skittles Term Project


My statistics class is performing a statistical analysis of how many Skittles of each
color typically come in a bag of Skittles. Each student in the class was instructed to
purchase one 2.17-ounce bag of Original Skittles, count how many Skittles of each color
were in the bag, and submit that data to the instructor to be organized in an Excel
spreadsheet.

[Figure: Colors of Skittles Collected by a Statistics Class, shown as a pie chart and as a bar chart of candy counts by color (orange, yellow, red, green, purple)]

Based on the collected data, out of 2,435 Skittles, 20.534% were red, 18.316% were
orange, 19.466% were yellow, 20.656% were green, and 21.027% were purple. It appears
that a fairly similar number of each color of Skittle is distributed into the bags,
although the data from an individual bag may suggest otherwise. My bag of Skittles, for
instance, contained an unusually high number of yellow candies.
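As a quick check of the arithmetic, the percentages above can be recomputed from per-color counts. The counts below are an assumption: they were back-calculated from the reported percentages and the 2,435-candy total rather than taken from the class spreadsheet. A minimal Python sketch:

```python
# Color counts back-calculated from the reported percentages and the
# 2,435-candy total (assumption: these are not the original raw tallies).
counts = {"red": 500, "orange": 446, "yellow": 474, "green": 503, "purple": 512}

total = sum(counts.values())  # 2435 candies in the pooled sample

# Percentage of the pooled sample that each color makes up.
percentages = {color: 100 * n / total for color, n in counts.items()}

for color, pct in percentages.items():
    print(f"{color:>6}: {pct:.3f}%")
```

The computed percentages agree with the reported ones to within about a thousandth of a percentage point; green comes out as 20.657% when rounded rather than truncated.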

[Figure: My 2.17 oz Bag of Skittles, shown as a pie chart and as a bar chart of candy counts by color (red, orange, yellow, green, purple)]

Summary Statistics of Total Number of Skittles Per Bag:


Mean: 64.1
Standard Deviation: 13.2
Min Value: 45
Q1: 59
Q2: 61
Q3: 62
Max Value: 114
Total # Skittles In My Bag: 67
Total # of Bags In Sample: 38
Most of the data collected falls within a small range with low variability, apart from
a few outliers. The frequency histogram appears to be skewed to the right. My own bag of
Skittles contributed to the outliers in the sample: it contained 18 yellow Skittles,
whereas there were only 11 to 14 of each of the other colors. On average, it appears
that the bags contain a similar number of each color and a similar total number of
Skittles, apart from a few outliers, which is about what I expected.
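The statement about outliers can be checked against the summary statistics with the common 1.5 × IQR rule (an assumption; the class may have used a different criterion). A minimal Python sketch using only the reported quartiles; is_outlier is an illustrative helper name:

```python
# Quartiles reported in the summary statistics.
q1, q3 = 59, 62
iqr = q3 - q1  # interquartile range

# Assumption: outliers are flagged with the common 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

def is_outlier(bag_total):
    """Return True if a bag's candy count lies outside the fences."""
    return bag_total < lower_fence or bag_total > upper_fence

# Minimum bag, my bag, and maximum bag from the summary statistics.
for total in (45, 67, 114):
    print(total, "outlier" if is_outlier(total) else "typical")
```

With Q1 = 59 and Q3 = 62 the fences are 54.5 and 66.5, so the minimum (45), the maximum (114), and my own bag (67) all fall outside them.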

Reflection
Quantitative data consist of numbers representing counts or measurements. Categorical (or
qualitative) data consist of names or labels that are not numbers representing counts or
measurements. Quantitative data are displayed well in scatterplots, time-series graphs,
frequency polygons, and stem-and-leaf plots. By contrast, qualitative data are best displayed
in bar graphs, pie charts, and Pareto charts. Quantitative values can be ordered, and the
differences between them are meaningful. For instance, $200, $700, and $35 can be ordered, and
their differences are meaningful. The differences between qualitative values are not
meaningful. For example, the responses yes, no, and maybe can be tallied, but they are not
amounts and cannot be ordered or subtracted. Even numeric labels behave this way: subtracting
one zip code from another, or one 1-5 star rating from another, does not produce a meaningful
quantity.

Confidence Interval Estimates


Confidence Interval: a range of values used to estimate the true value of a population parameter.
Construct a 99% confidence interval estimate for the true proportion of yellow candies:
17.43% < p < 21.57%
We can be 99% confident that the true proportion of yellow candies is between 17.43% and 21.57%.
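The interval above follows the usual normal-approximation formula p̂ ± z√(p̂(1 − p̂)/n). A minimal Python sketch, assuming a yellow count of 474 (back-calculated from 19.466% of 2,435) and using the standard library's NormalDist for the critical value; proportion_ci is an illustrative helper name:

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z_(alpha/2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)           # margin of error E
    return p_hat - margin, p_hat + margin

n = 2435
# Sample proportion rounded to 0.195 before computing.
low, high = proportion_ci(0.195, n, 0.99)
print(f"{low:.2%} < p < {high:.2%}")

# Unrounded sample proportion (assumed yellow count of 474, back-calculated).
low_u, high_u = proportion_ci(474 / n, n, 0.99)
print(f"{low_u:.2%} < p < {high_u:.2%}")
```

Rounding p̂ to 0.195 before computing reproduces the reported 17.43% and 21.57%; the unrounded p̂ = 474/2435 gives about 17.40% to 21.53%.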
Construct a 95% confidence interval estimate for the true mean # of candies per bag:
59.798 < μ < 68.36
We can be 95% confident that the true mean number of candies per bag is between 59.798 and 68.36.
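The mean interval uses x̄ ± t·s/√n with n − 1 = 37 degrees of freedom, where x̄ = 2435/38 ≈ 64.079. A minimal Python sketch, assuming s = 13.025 (the value quoted in the Reflection; the summary statistics list 13.2) and a t-table critical value of 2.026; mean_ci is an illustrative helper name:

```python
from math import sqrt

def mean_ci(mean, s, n, t_crit):
    """Student t confidence interval for a population mean (sigma unknown)."""
    margin = t_crit * s / sqrt(n)  # E = t_(alpha/2) * s / sqrt(n)
    return mean - margin, mean + margin

n = 38
mean = 2435 / n   # total candies divided by number of bags
s = 13.025        # assumption: sample standard deviation quoted in the Reflection
t_crit = 2.026    # assumption: t-table value for 95% confidence, df = 37
low, high = mean_ci(mean, s, n, t_crit)
print(f"{low:.3f} < mu < {high:.3f}")
```

With these inputs the sketch prints endpoints of 59.798 and 68.360, matching the interval above.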
Construct a 98% confidence interval estimate for the standard deviation of the # of
candies per bag:
11.106 < σ < 20.488
We can be 98% confident that the population standard deviation of the number of candies per
bag is between 11.106 and 20.488.
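The standard deviation interval uses √((n − 1)s²/χ²_R) < σ < √((n − 1)s²/χ²_L). The critical values below are assumed chi-square table values for 98% confidence (0.01 in each tail), and s = 13.025 is taken from the Reflection; sd_ci is an illustrative helper name. A minimal sketch:

```python
from math import sqrt

def sd_ci(s, n, chi2_left, chi2_right):
    """Chi-square confidence interval for a population standard deviation."""
    ss = (n - 1) * s ** 2  # (n - 1) * s^2
    return sqrt(ss / chi2_right), sqrt(ss / chi2_left)

n, s = 38, 13.025
# Assumed chi-square table values for 98% confidence.
low37, high37 = sd_ci(s, n, chi2_left=19.960, chi2_right=59.893)  # df = 37
low30, high30 = sd_ci(s, n, chi2_left=14.954, chi2_right=50.892)  # df = 30
print(f"df = 37: {low37:.3f} < sigma < {high37:.3f}")
print(f"df = 30: {low30:.3f} < sigma < {high30:.3f}")
```

The df = 30 table row (14.954 and 50.892) reproduces the reported 11.106 to 20.488. Since n − 1 = 37, the df = 37 values (19.960 and 59.893) give a narrower interval of roughly 10.237 to 17.734, which still contains 13.025.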

Hypothesis Tests
Hypothesis Test: a procedure for testing a claim about a property of a population.
Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red:
H0: p = 0.2 (original claim)
H1: p ≠ 0.2
P-value = 0.5101 > 0.05 significance level
Fail to reject H0: there is not sufficient evidence to reject the null hypothesis, because the
P-value is greater than the significance level.
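The P-value above can be reproduced with a one-proportion z test, z = (p̂ − p0)/√(p0(1 − p0)/n). A minimal Python sketch, assuming 500 red candies (back-calculated from 20.534% of 2,435); proportion_z_test is an illustrative helper name:

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Two-tailed one-proportion z test; returns (z, P-value)."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0 under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Assumption: 500 red candies out of 2435 (back-calculated from 20.534%).
z, p_value = proportion_z_test(500, 2435, 0.20)
alpha = 0.05
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The sketch gives z ≈ 0.659 and a two-tailed P-value of about 0.5101, so it fails to reject H0 at the 0.05 level.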
Use a 0.01 significance level to test the claim that the mean number of candies in a bag of
Skittles is 55.
H0: μ = 55 (original claim)
H1: μ ≠ 55
P-value = 0.000143 < 0.01 significance level
Reject H0: there is sufficient evidence to reject the null hypothesis, because the P-value is
less than the significance level.
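The mean test uses t = (x̄ − μ0)/(s/√n) with 37 degrees of freedom. Python's standard library has no t distribution, so this sketch uses the equivalent critical-value method instead of computing the P-value. It assumes s = 13.025 (from the Reflection) and a two-tailed t-table critical value of 2.715 for α = 0.01; mean_t_statistic is an illustrative helper name:

```python
from math import sqrt

def mean_t_statistic(mean, s, n, mu0):
    """One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))."""
    return (mean - mu0) / (s / sqrt(n))

n = 38
t = mean_t_statistic(2435 / n, 13.025, n, 55)  # assumption: s = 13.025
t_crit = 2.715  # assumption: two-tailed t-table value for alpha = 0.01, df = 37
print(f"t = {t:.3f}")
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```

The test statistic is about 4.297, well beyond 2.715, so H0 is rejected, consistent with the reported P-value.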

Reflection
A confidence interval gives a range of values that is likely to contain the population
parameter we are estimating. The confidence level (for example, 95%) is the proportion of
intervals, built this way from repeated samples, that would contain the true parameter. The
99% confidence interval estimate for the true proportion of yellow candies was 17.43% to
21.57%. The actual data showed a 19.47% proportion of yellow candies, which falls within our
confidence interval. The 95% confidence interval estimate for the true mean number of candies
per bag was 59.798 to 68.36. The actual data showed a mean of 64.079 candies per bag, which
falls within our confidence interval. The 98% confidence interval estimate for the standard
deviation of the number of candies per bag was 11.106 to 20.488. The actual data showed a
standard deviation of 13.025, which falls within the confidence interval. So it appears we
can conclude that our data fell within a reasonable range, which, to some degree, supports
the accuracy of our data.
Our hypothesis tests also line up with the data we collected. We tested the claim that 20% of
all Skittles are red and found that there was not enough evidence to reject that claim; our
data showed that 20.53% of our candies were red. The second hypothesis test, on the claim
that the mean number of Skittles per bag is 55, showed that there was sufficient evidence to
reject that claim; our data showed a mean of 64.079 candies per bag. Because our confidence
intervals and hypothesis tests agree with the data we collected, they support the accuracy of
our data.
Our sampling method was certainly imperfect. It placed a lot of trust in students, who may
have forgotten to do the project until the last minute and made up numbers, bought the wrong
size of bag, or miscounted, any of which may account for the outliers. Some oversight of the
counting process would have helped ensure accurate data collection. Fortunately, our sample
size was large enough to average out some of those inconsistencies.
