Zeke Sorensen
Math 1040
Skittles Project
In this study, 22 people randomly selected bags of Skittles candies from various stores and counted
the number of candies of each color in their bags. We recorded our findings in a shared data set so that
we could analyze the data and draw conclusions about the number of candies of each color in a bag.
Once everyone had counted the colors in their own bag and recorded the data in the same place, we
had a sample to work with.
I took the total of each color from everyone's bags and constructed charts and graphs
representing the data. Though the colors are roughly equal in count, there are slight variations in their
quantities. This is what I expected to see, given the automation that companies producing candy on
this scale use; their methods for producing and packaging have been refined down to a science. The
class data show that red has the highest total of any color. Orange has the lowest class total, but this
varies from bag to bag: orange was not the lowest count in my bag. This was not a simple random
sample, because not every bag in the population had the same chance of being selected, so the
conclusions could be more reliable if a true simple random sample had been taken.
For the data we are using, the categorical data are the colors of the candies, and the
quantitative data are the counts of each color. The pie charts and the histograms make sense with both
the categorical and the quantitative data. The box plots do not make as much sense with the categorical
data, but they do with the quantitative data. Calculations that require the data to be broken out by color
use both the categorical and the quantitative data, while calculations that look at the values as a whole
do not require the categorical data. If you were rating the colors by which ones you liked best and
assigning them a grade, you would not need the quantitative data. Categorical data have no real count
or measure, while quantitative data consist of counts and measurements.
A confidence interval is a range of values used to estimate the true value of a population
parameter. Because it gives a range instead of a single value, it gives a better sense of how good our
estimate of the population parameter is.
Confidence Interval for the true mean number of candies per bag:
Margin of error, E = 1.371284
We are 95% confident that the population mean is within the range:
-1.321284 < mean < 1.421284
Confidence interval for the standard deviation of the number of candies per bag:
98% Confidence Interval for the St. Dev.:
16.47896 < SD < 25.30334
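The formulas behind these two intervals are not shown in the report. As a sketch only, assuming the standard textbook intervals were used (a $t$-based interval for the mean with $\sigma$ unknown, and a chi-square interval for the standard deviation), with $\bar{x}$ the sample mean, $s$ the sample standard deviation, and $n$ the sample size:

$$E = t_{\alpha/2} \frac{s}{\sqrt{n}}, \qquad \bar{x} - E < \mu < \bar{x} + E$$

$$\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}$$

Here $\chi^2_R$ and $\chi^2_L$ are the right-tail and left-tail critical values with $n - 1$ degrees of freedom. Both intervals assume a simple random sample from an approximately normal population.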
A hypothesis test is a procedure for testing a claim about a property of a population. A claim is
made about a population, and a hypothesis test is conducted to either reject the claim or fail to reject
it.
Alternative hypothesis: p ≠ p(hyp)
Critical z: 1.9600
P-value: 0.7948
95% Confidence interval:
0.174473 < p < 0.2195169
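The interval above can be reproduced from the counts used in the hypothesis test later in the report (236 candies of the tested color out of 1198). This is a minimal sketch, assuming the usual normal-approximation interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$; the function name `proportion_ci` is made up for illustration and is not from the report.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Two-sided critical value, about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Counts taken from the report: 236 of 1198 candies.
lower, upper = proportion_ci(236, 1198)
print(f"{lower:.6f} < p < {upper:.6f}")
```

The approximation is only reasonable when the sample is large enough that both the expected successes and failures are at least 5, which is easily met here.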
Here is the equation for testing a claim about a proportion using the P-value method:

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

We can use this method when technology such as a computer or calculator is not available. To
use this method, we must first express the claim symbolically:
$H_0: p = 0.20$
$H_1: p \neq 0.20$
Then we take our significance level, $\alpha = 0.05$, and find the values we need:
$\hat{p} = 236/1198 = 0.197$
$p = 0.20$
$q = 1 - 0.20 = 0.80$
$n = 1198$
We must check the requirements for the hypothesis test:
1. The sample needs to be a simple random sample.
2. The conditions for a binomial distribution must be satisfied.
3. The conditions $np \geq 5$ and $nq \geq 5$ must both hold.
When we plug these numbers into the equation, we get a test statistic of about $z = -0.2596$.
Using a z-score table, we see that the area to the left of $z = -0.26$ is 0.3974.
Because this is a two-tailed test, the P-value is twice that area: $2 \times 0.3974 = 0.7948$.
Because this is greater than the significance level of 0.05, we fail to reject the null hypothesis.
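The hand calculation above can be checked with a short script. This is a sketch under the same inputs as the text (claimed proportion 0.20, 236 of 1198 candies, two-tailed test at the 0.05 level); the function name `one_proportion_z_test` is illustrative, and `statistics.NormalDist` stands in for the printed z table. Because it uses the unrounded sample proportion, the test statistic comes out near -0.2600 rather than -0.2596, and the two-tailed P-value agrees with the 0.7948 reported earlier.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    q0 = 1 - p0
    z = (p_hat - p0) / sqrt(p0 * q0 / n)
    # Two-tailed P-value: twice the area in the tail beyond |z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


alpha = 0.05
z, p_value = one_proportion_z_test(236, 1198, 0.20)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.4f}, P-value = {p_value:.4f}, {decision}")
```

A two-tailed P-value is always between 0 and 1, which is a quick sanity check on any table lookup.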
To test a claim about a mean, we first need to establish our notation.
Then we need to check the requirements:
1. The sample must be a simple random sample.
2. The population needs to be normally distributed and/or n > 30.
Here is the equation for testing a claim about a mean:
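As a hedged sketch, assuming the population standard deviation is not known (an assumption; the report does not say which case applies), the usual test statistic is the $t$ statistic with $n - 1$ degrees of freedom, where $\bar{x}$ is the sample mean, $\mu$ is the claimed population mean, $s$ is the sample standard deviation, and $n$ is the sample size:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

If the population standard deviation $\sigma$ were known instead, the same form with $\sigma$ in place of $s$ would give a $z$ statistic.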
[Figure: Zeke's Bag, chart with legend entries Red, Orange, Green, Purple, and Yellow Candies]
[Figure: Zeke's Bag, chart of candy counts with cumulative %]

Count   Cumulative Count   Cumulative %
10      10                 16.39344
11      21                 34.42623
12      33                 54.09836
13      46                 75.40984
15      61                 100
[Figure: Zeke's Frequencies]
Zeke's bag data
Statistic            Value
Mean                 20.33333333
Standard Error       8.163604868
Median               12.5
Mode                 #N/A
Standard Deviation   19.99666639
Sample Variance      399.8666667
Kurtosis             5.8439745
Skewness             2.409171152
Range                51
Minimum              10
Maximum              61
Sum                  122
Count                6
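The reported count of 6, sum of 122, and maximum of 61 are consistent with the bag total (61) having been summarized together with the five color counts (10, 11, 12, 13, 15). That is an inference from the numbers, not something the report states. The sketch below, using only Python's standard `statistics` module, reproduces the figures in the table from those six values and shows, for comparison, what the same summary gives for the five color counts alone. The helper name `describe` is illustrative.

```python
from statistics import mean, median, stdev, variance

# The five color counts from the cumulative table for Zeke's bag.
color_counts = [10, 11, 12, 13, 15]
# Assumption: the sixth value in the reported summary is the bag total.
with_total = color_counts + [sum(color_counts)]  # [10, 11, 12, 13, 15, 61]


def describe(values):
    """Return a few of the summary figures reported in the table."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),        # sample standard deviation (n - 1)
        "variance": variance(values),  # sample variance (n - 1)
        "range": max(values) - min(values),
        "sum": sum(values),
        "count": len(values),
    }


reported = describe(with_total)       # six values, including the total
colors_only = describe(color_counts)  # five color counts only

for label, stats in (("with total", reported), ("colors only", colors_only)):
    print(label, {key: round(value, 4) for key, value in stats.items()})
```

Including the total pulls the mean and standard deviation far above any single color count, which is worth keeping in mind when reading the table.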
[Figure: box plot for Zeke's bag]
[Figure: Class Totals, chart with legend entries Red, Orange, Yellow, Green, and Purple Candies]
Class totals
Colors           Total   Cumulative Count   Cumulative %
Orange Candies   217     217                18.11352254
Purple Candies   232     449                37.47913189
Yellow Candies   236     685                57.17863105
Green Candies    237     922                76.96160267
Red Candies      276     1198               100
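The cumulative columns in the class totals table follow directly from the five color totals. As a minimal sketch (variable names are illustrative), the running count and cumulative percentage can be computed with `itertools.accumulate`:

```python
from itertools import accumulate

# Class totals per color, as listed in the class totals table.
class_totals = {
    "Orange": 217,
    "Purple": 232,
    "Yellow": 236,
    "Green": 237,
    "Red": 276,
}

# Sort ascending by total, matching the row order of the table.
ordered = sorted(class_totals.items(), key=lambda item: item[1])
cumulative_counts = list(accumulate(total for _, total in ordered))
grand_total = cumulative_counts[-1]
cumulative_pct = [100 * count / grand_total for count in cumulative_counts]

for (color, total), cum, pct in zip(ordered, cumulative_counts, cumulative_pct):
    print(f"{color:<7} {total:>4} {cum:>5} {pct:9.5f}")
```

The table lists the colors from smallest to largest total. A Pareto chart is conventionally drawn from largest to smallest, which would only require passing `reverse=True` to `sorted`.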
[Figure: Class Frequency, frequency of number of candies per bag]
Class data
Statistic            Red           Orange        Yellow        Green         Purple
Mean                 13.8          10.85         11.8          11.85         11.6
Median               13.5          10            11.5          12            12
Mode                 12            9             13            14            12
Standard Deviation   2.375311614   2.906888371   2.930780388   3.856300381   3.704904289
Sample Variance      5.642105263   8.45          8.589473684   14.87105263   13.72631579
Range                9             12            11            16            14
Minimum              10            7             6             5             6
Maximum              19            19            17            21            20
Sum                  276           217           236           237           232
Count                20            20            20            20            20
Class totals: Red Candies (count per bag)
14, 17, 15, 13, 12, 17, 12, 13, 11, 12, 15, 17, 14, 11, 19, 15, 12, 13, 10, 14
Total: 276
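The red candy counts listed above are enough to recompute the Red Candies summary statistics in the class data. This is a small sketch using Python's standard `statistics` module rather than the spreadsheet the report used; the variable names are illustrative.

```python
from statistics import mean, median, mode, stdev, variance

# Red candies per bag, copied from the class listing (20 bags).
red = [14, 17, 15, 13, 12, 17, 12, 13, 11, 12,
       15, 17, 14, 11, 19, 15, 12, 13, 10, 14]

summary = {
    "Mean": mean(red),
    "Median": median(red),
    "Mode": mode(red),
    "Standard Deviation": stdev(red),  # sample standard deviation (n - 1)
    "Sample Variance": variance(red),  # sample variance (n - 1)
    "Range": max(red) - min(red),
    "Minimum": min(red),
    "Maximum": max(red),
    "Sum": sum(red),
    "Count": len(red),
}

for name, value in summary.items():
    print(f"{name:<20}{value}")
```

The same dictionary could be built for any other color if its per-bag counts were available.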
[Figure: Class Box Plot]
Reflection
From this project I have learned new techniques for analyzing statistical data and how to use
different tools. The tools I used include Excel, Statdisk, and other web-based tools. Some of the
concepts that were new to me are compiling, analyzing, and interpreting histograms, box plots, and
Pareto charts. Hypothesis testing and constructing confidence intervals were also new concepts to
me. I gained new insight into how to use tools such as Excel for organizing data and performing
calculations.
When constructing a confidence interval and performing hypothesis tests, I used critical thinking
along with raw mathematics. Once the process was done, I also had to use critical thinking to interpret
the results. There was a lot of number crunching. I think that this project resembles story problems
that I have encountered in other classes, except that this project took a real-life scenario and we had
to use the data from that scenario to formulate the problems. I think that this class has built my
critical thinking skills for story problems and for determining how to use the data in an equation to get
the results that I need. I believe that it has given me a greater understanding of how to interpret the
results of my work and what the concepts are used for.
One example of how this project has helped my problem-solving skills was the construction of
the histogram and the Pareto chart. For the histogram, it was not apparent at first how I could apply
the data. I had to really analyze the data to see how it could be presented in a histogram, so I took the
number of Skittles in each bag and showed the frequency of each count. The project also helped me
understand hypothesis testing: what the results show and how we can use a hypothesis test to
evaluate the validity of a claim.
As a whole, this project has helped to show me how statistics are used, or can be used, for
almost everything in life. I can take data gathered from real-world events and use the concepts
learned in this project to better understand what is going on or what the possible outcomes are. I feel
that this project has developed my ability to do this. I have also found new ways and tools that can
help me deal with larger amounts of data more accurately and efficiently.