
SKITTLES TERM PROJECT

A Look at Skittles Data

Zeke Sorensen
Math 1040

Skittles Project
In this study, 22 people randomly selected bags of Skittles candies from various stores and
counted the number of candies of each color in their bags. We then recorded our findings in a shared
data set so that we could analyze the data and draw a conclusion about the number of each color in the
bags. Once everyone had counted the colors in their own bag and recorded the data in the same place, we
had a sample to work with.
I have taken the total of each color from everyone's bags and constructed charts and graphs
representing the data. Though the colors seem to be roughly equal in count, you can see slight variations
in their quantities. This is pretty much what I expected to see, given the automation that companies
producing candy on this scale use. Their methods for producing and packaging have been refined
down to a science. The data shows that red candies are the most common overall (276 of 1198). Orange
has the lowest class total (217), but this varies from bag to bag; it is not the lowest count in my
bag. This was not a simple random sample, because not every bag in the population had the same chance
of being selected. The data could be more representative if a true simple random sample had been
selected.
For the data that we are using, the categorical data is the color of the candies. The
quantitative data is the count of each color. The pie charts and the histograms make sense
with the categorical data as well as with the quantitative data. The box plots do not make as much sense
with the categorical data, but they do with the quantitative data. Calculations that require the data to
be broken out by color use both the categorical and the quantitative data, while calculations that
look at the values as a whole do not require the categorical data. If you were rating the colors by which
ones you liked best and assigning them a grade, you would not need the quantitative data.
Categorical data has no real count or measure, while quantitative data consists of counts and
measures.
A confidence interval is a range of values used to estimate the true value of a population
parameter. Because it uses a range of values instead of a single value, it gives a better sense of how
good our estimate of the population parameter is.

Confidence Interval for the true proportion of yellow candies:


Margin of error, E = 0.0284013
99% Confidence Interval (using normal approx):
0.1485603 < p < 0.2053629

Confidence Interval for the true mean number of candies per bag:
Margin of error, E = 1.371284
95% Confident the population mean is within the range:
-1.321284 < mean < 1.421284

Confidence interval for the standard deviation of the number of candies per bag:
98% Confidence Interval for the St. Dev.:
16.47896 < SD < 25.30334
A hypothesis test is a procedure for testing a claim about a property of a population. A claim is
made about a population and you conduct a hypothesis test to either reject the claim or fail to reject the
claim.
Alternative hypothesis: p ≠ p(hyp)
Sample proportion: 0.196995
Test statistic, z: -0.2600
Critical z: ±1.9600
P-value: 0.7948

95% Confidence interval:
0.174473 < p < 0.2195169
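
The 95% interval above can be reproduced from the class totals (236 yellow candies out of 1198). The following is a minimal Python sketch, assuming the normal approximation to the binomial distribution. The function name `proportion_ci` is only illustrative and is not the tool that was used in the project.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-tailed critical value, e.g. about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


# Class totals from the report: 236 yellow candies out of 1198.
lower, upper, margin = proportion_ci(236, 1198, 0.95)
print(f"E = {margin:.6f}")
print(f"{lower:.6f} < p < {upper:.6f}")
```

With these inputs the limits should agree with the 95% interval reported above to about six decimal places.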

Margin of error, E = 0.0610944
99% confident the population mean is within the range:
60.23891 < mean < 60.36109
In the hypothesis test above, the P-value (0.7948) is greater than the significance level of 0.05,
so we fail to reject the null hypothesis. The confidence interval for the mean shows that the mean
should be between about 60.2 and 60.4, so we can say that the mean number of candies in a bag is not
likely to be 50.

Here is the equation for testing a claim about a proportion using the P-value method:

$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}$$

We can use this method by hand when technology such as computers or calculators is not available.
To use this method we must first express the claim symbolically:
H0: p = 0.20
H1: p ≠ 0.20
Then we take our significance level, α = 0.05, and find the values we need:
p̂ = 236/1198 = 0.197 (the sample proportion)
p = 0.20
q = 1 - 0.20 = 0.80
n = 1198
We must check the requirements for the hypothesis test:
1. The sample needs to be a simple random sample.
2. The conditions for a binomial distribution must be satisfied.
3. The conditions np ≥ 5 and nq ≥ 5 must both be met.
When we plug these numbers into the equation we get a test statistic of about z = -0.2596.
Using a z-score table, we see that the area to the left of this z score is 0.3974.
Because this is a two-tailed test, the P-value is twice that area: 2 × 0.3974 = 0.7948.
Because this is greater than 0.05, we fail to reject the null hypothesis.
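
The hand calculation above can be checked with a short script. This is a minimal Python sketch, assuming the normal approximation and the class totals from the report (236 yellow candies out of 1198). The function name `proportion_z_test` is hypothetical and only used for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-tailed z test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    q0 = 1 - p0
    z = (p_hat - p0) / sqrt(p0 * q0 / n)
    # Two-tailed P-value: twice the tail area beyond |z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p_value = proportion_z_test(236, 1198, 0.20)
alpha = 0.05
print(f"z = {z:.4f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The script uses the unrounded sample proportion, so its test statistic should land close to the software value of -0.2600 rather than the -0.2596 obtained by hand from the rounded 0.197.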

To test a claim about a mean, we first need to set up our notation.
Then we need to check the requirements:
1. The sample must be a simple random sample.
2. The population needs to be normally distributed and/or n > 30.
Here is the equation for testing a claim for a mean
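
As a rough illustration of a test about a mean, the sketch below computes a test statistic in Python. It rests on several assumptions that the report does not confirm: that the population standard deviation is unknown, so the t form t = (x̄ - μ) / (s / sqrt(n)) applies; that the hypothesized mean is 50 candies per bag; and that the sample is the "Total Candies" column of the class totals table as listed. The function name `t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# "Total Candies" column from the class totals table, as listed in the report.
bag_totals = [63, 60, 61, 63, 60, 60, 64, 60, 59, 60,
              61, 59, 60, 60, 59, 61, 61, 62, 62, 51]

# Hypothesized mean: an assumption taken from the report's remark about 50 candies.
mu_0 = 50


def t_statistic(sample, mu):
    """Test statistic t = (x_bar - mu) / (s / sqrt(n)) for a claim about a mean."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / sqrt(n))


t = t_statistic(bag_totals, mu_0)
print(f"n = {len(bag_totals)}, x_bar = {mean(bag_totals):.2f}, t = {t:.2f}")
```

The resulting t value would then be compared with a critical value from a t table with n - 1 = 19 degrees of freedom.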

[Figure: pie chart, "Zeke's Bag", with slices for Red, Orange, Green, Purple, and Yellow Candies]

[Figure: Pareto chart, "Zeke's Bag", showing Count and Cumulative % for Red, Orange, Purple, Green, and Yellow Candies]

Zeke's bag totals

Color            Count   Cumulative Count   Cumulative %
Yellow Candies   10      10                 16.39344
Green Candies    11      21                 34.42623
Purple Candies   12      33                 54.09836
Orange Candies   13      46                 75.40984
Red Candies      15      61                 100

[Figure: histogram, "Zeke's Frequencies"]

Zeke's bag data

Statistic            Zeke's bag
Mean                 20.33333333
Standard Error       8.163604868
Median               12.5
Mode                 #N/A
Standard Deviation   19.99666639
Sample Variance      399.8666667
Kurtosis             5.8439745
Skewness             2.409171152
Range                51
Minimum              10
Maximum              61
Sum                  122
Count                6

Zeke's bag totals

Red Candies   Orange Candies   Yellow Candies   Green Candies   Purple Candies   Total Candies
15            13               10               11              12               61

[Figure: box plot for Zeke's bag]

[Figure: pie chart, "Class Totals", with slices for Red, Orange, Yellow, Green, and Purple Candies]

Class totals

Color            Total   Cumulative Count   Cumulative %
Orange Candies   217     217                18.11352254
Purple Candies   232     449                37.47913189
Yellow Candies   236     685                57.17863105
Green Candies    237     922                76.96160267
Red Candies      276     1198               100
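
The cumulative counts and percentages in the class totals table can be recomputed from the color totals alone. The following Python sketch is one way to do it and is only illustrative; it keeps the colors in the same ascending order as the table.

```python
from itertools import accumulate

# Class totals by color, in the same ascending order as the table above.
class_totals = {
    "Orange": 217,
    "Purple": 232,
    "Yellow": 236,
    "Green": 237,
    "Red": 276,
}

grand_total = sum(class_totals.values())
cumulative_counts = list(accumulate(class_totals.values()))
cumulative_percents = [100 * c / grand_total for c in cumulative_counts]

for color, count, cum, pct in zip(class_totals, class_totals.values(),
                                  cumulative_counts, cumulative_percents):
    print(f"{color:<8}{count:>6}{cum:>7}{pct:>10.4f}")
```

Each printed row holds the color, its total, the running count, and the running percentage of all 1198 candies.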

Frequency of number of candies per bag:
[Figure: bar chart, "Class Frequency", with bars for Red, Orange, Yellow, Green, and Purple Candies]

Class data

Statistic            Red Candies   Orange Candies   Yellow Candies   Green Candies   Purple Candies
Mean                 13.8          10.85            11.8             11.85           11.6
Median               13.5          10               11.5             12              12
Mode                 12            9                13               14              12
Standard Deviation   2.375311614   2.906888371      2.930780388      3.856300381     3.704904289
Sample Variance      5.642105263   8.45             8.589473684      14.87105263     13.72631579
Range                9             12               11               16              14
Minimum              10            7                6                5               6
Maximum              19            19               17               21              20
Sum                  276           217              236              237             232
Count                20            20               20               20              20
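
The descriptive statistics for one color can be recomputed with Python's standard library. This is a minimal sketch using the 20 red counts from the class totals table; the project itself lists Excel among its tools, so the script is an illustration rather than the method that was used.

```python
from statistics import mean, median, mode, stdev, variance

# Red counts for the 20 bags, copied from the class totals table in the report.
red = [14, 17, 15, 13, 12, 17, 12, 13, 11, 12,
       15, 17, 14, 11, 19, 15, 12, 13, 10, 14]

summary = {
    "Mean": mean(red),
    "Median": median(red),
    "Mode": mode(red),
    "Standard Deviation": stdev(red),   # sample standard deviation (n - 1)
    "Sample Variance": variance(red),   # sample variance (n - 1)
    "Range": max(red) - min(red),
    "Minimum": min(red),
    "Maximum": max(red),
    "Sum": sum(red),
    "Count": len(red),
}

for name, value in summary.items():
    print(f"{name:<20}{value}")
```

The same dictionary can be built for any other color by swapping in that column of counts.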

Class totals

Red Candies   Orange Candies   Yellow Candies   Green Candies   Purple Candies   Total Candies
14            12               15               5               17               63
17            9                9                13              12               60
15            17               8                6               15               61
13            7                15               15              13               63
12            11               16               10              11               60
17            9                13               11              10               60
12            9                15               15              13               64
13            11               17               13              6                60
11            10               10               8               20               59
12            10               12               12              6                60
15            10               6                14              16               61
17            11               13               12              6                59
14            12               8                14              12               60
11            8                11               17              13               60
19            9                13               7               11               59
15            13               10               11              12               61
12            12               11               14              12               61
13            19               10               10              10               62
10            8                13               21              10               62
14            10               11               9               7                51
-----------------------------------------------------------------------------------------------
276           217              236              237             232              1198

[Figure: class box plot]

Reflection
From this project I have learned new techniques for analyzing statistical data and how to use
different tools. Some of the tools that I used are Excel, Statdisk, and other web-based tools. Some of
the concepts that were new to me are compiling, analyzing, and interpreting histograms,
box plots, and Pareto charts. Hypothesis testing and constructing confidence intervals
were also new concepts to me. I gained new insight into how to use tools such as Excel for
organizing and calculating the data.
When constructing a confidence interval and performing hypothesis tests, I used critical thinking
along with raw mathematics. Once the process was done, I also had to use critical thinking to interpret
the results. There was a lot of number crunching. I think that this project
resembles story problems that I have encountered in other classes, except that this project took a
real-life scenario and we had to use the data from that scenario to formulate the problems. I think that
this class has built my critical thinking skills for story problems and for determining how to use the
data in an equation to get the results that I need. I believe that it has given me a greater
understanding of how to interpret the results of my work and what the concepts are used for.
One example of how this project has helped my problem-solving skills was the construction of
the histogram and the Pareto chart. For the histogram, it was not readily apparent how I could apply
the data. I had to really analyze the data to see how it could be used in a
histogram. So I took the number of Skittles in each bag and showed the frequency of the number of
candies per bag. The project also helped me understand hypothesis testing, what the results
show, and how we can use a hypothesis test to determine the validity of a claim.
As a whole, this project has helped to show me how statistics are used, or can be used, for almost
everything in life. I can take data gathered from real-world events and use the concepts
learned in this project to better understand what is going on or what the possible outcomes are. I feel
like this project has developed my ability to do this. I have also found new ways and tools that can
help me deal with greater amounts of data more accurately and efficiently.
