
This semester in statistics we used Skittles to illustrate the concepts we were learning. As we
covered each section of concepts, we completed an accompanying part of the term project. Each
part had two portions: a group portion and an individual portion. After we each bought a bag of
Skittles and submitted our individual totals, the project began with totaling the number of
Skittles our group had by color and the total number for the class. These totals are displayed in
both a pie chart and a bar graph.

Term Project Part 2 - Group Portion

1. Determine the proportion of each color within the overall sample gathered by the class.

Our group discussed different ways to make an educated guess for the proportions of each
color. Our options came down to using the proportions from our group data or basing the guess
on life experience, so we decided to use both. Individually, we guessed different colors as the
most likely to have the highest total. After comparing totals within the group and relying on our
experience of eating Skittles, our guess was that yellow would have the highest total, followed by
red and then orange, with purple and green having the lowest totals.

Our group's totals

Bag       Red   Orange   Yellow   Green   Purple   Bag Total
Totals     64       62       71      44       55         296

Now open the data set and compute the proportions of Red, Orange, Yellow, Green, and Purple
candies in the class data set. Note that the sample size is the total number of candies collected by
the class.

Red   Orange   Yellow   Green   Purple   Sample Size
568      626      612     577      595          2978


2. In StatCrunch, create a pie chart and a Pareto chart for the total number of candies of each
color in our class data set. Submit copies of your graphs in this report.
3. Does the class data represent a random sample? What would the population be? Collaborate to
discuss sampling and our data in a paragraph or two.

Yes, the data represents a random sample. If you looked only at your individual sample, you
could form a much different expectation of the color percentages. Even going from Group 8's
totals to the total class outcome, the proportions shift as more samples are collected.
All the Skittles worldwide are our population, and each bag of Skittles is an individual.
We each went to the store and randomly picked one bag of candy. We then surveyed each bag by
counting the colors.
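
The proportions requested in question 1 follow directly from the class counts: each color's count divided by the sample size. The following is a minimal sketch in Python rather than the StatCrunch steps the class used; the function name `color_proportions` is only illustrative.

```python
# Class color counts as reported in the class data table.
class_counts = {"Red": 568, "Orange": 626, "Yellow": 612, "Green": 577, "Purple": 595}


def color_proportions(counts):
    """Return each color's share of the total number of candies."""
    total = sum(counts.values())
    return {color: count / total for color, count in counts.items()}


proportions = color_proportions(class_counts)
for color, p in proportions.items():
    # Print each proportion to four decimal places.
    print(f"{color:<7}{p:.4f}")
```

Because the five proportions are shares of the same total, they should add up to 1, which is a quick way to check the arithmetic.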

After working in a group to answer the three parts of this first assignment, we individually
compared the totals from our own bags against the totals from the group and discussed whether
the class totals differed from what we had expected.

Term Project Part 2 - Individual Portion

Totals

              Red   Orange   Yellow   Green   Purple   Total
My Bag         16       11       11      11       10      59
Class Count   568      626      612     577      595    2978

Once I saw the class totals, the graphs reflected what I expected from the totals. Based on
my individual bag and on talking with the group, I was surprised that purple wasn't the lowest
total. I thought purple would be the lowest instead of red, and that red would be higher than
green. One observation that was an outlier was the bag with 88 candies, which was removed from
the total. In our group, one member had 4 green and 21 orange Skittles, whereas the others'
counts were closer together. Outliers such as these can affect the results in two ways: a bag
whose size doesn't match the other samples shifts the overall totals, and bags with outlying
counts of a single color swing the totals for those colors.
The class's distribution differed in some ways from my own data. Red had the highest
count in my bag, whereas it was the lowest overall. The rest of my data was evenly distributed,
with purple having only one less than orange, yellow, and green. With the larger sample size of
the entire class, the differences between the totals grew, with orange being considerably higher
in total than red.

As we moved on from basic representation of the data, we turned to calculating and
understanding important statistics. These include the mean, median, standard deviation, minimum
and maximum values for a data set, as well as the quartiles for the data. We also learned
additional ways to represent data and how a box plot can show whether a data set has any
outliers. As a group, we applied our new knowledge to our Skittles project.

Term Project Part 3 - Group Portion


Using the total number of candies in each bag in our class sample, compute the following
measures for the variable Total candies in each bag:

A) Mean number of candies per bag: 60.8


B) Standard deviation of the number of candies per bag: 2.8
C) 5-number summary for the number of candies per bag:

Minimum   Q1   Median   Q3     Maximum
54        59   61       62.5   66

Create a frequency histogram for the variable Total candies in each bag
Create a box plot for the variable Total candies in each bag
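
As a quick check on the five-number summary above, the common 1.5 × IQR rule gives fences for flagging outliers in a box plot. The sketch below assumes that rule; the report does not say which rule the class's software applied, and the function name `iqr_fences` is only illustrative.

```python
# Five-number summary for candies per bag, as reported by the group.
minimum, q1, median, q3, maximum = 54, 59, 61, 62.5, 66


def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower_fence, upper_fence = iqr_fences(q1, q3)

# Values outside the fences would be flagged as outliers.
has_low_outlier = minimum < lower_fence
has_high_outlier = maximum > upper_fence

print(f"IQR = {q3 - q1}, fences = ({lower_fence}, {upper_fence})")
print(f"Outliers below: {has_low_outlier}, above: {has_high_outlier}")
```

Under this rule the minimum and maximum both fall inside the fences, while a bag of 88 candies, like the one removed from the class data, would fall well above the upper fence.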

After completing the group portion and evaluating the different statistics for the totals for
the entire class, the next step was comparing them to our own bags and discussing how closely
our own totals matched the class's. Part of analyzing data in statistics is determining what kind
of data is being worked with. The individual portion includes a discussion of qualitative and
quantitative data types.
Term Project Part 3 - Individual Portion

The number of candies in my bag is 59, and there is a total of 49 bags in the class sample.
The mean is about what I expected it to be. With 59 in my bag, there was a possibility that most
bags would be near that total. Since my bag had several small pieces that weren't even close to
a full candy, both the mean and median seem in line with what my bag had. The graphs, however,
were slightly skewed left, which is surprising to me because my color distribution was mostly
symmetric. I expected the class graph to be bell shaped and more symmetric rather than skewed
in one direction.
Quantitative data is numerical data whose values can be meaningfully compared, added, and
subtracted. Categorical data is qualitative data that is arranged in categories.

For categorical data, pie charts, bar graphs, and Pareto charts make sense. They show the
frequency at which each category appears, and categorical data can be broken down into
percentages for a pie chart. Histograms, box plots, and cumulative ogives, which require
numerical values that can be ordered and added, wouldn't work with categorical data.

For quantitative data, it doesn't make sense to use a pie chart because the data may not
divide into parts that sum to 100%, such as when comparing land areas. Cumulative and
relative frequency graphs, including ogives, work because the values in the data can be added.
Time series graphs work because they display how quantitative data changes over time, for
example, women's wages over a decade. The values can be compared and added or subtracted to
show amounts of increase or decrease.

Finally came the culmination of what the course was leading up to: taking the data
collected, analyzing it, and then describing it statistically by providing estimates with a
given level of confidence in the form of a confidence interval.

Term Project Part 4 - Group Portion

1. Construct a 99% confidence interval estimate for the population proportion of yellow
candies.
Sample size: n = 2,978
Number of yellow candies: x = 612
Sample proportion: p̂ = x/n = 612/2,978 = 0.20551
Confidence Interval: (0.186, 0.225)
The sampling distribution of the sample proportion is approximately normal, and we can assume
the sample size is no more than 5% of the population.
2. Discuss and interpret (with complete sentences) the confidence interval for p.
We are 99% confident that the proportion of Skittles that are yellow is between 0.186
and 0.225.
3. Construct a 95% confidence interval estimate for the population mean number of
candies per bag.
On the TI-84 calculator, invT(0.975, 48) gave us a critical value of t = 2.011. We then
entered our numbers into the formula mean ± t * s / sqrt(n) and got (59.996, 61.604).
4. Discuss and interpret (with complete sentences) the confidence interval for the mean.
We are 95% confident that the mean number of candies per bag of Skittles is between
60.0 and 62.0. We rounded up to whole candies since you would not normally expect to have
partial candies.
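
Both intervals above can be reproduced from the summary numbers in this report. The sketch below is only an illustration under a few assumptions: it uses the normal-approximation interval for a proportion (which may differ slightly from what other software computes), it takes the t critical value 2.011 directly from the calculator result quoted above instead of computing it, and the function names `proportion_ci` and `mean_ci` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = x / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_ci(mean, sd, n, t_crit):
    """t-based confidence interval for a population mean, given a critical value."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin


# Yellow candies: x = 612 out of n = 2,978, at 99% confidence.
n_candies, yellow = 2978, 612
p_hat = yellow / n_candies
# Large-sample condition for the normal approximation: n * p * (1 - p) >= 10.
normal_ok = n_candies * p_hat * (1 - p_hat) >= 10
p_low, p_high = proportion_ci(yellow, n_candies, 0.99)

# Candies per bag: mean 60.8, standard deviation 2.8, 49 bags, t = 2.011 (from invT).
m_low, m_high = mean_ci(60.8, 2.8, 49, 2.011)

print(f"Normal approximation condition met: {normal_ok}")
print(f"99% CI for proportion of yellow: ({p_low:.3f}, {p_high:.3f})")
print(f"95% CI for mean candies per bag: ({m_low:.3f}, {m_high:.3f})")
```

Passing the critical value in as a parameter keeps the sketch within the Python standard library, which has no built-in inverse t distribution.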

After developing confidence intervals for the proportion of yellow candies and the
mean number of Skittles per bag, we individually discussed the purpose behind providing
confidence intervals.
Term Project Part 4 - Individual Portion

The general purpose of a confidence interval is to use sample data to form a range of
values that, with a stated level of confidence, contains the true value of a population
parameter. If the level of confidence is 95%, then about 95 out of 100 intervals built from
repeated random samples would be expected to contain the true value. Given this interval, we
can say with that level of confidence that the population value falls between the low and high
ends of the interval. Using sample data, a confidence interval can be created to estimate a
population proportion or a population mean.
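
One way to see what a 95% confidence level means is to simulate many random samples from a population whose proportion is known and count how often the interval built from each sample contains that proportion. The sketch below is a simulation under made-up assumptions: the true proportion of 0.20, the sample size of 500, and the function name `coverage` are all chosen only for illustration and do not come from the class data.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(true_p, n, confidence, trials, seed=0):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one sample of n candies; each is "yellow" with probability true_p.
        x = sum(rng.random() < true_p for _ in range(n))
        p_hat = x / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


# Hypothetical population: 20% yellow, samples of 500 candies, 1,000 repeated samples.
rate = coverage(true_p=0.20, n=500, confidence=0.95, trials=1000)
print(f"Share of intervals containing the true proportion: {rate:.3f}")
```

The share printed should be close to 0.95, though it will vary somewhat with the seed and the number of trials.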

To summarize, over the semester I was introduced to the world of statistics through
this course. I learned the value of statistics and how statistical data can be collected. I also
learned how to verify the conditions for creating confidence intervals from a data set: that the
observations are independent, that the data come from a random sample, and that the sample is
less than 5% of the population size. I now have a better understanding of statistics and the
process by which they are obtained from data.
Course Reflection

In taking this course, I have gained a better understanding of statistics. Usually when I see
them cited in commercials and news reports, I don't pay much attention to them. Before the course
I wasn't aware that when an inference is made in statistics, it is given with a confidence level.
I don't remember seeing one given with stats on TV.

I have often wondered how reliable the stats being given are, what sampling method was
used, and who was included in the sample. I have learned there are several ways samples can be
taken, such as a simple random sample, where individuals are selected randomly from a
population, or a cluster sample, where the population is divided into clusters and then random
clusters are sampled.

Learning to analyze statistical situations in this course has improved my problem-solving
skills. Given the information, I had to determine which equations to use in which situations and
whether the requirements were met to use those equations. An example is the sample proportion,
where n ≤ 0.05N and np(1 − p) ≥ 10 are checked to ensure the sample is independent and large
enough for the sampling distribution to be treated as approximately normal. A large part of
computer science is using coding to solve problems and provide usable, efficient programs for
end users. The problem-solving skills I have learned in this course can be applied to computer
science.

Through gathering the data for this project with the class, I have learned how a simple
random sample can be taken in a real-world application. In the most recent part, we were able to
apply what we've learned in this course to build confidence intervals for the proportion of yellow
candies and for the mean number of candies per bag.
