
Gokingco 1

Stephenson Gokingco
Mr. Crow
Period 0B Dual Credit UT Statistics
19 September 2015

The Data Exploration Mini Project

The following values in this statistical report represent the number of times one has played a
board game in the past month. The population that these values are sourced from is
randomly selected LASA students. The unit is simply the number of times one has played a board
game in the past month, together with the frequencies of those respective occurrences. This is not
necessarily a specific unit of measure; however, it is quantitative in nature. Data was collected
over a span of two days via simple questioning/interviewing with the sole question: "How
many times have you played a board game in the past month?" I made sure to ask students of
varying classes and varying years in order to avoid introducing bias into the data. No
outside sources were used in the formulation of this data; the data is of my own collection. I
collected this type of data because at the beginning of the year, my sister and I had serious
thoughts about creating a Settlers of Catan club at school. We had the resources to make the club
work, such as one expansion set, a regular set, and even a Seafarers set. Unfortunately, we never
got around to making the club a reality, but we're hoping to succeed in making it next school
year. In the meantime, collecting data on how much students at LASA play board games will help in
gauging student interest should the club be made.¹

¹ (a) and (b) are included in this introductory paragraph.


Sample size is 30 because there are 30 values in the data. The 5-number summary is the
following: 0 0 1 2 65. The mean is 5.97 (rounded to two decimal places). The median is the
middle value of the 5-number summary, so that number is 1. The range is the last value of the
5-number summary minus the first one, so that number is 65. The standard deviation is 16.06
(rounded to two decimal places). Variance is 257.83 (rounded to two decimal places).
The interquartile range can be calculated by taking the fourth value of the 5-number summary
minus the second. In other words, it is the third quartile minus the first quartile, which means
that the interquartile range is 2. Any data point that is more than 1.5 times the interquartile
range below the first quartile or above the third quartile is considered an outlier. Multiplying
the interquartile range by 1.5 gives us 3. We take 3, add it to the third quartile (2), and
subtract it from the first quartile (0). Therefore, any data point above 5 or below -3 will be
considered an outlier. Note that a negative number cannot appear in our data set because it is
impossible to play a negative number of board games. The outliers in our data are: 7, 65, 25,
and 60.²
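The outlier rule described above can be sketched in a few lines of R. This is only an illustration under an assumption: the vector `boardgame` is rebuilt here from the stemplot in this report (it reproduces the reported mean and standard deviation), because the original data frame is not available. The names `lower_fence`, `upper_fence`, and `outliers` are made up for this sketch.

```r
# Assumption: data reconstructed from the report's stemplot, not the original file.
boardgame <- c(rep(0, 14), rep(1, 4), rep(2, 6), rep(3, 2), 7, 25, 60, 65)

# Quartiles taken the same way as in the report: hinges from fivenum().
five <- fivenum(boardgame)
q1 <- five[2]
q3 <- five[4]
iqr <- q3 - q1

# Fences sit 1.5 * IQR outside the quartiles.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Keep every value strictly outside the fences.
outliers <- boardgame[boardgame < lower_fence | boardgame > upper_fence]

print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```

The hinges returned by `fivenum()` can differ slightly from the quartiles returned by `quantile()` for other data sets, so the fences may shift depending on which one is used.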

² (c)-(e) for the original data are in this paragraph. The work shown below includes the stemplot.

> ##########
>
> # boardgamedata is the data frame for all of the collected data
>
> # vector of values from boardgame
> boardgame <- boardgamedata$Boardgame.Frequency
> # 5 number summary
> fivenum(boardgame)
[1]  0  0  1  2 65
> # mean
> mean(boardgame)
[1] 5.966667
> # standard deviation
> sd(boardgame)
[1] 16.05697
> # variance is standard deviation squared
> sd(boardgame) * sd(boardgame)
[1] 257.8264
> # IQR
> fivenum(boardgame)[4] - fivenum(boardgame)[2]
[1] 2
> # histogram of the data
> hist(boardgame, xlab = "Times Played a Boardgame", main = "Histogram of Times Played a Boardgame in Past Month")
> # boxplot of the data
> boxplot(boardgame, ylab = "Times Played a Boardgame", main = "Boxplot of Times Played a Board Game in Past Month")
> # stemplot of the data
> stem(boardgame)

  The decimal point is 1 digit(s) to the right of the |

  0 | 000000000000001111222222337
  1 |
  2 | 5
  3 |
  4 |
  5 |
  6 | 05

Now, we're going to add 100 to each number in our data set and do the same calculations as we
did previously. The sample size is still 30; no values were added to or removed from the data set.
The five-number summary of the data is the following: 100 100 101 102 165. The mean of the
data is 105.97 (rounded to two decimal places). This mean is 100 more than the mean of
the original data. The median is the middle number of the five-number summary, so that number
is 101. This is 100 more than the original median. The range is the last number of the
five-number summary minus the first, so we get 65. The standard deviation is 16.06 (rounded to two
decimal places), and this is the same standard deviation that we got in the original. Variance is
257.83 (rounded to two decimal places). The interquartile range is the fourth number of the
five-number summary (third quartile) minus the second (first quartile), and we get 2. Multiplying the
interquartile range by 1.5 gives us 3. We take 3, add it to the third quartile, and subtract
it from the first quartile. Therefore, any data point above 105 or below 97 will be considered an
outlier. The outliers in our data are: 107, 165, 125, and 160.³
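The pattern in this paragraph (measures of center move by 100, measures of spread stay put) can be checked directly. The sketch below assumes the data vector reconstructed from the report's stemplot; `hinge_iqr` is a hypothetical helper written for this illustration, not a base R function.

```r
# Assumption: data reconstructed from the report's stemplot, not the original file.
boardgame <- c(rep(0, 14), rep(1, 4), rep(2, 6), rep(3, 2), 7, 25, 60, 65)
shifted <- boardgame + 100

# Hypothetical helper: IQR from the hinges of the five-number summary.
hinge_iqr <- function(x) {
  five <- fivenum(x)
  five[4] - five[2]
}

# How far each measure of center moved after adding 100.
center_shift <- c(
  mean = mean(shifted) - mean(boardgame),
  median = median(shifted) - median(boardgame)
)

# Whether each measure of spread is unchanged after adding 100.
spread_same <- c(
  sd = isTRUE(all.equal(sd(shifted), sd(boardgame))),
  variance = isTRUE(all.equal(var(shifted), var(boardgame))),
  iqr = hinge_iqr(shifted) == hinge_iqr(boardgame),
  range = diff(range(shifted)) == diff(range(boardgame))
)

print(center_shift)
print(spread_same)
```

Adding a constant slides every value by the same amount, so distances between values, which is all that spread measures depend on, do not change.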

³ (c)-(e) for the data with 100 added are in this paragraph; the stemplot is in the work shown below.

> ########## 100
>
> # add 100 to each value in boardgame and create vector out of this
> boardgamehundred <- boardgame + 100
> # 5 number summary
> fivenum(boardgamehundred)
[1] 100 100 101 102 165
> # mean
> mean(boardgamehundred)
[1] 105.9667
> # standard deviation
> sd(boardgamehundred)
[1] 16.05697
> # variance is standard deviation squared
> sd(boardgamehundred) * sd(boardgamehundred)
[1] 257.8264
> # IQR
> fivenum(boardgamehundred)[4] - fivenum(boardgamehundred)[2]
[1] 2
> # histogram of the data
> hist(boardgamehundred, xlab = "Times Played a Boardgame", main = "Histogram of Times Played a Boardgame in Past Month")
> # boxplot of the data
> boxplot(boardgamehundred, ylab = "Times Played a Boardgame", main = "Boxplot of Times Played a Board Game in Past Month")
> # stemplot of the data
> stem(boardgamehundred)

  The decimal point is 1 digit(s) to the right of the |

  10 | 000000000000001111222222337
  11 |
  12 | 5
  13 |
  14 |
  15 |
  16 | 05


Now, we're going to increase the numbers in our original data by 50% and make the same
calculations. Sample size is 30. The 5-number summary is: 0 0 1.5 3 97.5. The mean is 8.95 (rounded
to two decimal places), which is 50% more than the original mean. The median is 1.5, which is 50%
more than the original median. The range is 97.5. The standard deviation is 24.09 (rounded to two
decimal places), which is 50% more than the original standard deviation. Variance is 580.11
(rounded to two decimal places). The IQR is 3. Outliers are: 90, 37.5, 97.5, and 10.5.⁴
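A short check of how a 50% increase (multiplying by 1.5) scales each statistic, and where the outlier fences land, might look like the following. It assumes the data vector reconstructed from the report's stemplot; the names `ratios`, `lower_fence`, `upper_fence`, and `outliers` are illustrative only.

```r
# Assumption: data reconstructed from the report's stemplot, not the original file.
boardgame <- c(rep(0, 14), rep(1, 4), rep(2, 6), rep(3, 2), 7, 25, 60, 65)
scaled <- boardgame * 1.5  # same as boardgame + boardgame / 2

# Ratio of each statistic after scaling to the same statistic before scaling.
ratios <- c(
  mean = mean(scaled) / mean(boardgame),
  median = median(scaled) / median(boardgame),
  sd = sd(scaled) / sd(boardgame),
  variance = var(scaled) / var(boardgame)
)
print(ratios)

# Outlier fences for the scaled data, using hinges from fivenum().
five <- fivenum(scaled)
iqr <- five[4] - five[2]
lower_fence <- five[2] - 1.5 * iqr
upper_fence <- five[4] + 1.5 * iqr
outliers <- scaled[scaled < lower_fence | scaled > upper_fence]
print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```

The mean, median, and standard deviation all scale by 1.5, while the variance scales by 1.5 squared (2.25), which is why 257.83 becomes 580.11 rather than about 386.74.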

⁴ (c)-(e) are included in this paragraph, and the work shown below has the stemplot.

> ########## 50
>
> # increase numbers by 50%
> boardgamefifty <- boardgame + (boardgame / 2)
> # 5 number summary
> fivenum(boardgamefifty)
[1]  0.0  0.0  1.5  3.0 97.5
> # mean
> mean(boardgamefifty)
[1] 8.95
> # standard deviation
> sd(boardgamefifty)
[1] 24.08546
> # variance is standard deviation squared
> sd(boardgamefifty) * sd(boardgamefifty)
[1] 580.1095
> # IQR
> fivenum(boardgamefifty)[4] - fivenum(boardgamefifty)[2]
[1] 3
> # histogram of the data
> hist(boardgamefifty, xlab = "Times Played a Boardgame", main = "Histogram of Times Played a Boardgame in Past Month")
> # boxplot of the data
> boxplot(boardgamefifty, ylab = "Times Played a Boardgame", main = "Boxplot of Times Played a Board Game in Past Month")
> # stemplot of the data
> stem(boardgamefifty)

  The decimal point is 1 digit(s) to the right of the |

  0 | 00000000000000222233333355
  1 | 1
  2 |
  3 | 8
  4 |
  5 |
  6 |
  7 |
  8 |
  9 | 08

Now, we're going to do some calculations assuming that our original data follows a normal
distribution (and it clearly doesn't). 5 units above our mean is 10.97 (rounded to two decimal
places). We take this number and subtract the mean from it. Then, we divide that number by our
standard deviation. We then get a z-score of 0.31 (rounded to two decimal places), and that
corresponds to 38% (0.38, rounded to two decimal places) of the population being more than 5 units
above our mean. Explaining how to find the percent that is between 3 units below the mean and
2 units above the mean is a little complicated to put in words, but the work shown below will
clear up any confusion. We basically took the z-scores of 3 units below the mean and 2 units above
the mean, then subtracted the cumulative percentage for the lower z-score from the cumulative
percentage for the upper one. We ended up with a percentage of 12% (0.12, rounded to two decimal
places). To find the number of units required for the top 10%, we had to do a little bit of
algebra. From a table, we decided to use the z-score of 1.29 as our cutoff point for the top 10%.
We took that number, multiplied it by the standard deviation, then finally added the mean. We got
26.68 (rounded to two decimal places).

> ########## normal
>
> # % that is greater than 5 units above mean
> zFive <- ((mean(boardgame) + 5) - mean(boardgame)) / (sd(boardgame))
> 1 - pnorm(zFive)
[1] 0.3777516
>
> # % between 3 units below mean and 2 units above mean
> zThree <- ((mean(boardgame) - 3) - mean(boardgame)) / (sd(boardgame))
> zTwo <- ((mean(boardgame) + 2) - mean(boardgame)) / (sd(boardgame))
> pnorm(zTwo) - pnorm(zThree)
[1] 0.1236675
>
> # units for top 10%
> (1.29 * sd(boardgame)) + mean(boardgame)
[1] 26.68016
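As a cross-check on the work above, the same normal-model quantities can be obtained without computing z-scores by hand, since `pnorm()` and `qnorm()` accept `mean` and `sd` arguments. This sketch assumes the data vector reconstructed from the report's stemplot; the variable names are illustrative only.

```r
# Assumption: data reconstructed from the report's stemplot, not the original file.
boardgame <- c(rep(0, 14), rep(1, 4), rep(2, 6), rep(3, 2), 7, 25, 60, 65)
m <- mean(boardgame)
s <- sd(boardgame)

# Share of a normal population more than 5 units above the mean.
p_above <- pnorm(m + 5, mean = m, sd = s, lower.tail = FALSE)

# Share between 3 units below and 2 units above the mean.
p_between <- pnorm(m + 2, mean = m, sd = s) - pnorm(m - 3, mean = m, sd = s)

# z-score and raw cutoff for the top 10%, computed by R instead of read from a table.
z_top10 <- qnorm(0.90)
cutoff_top10 <- qnorm(0.90, mean = m, sd = s)

print(c(p_above = p_above, p_between = p_between))
print(c(z_top10 = z_top10, cutoff_top10 = cutoff_top10))
```

`qnorm(0.90)` gives a z-score of about 1.28, slightly below the table value of 1.29 used in the report, so the cutoff computed this way comes out a little lower than 26.68.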

Looking at our data, I'm very reluctant to even make a Settlers of Catan club. There is a high
concentration of people who played zero board games in the past month, as conveyed in the boxplot.
The interquartile range is very low, which means that the concentration of values at or close to
zero is pretty high. One could argue that the club could be semi-successful because there are
outliers who play plenty of board games during their free time, but I could argue that once they
find out that the majority of people at LASA will not join the club, they will start to lose
interest, and I'll be left with a small group of loyal club members. I'm looking for a steady flow
of people coming into my club to play some Catan, so what makes me think that if nearly half of
the people have played zero board games in the past month, they'll play Catan? Ultimately, there
are way too many people uninterested in board games, as shown by the right-skewed histogram. 47%
of my sample played zero board games in the past month, and 90% did not get past 10 board games
in the past month. If I were to make a Settlers of Catan club, I would be looking
at a handful of loyal club members and not much else.
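The two percentages quoted in this conclusion can be reproduced as simple proportions. As elsewhere, this is a sketch that assumes the data vector reconstructed from the report's stemplot; `share_zero` and `share_up_to_10` are names chosen for illustration.

```r
# Assumption: data reconstructed from the report's stemplot, not the original file.
boardgame <- c(rep(0, 14), rep(1, 4), rep(2, 6), rep(3, 2), 7, 25, 60, 65)

# The mean of a logical vector is the proportion of TRUE values.
share_zero <- mean(boardgame == 0)       # 14 of 30 students
share_up_to_10 <- mean(boardgame <= 10)  # 27 of 30 students

print(round(100 * c(zero = share_zero, up_to_10 = share_up_to_10)))
```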
