
DEMYSTIFYING GAUSSIAN DISTRIBUTION

The aim of this particular excursion is to explore and explain, from an intuitive standpoint, one of the most ubiquitous realities ever discovered by the genius of our era. I will approach this subject in a logical yet random style, picking up aspects from various areas when and where required to help stimulate (hopefully) the interest of the reader in understanding the beauty of this concept. Though the treatment provided is intuitive (to the best of my ability), breaking apart the mysterious and scary-looking equation into tangible pieces, there will be some mathematical rigor involved at times (which can be skipped over without any loss of detail) that can throw light, for the mathematically (or rather curiously) inclined individuals, on the approach taken to arrive at the final product. Even otherwise, please don't get scared or demotivated looking at symbols like $\pm$, $\infty$, $\pi$, $\mu$, $\sigma$, $\theta$, $\int$, $\frac{d}{dx}$, $e$ etc. that you will frequently encounter in the pages to come! They have been adequately explained from an intuitive perspective to help appreciate the entire journey, which is purely a result of my partially exhaustive research on the same.
The Gaussian distribution, named after Carl Friedrich Gauss, considered the prince of mathematicians, is also known as the Normal distribution, owing to the frequency with which natural phenomena follow this pattern (hence "normal"), or as the Bell curve, since it resembles the shape of a bell.
What is the basic utility of any probability distribution? For that matter, what is probability? What is a distribution? How does it help us understand stuff?
The word Probability is used when we are speaking of uncertainty, which is due to lack of clarity or what we refer to as noise: noise owing to too many factors that are not in our direct control. The sun rising in the east is a certainty, unlike the expectation of an immediate outcome of 6 on a thrown die, which is a probability. Where there is no clarity, there is probability, and where there is complete clarity of the outcome, it becomes a certainty.
The outcome of a head or a tail was a probability for as long as the tossed coin was in transit, and immediately after collapsing on the ground, it became a certainty (by showing either a head or a tail). Mathematically assigning values, we can say that the possibility of an event not happening and happening lies between 0% and 100%, which can be equated to all the values lying between 0 and 1 (including 0 and 1). So if the probability of an event happening is 0.56, it is equivalent to 56%, and if it is 1, then it is 100%. The probabilities of all possible scenarios in any event should sum up to 1. In the example of a coin, we have 2 outcomes, heads or tails, hence the individual probability weight assigned is 1/(no. of possible states), which is 1/2 = 0.5. In the case of a die, there are 6 possible states and hence the individual probability weightage per scenario is $1/6 \approx 0.1667$. Since there are 6 scenarios (1, 2, 3, 4, 5, 6), summing up each of the probability values, we get $1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1$. This logic holds good for any event with equally likely outcomes (there are more advanced concepts and interrelations which are beyond the scope of the present discussion).
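The equal-weight rule above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `equal_weights` is made up for illustration, and exact fractions are used so that the six weights of the die add up to exactly 1 instead of a rounded 0.16 + 0.16 + ...

```python
from fractions import Fraction

def equal_weights(n_states):
    """Give each of n equally likely outcomes the weight 1/n (illustrative helper)."""
    return [Fraction(1, n_states) for _ in range(n_states)]

coin = equal_weights(2)   # heads, tails
die = equal_weights(6)    # faces 1..6

print("weight per coin side:", float(coin[0]))
print("weight per die face :", round(float(die[0]), 4))
print("coin total:", sum(coin), "| die total:", sum(die))
```

Working with fractions avoids the small rounding gap that appears when 1/6 is written as a truncated decimal.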
Since we now have clarity on what probability is, let's speak about a distribution. Distribution refers to the assignment of values over any given area. Area? We refer to area as a space within which we can accommodate the occurrence of a particular event. Let's drill it down using an example.

In the above diagram, we see a snake (a jumping one!) which is about to cross a rectangular closed fence, which we will equate to a closed area. Let's equate this area to 1 (which refers to the sum of all individual probability values). This means the probability that the snake falls inside the fence is 1, and the probability that it falls outside the fence is 0.


Now, within the area of 1, we have divided it into 3 equal parts. So now, if we want to assign the probability to individual boxes, based on the above logic, since there are 3 boxes, it is 1/3 = 0.333, which equals 33.3%. This means the probability of the snake falling into any one of the 3 boxes 1, 2 or 3 is 33.3%. Observe that as we break down the box further, we get more information about the position of the snake. What is the probability of the snake falling in the area that includes box 1 and box 2? It is nothing but the sum of the individual probabilities, which is 0.333 + 0.333 = 0.666 = 66.6%. This is what we refer to as cumulative probability (the overall probability arrived at by adding individual probabilities). What is the point that lies at the same distance from both extremities (the first and third boxes)? It is nothing but the centre of the second box.
When the snake puts in its normal (natural) effort to jump, it will mostly land in the second box (average snake)! Why? Since if it puts in an extra effort, it lands in box 3 (excited snake), and with less effort, in box 1 (lazy snake!). We see that both box 1 and box 3 are exceptions in the snake's performance, whereas its natural potential makes it land mostly in box 2. In a given number of chances, the maximum landings will be in box 2, which makes it a normal snake! This is what we refer to as a central tendency called the Mean (or average). But in the above case, why is the probability equally divided among the three boxes? Since all the boxes created by dividing the rectangle are of equal size, they have equal area and hence equal probability weightages. This means the snake is equally lazy, normal and excited if it falls into any of the 3 boxes! So let's remodel the above example.

In the revised fence, we know the areas covered by the 3 boxes 1, 2 and 3 are not all the same. We see that box 2 covers the maximum area and boxes 1 and 3 cover minimum areas; the shapes are not uniform and hence the areas and the probabilities are not uniform. For the matter of convenience, let's assume that box 2 covers 50% of the area, which means that the probability of the snake falling into box 2, or rather being an average performer, is 0.5. Since the other 2 boxes sum up to 50%, which equals 25% each, the probability of the snake being lazy or excited is 0.25 each. Since most of the time the snake falls in the centre of the fence, it becomes the average potential of the snake.
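The revised fence can be written down as a small table of areas. The sketch below assumes the 25% / 50% / 25% split used above; the dictionary keys and the helper `cumulative` are illustrative names, not part of the original example.

```python
# Share of the fence area (and hence probability) assumed for each box.
boxes = {
    "box 1 (lazy)": 0.25,
    "box 2 (average)": 0.50,
    "box 3 (excited)": 0.25,
}

def cumulative(probabilities, names):
    """Cumulative probability: add up the individual box probabilities."""
    return sum(probabilities[name] for name in names)

total = cumulative(boxes, boxes.keys())
lazy_or_average = cumulative(boxes, ["box 1 (lazy)", "box 2 (average)"])
most_likely = max(boxes, key=boxes.get)

print("total area:", total)
print("P(box 1 or box 2):", lazy_or_average)
print("most likely landing spot:", most_likely)
```

The box with the largest area is also the most likely landing spot, which is the sense in which box 2 describes the "average" snake.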
Any miss from this center towards either end (box 1 or 3) is considered a deviation in the snake's performance. It becomes a positive deviation if the snake crosses the center and a negative deviation if the snake lags behind the center. There can be many deviations, depending on the number of grids (various potentials of a snake) that separate the boxes.

In the above triangular fence, if we draw a central line from its peak as below, we get 2 perfectly symmetrical (mirror image) right-angled triangles ABC and ACD. This means the area is divided into 2 halves and hence each triangle has an area of 0.5.

Suppose we draw lines inside each of the above 2 right-angled triangles to divide them. If I can draw 9 lines in ABC with the same spacing, I can mirror the same number of lines in ACD. This means if I have 10 small boxed areas marked by 9 lines in one triangle, I get the same number of 10 small boxed areas in the other triangle (since both are mirror images of each other). Since we have 10 boxes on the left hand side of the central line AC, we refer to them as 10 negative deviations. Since there are 10 boxes on the right side of AC, we have 10 positive deviations. To refer to each of the deviations from a common perspective by neglecting the directionality, we use the terminology standard deviation, which refers to the + and − of a set of deviations. If I say 2 standard deviations, it means 2 boxes to the left of AC and 2 boxes to the right of AC.
In the above snake example, we only referred to a triangle with 3 boxes. What if we divide the triangle into many boxes? We see that the area occupied by the boxes will decrease as they reach towards either of the ends B or D. The boxes close to AC will cover maximum area and hence maximum probability, unlike boxes far away from AC, which cover small areas and hence less probability.
This means the chance of an event happening at the center is the highest (referred to as the maximum frequency) and it decreases gradually as it moves towards B and D. This is what we call a distribution pattern.

The spread of probabilities from 0 to 1 across the entire triangle is what is referred to as a probability distribution. Since the above one is a triangle, we call it a triangular probability distribution. In the same way, based on the type of shape a distribution acquires, we have many other distributions, each having different properties.
In the above triangle, we see that there is a steep and linear increase from B to A and again a steep decrease from A to D. But most real-time events or occurrences do not follow this perfect linear trend, as they are constantly and gradually evolving processes. This is where we need to discuss one of our more obscure friends, $e$. Yes, the mathematical constant, or rather the natural universal constant of growth, $e$.
$e$ refers to exponential growth. But what is exponential? It refers to the base rate of growth that any naturally evolving entity can attain in a given time frame.
Whether it is bacteria doubling every 24 hours or money doubling itself in a year, what is basically happening is that, though the time frame is different (24 hours, 1 year, 1 sec, etc.), the original is replicating itself, thereby creating 1 more.
This means 1 (original) + 1 (outcome of the original) = 2, which is $(1 + 100\%)^n$, where $n$ refers to the time period.
But is it so that the growth is happening so discretely, in linear steps? Is it so that suddenly after 24 hours I see that the bacteria got doubled, or the money became doubled suddenly after a year?
No. It's a very gradual and continuous process. Let's dig in. If we look at money, say 100 Rs is yielding 100 Rs in 1 year, which is double. If I break up the time frame into 2 periods of 6 months each, my 100 Rs would have earned 50 Rs at the end of the 6th month. Compounding over both half-year periods gives a growth factor of
$$\left(1 + 100\% \times \frac{1}{2}\right)^{2} = 2.25$$
Now these 50 Rs will start earning an additional interest of 100%, which is 50 Rs in a year or 25 Rs in 6 months. These 25 Rs will start earning 25 Rs in a year or 12.5 Rs in 6 months, and so on. This means that each of the outcomes from the original is continuously compounded. Let's increase the number of periods from 2 (6 months each) to 365 (1 day each), which becomes
$$\left(1 + \frac{1}{365}\right)^{365} = 2.714$$
If we go on slicing the time period to the maximum possible extent, we will arrive at a maximum compounded rate of return, which is 2.71828..., referred to as $e$. More generally,
$$e^{x} = 1 + \frac{x}{1!} + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \cdots, \qquad -\infty < x < \infty$$
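To see the slicing argument in numbers, here is a short Python sketch. The function names `compounded` and `exp_series` are illustrative; the first evaluates $(1 + 1/n)^n$ for finer and finer time slices, the second sums the first few terms of the series for $e^x$.

```python
import math

def compounded(periods):
    """Growth factor of 100% yearly interest split into `periods` equal slices."""
    return (1 + 1 / periods) ** periods

def exp_series(x, terms=20):
    """Partial sum 1 + x/1! + x^2/2! + ... with `terms` terms."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= x / (k + 1)   # turn x^k/k! into x^(k+1)/(k+1)!
    return total

for n in (1, 2, 12, 365, 1_000_000):
    print(f"{n:>9} periods -> {compounded(n):.6f}")

print("series for e^1:", exp_series(1.0))
print("math.e        :", math.e)
```

The compounded value creeps up towards $e$ as the slices get finer, while the series reaches it to many decimals with only a handful of terms.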
So any natural rate of growth follows an exponential pattern: bacterial growth, population growth, radioactive decay and many other natural processes can be modelled using $e$.
So $e^{x}$ is defined as an exponential function, which is what we will be dealing with to derive the normal distribution.
What is a function, by the way? A function shows a relationship between 2 entities, one independent and the other dependent. Say, for example, clouds cause rain, which means rain is an outcome of clouds and clouds were the cause of it. Since rain is dependent and cloud is independent (in the present framework of 2 entities), let's equate cloud = x and rain = y; then x created y, i.e. clouds created rain. In mathematical notation, we refer to it as y = f(x) or rain = f(clouds), and we say that rain is a function of clouds. Likewise, Misery = f(Desire). This means any change in the independent variable x (desire) will have a direct impact on its outcome y (misery).
So let's consider the exponential function $y = e^{x}$. Before we delve into what values x should take in this function, let's get back to the basic normal distributions and their nature.

If we look at the distributions above, with Y showing the frequency of occurrence of a given variable and X showing the spread of the data values, we find that for the mean of 5, the data is closely concentrated between 0 and 10, unlike for the mean of 20, where it is spread far across between 10 and 30. This means distribution 1 had less deviation and distribution 2 had more deviation from the mean, thereby stretching the entire area under the curve.
Though distribution 1 looks bigger than distribution 2, both actually cover the same area. It's just that distribution 2 is stretched horizontally on both ends, thereby increasing the spread of data points between 10 and 30. We can observe that the shape of the distribution depends on the height of the peak (the frequency at the mean, read on the Y axis) and on the deviation (the spread along the X axis). Imagine blowing a balloon with a black spot on it and see how the spot's area increases with the extent of inflation, though the total amount of ink in the spot stays constant!
In the same way as for Dist 1, $\mu = 5, \sigma = 5$, and for Dist 2, $\mu = 20, \sigma = 10$, there can be an infinite number of distributions with infinite values for $\mu$ and $\sigma$. To find out the area under each of these curves would be extremely laborious and non-value-added, hence we look at an approach where we can standardize by measuring each of these distributions on a common platform.
To do this, we consider the values $\mu = 0$ and $\sigma = 1$. In the above case, distribution 1 was plotted with various values of x on the X axis from 0 to 10. Since we know the values of $\mu$ and $\sigma$, we can standardize it by subtracting the value of $\mu$ from x and then dividing the result by $\sigma$. It becomes
$$Z = \frac{x - \mu}{\sigma}$$
What we are doing is figuring out the distance of each point on the graph from its mean and then dividing it by its corresponding $\sigma$, which generates a standard number for each value on the X axis, called the Z number. Once the values of X have been converted based on the above notation, they get transformed to values mostly ranging between −3 and 3 (there will be values of $z > 3$ and $z < -3$, but they hardly contribute 0.3% of the data points, which will fall outside this range and will be considered as outliers, and hence for the sake of clarity we will define the range of z only between $\pm 3$) with a mean of 0. This is what we call a Z Transformation. A very important point to note here is that we are converting all the normal patterns with varying values of $\mu$ and $\sigma$ into a standard normal pattern; for all non-normal patterns, a different approach has to be taken (beyond the scope of this topic). As we discussed in the past, since we considered $\sigma$ as 1 here, −3 means 3 negative deviations from 0 (left hand side) and +3 means 3 positive deviations to the right hand side of 0. By ignoring the +/−, we say that the transformed data lies within 3 standard deviations with a transformed mean of 0.
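As a small illustration of the Z transformation, the Python sketch below applies $Z = (x - \mu)/\sigma$ to a few points from the two example distributions ($\mu = 5, \sigma = 5$ and $\mu = 20, \sigma = 10$). The function name `z_score` and the chosen sample points are assumptions made for the example.

```python
def z_score(x, mu, sigma):
    """Distance of x from the mean, measured in units of sigma."""
    return (x - mu) / sigma

# Sample points picked for illustration from the two example distributions.
dist1 = [(x, z_score(x, mu=5, sigma=5)) for x in (0, 5, 10)]
dist2 = [(x, z_score(x, mu=20, sigma=10)) for x in (10, 20, 30)]

for label, rows in (("Dist 1", dist1), ("Dist 2", dist2)):
    for x, z in rows:
        print(f"{label}: x = {x:>2} -> z = {z:+.1f}")
```

Points that sit at very different x values in the two distributions land on the same z values, which is what lets both curves be read from one common scale.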
This means that the entire area covered by the normal curve is divided into 6 parts, 3 on either side; this we refer to as 3 standard deviations. This then transforms to a standard normal distribution with $\mu = 0$ and $\sigma = 1$.
The reason we emphasize only these 2 parameters, $\mu$ and $\sigma$, is that we can know the location of any point in the curve just by referring to the mean $\mu$ as the central point of the curve and figuring out how distant the point is from the central point by looking at the deviation zone that it falls under. Just try relating it to the snake/fence analogy to get better clarity.

If we look at the curve on the left hand side, we see that the values range from $-\infty$ to $+\infty$. This means that all the values on the number line, both on the positive and negative axis, are accommodated by the curve. Also notice that both ends of the curve are not touching the positive and negative x axis: since the values are tending to $\pm\infty$, they will never be able to touch the axis. Touching the axis implies that the values are finite. Once we apply a Z transformation, we are, for practical purposes, converting this range of ($-\infty$ to $+\infty$) to −3 to +3.
Now let's try modelling a normal curve by using the function $e^{x}$. Post transformation, the values of x will range from −3 to +3. So we are modelling the growth function $e$ using this range, as we know that the transformed values lie between these 2 limits.

When we substitute the values between −3 and +3 in the function $F(x) = e^{x}$, we get the below graph. We can clearly see that the values go up exponentially from $e^{-3}$ to $e^{+3}$.

Instead, if we use the function $e^{-x}$, we get the below graph. This is nothing but the reciprocal of the exponential function, ranging from $e^{3}$ to $e^{-3}$, known as exponential decay (another interesting pattern!).


If we look at the above 2 graphs, we can infer that our normal curve is a combination of these 2 patterns: one which is an increasing function, reaches its climax at the mean and then becomes a decreasing function.
Let's raise the power of x to 2, making it a quadratic exponential function.
Now for $F(x) = e^{x^{2}}$, we evaluate the range from $e^{(-3)^{2}}$ to $e^{(+3)^{2}}$, which becomes minimum at $e^{0} = 1$ (at $x = 0$) and maximum at $e^{9} \approx 8103$ (at $x = \pm 3$, on either side of 0). The plot of the graph is shown below.

Based on the above graph, we can now easily replicate the normal curve by changing the orientation of the above graph by adding a minus sign. Let's see how this works.
For $F(x) = e^{-x^{2}}$, for the limits −3 and +3, the function takes values from $e^{-(-3)^{2}}$ to $e^{-(+3)^{2}}$, which becomes $e^{-9}$ at $x = -3$, $e^{0}$ at $x = 0$ and then again $e^{-9}$ at $x = +3$, which gives us the pattern of the increasing and decreasing function. The graph looks as below, which perfectly generates a normal or bell shaped curve.
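The rise and fall of $e^{-x^{2}}$ is easy to tabulate. The following Python sketch (the function name `bell` and the crude text bars are illustrative choices) evaluates the function at the whole numbers from −3 to +3.

```python
import math

def bell(x):
    """Un-normalized bell shape e^(-x^2)."""
    return math.exp(-x * x)

xs = [-3, -2, -1, 0, 1, 2, 3]
values = [bell(x) for x in xs]

for x, v in zip(xs, values):
    bar = "#" * round(v * 40)   # crude text plot, 40 characters at the peak
    print(f"x = {x:+d}  F(x) = {v:.6f}  {bar}")
```

The values climb from $e^{-9}$ up to 1 at the center and fall back symmetrically, which is the increasing-then-decreasing pattern described above.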

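Actually, the above function $F(x) = e^{-x^{2}}$ should be rewritten as $F(x) = e^{-z^{2}}$, as all values from −3 to +3 are transformed values of x from ($-\infty$ to $+\infty$).
So, for $F(x) = e^{-z^{2}}$, by substituting $z = \frac{x - \mu}{\sigma}$, we get
$$F(x) = e^{-\left(\frac{x - \mu}{\sigma}\right)^{2}}$$
$$F(x) = e^{-\frac{(x - \mu)^{2}}{\sigma^{2}}}$$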
The width of the Bell curve, defined by $\sigma$, is half the distance between its inflection points. An inflection point for a normal curve is a point at which the normal curve changes its shape from being outside concave to inside concave.

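[Figure: normal curve with the inflection points ($\pm$) marked; on one side of each point the curve curves in, on the other it curves out.]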

We can clearly see above the point (circled) from which the direction of the curve is changing. Remember that for a normal curve, the inflection point (on either side of the mean) perfectly coincides (explanation beyond the scope of this discussion) with the first standard deviation of the curve.
Here, in this case, the width lies between the first standard deviations on either side, which is equal to 2 deviations, + and −. By halving the exponent, we adjust the width of the curve to 1 deviation point instead of 2, so that the inflection points fall exactly at $\mu \pm \sigma$.
The equation then becomes
$$F(x) = e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}} \qquad (A)$$
This is the function that generates the normal curve. But as we discussed in the past, the present equation looks at a normal curve with any given values of $\mu$ and $\sigma$. To standardize it, we have to set $\mu$ to 0 and $\sigma$ to 1. Then the equation gets transformed to
$$F(x) = e^{-\frac{z^{2}}{2}}$$
This is the same as the equation above with z, except for the width adjustment.
The above function is just a normal curve but not a normal probability distribution function. Since we know that probability lies between 0 and 1, with 1 as the maximum (which is also the total), we have to divide our function by its calculated area to arrive at a result of 1. If you remember the snake/fence analogy, you will recall that we equated the entire area to 1 to evaluate probability.
Say, for example, the area of our fence is 50 units; to make it equal to 1, I need to divide it by 50 units so that $\frac{50}{50} = 1$. This process of converting a normal function into a probability function is known as normalizing, and the constant is known as a normalizing constant.
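In the same way, we will be evaluating the area of the above function (curve) within the limits ($-\infty$, $+\infty$) and dividing the function by that area so that the total equates to 1, i.e.
$$P(x) = \frac{F(x)}{\text{Area}(F(x))}, \qquad \int_{-\infty}^{+\infty} P(x)\,dx = 1 \qquad (B)$$
where $P(x)$ is the probability density function.
We will integrate the above function between the limits $\pm\infty$ to arrive at the area covered by the function:
$$A = \int_{-\infty}^{+\infty} e^{-\frac{x^{2}}{2}}\,dx$$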
To resolve the above integral, let's multiply the integral by itself (with variable y); then the equation becomes
$$A^{2} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} e^{-\frac{x^{2} + y^{2}}{2}}\,dx\,dy \qquad (1)$$
We switch from Cartesian to polar coordinates (beyond the present scope), a system used to express any point on a 2-dimensional plane by relating it to the distance of the point from the origin and the angle traversed by it.
We get two main expressions, $x = r\cos\theta$ and $y = r\sin\theta$, which by squaring and adding give the below expression:
$$r = \sqrt{x^{2} + y^{2}}$$
(since $x^{2} + y^{2} = r^{2}(\sin^{2}\theta + \cos^{2}\theta) = r^{2}$, as $\sin^{2}\theta + \cos^{2}\theta = 1$).
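By substituting it in Equation 1 and transforming the limits of the integral, we get
$$A^{2} = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-\frac{r^{2}}{2}}\,r\,dr\,d\theta \qquad (2)$$
(where 0 to $\infty$ indicates the radius aspect of a particle and 0 to $2\pi$ indicates the total angle range traversed by the particle about the origin; $dx\,dy$ is replaced by $r\,dr\,d\theta$).
Let
$$\frac{r^{2}}{2} = u \;\Rightarrow\; \frac{2r\,dr}{2} = du \;\Rightarrow\; r\,dr = du$$
(after differentiating with respect to r and u).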

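Substituting in Equation 2:
$$A^{2} = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-u}\,du\,d\theta$$
$$A^{2} = \int_{0}^{\infty} e^{-u}\,du \left(\int_{0}^{2\pi} d\theta\right)$$
$$A^{2} = 2\pi \int_{0}^{\infty} e^{-u}\,du$$
$$A^{2} = 2\pi \left[-e^{-u}\right]_{0}^{\infty}$$
$$A^{2} = 2\pi \left(-e^{-\infty} + e^{0}\right)$$
$$A^{2} = 2\pi\,(0 + 1)$$
$$A^{2} = 2\pi$$
$$A = \sqrt{2\pi}$$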
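The value $\sqrt{2\pi} \approx 2.5066$ can also be sanity-checked numerically. The sketch below is a plain trapezoid-rule sum, not the polar-coordinate derivation; the helper name `area_under`, the cut-off at ±10 (beyond which the curve is practically zero) and the number of steps are all assumptions chosen for the example.

```python
import math

def area_under(f, lo, hi, steps=100_000):
    """Approximate the area under f between lo and hi with the trapezoid rule."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width

# e^(-x^2/2) is already negligible beyond |x| = 10, so +-10 stands in for +-infinity.
area = area_under(lambda x: math.exp(-x * x / 2), -10.0, 10.0)

print("numerical area:", area)
print("sqrt(2*pi)    :", math.sqrt(2 * math.pi))
```

Both numbers agree to many decimal places, which is a useful cross-check on the algebra.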

The above result that we obtained is known as the normalizing constant for the exponential function, which will convert it into a probability distribution function. Let's substitute it in B; then it becomes
$$P(x) = \frac{e^{-\frac{z^{2}}{2}}}{\sqrt{2\pi}}$$
This is the formula for the standard normal distribution.

The non-transformed formula can be arrived at by dividing Equation A by the constant:
$$P(x) = \frac{e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}}{\sigma\sqrt{2\pi}}$$
where we multiply $\sqrt{2\pi}$ by $\sigma$ to arrive at the area of any non-standard normal distribution, as $\sigma$ is the measure of the distribution width. This is what a plotted Normal curve looks like.
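The final formula translates almost word for word into code. In the Python sketch below, the function name `normal_pdf` and its default arguments are illustrative; the two calls at the end reuse the example values $\mu = 20, \sigma = 10$ from the earlier discussion.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: e^(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

peak_standard = normal_pdf(0.0)                    # standard normal, at the mean
peak_wide = normal_pdf(20.0, mu=20.0, sigma=10.0)  # Dist 2, at its mean

print("height at the mean, standard normal:", round(peak_standard, 4))
print("height at the mean, mu=20 sigma=10 :", round(peak_wide, 4))
```

A ten times wider curve is ten times lower at its peak, which is how the total area can stay at 1.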

We can clearly see that beyond the z score of 3, there is an extremely small portion of area available, and for all practical purposes we mostly refer to z values from −3 to +3. Now that we have derived the Normal probability distribution, our next goal is to find out the area occupied by the curve, which of course we know is equal to 1. But within this curve, what if we want to find out how much area is present between the mean $\mu$ and the first standard deviation $\sigma$? Life would have been so easy if every distribution were linear, which is often not the case. Even then, it's important to understand simplicity to get the hang of complexity.

Let's take an example to understand this better.


Look at the above distribution pattern. It's linear, without any curves. Finding the area is easy: since we know it's a rectangle with 6 divisions (deviations) and the total area should amount to 1, each box, or rather in the above case the area between two deviation gridlines, will be $1/6 \approx 0.1667$. So the probability that a given point will fall into any one of the above grids is 0.1667 or 16.7%. What if I don't have clarity on which grid it falls into, but want to know the probability of the point falling in a box above 0?
From the above, we know that there are 3 boxes after 0, and since each box carries a probability weightage of 1/6, for 3 boxes it will be $3 \times 1/6 = 0.5$ or 50%.
What is the probability that a given point will fall in the first standard deviation range? Remember from what we have discussed, 1 standard deviation means 1 deviation on either side of zero (since here it's the first standard deviation), which is $\pm 1$. From the above, we know that there are 2 boxes present between −1 and +1, hence the probability will be nothing but the area of 2 boxes, which is $2 \times 1/6 = 0.333$ or 33.3%.
What is the probability of a point falling within 0.5 standard deviations? Tricky? Not really. We know that $0.5\sigma$ means $\pm 0.5$, which means 0.5 boxes on either side of 0 (since 1 deviation equals 1 box here), so the area covered will be 0.5 boxes × 2 = 1 box = $1/6 \approx 0.1667$ = 16.7%. Till now, we were only finding probabilities "in between". What about "at"? Say I want to know the probability of the point lying at the 1st standard deviation. Let's think about how we define area. Area is always a product of 2 lines, in simple terms. Whether we call them length or breadth or width is a matter of convention, and together they give us the 2-dimensional picture of any object. The lines in the above case are any two lines in the distribution using which we arrive at the area. But what about a point? Does a point have any area? Let's see below for some important insights.

The first line we see is continuous, and as it is broken down further, it becomes discontinuous or discrete. Now what are these new terminologies, Continuous and Discrete? The meaning is inherent in the words themselves. When we say continuous, it's without any break in the flow. A break is any pause or obstruction that does not allow a point to be continuous. Which means what? If we look at the fourth line above, which is a dotted line, we see that there is a regular pause (gap or space) at every single instant, because of which it demarcates and gives an identity to every single point out there. This is what we term Discrete, which means each point can take one and only one value. We can see that this dotted discrete line, as we go up, is increasing in size (with spaces), and finally when we eliminate the spaces or gaps between these spots, it becomes a Continuous line.
We cannot identify any unique point on a continuous line, and the entire line is a single entity. But we know that this continuous line is made up of discrete dots which cease to be discrete after a threshold. What this threshold is, no one can actually quantify, though we can define it to an extent. Assume the above continuous line was made up of 100 points or dots which are discrete. Now my definition of a dot as discrete will cease to hold when I increase the magnification. The dot starts looking like a continuous line, which can be further broken down into dots, which when magnified will again look like a continuous line, and the process will go on ad infinitum. You can get transported from a normal level to a micro, nano, pico, femto level, and so on till you end up at the last atom (or rather the more fundamental quark) that made the line complete! Remember how a gigantic star looks like a twinkling spot from the earth, which itself is a negligible spot from the star's perspective! It's all relative. This is why, for the sake of approximation and to overcome the granularity issues, the area of any given point on a continuous scale is 0.
One very important insight here is that dots integrate to form a line, or a line differentiates to give a dot. Sounds like calculus, doesn't it?! Drops integrate to form an ocean, or an ocean differentiates to give a drop. We can use the below notations.
$$\int \text{Drops} = \text{Ocean} \qquad\text{and}\qquad \frac{d}{dx}(\text{Ocean}) = \text{Drop}$$
Integration ($\int$, an elongated s) means "sum of", which sums up all the values in a given range.
Differentiation ($\frac{d}{dx}$) means "differential of", which figures out the rate at which a dependent variable changes with respect to an independent variable.
Remember we arrived at the result of $\sqrt{2\pi}$ by integrating (summing up) the area between $\pm\infty$.
From the above example,
$$\int_{0}^{100} \text{Dot} = \text{Line}$$
(since we said 100 points make a line).
Summation does not mean 1 + 2 + 3 + ... + 100. It means the sum of a smallest fixed quantum, n number of times, which is $dx + dx + dx + \cdots$ (n times) = 1 + 1 + 1 + ... (100 times), and
$$\frac{d}{dx}(\text{Line}) = \text{Dot}$$
(the basic possible elementary unit).
This means when you are trying to break 1 continuous line made of 100 discrete dots into its least possible discrete unit (here, in this case, 1), you are differentiating it. And our normal distribution is a continuous distribution (remember the nature of exponential processes) and not discrete. Hence we always need to specify a search range (upper and lower limit) when it comes to finding the address of a point in the normal curve. This is the crux of figuring out probabilities for any continuous probability distribution.
The only challenge is that not all distributions are as straightforward as the above one. Since the above one is a rectangular distribution (with a straight line parallel to the X axis), the frequency limit, which is the height from a point on the X axis to the horizontal line, is always a constant, unlike a distribution in which the frequency changes continuously owing to a continuously fluctuating line (curve), to which the distance of a point from the X axis is continuously changing. We can no longer treat it as a regular polygon like a square or a rectangle and calculate its area; we need to resort to calculus.
Ever remember what we do when we have to figure out the probabilities of a given variable x within a specified standard deviation of a normal distribution?
We are asked to look into the Z Table (in the appendix of any fat statistics book), which gives out the probabilities within any given range. I'll show you how this is done. In order to find out the area within a normal curve, we translate a given complex function to its equivalent polynomial form and calculate the probabilities accordingly. In this case, we use a Taylor polynomial expression, which I will derive, and I will demonstrate an example of how it is used to evaluate the area of the curve within a specified range.
A Taylor series is particularly used to translate a given smooth curve (here, in this case, the bell curve) of a given centre (mean), represented by the complex function
$$P(x) = \frac{e^{-\frac{z^{2}}{2}}}{\sqrt{2\pi}}$$
into its polynomial form, which can then be used to calculate the cumulative (total) area occupied by the curve within any specified interval. The advantage of converting it into a Taylor polynomial is that the entire function gets converted into an equivalent sum of simple powers, which can be easily integrated over a given interval. But there is a slight error involved in converting a function to its equivalent Taylor form, which is adjusted for to arrive at a result.
From the expression
$$P(x) = \frac{e^{-\frac{z^{2}}{2}}}{\sqrt{2\pi}}$$
we know that $\sqrt{2\pi}$ is used as a normalizing constant (which means it makes the total area under the function equate to 1).
Let's just look at the numerator, which is $e^{-\frac{z^{2}}{2}}$, of the form $e^{x}$ (where $x = -\frac{z^{2}}{2}$). Remember how we arrived at the value of the exponential function $e^{x}$? For $x = 1$, it becomes $e^{1} = e = 2.718$.
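Now
$$e^{x} = 1 + \frac{x}{1!} + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \cdots, \qquad -\infty < x < \infty$$
(from the discussion on the exponential function).
By substituting $x = 1$, we get
$$e^{1} = 1 + \frac{1}{1!} + \frac{1^{2}}{2!} + \frac{1^{3}}{3!} + \cdots = 2.718$$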
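Now for $e^{-\frac{z^{2}}{2}}$, which is of the form $e^{x}$ (where $x = -\frac{z^{2}}{2}$):
$$e^{x} = 1 + \frac{x}{1!} + \frac{x^{2}}{2!} + \frac{x^{3}}{3!} + \frac{x^{4}}{4!} + \frac{x^{5}}{5!} + \frac{x^{6}}{6!} + \cdots$$
$$= 1 + \frac{x}{1} + \frac{x^{2}}{2} + \frac{x^{3}}{6} + \frac{x^{4}}{24} + \frac{x^{5}}{120} + \frac{x^{6}}{720} + \cdots$$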
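By substituting $x = -\frac{z^{2}}{2}$, we get
$$e^{-\frac{z^{2}}{2}} = 1 - \frac{z^{2}}{2} + \frac{z^{4}}{8} - \frac{z^{6}}{48} + \frac{z^{8}}{384} - \frac{z^{10}}{3840} + \frac{z^{12}}{46080} - \cdots$$
which is an alternating series of $\pm$ signs.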
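The denominators 2, 8, 48, 384, ... follow a simple rule: the coefficient of $z^{2k}$ is $(-1)^{k} / (2^{k}\,k!)$, which comes from raising $-z^{2}/2$ to the power k and dividing by k!. A short Python sketch (the function name `series_coefficients` is illustrative) reproduces them with exact fractions.

```python
from fractions import Fraction
from math import factorial

def series_coefficients(n_terms):
    """Coefficient of z^(2k) in the series of e^(-z^2/2): (-1)^k / (2^k * k!)."""
    return [Fraction((-1) ** k, 2 ** k * factorial(k)) for k in range(n_terms)]

coefficients = series_coefficients(7)

for k, c in enumerate(coefficients):
    sign = "+" if c > 0 else "-"
    print(f"{sign} z^{2 * k} / {c.denominator}")
```

The printed denominators match the ones in the expansion above, and the signs alternate as expected.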
Now, for the sake of example, let's figure out the area occupied by the above polynomial form within the range 0 to 1. Here 0 represents the mean, which is the centre of the normal distribution, and 1 represents the 1st standard deviation from the mean (on either side of 0), i.e. $\pm 1$, since it is a symmetrical curve (the shape on the left side of the mean is the mirror image of the shape on the right side of the mean and vice versa).
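As I have explained before, the purpose of integration is to sum up the area (represented by a function) within a given range.
So we need to apply integration to the above function within the limits 0 and 1, which is
$$\int_{0}^{1} e^{-\frac{z^{2}}{2}}\,dz = \int_{0}^{1} \left(1 - \frac{z^{2}}{2} + \frac{z^{4}}{8} - \frac{z^{6}}{48} + \frac{z^{8}}{384} - \frac{z^{10}}{3840} + \frac{z^{12}}{46080} - \cdots\right) dz$$
$$= \left[z - \frac{z^{3}}{6} + \frac{z^{5}}{40} - \frac{z^{7}}{336} + \frac{z^{9}}{3456} - \frac{z^{11}}{42240} + \cdots\right]_{0}^{1}$$
where $\int z^{n}\,dz = \frac{z^{n+1}}{n+1}$ (an integration operation),
$$= 1 - \frac{1}{6} + \frac{1}{40} - \frac{1}{336} + \frac{1}{3456} - \frac{1}{42240} + \cdots$$
(after substituting $z = 1$)
$$\approx 0.855625$$
($\approx$ means approximately.)
There is even an error term of about 0.000002 that is included in the above arrived value.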
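This means
$$\int_{0}^{1} e^{-\frac{z^{2}}{2}}\,dz \approx 0.855625$$
But we need to evaluate the above function by including the normalization constant, which is $\frac{1}{\sqrt{2\pi}}$.
Hence
$$\frac{1}{\sqrt{2\pi}} \int_{0}^{1} e^{-\frac{z^{2}}{2}}\,dz = \frac{1}{\sqrt{2 \times 3.14}} \int_{0}^{1} e^{-\frac{z^{2}}{2}}\,dz$$
$$= 0.3989 \times 0.855625$$
$$= 0.3413$$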
This is nothing but the area covered by the curve from 0 to 1, which is the 1st deviation on the right hand side. Since it is a symmetrical curve, the area covered by the curve from 0 to −1 on the left hand side will also be the same, hence the area covered by the 1st standard deviation will be
$\pm 1$ = 0.3413 (right hand side of 0) + 0.3413 (left hand side of 0) = 0.6826 = 68.26%.
This means the first standard deviation of a normal curve covers 68.26% of the total area of the curve (with 34.13% on each side of the mean).
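The term-by-term integration can be repeated in Python as a check. This is a minimal sketch: the function name `series_area` is illustrative, it keeps the same six terms as the worked example, and the comparison value uses `math.erf` from the standard library, which is not part of the original walk-through (for the standard normal curve, the area from 0 to z equals `0.5 * erf(z / sqrt(2))`).

```python
import math

def series_area(z, n_terms=6):
    """Integrate the series of e^(-t^2/2) term by term from 0 to z."""
    total = 0.0
    for k in range(n_terms):
        coeff = (-1) ** k / (2 ** k * math.factorial(k))  # coefficient of t^(2k)
        total += coeff * z ** (2 * k + 1) / (2 * k + 1)   # integral of t^(2k)
    return total

raw_area = series_area(1.0)                       # before normalizing
one_side = raw_area / math.sqrt(2 * math.pi)      # area from 0 to 1
reference = 0.5 * math.erf(1.0 / math.sqrt(2))    # closed-form check via erf

print("series sum (6 terms):", round(raw_area, 6))
print("area 0..1           :", round(one_side, 4))
print("area -1..+1         :", round(2 * one_side, 4))
print("erf reference 0..1  :", round(reference, 4))
```

Six terms already agree with the reference to about six decimal places; larger z values need more terms because the powers of z grow faster.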
In the same way, if we calculate the integral of the curve within the limits 0 and 3, which is 3 deviations from 0, we get
$$\frac{1}{\sqrt{2\pi}} \int_{0}^{3} e^{-\frac{z^{2}}{2}}\,dz \approx 0.5$$
(this is the area on the right hand side of the mean).
Since the curve is symmetrical, the area on the left side of the mean will also be approximately 0.5, hence $\pm 3$ = 0.5 + 0.5 $\approx$ 1.
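Hence, more precisely,
$$\frac{1}{\sqrt{2\pi}} \int_{-3}^{3} e^{-\frac{z^{2}}{2}}\,dz = 0.997$$
Remember, almost all the data points (99.7%) fall into $\pm 3\sigma$, but not all. There will be approximately 0.3% of data points (about 0.135% on either side of the mean) falling beyond it, into the 4th standard deviation. This means
$$\frac{1}{\sqrt{2\pi}} \int_{-4}^{4} e^{-\frac{z^{2}}{2}}\,dz \approx 1$$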
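For a quick look at the areas within ±1, ±2, ±3 and ±4 deviations, one option is Python's standard-library error function. This is a different route from the series used above: for the standard normal curve, the area between −k and +k equals `erf(k / sqrt(2))`. The function name `area_within` is illustrative.

```python
import math

def area_within(k):
    """Area of the standard normal curve between -k and +k standard deviations."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    inside = area_within(k)
    print(f"+-{k} sigma: inside = {inside:.5f}, outside = {1 - inside:.5f}")
```

The area outside ±3 comes to roughly 0.27% in total, and by ±4 the curve has covered all but a few thousandths of a percent.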


We can see that the maximum height a standard normal curve can attain (at the center, the mean) is $\frac{1}{\sqrt{2\pi}} \approx 0.3989 \approx 0.4$.
There is much more that can be said about the normal distribution (probably in one of my future excursions), but my prime aim in this paper was to present the details of the genesis, evolution and utility of a normal curve in a simplified manner, and I hope the same is achieved in this regard.
Feedback most welcome at spaceinstime@gmail.com
Regards,
Kalyan Sunkara
