
AMELIA II: A Program for Missing Data

Amelia II is an R package that performs multiple imputation to deal with missing data, instead of other methods such as pairwise and listwise deletion. In multiple imputation, values are imputed for each missing cell in your dataset and completed datasets are created. In these completed datasets, the observed values stay the same, but the missing values are filled in with imputations based on a bootstrapped EMB algorithm. After imputation, you conduct your statistical analyses with the completed datasets and then combine the results of the imputed datasets.

For this example, the world95.sav file from the PASW 17.0 sample datasets is being used. There are 11 variables: nine are continuous variables and two are categorical variables. As you can see below, data is missing, signified by the blank cells.

You can download the Amelia II package from http://gking.harvard.edu/amelia/ (please make sure you have downloaded the R program first [http://cran.r-project.org/]).

You can also use the stand-alone AmeliaView package by running the following commands in R:

> library(Amelia)
> AmeliaView()



Step 1. Use Amelia II to impute data for missing values.

I. Once the package is downloaded, AmeliaView will open in a new window.

II. Select Import CSV. Locate your file that is saved as a comma-separated value (.csv) file and click Open.

III. The data is now loaded into the program. This view provides descriptive statistics (Min, Max, Mean, SD) for your variables, as well as how many data points are missing per variable (Missing). For example, the urban variable is missing one value out of 109 values.

    Transformation. Use this option to classify variable measurement type and to transform variables (e.g., logistic or square root), if necessary.
    Lag. Use this option for time series data; lags are variables that take the value of another variable in the previous time period.
    Lead. Use this option for time series data; leads take the value of another variable in the next time period.
    Bounds. Use this option to place restrictions on the range of the imputed values.


IV. Transformation. The Amelia package recognized the country variable as an ID variable and classified it as such. To transform the two categorical variables (region and climate), right-click on the row of the variable. Select Nominal.


Repeat the same steps for the climate variable.

V. Bounds. Since four of the continuous variables (urban, literacy, lit_male, and lit_female) are percentages, bounds need to be added to restrict the range of the imputed values to 0 to 100. Also, the climate variable needs to be restricted to the available values of 1 to 9 (the region variable does not have missing data, so no bound is added to that variable). Right-click on the urban variable row, and select Add or Edit Bounds.

The Add or Edit Bounds box appears for you to enter the minimum and maximum values; type 0 for minimum and 100 for maximum. Select OK.

Repeat the same steps for the remaining variables that need bounds added.

VI. Select Options from the top menu. Select Output File Options. The Output Options box appears; by default, the names of the imputed datasets have "imp" at the end of the file name and 5 imputed datasets are selected. Select OK.


VII. Select Impute! Imputation is complete when "Successful Imputation." appears at the bottom right of the screen.

VIII. Select Output Log. The output log gives you the chain length of each imputation. For example, Imputation 4's chain length was 133.

IX. The 5 imputed datasets are saved in the same location as the original file.


X. Below is an example of one complete dataset from the imputed files.
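For readers who prefer a script to the AmeliaView menus, Step 1 as a whole can be approximated from the R console. The following is only a sketch under stated assumptions: the file name world95.csv, the output stem world95-imp, and the helper make_bounds() are hypothetical, and the column names are the ones mentioned in the steps above. No bound is set on climate here, because it is declared nominal and how bounds interact with nominal variables in a scripted call is not covered by this handout.

```r
# Hypothetical helper: build the three-column bounds matrix used by amelia()
# (column index of the variable, lower bound, upper bound).
make_bounds <- function(data, vars, lower, upper) {
  idx <- match(vars, names(data))
  stopifnot(!anyNA(idx))            # every variable must exist in the data
  n <- length(idx)
  matrix(c(idx, rep_len(lower, n), rep_len(upper, n)), ncol = 3)
}

# Assumed file name; the handout only says the data were saved as a .csv file.
csv_path <- "world95.csv"

# Run the imputation only if the package and the file are actually available.
if (requireNamespace("Amelia", quietly = TRUE) && file.exists(csv_path)) {
  world <- read.csv(csv_path)

  # Percentage variables are restricted to the range 0 to 100 (step V).
  pct_vars <- c("urban", "literacy", "lit_male", "lit_female")
  bds <- make_bounds(world, pct_vars, lower = 0, upper = 100)

  set.seed(1)                       # arbitrary seed, for reproducibility only
  a.out <- Amelia::amelia(
    world,
    m      = 5,                     # 5 imputed datasets (step VI)
    idvars = "country",             # ID variable (step IV)
    noms   = c("region", "climate"),# nominal variables (step IV)
    bounds = bds
  )

  # Write one .csv file per imputed dataset (assumed stem "world95-imp").
  Amelia::write.amelia(a.out, file.stem = "world95-imp", format = "csv")
}
```

The completed datasets are also held in memory as the list a.out$imputations, so the analyses in Step 2 could be run without reading the files back in.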


Step 2. Pool parameter estimates and standard errors.
I. Run statistical analyses (e.g., multiple regression, canonical correlation, etc.) on the 5 imputed datasets.

II. Compute the mean of the parameter estimates of the 5 imputed datasets. For example, a multiple regression was conducted using the 5 imputed datasets and there are 5 beta estimates for the literacy variable. The mean of the 5 beta estimates is 3.41.

    Imputation   b      SE     Variance (SE^2)
    1            3.23   0.15   0.022
    2            3.43   0.15   0.023
    3            3.35   0.11   0.011
    4            3.55   0.15   0.023
    5            3.47   0.15   0.022
    Mean         3.41

III. To pool the standard error estimates, you need to compute the within-imputation variance and the between-imputation variance.

    a. The within-imputation variance is the average of the squared standard errors across the m analyses,

$$\bar{U} = \frac{1}{m} \sum_{i=1}^{m} U_i$$

    where $U_i$ is the variance estimate from the ith imputed dataset, and m is the number of imputations.

    First, sum the variance estimates (0.101), then multiply by one fifth. The within-imputation variance is 0.020.

    b. Between-imputation variance is the variability of the m parameter estimates around the mean estimate,

$$B = \frac{1}{m} \sum_{i=1}^{m} (Q_i - \bar{Q})^2$$

    where m is the number of imputations, $Q_i$ is the beta estimate from the ith imputed dataset, and $\bar{Q}$ is the mean parameter estimate.


    First, find the deviation scores for the beta estimates, square them, and then sum the squared deviations. Next, multiply that value by one fifth. The between-imputation variance is 0.009.

    c. Use the following equation to compute the total variance:

$$T = \bar{U} + \left(1 + \frac{1}{m}\right) B$$

    For the example, the total variance is 0.031. Therefore, the multiple imputation standard error is 0.18 ($\sqrt{0.031}$).

    Imputation   b      SE     Variance (SE^2)
    1            3.23   0.15   0.022
    2            3.43   0.15   0.023
    3            3.35   0.11   0.011
    4            3.55   0.15   0.023
    5            3.47   0.15   0.022
    Pooled       3.41   0.18

IV. Repeat the steps for all of the parameter estimates and standard errors in your model.
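The pooling arithmetic of Step 2 can also be done in a few lines of base R. The sketch below is illustrative, not part of the Amelia package: pool_estimates() and its rubin_divisor argument are hypothetical names. By default it divides the sum of squared deviations by m, as the steps above describe; Rubin's rules are more commonly stated with a divisor of m - 1, which rubin_divisor = TRUE switches on. The inputs are the rounded values from the table, so the between-imputation variance and the pooled standard error computed this way may differ from the figures quoted in the text.

```r
# Hypothetical helper: pool m estimates and their variances (squared SEs).
pool_estimates <- function(b, variance, rubin_divisor = FALSE) {
  m <- length(b)
  stopifnot(m > 1, length(variance) == m)

  q_bar <- mean(b)                          # pooled parameter estimate
  u_bar <- mean(variance)                   # within-imputation variance
  denom <- if (rubin_divisor) m - 1 else m  # handout divides by m
  b_var <- sum((b - q_bar)^2) / denom       # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b_var      # total variance

  list(estimate = q_bar, within = u_bar, between = b_var,
       total = t_var, se = sqrt(t_var))
}

# Rounded beta estimates and variances for the literacy variable (table above).
b        <- c(3.23, 3.43, 3.35, 3.55, 3.47)
variance <- c(0.022, 0.023, 0.011, 0.023, 0.022)

pooled <- pool_estimates(b, variance)
print(round(unlist(pooled), 3))
```

For a regression fitted with lm() on each imputed dataset, the estimates and standard errors for one coefficient could be read from the "Estimate" and "Std. Error" columns of coef(summary(fit)) and passed in as b and the squared standard errors.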

