Nov 14, 2010

© Attribution Non-Commercial (BY-NC)


Inference and Statistical Significance

(1) Inferences about the population mean: Population mean

(2) Inferences about the difference between two population means: Difference between two population means

(3) Inferences about the population regression coefficients: The significance of linear regression coefficients

(4) Inferences about the association between categorical variables: The Chi Squared Statistic

(5) Inferences about the population regression coefficients: Regression

Population mean

You may have seen or heard of the TV programme called 'Two Point Four Children'. The premise of this programme was the Census information that for the UK population there were on average 2.4 children per family unit.

Now look at the data in the A5:A20 range. Here we have the number of children in each of a random sample of 16 family units. The average number of children for this sample was calculated in A23 and is seen to be 1.5.

- CHILDREN (A5:A20): 0, 1, 4, 1, 2, 1, 0, 4, 3, 2, 3, 1, 0, 1, 0, 1
- POP MEAN (B5:B20): 2.4 in every row
- AVERAGE (A23): 1.5

This illustrates the fundamental concept of inference, namely: what can we infer from this sample value of 1.5 about the average number of children in the population as a whole, which we believe to be 2.4?

There are two possibilities:

a) things in the population have changed since the census and the population average (mean) number of children is no longer 2.4.

b) the population average is in fact still 2.4, but the sample is not representative and by chance we have obtained an unusually low value for the sample average.

Since we can never be certain about either of these possibilities, we have to use a "balance of probabilities" approach. In other words, we calculate the probability of obtaining a sample average as low as 1.5 from a population in which the actual average is in fact 2.4.

This is the essence of inference. If we can calculate this probability, we can choose between the two possibilities in a logical manner. Put simply, if the probability of obtaining a sample average as low as 1.5 from a population whose average is in fact 2.4 is 1%, then we are more likely to infer that the supposition is untrue than if this probability were 99%.

In the former case we would reject the presumed value of 2.4, but in the latter case we would

accept the presumed value.

To calculate this probability we use a data analysis routine called

t-Test: Paired Two Sample for Means

Before we can use this, we must replicate the presumed population mean alongside the sample data. This has been done in the B5:B20 range. Now select the option above from Data Analysis. The Variable 1 range is A4:A20, the Variable 2 range is B4:B20, Labels is checked and the output range is set to A40. Keep the default alpha value of 0.05.

The results should be the same as those in the following link. Study the comments carefully

for an explanation of the meaning of the calculated statistics. Inference 1
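For readers working outside Excel, the calculation behind this output can be sketched in standard-library Python. The second column is the constant 2.4, so the paired routine reduces to a one-sample t-test on the values x - 2.4, and that is what the sketch computes. The helpers `t_pdf` and `t_upper_tail` are illustrative names that do not belong to any library. The tail probability comes from Simpson's-rule integration of the Student t density, not from Excel's own method, so the last digits may differ slightly.

```python
import math
import statistics

# Sample from A5:A20: number of children in each of 16 family units
children = [0, 1, 4, 1, 2, 1, 0, 4, 3, 2, 3, 1, 0, 1, 0, 1]
pop_mean = 2.4  # presumed population mean (the constant in B5:B20)


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T >= |t|): 0.5 minus the area from 0 to |t| (Simpson's rule, even steps)."""
    b = abs(t)
    h = b / steps
    area = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


n = len(children)
mean = statistics.mean(children)     # 1.5
sd = statistics.stdev(children)      # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)               # standard error of the mean
t_stat = (mean - pop_mean) / se      # mean difference divided by its standard error
df = n - 1
p_one = t_upper_tail(t_stat, df)     # one-tail probability
p_two = 2 * p_one                    # two-tail probability

print(f"mean={mean:.2f} sd={sd:.2f} se={se:.2f} t={t_stat:.2f} df={df}")
print(f"one-tail p={p_one:.4f} two-tail p={p_two:.4f}")
```

The one-tail probability comes out at roughly 0.01, which is below the alpha of 0.05. This leads to the same decision as the Excel output: reject the presumed population mean of 2.4.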

PROCEED TO SHEET 2 (Difference between two population means)

Difference between two population means

The previous sheet used data from a single sample to make inferences about a presumed population value. Sometimes, however, we will want to take samples from two populations and use these to make inferences about the difference between the two populations.

For example, we might calculate the difference between the two sample means and use this to make an inference about the difference between the means of the two populations from which they were drawn. Usually the presumption will be that the difference between the population means is zero, and this will be tested against any of three alternatives:

- The difference is not equal to zero
- The difference is greater than zero
- The difference is less than zero

To see how this can be done, look at the data in the yellow area. Here we have taken a random sample of size 16 from each of two regions (A and B) and recorded the number of children in each family unit.

- A CHILDREN (A5:A20): 0, 1, 4, 1, 2, 1, 0, 4, 3, 2, 3, 1, 0, 1, 0, 1
- B CHILDREN (B5:B20): 5, 3, 4, 2, 4, 3, 2, 4, 3, 2, 3, 2, 3, 2, 4, 1

The question is: can we say that there is a significant difference in the mean number of children between family units in the two regions?

We do this once again with the Data Analysis routine's option:

t-Test: Paired Two Sample for Means

The Variable 1 range is A4:A20, the Variable 2 range is B4:B20, Labels is checked and the output range is set to A32. Change the default alpha value from 0.05 to 0.01.

The results should be the same as those in the following link. Study the comments carefully

for an explanation of the meaning of the calculated statistics. Inference 2

The calculated t Stat of -3.71 is so large in absolute value that we have no difficulty in accepting either of the relevant alternative presumptions: that the mean difference is not equal to zero, or that the mean difference is less than zero.

That is, we reject the presumption that the difference in the mean number of children is zero, and accept with 1 - 0.001 = 0.999 (99.9%) confidence that the mean difference is less than zero.

Alternatively, we can accept with 1 - 0.002 = 0.998 (99.8%) confidence that the mean difference is not equal to zero.
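The sketch below reproduces this calculation in standard-library Python. Excel's paired routine treats row i of region A and row i of region B as a matched pair and runs a one-sample t-test on the differences A - B, and the sketch computes that t Stat. The two regional samples are described as independent random samples, so an unpaired two-sample test would usually be the more conventional choice for this design. For comparison, the sketch therefore also computes Welch's t, which does not assume equal variances. All variable names are illustrative.

```python
import math
import statistics

# Samples from the two regions (A5:A20 and B5:B20)
region_a = [0, 1, 4, 1, 2, 1, 0, 4, 3, 2, 3, 1, 0, 1, 0, 1]
region_b = [5, 3, 4, 2, 4, 3, 2, 4, 3, 2, 3, 2, 3, 2, 4, 1]

# Paired test: a one-sample t-test on the row-by-row differences A - B
diffs = [a - b for a, b in zip(region_a, region_b)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
se_paired = statistics.stdev(diffs) / math.sqrt(n)
t_paired = mean_diff / se_paired          # hypothesised mean difference = 0
df_paired = n - 1

# Unpaired alternative for independent samples: Welch's t-test
var_a = statistics.variance(region_a)     # sample variances (n - 1 divisor)
var_b = statistics.variance(region_b)
n_a, n_b = len(region_a), len(region_b)
se_welch = math.sqrt(var_a / n_a + var_b / n_b)
t_welch = (statistics.mean(region_a) - statistics.mean(region_b)) / se_welch
# Welch-Satterthwaite approximation to the degrees of freedom
df_welch = (var_a / n_a + var_b / n_b) ** 2 / (
    (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
)

print(f"paired: mean diff={mean_diff:.4f} t={t_paired:.2f} df={df_paired}")
print(f"Welch:  t={t_welch:.2f} df={df_welch:.1f}")
```

Both statistics are negative and large in absolute value, so both tests reject a zero difference in this case. The Welch statistic is smaller in magnitude because it does not benefit from the modest positive correlation between the rows.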

PROCEED TO SHEET 3 (The significance of linear regression coefficients)

The significance of linear regression coefficients

Look at the Regression Output below. It is the one that we obtained for the regression of sales on price and advertising expenditure in the Excel_4 file.

The calculated coefficients in B31, B32 and B33 are sample estimates of the unknown true population coefficients. This being the case, we should make some inferences about their reliability as estimates.

To do this we presume that the true values of each of the coefficients are zero and we test each one

against the alternative presumption that they are not equal to zero.

Excel does this in the yellow area below and the meaning of each statistic is explained in the comments.

We should also note that in F26, under the heading Significance F, Excel has calculated the value to be 3.5307E-06, i.e. 0.0000035307. This very small probability is the chance of obtaining an F statistic as large as the calculated 68.75 if the true population value of R squared were in fact zero (equivalently, if both true slope coefficients were zero). Since this is very unlikely, we can be fairly certain that the calculated R squared value of 0.93856 is significantly different from zero.
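As a check on the Significance F value, the ANOVA figures can be run through a short Python sketch. It relies on the numerator having 2 degrees of freedom, as it does here. In that case the upper tail of the F distribution has the closed form P(F > f) = (1 + 2f/d2)^(-d2/2). With other numerator degrees of freedom, a general routine such as Excel's =FDIST would be needed instead.

```python
# ANOVA figures from the regression of sales on ADVEX and PRICE
ss_regression = 145881218.66
ss_residual = 9548781.34
df_regression, df_residual = 2, 9

ms_regression = ss_regression / df_regression   # regression mean square
ms_residual = ss_residual / df_residual         # residual mean square
f_stat = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)

# Closed-form F upper tail, valid only when the numerator df equals 2:
# P(F > f) = (1 + 2 f / d2) ** (-d2 / 2)
significance_f = (1 + 2 * f_stat / df_residual) ** (-df_residual / 2)

print(f"F = {f_stat:.2f}, R squared = {r_squared:.5f}, Significance F = {significance_f:.4e}")
```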

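SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.97 |
| R Square | 0.94 |
| Adjusted R Square | 0.92 |
| Standard Error | 1030.04 |
| Observations | 12 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 145881218.66 | 72940609.33 | 68.75 | 0 |
| Residual | 9 | 9548781.34 | 1060975.7 | | |
| Total | 11 | 155430000 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 19668.96 | 2728.32 | 7.21 | 0 | 13497.07 | 25840.86 |
| ADVEX | 0.53 | 0.05 | 10.18 | 0 | 0.41 | 0.65 |
| PRICE | -6.41 | 0.78 | -8.17 | 0 | -8.18 | -4.63 |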

PROCEED TO SHEET 4 (Chi squared statistic)


Using the CHI SQUARED Statistic to test the degree of association of a cross tabulation

Look at the cross tabulation reproduced below. It is the one we obtained in sheets 1 & 2 of the Excel_3 file.

The question raised is whether on the basis of these observed frequencies we can make any inferences

about the degree of association between gender and verdict.

To do this requires that we have some idea of the frequencies that would be expected on a purely

random basis and then compare the actually observed frequencies with the expected ones.

To calculate the expected frequencies we argue as follows. 29/50 of the sample are female and 24/50 fail, so if gender and verdict were unrelated, a proportion 29/50*24/50 = 0.2784 of the total of 50, i.e. 13.92, would be expected to be female and fail. Cancelling one grand total (50) from top and bottom gives: expected frequency = column total*row total/grand total = 29*24/50 = 13.92.

To calculate all four expected frequencies in Excel we proceed as follows. First note that we have named the D24 cell GT (grand total). Now in B26 enter:

=B$24*$D22/GT

and copy this cell down into B27 and then across into C26:C27. The expected frequencies will be calculated.

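With the actual frequencies located in the B22:C23 range and the expected frequencies in the B26:C27 range, we can now use the Excel function =CHITEST to test these data. Note that =CHITEST does not return the chi squared statistic itself. It calculates the statistic internally and returns its significance, using (R-1)*(C-1) degrees of freedom.

The syntax for this function is =CHITEST(range of actual frequencies,range of expected frequencies)

So use B29 to contain: =CHITEST(B22:C23,B26:C27)

Observed frequencies (Count of VERDICT by GENDER):

| VERDICT | F | M | Grand Total |
|---|---|---|---|
| FAIL | 14 | 10 | 24 |
| PASS | 15 | 11 | 26 |
| Grand Total | 29 | 21 | 50 |

Expected frequencies (B26:C27):

| VERDICT | F | M |
|---|---|---|
| FAIL | 13.92 | 10.08 |
| PASS | 15.08 | 10.92 |

A figure of 0.963404 will be returned, but what does this tell us?

Well, it can be shown that if every observed frequency were equal to the expected frequency, then chi squared would be calculated as zero. In this case there is no association between the two variables: the actual frequencies are just what would be expected by chance. However, as the actual and the expected frequencies start to diverge, the calculated value of chi squared increases, and as it does so it provides evidence of a degree of association between the variables. But how high does it have to become before it becomes significant? The answer is to compute the significance of the calculated chi squared statistic, and this is exactly what =CHITEST has returned.

Here the observed and expected frequencies are very close, so the statistic itself is tiny:

chi squared = (14-13.92)^2/13.92 + (10-10.08)^2/10.08 + (15-15.08)^2/15.08 + (11-10.92)^2/10.92 ≈ 0.0021

Now note that in an R by C contingency (Pivot) table the Degrees of Freedom are given by (R-1)*(C-1). So with 2 rows and 2 columns our table has (2-1)*(2-1) = 1 degree of freedom. The same significance can be obtained from Excel's =CHIDIST function, whose syntax is =CHIDIST(Calculated Chi Squared Value,degrees of freedom). For example, =CHIDIST(0.0021,1) returns approximately 0.963. Conversely, =CHIINV(B29,1) recovers the chi squared statistic from the value in B29. Entering =CHIDIST(B29,1) instead, which returns 0.326331, would treat a probability as if it were a chi squared value, so that figure is not the significance of this test.

As with previous significance tests, we require a value of 0.05 or less to provide an acceptable level of significance. A significance of 0.963 is far above this, so these data provide no evidence of an association between gender and verdict.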
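The whole calculation, from expected frequencies through the statistic and degrees of freedom to the significance, can be sketched in standard-library Python. For one degree of freedom the chi squared upper tail has the closed form P(X > x) = erfc(sqrt(x/2)), and the sketch uses that in place of =CHIDIST. This shortcut applies only to 1 degree of freedom; larger tables would need a general chi squared routine. The result should agree with the 0.963404 returned by =CHITEST.

```python
import math

# Observed cross tabulation (rows: FAIL, PASS; columns: F, M)
observed = [[14, 10],
            [15, 11]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = column total * row total / grand total
expected = [[c * r / grand_total for c in col_totals] for r in row_totals]

# Chi squared statistic: sum of (observed - expected)^2 / expected
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (R-1)*(C-1)

# Upper-tail probability; this closed form holds only for df = 1
significance = math.erfc(math.sqrt(chi_sq / 2))

print(expected)
print(f"chi squared = {chi_sq:.4f}, df = {df}, significance = {significance:.6f}")
```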

Regression

Consider the data shown in the yellow highlighted area. Here, over 30 time periods, we have collected data on the following variables:

- QDS = the quantity of strawberries demanded
- PS = the price of strawberries
- PR = the price of raspberries
- PP = the price of peaches
- PIC = the price of ice cream
- PC = the price of cream

Use multiple regression to fit an equation of the form:

QDS = a + b*PS + c*PR + d*PP + e*PIC + f*PC

| TIME | QDS | PS | PR | PP | PIC | PC |
|---|---|---|---|---|---|---|
| Period 1 | 5827.4 | £1.40 | £2.24 | £2.12 | £1.62 | £1.56 |
| Period 2 | 5727.4 | £1.36 | £1.56 | £1.54 | £2.03 | £2.20 |
| Period 3 | 5842.1 | £1.48 | £2.29 | £2.20 | £1.58 | £1.80 |
| Period 4 | 5767.8 | £1.50 | £2.10 | £1.70 | £2.18 | £1.92 |
| Period 5 | 5765 | £1.30 | £2.04 | £2.00 | £2.10 | £2.10 |
| Period 6 | 5793.2 | £1.32 | £1.96 | £1.64 | £1.49 | £1.66 |
| Period 7 | 5721.1 | £1.38 | £2.23 | £1.46 | £2.34 | £1.86 |
| Period 8 | 5719.5 | £1.45 | £1.43 | £1.78 | £2.12 | £1.95 |
| Period 9 | 5702.3 | £1.45 | £1.79 | £1.62 | £2.04 | £1.66 |
| Period 10 | 5794.9 | £1.40 | £1.77 | £1.29 | £1.49 | £1.63 |
| Period 11 | 5732.1 | £1.52 | £1.93 | £1.87 | £1.62 | £1.88 |
| Period 12 | 5774.3 | £1.50 | £2.37 | £1.38 | £1.94 | £1.79 |
| Period 13 | 5774.5 | £1.48 | £1.41 | £1.93 | £1.42 | £1.95 |
| Period 14 | 5805 | £1.38 | £2.40 | £1.82 | £1.88 | £1.70 |
| Period 15 | 5748.4 | £1.38 | £1.98 | £1.28 | £2.30 | £1.48 |
| Period 16 | 5763.3 | £1.40 | £2.39 | £1.37 | £1.95 | £1.69 |
| Period 17 | 5836.8 | £1.49 | £1.90 | £1.88 | £1.53 | £1.36 |
| Period 18 | 5726.3 | £1.50 | £1.85 | £2.08 | £1.82 | £1.68 |
| Period 19 | 5788.4 | £1.53 | £2.36 | £1.59 | £1.46 | £1.79 |
| Period 20 | 5733.3 | £1.54 | £1.41 | £1.84 | £1.86 | £1.28 |
| Period 21 | 5705 | £1.49 | £1.98 | £1.63 | £2.31 | £1.82 |
| Period 22 | 5800.8 | £1.47 | £1.96 | £1.57 | £1.92 | £1.45 |
| Period 23 | 5710.7 | £1.40 | £1.79 | £1.92 | £2.28 | £1.87 |
| Period 24 | 5825.5 | £1.40 | £2.09 | £1.54 | £1.50 | £1.37 |
| Period 25 | 5750.2 | £1.35 | £1.42 | £2.18 | £1.60 | £1.41 |
| Period 26 | 5753.1 | £1.35 | £1.87 | £1.63 | £1.91 | £1.63 |
| Period 27 | 5728.4 | £1.40 | £1.46 | £1.96 | £2.36 | £1.50 |
| Period 28 | 5795.3 | £1.37 | £2.17 | £1.24 | £1.69 | £1.77 |
| Period 29 | 5801.9 | £1.32 | £2.05 | £1.52 | £1.56 | £1.98 |
| Period 30 | 5755.4 | £1.40 | £2.26 | £1.30 | £2.32 | £2.20 |

Which of the calculated regression coefficients are significantly different from zero at the 1% level of significance?

By inspecting the signs and magnitudes of the regression coefficients, which products can be said to be good substitutes for strawberries and which can be said to be good complements?

A suggested solution is contained in the following link: Inference 3

t-Test: Paired Two Sample for Means

| | CHILDREN | POP MEAN |
|---|---|---|
| Mean | 1.5 | 2.4 |
| Variance | 1.87 | 0 |
| Observations | 16 | 16 |
| Pearson Correlation | 0 | |
| Hypothesized Mean Difference | 0 | |
| df | 15 | |
| t Stat | -2.63 | |
| P(T<=t) one-tail | 0.01 | |
| t Critical one-tail | 1.75 | |
| P(T<=t) two-tail | 0.02 | |
| t Critical two-tail | 2.13 | |

| Statistic | Value |
|---|---|
| Sample Standard Deviation | 1.37 |
| Sample Standard Error | 0.34 |
| Mean difference | -0.9 |
| t Stat | -2.63 |

t-Test: Paired Two Sample for Means

| | A CHILDREN | B CHILDREN |
|---|---|---|
| Mean | 1.5 | 2.94 |
| Variance | 1.87 | 1.13 |
| Observations | 16 | 16 |
| Pearson Correlation | 0.21 | |
| Hypothesized Mean Difference | 0 | |
| df | 15 | |
| t Stat | -3.71 | |
| P(T<=t) one-tail | 0 | |
| t Critical one-tail | 2.6 | |
| P(T<=t) two-tail | 0 | |
| t Critical two-tail | 2.95 | |

For a coefficient to be significant at the 1% level, the figure in the P-value column of the regression output must be less than or equal to 0.01. Those that meet this requirement (Intercept, PR and PIC) were highlighted in green in the workbook output below.

SUMMARY OUTPUT

| Regression Statistics | |
|---|---|
| Multiple R | 0.84 |
| R Square | 0.7 |
| Adjusted R Square | 0.63 |
| Standard Error | 24.55 |
| Observations | 30 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 5 | 33309.52 | 6661.9 | 11.06 | 0 |
| Residual | 24 | 14461.14 | 602.55 | | |
| Total | 29 | 47770.65 | | | |

| | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 5958.66 | 117.67 | 50.64 | 0 | 5715.81 | 6201.51 |
| PS | -95.95 | 68.72 | -1.4 | 0.18 | -237.79 | 45.88 |
| PR | 63.89 | 15.69 | 4.07 | 0 | 31.52 | 96.27 |
| PP | 12.37 | 17.87 | 0.69 | 0.5 | -24.5 | 49.25 |
| PIC | -79.39 | 15.52 | -5.12 | 0 | -111.43 | -47.36 |
| PC | -30.68 | 20.18 | -1.52 | 0.14 | -72.32 | 10.96 |

The positive coefficients for PR and PP indicate that they are substitutes for strawberries: if PR or PP increases (decreases), the demand for strawberries will increase (decrease). Since the coefficient of PR exceeds that for PP, raspberries are a closer substitute than peaches.

The negative coefficients for PIC and PC indicate that they are complements for strawberries: if PIC or PC increases (decreases), the demand for strawberries will decrease (increase). Since the coefficient of PIC is larger in absolute value than that for PC, ice cream is more closely tied to strawberry sales than cream.

Overall the regression is far from robust. Although the R squared is quite high (69.7%) and significantly different from zero, only two of the five variable coefficients are significant, which means that the model is unlikely to perform well for prediction purposes. The highly significant intercept term confirms this, since it does not refer to an explanatory variable.

Finally, the fact that the coefficient for the price of strawberries themselves is not significant should give us cause for concern.
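This solution output can also be reproduced outside Excel with ordinary least squares. The sketch assumes that the prices in the data table are exact to the penny as displayed. If the workbook stores more decimals, the estimates would differ slightly. It solves the normal equations (X'X)b = X'y by Gauss-Jordan inversion in pure Python and prints each coefficient with its standard error and t Stat. The helper `invert` is a hypothetical name. This is a teaching sketch rather than a numerically robust regression routine; a QR-based solver is preferable for ill-conditioned data.

```python
import math

# Rows from the data table: (QDS, PS, PR, PP, PIC, PC), prices in pounds.
# Assumes the displayed prices are the exact stored values.
DATA = [
    (5827.4, 1.40, 2.24, 2.12, 1.62, 1.56), (5727.4, 1.36, 1.56, 1.54, 2.03, 2.20),
    (5842.1, 1.48, 2.29, 2.20, 1.58, 1.80), (5767.8, 1.50, 2.10, 1.70, 2.18, 1.92),
    (5765.0, 1.30, 2.04, 2.00, 2.10, 2.10), (5793.2, 1.32, 1.96, 1.64, 1.49, 1.66),
    (5721.1, 1.38, 2.23, 1.46, 2.34, 1.86), (5719.5, 1.45, 1.43, 1.78, 2.12, 1.95),
    (5702.3, 1.45, 1.79, 1.62, 2.04, 1.66), (5794.9, 1.40, 1.77, 1.29, 1.49, 1.63),
    (5732.1, 1.52, 1.93, 1.87, 1.62, 1.88), (5774.3, 1.50, 2.37, 1.38, 1.94, 1.79),
    (5774.5, 1.48, 1.41, 1.93, 1.42, 1.95), (5805.0, 1.38, 2.40, 1.82, 1.88, 1.70),
    (5748.4, 1.38, 1.98, 1.28, 2.30, 1.48), (5763.3, 1.40, 2.39, 1.37, 1.95, 1.69),
    (5836.8, 1.49, 1.90, 1.88, 1.53, 1.36), (5726.3, 1.50, 1.85, 2.08, 1.82, 1.68),
    (5788.4, 1.53, 2.36, 1.59, 1.46, 1.79), (5733.3, 1.54, 1.41, 1.84, 1.86, 1.28),
    (5705.0, 1.49, 1.98, 1.63, 2.31, 1.82), (5800.8, 1.47, 1.96, 1.57, 1.92, 1.45),
    (5710.7, 1.40, 1.79, 1.92, 2.28, 1.87), (5825.5, 1.40, 2.09, 1.54, 1.50, 1.37),
    (5750.2, 1.35, 1.42, 2.18, 1.60, 1.41), (5753.1, 1.35, 1.87, 1.63, 1.91, 1.63),
    (5728.4, 1.40, 1.46, 1.96, 2.36, 1.50), (5795.3, 1.37, 2.17, 1.24, 1.69, 1.77),
    (5801.9, 1.32, 2.05, 1.52, 1.56, 1.98), (5755.4, 1.40, 2.26, 1.30, 2.32, 2.20),
]
NAMES = ["Intercept", "PS", "PR", "PP", "PIC", "PC"]


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


y = [row[0] for row in DATA]
X = [[1.0, *row[1:]] for row in DATA]   # leading 1.0 column estimates the intercept a
k = len(X[0])                            # 6 parameters: a, b, c, d, e, f

# Normal equations: (X'X) beta = X'y
xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
xtx_inv = invert(xtx)
beta = [sum(xtx_inv[i][j] * xty[j] for j in range(k)) for i in range(k)]

fitted = [sum(b * x for b, x in zip(beta, r)) for r in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
sse = sum(e * e for e in residuals)              # residual sum of squares
y_mean = sum(y) / len(y)
sst = sum((yi - y_mean) ** 2 for yi in y)        # total sum of squares
r_squared = 1 - sse / sst
df_resid = len(y) - k                            # 30 - 6 = 24
sigma2 = sse / df_resid                          # residual mean square

coef = {}
for i, (name, b) in enumerate(zip(NAMES, beta)):
    se = math.sqrt(sigma2 * xtx_inv[i][i])       # standard error of coefficient i
    coef[name] = b
    print(f"{name:9s} coef={b:9.2f} se={se:7.2f} t={b / se:6.2f}")
print(f"R squared = {r_squared:.3f}, residual df = {df_resid}")
```

With 24 residual degrees of freedom, a coefficient is significant at the 1% level (two-tail) when its t Stat exceeds about 2.80 in absolute value.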

