
Math 360 Chapter 11 Statistical Inference Using Two Samples

Dr. Islam

In this chapter, we compare two population proportions and two population means using two independent samples.

Section 11.1: Two Sample Proportions

Example 11.1.1 (Textbook Examples 10.6, 10.7, and 10.8, pages 405-408): A new product is advertised using two different methods in two different areas. After spending an equal amount of money on advertising in each area, the company wants to know how aware customers in the two areas are of the product. In one area, 631 of 1000 randomly selected customers are aware of the product, whereas in the other area, 798 of 1000 randomly selected customers are aware of it. We want to answer the following questions:
a) Find a point estimate of the difference in the proportions of customers aware of the product in the two areas.
b) Find a 95% confidence interval for the difference in the proportions of customers aware of the product in the two areas.
c) Test $H_0: p_1 = p_2$ against $H_a: p_1 < p_2$ at the 5% level of significance and interpret the result.
d) Test $H_0: p_1 = p_2$ against $H_a: p_1 \neq p_2$ at the 5% level of significance and interpret the result.

Methods: Two sample proportions
Given two independent SRSs of sizes $n_1$ and $n_2$ from two separate populations with unknown proportions $p_1$ and $p_2$, respectively, we wish to compare the two population proportions. The point estimates of $p_1$ and $p_2$ are given by

$$\hat{p}_1 = \frac{X_1}{n_1}, \qquad \hat{p}_2 = \frac{X_2}{n_2},$$

where $X_1$ and $X_2$ are the numbers of successes in the two samples. A point estimate of $p_1 - p_2$ is $\hat{p}_1 - \hat{p}_2$. The mean of $\hat{p}_1 - \hat{p}_2$ is

$$\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2.$$

The standard deviation of $\hat{p}_1 - \hat{p}_2$ is

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}.$$
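As a minimal sketch of these formulas (Python, standard library only; the function name `diff_estimate` is hypothetical), the point estimate and the standard deviation can be computed as below. Since $p_1$ and $p_2$ are unknown in practice, the sketch plugs in the sample proportions, which gives an estimated standard deviation. The counts are those of Example 11.1.1.

```python
from math import sqrt

def diff_estimate(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Return (p1_hat - p2_hat, estimated standard deviation of p1_hat - p2_hat).

    The unknown p1 and p2 in the standard-deviation formula are replaced
    by the sample proportions (plug-in estimate).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return p1_hat - p2_hat, se

# Example 11.1.1 counts: 631 of 1000 and 798 of 1000 customers aware.
est, se = diff_estimate(631, 1000, 798, 1000)
print(round(est, 3), round(se, 4))  # -0.167 0.0199
```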


When the samples are large, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal:

$$\hat{p}_1 - \hat{p}_2 \approx N\left(p_1 - p_2,\ \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}\right).$$

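If $p_1 = p_2 = p$, then the standard deviation of $\hat{p}_1 - \hat{p}_2$ is

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

which we need for tests of hypotheses about two proportions. The $100(1-\alpha)\%$ large-sample confidence interval for $p_1 - p_2$ is given by

$$\left(\hat{p}_1 - \hat{p}_2\right) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},$$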

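where $z_{\alpha/2}$ is the critical value for the standard normal curve with area $1-\alpha$ between $-z_{\alpha/2}$ and $z_{\alpha/2}$. The test statistic for testing $H_0: p_1 = p_2$ against any of the alternative hypotheses

$H_a: p_1 > p_2$, $\quad H_a: p_1 < p_2$, $\quad H_a: p_1 \neq p_2$

is given by

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},$$

where

$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$

is the pooled sample proportion.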
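The confidence interval and the pooled test statistic above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the helper names `two_prop_ci` and `two_prop_z` are hypothetical, and $z_{\alpha/2}$ comes from the standard library's `statistics.NormalDist` instead of a printed z-table. Note the design choice that mirrors the notes: the interval uses the unpooled standard error, while the test statistic uses the pooled proportion $\hat{p}$.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se

def two_prop_z(x1, n1, x2, n2):
    """Test statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se0 = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se0

# Example 11.1.1 counts
lo, hi = two_prop_ci(631, 1000, 798, 1000)
z = two_prop_z(631, 1000, 798, 1000)
print(f"({lo:.3f}, {hi:.3f})  z = {z:.2f}")  # (-0.206, -0.128)  z = -8.27
```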

The p-value for testing $H_0: p_1 = p_2$ is computed by:

p-value $= P(Z > z)$ for $H_a: p_1 > p_2$
p-value $= P(Z < z)$ for $H_a: p_1 < p_2$
p-value $= 2P(Z > |z|)$ for $H_a: p_1 \neq p_2$

where $Z$ is the test statistic and $z$ is the observed value of the test statistic for the given sample.

Now, let us return to Example 11.1.1.

Example 11.1.1: A new product is advertised using two different methods in two different areas. After a certain amount of advertising, the company wants to know how aware customers in the two advertised areas are of the product. In one area, 631 of 1000 randomly selected customers are aware of the product, whereas in the other area, 798 of 1000 randomly selected customers are aware of it.
a) Find a point estimate of the difference in the proportions of customers aware of the product in the two areas.
b) Find a 95% confidence interval for the difference in the proportions of customers aware of the product in the two areas.
c) Test $H_0: p_1 = p_2$ against $H_a: p_1 < p_2$ at the 5% level of significance.
d) Test $H_0: p_1 = p_2$ against $H_a: p_1 \neq p_2$ at the 5% level of significance.

Solution:
a) $\hat{p}_1 - \hat{p}_2 = \frac{631}{1000} - \frac{798}{1000} = 0.631 - 0.798 = -0.167$
b) The 95% CI for the difference in proportions is

$$\left(\hat{p}_1 - \hat{p}_2\right) \pm z_{0.025}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

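$$= -0.167 \pm 1.96\sqrt{\frac{0.631(1-0.631)}{1000} + \frac{0.798(1-0.798)}{1000}} = -0.167 \pm 1.96(0.0199) = -0.167 \pm 0.039 = (-0.206, -0.128).$$

c) The test statistic is

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},$$

where

$$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{631 + 798}{1000 + 1000} = 0.715, \qquad \hat{p}_1 - \hat{p}_2 = -0.167.$$

Then the observed value of $Z$ is

$$z = \frac{-0.167}{\sqrt{0.715(1-0.715)\left(\frac{1}{1000} + \frac{1}{1000}\right)}} \approx \frac{-0.167}{0.02} = -8.35.$$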
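The p-values in parts (c) and (d) come from the standard normal CDF. A minimal sketch, assuming Python's `math.erfc` stands in for a z-table (the helper name `normal_cdf` is hypothetical): `erfc` is used instead of `1 - cdf` because a z-value this extreme would otherwise lose its tail probability to floating-point cancellation. Computed without intermediate rounding, the statistic is about $-8.27$ rather than the $-8.35$ obtained by rounding the standard error to 0.02; either way both p-values are essentially zero.

```python
from math import erfc, sqrt

def normal_cdf(z: float) -> float:
    """P(Z < z) for standard normal Z; erfc keeps far-tail values accurate."""
    return 0.5 * erfc(-z / sqrt(2))

# Example 11.1.1 counts
x1, n1, x2, n2 = 631, 1000, 798, 1000
p_pool = (x1 + x2) / (n1 + n2)
z = (x1 / n1 - x2 / n2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

p_left = normal_cdf(z)             # c) H_a: p1 < p2  -> P(Z < z)
p_two = 2 * normal_cdf(-abs(z))    # d) H_a: p1 != p2 -> 2 P(Z > |z|)
print(f"z = {z:.2f}, c) p = {p_left:.1e}, d) p = {p_two:.1e}")
```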

The p-value is $P(Z < -8.35) \approx 0.0000$. Since the p-value $< \alpha = 0.05$, we reject $H_0: p_1 = p_2$ in favor of $H_a: p_1 < p_2$: the data give strong evidence that the proportion of customers aware of the product is lower in the first area than in the second.

d) When $H_a: p_1 \neq p_2$, the p-value is $2P(Z > |z|) = 2P(Z > 8.35) \approx 0.0000$. Since the p-value $< \alpha = 0.05$, we reject $H_0: p_1 = p_2$ in favor of $H_a: p_1 \neq p_2$: the awareness proportions in the two areas differ.

Example 11.1.2: It is believed that 43% of Americans buy dietary supplements with calcium. A researcher claims that the percentage of Americans who buy dietary supplements with calcium differs between two large cities. In one city, 63 of 150 randomly selected Americans buy dietary supplements with calcium, whereas in the other city, 67 of 170 randomly selected Americans do.
a) Find a point estimate of the difference in the proportions of Americans buying dietary supplements with calcium in the two cities.
b) Find a 90% CI for the difference in the proportions of Americans buying dietary supplements with calcium in the two cities.
c) Test $H_0: p_1 = p_2$ against $H_a: p_1 \neq p_2$ at the 5% level of significance.

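Solution:
a) $\hat{p}_1 - \hat{p}_2 = \frac{63}{150} - \frac{67}{170} = 0.42 - 0.39 = 0.03$
b) The 90% CI for the difference in proportions is

$$\left(\hat{p}_1 - \hat{p}_2\right) \pm z_{0.05}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 0.03 \pm 1.645\sqrt{\frac{0.42(1-0.42)}{150} + \frac{0.39(1-0.39)}{170}}$$

$$= 0.03 \pm 1.645(0.055) = 0.03 \pm 0.0905 = (-0.0605, 0.1205).$$

c) The test statistic is

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},$$

where $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{63 + 67}{150 + 170} \approx 0.41$ and $\hat{p}_1 - \hat{p}_2 = 0.03$. Then the observed value of $Z$ is

$$z = \frac{0.03}{\sqrt{0.41(1-0.41)\left(\frac{1}{150} + \frac{1}{170}\right)}} = \frac{0.03}{0.055} \approx 0.55.$$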
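The hand calculation rounds $\hat{p}_2 = 67/170 \approx 0.394$ to 0.39, which shifts the difference from about 0.026 to 0.03. A short Python sketch (standard library only; variable names are illustrative) recomputes the 90% interval and the two-sided p-value without that rounding. It gives an interval of roughly $(-0.065, 0.116)$ and a p-value near 0.64, so the conclusion in part (c) is unchanged.

```python
from math import erfc, sqrt
from statistics import NormalDist

x1, n1, x2, n2 = 63, 150, 67, 170       # Example 11.1.2 counts
p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2

# b) 90% CI: z_{0.05} from the inverse standard normal CDF (about 1.645)
z_crit = NormalDist().inv_cdf(0.95)
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci = (diff - z_crit * se, diff + z_crit * se)

# c) pooled test statistic and two-sided p-value
p_pool = (x1 + x2) / (n1 + n2)
z = diff / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = erfc(abs(z) / sqrt(2))        # equals 2 * P(Z > |z|)

print(f"diff = {diff:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), z = {z:.2f}, p = {p_value:.3f}")
```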

The p-value is $2P(Z > 0.55) = 2(0.2912) = 0.5824$. Since the p-value $> \alpha = 0.05$, we fail to reject $H_0: p_1 = p_2$: the sample data do not provide evidence that the proportions of Americans buying dietary supplements with calcium differ between the two cities.

Example 11.1.3: From an SRS of 2253 adult men aged between 19 and 25 years, it is observed that 986 men are still living at home with their parents. Another SRS of 2629 adult women in this age group reveals that 923 women are still living at home with their parents.
a) Estimate the proportions living at home with their parents in the populations of young adult men and women.
b) Find a 95% confidence interval for $p_1 - p_2$, the difference in the proportions of young men and young women living at home in the populations of young adult men and women.
c) Test $H_0: p_1 = p_2$ against $H_a: p_1 > p_2$ at the 5% level of significance and interpret your result.

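Solution:
a) $\hat{p}_1 = \frac{986}{2253} = 0.438$ and $\hat{p}_2 = \frac{923}{2629} = 0.351$, so $\hat{p}_1 - \hat{p}_2 = 0.438 - 0.351 = 0.087$.
b) The 95% CI for the difference in proportions is

$$\left(\hat{p}_1 - \hat{p}_2\right) \pm z_{0.025}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = 0.087 \pm 1.96\sqrt{\frac{0.438(1-0.438)}{2253} + \frac{0.351(1-0.351)}{2629}}$$

$$= 0.087 \pm 1.96(0.014) = 0.087 \pm 0.027 = (0.060, 0.114).$$

c) The test statistic is

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},$$

where $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{986 + 923}{2253 + 2629} = 0.39$ and $\hat{p}_1 - \hat{p}_2 = 0.087$. Then the observed value of $Z$ is

$$z = \frac{0.087}{\sqrt{0.39(1-0.39)\left(\frac{1}{2253} + \frac{1}{2629}\right)}} = \frac{0.087}{0.014} \approx 6.21.$$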
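For the right-tailed alternative in part (c), the p-value is the upper tail $P(Z > z)$. A minimal Python check (standard library only; `upper_tail` is a hypothetical helper name) gives $z \approx 6.18$ when nothing is rounded (the hand value 6.21 comes from rounding the difference to 0.087 and the standard error to 0.014). The p-value is of order $10^{-10}$, which the notes report as 0.0000.

```python
from math import erfc, sqrt

def upper_tail(z: float) -> float:
    """P(Z > z) for a standard normal Z, computed with erfc to keep tail accuracy."""
    return 0.5 * erfc(z / sqrt(2))

x1, n1 = 986, 2253   # young men living at home
x2, n2 = 923, 2629   # young women living at home
p_pool = (x1 + x2) / (n1 + n2)
z = (x1 / n1 - x2 / n2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = upper_tail(z)              # H_a: p1 > p2 is right-tailed
alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.1e}, reject H0: {p_value < alpha}")
```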

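The p-value is $P(Z > 6.21) \approx 0.0000$. Since the p-value $< \alpha = 0.05$, we reject $H_0: p_1 = p_2$.

Interpretation: At the 5% level of significance, young men appear more likely than young women to still live at home with their parents.

Section 11.2: Comparing Two Population Means

Suppose we have two independent samples, one of size $n_1$ and the other of size $n_2$, from two normal populations with unknown means $\mu_1$ and $\mu_2$, respectively. We want to estimate the difference in the two population means, $\mu_1 - \mu_2$. The point estimate of $\mu_1 - \mu_2$ is $\bar{X}_1 - \bar{X}_2$. The mean of $\bar{X}_1 - \bar{X}_2$ is

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2.$$

The standard deviation of $\bar{X}_1 - \bar{X}_2$ is

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

When $\sigma_1^2 = \sigma_2^2 = \sigma^2$, say, the standard deviation of $\bar{X}_1 - \bar{X}_2$ becomes

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$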


A pooled estimator of $\sigma^2$, when $\sigma_1^2 = \sigma_2^2 = \sigma^2$, is given by

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$
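The pooled variance is a weighted average of the two sample variances, each weighted by its degrees of freedom $n_i - 1$. A minimal Python sketch (the function name `pooled_variance` is hypothetical), applied to the summary statistics of Example 11.2.1 ($n_1 = 35$, $s_1 = 10.5$, $n_2 = 40$, $s_2 = 8.5$):

```python
def pooled_variance(n1: int, s1: float, n2: int, s2: float) -> float:
    """S_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2)."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

sp2 = pooled_variance(35, 10.5, 40, 8.5)
print(round(sp2, 2))  # 89.95
```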

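The test statistic for testing $H_0: \mu_1 = \mu_2$ when the common variance $\sigma^2$ is unknown is given by

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},$$

which has a t-distribution with $n_1 + n_2 - 2$ degrees of freedom. Under $H_0$, $\mu_1 - \mu_2 = 0$ in the numerator.

The p-value for testing $H_0: \mu_1 = \mu_2$ is computed by:

p-value $= P(T > t)$ for $H_a: \mu_1 > \mu_2$
p-value $= P(T < t)$ for $H_a: \mu_1 < \mu_2$
p-value $= 2P(T > |t|)$ for $H_a: \mu_1 \neq \mu_2$

where $T$ is the test statistic, distributed as $t(n_1 + n_2 - 2)$, and $t$ is the observed value of the test statistic. The p-value is easy to compute with any statistical software. However, because the t-distribution changes with its degrees of freedom and a printed t-table lists only a few tail areas for each degree of freedom, the exact p-value usually cannot be found by hand. We will therefore use the critical value approach to accept or reject the null hypothesis in a t-test: with $n_1 + n_2 - 2$ degrees of freedom, reject $H_0$ when $t > t_{\alpha}$ (right-tailed), $t < -t_{\alpha}$ (left-tailed), or $|t| > t_{\alpha/2}$ (two-tailed).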
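The critical value approach can be sketched in Python as follows. This is an illustration under assumptions: the function name `pooled_t_test` and its `alternative` labels are hypothetical, and only the standard library is used. Because the standard library has no t-distribution, the critical value $t_\alpha$ or $t_{\alpha/2}$ must be supplied from a t-table (statistical software, for example SciPy's `scipy.stats.t.ppf`, could compute it instead).

```python
from math import sqrt

def pooled_t_test(xbar1, s1, n1, xbar2, s2, n2, t_crit, alternative="greater"):
    """Pooled two-sample t statistic and critical-value decision for H0: mu1 = mu2.

    t_crit is read from a t-table with n1 + n2 - 2 degrees of freedom:
    t_alpha for one-tailed alternatives, t_{alpha/2} for the two-tailed one.
    Returns (t, df, reject_H0).
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df     # pooled variance S_p^2
    t = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))  # mu1 - mu2 = 0 under H0
    if alternative == "greater":      # H_a: mu1 > mu2 (right-tailed)
        reject = t > t_crit
    elif alternative == "less":       # H_a: mu1 < mu2 (left-tailed)
        reject = t < -t_crit
    elif alternative == "two-sided":  # H_a: mu1 != mu2 (two-tailed)
        reject = abs(t) > t_crit
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return t, df, reject
```

Passing the critical value in, rather than computing it, keeps the sketch consistent with the table-based approach used in these notes.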

Example 11.2.1: A research physician believes that the mean age at diagnosis of diabetes is higher among male patients than among female patients. From two independent samples of sizes 35 and 40 from male and female diabetic patients, respectively, the researchers found the mean ages at diagnosis to be 44 and 41 years and the standard deviations to be 10.5 and 8.5 years, respectively. Assume that $\sigma_1 = \sigma_2$.
a) Set up the null and alternative hypotheses for this test and classify the test as left-tailed, right-tailed, or two-tailed.
b) Which form of the t-test will you use for this test?
c) Compute the value of the test statistic.


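d) What is your conclusion regarding acceptance or rejection of the null hypothesis at the 5% level of significance?

Solution:
a) The null hypothesis is $H_0: \mu_1 = \mu_2$ and the alternative hypothesis is $H_a: \mu_1 > \mu_2$. The test is right-tailed.
b) Since the variances are assumed equal, we use the pooled t-test with test statistic

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

c) $$S_p^2 = \frac{(35 - 1)(10.5)^2 + (40 - 1)(8.5)^2}{35 + 40 - 2} = 89.95,$$

$$t = \frac{(44 - 41) - 0}{\sqrt{89.95\left(\frac{1}{35} + \frac{1}{40}\right)}} = \frac{3}{2.2} \approx 1.36.$$

d) The critical value is $t_{0.05}$ with $df = 35 + 40 - 2 = 73$, which is 1.67. Decision: since the observed value $t = 1.36$ is less than the critical value 1.67, we do not reject the null hypothesis. The data do not provide sufficient evidence at the 5% level that the mean age at diagnosis is higher for male patients.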
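Carrying the Example 11.2.1 computation without rounding the standard error to 2.2 gives $t \approx 1.37$ instead of 1.36, still below 1.67, so the decision is the same. A short Python check (standard library only; the critical value 1.67 is the t-table value used above):

```python
from math import sqrt

n1, xbar1, s1 = 35, 44, 10.5   # male patients
n2, xbar2, s2 = 40, 41, 8.5    # female patients
df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
se = sqrt(sp2 * (1 / n1 + 1 / n2))
t = (xbar1 - xbar2) / se
t_crit = 1.67                  # t_{0.05} with 73 df, from a t-table
print(f"df = {df}, S_p^2 = {sp2:.2f}, SE = {se:.3f}, t = {t:.2f}, reject H0: {t > t_crit}")
# df = 73, S_p^2 = 89.95, SE = 2.195, t = 1.37, reject H0: False
```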

Practice Problems

11.1: [about two sample proportions] A study was conducted to determine possible differences in the proportions of female and male students who succeeded in a course. In an SRS of 34 female students, 23 succeeded in the course, while in an SRS of 91 male students, 58 succeeded in the course.
a) Estimate the difference in the proportions of male and female students who succeeded in the course.
b) Find an estimate of the standard deviation of your estimate in (a).
c) Find a 95% confidence interval for $p_1 - p_2$, the difference in the proportions of male and female students who succeeded in the course.
d) Test $H_0: p_1 = p_2$ against $H_a: p_1 \neq p_2$ at the 5% level of significance and interpret your result.


11.2: [about two sample means] Two independent samples of the number of years of education are taken, one of 38 police officers from City A and one of 30 police officers from City B. The average number of years of education for the sample from City A is 15 years, with a standard deviation of 2 years. The average for the sample from City B is 14 years, with a standard deviation of 2.5 years. A researcher believes that the mean number of years of education is higher for City A than for City B. Assume that $\sigma_1 = \sigma_2 = \sigma$, where $\sigma$ is unknown, and test the researcher's claim at the 5% level of significance.

11.3: [about two sample means] Two independent random samples of released prisoners in the fraud and firearms offense categories yielded the following information on time served, in months. Times served are approximately normally distributed.

Fraud:    11.8  16.6  3.6   17.9  5.3   5.9   10.7  7.0   8.5   13.9
Firearms: 25.5  10.4  18.4  19.6  20.9  23.8  17.9  21.9  13.3  16.1

You can use the facts that $\bar{x}_1 = 10.1$, $s_1 = 4.9$, $\bar{x}_2 = 18.8$, and $s_2 = 4.6$. At the 5% significance level, do the data provide sufficient evidence to conclude that the mean time served for fraud is less than that for firearms offenses? In other words, test $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 < \mu_2$. Assume that the variances of the two distributions are unknown and equal.

11.4: [about two sample means] A psychologist has developed a set of activities to help students develop better reading skills. In a study of the effectiveness of these activities, one class of second-grade children learns with the activities. Another class of second-grade children serves as the control and learns without the activities. After some period of time, the reading skills of all of these children are assessed. A summary of these data is:

The psychologist suspects that children who learn with the activities have higher mean reading test scores than children who learn without them.
a) Set up appropriate null and alternative hypotheses for this test.
b) Write out the test statistic and its observed value.


c) What is your conclusion regarding acceptance or rejection of the null hypothesis at the 5% level of significance?

11.5: [about two sample means] Is there a difference in the amount of airborne bacteria between carpeted and uncarpeted rooms? To answer this question, a researcher considers two SRSs, each of size 5, one of carpeted rooms and the other of uncarpeted rooms. The rooms are similar in size and function. After a suitable period of time, the concentration of bacteria in the air was measured (in units of bacteria per cubic foot) in all of these rooms. The data and summaries are provided:

Carpeted rooms: $\bar{x} = 184$, $s = 27.0$
Uncarpeted rooms: $\bar{x} = 172$, $s = 17.9$

The researcher wants to test whether carpet makes a difference (either an increase or a decrease) in the mean bacterial concentration in the air.
a) Set up appropriate null and alternative hypotheses for this test.
b) Write out the test statistic and its observed value.
c) What is your conclusion regarding acceptance or rejection of the null hypothesis at the 5% level of significance?

