
CASE 01: CASE 14 RISK AND RETURN (CHAPTER 10: SIMPLE LINEAR REGRESSION)

According to the Capital Asset Pricing Model (CAPM), the risk associated with a capital asset is proportional to the slope $\beta_1$ (or simply $\beta$) obtained by regressing the asset's past returns on the corresponding returns of the average portfolio, called the market portfolio. (The return of the market portfolio represents the return earned by the average investor. It is a weighted average of the returns from all the assets in the market.) The larger the slope $\beta$ of an asset, the larger the risk associated with that asset. A $\beta$ of 1.00 represents average risk.
The returns from an electronics firm's stock and the corresponding returns for the market portfolio for the past 15 years are given below.
Market Return (%)   Stock's Return (%)
16.02               21.05
12.17               17.25
11.48               13.1
17.62               18.23
20.01               21.52
14                  13.26
13.22               15.84
17.79               22.18
15.46               16.26
8.09                5.64
11                  10.54
18.52               17.86
14.05               12.75
8.79                9.13
11.6                13.87

1. Carry out the regression and find the $\beta$ for the stock. What is the regression equation?
Independent variable ($X$): Market Return
Dependent variable ($Y$): Stock's Return
The least-squares estimator $b_0$, which estimates the intercept $\beta_0$ of the model, is -1.090724.
The least-squares estimator $b_1$, which estimates the slope $\beta_1$ of the model, is 1.166957.
Regression equation: $\hat{Y} = 1.166957X - 1.090724$

[Figure: Simple Regression (CASE 14: Risk and Return). Scatter plot of Stock's Return against Market's Return, with the fitted trend line y = 1.167x - 1.0907.]


Obs.  X (Market Return)  Y (Stock's Return)  Error
1     16.02              21.05                3.44607
2     12.17              17.25                4.13886
3     11.48              13.1                 0.79406
4     17.62              18.23               -1.2411
5     20.01              21.52               -0.7401
6     14                 13.26               -1.9867
7     13.22              15.84                1.50356
8     17.79              22.18                2.51056
9     15.46              16.26               -0.6904
10    8.09               5.64                -2.7099
11    11                 10.55               -1.1958
12    18.52              17.86               -2.6613
13    14.05              12.75               -2.555
14    8.79               9.13                -0.0368
15    11.6               13.87                1.42403

Confidence Interval for Slope
95% C.I. for $\beta_1$: 1.16696 ± 0.37405

Confidence Interval for Intercept
95% C.I. for $\beta_0$: -1.0907 ± 5.38802

Prediction Interval for Y
95% P.I. for Y given X = 10: 10.5788 ± 5.35692

r²        0.7775    Coefficient of Determination
r         0.8818    Coefficient of Correlation
s(b1)     0.17314   Standard Error of Slope
t         6.73986
p-value   0.0000
s(b0)     2.49403   Standard Error of Intercept
s         2.30593   Standard Error of prediction (standard error of the estimate)

Prediction Interval for E[Y|X]
95% C.I. for E[Y | X] given X = 10: 10.5788 ± 1.96969

ANOVA Table
Source   SS        df   MS        F         F critical   p-value
Regn.    241.543   1    241.543   45.4257   4.66719      0.0000
Error    69.125    13   5.3173
Total    310.667   14
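The template's least-squares estimates and ANOVA quantities can be cross-checked with a short standard-library Python sketch. It assumes the 15 observations exactly as listed in the template worksheet above (including 10.55 for the eleventh stock return) and recomputes $b_0$, $b_1$, $r^2$, $s(b_1)$, t, and F from their textbook definitions; it is a verification aid, not the template's own implementation.

```python
# Sketch: ordinary least squares for CASE 14 using only the standard library.
market = [16.02, 12.17, 11.48, 17.62, 20.01, 14, 13.22, 17.79,
          15.46, 8.09, 11, 18.52, 14.05, 8.79, 11.6]        # X, market return (%)
stock = [21.05, 17.25, 13.1, 18.23, 21.52, 13.26, 15.84, 22.18,
         16.26, 5.64, 10.55, 17.86, 12.75, 9.13, 13.87]     # Y, stock's return (%)


def simple_ols(x, y):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, SSR, SSE, SST, SSX)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return b0, b1, sst - sse, sse, sst, ssx


b0, b1, ssr, sse, sst, ssx = simple_ols(market, stock)
n = len(market)
mse = sse / (n - 2)        # error mean square, df = n - 2 = 13
s = mse ** 0.5             # standard error of the estimate
s_b1 = s / ssx ** 0.5      # standard error of the slope
t_b1 = b1 / s_b1           # t statistic for H0: beta1 = 0
r2 = ssr / sst             # coefficient of determination
f_stat = ssr / mse         # F = MSR / MSE with df (1, 13); equals t_b1 ** 2

print(f"b0 = {b0:.6f}, b1 = {b1:.6f}")
print(f"r^2 = {r2:.4f}, s = {s:.5f}, s(b1) = {s_b1:.5f}, t = {t_b1:.4f}, F = {f_stat:.4f}")
```

Because there is a single predictor, F equals the square of the slope's t statistic, which is why the template reports 6.73986 and 45.4257 together.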

2. State your interpretation of the slope $\beta_1$ of the model. (Hint: Does the value of the slope indicate that the stock has above-average risk? For the purposes of this case, assume that the risk is average if the slope is in the range 1 ± 0.1, below average if it is less than 0.9, and above average if it is more than 1.1.)
Since the least-squares estimator $b_1$, which estimates the slope $\beta_1$ of the model, is 1.166957 (> 1.1), the value of the slope indicates that the stock has above-average risk.
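3. Give a 95% confidence interval for this $\beta$. Can we say the risk is above average with 95% confidence?
Confidence Intervals for the Regression Parameters:
A $(1 - \alpha)100\%$ confidence interval for $\beta_1$ is:
$$b_1 \pm t_{(\alpha/2,\ n-2)}\, s(b_1)$$
A 95% confidence interval for $\beta_1$ is:
$$b_1 \pm t_{(0.025,\ 15-2)}\, s(b_1) = 1.166957 \pm (2.1604)(0.17314) = [0.79291,\ 1.54101]$$
Because this interval contains values below 1.1 (and even below 0.9), we cannot say with 95% confidence that the risk is above average.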
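The interval in question 3 and the risk rule from the hint in question 2 can be combined in a small sketch. The critical value 2.1604 for t with 13 degrees of freedom is hard-coded from a t table (an assumption, since the Python standard library has no t distribution), and the function name `classify_risk` is illustrative.

```python
# Sketch: 95% CI for the slope plus the case's risk-classification rule.
b1 = 1.166957        # least-squares slope from the regression output
s_b1 = 0.17314       # standard error of the slope
t_crit = 2.1604      # t(0.025, 13) from a t table (n - 2 = 13 df)


def classify_risk(beta):
    """Average if 0.9 <= beta <= 1.1, below average if < 0.9, above average if > 1.1."""
    if beta > 1.1:
        return "above average"
    if beta < 0.9:
        return "below average"
    return "average"


half_width = t_crit * s_b1
lower, upper = b1 - half_width, b1 + half_width
print(f"95% CI for beta: [{lower:.5f}, {upper:.5f}]")
print("point estimate:", classify_risk(b1))
# "Above average with 95% confidence" would need the whole interval above 1.1.
print("above average with 95% confidence:", lower > 1.1)
```

The point estimate alone classifies the stock as above-average risk, but the lower end of the interval falls in the below-average region, so the interval-based answer is no.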
4. If the market portfolio return for the current year is 10%, what is the stock's return predicted by the regression equation? Give a 95% confidence interval for this prediction.
If the market portfolio return for the current year is 10% ($X = 10$), the stock's return predicted by the regression equation is:
$$\hat{Y} = 1.166957X - 1.090724 = 1.166957(10) - 1.090724 = 10.57884$$
Prediction Intervals
A $(1 - \alpha)100\%$ prediction interval for $Y$ is:
$$\hat{y} \pm t_{(\alpha/2,\ n-2)}\, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$
$$= 10.57884 \pm (2.1604)(2.30593)\sqrt{1 + \frac{1}{15} + \frac{(10 - 13.988)^2}{177.3712}} = [5.2219,\ 15.9358]$$
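The prediction-interval arithmetic above, and the narrower interval for the mean response E[Y | X] reported by the template, can be sketched as follows. The inputs are the rounded values quoted in this case ($\bar{x}$ = 13.988, $SS_X$ = 177.3712, s = 2.30593) plus a t-table value; the function names are hypothetical.

```python
import math

# Rounded inputs quoted in question 4 (taken from the template output).
b0, b1 = -1.090724, 1.166957
s = 2.30593                 # standard error of the estimate
n, x_bar, ssx = 15, 13.988, 177.3712
t_crit = 2.1604             # t(0.025, 13) from a t table


def prediction_interval(x0):
    """95% prediction interval for one new Y at X = x0."""
    y_hat = b0 + b1 * x0
    half = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ssx)
    return y_hat - half, y_hat + half


def mean_response_interval(x0):
    """95% confidence interval for E[Y | X = x0]: same formula without the leading 1."""
    y_hat = b0 + b1 * x0
    half = t_crit * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ssx)
    return y_hat - half, y_hat + half


print("prediction interval at X = 10:", prediction_interval(10))
print("interval for E[Y|X] at X = 10:", mean_response_interval(10))
```

The leading 1 under the square root accounts for the scatter of a single new observation around the line, which is why the prediction interval (± 5.357) is much wider than the interval for the mean response (± 1.970).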

5. Construct a residual plot. Do the residuals appear random?
A Check for the Equality of Variance of the Errors
One of the assumptions of the regression model is that the errors have equal variance. One of several ways to check this assumption is to examine a plot of the residuals.
The residual plot is constructed as follows.
The residual plot is constructed as follows.

Residual Plot:
[Figure: Residual Plot. Residuals (Error) plotted against X.]

The residuals appear random.


A graph of the regression errors (the residuals) against the independent variable X reveals whether the variance of the errors is constant. The variance of the residuals is indicated by the width of the scatter of the residuals as X increases. If that width either increases or decreases as X increases, the assumption of constant variance is not met. This problem is called heteroscedasticity. When heteroscedasticity exists, ordinary least squares is no longer the best method for estimating the regression (and its usual standard errors are unreliable); a more complex method, such as generalized least squares, should be used instead. The residual plot above shows no sign of heteroscedasticity: the residuals appear random.
6. Construct a normal probability plot. Do the residuals appear to be normally distributed?
The Normal Probability Plot
One of the assumptions in the regression model is that the errors are normally distributed. This assumption
is necessary for calculating prediction intervals and for hypothesis tests about the regression. One of several
ways to test for the normality of the residuals is to use a normal probability plot of the residuals.
The normal probability plot is constructed as follows.

Normal Probability Plot:
[Figure: Normal Probability Plot of Residuals]

The residuals appear to be normally distributed.


In this plot, the residual values are on the horizontal axis and the corresponding z values from the normal distribution are on the vertical axis. If the residuals are normal, they should align along the straight line that appears on the plot. To the extent that the points deviate from this straight line, the residuals deviate from a normal distribution. The plot above shows a case where the residuals are relatively normal, although the pattern of the points also suggests that the distribution of the residuals is flatter than the normal distribution.
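The coordinates of a normal probability plot can be computed directly. This sketch pairs the sorted residuals from the template with standard normal quantiles at plotting positions (i - 0.5)/n; that choice of plotting position is an assumption (the template may use a slightly different formula, which would shift the z values a little without changing the overall picture). The correlation between the two coordinates is printed as a rough numeric summary of how straight the plot is.

```python
from statistics import NormalDist

# Residuals (Error column) from the simple-regression template output.
residuals = [3.44607, 4.13886, 0.79406, -1.2411, -0.7401, -1.9867, 1.50356,
             2.51056, -0.6904, -2.7099, -1.1958, -2.6613, -2.555, -0.0368, 1.42403]


def normal_scores(values):
    """Pair each sorted value with the z score at plotting position (i - 0.5)/n."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(v, std_normal.inv_cdf((i - 0.5) / n))
            for i, v in enumerate(sorted(values), start=1)]


def pearson_r(pairs):
    """Pearson correlation between the two coordinates of the plotted points."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


points = normal_scores(residuals)
for resid, z in points:
    print(f"{resid:8.4f}  {z:7.3f}")
# A value close to 1 means the points lie close to a straight line.
print("correlation of residuals with z scores:", round(pearson_r(points), 3))
```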
7. (Optional) The risk-free rate of return is the rate associated with an investment that has no risk at all, such as lending money to the government. Assume that for the current year the risk-free rate is 6%. According to the CAPM, when the return from the market portfolio is equal to the risk-free rate, the return from every asset must also be equal to the risk-free rate. In other words, if the market portfolio return is 6%, then the stock's return should also be 6%. This implies that the regression line must pass through the point (6, 6). Repeat the regression, imposing this constraint. Comment on the risk based on the new regression equation.
The Excel Solver Method for Regression
The Solver macro available in Excel can also be used to carry out a simple linear regression. The advantage of this method is that additional constraints can be imposed on the slope and the intercept. For instance, if we want the intercept to take a particular value, or if we want to force the regression line through a desired point, we can do so by imposing appropriate constraints.
The given problem is a common type of regression carried out in finance. The risk of a stock (or any capital asset) is measured by regressing its returns against the market return (the average return from all the assets in the market) during the same period. The Capital Asset Pricing Model (CAPM) stipulates that when the market return equals the risk-free interest rate (such as the interest rate on short-term Treasury bills), the stock will also return the same amount. In other words, if the market return equals the risk-free interest rate of 6%, then the stock's return, according to the CAPM, will also be 6%. This means that, according to the CAPM, the regression line must pass through the point (6, 6). This can be imposed as a constraint in the Solver method of regression. Note that forcing a regression line through the origin, (0, 0), is the same as forcing the intercept to equal zero, and forcing the line through the point (0, 5) is the same as forcing the intercept to equal 5. The criterion for the line of best fit in the Solver method is the same as before: minimize the sum of squared errors (SSE).
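The constrained problem that Solver solves numerically also has a closed form: substituting $b_0 = 6 - 6b_1$ turns it into a regression through the origin on the shifted data $(x - 6, y - 6)$. The sketch below assumes the same 15 observations as the Solver worksheet (with 10.55 as the eleventh stock return); `fit_through_point` is a hypothetical helper name, not an Excel feature.

```python
# Sketch: least squares with the fitted line forced through (x0, y0) = (6, 6).
market = [16.02, 12.17, 11.48, 17.62, 20.01, 14, 13.22, 17.79,
          15.46, 8.09, 11, 18.52, 14.05, 8.79, 11.6]
stock = [21.05, 17.25, 13.1, 18.23, 21.52, 13.26, 15.84, 22.18,
         16.26, 5.64, 10.55, 17.86, 12.75, 9.13, 13.87]


def fit_through_point(x, y, x0, y0):
    """Minimize SSE subject to y0 = b0 + b1*x0; return (b0, b1, SSE)."""
    # With b0 = y0 - b1*x0 the model becomes (y - y0) = b1*(x - x0),
    # i.e. a regression through the origin on the shifted data.
    sxy = sum((xi - x0) * (yi - y0) for xi, yi in zip(x, y))
    sxx = sum((xi - x0) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y0 - b1 * x0
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse


b0, b1, sse = fit_through_point(market, stock, 6, 6)
print(f"b0 = {b0:.6f}, b1 = {b1:.6f}, SSE = {sse:.4f}")
```

Using the closed form avoids Solver's iterative search and its tolerance settings, while still minimizing the same SSE criterion.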

Regression Using the Solver
Obs.  X       Y       Error
1     16.02   21.05    3.4513
2     12.17   17.25    4.1079
3     11.48   13.1     0.7566
4     17.62   18.23   -1.2208
5     20.01   21.52   -0.6974
6     14      13.26   -2.0005
7     13.22   15.84    1.4824
8     17.79   22.18    2.5324
9     15.46   16.26   -0.6905
10    8.09    5.64    -2.7793
11    11      10.55   -1.2378
12    18.52   17.86   -2.6326
13    14.05   12.75   -2.5683
14    8.79    9.13    -0.0996
15    11.6    13.87    1.3877
SSE = 69.1435

Intercept b0 = -0.945353     Slope b1 = 1.157559
Constraint (prediction): at X = 6, Y = 6

[Figure: Regression Using the Solver. Data with the constrained fitted line.]

Without any constraint, the regression equation is $\hat{Y} = 1.166957X - 1.090724$ (obtained from the template for regular regression). For a market portfolio return of 6%, the predicted return of the stock is
$$\hat{Y} = 1.166957(6) - 1.090724 \approx 5.9110$$
With the constraint, the regression equation changes as follows:
The least-squares estimator $b_0$, which estimates the intercept $\beta_0$ of the model, is -0.945353214.
The least-squares estimator $b_1$, which estimates the slope $\beta_1$ of the model, is 1.157558869.
Regression equation: $\hat{Y} = 1.157558869X - 0.945353214$
Even though the estimated risk of the constrained model ($b_1$ = 1.157558869) is lower than that of the original unconstrained model ($b_1$ = 1.166957958), the slope is still greater than 1.1, so it still indicates that the stock has above-average risk.

CASE 02: SAIGON COOPMART


Logistics & Supply Chain plays an important role in, and arguably is a critical factor for, the success of Saigon Coopmart. Most supermarkets around the world follow the same model, in which a warehouse is placed next to the supermarket for stock storage and the warehouse is roughly the same size as the supermarket. However, because of fierce competition and limited finances, Saigon Coopmart decided to follow a different model with a very small warehouse. This allows Saigon Coopmart to open more supermarkets, but in exchange its warehouses hold only enough stock for one day, or at most two, compared with the ordinary model, in which a warehouse can store enough stock for a week or more. As a consequence, Saigon Coopmart has to ship to its supermarkets much more frequently than competitors such as Big C.
As Saigon Coopmart gained customers' trust over the years, sales increased gradually. In late 2011, the Logistics and Supply Chain department received warnings from some directors of Coopmart supermarkets (Saigon Coopmart has many supermarkets, each supervised by one director) that, by the end of 2012, the warehouses would probably no longer hold enough stock for one day's sales. This means that the supermarkets would not have enough products to sell to customers. A logistics improvement project was conducted to solve the problem temporarily and buy time for the BOD of Saigon Coopmart to come up with a new and complete solution. One of the sub-projects involved improving the unloading time (that is, when trucks carrying products arrive at a supermarket, the time taken to unload the products and move them to the warehouse).
a. The indicator UWPM (Unloading Weight per Minute) is used to measure the effectiveness of unloading product management. From the information provided (Excel data file, sheet Case 01 (a-b) Coopmart), what can you tell about unloading product management among the four Coopmart supermarkets? (Important note: every statistical test used must be accompanied by an explanation of why that test is appropriate.)
ANOVA Test
The required assumptions of ANOVA:
1. We assume independent random sampling from each of the r populations (here r = 4).
2. We assume that the r populations under study are normally distributed, with means that may or may not be equal, but with equal variances $\sigma^2$.
The null and alternative hypotheses here are
$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_1$: not all $\mu_i$ ($i = 1, \dots, 4$) are equal
where $\mu_i$ is the population mean UWPM of supermarket $i$.
ANOVA Table (dependent variable: UWPM)
Source           Sum of Squares   df    Mean Square   F         Sig.
Between Groups   205089.155       3     68363.052     141.665   .000
Within Groups    229702.540       476   482.568
Total            434791.695       479

As shown in the table above, the p-value is smaller than 0.05, so we reject the null hypothesis. Based on the test results and our assumptions, we may conclude that the four supermarkets studied are probably not equal in terms of average UWPM. Which supermarkets are more effective than others? This question is answered by the pairwise comparisons that follow.
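The F statistic in the ANOVA table follows directly from the sums of squares and degrees of freedom. In this sketch the total of 480 observations is inferred from the total df of 479, and the between-groups df of 3 from the four supermarkets.

```python
# Sketch: one-way ANOVA F statistic from the sums of squares in the table above.
ss_between, ss_within = 205089.155, 229702.540
r, n_total = 4, 480            # four supermarkets; df total = 479 implies 480 observations

df_between = r - 1             # 3
df_within = n_total - r        # 476
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within
print(df_between, df_within, round(ms_between, 3), round(ms_within, 3), round(f_stat, 3))
```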
The method we will discuss here is the Tukey method of pairwise comparisons of the population means. The method is also called the HSD (honestly significant differences) test. This method allows us to compare every possible pair of means by using a single level of significance, say $\alpha$ = 0.05 (or a single confidence coefficient, say, 1 - 0.05 = 0.95). The single level of significance applies to the entire set of pairwise comparisons.
To compare the population mean UWPM for every pair of supermarkets, we use the following set of hypothesis tests, where $\mu_i$ is the population mean UWPM of supermarket $i$:
$H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$
$H_0: \mu_1 = \mu_3$ vs. $H_1: \mu_1 \neq \mu_3$
$H_0: \mu_1 = \mu_4$ vs. $H_1: \mu_1 \neq \mu_4$
$H_0: \mu_2 = \mu_3$ vs. $H_1: \mu_2 \neq \mu_3$
$H_0: \mu_2 = \mu_4$ vs. $H_1: \mu_2 \neq \mu_4$
$H_0: \mu_3 = \mu_4$ vs. $H_1: \mu_3 \neq \mu_4$
From these comparisons, our data provide statistical evidence that the means differ for five of the six pairs of supermarkets; the remaining pair shows no statistically significant difference at $\alpha$ = 0.05.

b. Further investigation shows that measuring effectiveness by the mean value alone is not enough, because two or more supermarkets might have the same mean (weight per minute) but different variances. In that case, the one with the smaller variance is better. Construct the matrix of hypothesis tests for two population variances as follows:

From the results in that matrix and the results in question a, what is your conclusion?
The F Distribution and a Test for Equality of Two Population Variances
We assume independent random sampling from the four populations in question. We also assume that the
four populations are normally distributed. The possible hypotheses to be tested are the following:
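Comparison between two population variances matrix
(1 = Cong Quynh, 2 = Dinh Tien Hoang, 3 = Ly Thuong Kiet, 4 = Hung Vuong)
Cong Quynh vs. Dinh Tien Hoang: $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$
Cong Quynh vs. Ly Thuong Kiet: $H_0: \sigma_1^2 = \sigma_3^2$ vs. $H_1: \sigma_1^2 \neq \sigma_3^2$
Cong Quynh vs. Hung Vuong: $H_0: \sigma_1^2 = \sigma_4^2$ vs. $H_1: \sigma_1^2 \neq \sigma_4^2$
Dinh Tien Hoang vs. Ly Thuong Kiet: $H_0: \sigma_2^2 = \sigma_3^2$ vs. $H_1: \sigma_2^2 \neq \sigma_3^2$
Dinh Tien Hoang vs. Hung Vuong: $H_0: \sigma_2^2 = \sigma_4^2$ vs. $H_1: \sigma_2^2 \neq \sigma_4^2$
Ly Thuong Kiet vs. Hung Vuong: $H_0: \sigma_3^2 = \sigma_4^2$ vs. $H_1: \sigma_3^2 \neq \sigma_4^2$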

(I) Coopmart               (J) Coopmart               Test Statistic   Critical   Sig.
Coopmart Cong Quynh        Coopmart Dinh Tien Hoang   1.93226          1.43485    .0003
Coopmart Cong Quynh        Coopmart Ly Thuong Kiet    5.19545          1.43485    .0000
Coopmart Cong Quynh        Coopmart Hung Vuong        1.02779          1.43485    .8814
Coopmart Dinh Tien Hoang   Coopmart Cong Quynh        1.93226          1.43485    .0003
Coopmart Dinh Tien Hoang   Coopmart Ly Thuong Kiet    2.64625          1.43485    .0000
Coopmart Dinh Tien Hoang   Coopmart Hung Vuong        1.91024          1.43485    .0005
Coopmart Ly Thuong Kiet    Coopmart Cong Quynh        5.19545          1.43485    .0000
Coopmart Ly Thuong Kiet    Coopmart Dinh Tien Hoang   2.64625          1.43485    .0000
Coopmart Ly Thuong Kiet    Coopmart Hung Vuong        5.05498          1.43485    .0000
Coopmart Hung Vuong        Coopmart Cong Quynh        1.02779          1.43485    .8814
Coopmart Hung Vuong        Coopmart Dinh Tien Hoang   1.91024          1.43485    .0005
Coopmart Hung Vuong        Coopmart Ly Thuong Kiet    5.05498          1.43485    .0000

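From these comparisons, our data provide statistical evidence to conclude that the variance of UWPM at Cong Quynh is different from that at Dinh Tien Hoang; Cong Quynh is different from Ly Thuong Kiet; Dinh Tien Hoang is different from Ly Thuong Kiet; Dinh Tien Hoang is different from Hung Vuong; and Ly Thuong Kiet is different from Hung Vuong. There are no other statistically significant differences at $\alpha$ = 0.05 (Cong Quynh vs. Hung Vuong: Sig. = .8814).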
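The decision rule behind the variance matrix can be sketched as follows. It assumes that each reported statistic is the larger sample variance divided by the smaller one and that 1.43485 is the two-tailed critical value at $\alpha$ = 0.05 (presumably the upper 2.5% point of F with 119 and 119 df, consistent with 120 observations per store); neither assumption is stated explicitly in the output. The helper names are hypothetical.

```python
# Sketch: F test for equality of two population variances (two-tailed, alpha = 0.05).
F_CRIT = 1.43485  # critical value reported in the matrix above


def variance_ratio(s_a, s_b):
    """F statistic as the larger sample variance over the smaller one."""
    var_a, var_b = s_a ** 2, s_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)


def variances_differ(f_stat, f_crit=F_CRIT):
    """Reject H0 (equal variances) when the ratio exceeds the critical value."""
    return f_stat > f_crit


# Test statistics as reported in the matrix (each pair listed once).
reported = {
    ("Cong Quynh", "Dinh Tien Hoang"): 1.93226,
    ("Cong Quynh", "Ly Thuong Kiet"): 5.19545,
    ("Cong Quynh", "Hung Vuong"): 1.02779,
    ("Dinh Tien Hoang", "Ly Thuong Kiet"): 2.64625,
    ("Dinh Tien Hoang", "Hung Vuong"): 1.91024,
    ("Ly Thuong Kiet", "Hung Vuong"): 5.05498,
}

for (a, b), f in reported.items():
    verdict = "variances differ" if variances_differ(f) else "no significant difference"
    print(f"{a} vs {b}: F = {f:.5f} -> {verdict}")
```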

c. (Data for question c are in sheet Case 01 (c) Coopmart.) To improve unloading product management, the indicator unloading weight per minute (UWPM) is selected, so a higher UWPM is better. To improve UWPM, the project manager needs to know which factors affect UWPM. A sample of 240 unloading operations was recorded. UWPM is suspected to be closely related to two key factors: the first is the number of workers, and the second is years of experience.
For the first factor, because the total weight unloaded differs from one operation to another, an appropriate indicator is the number of workers involved divided by the total weight (WIPW, labeled WIPM in the output below). For example, if 3,400 kg of products need to be unloaded and 7 workers are involved, then WIPW = 7/3,400 ≈ 0.002059.
For the second factor, the average number of years of experience of the group of workers (AvgYr) is used as the indicator.
Construct a regression (Reg 1) in which UWPM is the dependent variable and WIPW and AvgYr are the independent variables. What information can the project manager draw from this regression (Reg 1)?
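The WIPW indicator is a simple ratio. This sketch uses a hypothetical helper `wipw` and reproduces the worked example of 7 workers unloading 3,400 kg.

```python
def wipw(workers, total_weight_kg):
    """Workers involved per kilogram unloaded (WIPW); hypothetical helper name."""
    if total_weight_kg <= 0:
        raise ValueError("total weight must be positive")
    return workers / total_weight_kg


# Worked example from the case: 7 workers unloading 3,400 kg.
print(f"{wipw(7, 3400):.6f}")
```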

Descriptive Statistics
         Mean      Std. Deviation   N
UWPM     122.117   44.2590          240
WIPM     .003960   .0017123         240
AvgYr    3.6669    1.64448          240

The multiple regression model in which UWPM is the dependent variable and WIPW ($X_1$) and AvgYr ($X_2$) are the independent variables is given by
$$Y = 11.299 + 16{,}886.185X_1 + 11.985X_2 + \varepsilon$$
The estimated regression relationship is:
$$\hat{Y} = 11.299 + 16{,}886.185X_1 + 11.985X_2$$
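A minimal sketch of the fitted Reg 1 equation as a prediction function, using the unstandardized coefficients from the regression output; `predict_uwpm` is a hypothetical name. Evaluating it at the sample means of WIPM and AvgYr should return the sample mean of UWPM (up to rounding), because a least-squares fit with an intercept passes through the point of means.

```python
# Coefficients of Reg 1 (unstandardized B) from the regression output.
B0, B_WIPW, B_AVGYR = 11.299, 16886.185, 11.985


def predict_uwpm(wipw, avg_years):
    """Fitted UWPM for a given workers-per-kg ratio and average years of experience."""
    return B0 + B_WIPW * wipw + B_AVGYR * avg_years


# At the sample means the fitted value should be close to the mean UWPM (122.117).
print(round(predict_uwpm(0.003960, 3.6669), 3))
# Holding WIPW fixed, one more year of average experience adds B_AVGYR to the prediction.
print(round(predict_uwpm(0.004, 4) - predict_uwpm(0.004, 3), 3))
```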

F-Test
Is there a relationship between the dependent variable UWPM and any of the independent (explanatory) variables $X_1$ and $X_2$ (WIPM and AvgYr) suggested by the regression equation under consideration?

A statistical hypothesis test for the existence of a linear relationship between $Y$ and any of $X_1$ and $X_2$ is:
$H_0: \beta_1 = \beta_2 = 0$
$H_1: \beta_i \neq 0$ for at least one $i$ ($i = 1, 2$)
ANOVA (a)
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   278085.484       2     139042.742    173.363   .000 (b)
Residual     190081.189       237   802.030
Total        468166.673       239
a. Dependent Variable: UWPM
b. Predictors: (Constant), AvgYr, WIPM

As shown in the table above, the p-value is small, so we reject the null hypothesis that both slope parameters $\beta_1$ and $\beta_2$ are zero, in favor of the alternative that they are not both zero. Based on the test results and our assumptions, there is statistical evidence that a linear regression relationship exists between UWPM and at least one of the independent variables, WIPM or AvgYr (or both), proposed in the regression model.

Model Summary (b)
Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .771 (a)   .594       .591                28.3201                      1.978
a. Predictors: (Constant), AvgYr, WIPM
b. Dependent Variable: UWPM

In the table above, $R^2$ = 0.594, which means that 59.4% of the variation in UWPM is explained by the combination of the two independent variables, WIPM and AvgYr. The adjusted $R^2$ is 0.591, very close to the unadjusted measure. We conclude that the regression model fits the data reasonably well, since a substantial percentage of the variation in UWPM is explained by WIPM and AvgYr.
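The F statistic, $R^2$, adjusted $R^2$, and standard error of the estimate can all be recomputed from the two sums of squares in the ANOVA table. This sketch uses n = 240 observations and k = 2 predictors.

```python
# Sketch: overall F test and fit statistics for Reg 1 from the ANOVA sums of squares.
ss_reg, ss_res = 278085.484, 190081.189
n, k = 240, 2                            # observations and predictors (WIPM, AvgYr)

ss_total = ss_reg + ss_res               # 468166.673
ms_reg = ss_reg / k                      # df = k = 2
ms_res = ss_res / (n - k - 1)            # df = n - k - 1 = 237
f_stat = ms_reg / ms_res
r2 = ss_reg / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
se_estimate = ms_res ** 0.5              # standard error of the estimate

print(f"F = {f_stat:.3f}, R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}, s = {se_estimate:.4f}")
```

Adjusted $R^2$ penalizes each added predictor through the factor (n - 1)/(n - k - 1); with n = 240 and only two predictors that penalty is tiny, which is why 0.591 is so close to 0.594.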

Coefficients
Hypothesis tests about individual regression slope parameters:
(1) $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$
(2) $H_0: \beta_2 = 0$ vs. $H_1: \beta_2 \neq 0$

Model 1      B           Std. Error   Beta   t        Sig.   Tolerance   VIF
(Constant)   11.299      6.319               1.788    .075
WIPM         16886.185   1071.371     .653   15.761   .000   .997        1.003
AvgYr        11.985      1.116        .445   10.744   .000   .997        1.003
(B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient; Tolerance and VIF are collinearity statistics.)

We start with the test for the significance of $X_1$ (WIPM) as a predictor variable. The hypothesis test is $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \neq 0$. As shown in the table above, the p-value is small, so we reject the null hypothesis that the slope parameter $\beta_1$ is zero. We therefore conclude that there is statistical evidence that the slope of $Y$ with respect to $X_1$, the population parameter $\beta_1$, is not zero. WIPM is shown to have some explanatory power with respect to the dependent variable, UWPM.
The hypothesis test for $\beta_2$ is $H_0: \beta_2 = 0$ versus $H_1: \beta_2 \neq 0$. This p-value, too, is small. We conclude that $X_2$ (AvgYr) is also an important variable in the regression equation.
Finally, we conclude that both independent variables, WIPM and AvgYr, are significantly related to the dependent variable, UWPM, and both affect it positively. Both slope estimates are positive, which means that, everything else staying constant, UWPM increases on average as WIPM increases or AvgYr increases (or both).
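Each individual t statistic is the coefficient divided by its standard error. This sketch recomputes them from the rounded values in the Coefficients table; the critical value 1.970 (two-tailed, $\alpha$ = 0.05, 237 df) is an approximate table value, and the AvgYr statistic comes out near 10.739 rather than 10.744 because its standard error is printed rounded.

```python
# Unstandardized coefficients (B) and standard errors from the Coefficients table.
coefficients = {
    "(Constant)": (11.299, 6.319),
    "WIPM": (16886.185, 1071.371),
    "AvgYr": (11.985, 1.116),
}
T_CRIT = 1.970   # approximate two-tailed 5% critical value of t with 237 df

t_stats = {name: b / se for name, (b, se) in coefficients.items()}
for name, t in t_stats.items():
    decision = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(f"{name:<10} t = {t:7.3f} -> {decision}")
```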

Residual Plots

The above figure is a plot of the regression residuals against the dependent variable UWPM. When we examine this figure carefully, we see that the spread of the residuals increases as UWPM increases. Thus, the variance of the residuals is not constant. We have the situation called heteroscedasticity, a violation of the assumption of equal error variance.

The Normal Probability Plot

The above figure is the normal probability plot of the residuals. The more closely the residuals lie along the diagonal line in the plot, the less they deviate from the normal distribution. In the figure, the deviations do not appear to be significant, so we conclude that the model assumption that the population errors are normally distributed with mean zero and standard deviation $\sigma$ is valid.
Multicollinearity
Correlations
                              UWPM    WIPM    AvgYr
Pearson Correlation   UWPM    1.000   .629    .410
                      WIPM    .629    1.000   -.053
                      AvgYr   .410    -.053   1.000
Sig. (1-tailed)       UWPM            .000    .000
                      WIPM    .000            .205
                      AvgYr   .000    .205
N                     UWPM    240     240     240
                      WIPM    240     240     240
                      AvgYr   240     240     240

In the correlation matrix above, the correlation between the independent variables WIPM and AvgYr is low (-0.053). This means that the two variables do not represent the same direction in space. Because they are only weakly correlated, the two variables do not contain the same information about the dependent variable and therefore do not cause multicollinearity when both are in the regression equation.
Variance inflation factor
Collinearity Statistics
Model 1      Tolerance   VIF
(Constant)
WIPM         .997        1.003
AvgYr        .997        1.003

The table above shows the collinearity output for the current regression problem, with the VIF values in the last column. The VIFs for WIPM and AvgYr (1.003) are far below 5, so they do not indicate any meaningful multicollinearity between the independent variables.
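With only two predictors, each predictor's tolerance is $1 - r_{12}^2$, where $r_{12}$ is the correlation between them, and the VIF is the reciprocal of the tolerance. A minimal sketch (`vif_two_predictors` is a hypothetical name) reproduces the values in the table from the correlation of -0.053:

```python
def vif_two_predictors(r12):
    """Tolerance and VIF when the model has exactly two predictors with correlation r12."""
    tolerance = 1 - r12 ** 2     # 1 - R^2 from regressing one predictor on the other
    return tolerance, 1 / tolerance


tol, vif = vif_two_predictors(-0.053)   # correlation between WIPM and AvgYr
print(f"tolerance = {tol:.3f}, VIF = {vif:.3f}")
```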
CASE 03: TON DUC THANG UNIVERSITY CONTINUOUS IMPROVEMENT IN EDUCATION PROGRAM
Continuous improvement of education programs is always one of the top strategic priorities of Ton Duc Thang University. In every period, TDT University applies new teaching methods to continuously improve its education programs. Recently, it has been suspected that students perform better in the experiment classes (classes taught with the new teaching method) than in the control classes (classes taught with the old teaching method).
a. Present the methodology for testing this suspicion (what is your argument, which statistical test is appropriate, and why);
b. How do you conduct sampling for the statistical test;
c. Present the results of your statistical test;
d. What is your conclusion from the statistical test?
Data
Student  Ex. Test 1  Ex. Test 2  Co. Test 1  Co. Test 2
1        63          84          88          71
2        71          89          59          91
3        87          70          85          79
4        66          73          64          79
5        63          74          95          91
6        70          97          92          89
7        63          89          71          85
8        84          80          58          90
9        84          86          62          69
10       63          77          93          76
11       62          74          80          80
12       68          75          89          85
13       84          98          65          93
14       90          75          67          61
15       86          96          76          84
16       69          76          81          77
17       87          89          85          75
18       60          74          85          83
19       64          81          87          68
20       67          86          86          91
21       64          72          86          92
22       86          69          85          97
23       88          94          77          60
24       67          89          85          61
25       66          73          90          84
26       83          80          72          93
27       89          94          70          92
28       68          66          60          79
29       81          87          60          67
30       71          76          74          63
31       62          82          83          91
32       77          77          80          63
33       87          69          94          79
34       63          76          53          92
35       73          95          70          58
36       90          72          90          86
37       84          75          57          74
38       64          77          82          82
39       85          98          65          66
40       86          78          54          83
41       66          86          68          74
42       83          70          73          66
43       61          90          71          97
44       81          70          86          93
45       60          85          90          80
46       72          83          80          75
47       60          90          55          70
48       87          68          81          96
49       65          78          94          82
50       71          81          95          78
51       74          69          60          63
52       60          78          90          98
53       90          85          66          61
54       68          84          74          90
55       67          90          83          74
56       77          95          77          77
57       79          67          53          93
58       64          82          80          98
59       90          92          67          61
60       67          90          60          95
61       84          93          75          89
62       90          87          85          61
63       63          90          78          74
64       69          72          54          71
65       81          66          73          97
66       67          80          71          93
67       63          74          92          61
68       62          78          88          89
69       90          74          78          69
70       86          80          57          72
71       83          96          82          73
72       80          72          92          59
73       89          89          61          84
74       69          69          81          70
75       64          72          73          86
76       90          83          85          61
77       77          85          53          95
78       86          75          54          93
79       60          75          85          92
80       84          94          78          84
81       66          77          73          70
82       85          71          91          65
83       86          86          91          98
84       83          71          72          90
85       75          87          67          91
86       67          87          77          98
87       88          70          94          98
88       65          80          68          62
89       82          74          80          90
90       89          66          94          84

Descriptive Statistics
                            N     Mean      Std. Deviation
Test 1 (Experiment Class)   120   75.8000   10.01981
Test 2 (Experiment Class)   120   81.6833   8.78882
Test 1 (Control Class)      120   75.6167   12.51767
Test 2 (Control Class)      120   79.1833   11.91777
Valid N (listwise)          120

Paired Sample Differences (Ex. - Co.)
Student  Test 1 (Ex. - Co.)  Test 2 (Ex. - Co.)
1        -25                 13
2        12                  -2
3        2                   -9
4        2                   -6
5        -32                 -17
6        -22                 8
7        -8                  4
8        26                  -10
9        22                  17
10       -30                 1
11       -18                 -6
12       -21                 -10
13       19                  5
14       23                  14
15       10                  12
16       -12                 -1
17       2                   14
18       -25                 -9
19       -23                 13
20       -19                 -5
21       -22                 -20
22       1                   -28
23       11                  34
24       -18                 28
25       -24                 -11
26       11                  -13
27       19                  2
28       8                   -13
29       21                  20
30       -3                  13
31       -21                 -9
32       -3                  14
33       -7                  -10
34       10                  -16
35       3                   37
36       0                   -14
37       27                  1
38       -18                 -5
39       20                  32
40       32                  -5
41       -2                  12
42       10                  4
43       -10                 -7
44       -5                  -23
45       -30                 5
46       -8                  8
47       5                   20
48       6                   -28
49       -29                 -4
50       -24                 3
51       14                  6
52       -30                 -20
53       24                  24
54       -6                  -6
55       -16                 16
56       0                   18
57       26                  -26
58       -16                 -16
59       23                  31
60       7                   -5

Test 1

Test 2

Test 1

Test 2

Students

Ex. - Co.

Ex. - Co.

Students

Ex. - Co.

Ex. - Co.

61

91

16

62

26

92

63

-15

16

93

19

12

64

15

94

25

-7

65

-31

95

-6

-13

66

-4

-13

96

-12

22

67

-29

13

97

32

68

-26

-11

98

-4

69

12

99

15

14

70

29

100

13

71

23

101

14

-11

72

-12

13

102

-19

73

28

103

10

74

-12

-1

104

-1

22

75

-9

-14

105

-24

17

76

22

106

16

77

24

-10

107

-6

22

78

32

-18

108

30

17

79

-25

-17

109

32

14

80

10

110

-5

22

81

-7

111

-6

82

-6

112

-28

83

-5

-12

113

29

84

11

-19

114

-7

85

-4

115

16

17

86

-10

-11

116

29

15

87

-6

-28

117

-10

-4

88

-3

18

118

-4

89

-16

119

-9

90

-5

-18

120

-3

30

Descriptive Statistics
                     N     Mean     Std. Deviation
Test 1 (Ex.-Co.)     120   .1833    16.85728
Test 2 (Ex.-Co.)     120   2.5000   15.53797
Valid N (listwise)   120

For each test (Test 1 and Test 2), the hypothesis test involves two populations: the population of students who study in the experiment class and the population of students who study in the control class. We want to test the null hypothesis that the mean test score is equal in both populations against the alternative hypothesis that the mean for the experiment-class students is greater. Using the same students for the tests and pairing their observations in an experiment-and-control (Ex.-Co.) way makes the test more precise than it would be without pairing.
Under these circumstances, the variable of interest is the difference between the test score of the students who study in the experiment class and that of the students who study in the control class. The population parameter about which we want to draw an inference is the mean difference between the two populations.
For Test 1, we denote the population parameter, the mean difference, by $\mu_{D,1}$. It is defined as $\mu_{D,1} = \mu_{Ex,1} - \mu_{Co,1}$, where $\mu_{Ex,1}$ is the average Test 1 score of the students who study in the experiment class and $\mu_{Co,1}$ is the average Test 1 score of the students who study in the control class. Our null and alternative hypotheses are, then,
$H_0: \mu_{D,1} \le 0$
$H_1: \mu_{D,1} > 0$
For Test 2, we denote the population parameter, the mean difference, by $\mu_{D,2}$. It is defined as $\mu_{D,2} = \mu_{Ex,2} - \mu_{Co,2}$, where $\mu_{Ex,2}$ is the average Test 2 score of the students who study in the experiment class and $\mu_{Co,2}$ is the average Test 2 score of the students who study in the control class. Our null and alternative hypotheses are, then,
$H_0: \mu_{D,2} \le 0$
$H_1: \mu_{D,2} > 0$
The only assumption we make when we use this test is that the populations of differences are normally distributed.

Paired Samples Test (paired differences)
Pair 1: Test 1 (Experiment Class) - Test 1 (Control Class)
  Mean .18333; Std. Deviation 16.85728; Std. Error Mean 1.53885; 95% Confidence Interval of the Difference [-2.86375, 3.23041]; t .119; df 119; Sig. (2-tailed) .905; Sig. (R-tailed) .453
Pair 2: Test 2 (Experiment Class) - Test 2 (Control Class)
  Mean 2.50000; Std. Deviation 15.53797; Std. Error Mean 1.41842; 95% Confidence Interval of the Difference [-.30861, 5.30861]; t 1.763; df 119; Sig. (2-tailed) .081; Sig. (R-tailed) .040
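The paired t statistics in the table can be reproduced from the mean and standard deviation of the differences. This sketch assumes n = 120 pairs and an approximate one-tailed critical value of 1.658 for 119 df taken from a t table; `paired_t` is a hypothetical helper.

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """Return (standard error, t statistic) for a paired-difference t test."""
    se = sd_diff / math.sqrt(n)
    return se, mean_diff / se


T_CRIT_RIGHT = 1.658  # approximate upper 5% point of t with 119 df

results = {}
for label, mean_d, sd_d in [("Test 1", 0.18333, 16.85728), ("Test 2", 2.5, 15.53797)]:
    se, t = paired_t(mean_d, sd_d, 120)
    results[label] = (se, t)
    decision = "reject H0" if t > T_CRIT_RIGHT else "do not reject H0"
    print(f"{label}: SE = {se:.5f}, t = {t:.3f} -> {decision} (right-tailed, alpha = 0.05)")
```

For a right-tailed test with a positive t, the one-tailed p-value is half the two-tailed one, which is how .905 becomes .453 and .081 becomes .040 in the table.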

As shown in the table above, for Test 1 (Pair 1), the right-tailed p-value (.453) is larger than any usual level of significance, even 0.10, so we cannot reject the null hypothesis: there is no evidence that the Test 1 scores of students in the experiment class are higher than those of students in the control class.
However, for Test 2 (Pair 2), the right-tailed p-value (.040) is smaller than 0.05, so we reject the null hypothesis and conclude that the Test 2 scores of students in the experiment class are higher than those of students in the control class. The result is not strongly significant, however: it would not hold at a stricter level such as $\alpha$ = 0.01.

------ THE END ------
