According to the Capital Asset Pricing Model (CAPM), the risk associated with a capital asset is proportional to the slope $\beta_1$ (or simply $\beta$) obtained by regressing the asset's past returns on the corresponding returns of the average portfolio, called the market portfolio. (The return of the market portfolio represents the return earned by the average investor. It is a weighted average of the returns from all the assets in the market.) The larger the slope $\beta$ of an asset, the larger the risk associated with that asset. A $\beta$ of 1.00 represents average risk.
The returns from an electronics firm's stock and the corresponding returns for the market portfolio for the past 15 years are given below.
Year   Market Return (%)   Stock's Return (%)
 1     16.02               21.05
 2     12.17               17.25
 3     11.48               13.10
 4     17.62               18.23
 5     20.01               21.52
 6     14.00               13.26
 7     13.22               15.84
 8     17.79               22.18
 9     15.46               16.26
10      8.09                5.64
11     11.00               10.55
12     18.52               17.86
13     14.05               12.75
14      8.79                9.13
15     11.60               13.87
1. Carry out the regression and find the $\beta$ for the stock. What is the regression equation?
Independent variable ($X$): Market Return
Dependent variable ($Y$): Stock's Return
Least-squares estimator $b_0$, which estimates the intercept $\beta_0$ of the model, is -1.090724
Least-squares estimator $b_1$, which estimates the slope $\beta_1$ of the model, is 1.166957
Regression equation: $\hat{y} = 1.166957x - 1.090724$
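The estimates above can be reproduced with the textbook least-squares formulas. The following is a minimal sketch, not the template the report used; the function name `least_squares` is an illustrative choice, and the data are the 15 pairs as they appear in the regression template (year 11 stock return taken as 10.55).

```python
# Market and stock returns (%) for the 15 years, as listed in the regression template.
market = [16.02, 12.17, 11.48, 17.62, 20.01, 14, 13.22, 17.79,
          15.46, 8.09, 11, 18.52, 14.05, 8.79, 11.6]
stock = [21.05, 17.25, 13.1, 18.23, 21.52, 13.26, 15.84, 22.18,
         16.26, 5.64, 10.55, 17.86, 12.75, 9.13, 13.87]


def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)                        # SS_X
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # SS_XY
    b1 = ss_xy / ss_x          # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1


b0, b1 = least_squares(market, stock)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

The slope returned here is the stock's estimated $\beta$; the intercept is negative, matching the fitted line reported for the scatter plot.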
[Figure: scatter plot of the stock's return (%) against the market return (%), with fitted line y = 1.167x - 1.0907]

Simple Regression (template output)
Slope b1 = 1.16696, 95% confidence interval half-width (+ or -) 0.37405
t = 6.73986, p-value = 0.0000
Residuals for observations 2 to 15: 4.13886, 0.79406, -1.2411, -0.7401, -1.9867, 1.50356, 2.51056, -0.6904, -2.7099, -1.1958, -2.6613, -2.555, -0.0368, 1.42403

ANOVA
Source       df   MS        F         F critical   p-value
Regression    1   241.543   45.4257   4.66719      0.0000
Error        13   5.3173
Total        14
2. State your interpretation about the slope $\beta_1$ of the model (Hint: Does the value of the slope indicate that the stock has above-average risk? For the purposes of this case assume that the risk is average if the slope is in the range 1 ± 0.1, below average if it is less than 0.9, and above average if it is more than 1.1.)
Since the least-squares estimator $b_1$, which estimates the slope $\beta_1$ of the model, is 1.166957 (> 1.10), the value of the slope indicates that the stock has above-average risk.
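3. Give a 95% confidence interval for this $\beta$. Can we say the risk is above average with 95% confidence?
Confidence Intervals for the Regression Parameters:
A $(1 - \alpha)100\%$ confidence interval for $\beta_1$ is: $b_1 \pm t_{(\alpha/2,\,n-2)}\, s(b_1)$
A $(1 - \alpha)100\%$ prediction interval for $Y$ at a given $X$ (here $X = 10$, with 95% confidence) is:
$\hat{y} \pm t_{(\alpha/2,\,n-2)}\, s(\varepsilon)\sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}} = 10.57884 \pm (2.16)(2.3059)\sqrt{1 + \frac{1}{15} + \frac{(10 - 13.988)^2}{177.3712}} = [5.2219, 15.9358]$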
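As a rough check of the two interval formulas, the sketch below plugs in the figures reported by the regression template (slope 1.16696, t statistic 6.73986, standard error of estimate 2.3059). The standard error of the slope is not printed in the output, so it is recovered here from t = b1 / s(b1); that step, the rounded critical value 2.16, and all variable names are assumptions of this sketch.

```python
import math

b1 = 1.16696      # slope estimate from the template output
t_stat = 6.73986  # t statistic of the slope from the template output
t_crit = 2.16     # t(0.025, 13 df), the rounded value used in the text

# Standard error of the slope, recovered from t = b1 / s(b1) (assumption of this sketch)
s_b1 = b1 / t_stat
half_width = t_crit * s_b1
slope_ci = (b1 - half_width, b1 + half_width)

# 95% prediction interval for the stock's return when the market return is 10%
y_hat, s_e = 10.57884, 2.3059
n, x0, x_bar, ss_x = 15, 10, 13.988, 177.3712
pred_half = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
pred_interval = (y_hat - pred_half, y_hat + pred_half)

print("slope CI:", slope_ci)
print("prediction interval:", pred_interval)
```

Under these inputs the lower limit of the slope interval falls below 1.1, which suggests that above-average risk cannot be claimed with 95% confidence even though the point estimate exceeds 1.1.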
[Figure: Residual Plot, with Error on the vertical axis and X on the horizontal axis]
constraint in the Solver method of regression. Note that forcing a regression line through the origin, (0, 0), is
the same as forcing the intercept to equal zero, and forcing the line through the point (0, 5) is the same as
forcing the intercept to equal 5. The criterion for the line of best fit by the Solver method is still the same as
before: minimize the sum of squared errors (SSE).
Obs   X       Y       Error
 1    16.02   21.05    3.4513
 2    12.17   17.25    4.1079
 3    11.48   13.10    0.7566
 4    17.62   18.23   -1.2208
 5    20.01   21.52   -0.6974
 6    14.00   13.26   -2.0005
 7    13.22   15.84    1.4824
 8    17.79   22.18    2.5324
 9    15.46   16.26   -0.6905
10     8.09    5.64   -2.7793
11    11.00   10.55   -1.2378
12    18.52   17.86   -2.6326
13    14.05   12.75   -2.5683
14     8.79    9.13   -0.0996
15    11.60   13.87    1.3877

SSE = 69.1435
Prediction: X = 6, Y = 6

[Figure: scatter plot of Y against X with the constrained regression line]
Without any constraint, the regression equation is $\hat{y} = 1.166957x - 1.090724$ (obtained from the template for regular regression). For a market portfolio return of 6%, the predicted return of the stock is
$\hat{y} = 1.166957x - 1.090724 = 1.166957(6) - 1.090724 \approx 5.911$
With the constraint, the regression equation changes as follows:
Least-squares estimator $b_0$, which estimates the intercept $\beta_0$ of the model, is -0.945353214
Least-squares estimator $b_1$, which estimates the slope $\beta_1$ of the model, is 1.157558869
Regression equation: $\hat{y} = 1.157558869x - 0.945353214$
Even though the slope of the regression model with the constraint ($b_1$ = 1.157558869) is lower than the slope of the original regression model without any constraint ($b_1$ = 1.166957958), the value of the slope still indicates that the stock has above-average risk.
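The constrained estimates can also be obtained in closed form rather than through Solver. The sketch below assumes the constraint is that the fitted line must pass through the point (6, 6), which is what the Prediction cells of the template (X = 6, Y = 6) suggest; the function name `fit_through_point` is hypothetical. Minimizing the SSE subject to that constraint gives a slope equal to the sum of (x - 6)(y - 6) divided by the sum of (x - 6)^2.

```python
market = [16.02, 12.17, 11.48, 17.62, 20.01, 14, 13.22, 17.79,
          15.46, 8.09, 11, 18.52, 14.05, 8.79, 11.6]
stock = [21.05, 17.25, 13.1, 18.23, 21.52, 13.26, 15.84, 22.18,
         16.26, 5.64, 10.55, 17.86, 12.75, 9.13, 13.87]


def fit_through_point(x, y, x0, y0):
    """Least-squares line constrained to pass through the point (x0, y0)."""
    num = sum((xi - x0) * (yi - y0) for xi, yi in zip(x, y))
    den = sum((xi - x0) ** 2 for xi in x)
    b1 = num / den        # slope that minimizes SSE under the constraint
    b0 = y0 - b1 * x0     # intercept implied by the constraint
    return b0, b1


b0_c, b1_c = fit_through_point(market, stock, 6, 6)
sse = sum((yi - (b0_c + b1_c * xi)) ** 2 for xi, yi in zip(market, stock))
print(f"b0 = {b0_c:.6f}, b1 = {b1_c:.6f}, SSE = {sse:.4f}")
```

Because the constraint removes one free parameter, the constrained SSE can never be smaller than the unconstrained one; here the two are close, so the constraint costs little in fit.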
                 Sum of Squares   df    Mean Square   F         Sig.
Between Groups   205089.155         3   68363.052     141.665   .000
Within Groups    229702.540       476     482.568
Total            434791.695       479
As shown in the above table, the p-value is smaller than 0.05, so we reject the null hypothesis. We may conclude that, based on the testing results and our assumptions, it is likely that the four supermarkets studied are not equal in terms of average UWPM. Which supermarkets are more effective than others? This question will be answered when we return to the given problem in the next section.
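For reference, the F ratio in the ANOVA table follows directly from the sums of squares and degrees of freedom. This is a small sketch with a hypothetical helper `anova_f`; the between-groups degrees of freedom are taken as 3 (four supermarkets, and 479 - 476 = 3), which is an inference from the table rather than a printed value.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (MS between, MS within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sums of squares from the ANOVA table; df_between = 3 is inferred (see lead-in)
ms_b, ms_w, f_stat = anova_f(205089.155, 3, 229702.540, 476)
print(f"MS between = {ms_b:.3f}, MS within = {ms_w:.3f}, F = {f_stat:.3f}")
```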
The method we will discuss here is the Tukey method of pairwise comparisons of the population means. The method is also called the HSD (honestly significant differences) test. This method allows us to compare every possible pair of means by using a single level of significance, say $\alpha = 0.05$ (or a single confidence coefficient, say, $1 - 0.05 = 0.95$). The single level of significance applies to the entire set of pairwise comparisons.
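For orientation, the usual textbook form of the Tukey criterion is sketched below. It is given as a general statement, not as the exact computation behind the report's results: $q_\alpha$ denotes the studentized range value for $r$ groups and $n - r$ error degrees of freedom, MSE is the within-groups mean square, and $n_i$ is the (smallest) group sample size. The group sizes are not stated in this section, so no numerical value of $T$ is given.

```latex
T = q_{\alpha}\,\sqrt{\frac{\mathrm{MSE}}{n_i}},
\qquad
\text{reject } H_0\colon \mu_i = \mu_j
\quad \text{if} \quad
\left|\bar{x}_i - \bar{x}_j\right| > T .
```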
To compare the population mean UWPM for every pair of supermarkets, we use the following hypotheses, one set for each pair of supermarkets $i \ne j$:
$H_0: \mu_i = \mu_j$
$H_1: \mu_i \ne \mu_j$
From these comparisons we determine that our data provide statistical evidence to conclude that the population means differ for five pairs of supermarkets. There are no other statistically significant differences at $\alpha = 0.05$.
b. Further investigation shows that measuring effectiveness by the mean value alone is not enough, because two or more supermarkets may have the same mean (weight/minute) but different variances. In that case, the one with the smaller variance is better. Construct the matrix of hypothesis tests for two population variances as follows:
From the result in that matrix and the result in question a, what is your conclusion?
The F Distribution and a Test for Equality of Two Population Variances
We assume independent random sampling from the four populations in question. We also assume that the
four populations are normally distributed. The possible hypotheses to be tested are the following:
Comparison between two population variances matrix (supermarket labels shown: Cong Quynh, Ly Thuong Kiet, Hung Vuong). Each cell of the matrix tests, for a pair of supermarkets $i \ne j$:
$H_0: \sigma_i^2 = \sigma_j^2$
$H_1: \sigma_i^2 \ne \sigma_j^2$
(I) Coopmart versus (J) Coopmart, pairwise comparisons:
Test Statistic   Critical   Sig.
1.93226          1.43485    .0003
5.19545          1.43485    .0000
1.02779          1.43485    .8814
1.93226          1.43485    .0003
2.64625          1.43485    .0000
1.91024          1.43485    .0005
5.19545          1.43485    .0000
2.64625          1.43485    .0000
5.05498          1.43485    .0000
1.02779          1.43485    .8814
1.91024          1.43485    .0005
5.05498          1.43485    .0000
From these comparisons we determine that our data provide statistical evidence to conclude that the population variances differ for five of the six pairs of supermarkets. There are no other statistically significant differences at $\alpha = 0.05$.
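The decision rule behind the matrix can be sketched as follows: each test statistic (the ratio of the two sample variances) is compared with the critical value 1.43485. The six distinct statistics are taken from the table; the supermarket labels are not legible there, so the keys `pair 1` to `pair 6` are placeholder names introduced for this sketch only.

```python
CRITICAL = 1.43485  # critical F value reported in the table

# The six distinct test statistics from the table (placeholder pair labels)
f_stats = {
    "pair 1": 1.93226,
    "pair 2": 5.19545,
    "pair 3": 1.02779,
    "pair 4": 2.64625,
    "pair 5": 1.91024,
    "pair 6": 5.05498,
}

# Reject equality of variances when the statistic exceeds the critical value
significant = {pair: f > CRITICAL for pair, f in f_stats.items()}

for pair, flag in significant.items():
    verdict = "variances differ" if flag else "no significant difference"
    print(f"{pair}: F = {f_stats[pair]:.5f} -> {verdict}")
```

Only the pair with statistic 1.02779 (Sig. = .8814) fails to exceed the critical value, which agrees with the significance column of the table.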
c. (Data for question c is in sheet Case 01 (c) Coopmart) To improve the management of product unloading, the indicator unloading weight per minute (UWPM) is selected. A higher UWPM is better. To improve UWPM, the project manager needs to know which factors affect UWPM. A sample of 240 unloading operations was recorded. It is suspected that UWPM is closely related to two key factors. The first factor is the number of workers. The second factor is years of experience.
For the first factor, since the total weight unloaded differs from one operation to another, an appropriate indicator is the number of workers involved divided by the total weight (WIPW). For example, if 3,400 kg of products need to be unloaded and the number of workers is 7, then WIPW = 7/3,400 = 0.002059.
For the second factor, the average number of years of experience of a group of workers (AvgYr) is used as an indicator.
Construct a regression (Reg 1) in which UWPM is the dependent variable and WIPW and AvgYr are the independent variables. What information can the project manager draw from the regression (Reg 1) above?
Descriptive Statistics
         Mean      Std. Deviation   N
UWPM     122.117   44.2590          240
WIPM     .003960   .0017123         240
AvgYr    3.6669    1.64448          240
The constructed multiple regression model in which UWPM is the dependent variable ($Y$) and WIPW ($X_1$) and AvgYr ($X_2$) are the independent variables is given by
$Y = 11.299 + 16{,}886.185X_1 + 11.985X_2 + \varepsilon$
The estimated regression relationship is: $\hat{Y} = 11.299 + 16{,}886.185X_1 + 11.985X_2$
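To illustrate how the estimated relationship is used, the sketch below evaluates it at the sample means of the two predictors taken from the Descriptive Statistics table. A least-squares fit with an intercept passes through the point of means, so the prediction should land very near the mean UWPM of 122.117. The function name `predict_uwpm` is an illustrative choice.

```python
def predict_uwpm(wipm, avg_yr):
    """Predicted UWPM from the estimated regression relationship (Reg 1)."""
    return 11.299 + 16886.185 * wipm + 11.985 * avg_yr


# Evaluate at the sample means reported in the Descriptive Statistics table
pred_at_means = predict_uwpm(0.003960, 3.6669)
print(f"Predicted UWPM at the sample means: {pred_at_means:.3f}")

# Effect of one additional year of average experience, holding WIPM fixed
delta = predict_uwpm(0.003960, 4.6669) - pred_at_means
print(f"Change in predicted UWPM per extra year of experience: {delta:.3f}")
```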
F-Test
Is there a relationship between the dependent variable $Y$ (UWPM) and any of the explanatory, independent variables, $X_1$ (WIPM) and $X_2$ (AvgYr), suggested by the regression equation under consideration?
A statistical hypothesis test for the existence of a linear relationship between $Y$ and any of $X_1$ and $X_2$ is:
$H_0: \beta_1 = \beta_2 = 0$
$H_1:$ Not all the $\beta_i$ ($i = 1, 2$) are zero
ANOVAa
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   278085.484         2   139042.742    173.363   .000b
Residual     190081.189       237      802.030
Total        468166.673       239
As shown in the above table, since the p-value is small, we reject the null hypothesis that both slope parameters $\beta_1$ and $\beta_2$ are zero, in favor of the alternative that the slope parameters are not both zero. There is statistical evidence to conclude that, based on the testing results and our assumptions, a linear regression relationship exists between UWPM and at least one of the independent variables, WIPM or AvgYr (or both), proposed in the regression model.
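The F statistic in the ANOVA table is the ratio of the regression mean square to the residual mean square. A short sketch using the table's sums of squares, with k = 2 predictors and n = 240 observations (variable names are illustrative):

```python
n, k = 240, 2                 # observations and number of predictors
ss_regression = 278085.484    # from the ANOVA table
ss_residual = 190081.189      # from the ANOVA table

ms_regression = ss_regression / k            # mean square regression
ms_residual = ss_residual / (n - k - 1)      # mean square error, 237 df
f_reg = ms_regression / ms_residual

print(f"MSR = {ms_regression:.3f}, MSE = {ms_residual:.3f}, F = {f_reg:.3f}")
```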
Model Summaryb
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .771a   .594       .591                28.3201                      1.978
In the above table, $R^2 = 0.594$, which means that 59.4% of the variation in UWPM is explained by the combination of the two independent variables, WIPM and AvgYr. Adjusted $R^2$ is 0.591, which is very close to the unadjusted measure. We conclude that the regression model fits the data reasonably well, since more than half of the variation in UWPM is explained by WIPM and/or AvgYr.
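The Model Summary figures can be cross-checked against the ANOVA sums of squares. The sketch below uses the standard definitions of R squared, adjusted R squared, and the standard error of the estimate; the variable names are illustrative.

```python
import math

n, k = 240, 2
ss_regression = 278085.484
ss_residual = 190081.189
ss_total = 468166.673

r_squared = ss_regression / ss_total
# Adjusted R^2 penalizes for the number of predictors
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
# Standard error of the estimate is the square root of the mean square error
std_error = math.sqrt(ss_residual / (n - k - 1))

print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}, "
      f"std. error = {std_error:.4f}")
```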
Coefficients
Hypothesis tests about individual regression slope parameters:
(1) $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \ne 0$
(2) $H_0: \beta_2 = 0$ versus $H_1: \beta_2 \ne 0$

             Unstandardized Coefficients   Standardized Coefficients                   Collinearity Statistics
Model 1      B           Std. Error        Beta                        t       Sig.    Tolerance   VIF
(Constant)   11.299      6.319                                         1.788   .075
WIPM         16886.185   1071.371                                                      .997        1.003
AvgYr        11.985      1.116                                                         .997        1.003
We start with the test for the significance of variable $X_1$ (WIPM) as a predictor of UWPM. The hypothesis test is $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \ne 0$. As shown in the above table, since the p-value is small, we reject the null hypothesis that the slope parameter $\beta_1$ is zero. We therefore conclude that there is statistical evidence that the slope of $Y$ with respect to $X_1$, the population parameter $\beta_1$, is not zero. WIPM is shown to have some explanatory power with respect to the dependent variable, UWPM.
The hypothesis test for $\beta_2$ is $H_0: \beta_2 = 0$ versus $H_1: \beta_2 \ne 0$. This p-value, too, is small. We conclude that $X_2$ (AvgYr) is also an important variable in the regression equation.
Finally, we conclude that both independent variables, WIPM and AvgYr, are closely related to the dependent variable, UWPM, and affect it positively. Both slope estimates, $b_1$ and $b_2$, are positive, which means that, everything else staying constant, UWPM increases on average as WIPM increases or AvgYr increases (or both).
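Each individual test statistic is the coefficient divided by its standard error. The sketch below computes these ratios from the B and Std. Error columns of the Coefficients table. The value for the constant can be compared with the printed t of 1.788; the ratios for WIPM and AvgYr are computed here and are not read from the table.

```python
# (B, Std. Error) pairs from the Coefficients table
coefficients = {
    "(Constant)": (11.299, 6.319),
    "WIPM": (16886.185, 1071.371),
    "AvgYr": (11.985, 1.116),
}

# t statistic for H0: coefficient = 0 is the estimate divided by its standard error
t_values = {name: b / se for name, (b, se) in coefficients.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")
```

With 237 residual degrees of freedom, t ratios of this size for the two slopes are far beyond the usual two-tailed critical value of about 1.97, consistent with the small p-values described in the text.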
Residual Plots
The above figure is a plot of the regression residuals against the dependent variable UWPM. As we examine this figure carefully, we see that the spread of the residuals increases as UWPM increases. Thus, the variance of the residuals is not constant. We have the situation called heteroscedasticity, a violation of the assumption of equal error variance.
The above figure is the normal probability plot of the residuals. The closer the residuals lie to the diagonal line in the plot, the less they deviate from the normal distribution. In the figure, the deviations do not appear to be significant, so we conclude that the model assumption that the population errors are normally distributed with mean zero and standard deviation $\sigma$ is valid.
Multicollinearity
Correlations
                              UWPM    WIPM    AvgYr
Pearson Correlation   UWPM    1.000   .629    .410
                      WIPM    .629    1.000   -.053
                      AvgYr   .410    -.053   1.000
Sig. (1-tailed)       UWPM            .000    .000
                      WIPM    .000            .205
                      AvgYr   .000    .205
N                     UWPM    240     240     240
                      WIPM    240     240     240
                      AvgYr   240     240     240
In the correlation matrix shown in the above figure, we see that the correlation between the independent variables, WIPM and AvgYr, is not high (-0.053). This means that the two variables do not represent the same direction in space. Being only weakly correlated with each other, the two variables do not contain the same information about the dependent variable and therefore do not cause multicollinearity when both are in the regression equation.
Variance inflation factor
             Collinearity Statistics
Model 1      Tolerance   VIF
(Constant)
WIPM         .997        1.003
AvgYr        .997        1.003
The above figure shows the output for the current regression problem, which contains the VIF values in the last column. We note that the VIFs for WIPM and AvgYr (1.003 each) are well below 5, which indicates that there is no multicollinearity problem among the independent variables.
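With only two independent variables, the tolerance and VIF follow directly from their pairwise correlation: tolerance = 1 - r^2 and VIF = 1 / tolerance. The sketch below applies this to the correlation of -0.053 between WIPM and AvgYr; it is a check of the reported values under that two-predictor simplification, with illustrative variable names.

```python
r = -0.053                 # correlation between WIPM and AvgYr

r_squared_j = r ** 2       # R^2 from regressing one predictor on the other
tolerance = 1 - r_squared_j
vif = 1 / tolerance

print(f"Tolerance = {tolerance:.3f}, VIF = {vif:.3f}")
```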
CASE 03: TON DUC THANG UNIVERSITY CONTINUOUS IMPROVEMENT IN EDUCATION PROGRAM
Continuous improvement of its education programs is always one of the top strategic priorities of Ton Duc Thang University. Every period, TDT University applies new teaching methods to continuously improve its education programs. Recently, it has been suspected that students perform better in the experiment classes (the classes in which the new teaching method is applied) than in the control classes (the classes in which the old teaching method is applied).
a. Present the methodology for testing that suspicion (what is your argument, what is an appropriate statistical test, and why);
b. How do you construct the sample for the statistical test?
c. Present the result of your statistical test;
d. What is your conclusion from the statistical test?
Data
           Experiment Class   Control Class                 Experiment Class   Control Class
Students   Test 1   Test 2    Test 1   Test 2    Students   Test 1   Test 2    Test 1   Test 2
 1         63       84        88       71        31         62       82        83       91
 2         71       89        59       91        32         77       77        80       63
 3         87       70        85       79        33         87       69        94       79
 4         66       73        64       79        34         63       76        53       92
 5         63       74        95       91        35         73       95        70       58
 6         70       97        92       89        36         90       72        90       86
 7         63       89        71       85        37         84       75        57       74
 8         84       80        58       90        38         64       77        82       82
 9         84       86        62       69        39         85       98        65       66
10         63       77        93       76        40         86       78        54       83
11         62       74        80       80        41         66       86        68       74
12         68       75        89       85        42         83       70        73       66
13         84       98        65       93        43         61       90        71       97
14         90       75        67       61        44         81       70        86       93
15         86       96        76       84        45         60       85        90       80
16         69       76        81       77        46         72       83        80       75
17         87       89        85       75        47         60       90        55       70
18         60       74        85       83        48         87       68        81       96
19         64       81        87       68        49         65       78        94       82
20         67       86        86       91        50         71       81        95       78
21         64       72        86       92        51         74       69        60       63
22         86       69        85       97        52         60       78        90       98
23         88       94        77       60        53         90       85        66       61
24         67       89        85       61        54         68       84        74       90
25         66       73        90       84        55         67       90        83       74
26         83       80        72       93        56         77       95        77       77
27         89       94        70       92        57         79       67        53       93
28         68       66        60       79        58         64       82        80       98
29         81       87        60       67        59         90       92        67       61
30         71       76        74       63        60         67       90        60       95
           Experiment Class   Control Class
Students   Test 1   Test 2    Test 1   Test 2
61         84       93        75       89
62         90       87        85       61
63         63       90        78       74
64         69       72        54       71
65         81       66        73       97
66         67       80        71       93
67         63       74        92       61
68         62       78        88       89
69         90       74        78       69
70         86       80        57       72
71         83       96        82       73
72         80       72        92       59
73         89       89        61       84
74         69       69        81       70
75         64       72        73       86
76         90       83        85       61
77         77       85        53       95
78         86       75        54       93
79         60       75        85       92
80         84       94        78       84
81         66       77        73       70
82         85       71        91       65
83         86       86        91       98
84         83       71        72       90
85         75       87        67       91
86         67       87        77       98
87         88       70        94       98
88         65       80        68       62
89         82       74        80       90
90         89       66        94       84
Descriptive Statistics
                          N     Mean      Std. Deviation
Experiment Class Test 1   120   75.8000   10.01981
Experiment Class Test 2   120   81.6833    8.78882
Control Class Test 1      120   75.6167   12.51767
Control Class Test 2      120   79.1833   11.91777
Valid N (listwise)        120
           Test 1      Test 2                 Test 1      Test 2
Students   Ex. - Co.   Ex. - Co.   Students   Ex. - Co.   Ex. - Co.
 1         -25          13         31         -21          -9
 2          12          -2         32          -3          14
 3           2          -9         33          -7         -10
 4           2          -6         34          10         -16
 5         -32         -17         35           3          37
 6         -22           8         36           0         -14
 7          -8           4         37          27           1
 8          26         -10         38         -18          -5
 9          22          17         39          20          32
10         -30           1         40          32          -5
11         -18          -6         41          -2          12
12         -21         -10         42          10           4
13          19           5         43         -10          -7
14          23          14         44          -5         -23
15          10          12         45         -30           5
16         -12          -1         46          -8           8
17           2          14         47           5          20
18         -25          -9         48           6         -28
19         -23          13         49         -29          -4
20         -19          -5         50         -24           3
21         -22         -20         51          14           6
22           1         -28         52         -30         -20
23          11          34         53          24          24
24         -18          28         54          -6          -6
25         -24         -11         55         -16          16
26          11         -13         56           0          18
27          19           2         57          26         -26
28           8         -13         58         -16         -16
29          21          20         59          23          31
30          -3          13         60           7          -5
Test 1
Test 2
Test 1
Test 2
Students
Ex. - Co.
Ex. - Co.
Students
Ex. - Co.
Ex. - Co.
61
91
16
62
26
92
63
-15
16
93
19
12
64
15
94
25
-7
65
-31
95
-6
-13
66
-4
-13
96
-12
22
67
-29
13
97
32
68
-26
-11
98
-4
69
12
99
15
14
70
29
100
13
71
23
101
14
-11
72
-12
13
102
-19
73
28
103
10
74
-12
-1
104
-1
22
75
-9
-14
105
-24
17
76
22
106
16
77
24
-10
107
-6
22
78
32
-18
108
30
17
79
-25
-17
109
32
14
80
10
110
-5
22
81
-7
111
-6
82
-6
112
-28
83
-5
-12
113
29
84
11
-19
114
-7
85
-4
115
16
17
86
-10
-11
116
29
15
87
-6
-28
117
-10
-4
88
-3
18
118
-4
89
-16
119
-9
90
-5
-18
120
-3
30
Descriptive Statistics
                     N     Mean     Std. Deviation
Test 1 (Ex.-Co.)     120   .1833    16.85728
Test 2 (Ex.-Co.)     120   2.5000   15.53797
Valid N (listwise)   120
For each test (Test 1 and Test 2), the hypothesis test involves two populations: the population of students
who study in the experiment class and the population of students who study in the control class. We want to
test the null hypothesis that the mean test score in both populations is equal versus the alternative
hypothesis that the mean for the experiment-class students is greater. Using the same students for the tests
and pairing their observations in an experiment-and-control (Ex.-Co.) way makes the test more precise than it
would be without pairing.
Under these circumstances, it is easy to see that the variable in which we are interested is the difference
between the test score of the students who study in the experiment class and that of the students who study
in the control class. The population parameter about which we want to draw an inference is the mean
difference between the two populations.
For Test 1, we denote the population parameter by $\mu_{D,1}$, the mean difference. This parameter is defined as $\mu_{D,1} = \mu_{Ex,1} - \mu_{Co,1}$, where $\mu_{Ex,1}$ is the average test-1 score of the students who study in the experiment class and $\mu_{Co,1}$ is the average test-1 score of the students who study in the control class. Our null and alternative hypotheses are, then,
$H_0: \mu_{D,1} \le 0$
$H_1: \mu_{D,1} > 0$
For Test 2, we denote the population parameter by $\mu_{D,2}$, the mean difference. This parameter is defined as $\mu_{D,2} = \mu_{Ex,2} - \mu_{Co,2}$, where $\mu_{Ex,2}$ is the average test-2 score of the students who study in the experiment class and $\mu_{Co,2}$ is the average test-2 score of the students who study in the control class. Our null and alternative hypotheses are, then,
$H_0: \mu_{D,2} \le 0$
$H_1: \mu_{D,2} > 0$
The only assumption we make when we use this test is that the populations of differences are normally
distributed.
Paired differences (Pair 1: Test 1, Pair 2: Test 2)
         95% Confidence Interval
         of the Difference
         Lower      Upper      t      df    Sig. (2-tailed)   Sig. (R-tailed)
Pair 1   -2.86375   3.23041    .119   119   .905              .453
Pair 2                                      .081              .040
As shown in the above table, for Test 1 (Pair 1), since the right-tailed p-value (.453) is greater than any conventional level of $\alpha$, even larger than 0.10, we cannot conclude that the test-1 scores of the students who study in the experiment class are higher than those of the students who study in the control class.
However, for Test 2 (Pair 2), since the right-tailed p-value (.040) is smaller than $\alpha = 0.05$, we conclude that the test-2 scores of the students who study in the experiment class are higher than those of the students who study in the control class. The result is not strongly significant, however, and the conclusion may change at other levels of $\alpha$.
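The paired-difference t statistics can be recomputed from the summary statistics of the differences (mean, standard deviation, n = 120). The sketch below is a check under stated assumptions: the one-tailed critical value 1.658 for 119 degrees of freedom at the 0.05 level is an approximate table value supplied here, not a figure from the report, and the function name `paired_t` is hypothetical.

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """t statistic for H0: mean difference <= 0, from summary statistics."""
    standard_error = sd_diff / math.sqrt(n)
    return mean_diff / standard_error


n = 120
t_test1 = paired_t(0.1833, 16.85728, n)   # Test 1 (Ex. - Co.)
t_test2 = paired_t(2.5000, 15.53797, n)   # Test 2 (Ex. - Co.)

T_CRIT_ONE_TAILED = 1.658  # approx. t(0.05, 119 df); assumed table value

for label, t in (("Test 1", t_test1), ("Test 2", t_test2)):
    decision = "reject H0" if t > T_CRIT_ONE_TAILED else "do not reject H0"
    print(f"{label}: t = {t:.3f} -> {decision} at the 0.05 level (right-tailed)")
```

The Test 2 statistic clears the 0.05 critical value only by a modest margin, which is why the conclusion is sensitive to the chosen level of significance.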