Feb-Apr, 2016
Defeat is not bitter unless you swallow it. - Joe Clark
Welcome
Announcements
Lectures:
cuts across, UG, MBA, EMBA, WMBA
Office
Surgery hours
Website: sakai https://sites.google.com/site/oasare/
Grading Policy
Total 50%
Session 1 Overview
Definition
Data is a collection of observations
This session seeks to explain the difference between categorical
and numerical data, distinguish among nominal, ordinal, interval
and ratio scales of measurement and provide examples for each.
Session 1 Outline
Reading List
Chapter 1 - 3 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Pages 40 - 45 of Lane, D. (2003). Online Statistics Education: A
Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.),
Proceedings of World Conference on Educational Multimedia, Hypermedia
and Telecommunications 2003 (pp. 1317-1320). Chesapeake, VA:
Association for the Advancement of Computing in Education
(AACE). Retrieved January 28, 2015 from http://www.editlib.org/p/14001
Data (1)
Data (2)
Categorical Data
labels
or
names
used to
Your class rank in school (can be coded): excellent, good & then
poor
Such responses to questions coded from a scale of 1 to 5 as
strongly dislike, dislike, neutral, like & strongly like
Here, what does the rating of 5 indicate?
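The coding of ordered responses can be sketched in R. This is a minimal illustration, assuming six made-up responses (the vector `codes` is hypothetical, not course data); it stores the 1 to 5 codes as an ordered factor so that the labels keep their rank.

```r
# Hypothetical responses on the 1-5 scale described above (not course data)
codes  <- c(5, 3, 1, 4, 5, 2)
labels <- c("strongly dislike", "dislike", "neutral", "like", "strongly like")

# An ordered factor keeps both the label and the ranking (ordinal scale)
rating <- factor(codes, levels = 1:5, labels = labels, ordered = TRUE)

table(rating)            # frequency of each category
rating[1] > rating[2]    # comparisons are meaningful for ordinal data
as.character(rating[1])  # the label attached to code 5
```

Differences between codes (e.g. 5 minus 3) are not meaningful here, which is what separates an ordinal scale from an interval scale.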
Levels of Measurement
& 80
3 students with SAT math scores of 620, 550, & 470 can be
ranked or ordered in terms of best performance to poorest
performance.
Class Exercise 1
Identify the type of data & measurement scale described in each
of the following examples.
An opinion poll was taken asking people which party they would
vote for in a general election.
A market researcher stops you in Spintex Road and asks you to
rate between 1 (disagree strongly) and 5 (agree strongly) your
response to opinions presented to you.
Incomes of Ghanaian musicians.
Class Exercise 2
categorical or numerical
Foreign Affairs
Session 2 Overview
Session 2 Outline
Reading List
Chapter 9 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Pages 344-379 of Lane, D. (2003). Online Statistics Education: A
Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.),
Proceedings of World Conference on Educational Multimedia, Hypermedia
and Telecommunications 2003 (pp. 1317-1320). Chesapeake, VA:
Association for the Advancement of Computing in Education
(AACE). Retrieved January 28, 2015 from http://www.editlib.org/p/14001
What is a hypothesis?
H0
& alternative
Ha
hypotheses
H0 if
H0
H0
H 0:
there is no
Ha
Ha
H 0.
Ha
alternative format (
).
.
And you have to write the
Ha
H0
Tail of test
The form of
Ha
<
≠
not equal to, not different from, differs from, the same as,
does not vary from, on, of, was.
Lower Tail Test
$H_0: \mu \ge \mu_H$
$H_a: \mu < \mu_H$
Upper Tail Test
$H_0: \mu \le \mu_H$
$H_a: \mu > \mu_H$
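The tail of the test decides where the rejection region sits. A minimal R sketch, assuming a t test with df = 9 and a 5% significance level (both values are picked here only for illustration), reads the three critical values from `qt()`:

```r
# Illustrative values only: alpha and df are assumptions, not taken from a slide
alpha <- 0.05
df    <- 9

lower_crit <- qt(alpha, df)          # lower tail: reject H0 if t < lower_crit
upper_crit <- qt(1 - alpha, df)      # upper tail: reject H0 if t > upper_crit
two_crit   <- qt(1 - alpha / 2, df)  # two tail:   reject H0 if |t| > two_crit

round(c(lower = lower_crit, upper = upper_crit, two_tail = two_crit), 3)
```

The one-tail values match the .95 column of a t table and the two-tail value matches the .975 column, both in the row df = 9.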
$H_0: \mu \ge 3$
$H_a: \mu < 3$
MaxFlight uses a high-technology manufacturing process to
produce golf balls with a mean driving distance of 295 yards.
___________
___________
According to the CEO, the new BMW car can run more than 24
miles per gallon
___________
___________
Suppose that we want to test the hypothesis that the climate has
changed since industrialization. If the mean temperature
throughout history is not as improved as 50 degrees, what are the
null & alternative hypotheses?
$H_0: \mu \le 50$
$H_a: \mu > 50$
Session 3 Overview
Session 3 Outline
Reading List
Chap 9 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Chap 16 Buglear, John, 2005, Quantitative Methods for Business: The A-Z
of QM
Chap 9 of Newbold, P., Carlson, W. & Thorne, B (2013) Statistics for
Business and Economics, 8/E, Pearson
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$, $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Use $z$ if sample is large ($n \ge 30$)
or $t$ if sample is small ($n < 30$).
$df = n - 1$
$t_{df} = t_{n-1,\alpha}$
p-Value Approach
Use the t calculated or z calculated to compute the p value.
Reject $H_0$ if p value < $\alpha$
$t_{df} = t_{n-1,\alpha}$
Z or t? Decision Rule
Ho ,
Ho
when the
P value <
t Table
cum. prob   .50    .75    .80    .85    .90    .95    .975   .99    .995   .999    .9995
one-tail    0.50   0.25   0.20   0.15   0.10   0.05   0.025  0.01   0.005  0.001   0.0005
two-tails   1.00   0.50   0.40   0.30   0.20   0.10   0.05   0.02   0.01   0.002   0.001
df
1           0.000  1.000  1.376  1.963  3.078  6.314  12.71  31.82  63.66  318.31  636.62
2           0.000  0.816  1.061  1.386  1.886  2.920  4.303  6.965  9.925  22.327  31.599
3           0.000  0.765  0.978  1.250  1.638  2.353  3.182  4.541  5.841  10.215  12.924
4           0.000  0.741  0.941  1.190  1.533  2.132  2.776  3.747  4.604  7.173   8.610
5           0.000  0.727  0.920  1.156  1.476  2.015  2.571  3.365  4.032  5.893   6.869
6           0.000  0.718  0.906  1.134  1.440  1.943  2.447  3.143  3.707  5.208   5.959
7           0.000  0.711  0.896  1.119  1.415  1.895  2.365  2.998  3.499  4.785   5.408
8           0.000  0.706  0.889  1.108  1.397  1.860  2.306  2.896  3.355  4.501   5.041
9           0.000  0.703  0.883  1.100  1.383  1.833  2.262  2.821  3.250  4.297   4.781
10          0.000  0.700  0.879  1.093  1.372  1.812  2.228  2.764  3.169  4.144   4.587
11          0.000  0.697  0.876  1.088  1.363  1.796  2.201  2.718  3.106  4.025   4.437
12          0.000  0.695  0.873  1.083  1.356  1.782  2.179  2.681  3.055  3.930   4.318
13          0.000  0.694  0.870  1.079  1.350  1.771  2.160  2.650  3.012  3.852   4.221
14          0.000  0.692  0.868  1.076  1.345  1.761  2.145  2.624  2.977  3.787   4.140
15          0.000  0.691  0.866  1.074  1.341  1.753  2.131  2.602  2.947  3.733   4.073
16          0.000  0.690  0.865  1.071  1.337  1.746  2.120  2.583  2.921  3.686   4.015
17          0.000  0.689  0.863  1.069  1.333  1.740  2.110  2.567  2.898  3.646   3.965
18          0.000  0.688  0.862  1.067  1.330  1.734  2.101  2.552  2.878  3.610   3.922
19          0.000  0.688  0.861  1.066  1.328  1.729  2.093  2.539  2.861  3.579   3.883
20          0.000  0.687  0.860  1.064  1.325  1.725  2.086  2.528  2.845  3.552   3.850
21          0.000  0.686  0.859  1.063  1.323  1.721  2.080  2.518  2.831  3.527   3.819
22          0.000  0.686  0.858  1.061  1.321  1.717  2.074  2.508  2.819  3.505   3.792
23          0.000  0.685  0.858  1.060  1.319  1.714  2.069  2.500  2.807  3.485   3.768
24          0.000  0.685  0.857  1.059  1.318  1.711  2.064  2.492  2.797  3.467   3.745
25          0.000  0.684  0.856  1.058  1.316  1.708  2.060  2.485  2.787  3.450   3.725
26          0.000  0.684  0.856  1.058  1.315  1.706  2.056  2.479  2.779  3.435   3.707
27          0.000  0.684  0.855  1.057  1.314  1.703  2.052  2.473  2.771  3.421   3.690
28          0.000  0.683  0.855  1.056  1.313  1.701  2.048  2.467  2.763  3.408   3.674
29          0.000  0.683  0.854  1.055  1.311  1.699  2.045  2.462  2.756  3.396   3.659
30          0.000  0.683  0.854  1.055  1.310  1.697  2.042  2.457  2.750  3.385   3.646
40          0.000  0.681  0.851  1.050  1.303  1.684  2.021  2.423  2.704  3.307   3.551
60          0.000  0.679  0.848  1.045  1.296  1.671  2.000  2.390  2.660  3.232   3.460
80          0.000  0.678  0.846  1.043  1.292  1.664  1.990  2.374  2.639  3.195   3.416
100         0.000  0.677  0.845  1.042  1.290  1.660  1.984  2.364  2.626  3.174   3.390
1000        0.000  0.675  0.842  1.037  1.282  1.646  1.962  2.330  2.581  3.098   3.300
z           0.000  0.674  0.842  1.036  1.282  1.645  1.960  2.326  2.576  3.090   3.291
Confidence
Level       0%     50%    60%    70%    80%    90%    95%    98%    99%    99.8%   99.9%
p value
of 0.005 (read on
Applied example 1
Suppose that you are thinking of taking over an SME. The
current owner claims the weekly turnover of each existing SME is
not different from GH¢5000. A sample of 26 SMEs chosen at random has
a mean weekly turnover of GH¢4900 with a standard deviation of GH¢280.
Test the claim with $\alpha = 0.05$.
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{4900 - 5000}{280/\sqrt{26}} = -1.82$
This is the t calculated or t test statistic.
Now, find the t tabulated, $t_{n-1,\alpha}$ (two-tail).
59 / 287
p value ,
use
t cal. = |1.82|.
pool
Figure falls between 1.812 & 1.833
Either way, you get 0.1 (at the top) as its a two-tail
CONCLUDE
reject
H0
: As p-value of 0.1
>
t table .
Feb-Apr, 2016
60 / 287
Feb-Apr, 2016
61 / 287
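A printed t table can only bracket the p value. The SME example can be checked in R with a short sketch, assuming the summary figures quoted in the example (sample mean 4900, claimed mean 5000, s = 280, n = 26) and a two-tail test; the variable names are illustrative.

```r
# Summary figures from the SME example; variable names are illustrative
x_bar <- 4900   # sample mean weekly turnover
mu0   <- 5000   # turnover claimed by the owner (H0)
s     <- 280    # sample standard deviation
n     <- 26     # number of SMEs sampled
alpha <- 0.05

t_cal   <- (x_bar - mu0) / (s / sqrt(n))   # t calculated
df      <- n - 1
t_tab   <- qt(1 - alpha / 2, df)           # two-tail t tabulated
p_value <- 2 * pt(-abs(t_cal), df)         # two-tail p value

reject_H0 <- abs(t_cal) > t_tab
round(c(t_cal = t_cal, t_tab = t_tab, p_value = p_value), 3)
reject_H0
```

Both decision rules agree: |t calculated| is below t tabulated and the p value is above 0.05, so the owner's claim is not rejected.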
3.6  4.2  3.8  2.7  4.0  4.8  2.7  3.9  4.2  4.5
$H_0: \mu \ge 4$ kg
$H_a: \mu < 4$ kg
Significance level $\alpha$ = 5% = 0.05
and $\alpha = 0.05$
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{3.84 - 4}{0.69/\sqrt{10}} = -0.73$
This is the t calculated or t test statistic
$t_{n-1,\alpha} = t_{9,0.05} = 1.833$ is the t tabulated
Since the t statistic of |-0.73| is < t critical of 1.833, we do not reject $H_0$.
For the p value, use t cal. = |-0.73|. Search from the t table.
CONCLUDE: As p value ≈ 0.25 > $\alpha$ = 0.05, we do not reject $H_0$ at the 5% sig. level.
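The same one-sample test can be run directly on the ten observations with `t.test()`. This is a sketch under the assumption that the ten values listed earlier are the sample and that the test is lower-tailed against 4 kg; the vector name `w` is hypothetical.

```r
# The ten observations from the example; the name `w` is hypothetical
w <- c(3.6, 4.2, 3.8, 2.7, 4.0, 4.8, 2.7, 3.9, 4.2, 4.5)

mean(w)   # sample mean
sd(w)     # sample standard deviation

# H0: mu >= 4 versus Ha: mu < 4 (lower-tail test)
res <- t.test(w, mu = 4, alternative = "less")
res$statistic          # t calculated
res$parameter          # df = n - 1
res$p.value            # one-tail p value

t_tab <- qt(0.05, df = length(w) - 1)  # lower-tail critical value
res$statistic < t_tab                  # TRUE would mean reject H0
```

`t.test()` reports the exact p value, so no search through the printed table is needed.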
Practice Question 1
And(pg357,28)
using
Practice Question 2
A travel magazine wants to classify transatlantic gateway airports
using mean rating for the population of travelers. A scale with a
low score of 0 & a high score of 10 is used & airports with a
population mean rating above 7 will be designated as superior
airports. The magazine staff sampled 16 travelers at each
airport. The sample for London's Heathrow Airport provided a
mean rating of 7.25 with a standard deviation of 1.052. Should
Heathrow be designated as a superior airport?
($\alpha$ = 10%).
Practice Question 3
According to the label on packets of popcorn there should be 25
g of popcorn in every packet. The standard deviation of the
weight of popcorn per packet is known to be 2.2 g & the
weights are normally distributed. The mean weight of popcorn
in a random sample of 15 packets is 23.5 g. Test the hypothesis
that the information on the label is valid using a 1% level of
significance.
Session 4 Overview
We extend the 1 sample/population analysis to a 2-sample study,
when the difference between the 2 population means is
important.
For example, we may want to test for the effect of a customer
training workshop on the sales of salespersons in a company or
the impact of a reform in an industry. Policy prescriptions may
be offered based on the findings.
Session 4 Outline
Reading List
Chap 10 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Pages 344-379 of Lane, D. (2003). Online Statistics Education: A
Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.),
Proceedings of World Conference on Educational Multimedia, Hypermedia
and Telecommunications 2003 (pp. 1317-1320).
Chap 17 Buglear, John, 2005, Quantitative Methods for Business: The A-Z
of QM
Chap 10 of Newbold, P., Carlson, W. & Thorne, B (2013) Statistics for
Business and Economics, 8/E, Pearson
Dr K. Ohene-Asare & Mr. Abeeku E. Edu (UGBS)
Parametric               Nonparametric
One-sample t-test        Binomial
Paired t-test (dept)     Wilcoxon signed-rank test
Paired t-test (dept)     McNemar's Chi-square test
Independent t-test       Mann-Whitney U or Wil. ranksum
Pearson's correlation    Spearman's corr (xy)
Pearson's correlation    Kendall tau rank corr (xyz)
ANOVA (>2 indep. grps)   Kruskal-Wallis test
Repeated meas. ANOVA     Friedman Test, Cochran Q
Paired/Dependent Samples
(d )
normally distributed
$H_0$              $H_a$                           Type of test
$\mu_d = 0$        $\mu_d \ne 0$ (not equal)       Two-sided
$\mu_d \le 0$      $\mu_d > 0$ (greater than)      One-sided
$\mu_d \ge 0$      $\mu_d < 0$ (less than)         One-sided
Find $t_{n-1,\alpha}$ = ?? i.e. t-tabulated
$t = \dfrac{\bar{d}}{S_d/\sqrt{n}}$
where $S_d = \sqrt{\dfrac{\sum (d - \bar{d})^2}{n-1}}$
Decision Rule
Reject $H_0$ if p value < $\alpha$
Reject $H_0$ if t calculated > t tabulated
Reject $H_0$ if t statistic > critical value
If $H_0$ is rejected, there is a significant difference between 2 samples
$\bar{d} = \dfrac{\sum d_i}{n}$ & $S_d = \sqrt{\dfrac{\sum (d - \bar{d})^2}{n-1}}$
$t = \dfrac{\bar{d}}{S_d/\sqrt{n}}$
After - Before
Before   After   d = After - Before
6        4       -2
20       6       -14
3        2       -1
0        0       0
4        0       -4
Total            -21
$\bar{d} = \dfrac{\sum d_i}{n} = -4.2$
$S_d = \sqrt{\dfrac{\sum (d - \bar{d})^2}{n-1}} = 5.67$
$t = \dfrac{\bar{d}}{S_d/\sqrt{n}} = \dfrac{-4.2}{5.67/\sqrt{5}} = -1.66$ i.e. t-calculated
$t_{n-1,\alpha} = t_{5-1,0.05} = 2.776$ i.e. t-tabulated (two-tail)
Use t = |-1.66| and read along the row df = 4 of the t table.
The figure falls between 1.533 & 2.132, so the two-tail p value lies between 0.10 & 0.20.
As the t statistic of |-1.66| is < t critical of 2.776 (or p value > $\alpha$ = 0.05) we don't reject $H_0$ at the 5% significance level.
No significant difference between complaints before & after.
Training showed no effect. Any difference may be by chance.
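The paired test on the complaints data can be reproduced in R. The sketch below assumes the first five figures are the counts before training and the next five the counts after, as implied by d = After - Before; the vector names are illustrative.

```r
# Complaints before & after training (column roles inferred from d = After - Before)
before <- c(6, 20, 3, 0, 4)
after  <- c(4, 6, 2, 0, 0)

d     <- after - before
d_bar <- mean(d)                          # mean difference
s_d   <- sd(d)                            # standard deviation of the differences
t_cal <- d_bar / (s_d / sqrt(length(d)))  # t calculated by hand

# Same test with the built-in function (two-sided by default)
res <- t.test(after, before, paired = TRUE)
res$statistic
res$p.value

t_tab <- qt(1 - 0.05 / 2, df = length(d) - 1)  # two-tail t tabulated
abs(t_cal) > t_tab                             # TRUE would mean reject H0
```

The exact p value from `t.test()` sits inside the range that the t table brackets for df = 4.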
Summary of results
Has the training made a difference in the number of complaints?
$\bar{d} = -4.2$
Reject $H_0$ if t < -2.776 or t > 2.776 ($\alpha/2$ in each tail); t = -1.66 falls between them
Test Statistic: $t = \dfrac{\bar{d}}{S_d/\sqrt{n}} = \dfrac{-4.2}{5.67/\sqrt{5}} = -1.66$
Practice Question 1
A new therapy has been devised which is supposed to
lower blood pressure. The systolic blood pressures of 10
patients were taken before and after completing the course
(see below). Does this therapy work? Use a significance
level of 0.05.
Before   After   Difference
120      130     10
131      125     -6
136      128     -8
122      124     2
138      129     -9
139      130     -9
131      132     1
123      129     6
125      130     5
Practice Question 2
Tweaa is the VC of a large manufacturing company. He recently
noticed an increase in absenteeism that he thinks is related to
the general health of employees. Four years ago, in an attempt
to improve the situation, he began a fitness program in which
employees exercise during their lunch hour. To evaluate the
program, he randomly sampled some participants & found the
number of days each was absent. Below are the results. At 0.05
significance level, did the program reduce absenteeism?
Session 5 Overview
In some practical applications the normality axiom is not tenable,
especially when we have a wide range of distributions of the
parent population.
In such a case, we use nonparametric tests or distribution-free
tests.
In this session we use nonparametric tests for testing equality of
means/medians of 2 population distributions
Session 5 Outline
Reading List
Chap 19 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Pages 344-379 of Lane, D. (2003). Online Statistics Education: A
Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.),
Proceedings of World Conference on Educational Multimedia, Hypermedia
and Telecommunications 2003 (pp. 1317-1320).
Chap 10 of Newbold, P., Carlson, W. & Thorne, B (2013) Statistics for
Business and Economics, 8/E, Pearson
The main idea is to test whether 2 samples come from the same
population (i.e., if 2 populations have the same shape) by
comparing the ranks or ordinal values of the observations.
Some investigators interpret this test as comparing the medians
between the 2 populations.
$H_0: m_1 - m_2 = 0$
$H_a: m_1 - m_2 \ne 0$
If median (m) of sample 1 is > sample 2, $H_a$ becomes
$H_a: m_1 - m_2 > 0$ or $H_a: m_1 > m_2$
H0 :
Ha :
Example
H0 :
Ha :
banks.
Median class size for Math is larger than median class size for
English. Write H0 & Ha ____
H0: MedianM ≤ MedianE (Math median ≤ English median)
HA: MedianM > MedianE (Math median is larger)
Median class size for Math is not at least that of the median
class size for English. Write H0 and Ha ____
H0: MedianM ≥ MedianE
$n_1$ may be > $n_2$.
($n_1 = n_2 = 5$),
$H_0: m_1 - m_2 = 0$; $H_a: m_1 - m_2 \ne 0$
(n = 10)
$R_1 = 37$
$R_2 = 18$ (drug)
Check if: $R_1 + R_2 = \dfrac{n(n+1)}{2}$
$37 + 18 = (10 \times 11)/2 = 55$
The Mann-Whitney
$U_1 = R_1 - \dfrac{n_1(n_1+1)}{2}$
$U_2 = R_2 - \dfrac{n_2(n_2+1)}{2}$
Details of U
n1 (n1 +1)
2
if
all
observations in sample 2.
n2 (n2 +1)
2
if
all
observations in sample 1.
Results
n1
n2
Un ,n ,
1
U tab
U1 Un1,n2,
One-sided test:
if
reject
H0
if
U2
U2 Un1,n2,
tabulated U, reject
Back to example
$U_1 = R_1 - \dfrac{n_1(n_1+1)}{2} = 37 - \dfrac{5(5+1)}{2} = 22$
$U_2 = R_2 - \dfrac{n_2(n_2+1)}{2} = 18 - \dfrac{5(5+1)}{2} = 3$ (smaller)
$U_1 + U_2 = n_1 n_2$
$22 + 3 = 5 \times 5 = 25$
Decision
Reject $H_0$ if $U_{min} \le U_{n_1,n_2,\alpha/2}$
$U_{min} = 3 > U_{5,5,0.05} = 2$
We can't reject $H_0$ because 3 > 2.
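The U statistics and the decision can be checked in R from the rank sums alone. This sketch assumes only the figures on the slides (rank sums 37 and 18, five observations per sample, tabulated U of 2); the exact p value via `pwilcox()` is an addition for comparison, not part of the original working.

```r
# Rank sums and sample sizes from the worked example
R1 <- 37; R2 <- 18
n1 <- 5;  n2 <- 5
n  <- n1 + n2

# Checks used on the slides
stopifnot(R1 + R2 == n * (n + 1) / 2)   # 37 + 18 = 55

U1 <- R1 - n1 * (n1 + 1) / 2
U2 <- R2 - n2 * (n2 + 1) / 2
stopifnot(U1 + U2 == n1 * n2)           # 22 + 3 = 25

U_min <- min(U1, U2)
U_tab <- 2                              # tabulated U quoted on the slide
reject_H0 <- U_min <= U_tab

# Exact two-sided p value for U_min (assumes no ties)
p_value <- 2 * pwilcox(U_min, n1, n2)
c(U1 = U1, U2 = U2, U_min = U_min, p_value = round(p_value, 4))
reject_H0
```

The exact p value is slightly above 0.05, which agrees with the table-based decision not to reject H0.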
Conclusion
= 0.05,
are
1st: Arrange data long (vertical: one column of values, next column coded 1,0 for the group)
wm <- read.delim("clipboard")  ## load data copied to the clipboard (Windows)
wm  ## view data
boxplot(drug ~ group, data = wm, col = "green")  ## quick boxplot by group
wilcox.test(drug ~ group, data = wm, conf.int = TRUE, paired = FALSE)  ## compute Mann-Whitney test
Practice Assignment 1
Random samples of starting monthly salaries for graduates from
2 Ghanaian universities are below (GHc1000s):
UG KNUST
30
28.5
35
38
29
30.5
37.5
26
32
37
40
29
33
32
Session 6 Overview
Session 6 outline
Reading List
Chap 13 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Pages 493-549 of Lane, D. (2003). Online Statistics Education: A
Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.),
Proceedings of World Conference on Educational Multimedia, Hypermedia
and Telecommunications 2003 (pp. 1317-1320).
Chap 15 of Newbold, P., Carlson, W. & Thorne, B (2013) Statistics for
Business and Economics, 8/E, Pearson
T-Test
ANOVA
MANOVA
ANOVA
Examples:
ANOVA (hypotheses)
$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_K$
All population means are equal
i.e., no variation in means between groups
$H_A: \mu_i \ne \mu_j$ for at least one i, j pair
ANOVA (3)
ANOVA (4)
ANOVA chart
SS = Sum of Squares
df = degrees of freedom
MS = Mean Squares
n = sum of the sample sizes
K = number of groups
SST
$SST = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$
SSW
$SSW = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
SSG
$SSG = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2$
$df_1 = K - 1$
$df_2 = n - K$
Reject $H_0$ if $F > F_{K-1, n-K, \alpha}$
Decision: ANOVA
Example of ANOVA
ANOVA Table
ANSWER
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a: \mu_i \ne \mu_j$ for at least one pair i, j = 1, 2, 3
$F_{critical} = F_{K-1, N-K, \alpha}$
K = number of groups, $df_1 = K - 1$, row
N = total sample from all groups, $df_2 = N - K$, column
$F_{tab} = F_{K-1, N-K, \alpha} = F_{3-1, 9-3, 0.05} = F_{2, 6, 0.05}$
$F_{tab} = 5.14$
ANOVA Table
Recall data
$\bar{x}_1 = \dfrac{6}{3} = 2$
$\bar{x}_2 = \dfrac{15}{3} = 5$
$\bar{x}_3 = \dfrac{24}{3} = 8$
Grand mean:
$\bar{x} = \dfrac{\sum x_{ij}}{N}$
$\bar{x} = \dfrac{1+2+3+4+5+6+7+8+9}{9} = 5$
$SSG = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2$
$SSW = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
$SSW_2 = (4-5)^2 + (5-5)^2 + (6-5)^2 = 2$
$SSW_3 = (7-8)^2 + (8-8)^2 + (9-8)^2 = 2$
      SS   df   MS   F
BG    54   2    27   27
WG    6    6    1
$MSG = \dfrac{SSG}{K-1} = \dfrac{54}{2} = 27$, $MSW = \dfrac{SSW}{N-K} = \dfrac{6}{6} = 1$
$F = \dfrac{MSG}{MSW} = \dfrac{27}{1} = 27$
Reject $H_0$ at 5% since F = 27 > Ftab = 5.14
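The hand computation of the ANOVA table can be mirrored in R. The sketch assumes the three groups are (1, 2, 3), (4, 5, 6) and (7, 8, 9), which is what the group totals 6, 15, 24 and the SSW terms imply; the group labels are made up.

```r
# Data implied by the worked example; group labels are made up
y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
g <- factor(rep(c("g1", "g2", "g3"), each = 3))

K <- nlevels(g)          # number of groups
N <- length(y)           # total sample size
group_mean <- ave(y, g)  # each observation's own group mean

SSG <- sum((group_mean - mean(y))^2)  # between groups: sum of n_i (xbar_i - xbar)^2
SSW <- sum((y - group_mean)^2)        # within groups
MSG <- SSG / (K - 1)
MSW <- SSW / (N - K)
F_cal <- MSG / MSW
F_tab <- qf(0.95, K - 1, N - K)       # F tabulated at the 5% level

c(SSG = SSG, SSW = SSW, MSG = MSG, MSW = MSW, F_cal = F_cal, F_tab = round(F_tab, 2))
summary(aov(y ~ g))                   # built-in one-way ANOVA for comparison
```

`aov()` prints the same sums of squares, degrees of freedom and F value, together with the exact p value.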
Example 2: Attempt
$\bar{x}_1 = 249.2$, $n_1 = 5$
$\bar{x}_2 = 226.0$, $n_2 = 5$
$\bar{x}_3 = 205.8$, $n_3 = 5$
$\bar{x} = 227.0$, n = 15, K = 3
$F = \dfrac{2358.2}{93.3} = 25.275$
SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2

ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.89
Within Groups         1119.6   12   93.3
Total                 5836.0   14
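The Excel output can be cross-checked from its own summary figures. This sketch assumes only the sums of squares, n = 15 and K = 3 reported above, and recomputes the mean squares, F, the critical F and the p value in R.

```r
# Figures taken from the Excel ANOVA output above
SSG <- 4716.4   # between groups sum of squares
SSW <- 1119.6   # within groups sum of squares
K   <- 3        # number of clubs
n   <- 15       # total observations

MSG    <- SSG / (K - 1)
MSW    <- SSW / (n - K)
F_cal  <- MSG / MSW
F_crit <- qf(0.95, K - 1, n - K)                  # 5% critical value
p_val  <- pf(F_cal, K - 1, n - K, lower.tail = FALSE)

# MSW is also the average of the three group variances when group sizes are equal
MSW_check <- mean(c(108.2, 77.5, 94.2))

round(c(MSG = MSG, MSW = MSW, F_cal = F_cal, F_crit = F_crit), 3)
p_val
```

Since the computed F is far above the critical value, the p value is tiny and H0 of equal club means is rejected at the 5% level.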
in excel
anovaclub ## view data
Results
Conclusion?? As
Parametric               Nonparametric
One-sample t-test        Binomial
Paired t-test (dept)     Wilcoxon signed-rank test
Independent t-test       Mann-Whitney U or Wil. ranksum
Pearson's correlation    Spearman's corr (xy)
Pearson's correlation    Kendall tau rank corr (xyz)
ANOVA (>2 indep. grps)   Kruskal-Wallis test
Repeated meas. ANOVA     Friedman Test
RMBA: 9  7  11  9  12  10
WMBA: 13  20  14  13
EMBA: 10  9  15  14  15
Assignment cont.
Session 7 Overview
Research projects and managerial decisions often involve the
linkages between two or more variables. For instance, what is the
relationship between blood pressure and a person's weight? Do
other factors (age, stress, diet, exercise, etc.) apart from weight
affect BP? How do you control these? How do you use each of
them to predict BP? What assumptions must be in place for the
modelling of such an association to exist? This session examines
association via correlation and causality via regression.
Session 7 outline
Reading List
Chap 14 & 15 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Chap 2-8 of Gujarati D. (2003), Basic Econometrics, 4th ed
Chap 11 & 12 of Newbold, P., Carlson, W. & Thorne, B (2013) Statistics
for Business and Economics, 8/E, Pearson
Chap 1-7 of Wooldridge, J.M. (2013), Introductory Econometrics: A
Modern Approach, 5th ed
What is Correlation?
Correlation Coefficient
Correlation coefficient is a statistic that quantifies a relation
between two variables
Falls between -1.00 and 1.00
The absolute value of the number (not the sign) indicates the
strength of the relation
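These properties can be seen with `cor()` in R. The data below are invented purely for illustration (weights and blood pressures for six hypothetical people); flipping the sign of one variable changes the sign of r but not its strength.

```r
# Invented data for illustration only: weight (kg) & systolic blood pressure
weight <- c(58, 64, 71, 80, 92, 105)
bp     <- c(112, 118, 121, 130, 138, 149)

r         <- cor(weight, bp)    # Pearson correlation coefficient
r_flipped <- cor(weight, -bp)   # same strength, opposite sign

c(r = r, r_flipped = r_flipped)
abs(r) == abs(r_flipped)        # strength is judged on the absolute value
```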
What is Regression?
Regression analysis is a statistical technique used to analyze the
nexus between a variable & 1 or several
(predictor/independent/explanatory) variables, leading to
simple or multiple regression
Simple regression:
$y_i = \alpha + \beta x_i + u_i$
Multiple model:
$y_i = \beta_0 + \beta_1 A_i + \beta_2 B_i + \beta_3 C_i + \beta_4 D_i + \beta_5 E_i + u_i$
i = 1, ..., n = individual (group, country) index.
This is for a cross-sectional data set
Assumptions
Linearity:
$u \sim N(0, \sigma^2 I)$
ROAi =
Example
Brukutu Ventures, a local gin distillery is concerned about the
demand for its favourite gin bitters. Demand in most of its retail
shops has been hit hard due to new entrants into the market.
Management is concerned & wants to determine which
Example cont'
location of retail shop (where LU=1 if location is urban or 0 if
rural area); dominant occupation (teaching, fishing & trading)
around a retail shop (where OT=1 if the occupation is teaching
or 0 otherwise & OTR=1 if the occupation is trading or 0
otherwise); & finally, the dominant religion (Christian, Muslims
& Buddhists) of the people (where RC=1 if the people are
Christians or 0 otherwise & RM=1 if the people are Muslims or 0
otherwise). Use the correlation matrix & (dummy) regression
output to answer the questions that follow.
Correlation questions
OT & RC (0.05)
Q & OT (-0.11)
From matrix, identify 2 very good predictors
P (-0.87)
OTR (-0.78)
Regression output
$R^2$ is the proportion of the variation in Y that is explained by the model:
$R^2 = \dfrac{RSS}{TSS} = 1 - \dfrac{ESS}{TSS}$
$0 < R^2 < 1$
Also, Multiple $R = \sqrt{R^2}$
Adjusted $R^2$ (1)
R2
never
R2
&
has motivated an
model.
Adjusted $R^2$ (2)
$\bar{R}^2 = 1 - \dfrac{ESS/(n-K-1)}{TSS/(n-1)}$
$\bar{R}^2 = 1 - (1 - R^2)\dfrac{n-1}{n-K-1}$
With the
, as
rises,
RSS
&
DoF
both fall
$t_j = \dfrac{\hat{\beta}_j}{S.E.(\hat{\beta}_j)}$
$\hat{\beta}_j \pm t_{n-K-1,\alpha} S_{\hat{\beta}_j}$
TSS = ESS + RSS
N = TSSdf + 1
$R^2 = \dfrac{RSS}{TSS}$
Multiple $R = \sqrt{R^2}$
$MSR = \dfrac{RSS}{K}$
$MSE = \dfrac{ESS}{N-K-1}$
$\bar{R}^2 = 1 - (1 - R^2)\dfrac{N-1}{N-K-1}$
$F = \dfrac{MSR}{MSE}$
$F_{K,N-K-1,\alpha}$
$R^2 = \dfrac{RSS}{TSS} = \dfrac{34,197.35}{34600} = 0.99$
D = N = TSSdf + 1 = 39 + 1 = 40
E = RSSdf = 8
F = TSSdf = 8 + 31 = 39
Fill in (2)
C = $\bar{R}^2 = 1 - (1 - R^2)\dfrac{N-1}{N-K-1}$
C = 1 - (1 - 0.99)(39)/(31) = 0.99
H = MSR = RSS/K = 34197.35/8 = 4274.67
I = MSE = ESS/(N-K-1) = 402.65/31 = 12.99
J = F = MSR/MSE = 4274.67/12.99 = 329.07
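The fill-in values can be recomputed in R from the two sums of squares. The sketch keeps the slide notation, in which RSS is the regression sum of squares and ESS the error sum of squares, and assumes N = 40 and K = 8 as worked out above.

```r
# Slide notation: RSS = regression sum of squares, ESS = error sum of squares
RSS <- 34197.35
ESS <- 402.65
N   <- 40   # observations
K   <- 8    # explanatory variables

TSS    <- RSS + ESS
R2     <- RSS / TSS
adj_R2 <- 1 - (1 - R2) * (N - 1) / (N - K - 1)
MSR    <- RSS / K
MSE    <- ESS / (N - K - 1)
F_cal  <- MSR / MSE
F_tab  <- qf(0.95, K, N - K - 1)   # 5% critical value of F

round(c(R2 = R2, adj_R2 = adj_R2, MSR = MSR, MSE = MSE, F_cal = F_cal, F_tab = F_tab), 2)
```

The unrounded F differs from the slide value only in the first decimal place, because the slide divides the already rounded mean squares.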
Fill in (3)
$t_j = \dfrac{\hat{\beta}_j}{S.E.(\hat{\beta}_j)}$
K =
L =
t = coeff/SE = 419.92.43 = 3.95
N, Read N =| 1.35 |, you N = 0.20
O, Read O =| 8.58 |, you N = 0.00
M =
$Q = \beta_0 + \beta_1 P + \beta_2 C + \beta_3 M + \beta_4 LU + \beta_5 OT + \beta_6 OTR + \beta_7 RC + \beta_8 RM + u$
Write the estimated regression equation.
Q = 280.27 - 3.16P - 0.11C - 0.98M - 5.11LU - 19.43OT 46.47OTR + 15.55RC - 23.01RM
p value = 0.01 < $\alpha$ = 0.05
Test
ANSWER
a two-tail)
ANSWER CONTINUES..
t cal ($\beta_1$) = coeff/s.e. = -2.01, |t cal| = 2.01
t tab = $t_{n-K-1,\alpha} = t_{40-8-1,0.05} = t_{31,0.05}$
= 2.042
Fitness of Model
Is the regression model well fit? Interpret the apt measure used
ANSWER
$R^2$ = 0.99 = 99%
Since $R^2$ > 50%, model is fit!!!
ANSWER
H0
If
p value <
Reject $H_0$ if $F = \dfrac{MSR}{MSE} > F_{K,n-K-1,\alpha} = F_{8,31,0.05} = 2.26$
$\hat{\beta}_j \pm t_{n-K-1,\alpha/2} S_{\hat{\beta}_j}$
$\hat{\beta}_3 \pm t_{40-8-1,\alpha/2} S_{\hat{\beta}_3}$ = ??
CI: $-2.51 \le \beta_3 \le 0.55$
$VIF = \dfrac{1}{1 - R_j^2}$
where $R_j^2$ is the $R^2$ from regressing $X_j$ on the other independent variables
If there is no collinearity between X1 & X2, then VIF = 1.
As a rule of thumb, VIF > 10 indicates high collinearity
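The VIF formula can be applied by hand in base R. The sketch below uses simulated data (the variables `x1`, `x2`, `x3` are hypothetical and unrelated to the gin data): `x2` is built as a near copy of `x1`, so both should show a VIF well above 10, while `x3` should stay close to 1.

```r
set.seed(1)                      # simulated, hypothetical data
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)    # almost a copy of x1, so x1 & x2 are collinear
x3 <- rnorm(n)                   # unrelated to x1 & x2
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing X_j on the other X's
vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 2)
```

Dropping either `x1` or `x2` from the model would bring the remaining VIFs back towards 1.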
Question on multicollinearity
Using only the VIF, which one of the pairs of variables selected
to be multicollinear may be deleted? Justify?
ANSWER
Comparing Predictions..
p value = 0.00387 < $\alpha$.
Qd data is not normal
plot(density(gin$Q)) #density plot for Qd of gin bitters
Scatter plot of Qd vs P
OR
Do a simple regression
Do Multiple Regression in R
ginmulti = lm(Q ~ P+C+M+LU+OT+OTR+RC+RM, data=gin) #run OLS. multiple reg with coded data
ginmulti = lm(Q ~ P+C+M+loca+occu+rel, data=gin) #run OLS with the nominal data
summary(ginmulti) ## display results
round(confint(ginmulti),2) ## Confidence intervals for the coefficients, correct to 2 d.p.
Reg Output
4 Is the estimate of
advert
price
Test
if the coefficient of
income
References