Research Methods SESSIONS STUDENTS Abeeku PDF

RESEARCH METHODS
ANALYSIS OF QUANTITATIVE DATA FOR RESEARCH

Dr Kwaku Ohene-Asare & Mr. Abeeku E. Edu
Website: https://sakai.ug.edu.gh/portal
Email: kohene-asare@ug.edu.gh; asedu@ug.edu.gh
The University of Ghana Business School, Dept of OMIS
Feb-Apr, 2016
Defeat is not bitter unless you swallow it. - Joe Clark
Welcome
Welcome to the journey through the valley of the

shadows of multivariate data analysis
"A little learning is a dang'rous thing; Drink deep, or
taste not the Pierian spring." Alexander Pope
Announcements
Lectures: cut across UG, MBA, EMBA, WMBA
Office: UGBS, Roof Top, Room 16
Surgery hours: By appointment
Website: sakai; https://sites.google.com/site/oasare/

Grading Policy
Final Examination 35%

Group assignments & IAs 10%
Class Participation 5%
Total 50%
My teaching style is interactive

Strategies for Success
Be of good courage! You can master this material!!

Some defeat themselves. No!!! Be positive.
Let me know what is & what isn't working for you
Put in the time, effort & energy! You can pass!!!
Anywhere is a walking distance if you have time

Session 1 Overview
Definition
Data is a collection of observations
This session seeks to explain the difference between categorical
and numerical data, distinguish among the nominal, ordinal, interval
and ratio scales of measurement, and provide examples of each.

Session 1 Outline
The key topics to be covered in the session are as follows:

Categorical/Qualitative Data
Numerical/Quantitative Data
Scales of measurement
Reading List
Chapters 1 - 3 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Pages 40 - 45 of Lane, D. (2003). Online Statistics Education: A
Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.),
Proceedings of World Conference on Educational Multimedia, Hypermedia
and Telecommunications 2003 (pp. 1317-1320). Chesapeake, VA:
Association for the Advancement of Computing in Education
(AACE). Retrieved January 28, 2015 from http://www.editlib.org/p/14001

Data (1)
Before undertaking any specific quantitative research,
data must be obtained. The data type and the scale of
measurement help determine the type of statistics the
analyst can use to examine the data.

Data (2)
The scale of measurement determines the amount of

information contained in the data & indicates the most
appropriate data summarization & statistical analyses
Categorical Data
Data are categorical if the observations can be grouped into
categories
Each observation falls into exactly one of a set of non-overlapping categories
A poorly chosen set of categories can distort the research outcome
2 levels of measurement exist under categorical data: nominal & ordinal

Categorical Data - Nominal
When the data for a variable consist of labels or names used to
identify an attribute of the element, the scale of measurement is
considered a nominal scale
There is no intrinsic order

Nominal Data Examples
Sex, Employment, Marital Status, Gender

In a dataset, males could be coded as 0, females as 1. Here, the
scale of measurement is still nominal even though the data
appear as numeric value. Why?
Marital status could be coded as D if divorced, M if married, S if
single, & W if widowed. The order of the codes carries no meaning.

Categorical Data - Ordinal
The measurement scale for data is ordinal if the data exhibit
the properties of nominal data & the order or rank or rating of
the data is meaningful
Ranking could be done in ascending or descending order, e.g.
attitudes on a Likert scale

Ordinal Data Examples
Your class rank in school (can be coded): excellent, good & then
poor
Responses to questions coded on a scale of 1 to 5 as
strongly dislike, dislike, neutral, like & strongly like
Here, what does the rating of 5 indicate?

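The nominal and ordinal examples above can be tried out in R. This is a minimal sketch, assuming the D/M/S/W marital-status codes and the 1 to 5 like/dislike scale from the slides; the response values themselves are made up for illustration.

```r
# Nominal: marital status codes are labels only (hypothetical responses).
marital <- factor(c("M", "S", "D", "W", "M"), levels = c("D", "M", "S", "W"))
table(marital)               # counting per category is meaningful
is.ordered(marital)          # FALSE: no order is defined between the codes

# Ordinal: a 1-5 rating where the order matters (hypothetical responses).
labels5 <- c("strongly dislike", "dislike", "neutral", "like", "strongly like")
rating <- factor(c(5, 2, 4, 3, 5), levels = 1:5, labels = labels5, ordered = TRUE)
rating[1] > rating[2]        # TRUE: "strongly like" ranks above "dislike"
median(as.integer(rating))   # a median rank is meaningful for ordinal data
```

Storing the scale of measurement in the variable type (factor versus ordered factor) helps R refuse comparisons that make no sense for nominal data.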
Levels of Measurement
Numerical Data - Interval
A variable is on an interval scale if the data have all the properties
of ordinal data & the interval between values is expressed in
terms of a fixed unit of measure.
Scores on an interval scale can be added and subtracted but cannot
be meaningfully multiplied or divided. Always numeric.

Interval Data Examples
The difference between a temperature of 100 degrees & 90 degrees is
the same as the difference between 90 & 80
3 students with SAT math scores of 620, 550, & 470 can be
ranked or ordered from best performance to poorest
performance.

Numerical Data - Ratio
Data are ratio if the values/observations may
take on any value within a finite or infinite interval.
They are numeric variables with an absolute (non-arbitrary) 0, so
ratios of values are meaningful. You can count, order and measure

Ratio Data Examples
income, amount of sugar in an orange

number of employees, age
distance, height, weight
time needed to run a mile
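One way to see the difference between the interval and ratio scales is to change the unit of measurement. The sketch below uses made-up temperatures and distances (assumptions, not data from the slides): the ratio of two temperatures changes with the unit, while the ratio of two distances does not.

```r
# Interval scale: temperature. Differences are meaningful, ratios are not,
# because the zero point is arbitrary and moves with the unit.
celsius <- c(40, 20)
fahrenheit <- celsius * 9 / 5 + 32     # 104 and 68
celsius[1] / celsius[2]                # 2
fahrenheit[1] / fahrenheit[2]          # about 1.53: the "ratio" is not stable

# Ratio scale: distance. Zero means "none", so ratios survive a unit change.
km <- c(10, 5)
miles <- km * 0.621371
km[1] / km[2]                          # 2
miles[1] / miles[2]                    # 2
```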
Measurement Levels: Summary
Class Exercise 1
Identify the type of data & measurement scale described in each
of the following examples.
An opinion poll was taken asking people which party they would
vote for in a general election.
A market researcher stops you on Spintex Road and asks you to
rate between 1 (disagree strongly) and 5 (agree strongly) your
response to opinions presented to you.
Incomes of Ghanaian musicians.

Class Exercise 2
The Wall Street Journal (WSJ) subscriber survey (October 13,
2003) asked 46 questions about subscriber characteristics and
interests. State whether each of the following questions provided
categorical or numerical data and indicate the measurement
scale appropriate for each.

Solution to Exercise 2 (1)
What is your age?
numerical & ratio
Are you male or female?
categorical & nominal
When did you first start reading Daily Graphic? High school,
college, early career, midcareer, late career, or retirement?
categorical & ordinal
How long have you been in your job or position?
numerical & ratio
What type of car do you want to buy next? 9 responses included
BMW, Benz, Toyota, Hyundai, Ford etc.
categorical & nominal
What grade did you get in research methods? Is it A, F, D, B+
or B?
categorical & ordinal

Practice Questions (1)
Foreign Affairs magazine conducted a survey to develop a profile
of its subscribers (Foreign Affairs website, February 23, 2008).
The following questions were asked. Comment on whether each
question provides categorical or quantitative data and indicate
the level of measurement.

Practice Questions (2)
How many nights have you stayed in a hotel?

Where do you purchase books? Three options were listed:
Bookstore, Internet, and Book Club.
Do you own a car?
For foreign trips taken in the past three years, what was your
destination? Seven international destinations were listed.
Session 2 Overview
Whether in our personal conversations, business life or academic

research, we usually make a tentative assumption about the
whole. But making a statement is one thing. Testing its
authenticity is another thing.
This session examines the tools needed to hypothesize &
conclude given there is only 1 sample.
Session 2 Outline
null & alternative hypotheses
t-test statistic, critical value, p-value
level of significance
statistical & practical significance

Reading List
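Chapter 9 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Pages 344-379 of Lane, D. (2003). Online Statistics Education: A
Multimedia Course of Study. In D. Lassner & C. McNaught (Eds.),
Proceedings of World Conference on Educational Multimedia, Hypermedia
and Telecommunications 2003 (pp. 1317-1320). Chesapeake, VA:
Association for the Advancement of Computing in Education
(AACE). Retrieved January 28, 2015 from http://www.editlib.org/p/14001
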
What is a hypothesis?
A statement, claim or assumption about a population parameter

such as the average, variance, standard deviation
A hypothesis can be a statement or a question:
Income has a positive effect on consumption
Are socially responsible firms more profitable?

STEPS in Hypothesis Testing

1 Write the null (H0) & alternative (Ha) hypotheses
2 Compute the test statistic, i.e. the calculated value
3 Find the critical value (i.e. tabulated value) from the table using
the degrees of freedom (d.f.)
4 Find the p-value using the calculated value
5 Compare the calculated vs. tabulated value. Reject H0 if cal > tab
6 Compare the p-value vs. $\alpha$. Reject H0 if $p < \alpha$
7 Conclude both statistically & practically!

What is the null hypothesis H0?
The H0 is the presumed, authoritative statement put forward by
a maker (or inventor, researcher etc.) either because it is believed
to be true or because it is to be used as a basis for argument, but
has not been proved.
e.g. in a clinical trial of a new drug, H0 might be: the new drug is
no better, on average, than the current drug, i.e.
H0: there is no difference between the 2 drugs on average.

What is the alternative hypothesis Ha?
Ha is the alternative, opposite statement which reflects that
there will be an observed effect for our trial.
Ha is, more or less, the direct mirror image of H0.

Key note about the Ha
At times, the statement given in the story is already in the
alternative format (Ha).
And you have to write the Ha before knowing the H0

Tail of test
The form of Ha can be either one-tailed or two-tailed.
Indicators of one-tailed test: $\le$ or $>$ & $\ge$ or $<$
greater than, below, beyond, not fewer than, not above,
better than, worse than, at least, at most, not at least,
not younger than, above.
Indicators of two-tailed test: $=$ or $\neq$
not equal to, not different from, differs from, the same as,
does not vary from, on, of, was.

Sign for one-tailed test

Lower Tail Test
$H_0: \mu \ge \mu_H$
$H_a: \mu < \mu_H$
Upper Tail Test
$H_0: \mu \le \mu_H$
$H_a: \mu > \mu_H$

Let's find the H0 & Ha in the statements
Nescafe Ghana claims that since the population mean filling
weight is at least 3 lb/can, consumers' rights are protected.
$H_0: \mu \ge 3$
$H_a: \mu < 3$
MaxFlight uses a high-technology manufacturing process to
produce golf balls with a mean driving distance of 295 yards.
___________
___________

Find the H0 & Ha (2)
According to the CEO, the new BMW car can run more than 24
miles per gallon
___________
___________
Ghana Telecom manager thinks that the average customer monthly cell
phone bill is not at most Ghc52 per month.
$H_0: \mu \le 52$ (average is not over Ghc52 per month)
$H_a: \mu > 52$ (average is over Ghc52 per month)

The minimum wage in Ghana in 2015 is not at most GHc 6.
___________
___________

Suppose that we want to test the hypothesis that the climate has
changed since industrialization. If the mean temperature
throughout history is not higher than 50 degrees, what are the
null & alternative hypotheses?
$H_0: \mu \le 50$
$H_a: \mu > 50$

Session 3 Overview
We are ready to test the authenticity of claims or hypotheses.

The session demonstrates the process of hypothesizing, testing &
concluding given a single sample.
Session 3 Outline
real life example: SME

t-test statistic
critical value, p-value
level of significance
statistical & practical significance

Reading List
Chap 9 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011).
Statistics for Business and Economics (11th ed.). South-Western Cengage
Learning
Chap 16 of Buglear, J. (2005). Quantitative Methods for Business: The A-Z
of QM
Chap 9 of Newbold, P., Carlson, W. & Thorne, B. (2013). Statistics for
Business and Economics (8th ed.). Pearson

$\alpha$ & test statistic
The level of significance $\alpha$ is the probability of making a Type I
error when the H0 is true as an equality.
Examples of $\alpha$: 0.01 (1%), 0.05 (5%), or 0.10 (10%)
Compute the test statistic:
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$, $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

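The test statistic above is easy to compute from summary figures. The helper below is a minimal sketch; the function name `test_statistic` and the numbers in the call are hypothetical, chosen only to illustrate the formula.

```r
# One-sample test statistic from summary figures:
# (sample mean - hypothesised mean) / (standard deviation / sqrt(n)).
# Pass the sample s for a t statistic, or a known population sigma for z.
test_statistic <- function(xbar, mu0, sd, n) {
  (xbar - mu0) / (sd / sqrt(n))
}

# Hypothetical figures, for illustration only.
test_statistic(xbar = 52, mu0 = 50, sd = 6, n = 36)   # 2
```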
z or t, df & critical value
Use z if sample is large ($n \ge 30$) or t if sample is small ($n < 30$).
Determine the degrees of freedom: $df = n - 1$
Determine the critical value: $t_{df} = t_{n-1,\alpha}$

Compute p-value & compare
p-Value Approach
Use the t calculated or z calculated to compute the p-value.
Reject H0 if p-value < $\alpha$

Compute critical value & compare
Critical Value Approach
Under the t test, use $\alpha$ & df to find the critical value,
$t_{df} = t_{n-1,\alpha}$, also called t tabulated.
For z, use only $\alpha$
Reject H0 if |t cal| > t tab

Z or t? Decision Rule
Always reject the H0 if the calculated value (test statistic) is >
the tabulated (critical) value.
This implies rejecting H0 when the p-value < $\alpha$

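The two decision rules can be sketched in R with `qt()` (critical value) and `pt()` (p-value), which stand in for the printed table. The function name `decide_t`, its `tail` argument and the example numbers are assumptions made for this illustration.

```r
# Decide a one-sample t-test from a calculated t, its df and alpha.
# `tail` is "two", "upper" or "lower" (labels assumed for this sketch).
decide_t <- function(t_cal, df, alpha = 0.05, tail = "two") {
  if (tail == "two") {
    t_tab <- qt(1 - alpha / 2, df)          # critical value, alpha split in 2
    p <- 2 * pt(-abs(t_cal), df)            # two-tail p-value
    reject_cv <- abs(t_cal) > t_tab
  } else if (tail == "upper") {
    t_tab <- qt(1 - alpha, df)
    p <- pt(t_cal, df, lower.tail = FALSE)
    reject_cv <- t_cal > t_tab
  } else {
    t_tab <- qt(alpha, df)                  # negative critical value
    p <- pt(t_cal, df)
    reject_cv <- t_cal < t_tab
  }
  list(t_tab = t_tab, p_value = p, reject_cv = reject_cv, reject_p = p < alpha)
}

# Hypothetical example: t calculated = 2.5 with 20 degrees of freedom.
res <- decide_t(t_cal = 2.5, df = 20, alpha = 0.05, tail = "two")
res$reject_cv == res$reject_p   # the two approaches give the same decision
```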
t table
t Table
cum. prob    .50    .75    .80    .85    .90    .95    .975   .99    .995   .999    .9995
one-tail     0.50   0.25   0.20   0.15   0.10   0.05   0.025  0.01   0.005  0.001   0.0005
two-tails    1.00   0.50   0.40   0.30   0.20   0.10   0.05   0.02   0.01   0.002   0.001
df
1            0.000  1.000  1.376  1.963  3.078  6.314  12.71  31.82  63.66  318.31  636.62
2            0.000  0.816  1.061  1.386  1.886  2.920  4.303  6.965  9.925  22.327  31.599
3            0.000  0.765  0.978  1.250  1.638  2.353  3.182  4.541  5.841  10.215  12.924
4            0.000  0.741  0.941  1.190  1.533  2.132  2.776  3.747  4.604  7.173   8.610
5            0.000  0.727  0.920  1.156  1.476  2.015  2.571  3.365  4.032  5.893   6.869
6            0.000  0.718  0.906  1.134  1.440  1.943  2.447  3.143  3.707  5.208   5.959
7            0.000  0.711  0.896  1.119  1.415  1.895  2.365  2.998  3.499  4.785   5.408
8            0.000  0.706  0.889  1.108  1.397  1.860  2.306  2.896  3.355  4.501   5.041
9            0.000  0.703  0.883  1.100  1.383  1.833  2.262  2.821  3.250  4.297   4.781
10           0.000  0.700  0.879  1.093  1.372  1.812  2.228  2.764  3.169  4.144   4.587
11           0.000  0.697  0.876  1.088  1.363  1.796  2.201  2.718  3.106  4.025   4.437
12           0.000  0.695  0.873  1.083  1.356  1.782  2.179  2.681  3.055  3.930   4.318
13           0.000  0.694  0.870  1.079  1.350  1.771  2.160  2.650  3.012  3.852   4.221
14           0.000  0.692  0.868  1.076  1.345  1.761  2.145  2.624  2.977  3.787   4.140
15           0.000  0.691  0.866  1.074  1.341  1.753  2.131  2.602  2.947  3.733   4.073
16           0.000  0.690  0.865  1.071  1.337  1.746  2.120  2.583  2.921  3.686   4.015
17           0.000  0.689  0.863  1.069  1.333  1.740  2.110  2.567  2.898  3.646   3.965
18           0.000  0.688  0.862  1.067  1.330  1.734  2.101  2.552  2.878  3.610   3.922
19           0.000  0.688  0.861  1.066  1.328  1.729  2.093  2.539  2.861  3.579   3.883
20           0.000  0.687  0.860  1.064  1.325  1.725  2.086  2.528  2.845  3.552   3.850
21           0.000  0.686  0.859  1.063  1.323  1.721  2.080  2.518  2.831  3.527   3.819
22           0.000  0.686  0.858  1.061  1.321  1.717  2.074  2.508  2.819  3.505   3.792
23           0.000  0.685  0.858  1.060  1.319  1.714  2.069  2.500  2.807  3.485   3.768
24           0.000  0.685  0.857  1.059  1.318  1.711  2.064  2.492  2.797  3.467   3.745
25           0.000  0.684  0.856  1.058  1.316  1.708  2.060  2.485  2.787  3.450   3.725
26           0.000  0.684  0.856  1.058  1.315  1.706  2.056  2.479  2.779  3.435   3.707
27           0.000  0.684  0.855  1.057  1.314  1.703  2.052  2.473  2.771  3.421   3.690
28           0.000  0.683  0.855  1.056  1.313  1.701  2.048  2.467  2.763  3.408   3.674
29           0.000  0.683  0.854  1.055  1.311  1.699  2.045  2.462  2.756  3.396   3.659
30           0.000  0.683  0.854  1.055  1.310  1.697  2.042  2.457  2.750  3.385   3.646
40           0.000  0.681  0.851  1.050  1.303  1.684  2.021  2.423  2.704  3.307   3.551
60           0.000  0.679  0.848  1.045  1.296  1.671  2.000  2.390  2.660  3.232   3.460
80           0.000  0.678  0.846  1.043  1.292  1.664  1.990  2.374  2.639  3.195   3.416
100          0.000  0.677  0.845  1.042  1.290  1.660  1.984  2.364  2.626  3.174   3.390
1000         0.000  0.675  0.842  1.037  1.282  1.646  1.962  2.330  2.581  3.098   3.300
z            0.000  0.674  0.842  1.036  1.282  1.645  1.960  2.326  2.576  3.090   3.291
Confidence
Level        0%     50%    60%    70%    80%    90%    95%    98%    99%    99.8%   99.9%

Getting t critical & p value from t table (1)
Always use the t calculated to get the p-value for the test.
And use the df to find the t tabulated.
For e.g., if t calculated = 2.66, to get the p-value, scan
the body of the t-table for 2.66 in the row for your df

Getting t critical & p value from t table (2)
This corresponds to 2.660 (in the df = 60 row), leading to a p-value
of 0.005 one-tail or 0.01 two-tail (read on
top. don't forget if it is one-tail or two-tail)
Do the same for the 1% & 10% for either two-tail or one-tail.

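The table lookups described above can be reproduced in R. This is a sketch that assumes the df = 25 row as an example and the df = 60 row for the 2.66 case: `qt()` returns the value printed in the body of the table, and `pt()` returns the tail area printed at the top.

```r
# One-tail areas used as column headings in the t table.
one_tail <- c(0.25, 0.10, 0.05, 0.025, 0.01, 0.005)

# Row df = 25 of the table: qt() takes the cumulative probability 1 - one_tail.
round(qt(1 - one_tail, df = 25), 3)   # 0.684 1.316 1.708 2.060 2.485 2.787

# p-value for t calculated = 2.66 with df = 60 (the row where 2.660 appears).
p_one <- pt(2.66, df = 60, lower.tail = FALSE)   # about 0.005
p_two <- 2 * p_one                                # about 0.01
```

Unlike the printed table, `pt()` gives the p-value for any t calculated and any df, so no interpolation between columns is needed.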
Applied example 1
Suppose that you are thinking of taking over an SME. The
current owner claims the weekly turnover of each existing SME is
not different from GHc5000 and at this level you are willing to
take on the SME. You would be more cautious if the turnover is
below this figure. You examine the books of 26 SMEs chosen at
random and find that the average turnover was GHc4900 with
standard deviation GHc280. What would you do?

Applied example 1 sol
$\mu = 5000$, $\bar{x} = 4900$, $n = 26$, $s = 280$ and $\alpha = 0.05$
$H_0: \mu = 5000$
$H_a: \mu \neq 5000$
Do you use t-stat or z-stat?

Applied example 1 sol..
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}} = \dfrac{4900 - 5000}{280 / \sqrt{26}} = -1.82$
This is the t calculated or t test statistic
Now, find the t tabulated (t-critical) & compare

Applied e.g. 1 sol using critical value
Two-tail (see Ha). Read $df = n - 1$ & $\alpha$:
$t_{n-1,\alpha} = t_{25,\,0.05} = 2.060$ is the t tabulated
What is your conclusion?
As the t cal. of |-1.82| is < t tab. of 2.06, we don't reject H0 at 5%:
the data give no evidence that weekly turnover differs from GHc5000.
Nothing prevents you from buying the SME!!

Applied e.g. 1 sol: using p-value

For the p-value, use t cal. = |-1.82| and the df = 25 row of the t table.
Scan that row of the body (the pool) of the table:
the figure falls between 1.708 & 2.060
Reading the two-tail headings at the top, the p-value lies between 0.10 & 0.05
CONCLUDE: As the p-value (between 0.05 & 0.10) > alpha of 0.05, we do not
reject H0 at the 0.05 significance level.
Nothing prevents you from buying the SME!!

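Applied example 1 can be checked in R from the summary figures given in the slides (mean 4900, standard deviation 280, n = 26, hypothesised mean 5000). The variable names are chosen for this sketch only.

```r
mu0 <- 5000; xbar <- 4900; s <- 280; n <- 26; alpha <- 0.05

t_cal <- (xbar - mu0) / (s / sqrt(n))        # about -1.82
t_tab <- qt(1 - alpha / 2, df = n - 1)       # about 2.060 (two-tail, df = 25)
p_val <- 2 * pt(-abs(t_cal), df = n - 1)     # between 0.05 and 0.10

abs(t_cal) > t_tab                           # FALSE: do not reject H0
p_val < alpha                                # FALSE: same decision
```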
Example 2: one sample test

Diwoeasem, a tomato grower, has developed a new variety of
tomato. He claims that the average yield per plant is at least 4kg
of fruit. A gardening magazine tests this claim by growing some
plants & measuring the yield; the sample standard deviation is 0.69
(see below). Do these data support Diwoeasem's claim?
Formulate the hypotheses. Use both the p-value and critical
value approaches.

Example 2: one sample test..
Yield per plant (kg): 3.6, 4.2, 3.8, 2.7, 4.0, 4.8, 2.7, 3.9, 4.2, 4.5

Example 2: Sol (hypotheses)
What are the hypotheses??
$H_0: \mu \ge 4$ kg
$H_a: \mu < 4$ kg

Example 2: Sol ($\alpha$, z/t, tails)
Significance level $\alpha$ = 5% = 0.05
Use 5% (default) even if alpha is not given
Which test is appropriate? Why?
Is this a one-tail or a two-tail test?

Example 2: Sol (compute)
$\mu = 4$, $\bar{x} = 3.84$, $n = 10$, $s = 0.69$ and $\alpha = 0.05$
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}} = \dfrac{3.84 - 4}{0.69 / \sqrt{10}} = -0.73$
This is the t calculated or t test statistic

Example 2: Sol (critical value)
$t_{n-1,\alpha} = t_{9,\,0.05} = 1.833$ is the t tabulated

Example 2: Sol (critical value & conclusion)
Since the t statistic of |-0.73| is < t critical of 1.833, we
don't reject the H0 at the 5% significance level.
The data are consistent with a true average yield of at
least 4 kg.
The gardening magazine has no statistical grounds to dispute Diwoeasem's claim.
The difference of 4 - 3.84 = 0.16 kg can be put down to chance.

Example 2: Sol p-value approach

To find the p-value, use t cal. = |-0.73|.
Search the df = 9 row in the swimming pool of the table
It falls between 0.703 & 0.883. What tail?
Reading the one-tail headings at the top, the p-value lies between 0.25 & 0.20.
CONCLUDE: As the p-value (between 0.20 & 0.25) > $\alpha$ = 0.05, we do not
reject H0 at the 5% sig. level.
There is no evidence that the true average yield is below 4 kg of fruit.

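Example 2 can be run directly on the ten yields listed in the slides. The sketch below uses base R's `t.test()` with `alternative = "less"`, which matches Ha: the mean is below 4 kg; the variable names are illustrative.

```r
yield <- c(3.6, 4.2, 3.8, 2.7, 4.0, 4.8, 2.7, 3.9, 4.2, 4.5)  # kg per plant
mean(yield)   # 3.84
sd(yield)     # about 0.69

# H0: mu >= 4 versus Ha: mu < 4, so the alternative is "less".
res <- t.test(yield, mu = 4, alternative = "less")
res$statistic            # t about -0.73
res$parameter            # df = 9
res$p.value              # between 0.20 and 0.25, well above 0.05
qt(0.05, df = 9)         # about -1.833, the lower-tail critical value
```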
Practice Question 1
A shareholders' group, in lodging a protest, claimed that the
mean tenure for the CEO was not below 9 years. A survey of
companies reported in The Wall Street Journal found a sample
mean tenure of 7.27 years for CEOs with a standard deviation of
6.38 years (The Wall Street Journal, January 2, 2007).
And (pg 357, 28)

Practice Question 1 cont...
1 Formulate the hypothesis that can be used to challenge the
validity of the claim made by the shareholders' group.
2 Assume 20 companies were included in the sample. What is
the t tabulated & p-value for your hypothesis test?
3 At $\alpha$ = 1%, what is your statistical & practical conclusion
using the p-value & critical value approaches?

Practice Question 2
A travel magazine wants to classify transatlantic gateway airports
using mean rating for the population of travelers. A scale with a
low score of 0 & a high score of 10 is used & airports with a
population mean rating above 7 will be designated as superior
airports. The magazine staff sampled 16 travelers at each
airport. The sample for London's Heathrow Airport provided a
mean rating of 7.25 with a standard deviation of 1.052. Should
Heathrow be designated as a superior airport?
($\alpha$ = 10%).

Practice Question 3
According to the label on packets of popcorn there should be 25
g of popcorn in every packet. The standard deviation of the
weight of popcorn per packet is known to be 2.2 g & the
weights are normally distributed. The mean weight of popcorn
in a random sample of 15 packets is 23.5 g. Test the hypothesis
that the information on the label is valid using a 1% level of
significance.

Session 4 Overview
We extend the 1 sample/population analysis to a 2-sample study,
when the difference between the 2 population means is
important.
For example, we may want to test for the effect of a customer
training workshop on the sales of salespersons in a company or
the impact of a reform in an industry. Policy prescriptions may
be offered based on the findings.

Session 4 Outline
hypotheses on difference between 2 population means using
bivariate paired data
hypotheses on difference between 2 population means using
independent samples
draw appropriate conclusions

Reading List
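Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011). Statistics for
Business and Economics (11th ed.). South-Western Cengage Learning
Lane, D. (2003). Online Statistics Education: A Multimedia Course of
Study. In D. Lassner & C. McNaught (Eds.), Proceedings of World
Conference on Educational Multimedia, Hypermedia and
Telecommunications 2003 (pp. 1317-1320).
Chap 17 of Buglear, J. (2005). Quantitative Methods for Business: The A-Z
of QM
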
Two Sample Tests
Parametric vs. Nonparametric tests

Parametric test            Nonparametric
One-sample t-test          Binomial
Paired t-test (dept)       Wilcoxon signed-rank test, McNemar's Chi-square test
Independent t-test         Mann-Whitney U or Wilcoxon rank-sum
Pearson's correlation      Spearman's corr (xy), Kendall tau rank corr (xyz)
ANOVA (>2 indep. grps)     Kruskal-Wallis test
Repeated meas. ANOVA       Friedman Test, Cochran Q

Paired/Dependent Samples
It is used to test the differences between population means using
2 related/paired/matched/before & after samples
Population difference = $\mu_d$
Assumption: data or differences ($d$) between paired values are
normally distributed

Types of hypotheses for mean of population of differences
H0                    Ha                                  Type of test
$H_0: \mu_d = 0$      $H_a: \mu_d \neq 0$ (not equal)     Two-sided
$H_0: \mu_d \le 0$    $H_a: \mu_d > 0$ (greater than)     One-sided
$H_0: \mu_d \ge 0$    $H_a: \mu_d < 0$ (less than)        One-sided
Find $t_{n-1,\alpha}$ = ?? i.e. t-tabulated

paired t-test for repeated samples

$t = \dfrac{\bar{d}}{S_d / \sqrt{n}}$, $S_d = \sqrt{\dfrac{\sum (d_i - \bar{d})^2}{n - 1}}$
where
$\bar{d} = \dfrac{\sum d_i}{n}$ = mean of sample differences
$S_d$ = sample standard dev. of differences
$n$ = the sample size (number of pairs)

Decision Rule
Reject H0 if p-value < $\alpha$
Reject H0 if t calculated > t tabulated
Reject H0 if t statistic > critical value
If H0 is rejected, there is a significant difference (effect)
between the 2 samples

Matched Pairs Example 1

Assume you send your salespeople to a customer service
training workshop. Has the training made a difference in the
number of complaints? You collect the following data:
$t = \dfrac{\bar{d}}{S_d / \sqrt{n}}$ & $S_d = \sqrt{\dfrac{\sum (d_i - \bar{d})^2}{n - 1}}$, with $\bar{d} = \dfrac{\sum d_i}{n}$

Example 1: Sol, hypothesis
After - Before
$H_0: \mu_d = 0$, no difference in No. of complaints; training is bogus
$H_a: \mu_d \neq 0$, training is effective/positive

Example 1: Sol, find $\bar{d}$ & $S_d$
Example 1: Sol, find $\bar{d}$ & $S_d$ ..
Number of Complaints: (2) - (1)
Salesperson   Before (1)   After (2)   Difference, d_i
C.B.               6            4            -2
T.F.              20            6           -14
M.H.               3            2            -1
R.K.               0            0             0
M.O.               4            0            -4
Total                                       -21
$\bar{d} = \dfrac{\sum d_i}{n} = -4.2$
$S_d = \sqrt{\dfrac{\sum (d_i - \bar{d})^2}{n - 1}} = 5.67$

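The arithmetic in the complaints table can be reproduced step by step in R. This sketch types in the before and after counts from the table and applies the formulas for the mean difference, its standard deviation and the t statistic; the variable names are illustrative.

```r
before <- c(6, 20, 3, 0, 4)    # complaints before training
after  <- c(4, 6, 2, 0, 0)     # complaints after training

d <- after - before            # -2 -14 -1 0 -4
n <- length(d)
d_bar <- sum(d) / n                            # -4.2
s_d <- sqrt(sum((d - d_bar)^2) / (n - 1))      # about 5.67, same as sd(d)
t_cal <- d_bar / (s_d / sqrt(n))               # about -1.66
```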
Example 1: Sol, find t & t-critical
$t = \dfrac{\bar{d}}{S_d / \sqrt{n}} = \dfrac{-4.2}{5.67 / \sqrt{5}} = -1.66$, i.e. t-calculated
$t_{n-1,\alpha} = t_{5-1,\,0.05} = 2.776$, i.e. t-tabulated

Example 1: Sol, find the p-value
Use t = |-1.66| to find the p-value. Scan the df = 4 row of the table
Remember it is a two-tailed test
1.66 falls between 1.533 & 2.132, so the p-value lies between 0.20 & 0.10

Example 1: Sol, compare & conclude
Has the training made a difference? Due to chance?
Since the t statistic of |-1.66| is < t critical of 2.776 (or
p > 0.10 > $\alpha$ = 0.05), we don't reject the H0 at the 5%
significance level.
No significant difference between complaints before & after.
No evidence that the training worked. Any difference may be by chance.

Summary of results
Has the training made a difference in the number of
complaints (at the $\alpha$ = 0.05 level)?
$H_0: \mu_x - \mu_y = 0$
$H_1: \mu_x - \mu_y \neq 0$
$\alpha$ = .05, $\bar{d}$ = -4.2
d.f. = n - 1 = 4
Critical value = ±2.776 (reject regions of $\alpha/2$ each: t < -2.776 or t > 2.776)
Test statistic: $t = \dfrac{\bar{d}}{s_d / \sqrt{n}} = \dfrac{-4.2}{5.67 / \sqrt{5}} = -1.66$
Decision: Do not reject H0
(t stat is not in the reject region)

Doing paired t-test in R software

Arrange data the normal way from excel. You can ignore the
names of salespersons
# assumes the copied columns are named before and after
complaint <- read.delim("clipboard")  # load data copied from Excel (Windows clipboard)
complaint  # view the data. feel it
boxplot(complaint$after, complaint$before, col = "darkgreen")  # check a boxplot of the data
t.test(complaint$after, complaint$before, paired = TRUE)  # run the paired t-test
Learn the independent t-test for independent samples on your
own

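For readers without the Excel sheet, the same paired test can be run on data typed in by hand. This is a minimal sketch that assumes a data frame with columns `before` and `after` holding the complaint counts from the worked example.

```r
complaint <- data.frame(
  before = c(6, 20, 3, 0, 4),
  after  = c(4, 6, 2, 0, 0)
)

res <- t.test(complaint$after, complaint$before, paired = TRUE)
res$statistic     # t about -1.66
res$parameter     # df = 4
res$p.value       # between 0.10 and 0.20, so H0 is not rejected at 5%
res$estimate      # mean of the differences, -4.2
```

The exact p-value reported by `t.test()` is more precise than the range read off the printed t table, but the decision at the 5% level is the same.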
Practice Question 1
A new therapy has been devised which is supposed to
lower blood pressure. The systolic blood pressure of 10
patients was taken before and after completing the course
(see below). Does this therapy work? Use a significance
level of 0.05.
Before   After   Difference
120      130      10
129      129       0
131      125      -6
136      128      -8
122      124       2
138      129      -9
139      130      -9
131      132       1
123      129       6
125      130       5

Practice Question 2
Tweaa is the VC of a large manufacturing company. He recently
noticed an increase in absenteeism that he thinks is related to
the general health of employees. Four years ago, in an attempt
to improve the situation, he began a fitness program in which
employees exercise during their lunch hour. To evaluate the
program, he randomly sampled some participants & found the
number of days each was absent. Below are the results. At the 0.05
significance level, did the program reduce absenteeism?

Absenteeism before & after fitness prog
Practice Question 2 cont'
a) State the null & alternative hypotheses.
b) What is the critical value of the test statistic?
c) What is the value of the test statistic?
d) Decide using the critical value approach.
e) What is the p-value?
f) Did the fitness program reduce absenteeism at the 5%
significance level? Conclude practically.

Session 5 Overview
In some practical applications the normality assumption is not tenable,
especially when we have a wide range of distributions of the
parent population.
In such a case, we use nonparametric tests or distribution free
tests.
In this session we study nonparametric tests for testing equality of
means/medians of 2 population distributions

Session 5 Outline
Mann-Whitney-U test or Wilcoxon rank sum test
Draw appropriate conclusions

Reading List
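Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011). Statistics for
Business and Economics (11th ed.). South-Western Cengage Learning
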

Nonparametric Mann-Whitney U test
The Mann-Whitney U test, or Wilcoxon Rank Sum test, is for
testing the equality of means in 2 independent / unpaired
samples. It is also called the Mann-Whitney-Wilcoxon (MWW) test.

Nonparametric Mann-Whitney U test
The main idea is to test whether 2 samples come from the same
population (i.e., if 2 populations have the same shape) by
comparing the ranks or ordinal values of the observations.
Some investigators interpret this test as comparing the medians
between the 2 populations.
H0 & Ha for Mann-Whitney U (1)
$H_0: m_1 - m_2 = 0$
$H_a: m_1 - m_2 \neq 0$
If the median (m) of sample 1 is > sample 2, Ha becomes
$H_a: m_1 - m_2 > 0$ or $H_a: m_1 > m_2$

H0: The 2 populations are identical
Ha: The 2 populations are different
Example
H0: efficiency does not differ between foreign & domestic banks.
Ha: foreign banks have higher efficiency scores than domestic
banks.

Median class size for Math is larger than median class size for
English. Write H0 & Ha ____
H0: MedianM $\le$ MedianE (Math median is not greater than
English median)
HA: MedianM > MedianE (Math median is larger)

Median class size for Math is not at least that of the median
class size for English. Write H0 and Ha ____
H0: MedianM $\ge$ MedianE
HA: MedianM < MedianE

Example 1: Mann-Whitney U (1)

Korle Bu hospital is undertaking a clinical trial designed to
investigate the effectiveness of a new drug to reduce symptoms
of asthma in children. Participants are asked to record the
number of episodes of shortness of breath (dyspnea) over a 1
week period following receipt of the assigned treatment. The
non-normally distributed data are shown below. Is there a
difference in the number of episodes of shortness of breath over
the 1 week period in participants receiving the new drug as
compared to those receiving the placebo?

By inspection, it appears that participants receiving the placebo have more episodes of shortness of breath, i.e. suffer more from dyspnea
BUT, is this statistically significant?
Plot (histogram of the data; figure not reproduced)
In normal research, check skewness 1st.
Note: $n_1$ may be > $n_2$.
Check normality: histogram above
Given non-normality & small sample size ($n_1 = n_2 = 5$), a nonparametric test is OK.
Hypotheses??
$H_0: m_1 - m_2 = 0$; $H_a: m_1 - m_2 \neq 0$
Step 1: Assign ranks
Order the whole data (n = 10) from smallest to largest, ignoring which group they belong to.
Assign ranks from 1 to 10.
Also, track the group assignments in the total sample.
MWW U test procedure
Step 2: Sum the ranks
$R_1$ = sum of the ranks in group 1 (placebo) = 37
$R_2$ = sum of the ranks in group 2 (new drug) = 18
Check if: $R_1 + R_2 = \frac{n(n+1)}{2}$
$37 + 18 = (10 \times 11)/2 = 55$
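The ranking and rank-sum steps can be sketched in R. The original data table is not reproduced in these notes, so the values below are illustrative assumptions, chosen only so that the rank sums match the 37 and 18 quoted on the slide; `placebo`, `newdrug` and `grp` are hypothetical names.

```r
# Illustrative values (assumed, not the slide's table): 5 children per group
placebo <- c(7, 5, 6, 4, 12)   # episodes of dyspnea, placebo group
newdrug <- c(3, 6, 4, 2, 1)    # episodes of dyspnea, new-drug group

# Step 1: pool the data and rank from smallest to largest;
# tied values share the average of their ranks
pooled <- c(placebo, newdrug)
grp    <- rep(c("placebo", "newdrug"), each = 5)
ranks  <- rank(pooled)

# Step 2: sum the ranks within each group
R1 <- sum(ranks[grp == "placebo"])
R2 <- sum(ranks[grp == "newdrug"])

# Check: R1 + R2 must equal n(n + 1)/2
n <- length(pooled)
c(R1 = R1, R2 = R2, total = R1 + R2, check = n * (n + 1) / 2)
```

Because `rank()` averages tied ranks, the two rank sums always add up to n(n + 1)/2, which is a quick way to catch a ranking mistake.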
Test Statistic for the Mann-Whitney U Test
The Mann-Whitney U calculated is the test statistic
$U_1 = R_1 - \frac{n_1(n_1+1)}{2}$
$U_2 = R_2 - \frac{n_2(n_2+1)}{2}$
Details of U
$\frac{n_1(n_1+1)}{2}$ = sum of the ranks of the observations that would result if all the observations in sample 1 were smaller than the observations in sample 2.
$\frac{n_2(n_2+1)}{2}$ = sum of the ranks of the observations that would result if all the observations in sample 2 were smaller than the observations in sample 1.
Results
Finding critical values
The critical value or U tabulated is $U_{n_1,n_2,\alpha}$
Find $n_1$ in group 1 (smallest group) along the top of the chart.
Find $n_2$ in group 2 (largest group) along the side of the chart.
Consider $\alpha$ carefully when choosing the U tab
Decision Rule for U
Two-sided test: ($H_A: m_1 \neq m_2$) reject $H_0$ if $U_{min} \le U_{n_1,n_2,\alpha/2}$
One-sided test: ($H_A: m_1 < m_2$) reject $H_0$ if $U_1$ is too small, i.e. if $U_1 \le U_{n_1,n_2,\alpha}$
One-sided test: ($H_A: m_1 > m_2$) reject $H_0$ if $U_2$ is too small, i.e. if $U_2 \le U_{n_1,n_2,\alpha}$
In summary, if the smaller calculated U is $\le$ the tabulated U, reject $H_0$ & conclude there is a difference!
Back to example
$U_1 = R_1 - \frac{n_1(n_1+1)}{2} = 37 - \frac{5(5+1)}{2} = 22$
$U_2 = R_2 - \frac{n_2(n_2+1)}{2} = 18 - \frac{5(5+1)}{2} = 3$ (smaller)
Check always IF: $U_1 + U_2 = n_1 n_2$
$22 + 3 = 5 \times 5 = 25$
Decision
Recall: For a two-tail test, reject $H_0$ if $U_{min} \le U_{n_1,n_2,\alpha/2}$
$U_{min} = 3 > U_{5,5,0.05} = 2$
We can't reject $H_0$ because 3 > 2.
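The U statistics and the decision can be reproduced from the rank sums alone. This is a minimal sketch: the rank sums, sample sizes and tabulated critical value are taken from the slides, while the variable names are only illustrative.

```r
# Rank sums and sample sizes from the worked example
R1 <- 37; R2 <- 18
n1 <- 5;  n2 <- 5

U1 <- R1 - n1 * (n1 + 1) / 2
U2 <- R2 - n2 * (n2 + 1) / 2
stopifnot(U1 + U2 == n1 * n2)    # sanity check: U1 + U2 = n1 * n2

U_min  <- min(U1, U2)
U_crit <- 2                      # tabulated two-tailed value for n1 = n2 = 5, alpha = 0.05
reject_H0 <- U_min <= U_crit     # FALSE here, so H0 is not rejected
```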
Conclusion
Conclude: we do not have statistically significant evidence at $\alpha = 0.05$ to show that the two medians of numbers of episodes of shortness of breath differ.
That is, we cannot reject the hypothesis of no difference in the medians. Our example is unique.
On the surface, sample data suggest a difference, but the sample sizes are too small to conclude that there is a statistically sig. difference.
R for MWU & Wilcoxon
1st: Arrange data long (vertical, with 1,0 for next column)
wm <- read.delim("clipboard") ## load data
wm ## view data
boxplot(drug ~ group, data = wm, col = "green") ## quick boxplot by group
wilcox.test(drug ~ group, conf.int = TRUE, paired = FALSE, data = wm) ## compute Mann-Whitney test
R interpretation for U (optional)
## result: W = 22, p-value = 0.05855; alternative hypothesis: true location shift is not equal to 0.
## So there is no significant difference: we can't reject H0 as p-value > alpha = 0.05.
wilcox.test(drug ~ group, alternative = "two.sided", conf.int = TRUE, paired = TRUE, data = wm) ## compute Wilcoxon signed rank test with continuity correction (for paired samples only; newer R versions may require the two-vector form wilcox.test(x, y, paired = TRUE))
Practice Assignment 1
A random sample of starting monthly salaries for graduates from
2 Ghanaian universities are below (GHc1000s):
UG KNUST
30
28.5
35
38
29
30.5
37.5
26
32
37
40
29
33
32
Practice Assignment 1 cont.
Is there enough evidence at the 5% significance level to say that the median starting salary for UG graduates is higher than the median starting salary for KNUST graduates? Solve by hand &/or any software (e.g. R)
Session 6 Overview
We expand the samples to 3 or more & explore whether significant differences exist among them.
This is achieved via ANOVA (dependent and independent) & its nonparametric analogues, the Friedman test & the Kruskal-Wallis test
Session 6 outline
null & alternative hypotheses in ANOVA

repeated ANOVA & independent ANOVA
Within & Between variations, F statistic,
ANOVA in excel & R
T-Test
ANOVA
MANOVA
Assumptions for t-test, anova, manova

Normality
Continuous or scale or ratio data
Categorical groupings
Independent variables across groups
Independent random sampling
Homogeneity of variance for dependent variable
ANOVA
One-way ANOVA: One factor e.g. smoking status (never,

former, current)
Two-way ANOVA: Two factors e.g. gender (M/F) & smoking
status (never, former, current)
Three-way ANOVA: Three factors e.g. gender, smoking &
beer consumption
ANOVA test - dependent/repeated measures
One-way repeated (correlated) measures: single group on which you have measured something a few times.
Its nonparametric analogue is the Friedman test.
For example, in research class, I give a test at the start of a topic, at the end of the topic and at the end of the subject.
I can use a one-way dependent measures ANOVA to see if student test performance changed over time
ANOVA test - Independent
In an independent groups test, the subjects in the groups are different people.
In a dependent/repeated measures case, the same subjects are being tested under different conditions. They are the same people.
(It doesn't have to be people; they could be flowers, car engines, firms or even barrels of beer)
Is this dependent or independent sample data

Compare 5 subjects each tested with 4 drugs. drug is the
repeated variable here
ANOVA (independent sample)
One-Way Analysis of Variance tests for differences among the means of three or more groups
Examples:
ROA of foreign, domestic & state banks
Expected mileage for 5 brands of tires
ANOVA (hypotheses)
$H_0: \mu_1 = \mu_2 = \mu_3 = ... = \mu_K$
All population means are equal, i.e., no variation in means between groups
$H_A: \mu_i \neq \mu_j$ for at least one i, j pair
At least 1 population mean is different, i.e., there is variation between groups
Does not mean all population means are different (some pairs may be the same)
ANOVA (3)
ANOVA (4)
ANOVA chart
Details of ANOVA chart (1)
SS = Sum of Squares
df = degrees of freedom
MS = Mean Squares
n = sum of the sample sizes from all groups
K = number of groups
Explaining Table Elements
SST
$SST = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2$
$n_i$ = number of observations in group i
$x_{ij}$ = jth observation from group i
$\bar{x} = \frac{1}{n} \sum_{i=1}^{K} n_i \bar{x}_i$ = overall sample mean
$SST = (x_{11} - \bar{x})^2 + (x_{12} - \bar{x})^2 + ... + (x_{Kn_K} - \bar{x})^2$
$MST = \frac{SST}{n-1}$
Mean Square Total = SST/df
SSW
$SSW = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
$n_i$ = sample size from group i
$\bar{x}_i$ = sample mean from group i
$x_{ij}$ = jth observation in group i
$SSW = (x_{11} - \bar{x}_1)^2 + (x_{12} - \bar{x}_1)^2 + ... + (x_{Kn_K} - \bar{x}_K)^2$
$MSW = \frac{SSW}{n-K}$
Mean Square Within = SSW/df
SSG
$SSG = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2$
$n_i$ = sample size from group i
$\bar{x}_i$ = sample mean from group i
$\bar{x}$ = grand mean (mean of all data values)
$SSG = n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2 + ... + n_K(\bar{x}_K - \bar{x})^2$
$MSG = \frac{SSG}{K-1}$
Mean Square Between Groups = SSG/df
ANOVA F-Test Statistic
Interpreting the F Statistic
The F statistic is the ratio of the between estimate of variance and the within estimate of variance
The ratio must always be positive
$df_1 = K - 1$ will typically be small
$df_2 = n - K$ will typically be large
Decision Rule: Reject $H_0$ if $F > F_{K-1,\,n-K,\,\alpha}$
Decision: ANOVA
Example of ANOVA
Below are the salaries in thousands of Ghana cedis for three different MBA groups:
RMBA WMBA EMBA
1 4 7
2 5 8
3 6 9
ANOVA E.G.: MBA groups Questions
i. Define null and alternative hypotheses.
ii. Find the critical value of test statistic
iii. Find the value of the test statistic.
iv. What is your decision?
v. Is there a statistically significant difference between the mean salaries of these MBA groups? Prove.
vii. If so, which group earns the least? Conclude practically.
ANOVA Table
MBA groups Questions (1)
i. Define null and alternative hypotheses.
ANSWER
$H_0: \mu_1 = \mu_2 = \mu_3$
$H_a: \mu_i \neq \mu_j$ for at least one pair, where i, j = 1, 2, 3
ii. Find the critical value of F test statistic
$F_{critical} = F_{K-1,\,N-K,\,\alpha}$
K = number of groups, $df_1 = K - 1$, row
N = total sample from all groups, $df_2 = N - K$, column
$F_{tab} = F_{K-1,\,N-K,\,\alpha} = F_{3-1,\,9-3,\,0.05} = F_{2,6,0.05}$
$F_{tab} = 5.14$
ANOVA Table
Recall data
RMBA WMBA EMBA
1 4 7
2 5 8
3 6 9
iii. Find the value of the test statistic.
First: Find $\bar{x}_i$, then grand mean
$\bar{x}_1 = 6/3 = 2$
$\bar{x}_2 = 15/3 = 5$
$\bar{x}_3 = 24/3 = 8$

Grand mean: $\bar{x} = \frac{\sum x_{ij}}{N} = \frac{1+2+3+4+5+6+7+8+9}{9} = 5$
$SSG = \sum_{i=1}^{K} n_i (\bar{x}_i - \bar{x})^2$
$SSG = 3(2-5)^2 + 3(5-5)^2 + 3(8-5)^2$
SSG = 54
$SSW = \sum_{i=1}^{K} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$
$SSW_1 = (1-2)^2 + (2-2)^2 + (3-2)^2$
$SSW_1 = 2$
$SSW_2 = (4-5)^2 + (5-5)^2 + (6-5)^2 = 2$
$SSW_3 = (7-8)^2 + (8-8)^2 + (9-8)^2 = 2$
SSW = SSW1 + SSW2 + SSW3
SSW = 2 + 2 + 2
SSW = 6
     SS  df  MS  F
BG   54
WG   6

$MSG = \frac{SSG}{K-1}$ and $MSW = \frac{SSW}{N-K}$
$MSG = 54/2 = 27$
$MSW = 6/6 = 1$
     SS  df  MS  F
BG   54  2   27
WG   6   6   1

$F = \frac{MSG}{MSW} = \frac{27}{1} = 27$
     SS  df  MS  F
BG   54  2   27  27
WG   6   6   1
iv. What is your decision?
Decision Rule: Reject $H_0$ at 5% since F = 27 > Ftab = 5.14
v. Is there a statistically significant difference between the mean salaries of these MBA groups? Prove.
YES since we rejected H0, it means there is a difference
vii. If so, which group earns the least? Conclude practically.
RMBA with lowest mean of 2
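The hand calculation for the MBA example can be checked in base R. This is a sketch under one assumption: the salary values 1 to 9 are inferred from the worked sums on the slides (for example SSW1 = (1-2)^2 + (2-2)^2 + (3-2)^2), and all variable names are illustrative.

```r
# Salaries implied by the worked sums (RMBA 1-3, WMBA 4-6, EMBA 7-9)
salary <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
group  <- rep(c("RMBA", "WMBA", "EMBA"), each = 3)

K <- length(unique(group))                 # number of groups
N <- length(salary)                        # total sample size
grand_mean  <- mean(salary)
group_means <- tapply(salary, group, mean)
group_sizes <- tapply(salary, group, length)

SSG <- sum(group_sizes * (group_means - grand_mean)^2)  # between groups
SSW <- sum((salary - group_means[group])^2)             # within groups
MSG <- SSG / (K - 1)
MSW <- SSW / (N - K)
F_cal <- MSG / MSW
F_tab <- qf(0.95, df1 = K - 1, df2 = N - K)             # 5% critical value

c(SSG = SSG, SSW = SSW, F_cal = F_cal, F_tab = round(F_tab, 2))
```

`qf()` replaces the printed F table, so the same lines work for any number of groups or sample size.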
one-way ANOVA in excel
For independent samples:

Data Analysis > ANOVA: Single Factor
For dependent/repeated measures samples:
Data Analysis > ANOVA: Two- Factor Without Replication
MBA groups Excel
Example 2: Attempt
Example 2: Guide (1)

Club 1  Club 2  Club 3
254     234     200
263     218     222
241     235     197
237     227     206
251     216     204
$\bar{x}_1 = 249.2$, $n_1 = 5$
$\bar{x}_2 = 226.0$, $n_2 = 5$
$\bar{x}_3 = 205.8$, $n_3 = 5$
$\bar{x} = 227.0$, n = 15, K = 3
$SSG = 5(249.2 - 227)^2 + 5(226 - 227)^2 + 5(205.8 - 227)^2 = 4716.4$
$SSW = (254 - 249.2)^2 + (263 - 249.2)^2 + ... + (204 - 205.8)^2 = 1119.6$
MSG = 4716.4 / (3-1) = 2358.2
MSW = 1119.6 / (15-3) = 93.3
$F = \frac{2358.2}{93.3} = 25.275$
Example 2: Guide in excel
For independent samples:

Data Analysis > ANOVA: Single Factor
For dependent/repeated measures samples:
Data Analysis > ANOVA: Two- Factor Without Replication
Example 2: Guide steps in excel
Example 2: Guide: Excel Output
EXCEL: data | data analysis | ANOVA: single factor
SUMMARY
Groups   Count  Sum   Average  Variance
Club 1   5      1246  249.2    108.2
Club 2   5      1130  226      77.5
Club 3   5      1029  205.8    94.2
ANOVA
Source of Variation  SS      df  MS      F       P-value   F crit
Between Groups       4716.4  2   2358.2  25.275  4.99E-05  3.89
Within Groups        1119.6  12  93.3
Total                5836.0  14
Arrange data for anova in R
one-way ANOVA in R (1)
## Arrange data in excel with 1st column being ratio variables (where 1st group numbers come first followed by next & so on). This is for independent sample.
## The 2nd column has group names, with 1st group names first.

anovaclub = read.delim("clipboard") ## This loads the anova data from clipboard
save(anovaclub, file = "data") # save loaded data
str(anovaclub) ## show the structure of the data: 3 groups, ordered as in excel (I've already done this in excel)
anovaclub ## view data
boxplot(distance ~ club, data=anovaclub) ## Create side-by-side boxplots of the data. You'll see the groups don't share a common centre
results = aov(distance ~ club, data=anovaclub) ## fit ANOVA model or table. # distance is the dep. var. # club = indep. var.
summary(results) ## show your answer
Results
Conclusion?? As p-value = 4.99E-05 < $\alpha$ = 0.05, there is a sig. diff. in mean distances of 3 clubs
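A self-contained version of the club example avoids the clipboard step. The distances are the ones listed in the Example 2 guide; the column names `distance` and `club` follow the R commands on the slides, and building the data frame by hand is only one possible way to load the data.

```r
# Club distances from Example 2, in long format (one row per observation)
anovaclub <- data.frame(
  distance = c(254, 263, 241, 237, 251,    # Club 1
               234, 218, 235, 227, 216,    # Club 2
               200, 222, 197, 206, 204),   # Club 3
  club = factor(rep(c("Club 1", "Club 2", "Club 3"), each = 5))
)

results <- aov(distance ~ club, data = anovaclub)  # fit one-way ANOVA
tab <- summary(results)[[1]]                       # ANOVA table as a data frame
F_cal <- tab[["F value"]][1]
p_val <- tab[["Pr(>F)"]][1]
tab
```

The sums of squares, F statistic and p-value in `tab` can be compared line by line with the Excel output.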
Summary: Parametric vs Nonparametric tests

Parametric test
Nonparametric
One-sample t-test
Binomial
Independent t-test
Kruskal-Wallis test
Friedman Test
Practice Assignment 1 on ANOVA
Below are the salaries in thousands of Ghana cedis for three different MBA groups:
RMBA: 9, 7, 11, 9, 12, 10
WMBA: 13, 20, 14, 13
EMBA: 10, 9, 15, 14, 15
Practice Assignment 1 on ANOVA cont.
a. Formulate the null and alternative hypothesis
b. What is the decision rule, given the 5% significance level?
c. Find the critical value of the test statistic
d. Compute the value of the test statistic
Practice Assignment 1 on ANOVA cont..
e. Conclude if there is a statistically significant difference in the mean salaries between these MBA groups. If so, which group earns the least? Conclude practically.
f. Determine what is driving the difference in means by performing multiple tests for each pairwise difference
g. Which group's mean salary is mostly driving the difference?
Practice Assignment 2 on ANOVA
Dr. Asuo had the students in his research class rate his performance as Excellent, Good, Fair, or Poor. The rating (i.e. the treatment) a student gave the doctor was matched with his or her course grade, which could range from 0 to 100. The sample information is reported below. Is there a difference in the mean score/grade of the students in each of the four rating categories? Use the .05 significance level.
Assignment cont.
Assignment cont.
a. Formulate the null and alternative hypothesis

b. What is the decision rule
c. Calculate the critical value of the test statistic
d. Compute the value of the test statistic
Assignment cont.
e. Is there a statistically significant difference in the mean score of the students across the four rating categories? If so, which category of students scored the most? Conclude practically.
f. Determine what is driving the difference in means by performing multiple tests for each pairwise difference
Session 7 Overview
Research projects and managerial decisions often involve the linkages between two or more variables. For instance, what is the relationship between blood pressure and a person's weight? Do other factors (age, stress, diet, exercise etc.), apart from weight, affect BP? How do you control these? How do you use each of them to predict BP? What assumptions must be in place for the modelling of such an association to exist? This session examines association via correlation and causality via regression.
Session 7 outline
correlation, simple regression & multiple regression
predictors & outcome variable, scatter plots, error term,
Goodness-of-fit, significance tests, confidence intervals, p-value,
assumptions: linearity, normality, heteroskedasticity, autocorrelation, heterogeneity, multicollinearity etc
Reading List
Chap 14 & 15 of Anderson, D.R., Sweeney, D.J., & Williams, T.A. (2011). Statistics for Business and Economics (11th ed.). South-Western Cengage Learning
Chap 2-8 of Gujarati, D. (2003). Basic Econometrics (4th ed.)
Chap 11 & 12 of Newbold, P., Carlson, W., & Thorne, B. (2013). Statistics for Business and Economics (8th ed.). Pearson
Chap 1-7 of Wooldridge, J.M. (2013). Introductory Econometrics: A Modern Approach (5th ed.)
What is Correlation?
Bivariate Correlation: a way of representing the relationship

between 2 variables
Correlations are viewed as strong or weak
brother or sister; how about cousins?

Correlations are viewed also as positive or negative
consumption & income ; price & quantity demanded
Is the link positive or negative?
Is this link positive or negative?
Correlation Coefficient
Correlation coefficient is a statistic that quantifies a relation between two variables
Falls between -1.00 and 1.00
The absolute value of the number (not the sign) indicates the strength of the relation
Pearson's Product Moment Correlation Coefficient ($\rho$) is a measure of correlation
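A short sketch shows how the Pearson coefficient is computed and that it agrees with R's built-in `cor()`. The two vectors are hypothetical values made up for illustration; they are not course data.

```r
# Hypothetical paired observations, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Pearson's r: co-variation of x and y, scaled by their separate variations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)   # method = "pearson" is the default
c(manual = r_manual, builtin = r_builtin)
```

The sign gives the direction of the link and the absolute value its strength.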
Types of Correlations: incl curvilinear
What is Regression?
Regression analysis is a statistical technique used to analyze the nexus between 1 outcome/dependent/explained/response variable & 1 (several) predictor/independent/explanatory variables, leading to simple (multiple) regression. It deals with causation whilst correlation deals with association. Variables must be metric. In certain situations, non-metric (qualitative) IVs can be incorporated using transformations in a dummy variable regression. E.g.: education's effect on income, impact of hours of study on students' grades
Simple vs. Multiple Regression Models
Simple regression: $y_i = \alpha + \beta x_i + u_i$
Multiple model: $y_i = \beta_0 + \beta_1 A_i + \beta_2 B_i + \beta_3 C_i + \beta_4 D_i + \beta_5 E_i + u_i$
$i = 1, ..., n$ = individual (group, country) index.
This is for a cross-sectional data set
Variables in a regression model
yi = response variable for observation i

A, B, ...E = values of k regressors for observation i
ui = random noise or idiosyncratic error for observation i
Real examples: Regression Model vs. Equation
$wage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 pos_i + \beta_4 ability_i + u_i$
$BP_i = \beta_0 + \beta_1 age_i + \beta_2 weight_i + \beta_3 stress_i + \beta_4 pulse_i + u_i$
$Q^d_i = \beta_0 + \beta_1 P_i + \beta_2 Y_i + \beta_3 A_i + \beta_4 C_i + u_i$
$\widehat{wage} = 150 + 0.85\,educ + 2.4\,exper + 0.25\,pos + 0.05\,ability$
Assumptions
Linearity: y has a linear dependence on x
Normality & Homoscedasticity: Errors are stochastic & variances are homogeneous: $u \sim N(0, \sigma^2 I)$
No Autocorrelation: errors are independent / orthogonal
No high multicollinearity among the IVs; we also need exogeneity of IVs.
Dummy Variables Regression
A dummy variable is a categorical independent variable with 2

levels:
If more than 2 levels, the number of dummy variables needed is
= number of levels - 1
Also called categorical or dichotomous or binary or 0-1 variable.
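The "number of levels - 1" rule can be seen directly in R, which builds the dummies itself when a categorical variable enters a model. In this sketch the `occu` vector is hypothetical; its three levels mirror the teaching / fishing / trading example used later in the course.

```r
# Hypothetical occupation variable with 3 levels
occu <- factor(c("teaching", "fishing", "trading", "fishing", "teaching"))

# model.matrix() expands the factor into 0-1 dummy columns plus an intercept
X <- model.matrix(~ occu)
X

n_dummies <- ncol(X) - 1   # drop the intercept column: 3 levels -> 2 dummies
```

The level that gets no column (here "fishing", the first alphabetically) is the reference category against which the dummy coefficients are interpreted.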
Interpret dummies: ROA of firms
$ROA_i = 500 + 20\,Profit_i + 30\,Type_i + 5\,Assets_i$
where Type is 1 if the firm is domestic or 0 if foreign
Interpret the coefficient of profit, assets & type.
On average, ROA was 30 units greater when the firm was domestic than foreign, given profit & assets.
Example
Brukutu Ventures, a local gin distillery, is concerned about the demand for its favourite gin bitters. Demand in most of its retail shops has been hit hard due to new entrants into the market. Management is concerned & wants to determine which predictors better explain the quantity demanded (Q) of gin bitters, apart from the price (P) of gin. Of the determinants of demand, the following were identified: price of complements (C), average real income of customers (M) ...
Data: cross-sectional or panel?
Example cont'
location of retail shop (where LU=1 if location is urban or 0 if rural area); dominant occupation (teaching, fishing & trading) around a retail shop (where OT=1 if the occupation is teaching or 0 otherwise & OTR=1 if the occupation is trading or 0 otherwise); & finally, the dominant religion (Christians, Muslims & Buddhists) of the people (where RC=1 if the people are Christians or 0 otherwise & RM=1 if the people are Muslims or 0 otherwise). Use the correlation matrix & (dummy) regression output to answer the questions that follow.
Data analysis in excel
How to generate a descriptive statistics
In excel choose: Data Tab, Data Analysis>Descriptive

Statistics> OK> Input Range
Copy all data including headings.
Tick Labels in First Range
Select Output Range & where the results must be
Tick summary statistics>OK
Do descriptive statistics in excel
Pearson: Data Analysis > Correlation > OK > Input Range
Pearson Correlations: Use it to Ans below
Correlation questions
What's the correlation coefficient for A & P, P & S?
Which 3 pairs of variables are perfectly correlated?
Which 2 pairs of variables are most strongly correlated with the response variable?
Q & P (-0.87)
Q & OTR (-0.78)
Correlation questions cont'

Which 3 pairs of variables are mostly multicollinear?
P & OTR (0.76)
P & C (-0.74)
M & RM (-0.73)
Which 3 pairs of variables are least correlated?
M & OT (-0.03)
Correlation questions cont'
OT & RC (0.05)
Q & OT (-0.11)
From matrix, identify 2 very good predictors
P (-0.87)
OTR (-0.78)
Regression in excel: Data Analysis > Regression > OK > Input Y Range > Input X Range
Regression output
Session 8 Overview: Regression continues
This session continues with questions worth asking and answering in multiple regression analysis
Regression output specially designed
Goodness of fit: Is the model wise?
ESS = Error Sum of Squares: measures the residual variation in the data that is not explained by the independent variables
RSS = Regression Sum of Squares: measures amount of variation explained by regression
TSS = Total Sum of Squares
Is model well-fit? Coefficient of determination $R^2$
$R^2$ is the fraction of the total sample variation in Y that is explained by the OLS regression line.
$R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}$
$0 \le R^2 \le 1$
If $R^2$ is large, much variability in Y is explained by X
Also, Multiple $R = \sqrt{R^2}$ is the multiple correlation coefficient
Adjusted $R^2$ (1)
When explanatory variables are added to the model, the $R^2$ never decreases. But it should, because of the 'curse of dimensionality'.
The wish to penalize models with large K has motivated an adjusted $R^2$, defined by adjusting for the degrees of freedom.
Both $R^2$ & adjusted $R^2$ measure the strength or fitness of the regression model.
Adjusted $R^2$ (2)
$\bar{R}^2 = 1 - \frac{ESS/(n-K-1)}{TSS/(n-1)}$
$\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-K-1}$
With $\bar{R}^2$, as K rises, ESS & DoF both fall
Some Formulae (1)
K = No. of indep. variables (excl. intercept)
$t_j = \frac{\hat{\beta}_j}{S.E.(\hat{\beta}_j)}$
$\hat{\beta}_j \pm t_{n-K-1,\,\alpha/2}\, S_{\hat{\beta}_j}$
TSS = ESS + RSS
$N = TSS_{df} + 1$
$R^2 = \frac{RSS}{TSS}$
Some Formulae (2)
Multiple $R = \sqrt{R^2}$
$MSR = \frac{RSS}{K}$
$MSE = \frac{ESS}{N-K-1}$
$\bar{R}^2 = 1 - (1 - R^2)\frac{N-1}{N-K-1}$
$F = \frac{MSR}{MSE} \sim F_{K,\,N-K-1,\,\alpha}$
Find missing numbers denoted by letters
G = RSS = TSS - ESS = 34600 - 402.65 = 34197.35
B = $R^2 = \frac{RSS}{TSS} = \frac{34,197.35}{34600} = 0.99$
A = Multiple $R = \sqrt{R^2} = \sqrt{0.99} = 0.99$
D = N = $TSS_{df}$ + 1 = 39 + 1 = 40
E = $RSS_{df}$ = 8
F = $TSS_{df}$ = 8 + 31 = 39
Fill in (2)
C = $\bar{R}^2 = 1 - (1 - R^2)\frac{N-1}{N-K-1}$
C = 1 - (1 - 0.99)(39)/(31) = 0.99
H = MSR = RSS/K = 34197.35/8 = 4274.67
I = MSE = ESS/(N-K-1) = 402.65/31 = 12.99
J = F = MSR/MSE = 4274.67/12.99 = 329.07
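The fill-in arithmetic can be reproduced in a few lines of R. This sketch takes TSS, ESS, N and K from the slides; the variable names are illustrative. Because nothing is rounded along the way, the F value may differ from the slide's 329.07 in the second decimal place.

```r
TSS <- 34600      # total sum of squares
ESS <- 402.65     # error (residual) sum of squares
N   <- 40         # sample size
K   <- 8          # number of independent variables (excl. intercept)

RSS   <- TSS - ESS                               # regression sum of squares
R2    <- RSS / TSS                               # coefficient of determination
adjR2 <- 1 - (1 - R2) * (N - 1) / (N - K - 1)    # adjusted R-squared
MSR   <- RSS / K
MSE   <- ESS / (N - K - 1)
F_cal <- MSR / MSE

round(c(RSS = RSS, R2 = R2, adjR2 = adjR2, MSR = MSR, MSE = MSE, F = F_cal), 2)
```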
Fill in (3)
$t_j = \frac{\hat{\beta}_j}{S.E.(\hat{\beta}_j)}$
K = coeff = t × SE = -2.01 × 1.57 = -3.16
L = SE = coeff / t = -0.98 / -1.30 = 0.75
M = t = coeff / SE = -19.43 / 4.92 = -3.95
N: read the p-value for |t| = 1.35; N = 0.20
O: read the p-value for |t| = 8.58; O = 0.00
Regression Q&A (1)
Specify Brukutu Ventures' regression model
$Q = \beta_0 + \beta_1 P + \beta_2 C + \beta_3 M + \beta_4 LU + \beta_5 OT + \beta_6 OTR + \beta_7 RC + \beta_8 RM + u$
Write the estimated regression equation.
$\hat{Q} = 280.27 - 3.16P - 0.11C - 0.98M - 5.11LU - 19.43OT - 46.47OTR + 15.55RC - 23.01RM$
Regression Q&A (2)
Using just the p-value, is the estimate of price of complements statistically different from 0? Why or why not?
ANSWER
YES. Its coefficient is statistically different from 0 (i.e. is significant) because p-value = 0.01 < $\alpha$ = 0.05
More regression Q&A on significance
Using just the p-values, which variables' coefficients are significantly different from zero?
C (0.01)
OT (0.00)
OTR (0.00)
RC (0.00)
RM (0.00)
Regression Q&A (3)
Test if the coefficient of price of gin is statistically significant.
ANSWER
$H_0: \beta_1 = 0$ i.e. $\beta_1$ is not significant
$H_A: \beta_1 \neq 0$ i.e. $\beta_1$ is significant (always a two-tail)
Regression Q&A (4)
ANSWER CONTINUES..
$t_{cal}(\beta_1)$ = coeff/s.e. = -3.16/1.57 = -2.01, so $|t_{cal}| = 2.01$
$t_{tab} = t_{n-K-1,\,\alpha} = t_{40-8-1,\,0.05} = t_{31,\,0.05} = 2.042$ (two-tailed)
Decide & Conclude!!! Recall Rule
As |t cal| = 2.01 < t tab = 2.04, we don't reject H0 at 5% level, & conclude that the coefficient of price of gin is not significant
Fitness of Model
Is the regression model well fit? Interpret the apt measure used
ANSWER
$R^2$ = 0.99 = 99%
Since $R^2$ > 50%, model is fit!!!
99% of the variation in the Qd of gin bitters is explained by all the indep. variables
Joint significance, without hypotheses
Are the coefficients jointly significant?
ANSWER
Use only the significance F, i.e. p-value for F test.
Reject $H_0$ if p-value < $\alpha$
As p-v = 0.00 < 0.05, all estimates/coefficients are jointly significant.
Test if coefficients are jointly significant
$H_0: \beta_1 = \beta_2 = ... = \beta_8 = 0$ (all the coeff.s are insig.)
$H_A$: at least one $\beta_i \neq 0$, i = 1, ..., 8 (at least 1 X affects Y)
Find calculated F & tabulated F
Reject $H_0$ if $F = \frac{MSR}{MSE} > F_{K,\,n-K-1,\,\alpha}$, i.e. if cal > tab
Reject H0, since F-cal = 329.07 > F-tab $= F_{K,\,N-K-1,\,\alpha} = F_{8,31,0.05} = 2.26$
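Predictions & Confidence Interval
How much higher is the Q predicted to be if a price rises by GHc8?
$\Delta \hat{Q} = \hat{\beta}_1 \times \Delta P = -3.16 \times 8 = -25.28$, i.e. Q is predicted to be 25.28 units lower
Form a 95% confidence interval for the effect of changes in real income on Q.
That is, $\hat{\beta}_j \pm$ t-tab × std error of $\hat{\beta}_j$
$\hat{\beta}_j \pm t_{n-K-1,\,\alpha/2}\, S_{\hat{\beta}_j}$
$\hat{\beta}_3 \pm t_{40-8-1,\,\alpha/2}\, S_{\hat{\beta}_3}$ = ??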
CI for real income
$-0.98 \pm t_{31,\,0.05} \times 0.75$
$-0.98 \pm 2.042 \times 0.75$
$-0.98 \pm 1.5315$
CI = -2.51 and 0.55
CI: $-2.51 \le \beta_3 \le 0.55$
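The interval can be reproduced in R, once with the table value used on the slide and once with the critical value from `qt()`. The coefficient, standard error, N and K come from the slides; the variable names are illustrative.

```r
b3  <- -0.98    # estimated coefficient of real income (M)
se3 <- 0.75     # its standard error
N   <- 40
K   <- 8

# With the tabulated t value quoted on the slide
t_tab    <- 2.042
ci_slide <- b3 + c(-1, 1) * t_tab * se3

# With the critical value for N - K - 1 = 31 degrees of freedom
t_exact  <- qt(1 - 0.05 / 2, df = N - K - 1)
ci_exact <- b3 + c(-1, 1) * t_exact * se3

round(rbind(slide = ci_slide, exact = ci_exact), 2)
```

Both intervals contain 0, which agrees with the income coefficient not being significant at the 5% level.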
Variance Inflation Factors - VIF
(Multicollinearity) Multicollinear variables are 2 independent variables that are highly associated (linked) with each other through their correlation coefficient. In this case, you don't include the dependent/outcome/response variable. It is a curse.
VIF measures how much the variance of your coefficients is "inflated" by multicollinearity
$VIF_j = \frac{1}{1 - R_j^2}$
Variance Inflation Factors - VIF cont'
where $R_j^2$ is the $R^2$ when we regress $X_j$ on the remaining independent variables
If there is no collinearity between X1 & X2, then VIF = 1.
As a rule of thumb, VIF > 10 indicates high collinearity
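The VIF formula is simple enough to wrap in a one-line helper. `vif_from_r2` is a hypothetical name used only for this sketch; it is not part of any package.

```r
# VIF for predictor j, given the R-squared from regressing X_j
# on the remaining independent variables
vif_from_r2 <- function(r2_j) 1 / (1 - r2_j)

vif_none      <- vif_from_r2(0)     # no collinearity
vif_threshold <- vif_from_r2(0.9)   # the rule-of-thumb cut-off

c(no_collinearity = vif_none, r2_of_0.9 = vif_threshold)
```

Read the other way round, the VIF > 10 rule of thumb corresponds to an auxiliary $R_j^2$ above 0.9.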
Recall Regression output
Recall Pearson Correlations
Question on multicollinearity
Using only the VIF, which one of the pairs of variables selected
to be multicollinear may be deleted? Justify?
ANSWER
P & OTR, drop P with higher VIF of 24.15

P & C, drop P with higher VIF of 24.15
M & RM, drop RM with a higher VIF of 20.00
Interpretations of coefficients: changing factors
Interpret the coefficient of income
A 1 unit rise in average real income of customers will reduce Qd of Brukutu gin bitters by 0.98 units, holding all other factors constant.
Also, interpret the coefficient of price & complements
A 1 unit rise in the price of complements (C) of gin will reduce Qd of Brukutu gin bitters by 0.11 units, holding all other factors constant.
Interpretations of coefficients: dummies
Interpret the coefficient of teaching occupation
On the average, holding all other factors constant, Qd of Brukutu gin bitters is expected to be 19.43 units less if the people are teachers than fishers.
Interpret the coefficient of urban location
On the average, holding all other factors constant, Qd of Brukutu gin bitters is expected to be 5.11 units less if a shop is located in an urban than a rural area.
Interpretations of coefficients: more dummies
Interpret the coefficient of Christian religion
On the average, holding all other factors constant, Qd of Brukutu gin bitters is expected to be 15.55 units more if the people are Christians than Buddhists.
Interpret the coefficient of Muslim religion
Predictions in dummy regressions
What is the predicted quantity demanded of gin bitters that sells for GHc10, has a complement price of GHc50, is sold in a fishing, rural shopping area where the Christian population, on average, earn an income of GHc5?
Q = 280.27 - 3.16(10) - 0.11(50) - 0.98(5) - 5.11(0) - 19.43(0) - 46.47(0) + 15.55(1) - 23.01(0)
Q = 253.82
Predictions in dummy regressions..
What is the predicted quantity demanded of free gin bitters that has a complement price of GHc30 and is located in an urban, teaching area where the Buddhist population, albeit unemployed, get gifts from friends?
Q = 280.27 - 3.16(0) - 0.11(30) - 0.98(0) - 5.11(1) - 19.43(1) - 46.47(0) + 15.55(0) - 23.01(0)
Q = 252.43
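Predictions like these are less error-prone when the estimated equation is coded once and reused. In this sketch the coefficients are the rounded estimates from the regression equation; `predict_q` is a hypothetical helper name, and dummies not mentioned in a scenario default to 0.

```r
# Rounded coefficient estimates from the estimated regression equation
coefs <- c(Intercept = 280.27, P = -3.16, C = -0.11, M = -0.98,
           LU = -5.11, OT = -19.43, OTR = -46.47, RC = 15.55, RM = -23.01)

# Predicted quantity demanded; dummy variables default to 0 (reference group)
predict_q <- function(P, C, M, LU = 0, OT = 0, OTR = 0, RC = 0, RM = 0) {
  x <- c(1, P, C, M, LU, OT, OTR, RC, RM)   # 1 multiplies the intercept
  sum(coefs * x)
}

# Fishing, rural, Christian area: P = 10, C = 50, M = 5
q1 <- predict_q(P = 10, C = 50, M = 5, RC = 1)

# Free gin, urban, teaching, Buddhist area: P = 0, C = 30, M = 0
q2 <- predict_q(P = 0, C = 30, M = 0, LU = 1, OT = 1)

c(q1 = q1, q2 = q2)
```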
Comparing Predictions..
Compare the predicted quantity demanded of gin bitters sold by some Muslim traders in an urban area with that sold by some Buddhist fishers in a rural area, given that both gin bitters cost GHc15, have a complement price of GHc15, & average real income of customers is GHc20. Which factor(s) appear to be causing a comparative difference (if any)?
Comparing predictions 1st scenario..
Q = 280.27 - 3.16(15) - 0.11(15) - 0.98(20) - 5.11(1) - 19.43(0) - 46.47(1) + 15.55(0) - 23.01(1)
Q = 137.03
Comparing predictions 2nd scenario..
Q = 280.27 - 3.16(15) - 0.11(15) - 0.98(20) - 5.11(0) - 19.43(0) - 46.47(0) + 15.55(0) - 23.01(0)
Q = 211.62
Conclusion based on 2 scenarios..
Qd in 2nd scenario is 74.59 higher (= 211.62 - 137.03) than in 1st scenario
Factors causing the comparative difference are differences in location, occupation & religion
Load data into R & describe it
gin = read.delim("clipboard") # load Qd of gin bitters data from excel to R. Use only the uncoded categorical variables, e.g. loca (urban, rural), occu (teaching, fishing & trading) etc. In excel, hide the variable columns LU, OT, OTR, RC & RM
gin # look at your data in R console
summary(gin) # see the summary statistics of the data
Use psych package to plot histogram etc
require(psych) # To do the next analyses, choose 'Install Packages' under Packages. Then type 'psych' and click install to install it. Then type
describe(gin) # Get more descriptive or summary statistics. What's the sample size? mean for complement?
qqnorm(gin$Q, col="blue", lwd=4); qqline(gin$Q, col="red", lwd=3) # normal probability plot is a graphical tool for comparing a data set with the normal distribution.
Test for normality & with graphs
hist(gin$Q, col="blue") # draw histogram of Qd of gin bitters & colour it blue
shapiro.test(gin$Q) # test of normality. H0: sample comes from a normal distribution. Reject H0 if p-value <= 0.05. In ours, p-value = 0.00387 < alpha, so Qd data is not normal
plot(density(gin$Q)) # density plot for Qd of gin bitters
Scatter plot of Qd vs P
plot(gin$P, gin$Q, main="Qd of gin bitters & P", xlab="P", ylab="Q", pch=19, col="red", lwd=4) ## where xlab is x-axis label & pch = plot character: box, dot, star etc. And "col" = color, set to red. Try blue, lightblue, green. pch=21 plots an open circle, pch=19 plots a solid circle. Try others.
Plots & scatter plot matrix & correlations
pairs(gin) # create a scatterplot matrix of the whole data
pairs(~Q+P+C+M+loca+rel, col="red", data=gin) # scatterplot matrix of some variables. Can change colours.
cor(gin) # do Pearson correlation. You can only do this with the coded data set
# OR
cor(gin[,-which(names(gin) %in% c("loca","occu","rel"))]) # do this after removing the nominal X variables first
Generate correlations with/without coded data set
round(cor(gin),2) # generate the correlation matrix correct to 2 d.p. with the coded data set
# OR
round(cor(gin[,-which(names(gin) %in% c("loca","occu","rel"))]),2) # remove the nominal variables first
library(Hmisc) # install & load the Hmisc package first. Then,
rcorr(as.matrix(gin)) # Pearson correlations with p-values (coded data set)
# OR
rcorr(as.matrix(gin[,-which(names(gin) %in% c("loca","occu","rel"))])) # remove the nominal variables first
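Instead of listing the nominal columns by name, the numeric columns can be picked out automatically. A minimal sketch, using the built-in iris data (four numeric columns plus one nominal column, Species) as a stand-in for gin; num_cols and r are illustrative names.

```r
num_cols <- sapply(iris, is.numeric)   # TRUE for numeric columns, FALSE for Species
r <- round(cor(iris[, num_cols]), 2)   # Pearson correlation matrix, correct to 2 d.p.
r

# p-value for a single pair, using base R's cor.test() rather than Hmisc::rcorr()
cor.test(iris$Sepal.Length, iris$Petal.Length)$p.value
```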
Do a simple regression
ginsimple = lm(Q ~ P, data=gin) # simply regress Q on P. Q = dep. var.; lm = linear model
plot(Q ~ P, col="blue", lwd=2, data=gin, main="my scatterplot of Q & P & fitted line"); abline(ginsimple, lwd=4, col="red") # add the fitted regression line to the scatterplot
summary(ginsimple) # show the results of this simple (two-variable) regression
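The same steps can be tried on a data set that ships with R. This is a sketch on the built-in cars data (stopping distance dist against speed), not the course data; fit, slope and intercept are illustrative names. The last lines recompute the least squares estimates by hand to show what lm() returns.

```r
fit <- lm(dist ~ speed, data = cars)  # dist is the dependent variable
coef(fit)                             # intercept and slope of the fitted line
summary(fit)$r.squared                # share of the variation in dist explained by speed

# By hand: slope = cov(x, y) / var(x); intercept = mean(y) - slope * mean(x)
slope     <- cov(cars$speed, cars$dist) / var(cars$speed)
intercept <- mean(cars$dist) - slope * mean(cars$speed)
c(intercept, slope)                   # matches coef(fit)
```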
Do Multiple Regression in R
ginmulti = lm(Q ~ P+C+M+LU+OT+OTR+RC+RM, data=gin) # run OLS: multiple regression with the coded data
# OR
ginmulti = lm(Q ~ P+C+M+loca+occu+rel, data=gin) # run OLS with the nominal data
summary(ginmulti) ## display results
round(confint(ginmulti),2) ## confidence intervals for the coefficients (95% by default, t-based for lm), correct to 2 d.p.
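Both calls fit the same kind of model because, with R's default (treatment) coding, lm() turns each nominal variable into 0/1 dummy columns itself. A small sketch of that coding follows. The four observations are made up for illustration, and the baseline category R picks (the first level in alphabetical order) may differ from the one behind the coded columns in the course data.

```r
# Made-up occupations, using the three categories named on the data slide.
occu <- factor(c("teaching", "fishing", "trading", "fishing"))
levels(occu)               # alphabetical; the first level is the baseline

X <- model.matrix(~ occu)  # the design matrix lm() would build for this variable
X                          # one 0/1 column per non-baseline category
```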
Do Multiple Regression in R
anova(ginmulti) ## anova table
library(car) # load the package car first
round(vif(ginmulti), 2) ## variance inflation factor for multicollinearity, correct to 2 d.p.
vif(ginmulti) > 10 # which ones are problematic? (common rule of thumb: VIF above 10)
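What vif() reports can be reproduced in base R: the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on the other predictors. A sketch on the built-in swiss data as a stand-in for gin; the choice of three predictors and the names preds and vif_by_hand are illustrative.

```r
preds <- c("Agriculture", "Examination", "Education")  # predictors taken from swiss

vif_by_hand <- sapply(preds, function(v) {
  others <- setdiff(preds, v)
  # auxiliary regression: predictor v on the remaining predictors
  aux <- lm(reformulate(others, response = v), data = swiss)
  1 / (1 - summary(aux)$r.squared)
})

round(vif_by_hand, 2)
vif_by_hand > 10   # common rule of thumb for problematic multicollinearity
```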
Practice Assignment (1)
The cross-sectional data was extracted from the 1974 Motor
Trend US magazine, and comprises fuel consumption and 10
aspects of automobile design and performance for 32 automobiles
(1973-74 models). It can be found in: Henderson and Velleman
(1981), Building multiple regression models interactively.
Biometrics, 37, 391-411. The data frame has 32 observations on
11 variables. I've dropped the 8th variable, so you have 10.
Practice Assignment (3), mtcars variables
Miles per gallon is a measurement of fuel economy in
automobiles. The more cylinders an engine has, the more power
it can generally make, but some 4-cylinder engines make more
power than V8 engines. The power an engine produces is called
horsepower. Displacement is determined from the bore and stroke
of an engine's cylinders. Use R software to do your assignment.
Practice Assignment (4), variables

[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (lb/1000)
[, 7] qsec 1/4 mile time
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors; a carburetor regulates the flow of air &
gasoline into the engine cylinders.

1. How many variables are categorical & how many are
numerical?
2. Display the full descriptive statistics of the data in R. Why do
you think some variables are starred (*)?
3. What is the median for mpg, the standard deviation for
horsepower, the skewness for weight & the kurtosis for
displacement?
4. Is the mpg variable normally distributed? (use any
appropriate plot). Test if mpg is normal.

5. Create a blue scatterplot matrix of mpg, disp, hp, drat, wt &
qsec. From your graph, is the correlation between mpg & weight
positive or negative?
6. What's the Pearson correlation coefficient between mpg &
weight and between horsepower & displacement, correct to 2
d.p.?
7. Generate a simple regression plot of mpg on wt with a red
fitted line.
8. Run the whole multiple regression model (label the equation
as "regcar") & use it to answer the other questions below.

9. Specify the regression model for the whole data. You may
abbreviate some of the variable names.
10. In one sentence, which variables are significant and which are
not? Justify.
11. Interpret the coefficients of displacement, cylsix, ammanual,
gearthree, & carbSix.
12. Is the regression model well fitted? Explain.

13. Compare the mpg of a car with four cylinders, 180
displacement, 115 horsepower, 4.5 drat, a weight of 5 (in 1000 lb),
20 qsec, manual transmission, four forward gears and two
carburetors with the mpg of a car with six cylinders, 200
displacement, 80 horsepower, 4 drat, a weight of 3 (in 1000 lb),
20 qsec, automatic transmission, three forward gears and three
carburetors. Which factor(s) appear to be causing the
comparative difference (if any)?
Just Attempt This Work (1)

Golden Tulip Hotel claims it is still the best in Ghana despite stiff
competition from Fiesta Royal, La Palm, Robinhood etc. It has
embarked on price restructuring and adverts to boost demand for
the coming year. It has a number of outlets (including Golden
Tulip Kumasi City), each having data on the number of meals
served (Q), which is regressed on the average price per meal (P in
GH₵), ...
Just Attempt This Work (2)
... competitor's price (Pc in GH₵), adverts for each outlet (A in
GH₵), & the average income per household in each outlet's
immediate service area (Y in GH₵). Ordinary least squares
estimation of the regression equation based on the data led to
the following table. Use it to answer the questions that follow.
Just Attempt This Work (3). corr matrix
Just Attempt This Work (4)

Which 3 pairs of variables are perfectly correlated?
Which 3 pairs of variables may be most strongly correlated with
the response variable?
Which 3 pairs of variables may be the most multicollinear?
Which 3 pairs of variables are the least multicollinear?
Which 3 pairs of variables are the least correlated?
Find the missing values in the regression output.
Practice Exercise corr matrix
Reg Output
Practice Exercise (3)

1 Specify the regression model for the whole data.
2 Write the estimated regression equation.
3 Interpret the coefficient of price. Is it consistent with
expectation?
4 Is the estimate of advert statistically different from 0?
5 Verify the confidence interval for the estimate of price.
Test if the coefficient of income is statistically significant. State
the null and alternative hypotheses both mathematically and in
words, give the test statistic and explain your real-world
conclusions.
Is the regression model well fitted?
Are the coefficients jointly significant?

Golden Tulip wants to construct an approximate 95% prediction
interval for the number of meals served in an outlet given that
price is GH₵4, competitor's price is GH₵5, advertising is GH₵2
and nothing (i.e. no change) for income. As a consultant to
Golden Tulip, can you assist them?
Which one of the pairs of multicollinear variables selected earlier
could be deleted from the regression? Why?
