
Hypothesis Testing

Analysis using SPSS

Week 11
Learning Outcome

Statistical Analysis for business research


• Introduction to SPSS
• How to interpret the findings in SPSS

Statistical Analysis of collected data


• Real example of collected data
• Running reliability test
• Correlation test
• Regression analysis

Module Code and Module Title Title of Slides


Data Analysis and Results
1. Describe what you found out and what it means.
2. Present the data in the form of tables, figures, charts or other illustrations as needed and sequenced in terms of the research questions or hypotheses tested.
• The respondents' or subjects' characteristics (in tabular and/or graphical forms),
• Descriptive statistics of the variables - Mean, SD
• Reliability and Normality Test
• The Inferential statistics – Correlation, multiple Regression etc
It is preferable to organize the results according to the subheadings based on the hypotheses being tested.

Analysis flow:
1. Describe Respondents (Eg: Gender, Age)
2. Descriptive Stats (All Questions)
3. Reliability Testing (Cronbach Alpha)
4. Normality Test (Histogram etc.)
5. Descriptive – Variables (Eg: Mean, SD)
6. Inferential Stats (Correlation, T-Test, Regression)
7. Other tests (Eg: EFA)
Descriptive & Inferential Statistics

Descriptive Statistics
• Organize
• Summarize
• Simplify
• Presentation of data
Describing data: using data gathered on a group to describe or reach conclusions about that same group only

Inferential Statistics
• Generalize from samples to population
• Hypothesis testing
• Relationships among variables
Make predictions: using sample data to reach conclusions about the population from which the sample was taken
What is a Hypothesis?
A hypothesis is a proposed explanation or an educated guess made by the researcher on the basis of limited evidence.



Hypothesis Testing Procedure
• Hypothesis testing is a method of testing a hypothesis by
comparing it with the null hypothesis.
• Generally, the six-step sequence shown below is the one most widely used.
1. State the hypotheses (state H0 and H1)
2. Choose the statistical test
3. Select the desired level of significance
   The levels of significance used by most researchers are 0.01 (α = 0.01) and 0.05 (α = 0.05).
4. Compute the calculated value
5. Obtain the critical test value based on the test in Step 2
6. Decide: is the calculated test value > the critical value? Yes (reject H0) or No (do not reject H0)



Inferential Statistics
• Allows for comparisons across variables
– i.e. is there a relation between one’s occupation
and their reason for using the public library?
• Hypothesis Testing
Types of inferential statistics
• Parametric
– T-tests
– ANOVA
– Correlation
– Multiple regression
– ANCOVA
• Non-parametric
– Chi-Square

Inferential Statistics
Statistical analysis that is able to generalize from our sample to the population from which we obtained our sample
Module Code and Module Title Title of Slides Dr Jugindar Singh
T-Test – Continuous variables
One sample T-Test
Eg: Test whether the average marks of the sample (X̄ = 68) are significantly different from the (hypothetical) population mean of 65.
State the hypotheses:
H0: μ = 65 marks
H1: μ ≠ 65 marks
Let the level of significance be set at 5%:
α = 0.05; d.f. = 10 – 1 = 9
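
The slide gives the sample mean (68), the hypothesised mean (65) and the degrees of freedom (9), but not the raw marks. The sketch below is one way the calculation could be carried out. It assumes the sample is the ten BRM marks listed in the Pearson correlation example of these slides (their mean is 68), and it takes the two-tailed critical value 2.262 from a standard t table. The variable names are illustrative only.

```python
import math
import statistics

# Assumption: the sample is the ten BRM marks tabulated later in these slides.
marks = [65, 75, 84, 62, 53, 74, 48, 94, 59, 66]
mu0 = 65  # hypothesised population mean (H0: mu = 65)

n = len(marks)
mean = statistics.mean(marks)   # sample mean
sd = statistics.stdev(marks)    # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)          # standard error of the mean
t_calc = (mean - mu0) / se      # calculated t value
df = n - 1

# Two-tailed critical t for alpha = 0.05 and d.f. = 9, read from a t table.
T_CRIT = 2.262

# Two-tailed test: compare the absolute calculated value with the critical value.
reject_h0 = abs(t_calc) > T_CRIT
print(f"mean = {mean}, t = {t_calc:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```

With these marks the calculated t is about 0.68, which is below 2.262, so H0 would not be rejected at the 5% level.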



T-Test – Continuous variables
Test the difference between two sample means for significance.
1. Paired Sample T Test
Compare the mean scores for the same group on 2 different occasions (pretest to posttest) OR when you have matched pairs
Eg: Scores of trainees before and after the training
2. Independent samples t-test
Two separate (independent) groups require a different analysis.
Eg: Examination performance of male and female students



ANOVA
1. One way Analysis of variance - ANOVA
Comparing the mean scores of more than 2 groups
Example: Examination scores based on 3 different age
groups
2. Two way Analysis of variance - ANOVA
Two way means that there are 2 independent variables
Eg: Impact of age and gender on Examination scores
– There are 2 categorical variables: Age and Gender
– There is one continuous variable ie examination score



Chi-Square Test for Goodness of Fit
• The chi-square test is the most widely used non-parametric test and is very useful for testing hypotheses about frequencies or contingency tables.
• Univariate hypotheses involving nominal or ordinal data can be tested using the chi-square goodness of fit test.



Chi-Square Test for Goodness of Fit

Example 12.1
Let us assume that, in a BRM class, we classify students into Pass and Fail based on the marks they obtain. Students with a score of 60 and above are considered as Pass, while students with a score of below 60 are considered as Fail. The data is reported in the table given.

Results   Frequency
Pass      6
Fail      4
Total     10

Let us assume that from past records we know that 80% of the students pass while 20% fail in the BRM course. Based on this information we need to get the expected frequencies.

Results   Frequency   Expected Frequency (Ei)
Pass      6           10 × 0.8 = 8
Fail      4           10 × 0.2 = 2
Total     10
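
The example stops at the expected frequencies. A minimal sketch of the remaining arithmetic follows: it sums (O − E)² / E over the categories and compares the result with the critical value for d.f. = 1 at α = 0.05 (3.841, taken from a standard chi-square table). The dictionary names are illustrative.

```python
# Observed frequencies and historical pass/fail shares from Example 12.1.
observed = {"Pass": 6, "Fail": 4}
expected_share = {"Pass": 0.8, "Fail": 0.2}

n = sum(observed.values())
expected = {k: n * share for k, share in expected_share.items()}  # Ei = n * share

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi_sq = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

# Critical chi-square for alpha = 0.05 and d.f. = 1, read from a table.
CHI_CRIT = 3.841

reject_h0 = chi_sq > CHI_CRIT
print(f"chi-square = {chi_sq:.2f}, d.f. = {df}, reject H0: {reject_h0}")
```

Here the statistic is 0.5 + 2.0 = 2.5, which is below 3.841, so the observed results would not be judged different from the 80/20 record. Note that an expected frequency of 2 is below the usual rule of thumb of 5, so with only 10 students the result should be read with caution.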



Difference Between Correlation
and Causation

• Correlation is a measure of association between two variables.
• This association should not be mistaken for a causal relationship, as two variables may be related to one another but it does not make sense to say that one causes the other.



Correlation

Describes the strength and direction of the linear relationship between 2 continuous variables.

Example: The correlation between examination scores and number of hours of study

Interested in two things:
(i) the magnitude of the relationship,
(ii) the direction of the relationship
Pearson Correlation Coefficient

• Measures the degree to which there is a linear association between two variables (measured on an interval and/or ratio scale)
• A positive correlation reflects a tendency for a high
value in one variable to be associated with a high value
in the second.
• A negative correlation on the other hand, reflects a
tendency for a high value in one variable to be
associated with a low value in the second.



Pearson Correlation Coefficient (cont.)
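The Pearson's correlation coefficient (r) can be calculated using the formula given below:

$$ r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}} $$

Where,
$\bar{X}$ = sample mean of X, and
$\bar{Y}$ = sample mean of Y.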



Pearson Correlation Coefficient (cont.)
Example:
Recall our example of scores obtained by the students in the BRM course. Suppose that, through the survey, we also obtained data from these 10 students on the total hours per week they spent on self-study for the course. A Professor believes that there is a positive relationship between the total hours spent per week on self-study and the marks obtained. The data is presented in the table.

Student   Score (marks)   Hours of self-study per week
1         65              12
2         75              18
3         84              15
4         62              15
5         53              12
6         74              13
7         48              10
8         94              25
9         59              15
10        66              17

H0: There is no correlation between hours spent on self-study and marks
H1: There is a positive correlation between hours spent on self-study and marks
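
As a worked illustration, the sketch below computes Pearson's r for the ten students in the table and then tests H0 with the usual statistic t = r√(n − 2) / √(1 − r²). The one-tailed critical value 1.860 (α = 0.05, d.f. = 8) is taken from a standard t table. The choice of a one-tailed test follows from H1 being directional, and the 5% level is an assumption since the slide does not state one.

```python
import math

# Marks and weekly self-study hours for the 10 students in the table.
marks = [65, 75, 84, 62, 53, 74, 48, 94, 59, 66]
hours = [12, 18, 15, 15, 12, 13, 10, 25, 15, 17]

n = len(marks)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

# Sums of cross-products and squared deviations from the means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in marks)

r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient

# Significance test for r with d.f. = n - 2.
t_calc = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
T_CRIT = 1.860  # one-tailed, alpha = 0.05, d.f. = 8 (from a t table)

reject_h0 = t_calc > T_CRIT
print(f"r = {r:.3f}, t = {t_calc:.2f}, reject H0: {reject_h0}")
```

For this data r is about 0.79 and t is about 3.62, so H0 would be rejected in favour of a positive correlation.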



Multiple Regression
It is used when we want to predict the value of a variable based on
the value of two or more other variables.

The variables we are using to predict the value of the dependent variable are called the independent variables (or sometimes, the predictor variables).
Example
You could use multiple regression to understand whether exam performance can be predicted based on revision time, test anxiety and lecture attendance.
[Diagram: Revision Time, Test Anxiety and Lecture Attendance → Exam Performance]
What Is Linear Regression?

• Linear: Straight line.
• Regression: Finds the model that minimizes the unexplained variation in the data, i.e., the sum of squared residuals (the best fit).
• Linear Regression: Can be divided into
two categories:
– Simple regression
– Multiple regression
Multiple Linear Regression Analysis
• Multiple regression analysis is a statistical technique that can be used to analyse the relationship between a single dependent variable (continuous) and several independent variables (continuous or even nominal).
• The R² or coefficient of determination measures the amount of variance in the one (dependent) variable explained by the combination of all other variables in the model.
Multicollinearity and Multiple Regressions

• One of the serious issues in regression is collinearity, where two independent variables are highly correlated, generally at r > 0.8 to 0.9; this is termed multicollinearity.
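
One simple way to screen for this is to compute the pairwise correlations among the independent variables and flag any pair whose |r| exceeds the threshold. The sketch below does that for three predictors. The data values are invented purely for illustration, and the 0.8 cut-off is the lower end of the range quoted above.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor data for five students (not from the slides).
predictors = {
    "revision_time": [2, 4, 6, 8, 10],
    "test_anxiety": [3, 5, 1, 4, 2],
    "lecture_attendance": [1, 2, 3, 4, 6],
}

THRESHOLD = 0.8  # lower end of the r > 0.8 to 0.9 rule of thumb

# Flag every pair of predictors whose absolute correlation exceeds the threshold.
flagged = [
    (a, b)
    for a, b in combinations(predictors, 2)
    if abs(pearson_r(predictors[a], predictors[b])) > THRESHOLD
]
print(flagged)
```

In this made-up data only revision time and lecture attendance are flagged, which would suggest dropping or combining one of the two before running the regression.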



Simple Regression
Simple regression considers the relation
between a single explanatory variable and
response variable

15: Multiple Linear Regression


Module Code and Module Title Title of Slides Basic Biostat
Multiple Regression
Multiple regression simultaneously considers the
influence of multiple explanatory variables on a
response variable Y

The intent is to look at the independent effect of each variable

SPSS



Introduction: What is SPSS?

• SPSS was originally an acronym for Statistical Package for the Social Sciences, but it now stands for Statistical Product and Service Solutions
• One of the most popular statistical packages, which can perform highly complex data manipulation and analysis with simple instructions
Opening SPSS
• The default window will have the data editor
• There are two sheets in the window:
1. Data view 2. Variable view
The SPSS Data Editor
Variable view
– Name
– Type (Numeric)
– Label
– Values (= the codes of the answers)
– Measure (= Level of Measurement)

Variable View window
• This sheet contains information about the data set that is stored with the dataset
• Name
– The first character of the variable name must be alphabetic
– Variable names must be unique, and have to be less than 64 characters.
– Spaces are NOT allowed.

• Name -- the unique variable name
• Type -- the kind of data to be recorded (e.g., strings of characters, numeric values, or special numbers like dates)
• Width -- the number of characters used
• Decimals -- the number of decimal places displayed
• Label -- a text entry to describe the data provided by the variable, which can be much longer than the variable name and may include spaces. With questionnaires, for example, the label is usually the text of the question.
• Values -- if specific numeric values have a non-intuitive meaning, these values can be labeled (e.g., 1 = male and 2 = female)
• Columns -- determines how wide the variable column should be in Data View mode
• Align -- determines whether the data should be left-justified, right-justified, or centered
• Measure -- describes the level of measurement (nominal, ordinal, or scale)
The SPSS Data Editor
Data View
SPSS Menus
• Analyze
– Frequencies
– Cross tabs
– Descriptives
– Regression
SPSS Menus
• Graphs
– Bar
– Pie
– Histogram
– Line
– Boxplot
Transforming the data
The Likert scale scores given for each statement are added together to get the variable score.
• Analogy
• We give students questions to answer; after marking, we add the marks to get an overall score for ranking or to declare the best student, etc.
• We never look at the scores individually but collectively
Transforming data
• Click ‘Transform’ and then click ‘Compute Variable…’
Transforming: Adding total scores
Descriptive Statistics
Descriptive statistics describe the properties, behaviour and pattern of data

• Mean – Gives the central (average) value
• Median – Gives the middlemost value
• Mode – Gives the most frequent value
• Standard deviation – Gives the typical spread of the values around the mean
• Skewness and Kurtosis – Give the shape of the frequency curve
• Maximum and Minimum – Give the range of values
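
To make these measures concrete, the sketch below computes most of them with Python's standard `statistics` module, using the BRM marks and self-study hours from the correlation example in these slides. Skewness and kurtosis are left out because the standard library has no function for them. The dictionary keys are illustrative names.

```python
import statistics

# BRM marks and weekly self-study hours from the correlation example.
marks = [65, 75, 84, 62, 53, 74, 48, 94, 59, 66]
hours = [12, 18, 15, 15, 12, 13, 10, 25, 15, 17]

summary = {
    "mean": statistics.mean(marks),       # central (average) value
    "median": statistics.median(marks),   # middlemost value
    "sd": statistics.stdev(marks),        # sample standard deviation
    "minimum": min(marks),
    "maximum": max(marks),
    # No mark repeats, so the mode is shown for the hours instead.
    "mode_of_hours": statistics.mode(hours),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The marks have a mean of 68, a median of 65.5 and a standard deviation of about 14, and the most common number of self-study hours is 15.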
Descriptive statistics
• to describe the characteristics of your sample in the Method section of your report;
• to check your variables for any violation of the assumptions underlying the
statistical techniques that you will use to address your research questions; and
• to address specific research questions

Continuous variables (e.g. age)
Descriptives, which will provide you with ‘summary’ statistics such as mean, median, standard deviation
SPSS : Exercise
1. Obtain the number of males and females

2. Statistics such as mean, median, standard deviation


Frequency Analysis

• Frequency shows the number of occurrences.


• Also calculates measures of central tendency, such as
the mean, median, mode, and others.
Research Question #1

What kind of computer do people prefer to own?


Graphs to describe and explore the data
• histograms;
• bar graphs;
• scatterplots;
• boxplots;
• line graphs.
Histograms
Histograms are used to display the distribution of a single continuous variable (e.g. age)
Boxplots
Boxplots are useful when you wish to compare the distribution of
scores on variables.

Outliers are cases with scores that are quite different from the remainder of the
sample, either much higher or much lower.
Reliability
The reliability of data is tested using Cronbach's Alpha, which should show a reliability score of more than 0.7 (70%)
• For secondary data, reliability need not be tested, as these are normally financial data downloaded from websites and reliable sources.
• For demographic variables, reliability testing is not necessary as it is meaningless
• At the pilot study stage, reliability is tested question by question in order to correct the questions
• Variable-wise reliability is needed at the final stage
Cronbach's Alpha (α)
Most common measure of internal consistency ("reliability").
It is most commonly used when you have multiple Likert questions

Example
A researcher has devised a nine-question questionnaire to measure how safe people feel at
work at an industrial complex. Each question was a 5-point Likert item from "strongly disagree"
to "strongly agree".
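
SPSS reports alpha directly, but the calculation behind it is short: α = k / (k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below applies that formula to invented responses. To keep it brief it uses three Likert items and five respondents rather than the nine questions of the example, and every value in `responses` is hypothetical.

```python
import statistics

# Hypothetical 5-point Likert responses: one row per respondent, one column per item.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]

k = len(responses[0])                        # number of items
items = list(zip(*responses))                # column-wise view: scores per item
item_variances = [statistics.variance(col) for col in items]
totals = [sum(row) for row in responses]     # total score per respondent
total_variance = statistics.variance(totals)

# Cronbach's alpha from item variances and the variance of the total scores.
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

print(f"Cronbach's alpha = {alpha:.3f}, acceptable (> 0.7): {alpha > 0.7}")
```

For this made-up data alpha is about 0.96, which is above the 0.7 threshold mentioned earlier.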
SPSS Exercise: Reliability
Testing Normality

They compare the shape of your sample distribution to the shape of a normal curve
• Assumes, if your sample is normal shaped, the population from which it came is normally distributed

Tools for Normality Testing
• Histogram and Boxplot
• Normal Quantile Plot (also called Normal Probability Plot)
• Goodness of Fit Tests
– Shapiro-Wilk Test
– Kolmogorov-Smirnov Test

How do we decide if a distribution is approximately normal?
95.0% of the scores fall between a Z of -1.96 to +1.96
99.9% of the scores fall between a Z of -3.30 to +3.30
Exercise: Normality test
Assess the Normality for the dependent variable
T Test
• One-sample
– Compare one sample’s mean to a normal, typical, or population
value
– Does Sample average = the normal value?
• Independent
– Compare means of two samples which don’t directly influence
each other (samples are two different groups of people or
things)
– Does average income for Midwesterners = average income for
Southerners?
• Dependent (or Paired)
– Compare means of two samples which you expect to be
connected (often, data is from the same sample at two different
times)
– Does Average productivity before training = average productivity
after?
INFO 515 Lecture #5
T test

A T test may be used to compare two group means using either one of the following:
• Within-participants design (a Paired-Samples T Test)
• Between-participants design (an Independent-Samples T Test)
One Sample T-Test
Used for testing whether the mean of one metric variable is equal to some hypothesized
population value
Example:
A lecturer from APU believes that students' scores for a paper are declining. It is well known that, on average, the score is around 400 points (out of 1000). The lecturer obtains the scores of 40 students.

Exam score
INDEPENDENT SAMPLE T TEST
Example
• Men and women may be different in spending habits
• Men and women may be different in emotional decisions
• Foreigners may be different in cultural aspects compared to
Malaysians

1. Here we compare the means of two different groups on a particular variable
2. The null hypothesis is that there is no difference (=) between the above groups on a particular variable
3. The alternate hypothesis is that there exists a difference (≠) between the above groups on a particular variable
Conditions for t test:
1. Only two groups can be compared
2. The sample size may differ between groups
3. If there are three or more groups then perform ANOVA test
SPSS Exercise Independent sample T Test
An independent-samples t-test is used when you want to compare the mean
score, on some continuous variable, for two different groups of subjects.
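
A minimal sketch of the calculation behind this test is given below. It assumes equal variances in the two groups (the pooled-variance form of the test), uses invented examination scores for two groups of five students, and takes the two-tailed critical value 2.306 (α = 0.05, d.f. = 8) from a standard t table.

```python
import math
import statistics

# Hypothetical examination scores for two independent groups.
group_a = [60, 62, 64, 66, 68]
group_b = [64, 66, 68, 70, 72]

n_a, n_b = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)

# Pooled variance (assumes both groups share the same population variance).
df = n_a + n_b - 2
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df

se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))  # standard error of the difference
t_calc = (mean_a - mean_b) / se

T_CRIT = 2.306  # two-tailed, alpha = 0.05, d.f. = 8 (from a t table)

reject_h0 = abs(t_calc) > T_CRIT
print(f"t = {t_calc:.2f}, d.f. = {df}, reject H0: {reject_h0}")
```

With these numbers t is −2.0, whose absolute value is below 2.306, so the difference between the two group means would not be significant at the 5% level.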
Paired sample t-Test
SPSS paired samples t-test is a procedure for testing whether the
means of two metric variables are equal in some population.
Example:
A behavioral scientist wants to know whether drinking a single glass of beer affects reaction
times. She has 30 participants perform some tasks before and after having a beer and
records their reaction times. For each participant she calculates the average reaction time
over tasks both before and after the beer

1. Analyze
2. Compare Means
3. Paired-Samples T Test.
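
Behind the menu, the paired test is a one-sample t-test on the differences between the two measurements of each participant. The sketch below shows that calculation with invented reaction times (in milliseconds) for five participants instead of the 30 in the example. The two-tailed critical value 2.776 (α = 0.05, d.f. = 4) is taken from a standard t table.

```python
import math
import statistics

# Hypothetical average reaction times (ms) per participant, before and after.
before = [410, 420, 390, 400, 430]
after = [430, 430, 410, 410, 450]

# The test works on the per-participant differences.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)   # sample standard deviation of the differences
se_diff = sd_diff / math.sqrt(n)
t_calc = mean_diff / se_diff
df = n - 1

T_CRIT = 2.776  # two-tailed, alpha = 0.05, d.f. = 4 (from a t table)

reject_h0 = abs(t_calc) > T_CRIT
print(f"mean difference = {mean_diff}, t = {t_calc:.2f}, reject H0: {reject_h0}")
```

In this made-up data the mean difference is 16 ms and t is about 6.53, so H0 (no change in reaction time) would be rejected.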
SPSS Exercise: Paired sample T test
One-Way ANOVA
One-Way ANOVA tests whether the means on a metric variable for three or
more populations are all equal.
Example: A farmer wants to know whether the weight of parsley plants is influenced by using
a fertilizer. He selects 90 plants and randomly divides them into three groups of 30 plants
each. He applies a biological fertilizer to the first group, a chemical fertilizer to the second
group and no fertilizer at all to the third group. After a month he weighs all plants
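
The F statistic that One-Way ANOVA reports can be sketched as the ratio of the between-group mean square to the within-group mean square. The example below uses three invented plant weights (in grams) per group instead of 30, and the critical value 5.14 for F(2, 6) at α = 0.05 is taken from a standard F table.

```python
import statistics

# Hypothetical plant weights in grams (three plants per group for brevity).
groups = {
    "none": [51, 52, 53],
    "biological": [52, 53, 54],
    "chemical": [55, 56, 57],
}

all_values = [v for values in groups.values() for v in values]
grand_mean = statistics.mean(all_values)

# Between-groups sum of squares: group size times squared distance of each
# group mean from the grand mean.
ss_between = sum(
    len(values) * (statistics.mean(values) - grand_mean) ** 2
    for values in groups.values()
)

# Within-groups sum of squares: squared distance of each value from its group mean.
ss_within = sum(
    (v - statistics.mean(values)) ** 2
    for values in groups.values()
    for v in values
)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_calc = (ss_between / df_between) / (ss_within / df_within)

F_CRIT = 5.14  # F(2, 6) at alpha = 0.05 (from an F table)

reject_h0 = f_calc > F_CRIT
print(f"F({df_between}, {df_within}) = {f_calc:.2f}, reject H0: {reject_h0}")
```

For these made-up weights F is 13, which exceeds 5.14, so the hypothesis that all three group means are equal would be rejected. ANOVA alone does not say which groups differ; a post hoc comparison is needed for that.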
Correlations
A correlation is a statistical device that measures the
nature and strength of a supposed linear association
between two variables.

[Figure: three scatterplots of Y against X showing a positive relationship, a negative relationship and no relationship]
Correlation Coefficient

r = ±0.0 to 1.0
• Direction: given by the sign of r (+ or −)
• Magnitude: given by the size of r

The strength of the linear relationship is determined by the distance of the correlation coefficient (r) from zero.
Research Question #1

Is there a relationship between academic performance and Internet access?

H0 = Internet access made no difference
H1 = Internet access made a difference
SPSS Exercise
Linear Regression
Used when we want to predict the value of a variable based on the value of another
variable.
Example
A salesperson for a large car brand wants to determine whether there is a relationship between
an individual's income and the price they pay for a car. As such, the individual's "income" is the
independent variable and the "price" they pay for a car is the dependent variable. The
salesperson wants to use this information to determine which cars to offer potential customers
in new areas where average income is known.
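
The slides give no income or price figures, so the sketch below illustrates the same technique on the BRM data from the correlation example instead, with weekly self-study hours as the independent variable and marks as the dependent variable. It fits the least-squares line, reports R², and predicts the marks for a hypothetical student who studies 20 hours a week. The function name `predict` and the 20-hour input are illustrative assumptions.

```python
# Self-study hours (independent variable) and marks (dependent variable)
# from the correlation example in these slides.
hours = [12, 18, 15, 15, 12, 13, 10, 25, 15, 17]
marks = [65, 75, 84, 62, 53, 74, 48, 94, 59, 66]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(marks) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks))
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in marks)

# Least-squares estimates for the line: marks = intercept + slope * hours.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Share of the variance in marks explained by hours.
r_squared = sxy ** 2 / (sxx * syy)

def predict(x):
    """Predicted marks for a given number of weekly self-study hours."""
    return intercept + slope * x

print(f"marks = {intercept:.2f} + {slope:.2f} * hours, R^2 = {r_squared:.3f}")
print(f"predicted marks for 20 hours: {predict(20):.1f}")
```

Here each extra hour of self-study is associated with about 2.6 more marks, and hours explain roughly 62% of the variance in marks. The salesperson in the example would use the fitted line in the same way, entering the average income of a new area to predict the car price.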
Multiple Regression Analysis
Multiple regression is an extension of simple linear regression. It is used when we want to
predict the value of a variable based on the value of two or more other variables.
Example
A health researcher wants to be able to predict "VO2max", an indicator of fitness and health.
SPSS Exercise: Multiple Regression
Tutorial
SPSS Practice Session

1. Descriptive Statistics
2. Inferential Statistics
