
Lean Six Sigma Green Belt

Lesson 4: Analyze

Copyright 2014, Simplilearn, All rights reserved.



Objectives
After completing this lesson, you will be able to:

Explain the patterns of variation

Describe the classes of distributions

Discuss Multi-Vari studies and the causes

Explain correlation and its types

Discuss the various hypothesis tests

Discuss the application of F-test, t-test, ANOVA, and Chi-squared


Analyze
Topic 1: Patterns of Variation


Classes of Distributions
The data obtained in the Measure phase can follow a variety of distributions, depending on the data type and its source.
The methods used to describe the parameters for classes of distributions are:

Probability
It is based on an assumed model of the distribution.
It is used to find the chance that a certain outcome or event will occur.

Statistics
It uses the measured data to determine a model that describes the data.

Inferential Statistics
It describes the population parameters based on the sample data, using a particular model.

Types of Distributions
The two types of distributions are as follows:

Discrete Distribution
Binomial distribution
Poisson distribution

Continuous Distribution
Normal distribution
Chi-square distribution
t-distribution
F-distribution


Discrete Probability Distribution


Discrete probability distribution is characterized by the probability mass function.

It is important to be familiar with discrete distributions while dealing with discrete data.

The two most useful discrete probability distributions are:

Binomial distribution; and

Poisson distribution.

These distributions help predict how a sample drawn from a population is likely to behave.


Binomial Distribution
The binomial distribution is a probability distribution for discrete data.

Characteristics of Binomial Distribution
Describes discrete data that result from a particular process
Predicts sample behavior
Used to deal with defective items
Best suited when the sample size is less than thirty and less than ten percent of the population

$P(r) = {}^{n}C_{r} \, p^{r} (1 - p)^{n - r}$
where P(r) = probability of exactly r successes out of a sample of size n
p = probability of success; r = number of successes desired; n = sample size

Binomial Distribution (contd.)


Some of the key calculations of the binomial distribution are shown.

Term: Mean
Formula: $\mu = np$

Term: Standard Deviation
Formula: $\sigma = \sqrt{np(1 - p)}$

where n = sample size and p = probability of success

Sample factorial calculation
5! = 5 × 4 × 3 × 2 × 1 = 120
4! = 4 × 3 × 2 × 1 = 24


Calculating Binomial Distribution: Example

Q: Using the binomial distribution formula, find the probability of getting 5 heads in 8 coin tosses.

A: Tossing a coin has only two outcomes, Head or Tail.
The outcomes are statistically independent.
Therefore,
p = probability of success = 0.5 (this remains fixed over time)
n = sample size = 8
r = number of successes desired = 5
$P(r) = {}^{8}C_{5} \times 0.5^{5} \times (1 - 0.5)^{3} = 56 \times 0.5^{8} = 0.21875 \approx 21.9\%$
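The coin-toss calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `binomial_pmf` is an illustrative choice, not something defined in the course material.

```python
from math import comb


def binomial_pmf(n: int, r: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials,
    each with success probability p."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# 5 heads in 8 tosses of a fair coin
prob = binomial_pmf(n=8, r=5, p=0.5)
print(f"P(5 heads in 8 tosses) = {prob:.5f}")
```

The same function can be summed over several values of r to get "at most" or "at least" probabilities.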

Poisson Distribution
The Poisson distribution applies knowledge of the population to predict sample behavior.

Characteristics of Poisson Distribution
Describes discrete data
Used to analyze situations in which the number of trials is large
Deals with integer counts, which can take any non-negative value
Used where the probability of success in each trial is very small
Used for predicting the number of defects

Poisson Distribution: Formula
The formula for the Poisson distribution is as follows:
$P(x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}$
where P(x) = probability of exactly x occurrences in a Poisson distribution
λ = mean number of occurrences during the interval
x = number of occurrences desired
e = base of the natural logarithm (approximately 2.71828)

Mean of a Poisson distribution: $\mu = \lambda$
Standard deviation of a Poisson distribution: $\sigma = \sqrt{\lambda}$

Calculating Poisson Distribution: Example

Q: Past records of an accident-prone road junction show that the mean number of accidents at the junction is 5 per week. Assume that the number of accidents follows a Poisson distribution and calculate the probability of a given number of accidents happening in a week.

A: Assumption: the number of accidents follows a Poisson distribution.
Given: λ = 5 per week
Probability of zero accidents per week: $P(0) = \dfrac{5^{0} e^{-5}}{0!} \approx 0.0067$
Probability of exactly one accident per week: $P(1) = \dfrac{5^{1} e^{-5}}{1!} \approx 0.0337$
Probability of exactly two accidents per week: $P(2) = \dfrac{5^{2} e^{-5}}{2!} \approx 0.0842$
Probability of more than two accidents per week = 1 − [P(0) + P(1) + P(2)] = 1 − [0.0067 + 0.0337 + 0.0842] = 0.8754 ≈ 87.5%
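The accident example can be reproduced without intermediate rounding. The sketch below assumes only the Poisson formula and λ = 5 from the example; the helper name `poisson_pmf` is illustrative. Carrying full precision through the sum gives roughly 0.875 for the probability of more than two accidents, so heavily rounded intermediate values can shift the last digits of the answer.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Probability of exactly x occurrences when the mean count is lam."""
    return lam**x * exp(-lam) / factorial(x)


lam = 5  # mean number of accidents per week, from the example

p_at_most_two = sum(poisson_pmf(x, lam) for x in range(3))
p_more_than_two = 1 - p_at_most_two

for x in range(3):
    print(f"P({x}) = {poisson_pmf(x, lam):.4f}")
print(f"P(more than 2) = {p_more_than_two:.4f}")
```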


Continuous Probability Distribution


Continuous probability distribution is characterized by the probability density function.

A variable is said to be continuous if the range of possible values falls along a continuum.
Example: Loudness of cheering at a ball game, weight of cookies in a package, length of a pen,
or the time required to assemble a car.

These distributions help predict how a sample drawn from a population is likely to behave.


Normal Distribution
The Normal or Gaussian distribution is a continuous probability distribution, denoted as N(μ, σ).

It has a higher frequency of values around the mean and fewer occurrences away from it.
It is used as a first approximation to describe real-valued random variables that tend to cluster around a single mean value.
It is a bell-shaped curve and is symmetrical.
The total area under the normal curve is 1.

[Figure: Normal distribution with mean = 100 and standard deviation = 10]

Normal Distribution (contd.)


In a normal distribution, a standard Z variable is used to standardize comparisons of dispersion.
The uses of the Z value are as follows:

It is unique for each probability within the normal distribution.
It helps in finding probabilities of data points anywhere within the distribution.
It is dimensionless, with no units such as mm, liters, or coulombs.

$Z = \dfrac{Y - \mu}{\sigma}$
where Z = number of standard deviations between Y and the mean (μ)
Y = value of the data point in concern
μ = mean of the population
σ = standard deviation of the population

Calculating Normal Distribution: Example

Q: Suppose the time taken to resolve customer problems follows a normal distribution with a mean of 250 hours and a standard deviation of 23 hours. What is the probability that a problem resolution will take more than 300 hours?

A: Given:
Y = 300
μ = 250
σ = 23
Using the formula: $Z = \dfrac{300 - 250}{23} = 2.17$

From a normal distribution table, a Z value of 2.17 corresponds to a cumulative area of 0.98499 to its left.
Thus, the probability that a problem is resolved in less than 300 hours is 98.5%.
The probability that a problem resolution takes more than 300 hours is 1.5%.
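As a cross-check on the table lookup, the same probability can be computed directly. This is a sketch that assumes Python 3.8 or later, where `statistics.NormalDist` is available; the variable names are illustrative.

```python
from statistics import NormalDist

# Resolution time modeled as Normal(mean = 250 h, SD = 23 h), as in the example
resolution_time = NormalDist(mu=250, sigma=23)

z = (300 - resolution_time.mean) / resolution_time.stdev
p_less = resolution_time.cdf(300)   # P(time < 300 h)
p_more = 1 - p_less                 # P(time > 300 h)

print(f"Z = {z:.2f}")
print(f"P(less than 300 h) = {p_less:.3f}")
print(f"P(more than 300 h) = {p_more:.3f}")
```

Because the code does not round Z to two decimals before the lookup, its result can differ from the table value in the fourth decimal place.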


Z-Table Usage
The total area under the curve is 1. For a given Z score, the corresponding probability can be read from the Z-table.


Z-Table
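This Z-table gives the cumulative probability that Z is less than or equal to a given non-negative value z. To obtain the probability that Z is between zero and z, subtract 0.5 from the table entry. This is the most commonly used normal distribution Z-table, with the positive Z-scores.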


Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1    0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2    0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3    0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4    0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5    0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6    0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7    0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8    0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9    0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0    0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1    0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2    0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
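Individual table entries can be regenerated from the standard normal cumulative distribution function, which is useful for spotting transcription slips in a printed table. The sketch below uses the identity Φ(z) = ½[1 + erf(z/√2)]; the helper names `phi` and `table_row` are illustrative and not part of the course material.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Cumulative probability P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def table_row(row_start: float) -> list[float]:
    """Ten entries of one Z-table row: row_start + 0.00 ... row_start + 0.09."""
    return [round(phi(row_start + col / 100), 4) for col in range(10)]


# Regenerate the rows for Z = 0.1 and Z = 1.2
print(table_row(0.1))
print(table_row(1.2))
```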


Using Z-Table: Example

Q: Find P(Z < 0).

A: The table is not needed to find the answer once you know that the variable Z takes a value less than (or equal to) zero.
First, the area under the curve is 1, and second, the curve is symmetrical about Z = 0.
Hence, there is a 0.5 (or 50%) chance that Z is above 0 and a 0.5 (or 50%) chance that Z is below 0.


Using Z-Table: Example (contd.)

Q: Find P(Z > 1.12).

A: The opposite or complement of an event A occurring is the event A not occurring.
P(not A) = 1 − P(A)
P(Z > 1.12) = 1 − P(Z < 1.12)

Using the table:
P(Z < 1.12) = 0.5 + P(0 < Z < 1.12) = 0.5 + 0.3686 = 0.8686
Hence P(Z > 1.12) = 1 − 0.8686 = 0.1314


Using Z-Table: Example (contd.)

Q: Find P(0 < Z < 1.12).

A: Z falls within an interval.
Using the table:
P(0 < Z < 1.12) = 0.3686
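The three Z-table examples (below zero, above 1.12, and between 0 and 1.12) can be verified together. This is a small sketch assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

p_below_zero = std_normal.cdf(0)                       # P(Z < 0)
p_above_1_12 = 1 - std_normal.cdf(1.12)                # P(Z > 1.12), complement rule
p_between = std_normal.cdf(1.12) - std_normal.cdf(0)   # P(0 < Z < 1.12)

print(f"P(Z < 0)        = {p_below_zero:.4f}")
print(f"P(Z > 1.12)     = {p_above_1_12:.4f}")
print(f"P(0 < Z < 1.12) = {p_between:.4f}")
```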


Chi-Square Distribution
The Chi-square distribution (chi-squared or χ² distribution) with k degrees of freedom is the distribution of the sum of the squares of k independent standard normal random variables.

Characteristics of χ² Distribution
One of the most widely used probability distributions in inferential statistics
The distribution is used in hypothesis tests

When the test is based on a sample, degrees of freedom (df) = n − 1, where n is the sample size.


Chi-Square Distribution: Formula
The formula for the Chi-square statistic is as follows:
$\chi^{2}_{calculated} = \sum \dfrac{(f_o - f_e)^{2}}{f_e}$
where $\chi^{2}_{calculated}$ = chi-square index
f_o = observed frequency
f_e = expected frequency

Chi-square distribution will be covered in detail in the later part of this lesson.
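The chi-square index is a sum over categories of squared differences between observed and expected frequencies, scaled by the expected frequency. The sketch below illustrates the arithmetic; the frequencies are hypothetical values chosen for illustration and do not come from the course material.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))


# Hypothetical frequencies, for illustration only
observed = [18, 22, 20]
expected = [20, 20, 20]

chi2 = chi_square_statistic(observed, expected)
print(f"chi-square calculated = {chi2:.2f}")
```

The calculated value is then compared with the critical value from a chi-square table at the chosen significance level and degrees of freedom.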


t-Distribution
A t-distribution is most appropriate when:
the sample size is less than 30;
the population standard deviation is not known; and
the population is approximately normal.

The t-distribution approaches normality as the sample size increases.


F-Distribution
The F-distribution is the distribution of the ratio of two Chi-square variables, each divided by its degrees of freedom. A specific F-distribution is denoted by the degrees of freedom for the numerator Chi-square and the degrees of freedom for the denominator Chi-square.

$F_{calculated} = \dfrac{S_1^{2}}{S_2^{2}}$
where S1 and S2 = standard deviations of the two samples

If F calculated is 1, there is no difference in the variances.
Place the larger variance in the numerator: if S1 > S2, then S1² is the numerator and S2² is the denominator (df1 = n1 − 1 and df2 = n2 − 1).
Refer to the F-table to find the critical F value at significance level α and the degrees of freedom of the samples from the two processes (df1 and df2).


Analyze
Topic 2: Exploratory Data Analysis


Multi-Vari Studies
Multi-Vari studies analyze variation, investigate process stability, identify investigation areas, and
break down the variation.

They classify variation sources into three major types:

Positional
Variations within a single unit, where variation is due to location.
Examples: Pallet stacking in a truck, temperature gradient in an oven, variation observed from cavity-to-cavity within a mold, region of a country, line on an invoice

Cyclical
Variations among sequential repetitions over a short time.
Examples: Every nth pallet broken, batch-to-batch variation, lot-to-lot variation, invoices received day-to-day, and account activity week-to-week

Temporal
Variations which occur over longer periods of time.
Examples: Process drift, performance before and after breaks, seasonal and shift-based differences, month-to-month closings, and quarterly returns


Create Multi-Vari Chart


The five steps to create a Multi-Vari chart are:

1. Select Process and Characteristics
Example: Select the process where the plate is being manufactured and measure its thickness within a specified range.

2. Decide Sample Size
Example: The sample size is five pieces from each piece of equipment, and the frequency of data collection is every two hours.

3. Create a Tabulation Sheet
Example: The tabulation sheet with data records contains columns with time, equipment number, and thickness as headers.

4. Plot the Chart
Example: The chart is plotted with time on the X axis and the plate thickness on the Y axis.

5. Link the Observed Values
Example: The observed values are linked by appropriate lines.


Create Multi-Vari Chart (contd.)


The path to create a Multi-Vari chart in Minitab is:
Minitab > Stat > Quality Tools > Multi-Vari Chart


Simple Linear Correlation


Correlation is the association between variables. The correlation coefficient shows the strength of the relationship between Y and X.

The strength of the linear relationship is denoted by the correlation coefficient r, also called Pearson's Coefficient of Correlation.

r = −1: Movement in both variables is inverse
r = 0: No correlation between the two variables
r = +1: Movement in both variables is the same

The higher the absolute value of r, the stronger the correlation between Y and X.
An r value of > +0.85 or < −0.85 indicates a strong correlation.
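To make the −1, 0, +1 scale concrete, Pearson's r can be computed from its definition: the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The sketch below is illustrative; the function name `pearson_r` and the data are hypothetical, chosen so that the two extreme cases are easy to recognize.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's coefficient of correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y_same = [2, 4, 6, 8, 10]      # moves in the same direction as x
y_inverse = [10, 8, 6, 4, 2]   # moves in the opposite direction

print(pearson_r(x, y_same))     # close to +1
print(pearson_r(x, y_inverse))  # close to -1
```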


Correlation Levels
Correlation measures the linear association between the output (Y) and the input variable (X).

The patterns of correlation displayed in scatter plots are:
easy to see when the r value is 0.9 and above; and
difficult to see when the r value is 0.5 or below.


Regression
Regression quantifies how much one variable changes in response to changes in another.
If a high percentage of the variability in Y (r² > 70%) is explained by changes in X, the model can be used to write a transfer equation as follows:
Y = f(X)

This equation is used to:
predict future values of Y given X, and X given Y; and
regress Y on one or more Xs simultaneously.

Simple Linear Regression is for one X and Multiple Linear Regression is for more than one X.


Key Concepts of Regression


There are two key concepts of regression:

Transfer function to control Y
Y = f(X) may not be the correct transfer function to control Y because there may be a low level of correlation between the two variables.

Vital X
It is important to discover whether a statistically significant relationship exists between Y and a particular X by looking at p-values. Based on regression, one can infer the vital X and eliminate the rest.

It is important to understand whether there is statistical relevance between Y and X using the metrics from Regression Analysis. Simple Linear Regression should be used as a statistical validation tool.


Simple Linear Regression (SLR)


A simple linear regression equation is a fitted linear equation between Y and X. It is represented as
follows:
Y = A + BX + C
where,

Y = Dependent variable / output / response

X = Independent variable / input / predictor

A = Intercept of fitted line on Y axis

B = Regression coefficient / Slope of the fitted line

C = Error in the model


Least Squares Method in SLR


If Y and X are not perfectly linear (|r| < 1), several lines could fit the scatter plot. It can be inferred from the graphs below:

Minitab fits the line which has the least Sum of Squares of Error (SSE).
In a perfectly linear relationship, the points would lie on the line. Typically, the data lies off the line.
The distance from a point to the line is the error distance used in the SSE calculations.
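The least squares idea can be sketched in a few lines: the fitted slope and intercept are the values that minimize the sum of squared vertical distances from the points to the line. The closed-form expressions used below are the standard ones for a single X; the function names and the data are hypothetical and serve only as an illustration, not as a description of Minitab's internals.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sse(xs, ys, intercept, slope):
    """Sum of squared errors between observed y and the line's prediction."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(x, y)
print(f"fitted line: Y = {a:.3f} + {b:.3f} X")
print(f"SSE of fitted line:  {sse(x, y, a, b):.4f}")
print(f"SSE of shifted line: {sse(x, y, a + 0.1, b):.4f}")  # any other line has a larger SSE
```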


SLR: Example
A farmer wishes to predict the relationship between the amount spent on fertilizers and the annual sales of his crops. He collects the following data for the last few years and wants to determine his expected revenue if he spends $8 annually on fertilizer.

Years

Fertilizer Expenses in $ (X)

Annual Selling in $ (Y)

2009

20

2010

25

2011

34

2012

2013

11

40

2014

31


SLR using Microsoft Excel


The steps to perform Simple Linear Regression in Microsoft Excel are as follows:
1. Copy the data from the cells B1 to C6 on an Excel Worksheet.
2. Click Insert, and choose the Plain Scatter Chart (Scatter with only Markers).
3. Right-click on the data points and choose Add Trendline.
4. Choose Linear and select the boxes titled Display R-squared value and Display equation.

Regression Analysis Using Microsoft Excel


The interpretation of the scatter chart for regression analysis is as follows:

The r² value (Coefficient of Determination) conveys whether the model is good and can be used. The r² value is 0.3797.
38% of the variability in Y is explained by X.
The remaining 62% of the variation is due to residual factors.
The low value of r² indicates a weak relationship between Y and X.

Refer to the Cause and Effect Matrix and study the relationship between Y and a different X variable.


Multiple Linear Regression


If a new variable X2 is added to the regression model, the impact of both X1 and X2 on Y gets tested. This is known as Multiple Linear Regression. In Multiple Linear Regression:

the value of r² changes due to the introduction of the new variable.
the value of r² corrected for the number of X variables in the model is known as r² Adjusted.
the model can be used if the r² Adjusted value is greater than 70%.


Key Concepts of Multiple Linear Regression


Multiple Linear Regression covers the following concepts:

The residuals between the actual values and the predicted values give an indication of how good the model is.
If the errors are small and predictions use Xs that are within the range of the collected data, the predictions should be fine.

SST = SSR + SSE
SSR = SST − SSE
r² = SSR / SST
where SST = total sum of squares, SSR = sum of squares explained by the regression, and SSE = sum of squares of error

To check for error, take two observations of Y at the same X.
Prioritization of Xs can be done through the SLR equation; run separate regressions on Y with each X.
If an X does not explain variation in Y, it should not be explored further.
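The relationship between SST, SSR, SSE, and r² can be seen numerically with a single-X fit. The sketch below uses hypothetical data and an illustrative helper name (`sums_of_squares`); it fits a least squares line and then splits the total variation of Y into the part explained by the regression and the residual part.

```python
def sums_of_squares(xs, ys):
    """Fit y = a + b*x by least squares and return (SST, SSR, SSE)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * x for x in xs]

    sst = sum((y - mean_y) ** 2 for y in ys)                    # total variation
    ssr = sum((y_hat - mean_y) ** 2 for y_hat in predicted)     # explained by regression
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))  # residual error
    return sst, ssr, sse


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sst, ssr, sse = sums_of_squares(x, y)
r_squared = ssr / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"r^2 = {r_squared:.2f}")
```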


Difference between Correlation and Causation


A regression equation may denote a relationship between variables. It does not indicate:
whether a change in one variable causes a change in the other; or
whether both variables depend on another independent variable.

Example: There is a positive correlation between the number of sneezes and the number of deaths in a city. It cannot be assumed that sneezing is the cause of death, even though the correlation is very strong.


Analyze
Topic 3: Hypothesis Testing


Statistical and Practical Significance of Hypothesis Test


The difference between a variable and its hypothesized value may be statistically significant but may not be practically or economically meaningful.
For example: Based on the hypothesis test, Nutri Worldwide Inc. implemented a trading strategy. The
returns:

are economically significant when logical reasons are examined before implementation.

may not be significant when statistically proven strategy is implemented directly.

may be economically insignificant due to taxes, transaction costs, and risks.


Null Hypothesis vs. Alternate Hypothesis


The conceptual differences between a null and an alternate hypothesis are as follows:

Null Hypothesis
Represented as H0
Cannot be proved, only rejected
Example: Movie is good

Alternate Hypothesis
Represented as Ha
Challenges the null hypothesis
Example: Movie is not good

If the null hypothesis is rejected, the alternate hypothesis is accepted.


Type I and Type II Error


The conceptual differences between Type I and Type II errors are as follows:

Type I Error
Rejecting a null hypothesis when it is true
Also known as Producer's Risk
α is the chance of committing a Type I error
The value of α is typically 0.05 or 5%
Example: When a movie is good, it is reviewed as not good.

Type II Error
Accepting a null hypothesis when it is false
Also known as Consumer's Risk
β is the chance of committing a Type II error
The value of β is typically 0.2 or 20%
Any experiment should have as low a β value as possible
Example: When a movie is not good, it is reviewed as good.


Important Points to remember about Type I and Type II Errors


While dealing with Type I or Type II errors, the following are the points to remember:

Reducing the probability of making one type of error generally increases the probability of making the other type of error.
If a true null hypothesis is erroneously rejected, it is a Type I error; if a false null hypothesis is accepted, it is a Type II error.
If α is set at 0.05, the risk of committing a Type I error is 1 out of 20 experiments.
It is important to decide which type of error should be less likely and to set α and β accordingly.


Power of Test
The power of a test:
is the probability of correctly rejecting the null hypothesis when it is false.
is represented as 1 − β, where β is the probability of a Type II error.
is the probability of not committing a Type II error.
helps in improving the advantage of hypothesis testing.
should be as high as possible; when given a choice of tests, the test with the highest power should be preferred.

In hypothesis testing, α is the significance level and 1 − α is the confidence level.


Determinants of Sample Size: Continuous Data

The sample size is determined by the responses to the following questions:
How much variation is present in the population? (σ)
Within what interval does the true population mean need to be estimated? (±Δ)
How much representation error is allowed in the sample? (α)

The sample size for continuous data can be determined by the formula:
$n = \left( \dfrac{Z_{1-\alpha/2} \, \sigma}{\Delta} \right)^{2}$
For α = 0.05, 1 − (α/2) = 0.975.

Standard Sample Size Formula: Continuous Data

To calculate the standard sample size for continuous data, the value of α is taken as 5%. According to the Z table, Z97.5 = 1.96. The standardized sample size formula for continuous data is:
$n = \left( \dfrac{1.96 \, \sigma}{\Delta} \right)^{2}$

Q: The population standard deviation for the time to resolve customer problems is 30 hours. What should be the size of a sample that can estimate the average problem resolution time within a tolerance of 5 hours with 99% confidence?

A: Δ = 5, σ = 30, α = 0.01, and Z99.5 = 2.575.
Sample size = [(2.575 × 30)/5]² = 238.70, rounded up to 239
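The continuous-data sample size can also be computed for any α rather than only the tabulated 5% case. This is a sketch assuming Python 3.8 or later for `statistics.NormalDist`; the function name `sample_size_continuous` is illustrative. The Z value comes from the inverse normal CDF, so it is slightly more precise than the rounded table value of 2.575, but the result still rounds up to the same sample size.

```python
from math import ceil
from statistics import NormalDist


def sample_size_continuous(sigma: float, delta: float, alpha: float) -> int:
    """Smallest n that estimates the mean within +/- delta at confidence 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided Z value, e.g. about 1.96 for alpha = 0.05
    return ceil((z * sigma / delta) ** 2)     # always round up to a whole unit


# Problem-resolution example: sigma = 30 h, tolerance = 5 h, 99% confidence
n = sample_size_continuous(sigma=30, delta=5, alpha=0.01)
print(n)
```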


Standard Sample Size Formula: Discrete Data

To calculate the standard sample size for discrete data, if the average population proportion non-defective is p, then the population standard deviation can be calculated as:
$\sigma = \sqrt{p(1 - p)}$
The standardized sample size formula for discrete data is:
$n = \left( \dfrac{1.96}{\Delta} \right)^{2} p(1 - p)$
where Δ = tolerance allowed on either side of the population proportion average, in %

Q: The non-defective population proportion for pen manufacturing is 80%. What should be the sample size to draw a sample that can estimate the proportion of compliant pens within ±5% with an alpha of 5%?

A: Δ = 0.05, σ² = 0.8 × (1 − 0.8), α = 0.05, and Z97.5 = 1.96
Sample size = (1.96/0.05)² × 0.8 × 0.2 = 245.86, rounded up to 246
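For discrete (proportion) data the only change is that the variance is p(1 − p). The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the function name `sample_size_discrete` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_discrete(p: float, delta: float, alpha: float) -> int:
    """Smallest n that estimates a proportion p within +/- delta at confidence 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z / delta) ** 2 * p * (1 - p))


# Pen example: 80% non-defective, tolerance of 5 percentage points, alpha = 5%
n_pens = sample_size_discrete(p=0.8, delta=0.05, alpha=0.05)
print(n_pens)
```

Because p(1 − p) is largest at p = 0.5, using p = 0.5 gives a conservative sample size when the true proportion is unknown.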


Hypothesis Testing Roadmap


The figure below helps in concluding the type of test one should perform based on the kind of data and values available:

[Figure: Hypothesis testing roadmap]
Hypothesis testing
  Discrete data
    Mean, Variance: χ²-test
  Continuous data
    Variance: F-test
    Mean
      Comparison of two
        σ unknown: t-test
        σ known: Z-test
      Comparison of many: F-test

Hypothesis Test for Means (Theoretical): Example

H0: Average height of North American males is 165 cm (μ0)
Ha: Average height of North American males ≠ 165 cm
H0: μ = μ0 against Ha: μ ≠ μ0
Sample size (n) = 117 (Z-test) and sample size (n) = 25 (t-test); sample average (X̄) = 164.5 cm

Z-test (σ known)
The population SD is known; σ = 5.2
Compute $z = \dfrac{\bar{X} - \mu_0}{\sqrt{\sigma^{2}/n}} = \dfrac{164.5 - 165}{\sqrt{5.2^{2}/117}} = -1.04$
Reject H0 at the α level of significance if |z| > z_{α/2}
Since |z| = 1.04 is less than z0.025 = 1.96, the null hypothesis is not rejected at the 5% level of significance. Thus, based on the sample collected, there is not enough evidence to conclude that the average height of North American males differs from 165 cm.

t-test (σ unknown)
The population SD is unknown; however, it is estimated from the sample SD; s = 5.0
Compute $t = \dfrac{\bar{X} - \mu_0}{\sqrt{s^{2}/n}} = \dfrac{164.5 - 165}{\sqrt{5^{2}/25}} = -0.5$
Reject H0 at the α level of significance if |t| > t_{n−1, α/2}
Since |t| = 0.5 is less than t24, 0.025 = 2.064, the null hypothesis is not rejected at the 5% level of significance. Thus, based on the sample collected, there is not enough evidence to conclude that the average height of North American males differs from 165 cm.
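Both test statistics in this example share the same form: the difference between the sample mean and the hypothesized mean, divided by the standard error. The sketch below recomputes them; the function name is illustrative, and the critical values 1.96 and 2.064 are taken from the example rather than computed. The sign of the statistic depends on the order of subtraction, so the two-sided decision uses its absolute value.

```python
from math import sqrt


def one_sample_statistic(sample_mean, hypothesized_mean, sd, n):
    """(sample mean - hypothesized mean) / sqrt(sd^2 / n)."""
    return (sample_mean - hypothesized_mean) / sqrt(sd**2 / n)


# Z-test: population SD known (5.2), n = 117
z = one_sample_statistic(164.5, 165, sd=5.2, n=117)
# t-test: SD estimated from the sample (5.0), n = 25
t = one_sample_statistic(164.5, 165, sd=5.0, n=25)

reject_z = abs(z) > 1.96    # two-sided critical value at the 5% level
reject_t = abs(t) > 2.064   # t critical value for 24 degrees of freedom

print(f"z = {z:.2f}, reject H0: {reject_z}")
print(f"t = {t:.2f}, reject H0: {reject_t}")
```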

Hypothesis Test for Variance: Example

In a hypothesis test for variance, the Chi-square test is used. This is explained in the example below:

H0: Proportion of wins in Australia or abroad is independent of the country played against
Ha: Proportion of wins in Australia or abroad is dependent on the country played against
χ² critical = 6.251 and χ² calculated = 1.36
Result: Since the calculated value is less than the critical value, the null hypothesis is not rejected; the proportion of wins of the Australian hockey team is independent of the country played against or the place.


Hypothesis Test for Proportions: Example

The hypothesis test on a population proportion can be performed as follows:
H0: Proportion of smokers among males in a place named R is 0.10 (p0)
Ha: Proportion of smokers among males in R is different from 0.10
H0: p = p0 against Ha: p ≠ p0
Among n = 150 adult males interviewed, 23 were found to be smokers.
Sample proportion p̂ = 23/150 = 0.153
Compute test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} = \dfrac{0.1533 - 0.10}{\sqrt{0.10 \times 0.90 / 150}} \approx 2.18$
Reject H0 at the α level of significance if |z| > z_{α/2}
Since z = 2.18 is greater than z0.025 = 1.96, the null hypothesis is rejected at the 5% level of significance in favor of the alternative
Result: It can be concluded that the proportion of smokers in R is greater than 0.10.
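The test statistic for this example can be computed with the usual one-proportion z formula, in which the standard error is based on the hypothesized proportion p0. This is a sketch under that assumption; the function name `one_proportion_z` is illustrative.

```python
from math import sqrt


def one_proportion_z(successes: int, n: int, p0: float) -> float:
    """z statistic for testing a sample proportion against a hypothesized p0."""
    p_hat = successes / n
    standard_error = sqrt(p0 * (1 - p0) / n)   # standard error under H0
    return (p_hat - p0) / standard_error


# 23 smokers among 150 adult males, H0: p = 0.10
z = one_proportion_z(successes=23, n=150, p0=0.10)
reject_null = abs(z) > 1.96   # two-sided critical value at the 5% level

print(f"z = {z:.2f}, reject H0: {reject_null}")
```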

Comparison of Means of Two Processes


Means of two processes are compared to:

understand the significant difference in the outcome of the two processes;

understand whether a new process is better than an old process;

understand whether the two samples belong to the same population or a different population;
and

benchmark the existing process with another process.


Paired Comparison Hypothesis Test for Means (Theoretical)


The two-mean t-test with unequal variances is:
H0: μ1 = μ2 against Ha: μ1 ≠ μ2
Two samples of sizes n1 = 125 and n2 = 110 are taken from the two populations
X̄1 = 167.3, X̄2 = 165.8, s1 = 4.2, s2 = 5.0 are the sample means and SDs respectively
Compute test statistic: $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^{2}/n_1 + s_2^{2}/n_2}} = \dfrac{167.3 - 165.8}{\sqrt{4.2^{2}/125 + 5.0^{2}/110}} \approx 2.47$
Reject H0 at the α level of significance if |computed t| > t_{DF, α/2}
Since |t| = 2.47 is greater than t223, 0.025 = 1.96, the null hypothesis is rejected at the 5% level of significance
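The test statistic for two independent samples with unequal variances divides the difference in sample means by the combined standard error. The sketch below recomputes it from the summary figures in the example; the function name `two_sample_t` is illustrative, and the critical value 1.96 is taken from the example rather than computed.

```python
from math import sqrt


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent samples without assuming equal variances."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error


t = two_sample_t(mean1=167.3, sd1=4.2, n1=125, mean2=165.8, sd2=5.0, n2=110)
reject_null = abs(t) > 1.96   # critical value quoted in the example

print(f"t = {t:.2f}, reject H0: {reject_null}")
```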


Paired Comparison Hypothesis Test for Variance: F-Test Example

Susan is examining the earnings of two companies. According to her, the earnings of Company A are more volatile than those of Company B. She has obtained earnings data for the past 31 years for Company A, and for the past 41 years for Company B. She finds that the sample standard deviation of Company A's earnings is $4.40 and of Company B's earnings is $3.90. Determine whether the earnings of Company A have a greater standard deviation than those of Company B at the 5% level of significance.

H0: σA² = σB² (the variance of Company A's earnings is equal to the variance of Company B's earnings).
Ha: σA² ≠ σB² (the variance of Company A's earnings is different).
σA² = variance of Company A's earnings.
σB² = variance of Company B's earnings.
Note: SA > SB. In calculating the F-test statistic, always put the greater variance in the numerator.


Hypothesis Test for Equality of Variance: F-Test Example

The degrees of freedom for Company A and Company B are:
dfA (degrees of freedom of A) = 31 − 1 = 30
dfB (degrees of freedom of B) = 41 − 1 = 40

The critical value from the F-table equals 1.74. The null hypothesis is rejected if the F-test statistic is greater than 1.74.

Calculation of F-test statistic:
F = SA²/SB² = 4.40²/3.90² = 1.273

Results: The F-test statistic (1.273) is not greater than the critical value (1.74). Therefore, at the 5% significance level, the null hypothesis cannot be rejected.
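The F statistic in this example is simply the ratio of the two sample variances with the larger one on top. The sketch below recomputes it; the function name `f_statistic` is illustrative, and the critical value 1.74 is taken from the example's F-table lookup rather than computed.

```python
def f_statistic(sd_a: float, sd_b: float) -> float:
    """Ratio of sample variances, with the larger variance in the numerator."""
    var_a, var_b = sd_a**2, sd_b**2
    return max(var_a, var_b) / min(var_a, var_b)


f_value = f_statistic(sd_a=4.40, sd_b=3.90)
f_critical = 1.74            # from the F-table, df = (30, 40), as quoted in the example
reject_null = f_value > f_critical

print(f"F = {f_value:.3f}, reject H0: {reject_null}")
```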


Hypothesis Tests: F-Test for Independent Groups

A restaurant wanting to explore the recent overuse of avocados suspects there is a difference between two chefs in the amount of avocado used to prepare the salads. The table shows the measure of avocados in ounces.


Group A (Chef 1)

Group B (Chef 2)

4.2

4.5

4.5

7.2

6.1

5.2

8.9

5.3

5.2

6.1

F-Test
The steps for conducting an F-Test in MS-Excel are:
1. Click Data Analysis under the Data tab.
2. Select F-Test Two-Sample for Variances.
3. In the Variable 1 and Variable 2 ranges, select the right data sets.
4. Click OK.

F-Test Assumptions
Before interpreting the F-test, the assumptions to be considered are as follows:

Null Hypothesis: There is no significant statistical difference between the variances of the two

groups, thus concluding any variation could be because of chance. This is Common Cause of
Variation.

Alternate Hypothesis: There is a significant statistical difference between the variances of the two
groups, thus concluding variations could be because of assignable causes also. This is Special Cause
of Variation.


F-Test Interpretations
The interpretations for the conducted F-test are as follows:

From the Excel result sheet, the p-value is 0.03.
If the p-value is < 0.05, the null hypothesis must be rejected.
The null hypothesis is rejected with 97% confidence.
The claim that variation could only be due to Common Cause of Variation is rejected.
There could be Assignable Causes of Variation or Special Causes of Variation.

F-Test Two-Sample for Variances
                       Variable 1     Variable 2
Mean                   6.016666667    5.016666667
Variance               3.197666667    0.517666667
Observations           6              6
df                     5              5
F                      6.177076626
P(F<=f) one-tail       0.033652302
F Critical one-tail    5.050329058


Hypothesis Tests: t-Test for Independent Groups


The table shows the measure of avocados in ounces. If a significant difference in their means is found,
it can be concluded that there is a possibility of Special Cause of Variation.

Copyright 2014, Simplilearn, All rights reserved.

Group A (Chef 1)

Group B (Chef 2)

4.2

4.5

4.5

7.2

6.1

5.2

8.9

5.3

5.2

6.1

2-Sample t-Test
The steps for conducting a 2-sample t-test in MS Excel are given below:

1. Open MS Excel, click Data, and click Data Analysis.
2. Select the 2-sample independent t-test assuming unequal variances.
3. In the Variable 1 range, select the data set for Group A.
4. In the Variable 2 range, select the data set for Group B.
5. Keep the Hypothesized Mean Difference as 0.
6. Click OK.

Assumptions of 2-Sample Independent t-Test

The hypotheses for a 2-sample independent t-test are as follows:

Null Hypothesis: There is no statistically significant difference between the means of the two groups, so any variation could be due to chance. This is Common Cause of Variation.

Alternate Hypothesis: There is a statistically significant difference between the means of the two groups, so the variation could also be due to assignable causes. This is Special Cause of Variation.
H0 : Mean of Group A = Mean of Group B
Ha : Mean of Group A ≠ Mean of Group B

The alternate hypothesis covers two conditions, Mean of A < Mean of B and Mean of A > Mean of B. Thus, a two-tailed probability needs to be used.

2-Tailed vs. 1-Tailed Probability

The difference between the usage of the 2-tailed probability and the 1-tailed probability is as follows:

2-Tailed Probability

If the alternate hypothesis tests both directions, either less or more, use the 2-tailed probability value from the test.

Example: If the alternate hypothesis is that Mean of A is not equal to Mean of B, use the 2-tailed probability.

1-Tailed Probability

If the alternate hypothesis tests one direction, use the 1-tailed probability value from the test.

Example: If the alternate hypothesis is that Mean of A is greater than Mean of B, use the 1-tailed probability.

2-Sample Independent t-Test: Results and Interpretations

According to the table:

The p-value of the 2-tailed probability testing is 0.24.

This value is greater than 0.05.

The null hypothesis is not rejected.

The means of the two groups are not statistically different.

t-Test: Two-Sample Assuming Unequal Variances

                               Variable 1     Variable 2
Mean                           6.016666667    5.016666667
Variance                       3.197666667    0.517666667
Observations
Hypothesized Mean Difference
df
t Stat                         1.270798616
P(T<=t) one-tail               0.122200546
t Critical one-tail            1.894578605
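
The t statistic in the output can be reproduced from the summary statistics alone. The sketch below is a minimal Python illustration of the unequal-variances (Welch) calculation. The helper name `welch_t` is hypothetical, and the sample size of 6 per group is an assumption taken from the F-test output for the same data, since the Observations row here shows no values.

```python
from math import sqrt


def welch_t(mean_1, var_1, n_1, mean_2, var_2, n_2):
    """Return the t statistic and Welch-Satterthwaite degrees of freedom."""
    se_1 = var_1 / n_1  # squared standard error of mean 1
    se_2 = var_2 / n_2  # squared standard error of mean 2
    t_stat = (mean_1 - mean_2) / sqrt(se_1 + se_2)
    df = (se_1 + se_2) ** 2 / (se_1 ** 2 / (n_1 - 1) + se_2 ** 2 / (n_2 - 1))
    return t_stat, df


# Means and variances from the Excel output; n = 6 per group is an assumption.
t_stat, df = welch_t(6.016666667, 3.197666667, 6, 5.016666667, 0.517666667, 6)

p_one_tail = 0.122200546     # reported by Excel
p_two_tail = 2 * p_one_tail  # the t distribution is symmetric
reject_null = p_two_tail < 0.05

print(f"t = {t_stat:.6f}, df = {df:.2f}, two-tailed p = {p_two_tail:.2f}")
print(f"Reject null hypothesis: {reject_null}")
```

Doubling the one-tail probability gives the two-tailed value of about 0.24 quoted in the interpretation.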

Paired t-Test
The paired t-test is:

one of the more powerful tests from the t-test family;

conducted on measurements taken before and after a process change; and

often used in the Improve stage.

For example, a group of students scores X in CSSGB before taking the training program. After the training program, the scores are taken again.

One needs to find out if there is a statistical difference between the two sets of scores.

If there is a significant difference, the inference could be that the training was effective.
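
The before-and-after comparison can be sketched in a few lines of Python. This is a minimal illustration only: the scores are hypothetical (the lesson gives no data), the helper name `paired_t` is an assumption, and the critical value 2.776 is the standard two-tailed t-table value for alpha = 0.05 with 4 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(after, before)]  # per-student change
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical scores for five students (not from the lesson).
before = [60, 65, 70, 58, 72]
after = [68, 70, 75, 63, 80]

t_stat, df = paired_t(before, after)
t_critical = 2.776  # two-tailed, alpha = 0.05, df = 4
significant = abs(t_stat) > t_critical

print(f"t = {t_stat:.3f}, df = {df}, significant: {significant}")
```

Because each student is compared with themselves, the test works on the differences rather than on the two sets of scores separately.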

Sample Variance
Sample variance (S²) is the average of the squared differences from the mean, computed with n - 1 in the denominator.

It is used to calculate and understand the degree of variation of a sample.

In statistics, it is typically converted into the standard deviation and interpreted together with the mean.

The steps for calculating sample variance are as follows:

1. Calculate the mean of the sample.
2. Subtract the mean from each value.
3. Square each result.
4. Average the squared differences, dividing by n - 1.

Sample Variance: Example
The example to calculate sample variance is as follows:

Consider a sample of weights. Suppose the mean value is 140. Subtracting the mean from each value, squaring each result, and averaging the squared differences gives a sample variance of 1936.

To get the standard deviation, take the square root of the sample variance: √1936 = 44.

The standard deviation, along with the mean, indicates how much the majority of the people weigh.

With a mean of 140 and a standard deviation of 44, the majority of people weigh between 96 pounds (140 - 44) and 184 pounds (140 + 44).
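
The same figures can be reproduced with Python's standard `statistics` module. This is a minimal sketch: the three weights are hypothetical values chosen only so that the mean is 140 and the sample variance is 1936, since the lesson does not list the underlying data.

```python
from statistics import mean, stdev, variance

# Hypothetical weights in pounds, chosen to reproduce the figures in the example.
weights = [96, 140, 184]

m = mean(weights)        # sample mean
s2 = variance(weights)   # sample variance (divides by n - 1)
s = stdev(weights)       # standard deviation = square root of the variance

print(f"mean = {m}, variance = {s2}, standard deviation = {s}")
print(f"Typical range: {m - s:.0f} to {m + s:.0f} pounds")
```

Note that `statistics.variance` uses n - 1 in the denominator, while `statistics.pvariance` divides by n.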

ANOVA: Comparison of More Than Two Means

ANOVA:

is used to compare the means of more than two samples;

stands for Analysis of Variance;

helps determine whether all sample means are equal;

allows shortlisted samples to be tested further, based on the result; and

generalizes the t-test to include more than two samples.

ANOVA Example
The table shows the takeaway food delivery time of three different outlets. To benchmark the delivery time of the outlets:

the null hypothesis will assume that the three means are equal; and

rejection of the null hypothesis would mean that at least two outlets are different in their average delivery time.

Outlet 1   Outlet 2   Outlet 3
48         50         49
49         48         48
48         36         39
53         50         49
58         50         34
50         62         33
46         45         57
50         47         48
49         51         47
47         44         39
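
The one-way ANOVA F statistic for this table can be computed by hand. The sketch below is a minimal Python illustration, assuming the table is read row by row as (Outlet 1, Outlet 2, Outlet 3). The function name `one_way_anova_f` is hypothetical, and the critical value of about 3.35 is the approximate F-table value for alpha = 0.05 with (2, 27) degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic and its degrees of freedom."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Delivery times, assuming the table is read row by row.
outlet_1 = [48, 49, 48, 53, 58, 50, 46, 50, 49, 47]
outlet_2 = [50, 48, 36, 50, 50, 62, 45, 47, 51, 44]
outlet_3 = [49, 48, 39, 49, 34, 33, 57, 48, 47, 39]

f_stat, df_b, df_w = one_way_anova_f([outlet_1, outlet_2, outlet_3])
f_critical = 3.35  # approximate table value, alpha = 0.05, df = (2, 27)

print(f"F = {f_stat:.3f} with df = ({df_b}, {df_w})")
print(f"Reject null hypothesis: {f_stat > f_critical}")
```

Under these assumptions F stays below the critical value, so the null hypothesis of equal mean delivery times is not rejected.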

Using Minitab for ANOVA

To perform ANOVA in Minitab:
1. Stack the data into two columns.
2. In the main menu, choose Stat > ANOVA > One-Way.
3. Select delivery time as the response and outlet as the factor.
4. Click OK.

Using Minitab for ANOVA (contd.)

The following output is received when the data is fed into Minitab:

ANOVA using Excel

To perform ANOVA, enter the data on an Excel spreadsheet, select the Anova: Single Factor test from the Data Analysis ToolPak, and select the input range for analysis and an output range.

Interpreting Minitab Results

The result of the Minitab ANOVA is interpreted as follows:

Since the p-value is more than 0.05, the null hypothesis is not rejected.

There is no significant difference between the mean delivery times of the three outlets.

The confidence intervals for the three outlets overlap.

One-way ANOVA benchmarks a single factor, unlike two-way ANOVA, which examines two factors.

Chi-Square Distribution
The chi-square distribution ($\chi^2$-distribution), or chi-squared:

is a widely used probability distribution in inferential statistics;

needs one sample for the test to be conducted; and

with k degrees of freedom, is the distribution of the sum of the squares of k independent standard normal random variables.

$$\chi^2_{\text{calculated}} = \sum \frac{(f_o - f_e)^2}{f_e}$$

Where,
$\chi^2_{\text{calculated}}$ = chi-square index
$f_o$ = an observed frequency
$f_e$ = an expected frequency

Chi-Square Test: Example
To analyze the Australian hockey team's wins, the data has two classifications.

The table is called a 2 x 4 contingency table.

Expected frequency for each of the observed frequencies = (row total)(column total)/overall total.

Example: The observed frequency of 3 wins against South Africa in Australia converts to an expected frequency of (21 / 31) * 5 = 3.39.
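
The expected-frequency rule can be written as a small function. The following minimal Python sketch uses only the totals quoted in the example (21, 5, and 31) and the observed count of 3. The function name and the labels for which total is the row and which is the column are assumptions made for illustration.

```python
def expected_frequency(row_total, column_total, overall_total):
    """Expected cell count if the two classifications are independent."""
    return row_total * column_total / overall_total


# Totals quoted in the example; 21 and 5 are the cell's two marginal totals.
observed = 3
expected = expected_frequency(21, 5, 31)

# Contribution of this single cell to the chi-square index: (fo - fe)^2 / fe.
cell_contribution = (observed - expected) ** 2 / expected

print(f"expected = {expected:.2f}, cell contribution = {cell_contribution:.4f}")
```

Summing the same contribution over all eight cells of the 2 x 4 table gives the calculated chi-square index.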


Chi-Square Test: Example (contd.)

The table is populated by:

calculating and adding the estimated population parameters;

estimating the observed frequency; and

calculating the final chi-square index.

Chi-Square Test Example: Interpretation of Results

There is a different chi-square distribution for each number of degrees of freedom. For the chi-square distribution, the degrees of freedom are calculated from the number of rows and columns in the contingency table.
Degrees of freedom = (2 - 1) * (4 - 1) = 3
Assuming $\alpha$ = 10%, $\chi^2_{\text{critical}}$ = 6.251
$\chi^2_{\text{calculated}}$ = 1.36
$\chi^2_{\text{critical}}$ divides the region into acceptance and rejection zones, while $\chi^2_{\text{calculated}}$ determines whether the null hypothesis is rejected, depending on the zone in which it falls.

The calculated value is found to be less than the critical value, so the null hypothesis is not rejected.
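
The decision rule can be expressed directly in code. This minimal Python sketch uses only the numbers stated in the interpretation; the function name `chi_square_df` is a hypothetical helper.

```python
def chi_square_df(rows, columns):
    """Degrees of freedom for a contingency table."""
    return (rows - 1) * (columns - 1)


df = chi_square_df(2, 4)
chi_critical = 6.251    # alpha = 10%, df = 3, as stated in the example
chi_calculated = 1.36   # chi-square index from the example

# The null hypothesis is rejected only if the index falls in the rejection zone.
reject_null = chi_calculated > chi_critical

print(f"df = {df}, reject null hypothesis: {reject_null}")
```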


Analyze
Topic 4: Hypothesis Testing with Non-Normal Data

Mann-Whitney Test
The Mann-Whitney test, or Wilcoxon Rank Sum test, is a non-parametric test used to compare two unpaired groups. In this test:

The value of α is set as 0.05.

The rejection and acceptance conditions remain the same for different cases:

If p < α: Reject the null hypothesis

If p > α: Cannot reject the null hypothesis

The aim of this test is to rank all the data from both conditions together and then compare the rank totals of the two groups.

Mann-Whitney Test (contd.)
The steps to perform the Mann-Whitney test are as follows:

1. Rank all the values from low to high. The smallest number gets a rank of 1. The largest number gets a rank of n, where n is the total number of values in the two groups.
2. Find the average of the ranks for all the identical values. Continue until all the whole-number ranks are used.
3. Test the values: sum the ranks for the observations from sample 1, and then sum the ranks in sample 2 (the larger group).

Mann-Whitney Test: Example
An example of performing the Mann-Whitney test is shown here.

Group   Data
G1      14, 2, 5, 16, 9
G2      4, 2, 18, 14, 8

Sorted Data   Group   Rank A   Final Rank   G1 Rank (R1)   G2 Rank (R2)
2             G1      1        1.5          1.5
2             G2      2        1.5                         1.5
4             G2      3        3                           3
5             G1      4        4            4
8             G2      5        5                           5
9             G1      6        6            6
14            G1      7        7.5          7.5
14            G2      8        7.5                         7.5
16            G1      9        9            9
18            G2      10       10                          10
Total                                       28             27
                                            n1 = 5         n2 = 5

Tied values share the average of their ranks: Avg. = 1.5 for the two values of 2, and Avg. = 7.5 for the two values of 14.

Mann-Whitney Test: Example (contd.)

The formula for the Mann-Whitney U test for n1 and n2 values is:

$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$

$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$

In this example,
U1 = 12 and U2 = 13

Mann-Whitney Test: Example (contd.)

To calculate the U value:

U = Min (U1, U2) = Min (12, 13) = 12

Look up the Mann-Whitney U test table for n1 = 5 and n2 = 5; the critical value is 2.

To be statistically significant, the obtained U value must be equal to or less than this critical value.
Since the calculated U value is 12 (greater than 2), there is no statistically significant difference between
the two groups.
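
The ranking and U calculation in this example can be reproduced with a short script. The following is a minimal Python sketch using the example's data; the helper name `average_ranks` is hypothetical, and the critical value of 2 is the table value quoted in the example.

```python
def average_ranks(values):
    """Rank values from low to high; tied values share the average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # first whole-number rank taken by v
        count = ordered.count(v)              # number of tied values
        rank_of[v] = first + (count - 1) / 2  # average of the tied ranks
    return rank_of


g1 = [14, 2, 5, 16, 9]
g2 = [4, 2, 18, 14, 8]

rank_of = average_ranks(g1 + g2)
r1 = sum(rank_of[v] for v in g1)  # rank total for G1
r2 = sum(rank_of[v] for v in g2)  # rank total for G2
n1, n2 = len(g1), len(g2)

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)

u_critical = 2  # table value for n1 = n2 = 5, as used in the example
significant = u <= u_critical

print(f"R1 = {r1}, R2 = {r2}, U1 = {u1}, U2 = {u2}, U = {u}")
print(f"Statistically significant: {significant}")
```

A useful check is that U1 + U2 always equals n1 * n2, which is 25 here.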

Kruskal-Wallis Test
The Kruskal-Wallis test is also a non-parametric test, used to test whether samples originate from the same distribution.
Characteristics of the Kruskal-Wallis test are as follows:

It is a one-way analysis of variance by ranks.

Medians of two or more samples are compared to determine whether the samples come from the same source.

Unlike the analogous one-way analysis of variance, it does not assume a normal distribution of the residuals.

The null hypothesis is that the medians of all the groups are equal, and the alternative hypothesis is that the population median of at least one group is different from that of at least one other group.

Mood's Median Test

Mood's median test is a non-parametric test that is used to test the equality of medians from two or more different populations. This test works when:

the output (Y) variable is continuous, discrete-ordinal, or discrete-count; and

the input (X) variable is discrete with two or more attributes.

The steps involved in Mood's median test are as follows:

1. Find the median of the combined data set.
2. Find the number of values in each sample that are greater than the median.
3. Form a contingency table.
4. Find the expected value for each cell.
5. Find the chi-square value.
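
The five steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two samples are hypothetical, the function name `moods_median_chi_square` is not from the lesson, and 3.841 is the standard chi-square table value for alpha = 0.05 with 1 degree of freedom.

```python
from statistics import median


def moods_median_chi_square(samples):
    """Return the grand median, the contingency table rows, and chi-square."""
    # Step 1: median of the combined data set.
    grand_median = median([x for s in samples for x in s])
    # Step 2: count the values above the median in each sample.
    above = [sum(x > grand_median for x in s) for s in samples]
    not_above = [len(s) - a for s, a in zip(samples, above)]
    # Steps 3 to 5: contingency table, expected values, chi-square.
    total = sum(len(s) for s in samples)
    chi_square = 0.0
    for row in (above, not_above):
        row_total = sum(row)
        for observed, s in zip(row, samples):
            expected = row_total * len(s) / total
            chi_square += (observed - expected) ** 2 / expected
    return grand_median, above, not_above, chi_square


# Hypothetical samples (not from the lesson).
sample_a = [12, 15, 18, 20, 22]
sample_b = [8, 9, 11, 14, 16]

grand_median, above, not_above, chi_square = moods_median_chi_square(
    [sample_a, sample_b]
)
chi_critical = 3.841  # alpha = 0.05, df = (2 - 1) * (2 - 1) = 1

print(f"median = {grand_median}, above = {above}, not above = {not_above}")
print(f"chi-square = {chi_square:.2f}, reject null: {chi_square > chi_critical}")
```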

Friedman Test
The Friedman test is a non-parametric test that does not make any assumptions about the shape and origin of the sample.

It allows smaller sample data sets to be analyzed, and

unlike ANOVA, it does not require the data set to be randomly sampled from normally distributed populations with equal variances.

The null hypothesis of the test is that the population medians of all the treatments are statistically identical.

1 Sample Sign Test

The 1 Sample Sign test is the simplest of all the non-parametric tests that can be used instead of a one-sample t-test.

Here, H0 states that the population median equals the hypothesized (assumed) median.

Steps involved in the 1 Sample Sign test are as follows:

1. Count the number of positive values: values that are larger than the hypothesized median.
2. Count the number of negative values: values that are smaller than the hypothesized median.
3. Test the values: check if there are significantly more positives (or negatives)
than expected.
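
The counting steps above translate into a short calculation. The sketch below is a minimal Python illustration with hypothetical data and a hypothetical median of 3.7. It assumes the usual approach of comparing the counts against a binomial distribution with probability 0.5, which the lesson does not spell out.

```python
from math import comb


def sign_test(values, hypothesized_median):
    """Return positive count, negative count, and a two-sided p-value."""
    positives = sum(v > hypothesized_median for v in values)
    negatives = sum(v < hypothesized_median for v in values)
    n = positives + negatives  # values equal to the median are dropped
    k = max(positives, negatives)
    # Probability of k or more signs of one kind when both are equally likely.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return positives, negatives, p_value


# Hypothetical satisfaction scores (not from the lesson).
scores = [4.1, 3.9, 4.4, 3.5, 4.0, 4.2, 3.8, 4.3, 3.6, 4.5]

positives, negatives, p_value = sign_test(scores, 3.7)
reject_null = p_value < 0.05

print(f"positives = {positives}, negatives = {negatives}, p = {p_value:.4f}")
print(f"Reject null hypothesis: {reject_null}")
```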

1 Sample Wilcoxon Test

The 1 Sample Wilcoxon test, also known as the Wilcoxon Signed Rank test, is a non-parametric test.
This test is:

equivalent to the parametric one-sample t-test, and

more powerful than the non-parametric 1 Sample Sign test.

Characteristics of 1 Sample Wilcoxon Test

Some characteristics of this test are as follows:

It assumes the existing sample is randomly taken from a population with a symmetric frequency distribution around the median.

The symmetry can be observed with a histogram, or by checking if the median and mean are approximately equal.

The conclusion in this test is that if the sample median is consistent with the hypothesized median, the null hypothesis is not rejected. If not, the null hypothesis is rejected.

1 Sample Wilcoxon Test: Example

An example of the 1 Sample Wilcoxon test is shown.
The median customer satisfaction score of an organization has always been 3.7, and the management wants to see if this has changed. They conducted a survey and got the results grouped by customer type.

Conclusion:

If median = 3.7: Do not reject H0

If median ≠ 3.7: Reject H0

α = 0.05

Quiz

QUIZ 1

Which of the following describes the population parameters based on the sample data using a particular model?

a. Statistics
b. Inferential Statistics
c. Probability
d. Correlation

Answer: b.
Explanation: Inferential statistics describe the population parameters based on the sample data using a particular model.

QUIZ 2

Which of the following is an application of the population knowledge to predict the sample behavior?

a. Poisson distribution
b. Normal distribution
c. Chi-square distribution
d. Probability distribution

Answer: a.
Explanation: Poisson distribution is an application of the population knowledge to predict the sample behavior.

QUIZ 3

Which of the following is used to calculate the degree of movement of variable Y as X changes?

a. Correlation
b. Probability
c. F-distribution
d. Regression

Answer: d.
Explanation: The degree of movement of variable Y as X changes is calculated using regression.

QUIZ 4

A null hypothesis states that a process has not improved as a result of some modifications. The type II error is to conclude that:

a. we have failed to reject the null hypothesis (H0) when it is true.
b. we have failed to reject the null hypothesis (H0) when it is false.
c. we have rejected the null hypothesis.
d. we have made a correct decision with alpha probability.

Answer: b.
Explanation: A type II error means that we have failed to reject the null hypothesis (H0) when it is false.

QUIZ 5

The test used for testing significance in an analysis of variance table is the:

a. Z-test.
b. t-test.
c. F-test.
d. Chi-square test.

Answer: c.
Explanation: The appropriate ANOVA test is the F-test. ANOVA is a test of the equality of means.

QUIZ 6

Which of the following is a one-way analysis of variance by ranks?

a. 1 Sample Wilcoxon test
b. 1 Sample Sign test
c. Friedman test
d. Kruskal-Wallis test

Answer: d.
Explanation: The Kruskal-Wallis test is a one-way analysis of variance by ranks.

QUIZ 7

What distribution is used while making inferences about a population variance based on a single sample from that population?

a. Chi-square distribution
b. Normal distribution
c. t-distribution
d. F-distribution

Answer: a.
Explanation: The chi-square distribution is used to compare a sample variance with a known population variance.

QUIZ 8

If the p-value is less than the significance level, the null hypothesis has to be:

a. rejected.
b. accepted.
c. maintained as it is.
d. re-evaluated.

Answer: a.
Explanation: If the p-value is less than the significance level, the null hypothesis has to be rejected, as the data does not support the null hypothesis and the difference is statistically significant.

QUIZ 9

Which of the following is a non-parametric test that is used to test the equality of medians from two or more different populations?

a. Mood's median test
b. Kruskal-Wallis test
c. Friedman test
d. 1 Sample Sign test

Answer: a.
Explanation: Mood's median test is a non-parametric test that is used to test the equality of medians from two or more different populations.

QUIZ 10

Which of the following is a ratio of two chi-square distributions?

a. F-distribution
b. t-distribution
c. Poisson distribution
d. Binomial distribution

Answer: a.
Explanation: The F-distribution is a ratio of two chi-square distributions.

QUIZ 11

Which of the following is the probability of correctly rejecting the null hypothesis when it is false?

a. Simple linear correlation
b. Power of a test
c. Simple linear regression
d. 1 Sample Sign test

Answer: b.
Explanation: The power of a test is the probability of correctly rejecting the null hypothesis when it is false.

QUIZ 12

Which of the following assumes that the existing sample is randomly taken from a population, with a symmetric frequency distribution around the median?

a. Kruskal-Wallis test
b. Mood's median test
c. 1 Sample Wilcoxon test
d. Friedman test

Answer: c.
Explanation: The 1 Sample Wilcoxon test assumes that the existing sample is randomly taken from a population, with a symmetric frequency distribution around the median.

Summary
Here is a quick recap of what we have learned in this lesson:

Discrete probability distribution is characterized by the probability mass function, and continuous probability distribution is characterized by the probability density function.

Multi-Vari studies are used to analyze variation in a process.

Correlation means association between variables. Simple Linear Regression and Multiple Regression are its two main techniques.

Hypothesis testing is conducted on different sets of data. Analysis of variance is used to compare the means of more than two sample sets.

A t-test can be a 1-sample or a 2-sample test; 2-sample t-tests are used for comparing two means.

Summary (contd.)
Here is a quick recap of what we have learned in this lesson:

The Mann-Whitney or Wilcoxon Rank Sum test is used to compare two unpaired groups.

The Kruskal-Wallis test is used for testing the source of origin of samples.

Mood's median test is used to test the equality of medians from two or more different populations.

The Friedman test does not make any assumptions about the shape and origin of the sample.

The 1 Sample Sign test is the simplest of all the non-parametric tests that can be used in place of a 1-sample t-test.

THANK YOU
