Original title: statistics (1).pptx

Uploaded by Sameer Shafqat

© All Rights Reserved

1.1

Statistics

Data: numerical facts, collected together for reference or information.

Information: knowledge communicated concerning some particular fact.

Meaning of the term statistics

The word statistics is used in the following three senses:

Plural sense

Singular sense

Plural of the word statistic

Plural sense

Any systematically collected data for a specific purpose are described as statistics in the plural sense.

For example: statistics of prices, road accidents, crime, births, educational institutions.

Singular sense

Describes a body of procedures and techniques used to collect, process, and analyze numeric data to make inferences and reach decisions.

Plural of the word statistic

A statistic is a numerical quantity calculated from sample observations.

Definition of Statistics:

1. A collection of quantitative data pertaining to a

subject or group. Examples are blood pressure

statistics etc.

2. The science that deals with the collection,

tabulation, analysis, interpretation, and

presentation of quantitative data

Kinds of Statistics

1) Descriptive Statistics

2) Statistical Inference

Descriptive Statistics

1.5

Descriptive statistics deals with methods of organizing, summarizing, and presenting data in a convenient and informative way.

These methods include:

Graphical Techniques

Numerical Techniques

The actual method used depends on what information

we would like to extract. Are we interested in

measure(s) of central location? and/or

measure(s) of variability (dispersion)?

Statistical Inference

1.6

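Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.

[Diagram: a sample is drawn from the population; a statistic computed from the sample is used to infer a parameter of the population.]

What can we infer about a population's parameters based on a sample's statistics?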

Variables

7

A variable is a characteristic or condition that can change or take on different values.

Most research begins with a general question about

the relationship between two variables for a

specific group of individuals.

Population

8

The entire group of individuals is called the population.

For example, a researcher may be interested in the

relation between class size (variable 1) and

academic performance (variable 2) for the

population of third-grade children.

Sample

9

Usually populations are so large that a researcher cannot examine the entire group. Therefore, a

sample is selected to represent the population in a

research study. The goal is to use the results

obtained from the sample to help answer questions

about the population.

Types of Variables

11

Variables can be classified as discrete or continuous.

Discrete variables (such as class size) consist of

indivisible categories, and continuous variables

(such as time or weight) are infinitely divisible into

whatever units a researcher may choose. For

example, time can be measured to the nearest

minute, second, half-second, etc.

Real Limits

12

To define the units for a continuous variable, a researcher must use real limits which are

boundaries located exactly half-way between

adjacent categories.

Measuring Variables

13

To establish relationships between variables, researchers must observe the variables and record

their observations. This requires that the variables

be measured.

The process of measuring a variable requires a set

of categories called a scale of measurement and a

process that classifies each individual into one

category.

4 Types of Measurement Scales

14

1. A nominal scale is an unordered set of categories identified only by name. Nominal

measurements only permit you to determine

whether two individuals are the same or different.

2. An ordinal scale is an ordered set of categories.

Ordinal measurements tell you the direction of

difference between two individuals.

4 Types of Measurement Scales

15

3. An interval scale is an ordered series of equal-sized categories. Interval measurements identify the

direction and magnitude of a difference. The zero

point is located arbitrarily on an interval scale.

4. A ratio scale is an interval scale where a value of

zero indicates none of the variable. Ratio

measurements identify the direction and magnitude

of differences and allow ratio comparisons of

measurements.

Experiments

16

The goal of an experiment is to demonstrate a cause-and-effect relationship between two

variables; that is, to show that changing the value of

one variable causes changes to occur in a second

variable.

Experiments (cont.)

17

In an experiment, one variable is manipulated to create treatment conditions. A second variable is

observed and measured to obtain scores for a group

of individuals in each of the treatment conditions. The

measurements are then compared to see if there are

differences between treatment conditions. All other

variables are controlled to prevent them from

influencing the results.

In an experiment, the manipulated variable is called

the independent variable and the observed variable

is the dependent variable.

Definitions

1.18

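A variable is some characteristic of a population or sample.

E.g. student grades.

Typically denoted with a capital letter: X, Y, Z

The values of the variable are the range of possible values for a variable.

E.g. student marks (0..100)

Data are the observed values of a variable.

E.g. student marks: {67, 74, 71, 83, 93, 55, 48}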

Interval Data

1.19

Interval data are real numbers, e.g. heights, weights, prices, etc.

Also referred to as quantitative or numerical.

Arithmetic operations can be performed on interval data, thus it's meaningful to talk about 2*Height, or Price + $1, and so on.

Nominal Data

1.20

The values of nominal data are categories.

E.g. responses to questions about marital status, coded as:

Single = 1, Married = 2, Divorced = 3, Widowed = 4

Because the numbers are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed ÷ 2 = Married?!)

Ordinal Data

1.21

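Ordinal data appear to be categorical in nature, but their values have an order; a ranking to them:

poor = 1, fair = 2, good = 3, very good = 4, excellent = 5

While it's still not meaningful to do arithmetic on these data (e.g. does 2*fair = very good?!), we can say things like:

excellent > poor or fair < very good

That is, order is maintained no matter what numeric values are assigned to each category.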

Graphical & Tabular Techniques for Nominal Data

1.22

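The only allowable calculation on nominal data is to count the frequency of each value of the variable.

We can summarize the data in a table that presents the categories and their counts called a frequency distribution.

A relative frequency distribution lists the categories and the proportion with which each occurs.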
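As a rough illustration of a frequency distribution and a relative frequency distribution, the sketch below counts a small set of responses that use the marital-status coding from the Nominal Data slide. The ten responses are hypothetical, invented only for this example.

```python
from collections import Counter

# Coding taken from the Nominal Data slide.
labels = {1: "Single", 2: "Married", 3: "Divorced", 4: "Widowed"}

# Hypothetical responses, made up for illustration only.
responses = [1, 2, 2, 3, 1, 2, 4, 2, 1, 2]

counts = Counter(responses)
total = len(responses)

# Frequency distribution: category -> count.
frequency = {labels[code]: counts[code] for code in sorted(counts)}

# Relative frequency distribution: category -> proportion of all responses.
relative_frequency = {name: n / total for name, n in frequency.items()}

for name in frequency:
    print(f"{name:<10}{frequency[name]:>4}{relative_frequency[name]:>8.2f}")
```

Counting is the only calculation performed here, which matches the restriction on nominal data.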

Nominal Data (Tabular Summary)

1.23

Nominal Data (Frequency)

1.24

Nominal Data

1.25

It's all the same information (based on the same data).

Just different presentation.

Graphical Techniques for Interval Data

1.26

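There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical).

The most important of these graphical methods is the histogram.

The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.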

Building a Histogram

1.27

1) Collect the data.

2) Create a frequency distribution for the data.

3) Draw the Histogram.

Histogram and Stem & Leaf

1.28

Ogive

1.29

1) Calculate relative frequencies.

2) Calculate cumulative relative frequencies by

adding the current class relative frequency to the

previous class cumulative relative frequency.

(For the first class, its cumulative relative frequency is just its relative

frequency)

Cumulative Relative Frequencies

1.30

first class: .355

next class: .355 + .185 = .540

and so on for the remaining classes.

Ogive

1.31

The ogive can be used to answer questions like:

What value is at the 50th percentile?

Here, around $35.

(Refer also to Fig. 2.13 in your textbook)

Scatter Diagram

1.32

1) State the question: to what extent is the selling price of a home related to its size?

2) Determine the independent variable (X = house size) and the dependent variable (Y = selling price)

3) Use Excel to create a scatter diagram

Scatter Diagram

1.33

It appears that there is in fact a relationship: the greater the house size, the greater the selling price

Patterns of Scatter Diagrams

1.34

interested in

Time Series Data

1.35

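Observations measured at the same point in time are called cross-sectional data.

Observations measured at successive points in time are called time-series data.

Time-series data are graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.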

Numerical Descriptive Techniques

1.36

Measures of Central Location: Mean, Median, Mode

Measures of Variability: Range, Standard Deviation, Variance, Coefficient of Variation

Measures of Relative Standing: Percentiles, Quartiles

Measures of Linear Relationship: Covariance, Correlation, Least Squares Line

Measures of Central Location

1.37

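The arithmetic mean, a.k.a. average, shortened to mean, is the most popular & useful measure of central location.

It is computed by simply adding up all the observations and dividing by the total number of observations:

Mean = (Sum of the observations) / (Number of observations)

Arithmetic Mean

Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Population Mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$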

1.38

Statistics is a pattern language

1.39

          Population    Sample
Size      N             n
Mean      $\mu$         $\bar{x}$

The Arithmetic Mean

1.40

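The arithmetic mean is appropriate for describing measurement data, e.g. heights of people, marks of student papers, etc.

It is seriously affected by extreme values called outliers. E.g. as soon as a billionaire moves into a neighborhood, the average household income increases beyond what it was previously!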

Measures of Variability

1.41

Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value?

For example, two sets of class grades are shown. The mean (=50) is the same in each case, but the red class has greater variability than the blue class.

Range

1.42

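The range is the simplest measure of variability, calculated as:

Range = Largest observation − Smallest observation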

E.g.

Data: {4, 4, 4, 4, 50} Range = 46

Data: {4, 8, 15, 24, 39, 50} Range = 46

The range is the same in both cases,

but the data sets have very different

distributions
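The point of the two data sets above can be checked numerically. This minimal sketch uses Python's standard `statistics` module and treats both sets as samples, which is an assumption since the slide does not say. The helper name `data_range` is introduced here for illustration.

```python
import statistics

first = [4, 4, 4, 4, 50]
second = [4, 8, 15, 24, 39, 50]

def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

range_first = data_range(first)
range_second = data_range(second)

# Sample variances (denominator n - 1) show how differently the data are spread.
var_first = statistics.variance(first)
var_second = statistics.variance(second)

print(range_first, range_second)
print(round(var_first, 2), round(var_second, 2))
```

The ranges agree while the variances do not, which is why the range alone says little about how the observations are distributed.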

Statistics is a pattern language

1.43

            Population    Sample
Size        N             n
Mean        $\mu$         $\bar{x}$
Variance    $\sigma^2$    $s^2$

Variance

1.44

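The variance of a population is:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, where $\mu$ is the population mean and N is the population size.

The variance of a sample is:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $\bar{x}$ is the sample mean.

Note! the denominator is sample size (n) minus one!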

Application

1.45

The following sample consists of the number of jobs six randomly selected students applied for: 17, 15, 23, 7, 9, 13.

Find its mean and variance.

Because this is a sample, we are looking for $\bar{x}$ and $s^2$, as opposed to $\mu$ or $\sigma^2$.

Sample Mean & Variance

Sample Mean

1.46

Sample Variance
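The job-application sample (17, 15, 23, 7, 9, 13) can be worked through with Python's standard `statistics` module. This is only a sketch of the calculation; `statistics.variance` uses the n − 1 denominator that the sample variance requires.

```python
import statistics

jobs = [17, 15, 23, 7, 9, 13]

# Sample mean: sum of the observations divided by n.
sample_mean = statistics.mean(jobs)

# Sample variance: squared deviations from the mean, divided by n - 1.
sample_variance = statistics.variance(jobs)

# The same variance written out by hand, to make the denominator explicit.
by_hand = sum((x - sample_mean) ** 2 for x in jobs) / (len(jobs) - 1)

print(sample_mean, sample_variance, by_hand)
```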

Standard Deviation

1.47

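The standard deviation is simply the square root of the variance, thus:

Population standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample standard deviation: $s = \sqrt{s^2}$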

Standard Deviation

1.48

A golf club manufacturer has designed a new club and wants to determine if it is hit more consistently (i.e. with less variability) than with an old club.

Using Tools > Data Analysis > Descriptive Statistics in Excel (the Data Analysis add-in may need to be enabled first), we produce the following tables for interpretation

You get more

consistent distance

with the new club.

The Empirical Rule

1.49

If the histogram is bell shaped:

Approximately 68% of all observations fall within one standard deviation of the mean.

Approximately 95% of all observations fall within two standard deviations of the mean.

Approximately 99.7% of all observations fall within three standard deviations of the mean.

Chebysheff's Theorem

1.50

A more general interpretation of the standard deviation is derived from Chebysheff's Theorem, which applies to all shapes of histograms (not just bell shaped).

The proportion of observations in any sample that lie within k standard deviations of the mean is at least:

$1 - \frac{1}{k^2}$ for $k > 1$

For k=2 (say), the theorem states that at least 3/4 of all observations lie within 2 standard deviations of the mean. This is a lower bound compared to the Empirical Rule's approximation (95%).

Not often used because the interval is very wide.
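To see how loose the Chebysheff bound is next to the Empirical Rule, the sketch below compares the bound 1 − 1/k² with the exact proportion for a normal (bell-shaped) distribution. The function names are illustrative, and the normal curve is used here as the idealized bell shape.

```python
from statistics import NormalDist

def chebysheff_lower_bound(k):
    """Minimum proportion within k standard deviations, for any distribution (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def normal_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (2, 3):
    print(k, chebysheff_lower_bound(k), round(normal_within(k), 4))
```

For k = 2 the bound is 3/4 while the bell-shaped proportion is roughly 95%, which matches the comparison made on the slide.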

Box Plots

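1.51

These box plots are based on data in Xm04-15.

[Figure: box plots of service times; the names of the first two restaurants are missing from the extracted text.]

One restaurant's service times are the shortest and least variable. Another shows the most variability, while Jack-in-the-Box has the longest service times.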

Methods of Collecting Data

1.52

There are many methods used to collect or obtain data for statistical analysis. Three of the most

popular methods are:

Direct Observation

Experiments, and

Surveys.

Sampling

1.53

Recall that statistical inference permits us to draw conclusions about a population based on a sample.

Sampling (i.e. selecting a subset of a whole population) is often done for reasons of cost (it's less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g. performing a crash test on every automobile produced is impractical).

In any case, the sampled population and the target population should be similar to one another.

Sampling Plans

1.54

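A sampling plan is just a method or procedure for specifying how a sample will be taken from a population.

We will focus our attention on these three methods:

Simple Random Sampling,

Stratified Random Sampling, and

Cluster Sampling.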

Simple Random Sampling

1.55

A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen.

Drawing three names from a hat containing all the names of the students in the class is an example of a simple random sample: any group of three names is as likely to be picked as any other group of three names.
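The names-from-a-hat example can be imitated in code. This is a minimal sketch using `random.sample` from the standard library, which draws without replacement so that every group of three is equally likely. The class list and the seed are hypothetical.

```python
import random

# Hypothetical class list, invented for illustration.
students = ["Ana", "Ben", "Chen", "Dara", "Eli", "Fatima", "Gus", "Hana"]

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(42)

# Draw three distinct names, as if pulling them from a hat.
sample = rng.sample(students, 3)

print(sample)
```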

Stratified Random Sampling

1.56

After the population has been stratified, we can use simple random sampling to generate the complete

sample:

we would draw 100 of them from the low income group

50 of them from the high income group.

Cluster Sampling

1.57

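A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects).

This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically.

Cluster sampling may increase sampling error due to similarities among cluster members.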

Sampling Error

1.58

Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.

Another way to look at this: the difference in results for different samples (of the same size) is due to sampling error.

E.g. if we happened to get the highest income level data points in our first sample and all the lowest income levels in the second, this delta is due to sampling error.

Nonsampling Error

1.59

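Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. Three types of nonsampling errors:

Errors in data acquisition,

Nonresponse errors, and

Selection bias.

Note: increasing the sample size will not reduce this type of error.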

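Approaches to Assigning Probabilities

1.60

There are three ways to assign a probability, P(Oi), to an outcome, Oi, namely:

Classical approach: make certain assumptions (such as equally likely, independence) about the situation.

Relative frequency: assign probabilities based on experimentation or historical data.

Subjective approach: assign probabilities based on the assignor's judgment.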

Interpreting Probability

1.61

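One way to interpret probability is this: if a random experiment is repeated an infinite number of times, the relative frequency for any given outcome is the probability of this outcome.

For example, the probability of heads in a flip of a balanced coin is .5, determined using the classical approach. The probability is interpreted as being the long-term relative frequency of heads if the coin is flipped an infinite number of times.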

Conditional Probability

1.62

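Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.

Conditional probabilities are written as P(A | B), read as "the probability of A given B", and calculated as:

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$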

Independence

1.63

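One of the objectives of calculating conditional probability is to determine whether two events are related.

In particular, we would like to know whether they are independent, that is, if the probability of one event is not affected by the occurrence of the other event.

Two events A and B are said to be independent if

P(A|B) = P(A)

or

P(B|A) = P(B)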
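A small worked case may help with the independence condition. The sketch below uses one roll of a fair die and two events chosen purely for illustration (A = "even number", B = "4 or less"); neither event comes from the slides. Exact fractions avoid rounding issues.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Classical probability: favourable outcomes over equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)

A = {2, 4, 6}      # illustrative event: an even number is rolled
B = {1, 2, 3, 4}   # illustrative event: a number of 4 or less is rolled

p_a = prob(A)
p_a_given_b = cond_prob(A, B)
independent = p_a_given_b == p_a

print(p_a, p_a_given_b, independent)
```

Knowing that B occurred leaves the probability of A unchanged, so these two events satisfy P(A|B) = P(A).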

Complement Rule

1.64

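The complement of an event A is the event that A does not occur.

The complement rule gives us the probability of an event NOT occurring. That is:

$P(A^C) = 1 - P(A)$

For example, in the simple roll of a die, the probability of the number 1 being rolled is 1/6. The probability that some number other than 1 will be rolled is 1 − 1/6 = 5/6.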

Multiplication Rule

1.65

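The multiplication rule is used to calculate the joint probability of two events. It is based on the formula for conditional probability defined earlier:

$P(A \text{ and } B) = P(A \mid B) \cdot P(B)$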

Addition Rule

1.66

The addition rule provides a way to compute the probability of event A or B or both A and B occurring; i.e. the union of A and B.

P(A or B) = P(A) + P(B) − P(A and B)

Why do we subtract the joint probability P(A and B) from the sum of the probabilities of A and B?

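Addition Rule for Mutually Exclusive Events

1.67

If A and B are mutually exclusive, the occurrence of one event makes the other one impossible. This means that

P(A and B) = 0

so the addition rule reduces to P(A or B) = P(A) + P(B). This form is often used when adding joint probabilities calculated from a probability tree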

Two Types of Random Variables

1.68

Discrete random variable: one that takes on a countable number of values

E.g. values on the roll of dice: 2, 3, 4, …, 12

Continuous random variable: one whose values are not discrete, not countable

E.g. time (30.1 minutes? 30.10000001 minutes?)

Analogy:

Integers are Discrete, while Real Numbers are Continuous

Laws of Expected Value

1.69

1. E(c) = c

The expected value of a constant (c) is just the value of

the constant.

2. E(X + c) = E(X) + c

3. E(cX) = cE(X)

We can pull a constant out of the expected value

expression (either as part of a sum with a random

variable X or as a coefficient of random variable X).

Laws of Variance

1.70

1. V(c) = 0

The variance of a constant (c) is zero.

2. V(X + c) = V(X)

The variance of a random variable and a constant is just

the variance of the random variable (per 1 above).

3. $V(cX) = c^2 V(X)$

The variance of a random variable and a constant

coefficient is the coefficient squared times the variance of

the random variable.
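The laws of expected value and variance can be checked on a concrete distribution. This sketch uses a fair die and the constant c = 3, both chosen only for illustration, and exact fractions so the equalities hold without rounding. The helper names are not from the slides.

```python
from fractions import Fraction

# Probability distribution of a fair die: value -> probability.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def expected(dist):
    """E(X): sum of each value times its probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """V(X): expected squared deviation from the mean."""
    mu = expected(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def transform(dist, f):
    """Distribution of f(X), merging probabilities of equal outcomes."""
    out = {}
    for x, p in dist.items():
        out[f(x)] = out.get(f(x), 0) + p
    return out

c = 3  # arbitrary constant for the check
shifted = transform(die, lambda x: x + c)  # X + c
scaled = transform(die, lambda x: c * x)   # cX

print(expected(die), variance(die))
print(expected(shifted), variance(shifted))
print(expected(scaled), variance(scaled))
```

Adding the constant moves the mean but leaves the variance alone, while multiplying by it scales the mean by c and the variance by c squared.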

Binomial Distribution

1.71

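The binomial distribution is the probability distribution that results from doing a binomial experiment.

Binomial experiments have the following properties:

1. There is a fixed number of trials, represented as n.

2. Each trial has two possible outcomes, a success and a failure.

3. P(success)=p (and thus: P(failure)=1−p), for all trials.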

4. The trials are independent, which means that the

outcome of one trial does not affect the outcomes of

any other trials.

Binomial Random Variable

1.72

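The binomial random variable counts the number of successes in n trials of the binomial experiment. It can take on values from 0, 1, 2, …, n. Thus, it's a discrete random variable.

To calculate the probability associated with each value we use combinatorics:

$P(x) = \frac{n!}{x!(n-x)!} p^x (1-p)^{n-x}$ for x=0, 1, 2, …, n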

Binomial Table

1.73

i.e. what is P(X ≤ 4), given P(success) = .20 and n=10 ?

P(X ≤ 4) = .967

Binomial Table

1.74

correct?

i.e. what is P(X = 2), given P(success) = .20 and

n=10 ?

remember, the table shows cumulative probabilities

=BINOMDIST() Excel Function

1.75

The Excel function =BINOMDIST() can also be used to calculate these probabilities. Its arguments are, in order: # successes, # trials, P(success), and cumulative (i.e. P(X ≤ x)?).

For example, =BINOMDIST(2, 10, .20, FALSE) gives

P(X=2)=.3020

=BINOMDIST() Excel Function

1.76

With the cumulative argument set to TRUE, =BINOMDIST() returns a cumulative probability P(X ≤ x) for the given # successes, # trials, and P(success).

For example, =BINOMDIST(4, 10, .20, TRUE) gives

P(X ≤ 4)=.9672
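The two probabilities quoted on the BINOMDIST slides (n = 10, P(success) = .20) can be reproduced from the binomial formula. This is a minimal sketch built on `math.comb`; the function names are illustrative and are not Excel's.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x): number of ways to place x successes times their probability."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x): sum of the individual probabilities from 0 up to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.20

p_exactly_2 = binom_pmf(2, n, p)   # non-cumulative case
p_at_most_4 = binom_cdf(4, n, p)   # cumulative case

print(round(p_exactly_2, 4), round(p_at_most_4, 4))
```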

Binomial Distribution

1.77

Statisticians have developed general formulas for the mean, variance, and standard deviation of a binomial random variable.

They are:

$\mu = np$

$\sigma^2 = np(1-p)$

$\sigma = \sqrt{np(1-p)}$

Poisson Distribution

1.78

The Poisson distribution is a discrete probability distribution and refers to the

number of events (a.k.a. successes) within a specific time

period or region of space. For example:

The number of cars arriving at a service station in 1 hour.

(The interval of time is 1 hour.)

The number of flaws in a bolt of cloth. (The specific region

is a bolt of cloth.)

The number of accidents in 1 day on a particular stretch of

highway. (The interval is defined by both time, 1 day, and

space, the particular stretch of highway.)

The Poisson Experiment

1.79

A Poisson experiment has four defining characteristic properties:

1. The number of successes that occur in any interval is

independent of the number of successes that occur in

any other interval.

2. The probability of a success in an interval is the same

for all equal-size intervals

3. The probability of a success is proportional to the size

of the interval.

4. The probability of more than one success in an interval

approaches 0 as the interval becomes smaller.

Poisson Distribution

1.80

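The Poisson random variable is the number of successes that occur in a period of time or an interval of space in a Poisson experiment.

E.g. a number of arrivals (the successes) every hour (the time period).

E.g. the number of typographic errors (the successes?!) in a new textbook edition averages 1.5 per 100 pages (the interval).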

Poisson Probability Distribution

1.81

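The probability that a Poisson random variable assumes a value of x is given by:

$P(x) = \frac{e^{-\mu} \mu^x}{x!}$ for x = 0, 1, 2, …

where $\mu$ is the mean number of successes in the interval and e is the base of the natural logarithm.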

Example 7.12

1.82

An instructor has observed that the number of typographical errors in new editions of textbooks varies considerably from book to

book. After some analysis he concludes that the

number of errors is Poisson distributed with a mean

of 1.5 per 100 pages. The instructor randomly

selects 100 pages of a new book. What is the

probability that there are no typos?

$P(X = 0) = \frac{e^{-1.5} (1.5)^0}{0!} = .2231$

There is about a 22% chance of finding zero errors

Poisson Distribution

1.83

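For the Poisson distribution, the mean is proportional to the size of the interval.

Thus, knowing an error rate of 1.5 typos per 100 pages, we can determine a mean value for a 400 page book as:

$\mu = 1.5 \times 4 = 6$ typos per 400 pages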

Example 7.13

1.84

For a 400 page book, what is the probability that there are no typos?

$P(X = 0) = \frac{e^{-6} (6)^0}{0!} = .002479$

there is a very small chance there are no typos
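Both typo examples (a mean of 1.5 errors per 100 pages, and the 400-page book) follow from the Poisson formula with the mean scaled to the interval. A minimal sketch, with illustrative names:

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**x / factorial(x)

rate_per_100_pages = 1.5

# 100 pages: the mean is the stated rate.
p_no_typos_100 = poisson_pmf(0, rate_per_100_pages)

# 400 pages: the mean scales with the size of the interval.
mu_400 = rate_per_100_pages * (400 / 100)
p_no_typos_400 = poisson_pmf(0, mu_400)

print(round(p_no_typos_100, 4), mu_400, round(p_no_typos_400, 6))
```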

Example 7.13

1.85

Probability Density Functions

1.86

Unlike a discrete random variable, which we studied in Chapter 7, a continuous random variable is one

that can assume an uncountable number of values.

We cannot list the possible values because there

is an infinite number of them.

Because there is an infinite number of values, the

probability of each individual value is virtually 0.

Point Probabilities are Zero

1.87

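Because there is an infinite number of values, the probability of each individual value is virtually 0.

Thus, we can determine the probability of a range of values only.

E.g. with a discrete random variable like tossing a die, it is meaningful to talk about P(X=5), say.

In a continuous setting (e.g. with time as a random variable), the probability that the random variable of interest, say task length, takes exactly 5 minutes is infinitesimally small, hence P(X=5) = 0.

It is meaningful to talk about P(X ≤ 5).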

Probability Density Function

1.88

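A function f(x) is called a probability density function (over the range a ≤ x ≤ b) if it meets the following requirements:

1) f(x) ≥ 0 for all x between a and b, and

2) the total area under the curve between a and b is 1.0

[Figure: a curve f(x) plotted against x between a and b, with area = 1 under the curve.]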

The Normal Distribution

1.89

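The normal distribution is the most important of all probability distributions. The probability density function of a normal random variable is given by:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$, $-\infty < x < \infty$

It is bell shaped and symmetrical around the mean.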

The Normal Distribution

1.90

The normal distribution is fully defined by two parameters: its standard deviation and mean

The normal distribution is bell shaped and symmetrical about the mean

Normal distributions range from minus infinity to plus infinity

Standard Normal Distribution

1.91

A normal distribution whose mean is zero and standard deviation is one is called the standard normal distribution.

Any normal distribution can be converted to a standard normal distribution with simple algebra. This makes calculations much easier.

Calculating Normal Probabilities

1.92

We can use the following function to convert any normal random variable to a standard normal random variable:

$Z = \frac{X - \mu}{\sigma}$

Some advice: always draw a picture!

Calculating Normal Probabilities

1.93

Example: a time is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes:

What is the probability of a time between 45 and 60 minutes?

Calculating Normal Probabilities

1.94

With a mean of 50 minutes and a standard deviation of 10 minutes:

$P(45 < X < 60) = P\left(\frac{45 - 50}{10} < Z < \frac{60 - 50}{10}\right) = P(-.5 < Z < 1)$

Calculating Normal Probabilities

1.95

We can use the table in Appendix B to look up probabilities P(0 < Z < z)

P(−.5 < Z < 1) = P(−.5 < Z < 0) + P(0 < Z < 1)

By symmetry, P(−.5 < Z < 0) = P(0 < Z < .5)

Hence: P(−.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1)
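The same probability can be computed without a table. This sketch uses `statistics.NormalDist` from the Python standard library, with the mean of 50 minutes and standard deviation of 10 minutes from the example, and evaluates it both directly and through the standardized variable Z.

```python
from statistics import NormalDist

mu, sigma = 50, 10  # minutes, from the example

# Direct route: area under the N(50, 10) curve between 45 and 60.
x_dist = NormalDist(mu, sigma)
p_direct = x_dist.cdf(60) - x_dist.cdf(45)

# Standardized route: convert the limits to z-values first.
z_low = (45 - mu) / sigma    # -0.5
z_high = (60 - mu) / sigma   #  1.0
z_dist = NormalDist()        # standard normal
p_standardized = z_dist.cdf(z_high) - z_dist.cdf(z_low)

print(round(p_direct, 4), round(p_standardized, 4))
```

Both routes give the same area, which is the point of standardizing.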

Calculating Normal Probabilities

1.96

This table gives probabilities P(0 < Z < z)

First column = integer + first decimal

Top row = second decimal place

Using the Normal Table (Table 3)

1.97

P(Z > 1.6) = .5 − P(0 < Z < 1.6)

= .5 − .4452

= .0548

Using the Normal Table (Table 3)

1.98

P(Z < −2.23) = P(Z > 2.23)

= .5 − P(0 < Z < 2.23)

= .0129

Using the Normal Table (Table 3)

1.99

P(Z < 1.52) = P(Z < 0) + P(0 < Z < 1.52)

= .5 + .4357

= .9357

Using the Normal Table (Table 3)

1.100

P(0.9 < Z < 1.9) = P(0 < Z < 1.9) − P(0 < Z < 0.9)

= .4713 − .3159

= .1554

Finding Values of Z

1.101

Z.05 = 1.645

Z.01 = 2.33

Using the values of Z

1.102

Because $z_{.025} = 1.96$ and $-z_{.025} = -1.96$, it follows that we can state

P(−1.96 < Z < 1.96) = .95

Similarly

P(−1.645 < Z < 1.645) = .90

Other Continuous Distributions

1.103

Three other important continuous distributions which will be used extensively in later sections are

introduced here:

Student t Distribution,

Chi-Squared Distribution, and

F Distribution.

Student t Distribution

1.104

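Here the letter t is used to represent the random variable, hence the name. The density function for the Student t distribution is as follows:

$f(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left[1 + \frac{t^2}{\nu}\right]^{-(\nu+1)/2}$

$\nu$ (nu) is called the degrees of freedom, and $\Gamma$ (Gamma function) is $\Gamma(k) = (k-1)(k-2)\cdots(2)(1)$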

Student t Distribution

1.105

In much the same way that $\mu$ and $\sigma$ define the normal distribution, $\nu$, the degrees of freedom, defines the Student t Distribution:

Figure 8.24

As the number of degrees of freedom increases, the t

distribution approaches the standard normal distribution.

Determining Student t Values

1.106

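The Student t distribution is used extensively in statistical inference. Table 4 in Appendix B lists values of $t_{A,\nu}$,

that is, values of a Student t random variable with $\nu$ degrees of freedom such that:

$P(t > t_{A,\nu}) = A$

The values for A are pre-determined critical values, typically in the 10%, 5%, 2.5%, 1% and 1/2% range.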

Using the t table (Table 4) for values

1.107

For example, if we want the value of t with 10 degrees of freedom such that the area under the Student t curve is .05:

Area under the curve value ($t_A$) : COLUMN

Degrees of freedom : ROW

$t_{.05,10} = 1.812$

F Distribution

1.108

Two parameters define the F distribution, and like we've already seen these are again degrees of freedom.

$\nu_1$ is the numerator degrees of freedom and

$\nu_2$ is the denominator degrees of freedom.

Determining Values of F

1.109

For example, what is the value of F for 5% of the area under the right hand tail of the curve, with a numerator degree of freedom of 3 and a denominator degree of freedom of 7?

Solution: use the F look-up (Table 6).

There are different tables for different values of A. Make sure you start with the correct table!!

Numerator Degrees of Freedom : COLUMN

Denominator Degrees of Freedom : ROW

$F_{.05,3,7} = 4.35$

Determining Values of F

1.110

For areas under the left-hand tail of the curve, we can leverage the following relationship:

$F_{1-A,\nu_1,\nu_2} = \frac{1}{F_{A,\nu_2,\nu_1}}$

CHAPTER 9

Sampling Distributions

1.111

Sampling Distribution of the Mean

1.112

A fair die is thrown, with the random variable X = # of spots on any throw.

The probability distribution of X is:

x       1     2     3     4     5     6
P(x)    1/6   1/6   1/6   1/6   1/6   1/6

Sampling Distribution of Two Dice

1.113

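A sampling distribution is created by looking at all samples of size n=2 (i.e. two dice) and their means.

While there are 36 possible samples of size 2, there are only 11 values for $\bar{x}$, and some (e.g. $\bar{x}$=3.5) occur more frequently than others (e.g. $\bar{x}$=1).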

Sampling Distribution of Two Dice

1.114

The sampling distribution of $\bar{x}$ is shown below:

$\bar{x}$    $P(\bar{x})$
1.0    1/36
1.5    2/36
2.0    3/36
2.5    4/36
3.0    5/36
3.5    6/36
4.0    5/36
4.5    4/36
5.0    3/36
5.5    2/36
6.0    1/36

[Figure: bar chart of $P(\bar{x})$ against $\bar{x}$, for $\bar{x}$ from 1.0 to 6.0, peaking at 6/36 for $\bar{x}$ = 3.5.]
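The sampling distribution of the mean of two dice can be rebuilt by listing all 36 equally likely samples. This is a small sketch using exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Every ordered pair of faces is one equally likely sample of size n = 2.
samples = list(product(range(1, 7), repeat=2))

# Count how often each sample mean occurs.
mean_counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Sampling distribution: sample mean -> probability.
sampling_dist = {
    m: Fraction(count, len(samples)) for m, count in sorted(mean_counts.items())
}

# Mean of the sampling distribution.
mean_of_means = sum(m * p for m, p in sampling_dist.items())

for m, p in sampling_dist.items():
    print(float(m), f"{mean_counts[m]}/36")
print(float(mean_of_means))
```

The mean of the sample means equals 3.5, the mean of a single fair die.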

Compare

1.115

[Figure: the distribution of X (values 1 to 6) side by side with the sampling distribution of $\bar{x}$ (values 1.0 to 6.0).]

Central Limit Theorem

1.116

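The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size.

The larger the sample size, the more closely the sampling distribution of $\bar{X}$ will resemble a normal distribution.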

Central Limit Theorem

1.117

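If the population is normal, then $\bar{X}$ is normally distributed for all values of n.

If the population is non-normal, then $\bar{X}$ is approximately normal only for larger values of n.

In most practical situations, a sample size of 30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of $\bar{X}$.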

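Sampling Distribution of the Sample Mean

1.118

1. $\mu_{\bar{x}} = \mu$

2. $\sigma^2_{\bar{x}} = \sigma^2 / n$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

3. If X is normal, $\bar{X}$ is normal. If X is nonnormal, $\bar{X}$ is approximately normal for sufficiently large sample sizes.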

Note: the definition of sufficiently large depends on

the extent of nonnormality of x (e.g. heavily skewed;

multimodal)

Example 9.1(a)

1.119

The foreman of a bottling plant has observed that the amount of soda in each 32-ounce bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.

If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces?

Example 9.1(a)

1.120

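We want to find P(X > 32), where X is normally distributed and $\mu$=32.2 and $\sigma$=.3

$P(X > 32) = P\left(Z > \frac{32 - 32.2}{.3}\right) = P(Z > -.67) \approx .75$

There is about a 75% chance that a single bottle of soda contains more than 32oz.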

Example 9.1(b)

1.121

The foreman of a bottling plant has observed that the amount of soda in each 32-ounce bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.

If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?

Example 9.1(b)

1.122

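We want to find $P(\bar{X} > 32)$, where X is normally distributed with $\mu$=32.2 and $\sigma$=.3

Things we know:

1) X is normally distributed, therefore so will $\bar{X}$.

2) $\mu_{\bar{x}} = \mu$ = 32.2 oz.

3) $\sigma_{\bar{x}} = \sigma / \sqrt{n} = .3 / \sqrt{4} = .15$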

Example 9.1(b)

1.123

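If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?

$P(\bar{X} > 32) = P\left(Z > \frac{32 - 32.2}{.15}\right) = P(Z > -1.33) \approx .91$

There is about a 91% chance the mean of the four bottles will exceed 32oz.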
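Parts (a) and (b) of the soda example differ only in the standard deviation used: σ for one bottle, σ/√n for the mean of n bottles. This is a minimal sketch with `statistics.NormalDist`; exact output differs slightly from table look-ups because tables round z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # ounces, from the example
threshold = 32

# (a) One bottle: X ~ Normal(mu, sigma).
p_one_bottle = 1 - NormalDist(mu, sigma).cdf(threshold)

# (b) Mean of four bottles: the standard error shrinks to sigma / sqrt(n).
n = 4
standard_error = sigma / sqrt(n)
p_mean_of_four = 1 - NormalDist(mu, standard_error).cdf(threshold)

print(round(p_one_bottle, 2), standard_error, round(p_mean_of_four, 2))
```

Averaging four bottles narrows the distribution around 32.2, so the mean is more likely to exceed 32 ounces than a single bottle is.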

Graphically Speaking

1.124

[Figure: two normal curves, each with mean=32.2. Left panel: what is the probability that one bottle will contain more than 32 ounces? Right panel: what is the probability that the mean of four bottles will exceed 32 oz?]

Sampling Distribution: Difference of two means

1.125

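The final sampling distribution introduced is that of the difference between two sample means. This requires:

independent random samples drawn from each of two normal populations

If this condition is met, then the sampling distribution of the difference between the two sample means, i.e. $\bar{x}_1 - \bar{x}_2$, will be normally distributed.

(note: if the two populations are not both normally distributed, but the sample sizes are large (>30), the distribution of $\bar{x}_1 - \bar{x}_2$ is approximately normal)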

Sampling Distribution: Difference of two means

1.126

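The mean and standard deviation of the sampling distribution of $\bar{x}_1 - \bar{x}_2$ are given by:

mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$

standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

(also called the standard error of the difference between two means)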

Estimation

1.127

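There are two types of inference: estimation and hypothesis testing; estimation is introduced first.

The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.

E.g. the sample mean ($\bar{x}$) is employed to estimate the population mean ($\mu$).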

Estimation

1.128

The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.

There are two types of estimators:

Point Estimator

Interval Estimator

Point & Interval Estimation

1.129

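For example, suppose we want to estimate the mean income of a class of business students. For n=25 students, $\bar{x}$ is calculated to be 400 $/week.

Point estimate: the mean income is 400 $/week.

Interval estimate: the mean income is between 380 and 420 $/week.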

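Estimating $\mu$ when $\sigma$ is known

1.130

We established in Chapter 9:

$P\left(-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$

Thus, the probability that the interval

$\left\{\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right\}$

contains the population mean $\mu$ is $1 - \alpha$. This is a confidence interval estimator for $\mu$. The sample mean is in the center of the confidence interval.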

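Four commonly used confidence levels

1.131

cut & keep handy!

Table 10.1

Confidence Level $1-\alpha$    $\alpha$    $\alpha/2$    $z_{\alpha/2}$
.90                            .10         .05           1.645
.95                            .05         .025          1.96
.98                            .02         .01           2.33
.99                            .01         .005          2.575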

Example 10.1

1.132

A computer company samples demand during lead time over 25 time periods:

235 374 309 499 253
421 361 514 462 369
394 439 348 344 330
261 374 302 466 535
386 316 296 332 334

It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels

CALCULATE

Example 10.1

1.133

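In order to use our confidence interval estimator, we need the following pieces of data:

$\bar{x}$ = 370.16 (calculated from the data)

$z_{\alpha/2}$ = 1.96, $\sigma$ = 75, n = 25 (given)

therefore:

$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96 \cdot \frac{75}{\sqrt{25}} = 370.16 \pm 29.40$

The lower and upper confidence limits are 340.76 and 399.56.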
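The confidence limits of Example 10.1 can be recomputed from the 25 demand values. This is a sketch only: it obtains z from `NormalDist().inv_cdf` rather than from a table, so z is 1.95996 instead of the rounded 1.96, which does not change the limits at two decimals.

```python
from math import sqrt
from statistics import NormalDist, mean

demand = [
    235, 374, 309, 499, 253,
    421, 361, 514, 462, 369,
    394, 439, 348, 344, 330,
    261, 374, 302, 466, 535,
    386, 316, 296, 332, 334,
]
sigma = 75          # known population standard deviation (given)
confidence = 0.95   # desired confidence level

n = len(demand)
x_bar = mean(demand)

# z value that leaves alpha/2 in the upper tail of the standard normal.
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

half_width = z * sigma / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(n, x_bar, round(lower, 2), round(upper, 2))
```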

INTERPRET

Example 10.1

1.134

The estimate for the mean demand during lead time lies between 340.76 and 399.56; we can use this as input in developing an inventory policy.

That is, we estimated that the mean demand during lead time falls between 340.76 and 399.56, and this type of estimator is correct 95% of the time. That also means that 5% of the time the estimator will be incorrect.

The 95% figure is sometimes stated as 19 times out of 20, which emphasizes the long-run aspect of the confidence level.

Interval Width

1.135

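A wide interval provides little information. For example, suppose we estimate with 95% confidence that an accountant's average starting salary is between $15,000 and $100,000.

Contrast this with a 95% confidence interval estimate of starting salaries between $42,000 and $45,000.

The second estimate is much narrower, providing accounting students more precise information about starting salaries.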

Interval Width

1.136

The width of the confidence interval estimate is a function of the confidence level, the population

standard deviation, and the sample size

Selecting the Sample Size

1.137

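We can control the width of the interval by determining the sample size necessary to produce narrow intervals.

Suppose we want to estimate the mean demand to within 5 units; i.e. we want the interval estimate to be:

$\bar{x} \pm 5$

Since:

$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

It follows that

$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 5$

Solve for n to get requisite sample size!

Selecting the Sample Size

1.138

$n = \left(\frac{1.96 \times 75}{5}\right)^2 = 864.36$

That is, to produce a 95% confidence interval estimate of the mean (±5 units), we need to sample 865 lead time periods (vs. the 25 data points we have currently).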

Sample Size to Estimate a Mean

1.139

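To estimate a population mean with an interval estimate of $\bar{x} \pm W$ requires a sample size of at least:

$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2$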

Example 10.2

1.140

A lumber company must estimate the mean diameter of trees to determine whether or not there is sufficient lumber to harvest an area of forest. They need to estimate this to within 1 inch at a confidence level of 99%. The tree diameters are normally distributed with a standard deviation of 6 inches.

How many trees need to be sampled?

Example 10.2

1.141

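Things we know:

Confidence level = 99%, therefore $\alpha$ = .01 and $z_{\alpha/2} = z_{.005}$ = 2.575

We want $\bar{x} \pm 1$, that is, W = 1.

We are given that $\sigma$ = 6.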

Example 10.2

1.142

We compute

$n = \left(\frac{2.575 \times 6}{1}\right)^2 = 238.7$, which rounds up to 239

That is, we will need to sample at least 239 trees to have a 99% confidence interval of $\bar{x} \pm 1$
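Both sample-size calculations in this part of the deck (demand to within 5 units at 95%, tree diameter to within 1 inch at 99%) use the same formula, rounded up to the next whole observation. This is a minimal sketch; `sample_size` is an illustrative name, and the z-values are the table values used on the slides.

```python
from math import ceil

def sample_size(z, sigma, w):
    """Smallest n for which z * sigma / sqrt(n) is at most w."""
    return ceil((z * sigma / w) ** 2)

# Lead-time demand: 95% confidence (z = 1.96), sigma = 75, within 5 units.
n_demand = sample_size(1.96, 75, 5)

# Tree diameters: 99% confidence (z = 2.575), sigma = 6, within 1 inch.
n_trees = sample_size(2.575, 6, 1)

print(n_demand, n_trees)
```

Rounding up rather than to the nearest integer keeps the interval no wider than the target.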

Nonstatistical Hypothesis Testing

1.143

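A criminal trial is an example of hypothesis testing without the statistics.

In a trial a jury must decide between two hypotheses. The null hypothesis is

H0: The defendant is innocent

The alternative hypothesis or research hypothesis is

H1: The defendant is guilty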

The jury does not know which hypothesis is true. They must

make a decision on the basis of evidence presented.

Nonstatistical Hypothesis Testing

1.144

A Type I error occurs when we reject a true null

hypothesis. That is, a Type I error occurs when the

jury convicts an innocent person.

A Type II error occurs when we don't reject a false null hypothesis. That occurs when a guilty defendant

is acquitted.

Nonstatistical Hypothesis Testing

1.145

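The probability of a Type I error is denoted as $\alpha$ (Greek letter alpha). The probability of a Type II error is $\beta$ (Greek letter beta).

The two probabilities are inversely related. Decreasing one increases the other.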

Nonstatistical Hypothesis Testing

1.146

1. There are two hypotheses, the null and the alternative

hypotheses.

2. The procedure begins with the assumption that the null

hypothesis is true.

3. The goal is to determine whether there is enough

evidence to infer that the alternative hypothesis is true.

4. There are two possible decisions:

Conclude that there is enough evidence to support the

alternative hypothesis.

Conclude that there is not enough evidence to support

the alternative hypothesis.

Nonstatistical Hypothesis Testing

1.147

Type I error: Reject a true null hypothesis

Type II error: Do not reject a false null

hypothesis.

P(Type I error) = $\alpha$

P(Type II error) = $\beta$

Concepts of Hypothesis Testing (1)

1.148

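There are two hypotheses. One is called the null hypothesis and the other the alternative or research hypothesis. The usual notation is:

H0: the null hypothesis (pronounced H nought)

H1: the alternative or research hypothesis

The null hypothesis (H0) will always state that the parameter equals the value specified in the alternative hypothesis (H1)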

Concepts of Hypothesis Testing

1.149

Consider the mean demand for computers during assembly lead time again. Rather than estimate the mean demand, our operations manager wants to know whether the mean is different from 350 units. We can rephrase this request into a test of the hypothesis:

H0: $\mu$ = 350

H1: $\mu \neq$ 350 (this is what we are interested in determining)

Concepts of Hypothesis Testing (4)

1.150

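There are two possible decisions that can be made:

Conclude that there is enough evidence to support the alternative hypothesis (also stated as: rejecting the null hypothesis in favor of the alternative)

Conclude that there is not enough evidence to support the alternative hypothesis (also stated as: not rejecting the null hypothesis in favor of the alternative)

NOTE: we do not say that we accept the null hypothesis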

Concepts of Hypothesis Testing

1.151

Once the null and alternative hypotheses are stated, the next step is to randomly sample the population and calculate a test statistic (in this example, the sample mean).

If the test statistic's value is inconsistent with the null hypothesis we reject the null hypothesis and infer that the alternative hypothesis is true.

For example, if we're trying to decide whether the mean is not equal to 350, a large value of $\bar{x}$ (say, 600) would provide enough evidence. If $\bar{x}$ is close to 350 (say, 355) we could not say that this provides a great deal of evidence to infer that the population mean is different from 350.

Types of Errors

1.152

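A Type I error occurs when we reject a true null hypothesis (i.e. Reject H0 when it is TRUE).

Decision            H0 is TRUE      H0 is FALSE
Reject H0           Type I error    correct
Do NOT reject H0    correct         Type II error

A Type II error occurs when we do not reject a false null hypothesis (i.e. Do NOT reject H0 when it is FALSE).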

Recap I

1.153

1) Two hypotheses: H0 and H1

2) ASSUME H0 is TRUE

3) GOAL: determine if there is enough evidence to infer that H1 is TRUE

4) Two possible decisions:

Reject H0 in favor of H1

Do NOT reject H0 in favor of H1

5) Two possible types of errors:

Type I: reject a true H0 [P(Type I) = α]

Type II: not reject a false H0 [P(Type II) = β]
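The meaning of P(Type I) = α can be checked by simulation. The sketch below is illustrative only: the values `mu0=100`, `sigma=15`, `n=36` and the function name `type_one_error_rate` are hypothetical and do not come from the slides. It repeatedly draws a sample mean from a population in which H0 really is true and counts how often a right-tail z-test rejects anyway.

```python
import random
from statistics import NormalDist


def type_one_error_rate(mu0, sigma, n, alpha, trials=20000, seed=1):
    """Estimate P(Type I error) for a right-tail z-test when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tail critical value
    std_err = sigma / n ** 0.5                # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        # Draw a sample mean from its sampling distribution under H0
        # (valid because the population is assumed normal).
        x_bar = rng.gauss(mu0, std_err)
        z = (x_bar - mu0) / std_err
        if z > z_crit:
            rejections += 1  # H0 is true here, so every rejection is a Type I error
    return rejections / trials


rate = type_one_error_rate(mu0=100, sigma=15, n=36, alpha=0.05)
print(f"Estimated P(Type I error): {rate:.3f}")
```

The estimated rate should land close to the chosen α, since α is by definition the probability of rejecting a true H0.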

Example 11.1

1.154

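A new billing system will be cost-effective only if the mean monthly account is more than $170.

A random sample of 400 monthly accounts is drawn, for which the sample mean is $178. The accounts are approximately normally distributed with a standard deviation of $65.

Can we conclude that the new system will be cost-effective?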

Example 11.1

1.155

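The system will be cost-effective if the mean account balance for all customers is greater than $170. This is our research hypothesis, H1: μ > 170.

The null hypothesis is H0: μ = 170 (this specifies a single value for the parameter of interest).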

Example 11.1

1.156

H1: μ > 170

H0: μ = 170 (we'll assume this is true)

We know:

n = 400,

x̄ = 178, and

σ = 65

Example 11.1

1.157

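To test our hypotheses, we can use two different approaches:

The rejection region approach (typically used when computing statistics manually), and

The p-value approach (which is generally used with a computer and statistical software).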

Example 11.1 Rejection Region

1.158

The rejection region is a range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis.

Example 11.1

1.159

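It seems reasonable to reject the null hypothesis in favor of the alternative if the sample mean is large relative to 170.

The critical value of the sample mean can be calculated for any level of significance (α) we want.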

Example 11.1

1.160

Since our sample mean (178) is greater than the critical value we calculated (175.34), we reject the null hypothesis in favor of H1, i.e. that μ > 170, and conclude that it is cost-effective to install the new billing system.
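The critical value quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes a 5% significance level; the slides do not state α for this step, but α = 0.05 reproduces the quoted 175.34 up to rounding. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 170     # mean under H0
sigma = 65    # population standard deviation
n = 400       # sample size
x_bar = 178   # observed sample mean
alpha = 0.05  # ASSUMED significance level (not stated on the slide)

# Right-tail rejection region: reject H0 when x_bar exceeds the critical mean.
z_alpha = NormalDist().inv_cdf(1 - alpha)
x_crit = mu0 + z_alpha * sigma / sqrt(n)
reject_h0 = x_bar > x_crit

print(f"critical value = {x_crit:.2f}, reject H0: {reject_h0}")
```

Changing `alpha` moves the critical value: a smaller α pushes it further above 170 and makes rejection harder.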

Example 11.1 The Big Picture

1.161

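[Figure: sampling distribution of x̄ under H0: μ = 170. The observed x̄ = 178 falls in the rejection region, so we reject H0 in favor of H1.]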

Standardized Test Statistic

1.162

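An easier method is to use the standardized test statistic:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{178 - 170}{65/\sqrt{400}} = 2.46$

If z falls in the rejection region (z greater than the critical value $z_\alpha$), we reject H0 in favor of H1.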

PLOT POWER CURVE

1.163

p-Value

1.164

The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed given that the null hypothesis is true.

In the billing example, what is the probability of observing a sample mean at least as extreme as the one already observed (i.e. x̄ = 178), given that the null hypothesis (H0: μ = 170) is true?

p-value = P(x̄ > 178) = P(Z > 2.46) = .0069
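As a rough check of this p-value, the right-tail area can be computed with the standard library's normal distribution. The sketch uses only the figures from the billing example (170, 65, 400, 178); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 170, 65, 400, 178

# Standardize the observed sample mean under H0.
z = (x_bar - mu0) / (sigma / sqrt(n))

# Right-tail test: the p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```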

Interpreting the p-value

1.165

The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.

If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.

If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.

If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.

If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.

We observe a p-value of .0069, hence there is overwhelming evidence to support H1: μ > 170.
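The descriptive scale above can be written as a small lookup. This is only a sketch: the function name `describe_evidence` is hypothetical, and the treatment of the exact boundary values (1%, 5%, 10%) is an assumption, since the slide does not say which band they belong to.

```python
def describe_evidence(p_value):
    """Map a p-value to the descriptive scale used on the slide."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "overwhelming"
    if p_value < 0.05:   # boundary handling is an assumption
        return "strong"
    if p_value < 0.10:   # boundary handling is an assumption
        return "weak"
    return "none"


print(describe_evidence(0.0069))
```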

Interpreting the p-value

1.166

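Compare the p-value with the selected value of the significance level α:

If the p-value is less than α, we judge the p-value to be small enough to reject the null hypothesis.

If the p-value is greater than α, we do not reject the null hypothesis.

Here the p-value of .0069 is less than any conventional α (such as .05), so we reject H0 in favor of H1.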

Chapter-Opening Example

1.167

The objective of the study is to draw a conclusion about the mean payment period. Thus, the parameter to be tested is the population mean. We want to know whether there is enough statistical evidence to show that the population mean is less than 22 days. Thus, the alternative hypothesis is

H1: μ < 22

The null hypothesis is

H0: μ = 22

Chapter-Opening Example

1.168

The test statistic is

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

We wish to reject the null hypothesis in favor of the alternative only if the sample mean, and hence the value of the test statistic, is small enough. As a result we locate the rejection region in the left tail of the sampling distribution.

We set the significance level at 10%.

Chapter-Opening Example

1.169

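Rejection region:

$z < -z_{\alpha} = -z_{.10} = -1.28$

$\bar{x} = \frac{\sum x_i}{n} = \frac{4{,}759}{220} = 21.63$

and

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21.63 - 22}{6/\sqrt{220}} = -.91$

p-value = P(Z < -.91) = .5 - .3186 = .1814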
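The left-tail calculation can be retraced numerically. The sketch below uses the figures from this example (sum of observations 4,759, n = 220, σ = 6, hypothesized mean 22, α = 10%); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

total, n = 4759, 220   # sum of the observations and sample size
mu0, sigma = 22, 6     # hypothesized mean and population standard deviation
alpha = 0.10           # significance level

x_bar = total / n
z = (x_bar - mu0) / (sigma / sqrt(n))

# Left-tail test: the rejection region is z < -z_alpha,
# and the p-value is the area to the left of z.
z_crit = -NormalDist().inv_cdf(1 - alpha)
p_value = NormalDist().cdf(z)
reject_h0 = z < z_crit

print(f"x_bar = {x_bar:.2f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Because z lies above the critical value (and the p-value exceeds α), the null hypothesis is not rejected.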

Chapter-Opening Example

1.170

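Conclusion: there is not enough evidence to infer that the mean is less than 22.

There is therefore not enough evidence to infer that the plan will be profitable.

We fail to reject H0: μ = 22 at a 10% level of significance.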

PLOT POWER CURVE

1.171

Right-Tail Testing

1.172

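Calculate the critical value of the mean and compare against the observed value of the sample mean (x̄). Reject H0 if the observed value is larger than the critical value.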

Left-Tail Testing

1.173

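Calculate the critical value of the mean and compare against the observed value of the sample mean (x̄). Reject H0 if the observed value is smaller than the critical value.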

Two-Tail Testing

1.174

Two-tail testing is used when we want to test a research hypothesis that a parameter is not equal (≠) to some value.

Example 11.2

1.175

AT&T argues that its rates are such that customers won't see a difference in their phone bills between AT&T and its competitors. They calculate the mean and standard deviation for all their customers at $17.09 and $3.87 (respectively).

For a random sample of customers, they then recalculate a monthly phone bill based on competitors' rates.

What we want to show is whether or not:

H1: μ ≠ 17.09. We do this by assuming that:

H0: μ = 17.09

Example 11.2

1.176

The rejection region is set up so that we can reject the null hypothesis when the test statistic is large or when it is small.

The total area in the rejection region must sum to α, so we divide this probability by 2.

Example 11.2

1.177

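At a 5% significance level (i.e. α = .05), we have α/2 = .025. Thus, z.025 = 1.96 and our rejection region is:

z < -1.96 or z > 1.96

[Figure: standard normal curve with rejection regions below -z.025 and above +z.025]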

Example 11.2

1.178

We find that:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = 1.19$

Since z = 1.19 is not greater than 1.96, nor less than -1.96, we cannot reject the null hypothesis in favor of H1. That is, there is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor.
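The two-tail decision rule can be sketched as a small helper that splits α between the two tails. The function name `two_tail_z_test` is hypothetical; the inputs z = 1.19 and α = 0.05 are the ones used in this example.

```python
from statistics import NormalDist


def two_tail_z_test(z, alpha):
    """Return (critical value, two-tail p-value, reject H0?) for a z statistic."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # alpha/2 in each tail
    p_value = 2 * (1 - std_normal.cdf(abs(z)))    # area in both tails
    return z_crit, p_value, abs(z) > z_crit


z_crit, p_value, reject_h0 = two_tail_z_test(z=1.19, alpha=0.05)
print(f"critical value = {z_crit:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Comparing |z| with a single critical value is equivalent to checking z < -1.96 or z > 1.96 separately.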

PLOT POWER CURVE

1.179

Summary of One- and Two-Tail Tests

1.180

One-tail test (left tail)    Two-tail test    One-tail test (right tail)
H1: μ < μ0                   H1: μ ≠ μ0       H1: μ > μ0

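Inference About A Population [SIGMA UNKNOWN]

1.181

[Diagram: a sample is drawn from the population, and the sample statistic is used to make an inference about the population parameter.]

We will develop techniques to estimate and test three population parameters:

Population Mean μ

Population Variance σ²

Population Proportion p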

Inference With Variance Unknown

1.182

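Previously, we looked at estimating and testing the population mean when the population standard deviation (σ) was known or given:

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

But how often do we know the actual population variance?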

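Testing μ when σ is unknown

1.183

When the population standard deviation is unknown and the population is normal, the test statistic for testing hypotheses about μ is:

$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$

which is Student t distributed with ν = n - 1 degrees of freedom. The confidence interval estimator of μ is given by:

$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$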
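A minimal sketch of the t statistic and the matching confidence interval, using only the standard library. The data are hypothetical, and the critical value `t_crit=2.365` (the tabulated t value for α/2 = .025 with 7 degrees of freedom) is passed in by hand because the standard library has no Student t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


def t_confidence_interval(sample, t_crit):
    """x_bar +/- t_crit * s / sqrt(n); t_crit must come from a t table."""
    n = len(sample)
    margin = t_crit * stdev(sample) / sqrt(n)
    return mean(sample) - margin, mean(sample) + margin


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, n = 8
t = t_statistic(sample, mu0=4)
low, high = t_confidence_interval(sample, t_crit=2.365)
print(f"t = {t:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Note that `stdev` uses the n - 1 divisor, which is the sample standard deviation s required by the formula.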

Example 12.1

1.184

Will new workers achieve 90% of the level of experienced workers within one week of being hired and trained?

Experienced workers can process 500 packages/hour, thus if our conjecture is correct, we expect new workers to be able to process .90(500) = 450 packages per hour.

IDENTIFY

Example 12.1

1.185

Our objective is to describe the population of the numbers of packages processed in 1 hour by new workers, that is, we want to know whether the new workers' productivity is more than 90% of that of experienced workers. Thus we have:

H1: μ > 450

H0: μ = 450

COMPUTE

Example 12.1

1.186

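Our test statistic is $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, with ν = n - 1 degrees of freedom. Our hypothesis under question is:

H1: μ > 450

Our rejection region becomes:

$t > t_{\alpha,\nu}$

Thus we will reject the null hypothesis in favor of the alternative if our calculated test statistic falls in this region.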

COMPUTE

Example 12.1

1.187

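From the data, we calculate the sample mean and standard deviation, and thus the value of the test statistic.

Since the test statistic falls in the rejection region, we reject H0 in favor of H1; that is, there is sufficient evidence to conclude that the new workers are producing at more than 90% of the average of experienced workers.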

IDENTIFY

Example 12.2

1.188

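Can we estimate the mean return for companies that won quality awards?

We have a random sample of such companies. We want to construct a 95% confidence interval for the mean return, i.e. what is:

$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$ ?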

COMPUTE

Example 12.2

1.189

From the data we calculate x̄ and s, and so obtain the confidence interval.

Check Requisite Conditions

1.190

The Student t distribution is robust, which means that if the population is nonnormal, the results of the t-test and confidence interval estimate are still valid provided that the population is not extremely nonnormal.

To check this requirement, draw a histogram of the data and see how bell shaped the resulting figure is. If a histogram is extremely skewed (say in the case of an exponential distribution), that could be considered extremely nonnormal and hence t-statistics would not be valid in this case.

Inference About Population Variance

1.191

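If we are interested in drawing inferences about a population's variability, the parameter we need to investigate is the population variance: σ²

The sample variance (s²) is an unbiased, consistent and efficient point estimator for σ². Moreover, when the population is normal, the statistic $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ has a chi-squared distribution with n - 1 degrees of freedom.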

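Testing & Estimating Population Variance

1.192

Combining this statistic:

$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$

with the probability statement:

$P(\chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2}) = 1 - \alpha$

yields the confidence interval estimator for σ²:

$\text{LCL} = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \quad \text{UCL} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$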

IDENTIFY

Example 12.3

1.193

A machine is required to fill 1 liter (1,000 cc) containers so that the variance of the fills is less than 1 cc². A random sample of n = 25 one-liter fills was taken. Does the machine perform as it should at the 5% significance level?

The requirement is that the variance is less than 1 cc².

We want to show that:

H1: σ² < 1

(so our null hypothesis becomes: H0: σ² = 1). We will use this test statistic:

$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
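The chi-squared statistic for a variance test, (n - 1)s² divided by the hypothesized variance, is straightforward to compute from raw data. The sketch below uses five hypothetical fill volumes (not the 25 fills of the example, whose values are not shown here) and a hypothetical helper name `chi_squared_statistic`. The critical value must still come from a chi-squared table, because the standard library has no chi-squared distribution.

```python
from statistics import variance


def chi_squared_statistic(sample, sigma0_sq):
    """(n - 1) * s^2 / sigma0^2, to be compared against chi-squared with n - 1 df."""
    n = len(sample)
    return (n - 1) * variance(sample) / sigma0_sq


fills = [999.2, 1000.4, 1000.1, 999.6, 1000.7]  # hypothetical fills, in cc
chi2 = chi_squared_statistic(fills, sigma0_sq=1.0)
print(f"chi-squared = {chi2:.2f} with {len(fills) - 1} degrees of freedom")
```

For a left-tail alternative such as "variance less than 1", H0 would be rejected when this statistic falls below the tabulated lower critical value.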

COMPUTE

Example 12.3

1.194

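Since our alternative hypothesis is phrased as:

H1: σ² < 1

we will reject H0 in favor of H1 if our test statistic falls into this rejection region:

$\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,\,24}$

And thus our test statistic takes on this value:

$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{24\,s^2}{1}$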

Example 12.4

1.195

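Since the test statistic does not fall into the rejection region, we cannot reject the null hypothesis in favor of the alternative. That is, there is not enough evidence to infer that the claim is true.

Note: the result does not say that the variance is greater than 1, rather it merely states that we are unable to show that the variance is less than 1.

We now want to estimate the variance of the fills.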

COMPUTE

Example 12.4

1.196

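In order to create a confidence interval estimate of the variance, we need these formulae:

$\text{LCL} = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \quad \text{UCL} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$

We know (n - 1)s² from our previous calculation, and we have the critical values $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ from Table 5 in Appendix B.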

Comparing Two Populations

1.197

Previously we looked at techniques to estimate and test parameters for one population:

Population Mean μ, Population Variance σ²

We will still consider these parameters when we are

looking at two populations, however our interest will

now be:

The difference between two means.

The ratio of two variances.

Difference of Two Means

1.198

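In order to test and estimate the difference between two population means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another.

Because we are comparing two population means, we use the statistic:

$\bar{x}_1 - \bar{x}_2$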

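Sampling Distribution of x̄1 - x̄2

1.199

1. x̄1 - x̄2 is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large (n1, n2 > 30)

2. The expected value of x̄1 - x̄2 is μ1 - μ2

3. The variance of x̄1 - x̄2 is $\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$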

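Making Inferences About μ1 - μ2

1.200

Since x̄1 - x̄2 is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large (n1, n2 > 30), then:

$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

is a standard normal (or approximately normal) random variable. We could use this to build test statistics or confidence interval estimators for μ1 - μ2.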

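Making Inferences About μ1 - μ2

1.201

In practice, the z statistic is rarely used since the population variances are unknown.

Instead we use a t-statistic. We consider two cases for the unknown population variances: when we believe they are equal and conversely when they are not equal.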

When are variances equal?

1.202

How do we know when the population variances are equal?

Since the population variances are unknown, we can't know for certain whether they're equal, but we can examine the sample variances and informally judge their relative values to determine whether we can assume that the population variances are equal or not.

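Test Statistic for μ1 - μ2 (equal variances)

1.203

1) Calculate the pooled variance estimator as

$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

2) and use it in the test statistic

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

with ν = n1 + n2 - 2 degrees of freedom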
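The equal-variances procedure (pooled variance first, then the t statistic) can be sketched from summary statistics alone. The function name `pooled_t_statistic` and the sample figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from math import sqrt


def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Equal-variances t statistic; returns (t, degrees of freedom, pooled variance)."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - hypothesized_diff) / std_err
    return t, df, pooled_var


# Hypothetical summary statistics for two independent samples.
t, df, pooled_var = pooled_t_statistic(mean1=10, var1=4, n1=10, mean2=8, var2=6, n2=12)
print(f"pooled variance = {pooled_var:.2f}, t = {t:.3f}, df = {df}")
```

The pooled variance always lies between the two sample variances, closer to the one from the larger sample.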

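CI Estimator for μ1 - μ2 (equal variances)

1.204

The confidence interval estimator for μ1 - μ2 when the population variances are equal is given by:

$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

with ν = n1 + n2 - 2 degrees of freedom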

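Test Statistic for μ1 - μ2 (unequal variances)

1.205

The test statistic for μ1 - μ2 when the population variances are unequal is given by:

$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

with

$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

degrees of freedom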
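For the unequal-variances case, the standard approach is the Welch-Satterthwaite approximation, in which each sample keeps its own variance and the degrees of freedom are estimated from the data. The sketch below assumes that approach; the function name `unequal_variances_t` and the summary statistics are hypothetical.

```python
from math import sqrt


def unequal_variances_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """t statistic and approximate degrees of freedom when variances differ."""
    a = var1 / n1  # estimated variance of the first sample mean
    b = var2 / n2  # estimated variance of the second sample mean
    t = ((mean1 - mean2) - hypothesized_diff) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics for two independent samples.
t, df = unequal_variances_t(mean1=10, var1=4, n1=10, mean2=8, var2=6, n2=12)
print(f"t = {t:.3f}, approximate df = {df:.2f}")
```

The resulting degrees of freedom are usually not a whole number; a common convention is to round before looking up a critical value in a table.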

IDENTIFY

Example 13.2

1.206

Two methods are being tested for assembling chairs. Assembly times are recorded (25 times for each method). At a 5% significance level, do the assembly times for the two methods differ?

COMPUTE

Example 13.2

1.207

The assembly times for each of the two methods are recorded and preliminary data is prepared.

The sample variances are similar, hence we will assume that the population variances are equal.

COMPUTE

Example 13.2

1.208

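Since we are doing a two-tail test, the rejection region will be:

$t < -t_{\alpha/2,\nu}$ or $t > t_{\alpha/2,\nu}$

With ν = n1 + n2 - 2 = 25 + 25 - 2 = 48 degrees of freedom and α = .05, the critical value of t (and our rejection region) becomes:

$t < -t_{.025,48}$ or $t > t_{.025,48}$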

COMPUTE

Example 13.2

1.209

In order to calculate our t-statistic, we first need to calculate the pooled variance estimator, followed by the t-statistic.

INTERPRET

Example 13.2

1.210

Since our calculated t-statistic does not fall into the rejection region, we cannot reject H0 in favor of H1, that is, there is not sufficient evidence to infer that the mean assembly times differ.

INTERPRET

Example 13.2

1.211

[Figure: software output providing this information. Compare the t statistic with the critical value, or look at the p-value.]

Confidence Interval

1.212

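We can compute a 95% confidence interval estimate for the difference in mean assembly times as:

$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$

That is, we estimate the mean difference between the two assembly methods to be between -.36 and .96 minutes.

Note: zero is included in this confidence interval.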

Matched Pairs Experiment

1.213

Previously when comparing two populations, we examined independent samples.

If, however, an observation in one sample is matched with an observation in a second sample, this is called a matched pairs experiment.

See Example 13.4.
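The slide only defines the matched pairs design. The usual way to analyze it, assumed here and not taken from the slide, is to reduce each pair to a single difference and run a one-sample t-test on those differences. The function name `matched_pairs_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def matched_pairs_t(first, second, hypothesized_diff=0.0):
    """t statistic on paired differences; returns (t, degrees of freedom)."""
    if len(first) != len(second):
        raise ValueError("matched pairs need samples of equal length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = (mean(diffs) - hypothesized_diff) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical paired observations (e.g. the same unit measured twice).
first = [12, 15, 11, 14, 13]
second = [10, 14, 12, 11, 10]
t, df = matched_pairs_t(first, second)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Working with differences removes the variation between pairs, which is the reason for matching in the first place.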

Identifying Factors

1.214

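Inference about the ratio of two variances

1.215

So far we've looked at comparing measures of central location, namely the means of two populations.

When looking at two population variances, we consider the ratio of the variances, i.e. the parameter of interest to us is:

$\sigma_1^2 / \sigma_2^2$

The sampling statistic $\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$ is F distributed with ν1 = n1 - 1 and ν2 = n2 - 1 degrees of freedom.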

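Inference about the ratio of two variances

1.216

Our null hypothesis is always:

H0: σ1²/σ2² = 1

(i.e. the variances of the two populations are equal, hence their ratio will be one)

Therefore, our test statistic simplifies to:

$F = s_1^2 / s_2^2$

df1 = n1 - 1

df2 = n2 - 1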
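Under the null hypothesis of equal variances, the F statistic is simply the ratio of the two sample variances. A minimal sketch with hypothetical data and a hypothetical helper name `f_statistic` follows. Critical values are not computed, since the standard library has no F distribution.

```python
from statistics import variance


def f_statistic(sample1, sample2):
    """Return (F, df1, df2), where F = s1^2 / s2^2."""
    f = variance(sample1) / variance(sample2)
    return f, len(sample1) - 1, len(sample2) - 1


sample1 = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
sample2 = [1, 2, 3, 4, 5]           # hypothetical data
f, df1, df2 = f_statistic(sample1, sample2)
print(f"F = {f:.3f} with df1 = {df1}, df2 = {df2}")
```

An F value near 1 is consistent with equal variances; values far above or below 1 point toward a difference.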

IDENTIFY

Example 13.6

1.217

Earlier, we looked at the variances of the samples of people who consumed high fiber cereal and those who did not and assumed they were not equal. We can use the ideas just developed to test if this is in fact the case.

We want to show:

H1: σ1²/σ2² ≠ 1

(the variances are not equal to each other)

CALCULATE

Example 13.6

1.218

We are doing a two-tailed test, and our rejection region is:

$F > F_{\alpha/2,\nu_1,\nu_2}$ or $F < F_{1-\alpha/2,\nu_1,\nu_2}$

CALCULATE

Example 13.6

1.219

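[Figure: F distribution with critical values .58 and 1.61 marking the two-tail rejection region.]

Since the F statistic falls in the rejection region, we reject the null hypothesis in favor of the alternative; that is, there is a difference in the variance between the two populations.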

INTERPRET

Example 13.6

1.220

We may need to work with the Excel output before drawing conclusions.

Our research hypothesis

H1: σ1²/σ2² ≠ 1

requires two-tail testing, but Excel only gives us values for one-tail testing.

If we double the one-tail p-value Excel gives us, we have the p-value of the test we're conducting (i.e. 2 × 0.0004 = 0.0008). Refer to the text and CD Appendices for more detail.
