
Quiz 2

Quiz 2 Exam
Date: At the end of W4b power point
Time: At Start of Class
Venue: In Class
PLEASE BRING/BORROW A NOTEBOOK/TABLET

1. What is Data Analysis?


Data analysis is statistics + visualization + know-how
Many of the methods for data analysis are based on
multivariate statistics, which poses an additional problem to the
beginner: multivariate statistics cannot be understood without a
profound knowledge of simple statistics.
Furthermore, several fields in science and engineering have
developed their own nomenclature assigning different names to
the same concepts.
One has to gather considerable knowledge and experience in
order to perform the analysis of data efficiently.
Statistical methods can be applied in fields such as medicine,
engineering, quality inspection, election polling,
analytical chemistry, physics, and gambling.

1. What is Data Analysis? (cont)


Statistics and statistical methodology as the basis of data
analysis are concerned with two basic types of problems:
1. Summarizing, describing, and exploring the data
2. Using sampled data to infer the nature of the process which
produced the data
The first type of problem is covered by descriptive statistics;
the second is covered by inferential statistics.
Another important aspect of data analysis is the data itself, which
can be of two different types: qualitative and quantitative.
Qualitative data do not contain quantitative information, but they
can be classified into categories. In contrast,
quantitative data represent an amount of something.

1. What is Data Analysis? (cont)


A third distinction can be made according to the number of
variables involved in the data analysis.
If only one variable is used, the statistical procedures are
summarized as univariate statistics. Using more than one variable
results in multivariate statistics.
A special case of multivariate statistics with only two variables is
sometimes called bivariate statistics.

2. Descriptive Statistics
Statistics plays an important role in the description of mass
phenomena.
It offers methods to summarize a collection of data. These
methods may be numerical or graphical, both of which have
their own advantages and disadvantages.
Graphical methods are better suited for the recognition of
patterns in the data, whereas numerical methods give well-defined
measures of some properties.
In general, it is recommended to use both approaches for the
description of data. A field closely related to
descriptive statistics is exploratory data analysis.

3. Inferential Statistics
Inferential statistics is used to draw conclusions
about a data set. Usually this means drawing
inferences about a population from a sample either by
estimating some relationships or by testing some
hypothesis.
Population: A Population is the set of all possible states
of a random variable. The size of the population may be
either infinite or finite.
Sample: A Sample is a subset of the population; its size
is always finite.

4. When do you need statistical calculations?


When analyzing data, your goal is simple: You wish to
make the strongest possible conclusion from limited
amounts of data. To do this, you need to overcome two
problems:
Important differences can be obscured by biological
variability and experimental imprecision. This makes it
difficult to distinguish real differences from random
variability.
The human brain excels at finding patterns, even from
random data. Our natural inclination (especially with
our own data) is to conclude that differences are real,
and to minimize the contribution of random variability.
Statistical rigor prevents you from making this mistake.

4. When do you need statistical calculations? (cont)


Statistical analyses are necessary when observed
differences are small compared to experimental
imprecision and biological variability. When you work with
experimental systems with no biological variability and little
experimental error, heed these aphorisms:
If you need statistics to analyze your experiment, then
you've done the wrong experiment.
If your data speak for themselves, don't interrupt!
But in many fields, scientists/engineers can't avoid large
amounts of variability, yet care about relatively small
differences. Statistical methods are necessary to draw
valid conclusions from such data.

These are Abby's science test scores.

86  97  84  73  63  88  97  100  95

What can you tell us about these numbers?

86  73  97
97  63  100
84  88  95

What is the MEAN?


How do we find it?
The mean is the numerical average
of the data set.
The mean is found by adding all the
values in the set, then dividing the
sum by the number of values.

Let's find Abby's MEAN science test score.

97 + 84 + 88 + 100 + 95 + 63 + 73 + 86 + 97 = 783
783 ÷ 9 = 87

The mean is 87.
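The add-then-divide rule above can be sketched in a few lines of Python. This is only an illustration using Abby's nine scores from the slide; the variable names are made up for the example.

```python
# Abby's nine science test scores, as listed on the slide.
scores = [86, 97, 84, 73, 63, 88, 97, 100, 95]

total = sum(scores)    # add all the values in the set
count = len(scores)    # number of values in the set
mean = total / count   # divide the sum by the number of values

print(total, count, mean)  # 783 9 87.0
```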

What is the MEDIAN?


How do we find it?
The MEDIAN is the number that is in the
middle of a set of data
1. Arrange the numbers in the set in
order from least to greatest.
2. Then find the number that is in the
middle.

63 73 84 86 88 95 97 97 100

The median is 88.


Half the numbers are less than the median.

Half the numbers are greater than the median.

Median sounds like MEDIUM.
Think "middle" when you hear median.

How do we find the MEDIAN when two numbers are in the middle?

1. Add the two numbers.
2. Then divide by 2.

63 73 84 88 95 97 97 100

88 + 95 = 183
183 ÷ 2 = 91.5

The median is 91.5.
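Both median cases (one middle number, or two) can be sketched as a single small function. This is a minimal illustration, not a replacement for a library routine; the function name `median` is chosen only for this example.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)   # step 1: arrange from least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:             # odd count: exactly one middle value
        return ordered[mid]
    # even count: add the two middle numbers, then divide by 2
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([86, 97, 84, 73, 63, 88, 97, 100, 95]))  # 88
print(median([63, 73, 84, 88, 95, 97, 97, 100]))      # 91.5
```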

What is the MODE?


How do we find it?
The MODE is the piece of data that
occurs most frequently in the data set.
A set of data can have:

One mode
More than one mode
No mode

63 73 84 86 88 95 97 97 100

The value 97 appears twice.


All other numbers appear just once.

97 is the MODE

A hint for remembering the MODE

The first two letters give you a hint:

MOde = Most Often

Which set of data has ONE MODE?

A  9, 11, 16, 6, 7, 17, 18
B  18, 7, 10, 7, 18
C  9, 11, 16, 8, 16

Which set of data has NO MODE?

A  9, 11, 16, 6, 7, 17, 18
B  18, 7, 10, 7, 18
C  13, 12, 12, 11, 12

Which set of data has MORE THAN ONE MODE?

A  9, 11, 16, 8, 16
B  9, 11, 16, 6, 7, 17, 18
C  18, 7, 10, 7, 18
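The three possibilities (one mode, more than one mode, no mode) can be checked with a short counting sketch. It assumes the slides' convention that a set in which every value appears only once has no mode; the helper name `modes` is made up for this example.

```python
from collections import Counter

def modes(values):
    """Return the most frequent values; an empty list means 'no mode'."""
    counts = Counter(values)          # how many times each value occurs
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:                  # every value appears just once
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([63, 73, 84, 86, 88, 95, 97, 97, 100]))  # [97]
print(modes([9, 11, 16, 8, 16]))                     # [16]
print(modes([18, 7, 10, 7, 18]))                     # [7, 18]
print(modes([9, 11, 16, 6, 7, 17, 18]))              # []
```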

What is the RANGE?


How do we find it?
The RANGE is the difference between the
lowest and highest values.

63 73 84 86 88 95 97 97

97 - 63 = 34

34 is the RANGE, or spread, of this set of data.

What is the RANGE of this set of data?

99  48  97  84  86  71  88

What is the RANGE of this set of data?

48 71 84 86 88 97 99

99 - 48 = 51

What is the RANGE of this set of data?

17  48  15  33  46  67  85

What is the RANGE of this set of data?

15 17 33 46 48 67 85

85 - 15 = 70

What is the RANGE of this set of data?

267  357  119  329  401  227  483

What is the RANGE of this set of data?

119 227 267 329 357 401 483

483 - 119 = 364
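The range calculations above can be reproduced with a one-line helper. This is just a sketch; the name `value_range` is chosen here to avoid clashing with Python's built-in `range`.

```python
def value_range(values):
    """Difference between the highest and lowest values."""
    return max(values) - min(values)

print(value_range([63, 73, 84, 86, 88, 95, 97, 97]))       # 34
print(value_range([99, 48, 97, 84, 86, 71, 88]))           # 51
print(value_range([17, 48, 15, 33, 46, 67, 85]))           # 70
print(value_range([267, 357, 119, 329, 401, 227, 483]))    # 364
```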

This one requires more work than the others.
Right in the MIDDLE.
This one is the easiest to find. Just LOOK.

Find the.


Figure 2.1 Samples with defectives (black squares).



Standard Deviation and Variance
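As a minimal sketch, the variance and standard deviation of Abby's nine scores can be computed as follows. It assumes the usual sample definitions (sum of squared deviations divided by n - 1); if the population form is intended, divide by n instead. The variable names are illustrative only.

```python
import math

# Abby's nine science test scores from the earlier slides.
scores = [86, 97, 84, 73, 63, 88, 97, 100, 95]

n = len(scores)
mean = sum(scores) / n                            # 87.0

# Squared distance of each score from the mean.
squared_devs = [(x - mean) ** 2 for x in scores]

# ASSUMPTION: sample variance, dividing by n - 1 (use n for a population).
sample_variance = sum(squared_devs) / (n - 1)
sample_sd = math.sqrt(sample_variance)            # SD is the square root of the variance

print(sample_variance)       # 152.0
print(round(sample_sd, 2))   # 12.33
```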


Normal distribution (Fig 4.3, p. 67)

The normal distribution is extremely useful!

Used for many statistical tests and calculations,
provided the observations follow a normal distribution.
About 68% of observations lie within 1 SD of the mean.
About 95% of observations lie within 2 SD (more precisely, 1.96 SD) of the mean.
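The 68% and 95% figures can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `coverage` is made up for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)   # standard normal distribution

def coverage(k):
    """Probability that an observation lies within k SD of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 1.96, 2):
    print(f"within {k} SD: {coverage(k):.4f}")
# within 1 SD: 0.6827
# within 1.96 SD: 0.9500
# within 2 SD: 0.9545
```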

Distribution of data
Image from: http://www.southalabama.edu/coe/bset/johnson/lectures/lec15.htm

Normal: Mean = Median = Mode
Left (negatively) skewed: Mean < Median < Mode
Right (positively) skewed: Mode < Median < Mean
Normality Test:
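The ordering of mean and median gives a quick hint about skew. The sketch below applies that rule of thumb to Abby's scores from the earlier slides. It is only a rough indication, not a formal normality test, and the function name `skew_hint` is hypothetical.

```python
from statistics import mean, median, mode

def skew_hint(values):
    """Rule-of-thumb skew direction from the order of mean and median."""
    m, med = mean(values), median(values)
    if m < med:
        return "left (negatively) skewed"
    if m > med:
        return "right (positively) skewed"
    return "roughly symmetric"

scores = [86, 97, 84, 73, 63, 88, 97, 100, 95]
print(mean(scores), median(scores), mode(scores))  # 87 88 97
print(skew_hint(scores))                           # left (negatively) skewed
```

Here Mean < Median < Mode, which matches the left-skewed pattern: the low scores 63 and 73 pull the mean down.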

Using the data below, fill in the descriptive statistics form from
the Excel output.

1. Steps to Perform Hypothesis Testing

A. State the hypotheses
B. Determine the appropriate test
C. Choose the risk levels (α, β) and the sample size, n
D. Determine the rejection region
E. Conduct the sampling experiment and calculate the value of the
test statistic
F. Interpret the results and draw a conclusion
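Steps A to F can be walked through in code. The sketch below assumes a two-sided one-sample z-test with a known population standard deviation; the measurements, the value of sigma, and the hypothesized mean of 30 are all hypothetical, chosen only to illustrate the sequence. The Type II risk β is not computed here.

```python
from math import sqrt
from statistics import NormalDist

# A. State the hypotheses: Ho: mu = 30 versus Ha: mu != 30 (two-sided).
mu0 = 30.0

# B. Appropriate test: one-sample z-test, ASSUMING the population SD is known.
sigma = 2.0  # hypothetical known standard deviation

# C. Choose the risk (alpha) and the sample size n.
alpha = 0.05
sample = [31.2, 29.8, 32.5, 30.9, 33.1, 31.7, 30.4, 32.0, 31.5]  # hypothetical data
n = len(sample)

# D. Rejection region: reject Ho when |z| >= the table (critical) value.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# E. Calculate the value of the test statistic from the sample.
sample_mean = sum(sample) / n
z = (sample_mean - mu0) / (sigma / sqrt(n))

# F. Interpret the result.
reject = abs(z) >= z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}")
print("Reject Ho" if reject else "Fail to reject Ho")
```

With these made-up numbers the test statistic falls beyond the critical value, so Ho would be rejected at the 5% level.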

2. Null Hypothesis
Denoted by Ho
A proposed relationship between 2 or more parameters
(usually the status quo)
Usually what we wish to disprove
Example:
Ho: μ1 = μ2
There is no difference in the population average solder
weight between the 2 machines

3. Alternative Hypothesis
Denoted by Ha
Opposite of the null hypothesis
What we wish to establish as true
Examples:
Ha: μ1 ≠ μ2  The population average for sample 1 is not equal to the
population average for sample 2.
Ha: μ1 > μ2  The population average capacitor shear for the high
adhesive weight is greater than for the low adhesive weight.
Ha: μ2 < μ1  The population average glass length for machine 2 is less
than for machine 1.

4. Test Statistics
A sample statistic used to decide whether to reject the
null hypothesis
Calculated from the sample data
Compared with a critical value of the test statistic, beyond which
the null hypothesis is rejected
If the null hypothesis is rejected, the alternative hypothesis is
accepted as true

4. Test Statistics (cont)


Rejection Region
A table value (the inverse value at level α) is used to define the
region in which the null hypothesis will be rejected
Depends on α
Conclusion
Reject Ho when the calculated test statistic ≥ table value
Fail to reject Ho when the calculated test statistic < table value

5. Consequences
Since the decision to accept or reject Ho is based on a sample
that contains some variation, there is a possibility of
making an incorrect decision. The terminology and
probabilities for these risks are categorized below:

             Ho True                Ho False
Accept Ho    Correct decision       Type II error
             Probability = 1 - α    Probability = β
Reject Ho    Type I error           Correct decision
             Probability = α        Probability = 1 - β

α: rejecting Ho when it is true
β: accepting Ho when it is false
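The two error probabilities can be estimated by simulation. The sketch below assumes a two-sided z-test of Ho: μ = 30 with known σ; the values of σ, n, and the alternative mean of 31 are hypothetical, and the helper `rejection_rate` is made up for this example.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
mu0, sigma, n = 30.0, 2.0, 9   # hypothetical test setup
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def rejection_rate(true_mu, trials=20000):
    """Fraction of simulated samples for which Ho: mu = mu0 is rejected."""
    rejections = 0
    for _ in range(trials):
        sample = [random.gauss(true_mu, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / (sigma / sqrt(n))
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials

# Ho true: every rejection is a Type I error, so the rate should be near alpha.
type_i_rate = rejection_rate(true_mu=mu0)

# Ho false (true mean assumed to be 31): failing to reject is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mu=31.0)

print(f"estimated alpha: {type_i_rate:.3f}")
print(f"estimated beta:  {type_ii_rate:.3f}")
```

The estimated Type I rate lands close to the chosen α, while β depends on how far the true mean is from 30 and on the sample size.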

5. Consequences (cont)

[Figure: rejection regions for tests of Ho about μ = 30; in the two-sided case, each tail has area α/2.]

Risk exists in hypothesis testing because we are using a
sample to draw conclusions about the population.

5. Consequences (cont)
Example: evaluating 2 process pressures.
Null hypothesis: there is no significant difference in defect level
between the 2 pressures.
Alternative hypothesis: there is a significant difference in defect
level between the 2 pressures.

Error     Reality                 Conclusion              Consequences
Type I    Pressure does not       Pressure does affect    Time wasted trying to control
          affect defect level     defect level            pressure to reduce defect level
Type II   Pressure does affect    Pressure does not       Missed opportunity for
          defect level            affect defect level     defect reduction

6. One Sided vs. Two Sided Tests


Making a decision is - statistically speaking - the selection of a
threshold value on a one-dimensional scale (i.e. on the
argument axis of the probability density plot)
We have to distinguish between two cases. First, one may ask
whether a property lies above or below a pre-defined
threshold (left figure). In this case we have to apply a
one-sided test.
Second, one may want to determine whether a
property is within certain boundaries. This question involves
setting two boundaries and asks for the probability of an
event lying either between or outside these boundaries (right
figure). In this case we have to apply a two-sided test.

6. One Sided vs. Two Sided Tests (cont)

One Sided

Two Sided

Examples:
Does the iron concentration of an iron ore exceed 30%? [one-sided
test]
Is the pizza you ordered from the pizza service well done? It could
be raw, burnt, or well done. [two-sided test]
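The practical difference between the two kinds of test shows up in the threshold values. The sketch below assumes a standard normal test statistic and α = 0.05; the observed statistic of 1.80 is hypothetical.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

# One-sided: all of alpha sits in a single tail (one threshold).
one_sided_crit = std_normal.inv_cdf(1 - alpha)

# Two-sided: alpha is split as alpha/2 per tail (two boundaries, +/- the value).
two_sided_crit = std_normal.inv_cdf(1 - alpha / 2)

z_stat = 1.80  # hypothetical observed test statistic

print(round(one_sided_crit, 3), round(two_sided_crit, 3))  # 1.645 1.96
print(z_stat >= one_sided_crit)        # True: rejected by the one-sided test
print(abs(z_stat) >= two_sided_crit)   # False: not rejected by the two-sided test
```

The same observation can therefore lead to different decisions, so the choice between a one-sided and a two-sided test should be made before looking at the data.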

6. NORMALITY TEST

http://sdittami.altervista.org/shapirotest/ShapiroTest.html
