
Statistical Tools

What is statistics?
Statistics is the science of describing or making inferences about the world from a sample of data.

Statistics has two branches: descriptive and inferential.

Descriptive Statistics
Descriptive statistics are methods for organizing and summarizing data.

Inferential Statistics
Two main methods:

1. Estimation
2. Hypothesis testing

Definitions
A variable is a characteristic or condition that can change or take on different values.
A datum is one observation of the variable being measured.
Data are a collection of observations.

The goal of statistics is to help researchers organize and interpret the data.

TYPES OF VARIABLES
Variables
  Qualitative
    Nominal
    Ordinal
  Quantitative
    Interval
    Ratio
Quantitative variables can be discrete or continuous.

Qualitative variables

Qualitative variable
A nominal (categorical) variable is data made up of categories that cannot be rank ordered; each category is simply different.

Examples:
What is your gender? (please tick)
Male
Female

What is your favorite team? (please tick)
Real Madrid
Barcelona
None

Ordinal qualitative variable


Example:
How satisfied are you with the level of service you have received? (please tick)
Very satisfied
Somewhat satisfied
Neutral
Somewhat dissatisfied
Very dissatisfied

Quantitative variables

Interval variables
Interval variables are measured on a continuous scale and have no true zero point.
Examples:
Clock time moves along a continuous measure of seconds, minutes and so on, and has no true zero point.
Temperature in degrees Celsius or Fahrenheit moves along a continuous measure of degrees and has no true zero.

Ratio variables
Ratio data can be measured on a continuous scale with a true zero point.
Examples: age, weight, height.
Ratio data can also be measured on a discrete scale with a true zero point.
Example: number of children.

Hierarchical data order


These levels of measurement can be placed in hierarchical order.

Ratio
Interval
Ordinal
Nominal

Population
The entire group of individuals is called the population.


Sample
Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study.


Why sample?
Measuring all units is impractical, if not impossible.
Sampling just a few units saves money.
Sampling just a few units saves time.
Some measurements are destructive.

Parameter versus Statistic


A descriptive value for a population is called a parameter and a descriptive value for a sample is called a statistic.
Population: parameter
Sample: statistic

How to organize the data?

Design matrix
Sex      Age   Smoke   Country    Married
Female   23    Yes     USA        Yes
Male     43    Yes     Colombia   Yes
Male     19    No      Brazil     Yes
Male     23    Yes     Brazil     No
Female   56    No      Canada     Yes
Female   78    Yes     USA        Yes
Male     54    No      Spain      No
Male     76    Yes     Colombia   No
Female   43    No      Peru       Yes

9 individuals (rows), 5 variables (columns)

Dimension: 9 x 5

Statistical tools

Tables
One-way frequency table

Number of passengers   Absolute frequency   Relative frequency
2                      2                    2/93
4                      23                   23/93
5                      41                   41/93
6                      18                   18/93
7                      8                    8/93
8                      1                    1/93
Total                  93                   1

For nominal, ordinal and discrete variables.
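As a rough illustration of how the one-way frequency table is built, the sketch below counts values with Python's standard library. The slides give only the counts, so the list `observations` is rebuilt from them and is an assumption, as are all variable names.

```python
from collections import Counter
from fractions import Fraction

# Counts copied from the one-way table: passengers -> absolute frequency.
counts_from_slide = {2: 2, 4: 23, 5: 41, 6: 18, 7: 8, 8: 1}

# Hypothetical raw observations rebuilt from those counts (one entry per unit).
observations = [value for value, count in counts_from_slide.items()
                for _ in range(count)]

absolute = Counter(observations)   # absolute frequency of each value
total = sum(absolute.values())     # number of observations
# Relative frequency = absolute frequency / total (kept exact as a fraction).
relative = {value: Fraction(count, total) for value, count in absolute.items()}

for value in sorted(absolute):
    print(value, absolute[value], f"{absolute[value]}/{total}")
print("Total", total, sum(relative.values()))
```

The relative frequencies always add up to 1, which is a quick check that no observation was lost while counting.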

Tables
Two way frequency table
Sex \ Hobby   Dance   Sports   TV   Total
Men           2       10       8    20
Women         16      6        8    30
Total         18      16       16   50

For nominal, ordinal and discrete variables.
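The row and column totals of a two-way table can be recomputed from the cell counts. The following is a minimal sketch that uses the hobby-by-sex counts from the slide; the dictionary layout and names are illustrative choices, not part of the original material.

```python
# Cell counts copied from the two-way table (rows: sex, columns: hobby).
table = {
    "Men":   {"Dance": 2,  "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6,  "TV": 8},
}
hobbies = ["Dance", "Sports", "TV"]

# Row totals: sum across hobbies for each sex.
row_totals = {sex: sum(cells.values()) for sex, cells in table.items()}
# Column totals: sum across sexes for each hobby.
column_totals = {hobby: sum(table[sex][hobby] for sex in table)
                 for hobby in hobbies}
# Grand total: every individual is counted exactly once.
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)
```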

Tables
Frequency table

Age      Absolute frequency   Relative frequency (%)
10-14    2                    5
15-19    16                   40
20-24    18                   45
25-29    3                    7.5
30-34    1                    2.5
Total    40                   100

For quantitative variables.

Graphs

Bar chart
Pie chart
Pictograms
Histogram
Density plot
Scatter plot
Time series plot
Boxplot

Graphs

For nominal, ordinal and discrete variables.



Graphs
Statistical pictograms

Not recommended.

Graphs

Only for numerical variables



Graphs examples on web


Recommended book

Darrell Huff, How to Lie with Statistics (Spanish edition: Cómo mentir con estadísticas)
http://www.laeditorialvirtual.com.ar/Pages2/Huff_Darrell/Huff_ComoMentirConEstadisticas.html#_Toc334380216


Recommended videos

http://www.youtube.com/watch?v=nUJNstRFvvo
http://www.youtube.com/watch?v=ETbc8GIhfHo


Measures of Central Tendency


A measure of central tendency is a value that represents a typical, or central, entry of a data set. The three most commonly used measures of central tendency are the mean, the median, and the mode.


Mean
The mean of a data set is the sum of the data entries divided by the number of entries.

Population mean ("mu"):

$\mu = \frac{\sum x}{N}$

Sample mean ("x-bar"):

$\bar{x} = \frac{\sum x}{n}$

Mean
Example: the following are the ages of all seven employees of a small company: 53 32 61 57 39 44 57

Calculate the population mean.

Add the ages and divide by 7:

$\mu = \frac{\sum x}{N} = \frac{343}{7} = 49$ years

The mean age of the employees is 49 years.

Median
The median of a data set is the value that lies in the middle of the data when the data set is ordered. If the data set has an odd number of entries, the median is the middle data entry. If the data set has an even number of entries, the median is the mean of the two middle data entries.
Example: calculate the median age of the seven employees. 53 32 61 57 39 44 57

To find the median, sort the data: 32 39 44 53 57 57 61

The middle (fourth) entry is 53, so the median age of the employees is 53 years.

Mode
The mode of a data set is the data entry or category that occurs with the greatest frequency. If no entry is repeated, the data set has no mode. If two entries occur with the same greatest frequency, each entry is a mode and the data set is called bimodal.
Example: find the mode of the ages of the seven employees. 53 32 61 57 39 44 57

The mode is 57 because it occurs the most times.
An outlier is a datum that lies far from the other entries in the data set.
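The three measures for the employee ages can be checked with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative.

```python
import statistics

ages = [53, 32, 61, 57, 39, 44, 57]   # ages of the seven employees

mean_age = statistics.mean(ages)      # sum of the entries / number of entries
median_age = statistics.median(ages)  # middle entry of the sorted data
mode_age = statistics.mode(ages)      # entry with the greatest frequency

print(sorted(ages))
print(mean_age, median_age, mode_age)
```

The results agree with the hand calculations: mean 49, median 53, mode 57.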

Weighted Mean
A weighted mean is the mean of a data set whose entries have varying weights. A weighted mean is given by

$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$
where w is the weight of each entry x.

Weighted Mean
Example: grades in a statistics class are weighted as follows. Tests are worth 50% of the grade, homework is worth 30% of the grade and the final is worth 20% of the grade. A student receives a total of 80 points on tests, 100 points on homework, and 85 points on his final. What is his current grade?

Weighted Mean
Begin by organizing the data in a table.
Source      Score, x   Weight, w   xw
Tests       80         0.50        40
Homework    100        0.30        30
Final       85         0.20        17
Total                  1.00        87

$\bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{87}{1.00} = 87$
The student's current grade is 87%.
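The weighted-mean calculation for the grade example can be sketched in a few lines. The helper `weighted_mean` is a hypothetical name introduced here for illustration; the weights are written as fractions of 1.

```python
# Scores and weights from the grade example.
scores = {"Tests": 80, "Homework": 100, "Final": 85}
weights = {"Tests": 0.50, "Homework": 0.30, "Final": 0.20}


def weighted_mean(x, w):
    """Return sum(x * w) / sum(w) for entries sharing the same keys."""
    numerator = sum(x[key] * w[key] for key in x)
    denominator = sum(w[key] for key in x)
    return numerator / denominator


grade = weighted_mean(scores, weights)
print(round(grade, 2))
```

Dividing by the sum of the weights means the same function also works when the weights are given as 50, 30 and 20 instead of 0.50, 0.30 and 0.20.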

Shapes of distributions

[Figure: a histogram and a density plot]

Shapes of distributions
A frequency distribution is symmetric when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images.

Shapes of distributions
A frequency distribution is uniform (or rectangular) when all entries, or classes, in the distribution have equal frequencies. A uniform distribution is also symmetric.

Shapes of distributions
A frequency distribution is skewed if the tail of the graph elongates more to one side than to the other. A distribution is skewed left (negatively skewed) if its tail extends to the left. A distribution is skewed right (positively skewed) if its tail extends to the right.

Measures of Variation


The mean is a good indicator of the central tendency of a set of data, but it does not provide the whole picture about the data set.
Example 1: comparison of the distribution of two data sets

              Data           Mean   Median
Data set A:   5 6 7 8 9      7      7
Data set B:   1 2 7 12 13    7      7

Example 2: Suppose that in a hospital, each patient's pulse rate is taken in the morning, at noon, and in the evening. On a certain day, the pulse rates were:

             Pulse rates   Mean   Median
Patient A:   72 76 74      74     74
Patient B:   72 91 59      74     72

Note: The mean pulse rate is the same for both patients. While patient A's pulse rate is stable, patient B's fluctuates widely.

Range
The range of a data set is the difference between the maximum and minimum data entries in the set.
Range = (Maximum data entry) - (Minimum data entry)
Example: The following data are the closing prices for a certain stock on ten successive Fridays. Find the range.
Stock: 56 56 57 58 61 63 63 67 67 67

The range is 67 - 56 = 11.

Population Variance and Standard Deviation


The population variance of a population data set of N entries is

Population variance ("sigma squared"): $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

The population standard deviation of a population data set of N entries is the square root of the population variance.

Population standard deviation ("sigma"): $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Sample Variance and Standard Deviation


The sample variance of a sample data set of n entries is

Sample variance ("s squared"): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

The sample standard deviation of a sample data set of n entries is the square root of the sample variance.

Sample standard deviation ("s"): $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
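To see how the measures of variation separate two data sets with the same mean, the sketch below applies them to data sets A and B from Example 1 using the standard `statistics` module. The dictionary names are illustrative; `pvariance` and `pstdev` divide by N, while `variance` and `stdev` divide by n - 1.

```python
import statistics

# Data sets A and B from Example 1 (same mean and median, different spread).
data_sets = {
    "A": [5, 6, 7, 8, 9],
    "B": [1, 2, 7, 12, 13],
}

summary = {}
for name, data in data_sets.items():
    summary[name] = {
        "range": max(data) - min(data),           # maximum minus minimum
        "pop_var": statistics.pvariance(data),    # sum((x - mu)^2) / N
        "pop_sd": statistics.pstdev(data),        # square root of pop_var
        "sample_var": statistics.variance(data),  # sum((x - xbar)^2) / (n - 1)
        "sample_sd": statistics.stdev(data),      # square root of sample_var
    }

for name, values in summary.items():
    print(name, values)
```

Both sets have mean 7, yet every measure of variation is larger for B, which is the point of the example.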

Interpreting Standard Deviation


When interpreting standard deviation, remember that it is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.

[Figure: two histograms of frequency against data value, one with $\bar{x} = 4$ and s = 1.18, the other with $\bar{x} = 4$ and s = 0]

Galton board

Recommended video and applet:
http://www.youtube.com/watch?v=6YDHBFVIvIs
http://www.disfrutalasmatematicas.com/datos/quincunce.html

Normal Distribution
The most widely used distribution is the normal distribution, also known as the Gaussian distribution.

Random variation in many physical measurements is approximately normally distributed.

The location and spread of the normal distribution are determined independently by the mean ($\mu$) and the standard deviation ($\sigma$).

Characteristics of the normal curve


It is symmetrical: half the cases are on one side of the center and the other half are on the other side.
The distribution is single peaked, not bimodal or multimodal.

Characteristics of the normal curve


Most of the cases will fall in the center portion of the curve and as values of the variable become more extreme they become less frequent, with "outliers" at the "tail" of the distribution few in number.

Empirical Rule
$P(\mu - \sigma < X < \mu + \sigma) = 0.6827$
$P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.9545$
$P(\mu - 3\sigma < X < \mu + 3\sigma) = 0.9973$
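The three probabilities of the empirical rule can be reproduced numerically. This sketch assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper `prob_within` is a hypothetical name introduced for illustration.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The result does not depend on the particular mean and standard deviation: for example, `prob_within(1, mu=10, sigma=2)` gives the same probability as `prob_within(1)`.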

Standard Normal Distribution


A normal random variable with $\mu = 0$ and $\sigma^2 = 1$ is called a standard normal random variable and is denoted as Z.

$Z \sim N(0, 1)$

Example: Standard Normal Distribution


Assume Z is a standard normal random variable. Find $P(Z \le -0.86)$.

Answer (from the standard normal table): 0.1949

Example: Standard Normal Distribution


Assume Z is a standard normal random variable. Find $P(Z \le 1.37)$.

Answer (from the standard normal table): 0.9147

Example: Standard Normal Distribution


Assume Z is a standard normal random variable. Find $P(-1.25 \le Z \le 0.37)$.

Answer (from the standard normal table): $P(Z \le 0.37) - P(Z \le -1.25) = 0.6443 - 0.1056 = 0.5387$

Example: Standard Normal Distribution


Assume Z is a standard normal random variable. Find $P(Z > -1.23)$.

Answer (from the standard normal table): $1 - P(Z \le -1.23) = 1 - 0.1093 = 0.8907$
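The four table lookups above can be cross-checked against the cumulative distribution function of the standard normal. This is a sketch that assumes Python 3.8 or later (`statistics.NormalDist`); the names `p1` to `p4` are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)                 # standard normal: mean 0, standard deviation 1

p1 = Z.cdf(-0.86)                    # P(Z <= -0.86)
p2 = Z.cdf(1.37)                     # P(Z <= 1.37)
p3 = Z.cdf(0.37) - Z.cdf(-1.25)      # P(-1.25 <= Z <= 0.37)
p4 = 1 - Z.cdf(-1.23)                # P(Z > -1.23), by the complement rule

for p in (p1, p2, p3, p4):
    print(round(p, 4))
```

Because Z is continuous, using "<" or "<=" in these probabilities makes no difference to the result.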

Standardizing

A Practical Example: Your company packages sugar in 1 kg bags.

When you weigh a sample of bags you get these results: 1007 g, 1032 g, 1002 g, 983 g, 1004 g, ... (a hundred measurements)
Mean = 1010 g
Standard deviation = 20 g

What is the probability that a bag has a weight of 985 g or less?

Standardize with $Z = \frac{X - \mu}{\sigma}$:

$P(X \le 985) = P\left(Z \le \frac{985 - 1010}{20}\right) = P(Z \le -1.25) = 0.1056$
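The standardizing step for the sugar-bag example can be sketched as follows. It assumes a mean of 1010 g and a standard deviation of 20 g, the values used in the worked calculation, and Python 3.8 or later for `statistics.NormalDist`; variable names are illustrative.

```python
from statistics import NormalDist

mu = 1010      # mean bag weight in grams (value used in the worked calculation)
sigma = 20     # standard deviation in grams
x = 985        # weight threshold in grams

# Standardize: how many standard deviations x lies from the mean.
z = (x - mu) / sigma

# Look up the standardized value in the standard normal distribution.
p_standardized = NormalDist().cdf(z)

# Same probability computed directly on the original scale, as a cross-check.
p_direct = NormalDist(mu, sigma).cdf(x)

print(z, round(p_standardized, 4), round(p_direct, 4))
```

Both routes give the same probability, which is why a single standard normal table is enough for any mean and standard deviation.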
