Notes

Unit 2: Basics: Data Description

Table of Contents

Description
    Working with Data
    Central Values for Data
Variability
    The Standard Deviation
    The Coefficient of Variation
Two Variables
    Relationships Between Two Variables
    Correlation

Description

Working with Data


With any data set we encounter, we must find ways to let the data tell their story. Ordering and graphing data sets often exposes patterns and trends, helping us learn more about the data and the underlying situation. If data can provide insight into a situation, they can help us make the right decisions. Organize data into groups to allow direct comparison of data within a particular group or across different groups. Arrange data graphically to reveal patterns of data behavior and to help identify outliers; one useful type of graph is the histogram. Investigate outliers before deciding whether to leave them alone, remove them, or change them.
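A histogram groups values into equal-width bins and counts how many values fall into each bin. The Python sketch below uses only the standard library and assumes integer data; the helper names `histogram` and `print_histogram` and the sales figures are hypothetical, made up for illustration. Empty bins are kept so that a gap, and the isolated value beyond it, stays visible.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count integer values per bin [start, start + bin_width), keeping empty bins."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    first, last = min(counts), max(counts)
    return {start: counts.get(start, 0)
            for start in range(first, last + bin_width, bin_width)}

def print_histogram(bins, bin_width):
    # One row per bin; each '*' stands for one observation.
    for start, n in bins.items():
        print(f"{start:>3}-{start + bin_width - 1:<3} {'*' * n}")

# Hypothetical daily sales figures, made up for illustration.
sales = [12, 15, 14, 18, 21, 17, 13, 16, 19, 15, 14, 95]
bins = histogram(sales, bin_width=10)
print_histogram(bins, bin_width=10)
```

The lone bar in the 90-99 bin, separated from the rest by empty bins, flags the value 95 as an outlier worth investigating before deciding what to do with it.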

Central Values for Data


To summarize a data set with a single value, we can use one of three measures: the mean, the median, or the mode. These are often called summary statistics or descriptive statistics. All three give a sense of the 'center' or 'central tendency' of the data set, but we need to understand how they differ before using them. The mean is the average value of the data set: the sum of all values divided by the number of values. The median is the middle value of the data set: half of the values fall above the median, and half below. The mode is the most common value in the data set.
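The three measures can disagree sharply when a data set contains an extreme value. This Python sketch uses the standard library's `statistics.mean`, `statistics.median`, and `statistics.mode` on a small, made-up set of salaries with one unusually large value.

```python
import statistics

# Hypothetical annual salaries in thousands of dollars; the last value is unusually large.
salaries = [40, 42, 45, 45, 48, 50, 250]

mean = statistics.mean(salaries)      # sum of the values divided by their count
median = statistics.median(salaries)  # middle value of the sorted data
mode = statistics.mode(salaries)      # most frequent value

print(f"mean={mean:.2f} median={median} mode={mode}")
# mean=74.29 median=45 mode=45
```

The single large salary pulls the mean well above every other value, while the median and mode stay at 45, which is why it matters to know how the measures differ before choosing one.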

Variability

The Standard Deviation


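The standard deviation measures how much data vary about their mean value. The standard deviation is the square root of the variance. For a sample of $n$ values $x_1, \dots, x_n$ with mean $\bar{x}$:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad s = \sqrt{s^2}$$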

The larger the standard deviation, the more the data are spread out from the mean. Standard deviations and variances are easy to compute using Excel's built-in functions, such as STDEV.S and VAR.S for a sample (STDEV.P and VAR.P for an entire population).
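The definition of the standard deviation translates directly into code. This Python sketch (the helper names `sample_variance` and `sample_std_dev` and the data are hypothetical) computes the sample variance by dividing the sum of squared deviations by n - 1, takes its square root, and checks the results against the standard library's `statistics.variance` and `statistics.stdev`, which use the same n - 1 convention as Excel's VAR.S and STDEV.S.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; their mean is 5

print(sample_variance(data), sample_std_dev(data))
print(statistics.variance(data), statistics.stdev(data))  # library versions, same n - 1 rule
print(statistics.pstdev(data))  # population version divides by n instead
```

Here the squared deviations sum to 32, so the sample variance is 32/7; dividing by n = 8 instead (the population convention) gives a variance of exactly 4 and a standard deviation of 2.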

The Coefficient of Variation


The coefficient of variation expresses the standard deviation as a fraction of the mean. Because the units cancel, we can use it to compare variation in data sets with different scales or units.

$$\text{Coefficient of Variation} = \frac{\text{standard deviation}}{\text{mean}}$$
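Because the coefficient of variation is a ratio, rescaling the data changes the standard deviation but not the coefficient. This Python sketch uses a hypothetical helper `coefficient_of_variation` and made-up weights; using the sample standard deviation (`statistics.stdev`) is an assumption, since the notes do not specify which version to use. Converting kilograms to grams multiplies the standard deviation by 1000 but leaves the coefficient of variation unchanged.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        # The ratio is undefined when the mean is zero.
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean

# Hypothetical weights of the same five items, recorded in two different units.
weights_kg = [2.0, 2.5, 3.0, 3.5, 4.0]
weights_g = [w * 1000 for w in weights_kg]

cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)

print(statistics.stdev(weights_kg), statistics.stdev(weights_g))  # differ by a factor of 1000
print(cv_kg, cv_g)  # essentially identical
```

The ratio is most meaningful for positive data measured from a true zero, such as weights or prices; for data that can be negative or cross zero, it can be misleading.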

Two Variables

Relationships Between Two Variables


Plotting two variables on a scatter diagram helps us see whether, and how, two data sets are related. When one of the variables is time, the data form a time series. But even when a relationship appears, we still need to be skeptical: is the relationship plausible? An apparent relationship between two variables may simply be coincidental, or it may stem from a relationship each variable has with a third, often hidden, variable. A relationship is not proof of causality, so be alert to the possibility of hidden variables.

Correlation
The correlation coefficient characterizes the strength and direction of the linear relationship between two data sets. Its value ranges from -1 to +1. A correlation coefficient near +1 or -1 indicates that the two variables have a strong positive or negative linear relationship, respectively; a coefficient near zero indicates a weak or nonexistent linear relationship. A coefficient near zero does not prove that the two variables are unrelated: it indicates only that there is little linear relationship, and the variables may still be related in a nonlinear way. Outliers can unduly influence the correlation coefficient, making it much higher or lower than it would be without the outlying points.
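The notes do not give a formula, so this sketch assumes the Pearson correlation coefficient, the usual meaning of "correlation": the sum of products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The Python helper `pearson_r` and all data are hypothetical. The examples show the limits of -1 and +1, an exact nonlinear relationship with a coefficient of zero, and a single outlier turning a weak correlation into a strong one.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 each")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return sxy / math.sqrt(sxx * syy)

# Perfect straight lines give +1 (rising) and -1 (falling).
r_pos = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
r_neg = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])

# y = x**2 is an exact nonlinear relationship, yet r is 0 for these symmetric x values.
r_parabola = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

# Hypothetical weakly related data, then the same data plus one outlying point.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 1, 5, 2]
r_without = pearson_r(xs, ys)
r_with = pearson_r(xs + [20], ys + [30])

print(f"{r_pos:.2f} {r_neg:.2f} {r_parabola:.2f}")
print(f"without outlier: {r_without:.2f}, with outlier: {r_with:.2f}")
```

The single point (20, 30) lifts the coefficient from weak to strongly positive, which is why a scatter diagram should be checked for outliers before a correlation is trusted.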
