Levels of Measurement
The level of measurement refers to the relationship among the values that are assigned to the attributes of a variable. What does that mean? Begin with the idea of the variable, in this example "party affiliation." That variable has a number of attributes. Let's assume that in this particular election context the only relevant attributes are "republican", "democrat", and "independent". For purposes of analyzing the results of this variable, we arbitrarily assign the values 1, 2 and 3 to the three attributes. The level of measurement describes the relationship among these three values. In this case, we are simply using the numbers as shorter placeholders for the lengthier text terms. We don't assume that higher values mean "more" of something and lower numbers signify "less". We don't assume that the value of 2 means that democrats are twice something that republicans are. We don't assume that republicans are in first place or have the highest priority just because they have the value of 1. In this case, we only use the values as a shorter name for the attribute. Here, we would describe the level of measurement as "nominal".

Why is Level of Measurement Important?


First, knowing the level of measurement helps you decide how to interpret the data from that variable. When you know that a measure is nominal (like the one just described), you know that the numerical values are just short codes for the longer names. Second, knowing the level of measurement helps you decide which statistical analyses are appropriate for the values that were assigned. If a measure is nominal, you know that you would never average the data values or do a t-test on the data.

There are typically four levels of measurement that are defined:


Nominal
Ordinal
Interval
Ratio

In nominal measurement the numerical values just "name" the attribute uniquely. No ordering of the cases is implied. For example, jersey numbers in basketball are measured at the nominal level. A player with number 30 is not more of anything than a player with number 15, and is certainly not twice whatever number 15 is.

In ordinal measurement the attributes can be rank-ordered, but the distances between attributes have no meaning. For example, on a survey you might code Educational Attainment as 0=less than H.S.; 1=some H.S.; 2=H.S. degree; 3=some college; 4=college degree; 5=post college. In this measure, higher numbers mean more education. But is the distance from 0 to 1 the same as the distance from 3 to 4? Of course not. The interval between values is not interpretable in an ordinal measure.

In interval measurement the distance between attributes does have meaning. For example, when we measure temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval between values is interpretable. Because of this, it makes sense to compute an average of an interval variable, whereas it doesn't make sense to do so for an ordinal scale. But note that in interval measurement ratios don't make any sense: 80 degrees is not twice as hot as 40 degrees (although the attribute value is twice as large).

Finally, in ratio measurement there is always a meaningful absolute zero. This means that you can construct a meaningful fraction (or ratio) with a ratio variable. Weight is a ratio variable. In applied social research most "count" variables are ratio; for example, the number of clients in the past six months. Why? Because you can have zero clients, and because it is meaningful to say that "...we had twice as many clients in the past six months as we did in the previous six months."

It's important to recognize that there is a hierarchy implied in the level of measurement idea. At lower levels of measurement, assumptions tend to be less restrictive and data analyses tend to be less sensitive. At each level up the hierarchy, the current level includes all of the qualities of the one below it and adds something new. In general, it is desirable to have a higher level of measurement (e.g., interval or ratio) rather than a lower one (nominal or ordinal).

Copyright 2006, William M.K. Trochim, All Rights Reserved. Last Revised: 10/20/2006
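The four levels can be illustrated with a short sketch. This is a rough illustration, not part of the original page; the variable names and the ratio example values are mine, and the jersey/education/temperature data echo the examples above.

```python
# Which summary statistics are meaningful at each level of measurement
# (illustrative sketch; variable names are not from the original text).
from statistics import mode, median, mean

jersey_numbers = [30, 15, 7, 23]     # nominal: only counting/mode is meaningful
education_codes = [0, 1, 2, 3, 4, 5] # ordinal: mode and median are meaningful
temps_f = [30, 40, 70, 80]           # interval: mean is meaningful, ratios are not
client_counts = [10, 20]             # ratio: "twice as many" is meaningful

print(mode(jersey_numbers))                  # the most common jersey number
print(median(education_codes))               # the middle education category
print(mean(temps_f))                         # average temperature is interpretable
print(client_counts[1] / client_counts[0])   # a meaningful ratio: 2.0
```

Averaging jersey numbers or taking the ratio of two temperatures would run without error, of course; the point is that the results would not mean anything, which is why knowing the level of measurement matters before choosing an analysis.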


Descriptive Statistics
(Lecture Three)

Dr. Lari Arjomand

Introduction: The purpose of this lecture is to help you understand conceptually the meanings of measures of location (i.e., mean, median, and mode) and measures of variability (i.e., range, variance, standard deviation, and coefficient of variation).

Measures of Location for Ungrouped or Raw Data: Measures of location give information about the location of a group of numbers or data. The measures of location presented in this lecture note for ungrouped (raw) data are the mean, the median, and the mode.

Arithmetic Mean: The arithmetic mean (or the average, or simply the mean) is computed by summing all the numbers and dividing by the number of observations. For example, to compute the arithmetic mean of a sample of numbers such as 19, 20, 21, 23, 18, 25, and 26, first sum the numbers: 19+20+21+23+18+25+26 = 152, and then calculate the sample mean by dividing this total (152) by the number of observations (7), which gives a mean of about 21.7, or roughly 22. The mean uses all the observations, and each observation affects the mean. Even though the mean is sensitive to extreme values (i.e., extremely large or small values can pull the mean toward them), it is still the most widely used measure of location. This is because the mean has valuable mathematical properties that make it convenient for inferential statistical analysis. For example, the sum of the deviations of the numbers in a data set from the mean is zero, and the sum of the squared deviations of the numbers in a data set from the mean is a minimum. These points will be explained in detail in lecture number 14.

Weighted Mean: In some cases the data in the sample or population should not be weighted equally,

and each value should be weighted according to its importance. For example, suppose Lari wants to find his average in a stat course, and assume that the exams are weighted as follows:

First Test.............100 Points.....15%
Second Test........100 Points.....20%
Third Test............100 Points.....25%
Final Test.............100 Points.....30%
Assignments..........50 Points.....10%
Available Points....450 Points....100%

Assume Lari made 90, 71, 87, 77, and 40 on the first test, second test, third test, final exam, and the assignments, respectively. Lari's average in the stat course is calculated as follows: (90x0.15 + 71x0.20 + 87x0.25 + 77x0.30 + 40x0.10)/(0.15+0.20+0.25+0.30+0.10) = 76.55, or about 77 points.

Median: The median is the middle value in an ordered array of observations. If there is an even number of values in the array, the median is the average of the two middle numbers; if there is an odd number, the median is the middle number. For example, suppose you want to find the median for the following set of data: 74, 66, 69, 68, 73, 70. First, we arrange the data in an ordered array: 66, 68, 69, 70, 73, 74. Since there is an even number of values, the average of the middle two numbers (i.e., 69 and 70) is the median (139/2 = 69.5). Note that, in general, the location of the median is (n+1)/2, where n is the total number of items.

Generally, the median provides a better measure of location than the mean when there are some extremely large or small observations (i.e., when the data are skewed to the right or to the left). For this reason, median income is used as the measure of location for U.S. household income. Note that if the median is less than the mean, the data set is skewed to the right (data having a lower limit but no upper limit will be positively skewed, to the right). If the median is greater than the mean, the data set is skewed to the left (data having an upper limit but no lower limit will be negatively skewed, to the left).
Median does not have important mathematical properties for use in future calculations. See the following figure:
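The mean, weighted mean, and median calculations above can be sketched in a few lines. The data values are the ones from the examples in this lecture; the helper function names are mine.

```python
# Worked versions of the examples above: arithmetic mean, weighted mean,
# and median (helper names are illustrative, not from the lecture).
def mean(values):
    return sum(values) / len(values)

def weighted_mean(values, weights):
    # Sum of value-times-weight, divided by the total weight.
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def median(values):
    ordered = sorted(values)            # the "ordered array" from the text
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]             # odd count: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle two

data = [19, 20, 21, 23, 18, 25, 26]
m = mean(data)
print(round(m, 1))                                # 21.7
print(round(abs(sum(x - m for x in data)), 10))   # 0.0 -- deviations from the mean sum to zero

scores  = [90, 71, 87, 77, 40]
weights = [0.15, 0.20, 0.25, 0.30, 0.10]
print(round(weighted_mean(scores, weights), 2))   # 76.55

print(median([74, 66, 69, 68, 73, 70]))           # 69.5
```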

Mode: The mode is the most frequently occurring value in a set of observations. For example, given 2, 3, 4, 5, 4, the mode is 4, because there are more fours than any other number. Data may have two modes; in this case we say the data are bimodal, and data with more than two modes are referred to as multimodal. Note that the mode does not have important mathematical properties for future use. Also, the

mode is not always a helpful measure of location, because there can be more than one mode, or even no mode.

Measures of Variability for Ungrouped or Raw Data: Measures of variability represent the dispersion of a set of data. For example, let's go back to Lari's grades in the stat course: Lari made 90, 71, 87, 77, and 40 on the first test, second test, third test, final exam, and the assignments, respectively. Remember that Lari's average in the course was 77. What does this average score mean to Lari? Should he be satisfied with this information? A measure of location (the mean, in this case) does not provide sufficient information to describe the data set. What is needed is a measure of the variability of the data. Note that a small value for a measure of dispersion indicates that the data are clustered around the mean; therefore, the mean is a good representative of the data set. On the other hand, a large measure of dispersion indicates that the mean is not a good representative of the data set. Also, measures of dispersion can be used when we want to compare the distributions of two or more sets of data. In this lecture we will talk about the range, variance, standard deviation, and coefficient of variation for ungrouped or raw data.

Range: The range is the difference between the largest and smallest observations in a data set. The major disadvantage of the range is that it does not include all of the observations: only the two most extreme values are included, and these two numbers may be untypical observations. For example, given that the ages for a sample of 8 students at CSC are 24, 18, 22, 19, 25, 20, 23, and 21, the range for this data set is 25 - 18 = 7.

Variance: An important measure of variability is the variance. The variance is the average of the squared deviations from the arithmetic mean.
For example, suppose that the heights (in inches) of a sample of students at CSC are as follows: 66, 73, 68, 69, 74. The following steps are used to calculate the variance:

1. Find the arithmetic mean.
2. Find the difference between each observation and the mean.
3. Square these differences.
4. Sum the squared differences.
5. Since the data are a sample, divide the number from step 4 by the number of observations minus one, i.e., n - 1 (where n is equal to the number of observations

in the data set). Later on, this term (n - 1) will be called the degrees of freedom. Following the above steps, the variance is calculated as follows:

Height (inches).....Deviation...........Squared Deviation
66...................66 - 70 = -4.......16
73...................73 - 70 = +3........9
68...................68 - 70 = -2........4
69...................69 - 70 = -1........1
74...................74 - 70 = +4.......16

Total of the height column = 350, and total of the squared-deviation column = 46.

Arithmetic mean = 350/5 = 70 inches, and variance = 46/(5 - 1) = 11.5 squared inches.

As you see in the above example, the variance is not expressed in the same units as the observations. In other words, the variance is hard to interpret because the deviations from the mean are squared, making the result too large for an intuitive explanation. These problems can be solved by working with the square root of the variance, which is called the standard deviation.

Standard Deviation: Both the variance and the standard deviation provide the same information; one can always be obtained from the other. In other words, the process of computing a standard deviation always involves computing a variance. Since the standard deviation is the square root of the variance, it is always expressed in the same units as the raw data. For example, in the above problem the variance was 11.5 squared inches. The standard deviation is the square root of 11.5, which is equal to 3.4 inches (expressed in the same units as the raw data).

Meaning of Standard Deviation: One way to explain the standard deviation as a measure of variation of a data set is to answer questions such as how many measurements are within one, two, and three standard deviations of the mean. To answer such questions, we need to talk about the empirical rule and Chebyshev's rule.
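The mode, range, variance, and standard deviation examples above can be checked with a short script. The data values are the ones from this lecture; the `modes` helper is my own name for a function that also handles the bimodal case described earlier.

```python
# Mode, range, sample variance, and standard deviation for the worked
# examples above (sample variance divides by n - 1, as in step 5).
import math
from collections import Counter

def modes(values):
    # Return every value tied for the highest frequency (handles bimodal data).
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 4, 5, 4]))        # [4]

ages = [24, 18, 22, 19, 25, 20, 23, 21]
print(max(ages) - min(ages))         # 7 -- the range

heights = [66, 73, 68, 69, 74]
mean = sum(heights) / len(heights)                  # step 1: 70.0
squared_devs = [(h - mean) ** 2 for h in heights]   # steps 2-3
variance = sum(squared_devs) / (len(heights) - 1)   # steps 4-5: 46 / 4

sd = math.sqrt(variance)
print(variance)        # 11.5
print(round(sd, 1))    # 3.4 -- back in the same units (inches) as the data
```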
The following rules present guidelines to help answer the question of how many measurements fall within 1, 2, and 3 standard deviations. Empirical Rule: This rule generally applies to mound-shaped data, and specifically to data that are normally distributed, i.e., bell shaped. The rule is as follows: approximately 68% of the measurements (data) will fall within one standard deviation of the mean, 95% will fall within two standard deviations, and 99.7% (or almost 100%) will fall within three standard deviations. See the following figure:

For example, in the height problem, the mean height was 70 inches with a standard deviation of 3.4 inches. Thus, 68% of the heights fall between 66.6 and 73.4 inches, one standard deviation: (mean + 1 standard deviation) = (70 + 3.4) = 73.4, and (mean - 1 standard deviation) = 66.6. Ninety-five percent (95%) of the heights fall between 63.2 and 76.8 inches, two standard deviations. Ninety-nine and seven-tenths percent (99.7%) fall between 59.8 and 80.2 inches, three standard deviations. See the following figure:
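The empirical-rule intervals for the height example can be computed directly (a small sketch, using the mean of 70 and standard deviation of 3.4 from above):

```python
# Empirical-rule intervals for the height example (mean 70, s.d. 3.4).
mean, sd = 70, 3.4

intervals = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
for k, pct in [(1, 68), (2, 95), (3, 99.7)]:
    low, high = intervals[k]
    print(f"about {pct}% of heights fall within {k} s.d.: {low:.1f} to {high:.1f}")
```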

Z Score: We can pick any point on the X axis in the above figure and find out how many standard deviations above or below the mean that point falls. In other words, a Z score represents the number of standard deviations an observation (X) is above or below the mean. The larger the Z value, the further the value is from the mean. Note that values beyond three standard deviations are very unlikely, and that if a Z score is negative, the observation (X) is below the mean. The Z score is found by using the following relationship:

Z = (a given value - mean) / standard deviation

For example, for a data set that is normally distributed with a mean of 25 and a standard deviation of 5, suppose you want to find the Z score for a value of 35. This value (X = 35) is 10 units above the mean, with a Z value of: Z = (35 - 25)/5 = 10/5 = +2. This Z score shows that the raw score (35) is two standard deviations above the mean. Would you be pleased with a grade in this course that is 2 standard deviations above the mean of the class? The topic of Z scores will be discussed in more detail in lecture note six.

Chebyshev's Rule: Chebyshev's rule applies to any sample of measurements, regardless of the shape of their distribution. The rule states that it is possible that none of the measurements will fall within one standard deviation of the mean; at least 75% (or 3/4) of the measurements will fall within two standard deviations of the mean; and at least 89% (or 8/9) of the measurements will fall within three standard deviations of the mean. Generally, according to this rule, at least 1 - (1/k²) of the measurements will fall within (mean ± k standard deviations), i.e., within k standard deviations of the mean, where k is any number greater than one. For example, if k = 2.8, at least 0.87 of all values fall within (mean ± 2.8 standard deviations), because 1 - (1/k²) = 1 - (1/7.84) = 1 - 0.13 = 0.87.
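Both formulas above are one-liners in code. This sketch reproduces the Z score example (mean 25, standard deviation 5, X = 35) and the Chebyshev bound for k = 2.8; the function name `z_score` is mine.

```python
# Z score and Chebyshev's bound, using the numbers from the examples above.
def z_score(x, mean, sd):
    # Number of standard deviations that x lies above (+) or below (-) the mean.
    return (x - mean) / sd

print(z_score(35, 25, 5))       # 2.0 -- two standard deviations above the mean

k = 2.8
chebyshev_bound = 1 - 1 / k**2  # at least this fraction lies within k s.d.
print(round(chebyshev_bound, 2))  # 0.87
```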
Coefficient of Variation: We said that the standard deviation measures the variation in a set of data. For distributions having the same mean, the distribution with the largest standard deviation has the greatest variation. But when considering distributions with different means, decision makers cannot compare the uncertainty in the distributions solely by comparing standard deviations. In this case, the coefficient of variation is used: the coefficients of variation for the different distributions are compared, and the distribution with the largest coefficient of variation has the greatest relative variation.

The coefficient of variation expresses the standard deviation as a percentage of the mean, i.e., it reflects the variation in a distribution relative to the mean: Coefficient of Variation (C.V.) = (standard deviation / mean) x 100

For example, Mark teaches two sections of statistics. He gives each section a different test covering the same material. The mean score on the test for the day section is 27, with a standard deviation of 3.4. The mean score for the night section is 94, with a standard deviation of 8.0. Which section has the greater variation or dispersion of scores?

.........................Day Section........Night Section
Mean..................27.....................94
S.D.....................3.4....................8.0

Direct comparison of the two standard deviations suggests that the night section has the greater variation. But comparing the coefficients of variation shows quite different results: C.V.(day) = (3.4/27) x 100 = 12.6% and C.V.(night) = (8/94) x 100 = 8.5%. Thus, based on the size of the coefficient of variation, Mark finds that the night section's test results have a smaller variation relative to their mean than do the day section's test results.
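Mark's comparison can be reproduced in a few lines (the figures are the ones from the example above; the function name `cv` is mine):

```python
# Coefficient of variation: standard deviation as a percentage of the mean.
def cv(sd, mean):
    return sd / mean * 100

day_cv = cv(3.4, 27)      # day section: mean 27, s.d. 3.4
night_cv = cv(8.0, 94)    # night section: mean 94, s.d. 8.0

print(round(day_cv, 1))    # 12.6
print(round(night_cv, 1))  # 8.5
print(day_cv > night_cv)   # True -- the day section has greater relative variation
```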


All contents copyright (c) 1996. All rights reserved.
