
CONTENTS

Descriptive and inferential statistics
Descriptive statistics
Descriptive measures
    Central tendency
        Measures of central tendency
    Variability
        Measures of variability
Important concerns in descriptive statistics
    Skewness
    Kurtosis
Inferential Statistics
    Important statistical tests
        t-Test
        ANOVA
        ANCOVA
        Factor Analysis
        Correlation
        Regression Analysis
        Meta-Analysis
Important concerns for inferential statistics
    Research Error
    Probability of Error
    Decision Errors

Descriptive and inferential statistics

DESCRIPTIVE STATISTICS
Descriptive statistics are used to describe the basic features of the data in a study. They provide
simple summaries about the sample and the measures. In most cases, descriptive statistics are
used to examine or explore one variable at a time. The word data refers to the information that
has been collected from an experiment, a survey, an historical record, etc. Commonly used
descriptive statistics include frequency counts, ranges (high and low scores or values), means,
modes, median scores, and standard deviations. Two concepts are essential to understanding
descriptive statistics: variables and distributions.
Variables

Statistics are used to explore numerical data. Numerical data are observations which are recorded
in the form of numbers. Numbers are variable in nature, which means that quantities vary
according to certain factors. Variables are divided into three basic categories:

Nominal Variables: Nominal variables classify data into categories. This process involves
labeling categories and then counting frequencies of occurrence. For example, gender would be a
nominal variable. The categories themselves are not quantified; counting the frequencies of each
category yields data that are quantified, for example, eight males and nine females.

Ordinal Variables: Ordinal variables organize data in terms of degree. Ordinal variables do not
establish the numeric difference between data points. They indicate only that one data point is
ranked higher or lower than another. Letter grades are an example of an ordinal variable.

Interval Variables: Interval variables measure data on a numeric scale. Thus, the order of the data
is known as well as the precise numeric distance between data points.

Distributions

A distribution is a graphic representation of data. The line formed by connecting data points is
called a frequency distribution. This line may take many shapes. The single most important
shape is that of the bell-shaped curve, which characterizes the distribution as "normal." A
perfectly normal distribution is only a theoretical ideal. This ideal, however, is an essential
ingredient in statistical decision-making.

DESCRIPTIVE MEASURES
The measures used to describe a data set include measures of central tendency, variability, and
relative standing.

Central tendency
In statistics, the term central tendency relates to the way in which quantitative data is clustered
around some value. A measure of central tendency is a way of specifying a central value. In the
simplest cases, the measure of central tendency is an average of a set of measurements, the word
average being variously construed as mean, median, or other measure of location, depending on
the context.

Measures of central tendency


There are three common measures of central tendency: the mean, the median, and the mode.
Mean
The arithmetic mean is the most common measure of central tendency. It is simply the sum of the
scores divided by the total number of scores:
µ = ΣY ÷ N
where ΣY = sum of scores
N = total number of scores
µ = mean
For example:

If we have the scores 23, 24, 27, 18, and 20, the mean of these scores according to the formula
above is:
ΣY = 23 + 24 + 27 + 18 + 20 = 112
N = 5
µ = 112 ÷ 5 = 22.4
Mode
The mode is the score with the highest frequency. A bimodal distribution is one which has two
modes. A multimodal distribution has three or more modes.
For example, if the scores are 1, 3, 5, 7, 8, 8, 8, 9, 11, 12, 13, and 14, then the mode is 8.
Median
The median is also a frequently used measure of central tendency. The median is the midpoint of
a distribution: the same number of scores lies above the median as below it. There are two cases
when computing the median:
• When there is an odd number of numbers, the median is simply the middle number. For
example, the median of 2, 4, and 7 is 4.
• When there is an even number of numbers, the median is the mean of the two middle
numbers. For example, the median of 2 and 4 is 3.
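As an illustration, all three measures can be computed with Python's standard statistics
module; a minimal sketch, using the example scores from above:

    import statistics

    scores = [23, 24, 27, 18, 20]

    # Mean: sum of the scores divided by the number of scores
    print(statistics.mean(scores))    # 22.4

    # Median: middle value of the sorted scores (18, 20, 23, 24, 27)
    print(statistics.median(scores))  # 23

    # Mode: the score with the highest frequency
    print(statistics.mode([1, 3, 5, 7, 8, 8, 8, 9, 11, 12, 13, 14]))  # 8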

Variability
Measures of central tendency locate only the center of a distribution of measures. Therefore we
need other measures to describe data. Variability refers to how "spread out" a group of scores is.
The terms variability, spread, and dispersion are synonyms, and refer to how spread out a
distribution is.

Measures of variability
To measure variability, we compute the range, the variance, and the standard deviation.
Range
The range is the simplest measure of variability to calculate. It is defined as the difference
between the largest and smallest sample values. The range is simply calculated as the highest
score minus the lowest score. It depends only on extreme values and provides no information
about how the remaining data is distributed.
For example, the range of the group of numbers 11, 6, 8, 9, 5, 3, 4, 10, and 1 is computed by
subtracting the lowest value from the highest value. In this example, the highest value is 11 and
the lowest value is 1, so the range for this group of numbers is 11 − 1 = 10.
Variance

Variability can also be defined in terms of how close the scores in the distribution are to the
middle of the distribution. Using the mean as the measure of the middle of the distribution, the
variance is defined as the average squared difference of the scores from the mean. The variance
is calculated by the following formula:
σ² = Σ(Y − µ)² ÷ N
where
σ² = variance
µ = mean
N = number of scores
Standard Deviation

The standard deviation is simply the square root of the variance. The standard deviation is an
especially useful measure of variability when the distribution is normal or approximately normal
because the proportion of the distribution within a given number of standard deviations from the
mean can be calculated.
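A minimal Python sketch of these three measures, using the population formulas described above
and the numbers from the range example:

    import statistics

    scores = [11, 6, 8, 9, 5, 3, 4, 10, 1]

    # Range: highest score minus lowest score
    print(max(scores) - min(scores))     # 11 - 1 = 10

    # Population variance: average squared deviation from the mean
    print(statistics.pvariance(scores))

    # Population standard deviation: square root of the variance
    print(statistics.pstdev(scores))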

IMPORTANT CONCERNS IN DESCRIPTIVE STATISTICS


The important concerns related to descriptive statistics involve the symmetry and the peak of the
graphs formed from the collected data. The measure of symmetry is called skewness, and the
measure of peak is called kurtosis.

Skewness
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or
data set, is symmetric if it looks the same to the left and right of the center point. The mean is
very sensitive to extreme scores and will be drawn in the direction of the skew. The median is
not sensitive to extreme scores. If the mean is greater than the median, the distribution is likely to
be positively skewed, whereas if the mean is less than the median, the distribution is likely to be
negatively skewed.
Normal distribution: A normal distribution is a bell-shaped distribution of data where the mean,
median and mode all coincide, so its frequency curve is symmetric about the center.

In a normal distribution, approximately 68% of the values lie within one standard deviation of
the mean and approximately 95% of the data lies within two standard deviations of the mean.
Positive Skewed: If there are extreme values towards the positive end of a distribution, the
distribution is said to be positively skewed. In a positively skewed distribution, the mean is
greater than the mode.

Negatively Skewed: A negatively skewed distribution, on the other hand, has a mean which is
less than the mode because of the presence of extreme values at the negative end of the
distribution.

Kurtosis
Kurtosis is a measure of whether the data are peaked or flat relative to a normal distribution. That
is, data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly,
and have heavy tails. Data sets with low kurtosis tend to have a flat top near the mean rather than
a sharp peak. A uniform distribution would be the extreme case.
Normal distribution: A normal random variable has a kurtosis of 3 irrespective of its mean or
standard deviation.

Leptokurtic: If a random variable's kurtosis is greater than 3, it is said to be leptokurtic.
Leptokurtosis is associated with probability density functions (PDFs) that are simultaneously
"peaked" and have "fat tails."

Platykurtic: If its kurtosis is less than 3, it is said to be platykurtic. Platykurtosis is associated
with PDFs that are simultaneously less peaked and have thinner tails. They are said to have
"shoulders."

INFERENTIAL STATISTICS
Inferential statistics are used to draw conclusions and make predictions based on the descriptions
of data. While descriptive statistics provide information about the central tendency, dispersion,
skew, and kurtosis of data, inferential statistics allow us to make broader statements about the
relationships between data. Inferential statistics refer to the use of current information regarding
a sample of subjects in order to:
• Make assumptions about the population at large and/or
• Make predictions about what might happen in the future.
The goal of inferential statistics is to do just that: to take what is known and make assumptions or
inferences about what is not known.

Important statistical tests


Specific procedures used to make inferences about an unknown population or unknown score
vary depending on the type of data used and the purpose of making the inference. The main
categories of inferential statistical tests are the t-test, ANOVA, ANCOVA, factor analysis,
correlation, regression analysis, and meta-analysis.

t-Test
A t-test is perhaps the simplest of the inferential statistics. The purpose of this test is to
determine whether a difference exists between the means of two groups. For example, to determine
whether the GPAs of students with prior work experience differ from the GPAs of students without
this experience, we would employ the t-test by comparing the GPAs of each group to each other.
To compare these groups, the t-test statistical formula includes the means, standard deviations,
and number of subjects for each group.
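A minimal sketch of a two-sample t-test with SciPy; the GPA values below are hypothetical,
purely for illustration:

    from scipy import stats

    gpa_with_experience = [3.1, 3.4, 2.9, 3.6, 3.2, 3.5]
    gpa_without_experience = [2.8, 3.0, 2.7, 3.1, 2.9, 2.6]

    # Independent two-sample t-test comparing the two group means
    t_stat, p_value = stats.ttest_ind(gpa_with_experience, gpa_without_experience)
    print(t_stat, p_value)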

ANOVA
The term ANOVA is short for Analysis of Variance and is typically used when one or more
independent variables define two or more groups to be compared on a dependent variable. The
ANOVA is superior for complex analyses for two reasons:
• The first is its ability to combine complex data into one statistical procedure.
• The second benefit over a simple t-test is the ANOVA's ability to determine what are
called interaction effects.
It permits comparison of two or more populations when interval variables are used. ANOVA
does this by comparing the dispersion of samples in order to make inferences about their means.
ANOVA seeks to answer two basic questions:

• Are the means of variables of interest different in different populations?

• Are the differences in the mean values statistically significant?
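A minimal sketch of a one-way ANOVA with SciPy, addressing both questions for three
hypothetical groups (the scores are illustrative only):

    from scipy import stats

    group_a = [24, 26, 23, 25, 27]
    group_b = [30, 31, 29, 32, 28]
    group_c = [24, 25, 26, 23, 27]

    # One-way ANOVA: tests whether the group means differ significantly
    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f_stat, p_value)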

ANCOVA
The term ANCOVA is short for Analysis of Covariance. It examines whether or not interval
variables move together in ways that are independent of their
mean values. Ideally, variables should move independently of one another, regardless of their
means. Unfortunately, in the real world, groups of observations usually differ on a number of
dimensions, making simple analyses of variance tests problematic since differences in other
characteristics could cause observed differences in the values of the variables of interest.

Factor Analysis
A factor analysis is used when an attempt is being made to break down a large data set into
different subgroups or factors. By using a somewhat complex procedure that is typically

performed using specialized software, a factor analysis will look at each question within a group
of questions to determine how these questions cluster together. Factor analysis is commonly
used when analyzing data from multi-question surveys to reduce the numerous questions to a
smaller set of more global issues.
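One such piece of specialized software is scikit-learn (an assumption; the source does not name
a tool). A minimal sketch on simulated survey data, where six observed questions are driven by
two underlying factors:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 2))    # two hidden factors, 200 respondents
    loadings = rng.normal(size=(2, 6))    # how each factor drives six questions
    answers = latent @ loadings + 0.1 * rng.normal(size=(200, 6))

    fa = FactorAnalysis(n_components=2, random_state=0)
    fa.fit(answers)

    # Estimated loadings: how strongly each question groups onto each factor
    print(fa.components_)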

Correlation
Like ANCOVA, correlation is used to measure the similarity in the changes of values of interval
variables but is not influenced by the units of measure. Another advantage of correlation is that
the correlation coefficient r is always bounded by the interval:

−1 ≤ r ≤ 1

Here r = −1 indicates a perfect inverse linear relationship, i.e. y increases uniformly as x
decreases, and r = 1 indicates a perfect direct linear relationship, i.e. x and y move uniformly
together. A value of 0 indicates no linear relationship. Note that correlation can determine that a relationship exists
between variables but says nothing about the cause or directional effect. For example, a known
correlation exists between muggings and ice cream sales. However, one does not cause the
other. Rather, a third variable, the warm weather that puts more people on the street both to
mug and to buy ice cream, is a more direct cause of the correlation. As a rule, it is wise to examine
the correlations between all variables in a data set. This both warns auditors/evaluators about
possible co-variations and suggests areas for possible follow-up investigation.
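A minimal sketch of the Pearson correlation coefficient with SciPy; the paired values, loosely
echoing the warm-weather example, are hypothetical:

    from scipy import stats

    temperature = [15, 18, 21, 24, 27, 30, 33]
    ice_cream_sales = [110, 130, 155, 160, 190, 220, 230]

    # Pearson correlation coefficient, always bounded by -1 <= r <= 1
    r, p_value = stats.pearsonr(temperature, ice_cream_sales)
    print(r, p_value)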

Regression Analysis
While correlation determines the strength and direction of the relationship between two or more
variables, regression analysis is used to determine the effect of independent variables on a
dependent variable. Regression measures the relative impact of each independent variable
and is useful in forecasting. It is used most appropriately when both the independent and
dependent variables are interval. Logistic regression analysis is used to examine relationships
between variables when the dependent variable is nominal, while the independent variables may
be nominal, ordinal, interval, or some mixture thereof. Discriminant analysis is similar to
logistic regression in that the outcome variable is categorical. However, here the independent
variables must be interval.
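A minimal sketch of simple linear regression with SciPy, using hypothetical interval data
(years of work experience predicting GPA):

    from scipy import stats

    experience = [0, 1, 2, 3, 4, 5]
    gpa = [2.6, 2.8, 3.0, 3.1, 3.3, 3.4]

    # Simple linear regression: the slope measures the impact of the
    # independent variable, and the fitted line can be used for forecasting
    result = stats.linregress(experience, gpa)
    print(result.slope, result.intercept, result.rvalue)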

Meta-Analysis
A meta-analysis refers to the combining of numerous studies into one larger study. When this
technique is used, each study becomes one subject in the new meta-study. For instance, the
combination of 12 studies on work experience and college grades would result in a meta-study
with 12 subjects. While the process is a little more complex than this in reality, the meta-analysis
basically combines many studies together to determine whether the results of all of them, taken as
a whole, are significant. The meta-study is especially helpful when different related studies
conducted in the past have found different results.

IMPORTANT CONCERNS FOR INFERENTIAL STATISTICS

Research Error
Every statistic contains both a true score and an error score. A true score is the part of the
statistic or number that truly represents what is being measured. An error score is that part of the
statistic or number that represents something other than what is being measured.

Confidence Level: When we use statistics to summarize any phenomenon, we are always
concerned with how much of that statistic represents the true score and how much is error.
Imagine a person scores a 100 on a standardized IQ test. Is his true IQ really 100, or could this
score be off somewhat due to an unknown level of error? Chances are that there is error associated
with his score, and therefore we must use this score of 100 as an estimate of his true IQ. When
using an achieved score to estimate a true score, we must determine how much error is
associated with it. Methods to estimate a true score are called estimators, and fall into three main
groups: Point Estimation; Interval Estimation; and Confidence Interval Estimation.

Point Estimation: In point estimation, the value of a sample statistic or achieved score is used as
a best guess or quick estimate of the population statistic or true score. The major weakness of
point estimation is its lack of concern for error; the achieved score is assumed to be the true score.

Interval Estimation: Interval estimation goes a step further and assumes that some level of error
has occurred in the achieved score, which is almost always the case. There are different methods
to determine error but perhaps the most commonly used is called the standard error of the mean.

Using a simple statistical formula, the amount of error is determined and the true score is said to
be the achieved score plus or minus the standard error of the mean.

Confidence Interval Estimation: The confidence interval estimation uses the same method as the
interval estimation but provides a level of confidence or certainty in the true score. Through
more complex statistics, a specific level of confidence in an interval can be determined. We
might say then, based on these statistics, that we are 95% confident that the true score lies
somewhere between 78 and 81. The more confident we are, the larger the interval.
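A minimal sketch of interval and confidence interval estimation in Python; the achieved scores
are hypothetical, and the 1.96 multiplier is the normal approximation for 95% confidence:

    import math
    import statistics

    scores = [78, 80, 79, 81, 77, 80, 82, 79]

    mean = statistics.mean(scores)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    sem = statistics.stdev(scores) / math.sqrt(len(scores))

    # Approximate 95% confidence interval for the true score
    print(mean - 1.96 * sem, mean + 1.96 * sem)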

Probability of Error
Since every score has some level of error, researchers must decide how much error they are
willing to accept prior to performing their research. This acceptable error is then compared with
the probability of error, and if the probability of error is less, the study is said to be significant.
The probability of error is often abbreviated with a lowercase 'p,' and the acceptable error is
abbreviated with a lowercase alpha (α). When we accept the null, then p > α, and when we reject
the null, then p ≤ α.
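The decision rule itself is a one-line comparison; a minimal sketch with hypothetical numbers:

    alpha = 0.05    # acceptable error, chosen before the research
    p_value = 0.03  # hypothetical probability of error from a statistical test

    if p_value <= alpha:
        print("Reject the null hypothesis: the result is significant")
    else:
        print("Fail to reject the null hypothesis")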

Decision Errors
Two types of errors can result from a hypothesis test.

Type I error: A Type I error occurs when the researcher rejects a null hypothesis when it is true.
The probability of committing a Type I error is called the significance level. This probability is
also called alpha, and is often denoted by α.

Type II error: A Type II error occurs when the researcher fails to reject a null hypothesis that is
false. The probability of committing a Type II error is called beta, and is often denoted by β. The
probability of not committing a Type II error is called the power of the test.

