
LOH YU HAN PQB 170005 15.12.2017

THE USE OF STATISTICS

1. T-test

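A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When that scaling term is unknown and is replaced by an estimate from the data, the test statistic (under certain conditions) follows a Student's t distribution instead. The t-test can be used to determine whether two sets of data are significantly different from each other.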

Among the most frequently used t-tests are:

• A one-sample location test of whether the mean of a population has a value specified in a null hypothesis.
• A two-sample location test of the null hypothesis that the means of two populations are equal.
• A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment.
• A test of whether the slope of a regression line differs significantly from 0.
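The first three tests above can be sketched in plain Python with only the standard library. This is a minimal illustration that computes just the t statistic and its degrees of freedom. Turning the statistic into a p-value needs the t distribution, for example from a t table or from SciPy's `scipy.stats.ttest_1samp`, `ttest_ind` and `ttest_rel`. The function names and the tumor sizes below are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic for H0: the population mean equals mu0 (df = n - 1)."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return (statistics.mean(sample) - mu0) / se, n - 1


def two_sample_t(x, y):
    """Student's two-sample t statistic, assuming equal population variances."""
    nx, ny = len(x), len(y)
    # Pooled variance: a weighted average of the two sample variances.
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2


def paired_t(before, after):
    """Paired t-test: a one-sample t-test of the differences against 0."""
    diffs = [a - b for a, b in zip(after, before)]
    return one_sample_t(diffs, 0.0)


# Hypothetical tumor sizes (cm) for five patients before and after treatment.
before = [5.0, 4.2, 6.1, 3.8, 5.5]
after = [4.1, 3.9, 5.0, 3.7, 4.6]
t, df = paired_t(before, after)
print(f"paired t = {t:.3f}, df = {df}")  # negative t: sizes tended to shrink
```

The paired test is written as a one-sample test on the differences. This mirrors the description of the third bullet: the null hypothesis is that the mean difference is zero.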

2. Mean test

The mean (or average) is the most popular measure of central tendency. It can be used with both discrete and continuous data, although it is most often used with continuous data. The mean is the sum of all the values in a data set divided by the number of values.
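Written as a formula, the mean is shown below, with $x_1, \dots, x_n$ denoting the $n$ data values (notation assumed here). The worked example uses made-up values.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\text{e.g. } \bar{x} = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5 .
```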

3. ANOVA

The ANOVA Test

An ANOVA (analysis of variance) test is a way to find out whether survey or experiment results are significant. In other words, it helps you decide whether to reject the null hypothesis in favor of the alternative hypothesis. Basically, you are testing groups to see whether there is a difference between their means. Examples of when you might want to test different groups:

• A group of psychiatric patients are trying three different therapies: counseling, medication and biofeedback. You want to see if one therapy is better than the others.
• A manufacturer has two different processes to make light bulbs. They want to know if one process is better than the other.

One-way or two-way refers to the number of independent variables (IVs) in your Analysis of Variance test.

One Way ANOVA

A one way ANOVA is used to compare the means of two or more independent (unrelated) groups using the F-distribution. With a One Way ANOVA, you have one independent variable affecting a dependent variable. The null hypothesis for the test is that all the group means are equal. Therefore, a significant result means that at least two of the means are unequal.
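The F statistic behind a one way ANOVA can be sketched with the standard library. It is the between-group mean square divided by the within-group mean square. The function name and the tea weight-loss numbers are hypothetical. A library such as SciPy (`scipy.stats.f_oneway`) would also return the p-value from the F-distribution.

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.mean(x for g in groups for x in g)
    means = [statistics.mean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical weight loss (kg) for the green tea / black tea / no tea example.
green = [2.1, 3.0, 2.6]
black = [1.8, 2.2, 1.9]
no_tea = [0.9, 1.4, 1.1]
f, df_b, df_w = one_way_anova_f(green, black, no_tea)
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

A large F means the group means are spread out more than the within-group noise would explain. The corresponding p-value comes from the F-distribution with (k - 1, N - k) degrees of freedom.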

When to use a one way ANOVA

Situation 1: You have a group of individuals randomly split into smaller groups and completing
different tasks. For example, you might be studying the effects of tea on weight loss and form
three groups: green tea, black tea, and no tea.

Situation 2: Similar to situation 1, but in this case the individuals are split into groups based on
an attribute they possess. For example, you might be studying leg strength of people
according to weight. You could split participants into weight categories (obese, overweight
and normal) and measure their leg strength on a weight machine.

Limitations of the One Way ANOVA

A one way ANOVA will tell you that at least two groups were different from each other, but it won't tell you which groups were different. If your test returns a significant F-statistic, you may need to run a post hoc test (like the Least Significant Difference test) to tell you exactly which groups had a difference in means.
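As a sketch of that follow-up step, Fisher's Least Significant Difference approach compares each pair of group means with a t statistic. That statistic uses the pooled within-group mean square (MSW) from the ANOVA. The function name `lsd_t` and the data are hypothetical. Comparing |t| with a critical t value for N - k degrees of freedom (from a t table, or e.g. `scipy.stats.t.ppf`) is left out.

```python
import math
import statistics


def lsd_t(groups, i, j):
    """Fisher's LSD t statistic comparing groups i and j (df = N - k)."""
    n_total = sum(len(g) for g in groups)
    df_within = n_total - len(groups)
    means = [statistics.mean(g) for g in groups]
    # Pooled within-group mean square, the same MSW used in the ANOVA F ratio.
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / df_within
    se = math.sqrt(msw * (1 / len(groups[i]) + 1 / len(groups[j])))
    return (means[i] - means[j]) / se, df_within


groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for i, j in [(0, 1), (0, 2), (1, 2)]:
    t, df = lsd_t(groups, i, j)
    print(f"group {i} vs group {j}: t = {t:.3f}, df = {df}")
```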

Two Way ANOVA

A Two Way ANOVA is an extension of the One Way ANOVA. With a Two Way ANOVA, there are two independent variables. Use a two way ANOVA when you have one measurement variable (i.e. a quantitative variable) and two nominal variables. In other words, if your experiment has a quantitative outcome and you have two categorical explanatory variables, a two way ANOVA is appropriate.

For example, you might want to find out if there is an interaction between income and gender for anxiety level at job interviews. The anxiety level is the outcome, or the variable that can be measured. Gender and income are the two categorical variables. These categorical variables are also the independent variables, which are called factors in a Two Way ANOVA.

The factors can be split into levels. In the above example, income level could be split into three levels: low, middle and high income. Gender could be split into three levels: male, female, and transgender. Treatment groups are all possible combinations of the factor levels. In this example there would be 3 x 3 = 9 treatment groups.
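The treatment groups are the Cartesian product of the factor levels. A short sketch with the levels from the example above (variable names are illustrative):

```python
from itertools import product

# Factor levels taken from the income / gender example.
income_levels = ["low", "middle", "high"]
gender_levels = ["male", "female", "transgender"]

# Every combination of one income level and one gender level is a treatment group.
treatment_groups = list(product(income_levels, gender_levels))
print(len(treatment_groups))  # 9
for group in treatment_groups:
    print(group)
```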

Assumptions for Two Way ANOVA

• The population must be close to a normal distribution.
• Samples must be independent.
• Population variances must be equal.
• Groups must have equal sample sizes.

4. Pearson / Spearman correlation

Pearson correlation

The Pearson correlation evaluates the linear relationship between two continuous variables. It is also referred to as Pearson's correlation or simply as the correlation coefficient.

A Pearson's correlation is used when there are two quantitative variables. The possible research hypotheses are that there is a positive linear relationship between the variables, a negative linear relationship between the variables, or no linear relationship between the variables.

Pearson's correlation coefficient (r) is a measure of the strength of the linear association between the two variables. The first step in studying the relationship between two continuous variables is to draw a scatter plot of the variables to check for linearity.

The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there is no linear association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one variable increases, so does the value of the other variable. A value less than 0 indicates a negative association; that is, as the value of one variable increases, the value of the other variable decreases.
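A minimal sketch computes r directly from its definition: the sum of cross-products of deviations, divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the sample values are hypothetical. On Python 3.10 or later, `statistics.correlation(x, y)` computes the same coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfect positive: 1.0
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # perfect negative: -1.0
```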

Spearman correlation

Whereas the Pearson correlation evaluates the linear relationship between two continuous variables, the Spearman correlation coefficient is based on the ranked values for each variable rather than the raw data. Spearman correlation is often used to evaluate relationships involving ordinal variables.

The Spearman's rank-order correlation is the nonparametric version of the Pearson product-moment correlation. Spearman's correlation coefficient (ρ, also signified by rₛ) measures the strength and direction of association between two ranked variables.

In summary, correlation coefficients are used to assess the strength and direction of the relationships between pairs of variables: linear relationships for Pearson's coefficient and monotonic (rank-based) relationships for Spearman's. When both variables are normally distributed, use Pearson's correlation coefficient; otherwise, use Spearman's correlation coefficient.
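Spearman's coefficient is Pearson's r applied to ranks. It can therefore be sketched by ranking each variable first, with tied values receiving the average of their positions, and then reusing the Pearson formula. The names and data are hypothetical. Library alternatives are `scipy.stats.spearmanr` and, on Python 3.12+, `statistics.correlation(x, y, method="ranked")`.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # monotonic but not linear
print(spearman_rho(x, y))  # 1.0
print(pearson_r(x, y))     # slightly below 1
```

The example shows the difference between the two coefficients. A perfectly monotonic but curved relationship gives ρ = 1, while Pearson's r falls short of 1 because the points do not lie on a straight line.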

5. Chi square

A chi-square test is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-squared distribution when the null hypothesis is true. One common form is the chi-square test for independence, which is applied when you have two categorical variables from a single population.

The chi-square statistic is also commonly used for testing relationships between categorical variables. The chi-squared test can be used to attempt rejection of the null hypothesis that the variables are independent. The null hypothesis of the chi-square test is that no relationship exists between the categorical variables in the population; that is, they are independent.
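The test for independence can be sketched on a contingency table of counts. Each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells, with (rows - 1)(columns - 1) degrees of freedom. The function name and counts are hypothetical. SciPy's `scipy.stats.chi2_contingency` also returns a p-value, and by default it applies Yates' continuity correction to 2 x 2 tables.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table of counts.
observed = [[10, 20],
            [30, 40]]
chi2, df = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, df = {df}")
```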

In addition, chi-square tests serve different purposes. The chi-square test for goodness of fit is used to decide whether there is any difference between the observed (experimental) values and the expected (theoretical) values.

The Chi-square test is intended to test how likely it is that an observed distribution is
due to chance. It is also called a "goodness of fit" statistic, because it measures how well the
observed distribution of data fits with the distribution that is expected if the variables are
independent.
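The goodness-of-fit version compares observed counts with expected counts category by category. The sketch assumes the expected counts are fully specified in advance, so df = categories - 1. Every parameter estimated from the data would cost one more degree of freedom. The die-roll data and the function name are hypothetical. SciPy's `scipy.stats.chisquare(f_obs, f_exp)` computes the same statistic with a p-value.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit chi-square statistic; df = number of categories - 1."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1


# Hypothetical: 60 rolls of a die, compared with 10 per face if the die is fair.
observed = [8, 12, 10, 9, 11, 10]
expected = [10] * 6
chi2, df = chi_square_gof(observed, expected)
print(f"chi-square = {chi2:.2f}, df = {df}")
```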

6. Normality test

A normality test is used to determine whether sample data have been drawn from a normally distributed population (within some tolerance). A number of statistical tests, such as the Student's t-test and the one-way and two-way ANOVA, require the sample to come from a normally distributed population.

For small sample sizes, normality tests have little power to reject the null hypothesis, and therefore small samples most often pass normality tests. Power, the ability to detect whether a sample comes from a non-normal distribution, is the most frequent measure of the value of a test for normality.

More precisely, the tests are a form of model selection, and can be interpreted several
ways, depending on one's interpretations of probability:

• In descriptive statistics terms, one measures a goodness of fit of a normal model to the data. If the fit is poor, then the data are not well modeled in that respect by a normal distribution, without making a judgment on any underlying variable.
• In frequentist statistical hypothesis testing, data are tested against the null hypothesis that they are normally distributed.
• In Bayesian statistics, one does not "test normality" per se, but rather computes the likelihood that the data come from a normal distribution with given parameters μ, σ (for all μ, σ), and compares that with the likelihood that the data come from other distributions under consideration. This is most simply done using a Bayes factor (giving the relative likelihood of seeing the data given different models), or more finely by taking a prior distribution on possible models and parameters and computing a posterior distribution given the computed likelihoods.
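The original does not name a particular normality test. As one stdlib-only illustration, the sketch below swaps in the Jarque-Bera test; the Shapiro-Wilk test, available as `scipy.stats.shapiro`, is another common choice. Jarque-Bera combines sample skewness and excess kurtosis into a statistic that is approximately chi-square with 2 degrees of freedom under the null hypothesis of normality, so the approximate p-value is exp(-JB/2). The approximation is poor for small samples, which fits the low-power caveat above. The function name and data are hypothetical.

```python
import math


def jarque_bera(data):
    """Return (JB statistic, approximate p-value) using the chi-square(2) limit."""
    n = len(data)
    mean = sum(data) / n
    # Central moments with divisor n.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurt ** 2 / 4)
    # Survival function of a chi-square distribution with 2 df is exp(-x / 2).
    return jb, math.exp(-jb / 2)


sample = [2.1, 2.4, 1.9, 2.8, 2.2, 2.5, 2.0, 2.6]  # hypothetical measurements
jb, p = jarque_bera(sample)
print(f"JB = {jb:.3f}, approximate p = {p:.3f}")
```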

7. Kurtosis

Kurtosis is a descriptor of the shape of a probability distribution and, just as for skewness,
there are different ways of quantifying it for a theoretical distribution and corresponding ways
of estimating it from a sample from a population. Depending on the particular measure of
kurtosis that is used, there are various interpretations of kurtosis, and of how particular
measures should be interpreted.

In statistics, kurtosis describes the shape of the probability distribution curve, and there are three main types: leptokurtic, a “positive” or tall and thin distribution (fatter tails); mesokurtic, a normal distribution; and platykurtic, a “negative” or flat and wide distribution (thinner tails).

The kurtosis of any univariate normal distribution is 3. It is common to compare the kurtosis of a distribution to this value; the difference (kurtosis minus 3) is called excess kurtosis, which is why leptokurtic and platykurtic distributions are described as “positive” and “negative”. Distributions with kurtosis less than 3 are said to be platykurtic, although this does not imply the distribution is "flat-topped" as sometimes reported. Rather, it means the distribution produces fewer and less extreme outliers than does the normal distribution.
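As the section notes, there are several ways to estimate kurtosis from a sample. The sketch uses one simple estimator, m4 / m2², with central moments computed using divisor n, and labels the result against the normal value of 3. Names and data are hypothetical. For comparison, SciPy's `scipy.stats.kurtosis` returns excess kurtosis (kurtosis minus 3) by default.

```python
def kurtosis(data):
    """Sample kurtosis m4 / m2**2 (the population value is 3 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2


def kurtosis_type(k):
    """Label a kurtosis value relative to the normal benchmark of 3."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"  # an exact 3 is rare for real samples


flat = [1, 2, 3, 4, 5]                       # evenly spread, no outliers
heavy = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]    # mostly central, two extreme values
for data in (flat, heavy):
    k = kurtosis(data)
    print(f"{k:.2f} {kurtosis_type(k)}")
```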

8. Skewness

Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point. Skewness can be quantified to define the extent to which a distribution differs from a normal distribution.

In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. The qualitative interpretation of the skew is complicated and unintuitive.

Skewness is a term in statistics used to describe asymmetry from the normal distribution in a set of statistical data. Skewness can be negative or positive, depending on whether the data are skewed to the left (negative) or to the right (positive) of the data average.
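A minimal sketch uses the moment-based estimator g1 = m3 / m2^1.5 (central moments with divisor n), one of the ways to quantify skewness. A long right tail gives a positive value and a long left tail gives a negative one. The function name and data are hypothetical. SciPy's `scipy.stats.skew` computes the same estimator with its default `bias=True`.

```python
def skewness(data):
    """Sample skewness g1 = m3 / m2**1.5 (0 for perfectly symmetric data)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


print(skewness([1, 2, 3, 4, 5]))         # 0.0: symmetric
print(skewness([1, 1, 1, 2, 10]))        # positive: long right tail
print(skewness([-10, -2, -1, -1, -1]))   # negative: long left tail
```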