3. Take the absolute value of each deviation from the mean
4. Total the absolute values of the deviations from the mean
5. Divide the total by the sample size
Formula
$\text{Mean absolute deviation} = \dfrac{\sum |x - \bar{x}|}{n}$
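The procedure can be sketched in a few lines of Python. This is only an illustration: the function name and the sample values are hypothetical and do not come from the original notes.

```python
def mean_absolute_deviation(data):
    """Average distance of the values from their mean."""
    mean = sum(data) / len(data)               # find the mean
    abs_devs = [abs(x - mean) for x in data]   # absolute deviation of each value
    return sum(abs_devs) / len(data)           # total, divided by the sample size

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(mean_absolute_deviation(sample))  # 1.5
```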
Variation
The variation is the sum of the squares of the deviations from the mean.
Its units are the square of the units of the original data, and it does
not take the sample size into account.
Procedure for finding
1. Find the mean of the data
2. Subtract the mean from each value to find the deviation from the mean
3. Square each deviation from the mean
4. Total the squares of the deviations from the mean
Formula
$\text{Variation} = \sum (x - \bar{x})^2$
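As a rough illustration, the four steps translate directly into code. The function name and data are hypothetical.

```python
def variation(data):
    """Sum of squared deviations from the mean."""
    mean = sum(data) / len(data)                  # step 1: mean
    deviations = [x - mean for x in data]         # step 2: deviations
    squares = [d ** 2 for d in deviations]        # step 3: squares
    return sum(squares)                           # step 4: total

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(variation(sample))  # 32.0
```

Doubling the sample by repeating every value doubles the result, which is what "does not take the sample size into account" means in practice.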
Standard Deviation
The standard deviation is, loosely speaking, the typical deviation from
the mean. It is found by taking the square root of the variance (for a
sample, the variation divided by $n - 1$), which solves the problem of
not having the same units as the original data. The sample standard
deviation is denoted by $s$. It is not an unbiased estimator of the
population standard deviation.
Procedure for finding
1. Find the variance
2. Take the square root
Formula
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
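A minimal sketch of the two steps, assuming the sample variance divides by n - 1. The function name and data are hypothetical; the result is cross-checked against `statistics.stdev` from the Python standard library.

```python
import math
import statistics

def sample_std_dev(data):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / (n - 1)  # step 1
    return math.sqrt(variance)                               # step 2

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
s = sample_std_dev(sample)
print(round(s, 4))                                  # 2.1381
print(math.isclose(s, statistics.stdev(sample)))    # True
```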
The range rule of thumb says that the range is approximately four times
the standard deviation. Equivalently, the standard deviation is
approximately one-fourth of the range. The rule rests on the fact that
most of the data lie within two standard deviations of the mean, so the
range spans roughly four standard deviations.
Procedure for finding
1. Find the range
2. Divide it by four
Formula
$s \approx \dfrac{\text{range}}{4}$
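The rule gives only a rough estimate, which a short comparison makes visible. The function name and data are hypothetical.

```python
import statistics

def range_rule_estimate(data):
    """Rough standard deviation estimate: range divided by four."""
    return (max(data) - min(data)) / 4

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
print(range_rule_estimate(sample))            # 1.75
print(round(statistics.stdev(sample), 4))     # 2.1381 (actual sample value)
```

For this small sample the estimate is in the right neighbourhood but not exact, which is all the rule of thumb promises.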
Coefficient of Variation
The coefficient of variation is expressed as a percent and describes the
standard deviation relative to the mean. It can be used to compare
variability when the units are different (the units will divide out,
providing just a raw number).
Procedure for finding
1. Find the mean and standard deviation for the data
2. Divide the standard deviation by the mean
3. Multiply by 100
Formula
$CV = \dfrac{s}{\bar{x}} \times 100\%$
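The sketch below shows how the coefficient of variation lets two data sets with different units be compared. The height and weight figures are invented for illustration only.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation relative to the mean, as a percent."""
    return statistics.stdev(data) / statistics.mean(data) * 100

# Hypothetical measurements in different units.
heights_cm = [150, 160, 165, 170, 180]
weights_kg = [50, 62, 70, 81, 95]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

# The units cancel, so the two percentages can be compared directly.
print(f"height CV: {cv_height:.1f}%  weight CV: {cv_weight:.1f}%")
print("weights vary more" if cv_weight > cv_height else "heights vary more")
```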
                                              10    Total
Boys      432    379    501    410    420    418    2,560
Girls     408    513    412    436    461    500    2,730
Totals    840    892    913    846    881    918    5,290
A listing of all the values the random variable can assume, together with
their corresponding probabilities, makes a probability distribution.
A note about random variables: calling a variable random does not mean that
its values can be anything (a random number). Random variables have a
well-defined set of outcomes and well-defined probabilities for the occurrence
of each outcome. The word "random" refers to the fact that the outcomes happen
by chance; that is, you don't know which outcome will occur next.
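To make the idea concrete, here is a small sketch that builds the probability distribution of a random variable. The example (number of heads in two tosses of a fair coin) is chosen for illustration and is not taken from the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of tossing a fair coin twice.
outcomes = list(product("HT", repeat=2))

# Random variable X = number of heads in the two tosses.
counts = Counter(toss.count("H") for toss in outcomes)

# Each value of X paired with its probability.
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)          # 0 1/4, then 1 1/2, then 2 1/4

print(sum(distribution.values()))  # 1
```

The set of outcomes (0, 1, 2) and their probabilities are fixed in advance; only the result of the next pair of tosses is unknown.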
Categories of Sampling
There are two major categories of sampling:
1. Random or Probability Sampling: Random sampling is also called
probability sampling because the laws of probability can be applied to it.
Note that the term 'random sample' does not describe the data in a
sample; it describes the process used to select the sample from a
population. Random sampling does not depend on detailed information
about the universe (the population) being available. It also yields
unbiased data, and it lets us measure the relative efficiency of
different sample designs.
The limitations of this type of sampling cannot be ignored either. It
requires a high level of skill, planning the actual sampling process
takes a lot of time, and the cost of carrying it out is very high.
2. Non-Random or Judgment Sampling: This is a process of sample
selection in which we do not use random methods. A non-random sample
is selected on the basis of judgment or convenience, not on the basis
of probability considerations. As a result, the pattern of sampling
variability in the process cannot be known.
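The contrast between the two categories can be sketched as follows. The population of 100 numbered units, the sample size, and the seed are all hypothetical choices for illustration.

```python
import random

# Hypothetical universe: 100 units numbered 1 to 100.
population = list(range(1, 101))

# Random (probability) sampling: every unit has the same chance of selection.
rng = random.Random(42)            # fixed seed only so the run is repeatable
random_sample = rng.sample(population, k=10)

# Non-random (convenience) sampling: take whatever is easiest to reach,
# here simply the first ten units on the list.
judgment_sample = population[:10]

print(sorted(random_sample))
print(judgment_sample)
```

Rerunning the random selection with different seeds shows how much samples vary from draw to draw, which is the information a judgment sample cannot provide.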
Q.7. What is hypothesis testing? State its technique using the chi-square
test.
Ans: Hypothesis: Formulation, Types and Testing
In hypothesis testing, we must state the hypothesized value of the
population parameter before we begin sampling. The hypothesis we
wish to test is called the null hypothesis and is denoted $H_0$. Example:
if we want to test the hypothesis that the population mean is equal to
600, we can write it as $H_0: \mu = 600$ and read it as, "The null
hypothesis is that the population mean is equal to 600."
Hypothesis Testing: Test of Means
1. By One-Tailed Test: Take the example of a drug that a hospital uses
frequently. The individual dose of this drug is 125 cc. An excessive dose
does the body no harm, but an insufficient dose does not give doctors
the medical effect they need. The hospital has been purchasing the same
drug from the same manufacturer for many years, and the population
standard deviation is 4 cc. The hospital inspects 50 doses of this drug
drawn at random from a very large consignment and calculates the mean
of these doses to be 99.5 cc. The data in this case are:
$\mu_{H_0} = 125$ (hypothesized value of the population mean)
$\sigma = 4$ (population standard deviation)
$n = 50$ (sample size)
$\bar{x} = 99.5$ (sample mean)
The hospital sets a 0.10 significance level. We have to find out "whether
the dosages in this consignment are too small."
To find the answer, we can state the problem as follows:
$H_0: \mu = 125$ (null hypothesis)
$H_1: \mu < 125$ (alternative hypothesis)
$\alpha = 0.10$ (level of significance for testing this hypothesis)
Here, we calculate the standard error of the mean (the population size
is assumed to be infinite):
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4}{\sqrt{50}} \approx 0.566$ cc
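The standard error for these figures can be checked with a couple of lines. The variable names are illustrative; the values are the ones quoted in the example.

```python
from math import sqrt

sigma = 4.0   # population standard deviation (cc)
n = 50        # sample size

# Standard error of the mean for an effectively infinite population.
std_error = sigma / sqrt(n)
print(round(std_error, 4))  # 0.5657
```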
The hospital wants to know whether the actual dosage is 125 cc or
whether it is too small. The dosage must exceed a certain amount;
otherwise the hospital should reject the consignment.
This is a one-tailed test, and the shaded portion of Fig. 8.9 represents
the 0.10 significance level. The acceptance region includes 40 per cent
of the area on the left side of the mean and 50 per cent of the area on
the right side of the mean. The non-acceptance region, shown by the
shaded portion, has an area of 10 per cent.
As we know the population standard deviation and n is larger than 30,
we can use the normal distribution. The appropriate z value for 40 per
cent of the area under the curve is 1.28. Using this information, we can
calculate the acceptance region's lower limit:
$\mu_{H_0} - 1.28\,\sigma_{\bar{x}} = 125 - 1.28(0.5657) = 125 - 0.7241 = 124.276$ cc (lower limit)
The observed sample mean of 99.5 cc lies far below this lower limit, so
it falls in the non-acceptance region. With the figures as stated, the
hospital should therefore reject the null hypothesis, because the
difference between the hypothesized mean of 125 cc and the sample mean
is significant. On the basis of this sample of 50 doses, the hospital
should reject the consignment.
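The whole one-tailed test can be sketched as below, using the figures quoted in the example. The variable names are illustrative, and the critical value is taken from `statistics.NormalDist` rather than a printed table, so it is about -1.2816 instead of the rounded 1.28.

```python
from math import sqrt
from statistics import NormalDist

mu_h0 = 125.0   # hypothesized population mean (cc)
sigma = 4.0     # population standard deviation (cc)
n = 50          # sample size
x_bar = 99.5    # sample mean (cc), as quoted in the example
alpha = 0.10    # significance level, lower tail

std_error = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(alpha)        # about -1.28
lower_limit = mu_h0 + z_crit * std_error    # edge of the acceptance region

reject_h0 = x_bar < lower_limit
print(round(lower_limit, 3))
print("reject H0" if reject_h0 else "do not reject H0")  # reject H0
```

With a sample mean of 99.5 cc, the observation sits well below the lower limit of roughly 124.28 cc, so the sketch flags the null hypothesis for rejection.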
2. By Two-Tailed Test: An engineering firm supplies water pumps to a
hotel. These pumps must have a pumping capacity of 40,000 gallons per
X:    …    10    15    20    25    30
Y:    10   13    18    17    21    29
Thus, from the above example it is clear that the ratio of change
between the two variables is not constant. If we plot these values on
a graph, the points will not fall on a straight line.
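A quick check of the ratio of change, using the Y values from the example. One assumption is needed: the first X value is missing from the source, so the sketch assumes X rises in equal steps of 5, as the surviving X values do.

```python
y = [10, 13, 18, 17, 21, 29]   # Y values from the example
x_step = 5                     # ASSUMPTION: X increases by 5 at every step

# Change in Y per unit change in X between consecutive observations.
ratios = [(b - a) / x_step for a, b in zip(y, y[1:])]
print(ratios)  # [0.6, 1.0, -0.2, 0.8, 1.6]

# A linear relationship would give the same ratio every time.
is_linear = len(set(ratios)) == 1
print(is_linear)  # False
```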
C. Number of Variables
According to the number of variables, correlation is of the following
three types, viz.:
(i) Simple Correlation.
(ii) Partial Correlation.
(iii) Multiple Correlation.
(i) Simple Correlation:
In simple correlation, we study the relationship between two variables.
Of these two variables, one is principal and the other is secondary,
for instance income and expenditure, or price and demand. Here
income and price are the principal variables, while expenditure and
demand are the secondary variables.
(ii) Partial Correlation:
If a given problem involves more than two variables and we study the
relationship between only two of them while keeping the other variables
constant, the correlation is said to be partial. It is so called
because the effect of the other variables is assumed to be constant.
(iii) Multiple Correlation:
Under multiple correlation, the relationship among three or more
variables is studied jointly, for instance the relationship of rainfall,
fertilizer use and manure to the per-hectare productivity of a maize crop.
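The difference between simple and partial correlation can be illustrated with a short sketch. The rainfall, fertilizer, and yield figures are invented for illustration. The partial correlation uses the standard first-order formula, which removes the linear effect of one other variable.

```python
from math import sqrt

def pearson(x, y):
    """Simple (Pearson) correlation between two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y with the linear effect of z held constant."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical observations for six plots of maize.
rainfall = [60, 75, 80, 95, 100, 120]        # mm
fertilizer = [20, 25, 22, 30, 28, 35]        # kg per hectare
maize_yield = [2.1, 2.6, 2.5, 3.2, 3.1, 3.9] # tonnes per hectare

print(round(pearson(fertilizer, maize_yield), 3))                 # simple
print(round(partial_corr(fertilizer, maize_yield, rainfall), 3))  # partial
```

The simple coefficient mixes the effect of fertilizer with that of rainfall, while the partial coefficient reports what remains once rainfall is held constant.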