
TITLE : Testing of Hypothesis

SUBJECT : Probability and Statistics


NAME OF THE STUDENT: Bobburi Nikhil Sai
ROLL NO : 18951A0593
GUIDE NAME : Mrs. B. Praveena
ABSTRACT:
This paper reviews methods for selecting the correct statistical test for research projects and other
investigations. Research is a scientific inquiry into a particular topic that proceeds through several steps, of
which formulating and testing hypotheses is an important one. Several tests are available for testing a
hypothesis, such as Student’s t-test, the F-test, the Chi-square test and ANOVA; the conditions and methods for
applying these tests are explained here. Only the correct use of these tests gives valid results in hypothesis testing.

The correlations and interactions among different biological entities make up a biological system.
Although interactions that have already been revealed contribute to the understanding of existing systems,
researchers face many questions every day regarding the inter-relationships among entities. Their queries have a
potential role in exploring new relations, which may open up new areas of investigation. In this paper, we
introduce a text-mining-based method for answering biological queries in terms of statistical computation, so
that researchers can arrive at new knowledge. It allows users to submit a query in natural language, which is
then treated as a hypothesis. The proposed approach analyzes the hypothesis and measures its p-value with
respect to the existing literature. Based on the measured value, the system either accepts or rejects the
hypothesis from a statistical point of view. Moreover, even if it does not find any direct relationship among the
entities of the hypothesis, it presents a network that gives an integrated overview of all the entities through
which they might be related. This helps researchers widen their view and think of new hypotheses for further
investigation. It gives researchers a quantitative evaluation of their assumptions so that they can reach a
logical conclusion, and thus aids relevant research in biological knowledge discovery. The system also provides
a graphical interactive interface through which researchers can conveniently submit their hypotheses for
assessment.

INTRODUCTION:
Testing statistical hypotheses is one of the most important areas of statistical analysis. In many situations,
researchers in the field of data analysis are interested in testing a hypothesis about a population parameter.
In traditional testing, the sample observations are crisp and a statistical test leads to a binary decision.
In real life, however, the data sometimes cannot be recorded precisely. Statistical hypothesis testing in
fuzzy environments has been studied by many authors. Arnold discussed fuzzy hypothesis testing with crisp
data. Neyman–Pearson-type hypothesis testing was proposed by Casals and Gil and by Son et al. Saade
considered binary hypothesis testing and discussed fuzzy likelihood functions in the decision-making
process. Casals and Gil considered Bayesian sequential tests for fuzzy parametric hypotheses from fuzzy
information. In the human sciences, Niskanen discussed the applications of soft statistical hypotheses. Wu
proposed statistical hypothesis testing for fuzzy data using the notions of degrees of optimism and pessimism.
Akbari and Rezaei investigated a bootstrap method for inference about the variance based on fuzzy data.

Viertl investigated some methods for constructing confidence intervals and statistical tests for fuzzy data. Wu
proposed some approaches for constructing fuzzy confidence intervals for an unknown fuzzy parameter. Arefi and
Taheri developed an approach for testing fuzzy hypotheses with a fuzzy test statistic for vague data. Fuzzy tests
for hypothesis testing with vague data were proposed by Grzegorzewski, Montenegro et al., Baloui Jamkhaneh
and Nadi Ghara, and Watanabe and Imaizumi. Chachi et al. introduced a new approach to testing statistical
hypotheses for fuzzy data that uses the relationship between confidence intervals and hypothesis testing.

In this paper, we propose a new statistical hypothesis testing procedure for population means when the
data of the two given samples are real intervals. We provide the decision rules used to accept or reject
the null and alternative hypotheses. In the proposed test, we split the given interval data into two different
sets of crisp data, namely upper-level data and lower-level data; we then find the test statistic values for the
two sets of crisp data and reach a decision about the population means on the basis of the decision rules. This
testing procedure does not use degrees of optimism and pessimism or the h-level set. A numerical example is
given to illustrate the proposed testing procedure. Further, we extend the proposed test to statistical
hypotheses with fuzzy data.
METHODOLOGY:
Null Hypothesis: A null hypothesis, denoted by H0, is a specific baseline statement to be tested; it usually
takes a form such as “no effect” or “no difference.” An alternative (research) hypothesis is the denial of the
null hypothesis [7]. The null hypothesis is typically stated in a form such as “There is no significant
difference between x and y.”

Alternative Hypothesis: An alternative hypothesis, denoted by H1 or Ha, is the hypothesis that the sample
observations are influenced by some non-random cause. Rejection of the null hypothesis leads to acceptance of
the alternative hypothesis. For example:

Null hypothesis: “x = y”

Alternative hypothesis: “x ≠ y” (two-tailed)

“x < y” (left-tailed, single-tailed)

“x > y” (right-tailed, single-tailed)

Numerical Steps in Testing of Hypothesis:

1) Establish the null hypothesis and the alternative hypothesis.

2) Set a suitable level of significance, e.g. 1%, 5% or 10%.

3) Determine a suitable test, such as t, Z, F, Chi-square or ANOVA.

4) Calculate the value of the test statistic using the chosen test.

5) Compare the calculated value with the table (critical) value.

6) Draw conclusions:

• If the calculated value < table value, the null hypothesis is accepted (that is, it is not rejected).

• If the calculated value > table value, the null hypothesis is rejected.

Important notations or symbols:

Parameter             Sample    Population
Mean                  x̄         μ
Standard deviation    s         σ

• If the sample size is n ≥ 30, apply large-sample tests.
• If the sample size is n < 30, apply small-sample tests.
• The form of the alternative hypothesis determines whether a single-tailed (left or right) or a two-tailed
test has to be used.
• A test whose alternative hypothesis is expressed with the symbol ‘>’ or ‘<’ is called a one-tailed test.
• In a one-tailed test, the entire critical region lies in one tail of the distribution.
• The critical region for an alternative hypothesis containing the symbol ‘>’ lies entirely in the right tail
of the distribution.
• The critical region for an alternative hypothesis containing the symbol ‘<’ lies entirely in the left tail
of the distribution.
• A test of any statistical hypothesis in which the alternative hypothesis is written with the symbol ‘≠’ is
called a two-tailed test.
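
The tail rules above can be illustrated with a minimal Python sketch that looks up the critical value(s) of the standard normal distribution for a large-sample Z test. The function names `z_critical` and `reject_null`, and the tail labels 'two', 'left' and 'right', are assumptions made for this illustration only.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Critical value(s) of the standard normal for a given tail type.

    `tail` is 'two', 'left' or 'right' (labels chosen for this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        c = std_normal.inv_cdf(1 - alpha / 2)
        return (-c, c)  # critical region is split between both tails
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)  # region lies in the right tail
    if tail == "left":
        return std_normal.inv_cdf(alpha)  # region lies in the left tail
    raise ValueError("tail must be 'two', 'left' or 'right'")


def reject_null(z, alpha, tail):
    """True when the calculated Z falls inside the critical region."""
    crit = z_critical(alpha, tail)
    if tail == "two":
        lower, upper = crit
        return z < lower or z > upper
    if tail == "right":
        return z > crit
    return z < crit


print(z_critical(0.05, "two"))          # roughly (-1.96, 1.96)
print(reject_null(2.10, 0.05, "two"))   # 2.10 lies beyond 1.96
print(reject_null(-1.50, 0.05, "left")) # -1.50 does not reach -1.645
```

At the 5% level the two-tailed critical values are about ±1.96, while a one-tailed test uses about 1.645 on the relevant side.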
Discussion & Result:
Testing of significance for single proportion:-

Applications: To find whether there is a significant difference between the sample proportion p and the
population proportion P.

$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}, \qquad Q = 1 - P$$
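
As a hedged illustration of the single-proportion formula, the sketch below computes Z for made-up data (220 successes in 400 trials, tested against P = 0.5). The function name `z_single_proportion` and the numbers are assumptions for the example.

```python
import math


def z_single_proportion(p_sample, p_pop, n):
    """Z = (p - P) / sqrt(PQ / n), with Q = 1 - P."""
    q_pop = 1 - p_pop
    standard_error = math.sqrt(p_pop * q_pop / n)
    return (p_sample - p_pop) / standard_error


# Hypothetical data: 220 successes out of n = 400, population proportion P = 0.5.
z = z_single_proportion(p_sample=220 / 400, p_pop=0.5, n=400)
print(round(z, 3))
```

Here Z is about 2.0, which exceeds the two-tailed 5% table value of 1.96, so the null hypothesis of no difference would be rejected for this invented sample.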
Testing of significance for difference of proportions:-

Applications: To find whether there is a significant difference between two sample proportions P1 and P2.

$$Z = \frac{P_1 - P_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

$$P = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2}, \qquad Q = 1 - P$$
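
The pooled-proportion formula can be sketched in Python as follows. The function name `z_two_proportions` and the sample figures are hypothetical and serve only to show the arithmetic.

```python
import math


def z_two_proportions(p1, n1, p2, n2):
    """Z for the difference of two sample proportions using the pooled P."""
    p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)  # P = (n1 P1 + n2 P2) / (n1 + n2)
    q_pooled = 1 - p_pooled                      # Q = 1 - P
    standard_error = math.sqrt(p_pooled * q_pooled * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Hypothetical samples: 30% of 200 items versus 25% of 300 items.
z = z_two_proportions(p1=0.30, n1=200, p2=0.25, n2=300)
print(round(z, 3))
```

For these invented figures Z is roughly 1.23, which is below 1.96, so the difference would not be significant at the 5% level in a two-tailed test.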
Testing of significance for single mean:-

Applications: To find whether there is a significant difference between the sample mean and the population mean.

$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$ when the population S.D. is known

$$Z = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$ when the population S.D. is not known
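
A minimal sketch of the single-mean Z statistic is given below. The function name `z_single_mean` and the numerical values are assumptions; the `sd` argument stands for σ when the population S.D. is known and for s otherwise.

```python
import math


def z_single_mean(sample_mean, pop_mean, sd, n):
    """Z = (x_bar - mu) / (sd / sqrt(n)) for a large sample."""
    return (sample_mean - pop_mean) / (sd / math.sqrt(n))


# Hypothetical data: sample mean 52 from n = 100 items, mu = 50, S.D. = 10.
z = z_single_mean(sample_mean=52, pop_mean=50, sd=10, n=100)
print(z)
```

With these invented numbers Z equals 2.0, which is larger than 1.96, so the sample mean would differ significantly from 50 at the 5% level (two-tailed).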
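Testing of significance for difference of means:-

Applications: To find whether there is a significant difference between two sample means $\bar{x}_1$ and $\bar{x}_2$.

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$ when the population S.D. is known

$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$ when the population S.D. is not known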
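
The difference-of-means statistic can be sketched as below. The function name `z_two_means` and the data are hypothetical; the `sd1` and `sd2` arguments stand for the population S.D.s when they are known and for the sample S.D.s otherwise.

```python
import math


def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Z = (x1_bar - x2_bar) / sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical samples: means 67.5 and 68.0, common S.D. 2.5, sizes 1000 and 2000.
z = z_two_means(mean1=67.5, sd1=2.5, n1=1000, mean2=68.0, sd2=2.5, n2=2000)
print(round(z, 3))
```

For these invented samples |Z| is about 5.16, far beyond 1.96, so the two means would differ significantly at the 5% level.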

Testing of significance for difference of Standard Deviations:-

Applications: To find whether there is a significant difference between two sample S.D.s $s_1$ and $s_2$.

$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{\sigma_1^2}{2 n_1} + \dfrac{\sigma_2^2}{2 n_2}}}$$ when the population S.D. is known

$$Z = \frac{s_1 - s_2}{\sqrt{\dfrac{s_1^2}{2 n_1} + \dfrac{s_2^2}{2 n_2}}}$$ when the population S.D. is not known
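
A short sketch of the statistic for the difference of standard deviations follows, using the case where the population S.D.s are not known. The function name `z_two_sds` and the sample values are assumptions for illustration.

```python
import math


def z_two_sds(s1, n1, s2, n2):
    """Z = (s1 - s2) / sqrt(s1^2 / (2 n1) + s2^2 / (2 n2))."""
    standard_error = math.sqrt(s1 ** 2 / (2 * n1) + s2 ** 2 / (2 * n2))
    return (s1 - s2) / standard_error


# Hypothetical samples: S.D. 6 from 200 items and S.D. 5 from 250 items.
z = z_two_sds(s1=6, n1=200, s2=5, n2=250)
print(round(z, 3))
```

With these invented values Z is about 2.67, which exceeds 1.96, so the two standard deviations would differ significantly at the 5% level.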

Chi Square Test:

It is an important test of significance and was developed by Karl Pearson in 1900. It is based on frequencies
and not on parameters such as the mean or S.D.

Applications: The Chi-square test is used to compare observed and expected frequencies objectively. It can be
used (i) as a test of goodness of fit and (ii) as a test of independence of attributes.

Conditions for applying the χ2 test:-

(i) The total number of items N must be at least 50.

(ii) No expected cell frequency should be smaller than 10. If this problem occurs, it is overcome by grouping
two or more classes before calculating (O-E).

(a) Chi Square Test as a test of goodness of fit:-

The Chi-square test enables us to see how well an assumed theoretical distribution (such as the Binomial,
Poisson or Normal distribution) fits the observed data.

Formula: $$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

$O_{ij}$ = observed frequency of the cell in the i-th row and j-th column.

$E_{ij}$ = expected frequency of the cell in the i-th row and j-th column.

Degrees of freedom =

n-1 (for the Binomial distribution)

n-2 (for the Poisson distribution)

n-3 (for the Normal distribution)

where n = total number of terms in the series.
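
The goodness-of-fit statistic can be illustrated with a small sketch. The die-rolling data, the function name `chi_square_gof` and the choice of a uniform expected distribution are assumptions made for the example; with a fully specified distribution of n = 6 classes, n - 1 = 5 degrees of freedom are used.

```python
def chi_square_gof(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: a die rolled 120 times; a fair die gives 20 per face.
observed = [25, 17, 15, 23, 24, 16]
expected = [20] * 6

chi2 = chi_square_gof(observed, expected)
degrees_of_freedom = len(observed) - 1
table_value = 11.070  # chi-square table value for 5 d.f. at the 5% level

print(round(chi2, 3), degrees_of_freedom)
print("reject H0" if chi2 > table_value else "accept H0")
```

In this invented example the calculated value is about 5.0, below the table value of 11.070, so the hypothesis that the die is fair would be accepted.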

(b) Chi Square Test as a test of independence of attributes:-

The χ2 test enables us to determine whether two attributes are associated or not.

e.g.: It may help in finding whether a new drug is effective in curing a disease or not.

Formula: $$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency and E is the expected frequency under the null hypothesis, computed by

$$E = \frac{\text{row total} \times \text{column total}}{\text{sample size}}$$

Here, degrees of freedom = (r-1)(s-1), where r is the number of rows and s is the number of columns.
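
A minimal sketch of the test of independence is shown below for a made-up 2 × 2 drug trial table. The function name `chi_square_independence` and the counts are hypothetical; expected frequencies are computed from the row and column totals exactly as in the formula for E.

```python
def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x s frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    sample_size = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = row total * column total / sample size
            expected = row_totals[i] * col_totals[j] / sample_size
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(s - 1)
    return chi2, df


# Hypothetical trial: rows = drug given / not given, columns = cured / not cured.
table = [[60, 40],
         [45, 55]]

chi2, df = chi_square_independence(table)
table_value = 3.841  # chi-square table value for 1 d.f. at the 5% level

print(round(chi2, 3), df)
print("attributes associated" if chi2 > table_value else "attributes independent")
```

For these invented counts the calculated value is about 4.51 with 1 degree of freedom, which exceeds 3.841, so the hypothesis of independence would be rejected at the 5% level.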

(c) Difference between the test of goodness of fit and the test of independence of attributes:

For the goodness-of-fit test, a theoretical relationship is used to calculate the expected frequencies. For
the test of independence, only the observed frequencies are used to calculate the expected frequencies.

Student’s t-test:

The t statistic was developed by William S. Gosset and was published under the pseudonym Student.

Applications: The t-test is used to test the significance of a sample mean, of the difference of two sample
means, or of two related sample means, in the case of small samples when the population variance is unknown.

(a) t-test for the mean of a random sample:-

Applications:- It is used to test whether the mean of a sample deviates significantly from a stated value when
the variance of the population is unknown.

Formula: $$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$ when the S.D. is given

When the S.D. is not given, find s by using the formula $$s^2 = \frac{1}{n - 1} \sum (x_i - \bar{x})^2$$

Degrees of freedom = n-1
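
The one-sample t statistic can be sketched with Python's standard library, where `statistics.stdev` uses the same n - 1 divisor as the formula for s. The function name `one_sample_t`, the ten measurements and the stated mean of 66 are assumptions for the example.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return (t, degrees of freedom) for a small sample and stated mean mu."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample S.D. with divisor n - 1
    t = (x_bar - mu) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical sample of 10 measurements, tested against mu = 66.
sample = [63, 63, 66, 67, 68, 69, 70, 70, 71, 71]
t, df = one_sample_t(sample, mu=66)
table_value = 2.262  # two-tailed t table value for 9 d.f. at the 5% level

print(round(t, 3), df)
print("reject H0" if abs(t) > table_value else "accept H0")
```

For this invented sample t is about 1.89 with 9 degrees of freedom, below the table value of 2.262, so the null hypothesis would be accepted.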

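(b) t-test for difference of means of two samples:-

Applications: It is used to compare the means of two samples of sizes n1 and n2 when the population variances
are equal.

Formula: $$t = \frac{\bar{x} - \bar{y}}{S \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$ when the S.D.s of the two samples are known

and $$S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2 - 2}$$

In this form of the pooled variance, $S_1^2$ and $S_2^2$ are the sample variances computed with divisors $n_1$ and $n_2$.

Degrees of freedom = n1+n2-2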
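
The pooled two-sample t statistic can be sketched as follows. The function name `pooled_t` and the two small samples are hypothetical. The sketch assumes that the sample variances in the pooled formula use divisors n1 and n2, which is why `statistics.pvariance` is used here.

```python
import math
import statistics


def pooled_t(x, y):
    """Return (t, degrees of freedom) assuming equal population variances."""
    n1, n2 = len(x), len(y)
    # S^2 = (n1 * S1^2 + n2 * S2^2) / (n1 + n2 - 2), with S1^2, S2^2 using divisor n
    pooled_var = (n1 * statistics.pvariance(x) + n2 * statistics.pvariance(y)) / (n1 + n2 - 2)
    s = math.sqrt(pooled_var)
    t = (statistics.mean(x) - statistics.mean(y)) / (s * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical small samples.
x = [10, 12, 13, 11, 14]
y = [8, 9, 12, 14, 15, 10, 9]

t, df = pooled_t(x, y)
table_value = 2.228  # two-tailed t table value for 10 d.f. at the 5% level

print(round(t, 3), df)
print("reject H0" if abs(t) > table_value else "accept H0")
```

For these invented samples t is about 0.735 with 10 degrees of freedom, well below 2.228, so the difference between the two means would not be significant.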

CONCLUSION:
In modern manufacturing plants, people still seldom attach importance to hypothesis testing, which they believe
is merely a matter of theory. However, the application of hypothesis testing in quality management should be
promoted. Both parametric tests (t-test and z-test) and nonparametric tests (sign test and Wilcoxon rank-sum
test) are appropriate for use in a manufacturing environment.

Data collection establishes the foundation for appraising the quality of a product or service. Without correct
data processing, however, it becomes difficult to draw an objective conclusion, and observations are sometimes
wrongly interpreted.

For instance, suppose that the fallout rates of samples drawn from two different groups are 15% and 10%,
respectively. Concluding from these figures alone that one group is better than the other would be a hasty
judgment. In such cases, hypothesis testing is instrumental in explaining the observed phenomena. Unfortunately,
in many manufacturing facilities people tend to focus only on descriptive statistics such as the arithmetic
mean and the range. Simply put, hypothesis testing is indispensable for understanding quality data better and
for providing guidance to production control.
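
The fallout-rate example can be made concrete with a short sketch. The sample sizes of 100 and 1000 per group are assumptions, since none are stated; the function name `z_fallout` is also hypothetical. The sketch applies the Z test for the difference of two proportions with a pooled proportion.

```python
import math


def z_fallout(rate1, rate2, n_per_group):
    """Z for two fallout rates observed on equally sized groups (pooled P)."""
    p_pooled = (rate1 + rate2) / 2  # equal group sizes, so the pooled P is the average
    q_pooled = 1 - p_pooled
    standard_error = math.sqrt(p_pooled * q_pooled * (2 / n_per_group))
    return (rate1 - rate2) / standard_error


# Same 15% versus 10% fallout rates, two assumed sample sizes per group.
z_small = z_fallout(0.15, 0.10, n_per_group=100)
z_large = z_fallout(0.15, 0.10, n_per_group=1000)

for label, z in (("n = 100", z_small), ("n = 1000", z_large)):
    verdict = "significant" if abs(z) > 1.96 else "not significant"
    print(label, round(z, 3), verdict)
```

Under these assumptions the same 5-point gap gives Z of about 1.07 with 100 items per group (not significant at the 5% level) and about 3.38 with 1000 items per group (significant), which is why the raw rates alone do not settle the question.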

A confidence interval is used to estimate the population mean because the sample mean is generally not equal
to the population mean. For spot checks in the manufacturing process, there are two risks of drawing a wrong
conclusion: Type I and Type II risks. Type I risk (α) is the probability of rejecting qualified products
(producer's risk); Type II risk (β) is the probability of accepting nonconforming products (customer's risk).

With the correct use of the tests discussed above, valid results can be obtained. Care should therefore be taken
when selecting hypothesis tests for large and small samples; otherwise, the results will be invalid. That is why
selecting the correct statistical test is so important.

BIBLIOGRAPHY:
[1] https://www.ijsr.net/archive/v4i5/SUB153997.pdf

[2] https://www.researchgate.net/publication/321003737_HYPOTHESIS_TESTING

[3] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.416.4895&rep=rep1&type=pdf
[4] Zibran, Minhaz Fahim: CHI-Squared Test of Independence, Department of Computer Science, University of
Calgary, Alberta, Canada.

[5] Bhattacharya, Dipak Kumar: Research Methodology, New Delhi, Excel Books.

[6] Panneerselvam, R., 2014: Research Methodology, New Delhi, Prentice Hall of India Pvt. Ltd.

[7] Bali N.P., Gupta P.N., Gandhi C.P., 2008: Quantitative Techniques, New Delhi, University Science Press.

[8] Cochran W.G., 1963: Sampling Techniques, New York, John Wiley & Sons.

[9] Chance, William A., 1975: Statistical Methods for Decision Making, Bombay, D.B. Taraporevala Sons & Co.
Pvt. Ltd.

[10] Chaturvedi, J.C., 1953: Mathematical Statistics, Agra, Nok Jhonk Karyalaya.
