where $O_i$ is the observed frequency for bin i and $E_i$ is the expected frequency for bin i. The expected frequency is calculated by

$$E_i = N \left( F(Y_u) - F(Y_l) \right)$$

where F is the cumulative distribution function for the distribution being tested, $Y_u$ is the upper limit for class i, $Y_l$ is the lower limit for class i, and N is the sample size.
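As a minimal sketch of the expected-frequency calculation, the Python below evaluates N(F(Y_u) - F(Y_l)) for each bin and then sums (O_i - E_i)^2 / E_i. The bin edges, the observed counts, the standard normal as the tested distribution, and the function names are all illustrative assumptions, not values from this section. The statistic is written in the usual Pearson form, which is assumed to match the definition given earlier.

```python
from statistics import NormalDist


def expected_frequencies(edges, cdf, n):
    """Return E_i = N * (F(Y_u) - F(Y_l)) for each pair of adjacent bin edges."""
    return [n * (cdf(upper) - cdf(lower)) for lower, upper in zip(edges, edges[1:])]


def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all bins (Pearson's form, assumed here)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical setup: four bins with open-ended tails and made-up counts.
edges = [float("-inf"), -1.0, 0.0, 1.0, float("inf")]
observed = [18, 30, 36, 16]
n = sum(observed)  # N = 100

# F is taken to be the standard normal CDF purely for illustration.
expected = expected_frequencies(edges, NormalDist(mu=0.0, sigma=1.0).cdf, n)
statistic = chi_square_statistic(observed, expected)

for (lower, upper), o, e in zip(zip(edges, edges[1:]), observed, expected):
    print(f"[{lower:5.1f}, {upper:5.1f})  O = {o:3d}  E = {e:7.3f}")
print(f"chi-square statistic = {statistic:.3f}")
```

Because the outermost bins are open-ended, the expected frequencies add up to N, which is a quick check on the bin definitions.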
This test is sensitive to the choice of bins. There is no optimal choice for the bin width, since the optimal bin width depends on the distribution. Most reasonable choices should produce similar, but not identical, results. For the chi-square approximation to be valid, the expected frequency in each bin should be at least 5. This test is not valid for small samples, and if some of the counts are less than five, you may need to combine some bins in the tails.

Significance Level: $\alpha$

Critical Region: The test statistic follows, approximately, a chi-square distribution with (k - c) degrees of freedom, where k is the number of non-empty cells and c is the number of estimated parameters (including location, scale, and shape parameters) for the distribution, plus 1. For example, for a 3-parameter Weibull distribution, c = 4. Therefore, the hypothesis that the data are from a population with the specified distribution is rejected if

$$\chi^2 > \chi^2_{1-\alpha,\,k-c}$$
where $\chi^2_{1-\alpha,\,k-c}$ is the chi-square critical value with k - c degrees of freedom and significance level $\alpha$.

Chi-Square Test Example

We generated 1,000 random numbers for normal, double exponential, t with 3 degrees of freedom, and lognormal distributions. In all cases, a chi-square test with k = 32 bins was applied to test for normally distributed data. Because the normal distribution has two parameters, c = 2 + 1 = 3.

The normal random numbers were stored in the variable Y1, the double exponential random numbers in Y2, the t random numbers in Y3, and the lognormal random numbers in Y4.
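For readers who want comparable data sets, here is one possible way to draw the four samples using only Python's standard library. The seed, the sample size of 1,000 per distribution, and the location, scale, and shape values (zero location and unit scale throughout) are assumptions made for illustration. The source does not state the parameters it used, so results will not match its figures.

```python
import math
import random

rng = random.Random(12345)  # arbitrary seed, assumed for reproducibility
N = 1000                    # assumed sample size per distribution


def student_t(df):
    """Draw t = Z / sqrt(V / df), with V a sum of df squared standard normals."""
    z = rng.gauss(0.0, 1.0)
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return z / math.sqrt(v / df)


# Y1: standard normal.
y1 = [rng.gauss(0.0, 1.0) for _ in range(N)]
# Y2: double exponential (Laplace), as the difference of two unit exponentials.
y2 = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(N)]
# Y3: Student's t with 3 degrees of freedom.
y3 = [student_t(3) for _ in range(N)]
# Y4: lognormal whose underlying normal has mean 0 and standard deviation 1.
y4 = [rng.lognormvariate(0.0, 1.0) for _ in range(N)]
```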
H0: The data are normally distributed.
Ha: The data are not normally distributed.

Y1
Test statistic: $\chi^2 = 32.256$
Significance level: $\alpha = 0.05$
Degrees of freedom: k - c = 32 - 3 = 29
Critical value: $\chi^2_{1-\alpha,\,k-c} = 42.557$
Critical region: Reject H0 if $\chi^2 > 42.557$
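The Y1 decision can be checked numerically. The sketch below is not the method used to produce the figures above. It computes the chi-square CDF from the series expansion of the regularized lower incomplete gamma function, inverts it by bisection to obtain the critical value, and then applies the rejection rule. The helper names `chi2_cdf` and `chi2_quantile` are hypothetical, and the upper end of the search bracket is a heuristic. The test statistic 32.256 is taken from the example.

```python
import math


def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 1
    while term > total * 1e-15:
        term *= half_x / (a + n)
        total += term
        n += 1
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))


def chi2_quantile(p, df):
    """Invert chi2_cdf by bisection; the upper bracket is a generous heuristic."""
    lo, hi = 0.0, df + 10.0 * math.sqrt(2.0 * df) + 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


k, c = 32, 3                 # bins and (estimated parameters + 1)
alpha = 0.05
df = k - c                   # 29 degrees of freedom
statistic = 32.256           # Y1 test statistic from the example

critical = chi2_quantile(1.0 - alpha, df)
reject = statistic > critical

print(f"df = {df}, critical value = {critical:.3f}, reject H0: {reject}")
```

In practice a statistics library's chi-square quantile function would replace the two helpers. They are written out here only to keep the example self-contained.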
As we would hope, the chi-square test fails to reject the null hypothesis for the normally distributed data set and rejects the null hypothesis for the three non-normal data sets.

Questions

The chi-square test can be used to answer the following types of questions:
Are the data from a normal distribution?
Are the data from a log-normal distribution?
Are the data from a Weibull distribution?
Are the data from an exponential distribution?
Are the data from a logistic distribution?
Are the data from a binomial distribution?
Importance
Many statistical tests and procedures are based on specific distributional assumptions. The assumption of normality is particularly common in classical statistical tests. Much reliability modeling is based on the assumption that the data follow a Weibull distribution.

There are many non-parametric and robust techniques that are not based on strong distributional assumptions. By non-parametric, we mean a technique, such as the sign test, that is not based on a specific distributional assumption. By robust, we mean a statistical technique that performs well under a wide range of distributional assumptions. However, techniques based on specific distributional assumptions are in general more powerful than these non-parametric and robust techniques. By power, we mean the ability to detect a difference when that difference actually exists. Therefore, if the distributional assumption can be confirmed, the parametric techniques are generally preferred.

If you are using a technique that makes a normality (or some other type of distributional) assumption, it is important to confirm that this assumption is in fact justified. If it is, the more powerful parametric techniques can be used. If the distributional assumption is not justified, a non-parametric or robust technique may be required.