
MATH 2441

Probability and Statistics for Biological Sciences

The t-Distribution
The so-called Student's t-distribution could well be the second most commonly used probability distribution in statistics. It was first described in a paper written by William Gosset in 1908. Gosset, an employee of the Guinness brewing company in Dublin, Ireland, became involved in the statistical analysis of data collected from studies of the brewing process. Gosset published reports of his work under the pseudonym "Student" to get around a Guinness company policy prohibiting employees from publishing reports on their work; hence the name "Student t-distribution." Just as the standard normal random variable is conventionally denoted by the character 'z', the Student t-distributed random variable is conventionally denoted by the symbol 't'.

The t-distribution often arises in situations involving small sample sizes, and perhaps limited information in other respects. For example, when data from a large sample is available for problems involving the population mean (or when the sample is small but the population standard deviation is known), the standard normal distribution applies. However, when the sample size is small and the population standard deviation is not known, it is necessary to use the t-distribution rather than the standard normal distribution. In a way, you can think of the standard normal distribution as a special case of the t-distribution, appropriate when sample sizes are large.

One way to write the probability density function for the t-distribution is:

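$$f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$$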
Here, the symbol Γ( ) denotes a mathematical function called the "Gamma Function", which is a generalization of the factorial function which you've seen earlier in this course. The two Gamma Functions just produce constants in this formula. These, along with the square root term, are present to ensure that Pr(-∞ < t < +∞) = 1, as is required of all such probability distributions.

You need to notice two things about this probability density function. First, it depends on t only through t², which means that it is symmetric about t = 0. Thus, like the standard normal probability distribution, the t-distribution is symmetric about 0. Second, the shape of the density curve depends on a parameter denoted here by ν (the Greek letter nu, pronounced 'noo'). This quantity, ν, called the degrees of freedom, is a positive integer.
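As a rough illustration of the density formula above, the following Python sketch evaluates f(t) directly from the Gamma Function. The function names `t_pdf` and `normal_pdf` are chosen here for illustration and do not come from the original notes; the sketch assumes ν is small enough that `math.gamma` does not overflow (roughly ν below 340).

```python
import math

def t_pdf(t: float, nu: int) -> float:
    """Density f(t) of the t-distribution with nu degrees of freedom."""
    # The two Gamma Functions and the square root form the normalizing constant.
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    # The t-dependent factor involves only t squared, so f(t) is symmetric about 0.
    return const * (1 + t * t / nu) ** (-(nu + 1) / 2)

def normal_pdf(z: float) -> float:
    """Density f(z) of the standard normal distribution, for comparison."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

# Height of the curve at the centre and out in the tail for several values of nu.
for nu in (1, 3, 10, 30):
    print(nu, round(t_pdf(0.0, nu), 4), round(t_pdf(2.5, nu), 4))
print("z", round(normal_pdf(0.0), 4), round(normal_pdf(2.5), 4))
```

Printing the values side by side should show the central peak rising and the tail value falling toward the standard normal values as ν increases.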

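[Figure: graphs of the density f(t) for two values of ν (solid curves) and of the standard normal density f(z) (dotted curve), plotted against t.]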

The solid curves in the figure are graphs of f(t) for ν = 1 (the smallest possible value) and ν = 3, in order of increasing height at t = 0. The dotted curve is a graph of f(z) for the standard normal random variable. From this you can see that the t-distribution has a very bell-like shape, but for smaller values of ν the bell is lower and broader. As the value of ν is increased, the t-distribution looks more and more like the standard normal distribution. It can be shown mathematically that the t-distribution is identical to the standard normal distribution in the limit ν → ∞. However, at values of ν as small as 10 or 12, the graphs of f(t) are nearly indistinguishable from graphs of the standard normal probability density function, and by the time ν is as large as 29 or 30, results using the t-distribution agree with results from the standard normal distribution to within a percentage point or two. For this reason, statisticians tend to use the standard normal probability tables in place of t-tables whenever the value of ν is larger than 29 or 30.

David W. Sabo (1999)

To summarize, the t-distributed random variable and its distribution have the following general properties:

- the mean value is zero (like the standard normal random variable)
- the distribution is bell-shaped, and symmetric about the value zero on the horizontal axis
- the t random variable can have any value between -∞ and +∞, but most of the probability density is found in the near vicinity of t = 0 (though not in quite as narrow a region as for the standard normal distribution)
- there is a family of distinct t-distributions, distinguished by the value of a single parameter ν, which can have any positive integer value. The larger the value of ν, the more that particular t-distribution will be like the standard normal distribution.
- generally, for smaller values of ν, the t-distribution will have a lower central peak and higher tails than does the standard normal distribution.

Whereas the variance of the standard normal distribution is exactly 1, the variance of the t-distribution is

$$\frac{\nu}{\nu - 2} \qquad (\nu > 2),$$
a value which is bigger than 1 in principle (indicating that the t-distribution is more spread out than the standard normal distribution), but which approaches 1 as ν becomes large (for ν = 30 the fraction is about 1.07).

The t-distribution can be used to calculate probabilities in much the same way that you would calculate probabilities for any other continuous distribution. Pr(a < t < b) is just the area under the t probability density function between t = a and t = b. In principle, computing such an area involves evaluating an integral. Evaluating integrals is at best rather tedious, and is often impossible to do exactly by hand, so it is tempting to exploit the properties of the t-distribution (which seem so much like those of the standard normal distribution) to develop tables of values that can be used to calculate probabilities. Unfortunately, this approach would require a separate one-page table for each value of ν.

However, statisticians also realized that most of their applications involving the t-distribution didn't require the computation of probabilities so much as the determination of a few commonly used percentiles of the t-distribution. As a result, it has become conventional to organize t-tables quite differently from the way the standard normal probability table is organized. As you see in the table included at the end of this document, just one row of the table is reserved for each value of ν. Although most published t-tables cover the range ν = 1 to ν = 30, we've given a bit more coverage in our table so you can see how little difference there is between the t-percentiles for values of ν larger than 29 or 30 and the corresponding z-percentiles. The numbers in the body of the t-table are values of t for the value of α given at the top of the column and the value of ν given at the left of the row.
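The area calculation described above can be approximated numerically instead of by hand. The sketch below is only an illustration: it uses composite Simpson's rule, which is a technique swapped in here and not one named in these notes, and the names `t_pdf` and `prob_between` are hypothetical.

```python
import math

def t_pdf(t: float, nu: int) -> float:
    """Density f(t) of the t-distribution with nu degrees of freedom."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return const * (1 + t * t / nu) ** (-(nu + 1) / 2)

def prob_between(a: float, b: float, nu: int, steps: int = 2000) -> float:
    """Approximate Pr(a < t < b) as the area under f(t), by composite Simpson's rule.

    steps must be even; 2000 is an arbitrary choice that is ample for
    intervals a few units wide.
    """
    h = (b - a) / steps
    total = t_pdf(a, nu) + t_pdf(b, nu)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights alternate 4, 2, 4, ...
        total += weight * t_pdf(a + i * h, nu)
    return total * h / 3

# Area between the two 5% tails for nu = 9 (the table entry for alpha = 0.05 is 1.833).
print(prob_between(-1.833, 1.833, 9))
```

Since each tail beyond 1.833 has area 0.05 when ν = 9, the printed value should be close to 0.90.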
Often, to emphasize the fact that a t-percentile is distinguished by both the value of α (the area of the right-hand tail that it cuts off) and the value of ν, people use a combined notation, t_{α,ν}, where appropriate numbers would be substituted for each symbol.

Example: Use the attached t-table to determine t_{0.05,9}.

Answer: To find this value, read the number in the row labeled ν = 9 and the column headed α = 0.05. We get

t_{0.05,9} = 1.833

What this number means is that if we had an experiment which produced values of t according to the t-distribution with ν = 9, then

Pr(observe a value of t which is greater than 1.833) = 0.05

Thus, finding percentiles of the t-distribution from the table is just a matter of reading the correct row and column for values of α and ν covered by the table. The table given here covers essentially all of the values of ν that you are likely to need. It covers only ten different values of α, but these are by far the most commonly used ones in constructing confidence interval estimates (or setting up rejection regions in hypothesis testing), so you should find the table quite adequate for most of the work you do that requires use of the t-distribution.

If you must have values of t_{α,ν} for values of α not covered in the table, you could hunt for more extensive tables, or you might consider interpolating between values available in the table given here (not really recommended), or, nowadays, you can use readily available computer programs to produce the values you need (see the instructions regarding the use of Excel/97 below).

Calculation of t-probabilities Using the t-table

It is very uncommon to need to calculate probabilities for the t-distributed random variable except for regions which are either a single tail or made up of two identical tails, that is,

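Pr(t > c)    (a single right-hand tail, the area to the right of c)

Pr(|t| > c) = Pr(t < -c) + Pr(t > c)    (two identical tails, the areas to the left of -c and to the right of c)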

We will illustrate how to at least estimate the areas of regions of this type from the standard t-table. In the next section, we explain briefly how to get more accurate values using functions available in Excel/97. Calculating this sort of probability is required in the computation of so-called p-values for hypothesis tests.

Example: Estimate Pr(t > 1.85) for ν = 17 using the standard t-table.

Answer: The figure shows the situation.

[Figure: t density curve for ν = 17, with t = 1.85 lying between t_{0.05,17} = 1.740 (right-hand tail area = 0.05) and t_{0.025,17} = 2.110 (right-hand tail area = 0.025).]

Looking at the row labeled ν = 17 in the standard t-table, we see that t_{0.05,17} = 1.740, indicating that Pr(t > 1.740) = 0.05 for this value of ν. Similarly, we see that t_{0.025,17} = 2.110, indicating that Pr(t > 2.110) = 0.025. We selected these two entries because they correspond to values 1.740 and 2.110 which bracket the number 1.85 appearing in the original question. Thus, at the very least, we can say that for ν = 17, Pr(t > 1.85) is between 0.025 and 0.05. This is not very precise, but as an estimate of a p-value for a hypothesis test, it is probably adequate. You might think of doing some linear interpolation to get a better value:

$$0.05 - \frac{1.85 - 1.740}{2.110 - 1.740}\,(0.05 - 0.025) \approx 0.0426$$
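The interpolation above can be written as a small helper. This is a minimal sketch; the function name `interpolate_tail` and its argument names are hypothetical, and the numbers passed in are the two bracketing entries read from the ν = 17 row of the t-table.

```python
def interpolate_tail(c: float, t_lo: float, p_lo: float, t_hi: float, p_hi: float) -> float:
    """Linearly interpolate the right-hand tail area Pr(t > c).

    t_lo and t_hi are tabulated critical values with t_lo < c < t_hi,
    and p_lo = Pr(t > t_lo), p_hi = Pr(t > t_hi) are their tail areas.
    """
    fraction = (c - t_lo) / (t_hi - t_lo)  # how far c lies between the two entries
    return p_lo - fraction * (p_lo - p_hi)

# nu = 17: t_{0.05,17} = 1.740 and t_{0.025,17} = 2.110 bracket c = 1.85.
estimate = interpolate_tail(1.85, 1.740, 0.05, 2.110, 0.025)
print(round(estimate, 4))  # 0.0426
```

The same helper applies to any pair of adjacent entries in one row of the table, since only the two bracketing critical values and their tail areas are needed.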

The exact value of Pr(t > 1.85) for ν = 17 is 0.0409 to this precision, so the linear interpolation has overestimated the probability, as you would expect from the shape of the graph of the density function.

If you must work from standard tables, and you must have better accuracy than simply bracketing the probability between two successive tabulated values, then some sort of interpolation scheme such as the above is necessary. If high accuracy is necessary and you have access to a computer application (such as MS Excel or software applications designed to facilitate statistical calculations), then use that tool to calculate the required probabilities directly.

Example: Estimate Pr(|t| > 2.58) when ν = 7.

Answer: From the earlier figure illustrating this situation, we see immediately that we can use the symmetry of the t-distribution about t = 0 to write

Pr(|t| > 2.58) = Pr(t < -2.58) + Pr(t > 2.58) = 2 × Pr(t > 2.58)

Then, from the ν = 7 row of the standard t-table, we find that the two entries bracketing t = 2.58 give

Pr(t > 2.365) = 0.025 and Pr(t > 2.998) = 0.01

Thus, Pr(t > 2.58) is a value between 0.01 and 0.025. This means that 2 × Pr(t > 2.58) is a value between 2 × 0.01 = 0.02 and 2 × 0.025 = 0.05. Thus, we conclude that Pr(|t| > 2.58) is a number between 0.02 and 0.05. (Linear interpolation gives 0.0398 and the exact value is 0.0365.)

t-distribution Calculations with MS Excel/97

Excel/97 provides two functions related to the t-distribution.

TDIST(c, ν, 1) gives Pr(t > c) for ν degrees of freedom.

TDIST(c, ν, 2) gives Pr(|t| > c) for ν degrees of freedom.

TINV(x, ν) gives the value of c that satisfies the equation Pr(|t| > c) = x. That is, it gives t_{x/2,ν}. This is the function we used to prepare the table that follows.
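For readers without Excel, the behaviour of the two functions described above can be imitated in a few lines of Python. This is a sketch under stated assumptions, not Excel's actual implementation: the names `tdist` and `tinv` are stand-ins chosen to mirror the Excel names, the tail area is found by Simpson's rule using the symmetry of the density, and the inverse is found by bisection.

```python
import math

def t_pdf(t: float, nu: int) -> float:
    """Density f(t) of the t-distribution with nu degrees of freedom."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return const * (1 + t * t / nu) ** (-(nu + 1) / 2)

def tdist(c: float, nu: int, tails: int) -> float:
    """Pr(t > c) when tails == 1, or Pr(|t| > c) when tails == 2, for c >= 0."""
    if c < 0 or tails not in (1, 2):
        raise ValueError("c must be non-negative and tails must be 1 or 2")
    # Integrate f from 0 to c with Simpson's rule (even step count, width <= 0.01).
    steps = 2 * max(100, math.ceil(50 * c))
    h = c / steps
    total = t_pdf(0.0, nu) + t_pdf(c, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    # By symmetry, half the total area lies to the right of 0.
    one_tail = 0.5 - total * h / 3
    return tails * one_tail

def tinv(x: float, nu: int) -> float:
    """Value c with Pr(|t| > c) = x, that is t_{x/2, nu}, for 0 < x < 1."""
    if not 0 < x < 1:
        raise ValueError("x must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while tdist(hi, nu, 2) > x:  # widen the bracket until the tail area drops below x
        hi *= 2
    for _ in range(50):  # bisection: the two-tailed area decreases as c increases
        mid = (lo + hi) / 2
        if tdist(mid, nu, 2) > x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(tdist(1.85, 17, 1))  # compare with the first worked example
print(tdist(2.58, 7, 2))   # compare with the second worked example
print(tinv(0.10, 9))       # compare with t_{0.05,9} in the table
```

The three printed values should agree closely with the figures quoted in the worked examples and with the ν = 9 row of the table. The fixed step width makes very large critical values (such as those in the ν = 1 row) slow to compute, so a dedicated statistics package is a better choice for serious work.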


Feb-99

Right-Hand Tail Critical Values for the Student t-distribution

           <---------------------------------- alpha ---------------------------------->
ν = n-1      0.20    0.15    0.10    0.05   0.025    0.01   0.005  0.0025   0.001  0.0005
1           1.376   1.963   3.078   6.314  12.706  31.821  63.656 127.321 318.289 636.578
2           1.061   1.386   1.886   2.920   4.303   6.965   9.925  14.089  22.328  31.600
3           0.978   1.250   1.638   2.353   3.182   4.541   5.841   7.453  10.214  12.924
4           0.941   1.190   1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
5           0.920   1.156   1.476   2.015   2.571   3.365   4.032   4.773   5.894   6.869
6           0.906   1.134   1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
7           0.896   1.119   1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
8           0.889   1.108   1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
9           0.883   1.100   1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
10          0.879   1.093   1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
11          0.876   1.088   1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
12          0.873   1.083   1.356   1.782   2.179   2.681   3.055   3.428   3.930   4.318
13          0.870   1.079   1.350   1.771   2.160   2.650   3.012   3.372   3.852   4.221
14          0.868   1.076   1.345   1.761   2.145   2.624   2.977   3.326   3.787   4.140
15          0.866   1.074   1.341   1.753   2.131   2.602   2.947   3.286   3.733   4.073
16          0.865   1.071   1.337   1.746   2.120   2.583   2.921   3.252   3.686   4.015
17          0.863   1.069   1.333   1.740   2.110   2.567   2.898   3.222   3.646   3.965
18          0.862   1.067   1.330   1.734   2.101   2.552   2.878   3.197   3.610   3.922
19          0.861   1.066   1.328   1.729   2.093   2.539   2.861   3.174   3.579   3.883
20          0.860   1.064   1.325   1.725   2.086   2.528   2.845   3.153   3.552   3.850
21          0.859   1.063   1.323   1.721   2.080   2.518   2.831   3.135   3.527   3.819
22          0.858   1.061   1.321   1.717   2.074   2.508   2.819   3.119   3.505   3.792
23          0.858   1.060   1.319   1.714   2.069   2.500   2.807   3.104   3.485   3.768
24          0.857   1.059   1.318   1.711   2.064   2.492   2.797   3.091   3.467   3.745
25          0.856   1.058   1.316   1.708   2.060   2.485   2.787   3.078   3.450   3.725
26          0.856   1.058   1.315   1.706   2.056   2.479   2.779   3.067   3.435   3.707
27          0.855   1.057   1.314   1.703   2.052   2.473   2.771   3.057   3.421   3.689
28          0.855   1.056   1.313   1.701   2.048   2.467   2.763   3.047   3.408   3.674
29          0.854   1.055   1.311   1.699   2.045   2.462   2.756   3.038   3.396   3.660
30          0.854   1.055   1.310   1.697   2.042   2.457   2.750   3.030   3.385   3.646
31          0.853   1.054   1.309   1.696   2.040   2.453   2.744   3.022   3.375   3.633
32          0.853   1.054   1.309   1.694   2.037   2.449   2.738   3.015   3.365   3.622
33          0.853   1.053   1.308   1.692   2.035   2.445   2.733   3.008   3.356   3.611
34          0.852   1.052   1.307   1.691   2.032   2.441   2.728   3.002   3.348   3.601
35          0.852   1.052   1.306   1.690   2.030   2.438   2.724   2.996   3.340   3.591
36          0.852   1.052   1.306   1.688   2.028   2.434   2.719   2.990   3.333   3.582
37          0.851   1.051   1.305   1.687   2.026   2.431   2.715   2.985   3.326   3.574
38          0.851   1.051   1.304   1.686   2.024   2.429   2.712   2.980   3.319   3.566
39          0.851   1.050   1.304   1.685   2.023   2.426   2.708   2.976   3.313   3.558
40          0.851   1.050   1.303   1.684   2.021   2.423   2.704   2.971   3.307   3.551
50          0.849   1.047   1.299   1.676   2.009   2.403   2.678   2.937   3.261   3.496
60          0.848   1.045   1.296   1.671   2.000   2.390   2.660   2.915   3.232   3.460
80          0.846   1.043   1.292   1.664   1.990   2.374   2.639   2.887   3.195   3.416
100         0.845   1.042   1.290   1.660   1.984   2.364   2.626   2.871   3.174   3.390
150         0.844   1.040   1.287   1.655   1.976   2.351   2.609   2.849   3.145   3.357
Infinity    0.842   1.036   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.290
