

Handout 7 Introduction to the Normal Distribution


Background: Continuous Random Variables

Definition. A random variable X is continuous if, given any two values a and b that it is possible for X to take on, any value between a and b is also a value that X can take on.

Example. Consider the random experiment consisting of choosing a GMU student at random, and the random variable which associates a student with the student's height in inches. This is a continuous random variable. For example, not only are 64 and 66 possible heights but, at least in principle, any value between 64 and 66 is possible to obtain.

What's the big deal about continuous random variables? If a random variable X is continuous, then there is a curve, called the probability density function (or pdf, for short), so that P(a ≤ X ≤ b) has the geometric interpretation of being the area under the graph of the probability density function between a and b. Anytime there's a way to think of an abstract idea geometrically, it's a blessing!

Random Variables Symmetric about the Mean

Definition. A random variable X is symmetric about the mean if, given any a ≥ 0, P(μ ≤ X ≤ μ + a) = P(μ − a ≤ X ≤ μ). In other words, the probability that the random variable is between the mean and a given amount below the mean is the same as the probability that the random variable is between the mean and that same amount above the mean. The random variables that come up in applications are often both continuous and symmetric about the mean. That the random variable is symmetric about the mean means that the pdf curve is geometrically symmetric about the mean.
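To make the symmetry definition concrete, here is a small numerical check (not part of the original handout) written in Python with scipy; the mean 68 and standard deviation 2.5 are borrowed from the height example later in the handout purely for illustration.

    # Check P(mu - a <= X <= mu) == P(mu <= X <= mu + a) for a symmetric X.
    from scipy.stats import norm

    mu, sigma, a = 68.0, 2.5, 3.0      # illustrative values
    X = norm(loc=mu, scale=sigma)      # scale = standard deviation
    left  = X.cdf(mu) - X.cdf(mu - a)  # P(mu - a <= X <= mu)
    right = X.cdf(mu + a) - X.cdf(mu)  # P(mu <= X <= mu + a)
    print(left, right)                 # the two probabilities agree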

Remarks / Things to Think About

(1) If a continuous random variable X is symmetric about the mean, P(μ ≤ X ≤ ∞) = P(−∞ ≤ X ≤ μ) = ½. (Why?) Interpret this geometrically.

(2) The function F(x) = P(X ≤ x) is called the cumulative distribution function (or cdf, for short). Interpret the cdf geometrically.
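One way to see the connection between the pdf, the cdf, and areas is to compute the same probability two ways. The following sketch (not in the handout) uses Python with scipy and numerical integration; the standard normal pdf is just a convenient stand-in for a generic continuous density.

    # P(a <= X <= b) as area under the pdf, and as F(b) - F(a) using the cdf.
    from scipy.stats import norm
    from scipy.integrate import quad

    a, b = -1.0, 2.0
    area, _ = quad(norm.pdf, a, b)       # integrate the standard normal pdf over [a, b]
    via_cdf = norm.cdf(b) - norm.cdf(a)  # F(b) - F(a)
    print(area, via_cdf)                 # agree to numerical precision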

Typical Picture of the Probability Density Function for a Random Variable Symmetric about the Mean (say, the height of a randomly chosen GMU male):

The Most Important Class of Random Variables in the Universe!

The most important example of a continuous random variable which is symmetric about the mean is a normal random variable. Normal random variables are ubiquitous: random variables from the diameter of ball bearings to the SAT scores of this year's freshman class turn out to be normal, or at least approximately normal. Just why the normal distribution is so ubiquitous is better understood in light of the Central Limit Theorem, which will be discussed a bit later.

Notation. If a random variable X is normally distributed with mean μ and variance σ², one writes X ~ N(μ, σ²).
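If you want to mirror the X ~ N(μ, σ²) notation in software, note that many libraries are parameterized by the standard deviation rather than the variance. A brief illustration (my addition, not the handout's) in Python with scipy:

    # X ~ N(mu, sigma^2): scipy's norm takes loc = mu and scale = sigma (NOT sigma^2).
    from scipy.stats import norm

    mu, sigma = 68.0, 2.5              # e.g. the GMU height example used later
    X = norm(loc=mu, scale=sigma)
    print(X.mean(), X.std())           # 68.0 and 2.5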

Question: Which value of σ corresponds to which of the three probability density functions graphed below?

Key Properties of the Normal Distribution

Theorem. (1) If X is normally distributed, then for any constants c and d, cX + d is normally distributed. (2) If X and Y are normally distributed, and a and b are constants, then aX + bY is normally distributed.

The first of these properties implies that if X ~ N(μ, σ²) and we define the random variable Z = (X − μ)/σ, then Z ~ N(0, 1), i.e., Z is a standard normal random variable. So what's the big deal? The important point here is that since a normal random variable can always be standardized to obtain a standard normal random variable, probability computations can always be reduced to computations involving the standard normal cdf. For your convenience, a table of the cumulative distribution function for a standard normal random variable is in the TABLES folder on classweb; printed on the inside front cover of your text is a table of P(0 ≤ Z ≤ z) for z ≥ 0. By symmetry, for any z ≥ 0, P(Z ≤ z) = ½ + P(0 ≤ Z ≤ z), so (if one knows what one is doing) either table will do for computation. These days, values of the standard normal cdf (and the inverse function) are also available via spreadsheet functions, calculator buttons, and computer subroutines (including MINITAB).
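Here is a small check (not part of the handout) of the standardization idea: computing a probability directly from X ~ N(μ, σ²) gives the same answer as standardizing first and using the standard normal cdf. Python with scipy is used as a stand-in for the tables, spreadsheet functions, or MINITAB mentioned above.

    # P(X <= x) computed two ways: directly, and via Z = (X - mu)/sigma.
    from scipy.stats import norm

    mu, sigma, x = 68.0, 2.5, 73.0
    direct       = norm.cdf(x, loc=mu, scale=sigma)
    standardized = norm.cdf((x - mu) / sigma)   # standard normal cdf at z = 2
    print(direct, standardized)                 # both about 0.9772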

Remark. The table of probabilities for z < 0 is convenient but not really necessary. Why? To illustrate, find P(Z ≤ −0.45) just using the table giving the probabilities for the values of z with z > 0.
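The handout leaves this as an exercise; as a numerical check of the underlying identity P(Z ≤ −z) = 1 − P(Z ≤ z), here is a short sketch (my addition) in Python:

    # Symmetry of the standard normal: P(Z <= -z) = 1 - P(Z <= z).
    from scipy.stats import norm

    z = 0.45
    print(norm.cdf(-z), 1 - norm.cdf(z))   # the two values agree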

Computational Example. The height H in inches of a randomly chosen GMU male is approximately normal with μ = 68 inches and σ = 2.5 inches. Suppose we want the probability P(68 ≤ H ≤ 73). This is the same as P[(68 − 68)/2.5 ≤ (H − 68)/2.5 ≤ (73 − 68)/2.5] = P(0 ≤ Z ≤ 2) = P(Z ≤ 2) − P(Z ≤ 0) = 0.9772 − 0.5 = 0.4772.

Computational Example. The Stanford-Binet Intelligence test is set up to have a population mean score of 100. Let us say that for a certain population of individuals the population standard deviation is 25. Stanford-Binet scores are normally distributed. Find the proportion of the population with a score ≥ 150.

P(IQ ≥ 150) = P[(IQ − 100)/25 ≥ (150 − 100)/25] = P(Z ≥ 2) = 1 − P(Z ≤ 2) = 1 − 0.9772 = 0.0228.

Computational Example. What is the minimum height for a doorway so that 99% of GMU males can pass through it without bumping their heads? (Use the data from the first example above.) We wish to find h so that P(H ≤ h) = 0.99. Write

P[(H − 68)/2.5 ≤ (h − 68)/2.5] = 0.99, i.e., P(Z ≤ (h − 68)/2.5) = 0.99.

The value of z corresponding to a probability of 0.99 is about 2.33. Hence (h − 68)/2.5 = 2.33; solving for h one obtains

h = 68 + 2.5(2.33) = 73.825 in.

(I'd probably recommend going up to an even 74 inches...)
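For readers who prefer software to tables, here is a sketch (not in the original handout) that reproduces the three computations above in Python with scipy. Note that norm.ppf(0.99) ≈ 2.3263, slightly more precise than the rounded 2.33 used above, so the doorway height comes out a hair lower.

    from scipy.stats import norm

    # Heights: H ~ N(68, 2.5^2); P(68 <= H <= 73)
    p_height = norm.cdf(73, loc=68, scale=2.5) - norm.cdf(68, loc=68, scale=2.5)

    # IQ: scores ~ N(100, 25^2); P(IQ >= 150)
    p_iq = 1 - norm.cdf(150, loc=100, scale=25)

    # Doorway: smallest h with P(H <= h) = 0.99, i.e. h = 68 + 2.5 * z_0.99
    h = 68 + 2.5 * norm.ppf(0.99)

    print(p_height, p_iq, h)   # about 0.4772, 0.0228, 73.82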

Simple Random Samples of a Normal Random Variable

The second statement of the last theorem generalizes to a linear combination of any number of normal random variables: such a linear combination, aX + bY + cZ + ..., is still normal. So, combining the last theorem with the Standard Error Theorem, one has:

Theorem. If X ~ N(μ, σ²) and X1, X2, X3, ..., Xn are the observations of X corresponding to a simple random sample of size n, then X̄ ~ N(μ, σ²/n). Thus if we let

Z = (X̄ − μ) / (σ/√n),

then Z ~ N(0, 1).

Since, by the Standard Error Theorem, the standard error of X̄ is σ_X̄ = σ/√n, one may also write

Z = (X̄ − μ) / σ_X̄.
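A quick way to convince yourself of this theorem is to simulate many simple random samples and look at the distribution of the sample means. The following sketch (my addition, assuming numpy is available) uses the GMU height numbers μ = 68, σ = 2.5 purely for illustration.

    # Simulate sample means of n observations from N(mu, sigma^2); their spread
    # should be close to sigma / sqrt(n), the standard error.
    import numpy as np

    mu, sigma, n, reps = 68.0, 2.5, 25, 100_000
    rng = np.random.default_rng(0)
    xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

    print(xbars.mean(), xbars.std())   # about 68 and about 0.5 = 2.5 / sqrt(25)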

Remarks

This theorem is essential for our discussion of confidence intervals for the mean of a normal random variable.

Practice Problem (from an old statistics examination...)

Tins of luncheon meat are labeled as containing 43 grams of fat. Of course, this is only a nominal value: the actual fat content (in grams) is a normal random variable F. The mean of F can be manipulated by appropriately adjusting the machinery, trimming animal carcasses more (or less) carefully of fat, etc. The standard deviation of F is harder to control, and is thought to have a value of approximately 2 grams. It is more profitable to pack tins with fat than lean meat; nevertheless, given consumer preferences for products with lower fat content, the packing company wishes to have 99% of the tins of luncheon meat it produces contain no more than 43 grams of fat. Find the largest value of μ, the mean of F, that will achieve this.

Solution given below... don't peek too early!

Solution: One wishes to find the largest value of μ so that P(F ≤ 43) = 0.99. Standardize F to write:

P(F ≤ 43) = P[(F − μ)/2 ≤ (43 − μ)/2] = P(Z ≤ (43 − μ)/2) = 0.99.

Using a standard normal table (... actually I used the MINITAB function ...), (43 − μ)/2 = 2.3263.

Solving for μ gives μ = 43 − 2(2.3263) = 38.35 grams.

Other Problems to Look At... Problem Numbers Two and Nine, exam2_06k.pdf (classweb Exam2 folder).
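Returning to the luncheon-meat solution above: if you would rather check it with software than a table, here is a minimal sketch (not part of the original solution) in Python with scipy.

    # Largest mu with P(F <= 43) = 0.99, where F ~ N(mu, 2^2):
    # (43 - mu)/2 = z_0.99, so mu = 43 - 2 * z_0.99.
    from scipy.stats import norm

    mu_max = 43 - 2 * norm.ppf(0.99)
    print(mu_max)   # about 38.35 grams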
