
Binomial distribution

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution, and in general the binomial distribution describes n repeated Bernoulli trials. The binomial distribution is the basis for the popular binomial test of statistical significance. It is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation, and is widely used.

Probability mass function and cumulative distribution function plots for n = 20 and p = 0.1 (blue), p = 0.5 (green) and p = 0.8 (red).

Notation: B(n, p)
Parameters: n ∈ N0, number of trials; p ∈ [0, 1], success probability in each trial
Support: k ∈ {0, 1, ..., n}
pmf: C(n, k) p^k (1 − p)^(n − k)
cdf: I_(1 − p)(n − k, k + 1)
Mean: np
Median: floor(np) or ceil(np)
Mode: floor((n + 1)p) or ceil((n + 1)p) − 1
Variance: np(1 − p)
Skewness: (1 − 2p) / sqrt(np(1 − p))
Ex. kurtosis: (1 − 6p(1 − p)) / (np(1 − p))
Entropy: (1/2) log2(2πe np(1 − p)) + O(1/n)
mgf: (1 − p + p e^t)^n
cf: (1 − p + p e^(it))^n
pgf: ((1 − p) + pz)^n


Binomial distribution for p = 0.5 with n and k as in Pascal's triangle. The probability that a ball in a Galton box with 8 layers (n = 8) ends up in the central bin (k = 4) is 70/256.

Contents
1 Specification
  1.1 Probability mass function
  1.2 Cumulative distribution function
2 Mean and variance
3 Mode and median
4 Covariance between two binomials
5 Relationship to other distributions
  5.1 Sums of binomials
  5.2 Bernoulli distribution
  5.3 Poisson binomial distribution
  5.4 Normal approximation
  5.5 Poisson approximation
6 Limits
7 Examples
  7.1 Symmetric binomial distribution (p = 0.5)
  7.2 An example from sports

Specification
Probability mass function
In general, if the random variable K follows the binomial distribution with parameters n and p, we write K ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function

f(k; n, p) = Pr(K = k) = C(n, k) p^k (1 − p)^(n − k)

for k = 0, 1, 2, ..., n, where

C(n, k) = n! / (k! (n − k)!)

is the binomial coefficient (hence the name of the distribution), read "n choose k" and also denoted C(n, k) or nCk. The formula can be understood as follows: we want k successes (probability p^k) and n − k failures (probability (1 − p)^(n − k)). However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials. In creating reference tables for binomial distribution probabilities, the table is usually filled in only up to n/2 values, because for k > n/2 the probability can be calculated from its complement as

f(k; n, p) = f(n − k; n, 1 − p).
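As a quick illustration (a minimal sketch added here, not part of the original article; the helper name binomial_pmf is mine), the probability mass function above can be evaluated directly with Python's standard library:

from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    # Probability of exactly k successes in n independent trials with success probability p.
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example: Pr(K = 4) for B(8, 0.5), the central bin of a Galton box with 8 layers.
print(binomial_pmf(4, 8, 0.5))   # 0.2734375, i.e. 70/256

# Complement identity used when tabulating only up to n/2:
assert abs(binomial_pmf(6, 8, 0.3) - binomial_pmf(2, 8, 0.7)) < 1e-12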

As a function of k, the expression f(k, n, p) has a maximal value, which can be found by calculating the ratio

f(k + 1, n, p) / f(k, n, p) = (n − k)p / ((k + 1)(1 − p))

and comparing it to 1. There is always an integer M that satisfies

(n + 1)p − 1 ≤ M < (n + 1)p.

f(k, n, p) is monotone increasing for k < M and monotone decreasing for k > M, with the exception of the case where (n + 1)p is an integer. In that case there are two values for which f is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can be fairly small.
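A small sketch of the ratio test just described (my addition, not from the original text; it assumes 0 < p < 1), locating the mode M numerically:

def binomial_mode(n: int, p: float) -> int:
    # The pmf increases while the ratio f(k+1)/f(k) = (n - k) p / ((k + 1)(1 - p)) exceeds 1
    # and decreases afterwards, so scan k upward until the ratio drops to 1 or below.
    k = 0
    while k < n and (n - k) * p / ((k + 1) * (1 - p)) > 1:
        k += 1
    return k

print(binomial_mode(20, 0.5))   # 10, which equals floor((n + 1) p) = floor(10.5)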

Cumulative distribution function


The cumulative distribution function can be expressed as:

F(x; n, p) = Pr(X ≤ x) = the sum of C(n, i) p^i (1 − p)^(n − i) over i = 0, 1, ..., floor(x),

where floor(x) is the "floor" under x, i.e. the greatest integer less than or equal to x.

It can also be represented in terms of the regularized incomplete beta function, as follows:

F(k; n, p) = Pr(X ≤ k) = I_(1 − p)(n − k, k + 1).
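For illustration (an added sketch, not from the article), the CDF can be computed by direct summation; the regularized incomplete beta identity can optionally be checked with SciPy, which is assumed to be installed for the commented-out lines:

from math import comb

def binomial_cdf(k: int, n: int, p: float) -> float:
    # Direct summation of the pmf up to and including k.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

print(binomial_cdf(1, 2, 0.5))   # 0.75

# Optional check of the identity F(k; n, p) = I_(1-p)(n - k, k + 1), if SciPy is available:
# from scipy.special import betainc
# assert abs(binomial_cdf(1, 2, 0.5) - betainc(2 - 1, 1 + 1, 1 - 0.5)) < 1e-12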

For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's inequality yields the bound

F(k; n, p) ≤ exp(−2 (np − k)^2 / n),

and Chernoff's inequality can be used to derive the bound

Moreover, these bounds are reasonably tight when p = 1/2, since the following expression holds for all k ≥ 3n/8:

Mean and variance


If X ~ B(n, p) (that is, X is a binomially distributed random variable), then the expected value of X is

E[X] = np

and the variance is

Var(X) = np(1 − p).

This fact is easily proven as follows. Suppose first that we have a single Bernoulli trial. There are two possible outcomes, 1 and 0, the first occurring with probability p and the second with probability 1 − p. The expected value of this trial is μ = 1·p + 0·(1 − p) = p. The variance of this trial is calculated similarly: σ^2 = (1 − p)^2·p + (0 − p)^2·(1 − p) = p(1 − p). A generic binomial random variable is a sum of n independent Bernoulli trials, so its mean and variance are the sums of the means and variances of the individual trials:

E[X] = np,  Var(X) = np(1 − p).
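As a quick numerical check (an illustrative sketch, not part of the original text; n and p are arbitrary example values), the mean and variance formulas can be verified by summing over the pmf:

from math import comb

n, p = 12, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean = sum(k * pmf[k] for k in range(n + 1))
var = sum((k - mean)**2 * pmf[k] for k in range(n + 1))
print(mean, n * p)            # both 3.6
print(var, n * p * (1 - p))   # both 2.52 (up to floating-point rounding)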

Mode and median


Usually the mode of a binomial B(n, p) distribution is equal to floor((n + 1)p), where floor(·) is the floor function. However, when (n + 1)p is an integer and p is neither 0 nor 1, the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode is 0 or n, respectively. These cases can be summarized as follows: the mode is floor((n + 1)p) if (n + 1)p is 0 or a non-integer; it is both (n + 1)p and (n + 1)p − 1 if (n + 1)p is an integer in {1, ..., n}; and it is n if (n + 1)p = n + 1.

In general, there is no single formula to find the median of a binomial distribution, and it may even be non-unique. However, several special results have been established:

If np is an integer, then the mean, median, and mode coincide and equal np.
Any median m must lie within the interval floor(np) ≤ m ≤ ceil(np).
A median m cannot lie too far away from the mean: |m − np| ≤ min{ln 2, max{p, 1 − p}}.
The median is unique and equal to m = round(np) when either p ≤ 1 − ln 2 or p ≥ ln 2 or |m − np| ≤ min{p, 1 − p} (except for the case when p = 1/2 and n is odd).
When p = 1/2 and n is odd, any number m in the interval (n − 1)/2 ≤ m ≤ (n + 1)/2 is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.

Covariance between two binomials


If two binomially distributed random variables X and Y are observed together, estimating their covariance can be useful. Using the definition of covariance, in the case n = 1 (a single Bernoulli trial for each) we have

Cov(X, Y) = E[XY] − μ_X μ_Y.

The first term is non-zero only when both X and Y are one, and μ_X and μ_Y are equal to the two success probabilities. Defining pB as the probability of both happening at the same time, this gives

Cov(X, Y) = pB − pX pY,

and for n such trials, again due to independence,

Cov(X, Y) = n (pB − pX pY).

If X and Y are the same variable, this reduces to the variance formula given above.
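A minimal simulation sketch of the covariance formula (my own construction, not from the article): here X and Y count successes over the same n trials, with a shared uniform draw u deciding both indicators, so both succeed together with probability pB = min(pX, pY) = 0.3.

import random

random.seed(0)
n, pX, pY, reps = 1000, 0.6, 0.3, 2000
xs, ys = [], []
for _ in range(reps):
    x_total = y_total = 0
    for _ in range(n):
        u = random.random()
        x_total += u < pX      # X succeeds when u < pX
        y_total += u < pY      # Y succeeds when u < pY
    xs.append(x_total)
    ys.append(y_total)
mx, my = sum(xs) / reps, sum(ys) / reps
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (reps - 1)
print(cov)   # close to n * (pB - pX * pY) = 1000 * (0.3 - 0.18) = 120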

Relationship to other distributions

Sums of binomials


If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables with the same probability p, then X + Y is again a binomial variable; its distribution is

X + Y ~ B(n + m, p).

Bernoulli distribution
The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B (1, p) has the same meaning as X ~ Bern (p). Conversely, any binomial distribution, B (n, p), is the sum of n independent Bernoulli trials, Bern (p), each with the same probability p.

Poisson binomial distribution

The binomial distribution is a special case of the Poisson binomial distribution, which is the distribution of a sum of n independent non-identical Bernoulli trials Bern(pi). If X has the Poisson binomial distribution with p1 = ... = pn = p, then X ~ B(n, p).

Normal approximation
If n is large enough, then the skew of the distribution is not too great. In this case, if a suitable continuity correction is used, an excellent approximation to B(n, p) is given by the normal distribution

N(np, np(1 − p)).

The approximation generally improves as n increases (at least 20) and is better when p is not near 0 or 1.[6] Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or one. One rule is that both np and n(1 − p) must be greater than 5. However, the specific number varies from source to source and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large n until n is very large (e.g. np = 11, n = 7752). That rule is that for n > 5 the normal approximation is adequate if

Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of the mean is within the range of possible values,[citation needed] that is, if

np ± 3 sqrt(np(1 − p)) ∈ (0, n).

Also, as the approximation generally improves, it can be shown that the inflection points of the approximating normal density occur at x = np ± sqrt(np(1 − p)).

The following is an example of applying a continuity correction. Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has the distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results. This approximation, known as the de Moivre-Laplace theorem, is a huge time-saver (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem, since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This fact is the basis of a hypothesis test, a "proportion z-test," for the value of p using x/n, the sample proportion and estimator of p, in a common test statistic.[7] For example, suppose you randomly sample n people out of a large population and ask them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If you sampled groups of n people repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation σ = (p(1 − p)/n)^(1/2). Large sample sizes n are good because the standard deviation, as a proportion of the expected value, gets smaller, which allows a more precise estimate of the unknown parameter p.
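The continuity correction described above can be checked numerically; the following is an illustrative sketch (not from the article) using n = 20 and p = 0.5 as example values:

from math import comb, erf, sqrt

def binomial_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p, k = 20, 0.5, 8
mu, sigma = n * p, sqrt(n * p * (1 - p))
print(binomial_cdf(k, n, p))            # exact Pr(X <= 8): about 0.2517
print(normal_cdf(k + 0.5, mu, sigma))   # with continuity correction: about 0.2512
print(normal_cdf(k, mu, sigma))         # without correction: about 0.1855, noticeably worse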

Poisson approximation
The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore, the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.
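A short sketch comparing the two distributions under the rule of thumb above (added for illustration; n and p are example values I chose):

from math import comb, exp, factorial

n, p = 100, 0.03            # n >= 100 and np = 3 <= 10, within the rule of thumb
lam = n * p
for k in range(6):
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    pois = exp(-lam) * lam**k / factorial(k)
    print(k, round(binom, 4), round(pois, 4))
# The two columns agree to roughly two decimal places; e.g. k = 3 gives about 0.2275 vs 0.2240.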

Binomial PDF and normal approximation for n = 6 and p = 0.5

Limits
As n approaches infinity and p approaches 0 while np remains fixed at λ > 0, or at least np approaches λ > 0, the Binomial(n, p) distribution approaches the Poisson distribution with expected value λ. As n approaches infinity while p remains fixed, the distribution of

(X − np) / sqrt(np(1 − p))

approaches the normal distribution with expected value 0 and variance 1. This result is sometimes loosely stated by saying that the distribution of X approaches the normal distribution with expected value np and variance np(1 − p). That loose statement cannot be taken literally, because the thing asserted to be approached actually depends on the value of n, and n is approaching infinity. This result is a specific case of the central limit theorem.

Examples
An elementary example is this: roll a standard die ten times and count the number of fours. The distribution of this random number is a binomial distribution with n = 10 and p = 1/6.
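For instance (a small added sketch, not part of the original), the probability of seeing exactly two fours in those ten rolls is:

from math import comb

n, p = 10, 1/6
print(comb(n, 2) * p**2 * (1 - p)**8)   # probability of exactly two fours, about 0.291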

Symmetric binomial distribution (p = 0.5)


This example illustrates the central limit theorem using fair coin tosses. A coin is fair if it shows heads and tails with equal probability p = 1/2. If we toss a fair coin n times, the number k of times the coin shows heads is a random variable that is binomially distributed. The first picture below shows the binomial distribution for several values of n as a function of k.

Binomial distributions with p = 0.5 and n = 4, 16, 64; the same distributions with the mean moved to zero; and rescaled with the standard deviation.

These distributions are reflection symmetric with respect to the line k = n/2; that is, the probability mass function satisfies f(k; n, 1/2) = f(n − k; n, 1/2). In the middle picture, the distributions were shifted so that their mean is zero.

The width of the distribution is proportional to the standard deviation σ. The value of the shifted functions at k = 0 is their respective maximum and is proportional to 1/σ. Hence, binomial distributions with different values of n can be rescaled by multiplying the function values by σ and dividing the x-axis by σ. This is depicted in the third picture above. The picture on the right shows shifted and normalized binomial distributions, now for more and larger values of n, in order to visualize that the function values converge to a common curve. By using Stirling's approximation of the binomial coefficients, one gets that this curve is the standard normal distribution:

f(x) = (1 / sqrt(2π)) e^(−x^2 / 2).

This is the probability density function of the standard normal distribution. The central limit theorem generalizes the above to limits of distributions that are not necessarily binomial. The second picture on the right shows the same data, but uses a logarithmic scale, which is sometimes advisable in applications.

Rescaled and shifted binomial distributions with p = 0.5 for n = 4, 6, 8, 12, 16, 23, 32, 46

The same data as above on a logarithmic scale

An example from sports


A soccer player makes multiple attempts to score goals. If she has a shooting success probability of 0.25 and takes 4 shots in a match, then the number of goals she scores can be modeled as B(4, 0.25). Here p represents the probability of any given shot becoming a goal, and 1 − p represents the probability of a miss. The probabilities of the player scoring 0, 1, 2, 3, or 4 goals on 4 shots are approximately 0.3164, 0.4219, 0.2109, 0.0469, and 0.0039, respectively.
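These values can be reproduced with a short script (an added illustration, not from the article):

from math import comb

n, p = 4, 0.25
for k in range(n + 1):
    print(k, comb(n, k) * p**k * (1 - p)**(n - k))
# 0 0.31640625, 1 0.421875, 2 0.2109375, 3 0.046875, 4 0.00390625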

Poisson distribution
In probability theory and statistics, the Poisson distribution (pronounced [pwasɔ̃]), or Poisson law of small numbers,[1] is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. (The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.)
Probability mass function: the horizontal axis is the index k, the number of occurrences. The function is only defined at integer values of k; the connecting lines are only guides for the eye.

Cumulative distribution function: the horizontal axis is the index k, the number of occurrences. The CDF is discontinuous at the integers of k and flat everywhere else, because a variable that is Poisson distributed takes on only integer values.

Notation: Pois(λ)
Parameters: λ > 0 (real)
Support: k ∈ {0, 1, 2, 3, ...}
pmf: λ^k e^(−λ) / k!
cdf: Γ(floor(k + 1), λ) / floor(k)!, or e^(−λ) multiplied by the sum of λ^i / i! for i = 0, ..., floor(k) (where Γ is the incomplete gamma function and floor is the floor function)
Mean: λ
Median: approximately floor(λ + 1/3 − 0.02/λ)
Mode: ceil(λ) − 1 and floor(λ)
Variance: λ
Skewness: λ^(−1/2)
Ex. kurtosis: λ^(−1)
Entropy: approximately (1/2) log(2πeλ) for large λ
mgf: exp(λ(e^t − 1))
cf: exp(λ(e^(it) − 1))

History
The distribution was first introduced by Siméon Denis Poisson (1781–1840) and published, together with his probability theory, in 1837 in his work Recherches sur la probabilité des jugements en matière criminelle et en matière civile ("Research on the Probability of Judgments in Criminal and Civil Matters").[2] The work focused on certain random variables N that count, among other things, the number of discrete occurrences (sometimes called arrivals) that take place during a time interval of given length. A famous quote of Poisson is: "Life is good for only two things: discovering mathematics and teaching mathematics." After Poisson derived his distribution in late 1837, the first practical application was made by Ladislaus Bortkiewicz in 1898, when he was given the task of investigating the number of soldiers of the Prussian army killed accidentally by horse kicks;[3] this work introduced the Poisson distribution to the field of reliability engineering.

Applications
Applications of the Poisson distribution can be found in every field related to counting:

Electrical systems example: telephone calls arriving in a system.
Astronomy example: the number of stars in a given volume of space.
Biology example: the number of mutations on a given strand of DNA.

A classic example of the Poisson distribution is the nuclear decay of atoms. The decay of a radioactive sample is a case in point because, once a particle decays, it does not decay again.

The distribution equation


If the expected number of occurrences in a given interval is λ, then the probability that there are exactly k occurrences (k being a non-negative integer, k = 0, 1, 2, ...) is equal to

f(k; λ) = λ^k e^(−λ) / k!,

where
e is the base of the natural logarithm (e = 2.71828...),
k is the number of occurrences of an event, the probability of which is given by the function,
k! is the factorial of k, and
λ is a positive real number, equal to the expected number of occurrences during the given interval.

For instance, if the events occur on average 4 times per minute, and one is interested in the probability of an event occurring k times in a 10-minute interval, one would use a Poisson distribution as the model with λ = 10 × 4 = 40. As a function of k, this is the probability mass function. The Poisson distribution can be derived as a limiting case of the binomial distribution. The Poisson distribution can be applied to systems with a large number of possible events, each of which is rare. The Poisson distribution is sometimes called a Poissonian.
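A small sketch of the pmf above, applied to the 10-minute example (added here for illustration; the function name poisson_pmf is mine):

from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * exp(-lam) / factorial(k)

# Events occur on average 4 times per minute; probability of exactly 35 events in 10 minutes:
print(poisson_pmf(35, 40))   # about 0.0485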

Poisson noise and characterizing small occurrences


The parameter λ is not only the mean number of occurrences E[k], but also its variance (see the summary above). Thus, the number of observed occurrences fluctuates about its mean λ with a standard deviation σ_k = sqrt(λ). These fluctuations are denoted as Poisson noise or (particularly in electronics) as shot noise.

The correlation of the mean and standard deviation in counting independent discrete occurrences is useful scientifically. By monitoring how the fluctuations vary with the mean signal, one can estimate the contribution of a single occurrence, even if that contribution is too small to be detected directly. For example, the charge e on an electron can be estimated by correlating the magnitude of an electric current with its shot noise. If N electrons pass a point in a given time t on average, the mean current is I = eN / t; since the current fluctuations should be of the order σ_I = e sqrt(N) / t (i.e., the standard deviation of the Poisson process), the charge e can be estimated from the ratio t σ_I^2 / I. An everyday example is the graininess that appears as photographs are enlarged; the graininess is due to Poisson fluctuations in the number of reduced silver grains, not to the individual grains themselves. By correlating the graininess with the degree of enlargement, one can estimate the contribution of an individual grain (which is otherwise too small to be seen unaided). Many other molecular applications of Poisson noise have been developed, e.g., estimating the number density of receptor molecules in a cell membrane.
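Returning to the electron example above, here is a minimal simulation sketch (my own construction with illustrative numbers; it approximates the Poisson counts with a normal distribution, since the standard library has no Poisson sampler):

import random

random.seed(1)
e_true, t = 1.602e-19, 1.0        # electron charge (C) and observation window (s); illustrative values
N_mean = 1e6                      # mean number of electrons per window
counts = [random.gauss(N_mean, N_mean ** 0.5) for _ in range(10000)]   # normal approx. to Poisson
currents = [e_true * c / t for c in counts]
I_mean = sum(currents) / len(currents)
var_I = sum((i - I_mean) ** 2 for i in currents) / (len(currents) - 1)
print(t * var_I / I_mean)         # recovers roughly e_true = 1.6e-19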

Related distributions
If X1 ~ Pois(λ1) and X2 ~ Pois(λ2) are independent, then the difference Y = X1 − X2 follows a Skellam distribution.

If X1 ~ Pois(λ1) and X2 ~ Pois(λ2) are independent and Y = X1 + X2, then the distribution of X1 conditional on Y = y is binomial. Specifically, X1 | (Y = y) ~ B(y, λ1/(λ1 + λ2)). More generally, if X1, X2, ..., Xn are independent Poisson random variables with parameters λ1, λ2, ..., λn, then Xi conditional on the sum of all the Xj being y follows B(y, λi / (λ1 + ... + λn)).

The Poisson distribution can be derived as a limiting case of the binomial distribution as the number of trials goes to infinity and the expected number of successes remains fixed; see the law of rare events below. Therefore, it can be used as an approximation of the binomial distribution if n is sufficiently large and p is sufficiently small. There is a rule of thumb stating that the Poisson distribution is a good approximation of the binomial distribution if n is at least 20 and p is smaller than or equal to 0.05, and an excellent approximation if n ≥ 100 and np ≤ 10.[4]

For sufficiently large values of λ (say λ > 1000), the normal distribution with mean λ and variance λ (standard deviation sqrt(λ)) is an excellent approximation to the Poisson distribution. If λ is greater than about 10, then the normal distribution is a good approximation if an appropriate continuity correction is performed, i.e., P(X ≤ x), where x is a non-negative integer, is replaced by P(X ≤ x + 0.5).

Variance-stabilizing transformation: when a variable is Poisson distributed, its square root is approximately normally distributed with expected value of about sqrt(λ) and variance of about 1/4.[5] Under this transformation, the convergence to normality is far faster than for the untransformed variable. Other, slightly more complicated, variance-stabilizing transformations are available, one of which is the Anscombe transform.[6] See Data transformation (statistics) for more general uses of transformations.

If the number of arrivals in any given time interval [0, t] follows the Poisson distribution with mean λt, then the lengths of the inter-arrival times follow the exponential distribution with mean 1/λ.
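The square-root property can be checked directly by summing over the pmf (an added sketch with λ = 25 chosen as an example; truncating the support at k = 100 is a deliberate approximation, as the tail beyond that is negligible):

from math import exp, factorial, sqrt

lam = 25.0
pmf = [lam**k * exp(-lam) / factorial(k) for k in range(100)]
mean_sqrt = sum(sqrt(k) * pk for k, pk in enumerate(pmf))
var_sqrt = sum((sqrt(k) - mean_sqrt)**2 * pk for k, pk in enumerate(pmf))
print(mean_sqrt)   # about 4.97, close to sqrt(lam) = 5
print(var_sqrt)    # close to 1/4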

Occurrence
The Poisson distribution arises in connection with Poisson processes. It applies to various phenomena of discrete properties (that is, those that may happen 0, 1, 2, 3, ... times during a given period of time or in a given area) whenever the probability of the phenomenon happening is constant in time or space. Examples of events that may be modelled as a Poisson distribution include:

The number of soldiers killed by horse-kicks each year in each corps in the Prussian cavalry. This example was made famous by a book of Ladislaus Josephovich Bortkiewicz (1868–1931).
The number of yeast cells used when brewing Guinness beer. This example was made famous by William Sealy Gosset (1876–1937).[7]
The number of phone calls arriving at a call centre per minute.
The number of goals in sports involving two competing teams.
The number of deaths per year in a given age group.
The number of jumps in a stock price in a given time interval.
Under an assumption of homogeneity, the number of times a web server is accessed per minute.
The number of mutations in a given stretch of DNA after a certain amount of radiation.
The proportion of cells that will be infected at a given multiplicity of infection.

How does this distribution arise? The law of rare events


In several of the above examples, such as the number of mutations in a given sequence of DNA, the events being counted are actually the outcomes of discrete trials, and would more precisely be modelled using the binomial distribution, that is X ~ B(n, p).

In such cases n is very large and p is very small (and so the expectation np is of intermediate magnitude). Then the distribution may be approximated by the less cumbersome Poisson distribution X ~ Pois(np).

This is sometimes known as the law of rare events, since each of the n individual Bernoulli events rarely occurs. The name may be misleading because the total count of success events in a Poisson process need not be rare if the parameter np is not small. For example, the number of telephone calls to a busy switchboard in one hour follows a Poisson distribution with the events appearing frequent to the operator, but they are rare from the point of view of the average member of the population, who is very unlikely to make a call to that switchboard in that hour.

Proof

We will prove that, for fixed λ, if

Xn ~ B(n, λ/n),

then for each fixed k

Pr(Xn = k) → λ^k e^(−λ) / k!  as n → ∞.

To see the connection with the above discussion, for any binomial random variable with large n and small p, set λ = np. Note that the expectation E(Xn) = λ is fixed with respect to n.


First, recall from calculus that

lim as n → ∞ of (1 − λ/n)^n = e^(−λ);

then, since p = λ/n in this case, we have

(1 − p)^(n − k) = (1 − λ/n)^n (1 − λ/n)^(−k) → e^(−λ) · 1 = e^(−λ).

Next, note that

C(n, k) p^k = [n(n − 1)···(n − k + 1) / k!] (λ/n)^k = (λ^k / k!) · [n(n − 1)···(n − k + 1) / n^k] → λ^k / k!,

where we have taken the limit of each of the terms independently, which is permitted since there is a fixed number of terms with respect to n (there are k of them). Consequently, we have shown that

Pr(Xn = k) = C(n, k) p^k (1 − p)^(n − k) → (λ^k / k!) e^(−λ).

Generalization

We have shown that if

Xn ~ B(n, pn), where pn = λ/n, then Xn converges to Pois(λ) in distribution. This holds in the more general situation that pn is any sequence such that

n pn → λ as n → ∞.

Comparison of the Poisson distribution (black dots) and the binomial distribution with n = 10 (red line), n = 20 (blue line), and n = 1000 (green line). All distributions have a mean of 5. The horizontal axis shows the number of events k. Notice that as n gets larger, the Poisson distribution becomes an increasingly better approximation for the binomial distribution with the same mean.

2-dimensional Poisson process

Pr(N(D) = k) = (λ|D|)^k e^(−λ|D|) / k!,

where
e is the base of the natural logarithm (e = 2.71828...),
k is the number of occurrences of an event, the probability of which is given by the function,
k! is the factorial of k,
D is the 2-dimensional region,
|D| is the area of the region, and
N(D) is the number of points in the process in region D.

The expected value of a Poisson-distributed random variable is equal to λ, and so is its variance. The higher moments of the Poisson distribution are Touchard polynomials in λ, whose coefficients have a combinatorial meaning. In fact, when the expected value of the Poisson distribution is 1, then Dobinski's formula says that the nth moment equals the number of partitions of a set of size n.

The mode of a Poisson-distributed random variable with non-integer λ is equal to floor(λ), the largest integer less than or equal to λ. When λ is a positive integer, the modes are λ and λ − 1.

Given one event (or any number of events), the expected number of other events is independent of it and so is still λ. If reproductive success follows a Poisson distribution with expected number of offspring λ, then for a given individual the expected number of (half-)siblings per parent is also λ. If full siblings are rare, the total expected number of siblings is 2λ.

Sums of Poisson-distributed random variables: if the Xi follow Poisson distributions with parameters λi and are independent, then their sum

also follows a Poisson distribution, whose parameter is the sum of the component parameters. A converse is Raikov's theorem, which says that if the sum of two independent random variables is Poisson-distributed, then so is each of those two independent random variables.

The sum of normalised square deviations is approximately distributed as chi-squared if the mean is of a moderate size (λ > 5 is suggested).[8]

If X1, ..., Xn are observations from independent Poisson distributions with means λ1, ..., λn, then:

The moment-generating function of the Poisson distribution with expected value λ is

M(t) = E[e^(tX)] = exp(λ(e^t − 1)).

All of the cumulants of the Poisson distribution are equal to the expected value λ. The nth factorial moment of the Poisson distribution is λ^n. The Poisson distributions are infinitely divisible probability distributions. The directed Kullback-Leibler divergence between Pois(λ) and Pois(λ0) is given by

D_KL(λ || λ0) = λ0 − λ + λ log(λ / λ0).

An upper bound for the tail probability of a Poisson random variable X ~ Pois(λ) can be derived using a Chernoff bound argument:[9]

P(X ≥ x) ≤ e^(−λ) (eλ)^x / x^x, for x > λ.

Similarly,

P(X ≤ x) ≤ e^(−λ) (eλ)^x / x^x, for x < λ.

Evaluating the Poisson distribution

Although the value of the Poisson pmf f(k, λ) is bounded, its numerator λ^k e^(−λ) and denominator k! can reach extreme values for large values of k or λ. If the Poisson distribution is evaluated on a computer with limited precision by first evaluating its numerator and denominator and then dividing the two, then a significant loss of precision may occur. For example, with common double precision a complete loss of precision occurs if f(150, 150) is evaluated in this manner. A more robust evaluation method is to work in the logarithmic domain and exponentiate only at the end:

f(k, λ) = exp(k ln λ − λ − ln Γ(k + 1)).
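A small sketch of this log-domain evaluation (my addition, not from the article), using the standard-library log-gamma function:

from math import exp, log, lgamma

def poisson_pmf_robust(k: int, lam: float) -> float:
    # Work in log space: log f = k log(lam) - lam - log(k!), then exponentiate once.
    return exp(k * log(lam) - lam - lgamma(k + 1))

print(poisson_pmf_robust(150, 150))   # about 0.0326; the naive numerator/denominator route overflows in double precision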

Generating Poisson-distributed random variables

A simple algorithm to generate random Poisson-distributed numbers (pseudo-random number sampling) has been given by Knuth (see References below):

algorithm poisson random number (Knuth):
    init:
        Let L ← exp(−λ), k ← 0 and p ← 1.
    do:
        k ← k + 1.
        Generate uniform random number u in [0, 1] and let p ← p × u.
    while p > L.
    return k − 1.
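A direct Python transcription of the pseudocode above (a sketch for illustration, not an official reference implementation):

import random
from math import exp

def knuth_poisson(lam: float) -> int:
    # Multiply uniforms until the running product drops below exp(-lam);
    # the expected number of iterations, and hence the runtime, grows linearly with lam.
    L = exp(-lam)
    k, p = 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= L:
            return k - 1

random.seed(42)
sample = [knuth_poisson(4.0) for _ in range(10000)]
print(sum(sample) / len(sample))   # close to lam = 4.0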
While simple, the complexity is linear in λ. There are many other algorithms to overcome this; some are given in Ahrens & Dieter (see References below). Also, for large values of λ, there may be numerical stability issues because of the term e^(−λ). One solution for large values of λ is rejection sampling; another is to use a Gaussian approximation to the Poisson. Inverse transform sampling is simple and efficient for small values of λ, and requires only one uniform random number u per sample: cumulative probabilities are examined in turn until one exceeds u.

Parameter estimation

Maximum likelihood
Given a sample of n measured values ki (i = 1, ..., n), we wish to estimate the value of the parameter λ of the Poisson population from which the sample was drawn. To calculate the maximum likelihood value, we form the log-likelihood function

L(λ) = (sum of ki) ln λ − nλ − sum of ln(ki!).

Take the derivative of L with respect to λ and equate it to zero:

dL/dλ = (1/λ) (sum of ki) − n = 0.

Solving for λ yields a stationary point, which, if the second derivative is negative, is the maximum-likelihood estimate of λ:

λ_MLE = (1/n) (sum of ki), the sample mean.

Checking the second derivative, it is found to be negative for all λ and ki greater than zero; therefore this stationary point is indeed a maximum of the initial likelihood function:

d²L/dλ² = −(1/λ²) (sum of ki) < 0.

Since each observation has expectation λ, so does the sample mean. Therefore, the maximum likelihood estimate is an unbiased estimator of λ. It is also an efficient estimator, i.e. its estimation variance achieves the Cramér-Rao lower bound (CRLB); hence it is the minimum-variance unbiased estimator (MVUE). It can also be proved that the sample mean is a complete and sufficient statistic for λ.
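In code, the maximum likelihood estimate is simply the sample mean (a tiny added sketch; the counts below are hypothetical):

counts = [2, 5, 3, 4, 6, 1, 3, 4]      # hypothetical observed Poisson counts
lam_mle = sum(counts) / len(counts)
print(lam_mle)                         # 3.5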

Bayesian inference

In Bayesian inference, the conjugate prior for the rate parameter λ of the Poisson distribution is the Gamma distribution. Let

λ ~ Gamma(α, β)

denote that λ is distributed according to the gamma density g parameterized in terms of a shape parameter α and an inverse scale parameter β:

g(λ; α, β) = (β^α / Γ(α)) λ^(α − 1) e^(−βλ),  for λ > 0.

Then, given the same sample of n measured values ki as before, and a prior of Gamma(α, β), the posterior distribution is

λ ~ Gamma(α + sum of ki, β + n).

The posterior mean E[λ] approaches the maximum likelihood estimate λ_MLE in the limit as α → 0 and β → 0.
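A quick sketch of the conjugate update (my addition; the prior hyperparameters and counts are illustrative choices, not values from the article):

alpha, beta = 1.0, 1.0               # Gamma(1, 1) prior, i.e. Exponential(1)
counts = [2, 5, 3, 4, 6, 1, 3, 4]    # same hypothetical sample as above
alpha_post = alpha + sum(counts)     # 1 + 28 = 29
beta_post = beta + len(counts)       # 1 + 8 = 9
print(alpha_post / beta_post)        # posterior mean about 3.22, pulled slightly toward the prior mean of 1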

The posterior predictive distribution of additional data is a gamma-Poisson (i.e. negative binomial) distribution.

Confidence interval

A simple and rapid method to calculate an approximate confidence interval for the estimation of λ is proposed in Guerriero et al. (2009). This method provides a good approximation of the confidence interval limits for samples containing at least 15-20 elements. Denoting by N the number of sampled points or events and by L the length of the sample line (or the time interval), the upper and lower limits of the 95% confidence interval are given by:
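The specific expression from Guerriero et al. is not reproduced in this extract. As a placeholder illustration only, the following sketch computes the widely used normal-approximation 95% interval for a Poisson rate, (N ± 1.96 sqrt(N)) / L, which is not necessarily the same formula proposed in that paper; the values of N and L are hypothetical:

from math import sqrt

N, L = 42, 10.0                       # 42 events observed over a sample line (or time) of length 10
rate = N / L
lower = (N - 1.96 * sqrt(N)) / L
upper = (N + 1.96 * sqrt(N)) / L
print(rate, (lower, upper))           # 4.2 events per unit length, interval roughly (2.93, 5.47)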

The "law of small numbers" The word law is sometimes used as a synonym of probability distribution, and convergence in law means convergence in distribution. Accordingly, the Poisson distribution is sometimes called the law of small numbers because it is the probability distribution of the number of occurrences of an event that happens rarely but has very many opportunities to happen. The Law of Small Numbers is a book by Ladislaus Bortkiewicz about the Poisson distribution, published in 1898. Some have suggested that the Poisson distribution should have been called the Bortkiewicz distribution.
