Binomial distribution

From Wikipedia, the free encyclopedia

Binomial
Probability mass function
Cumulative distribution function
Parameters <math>n \geq 0</math> number of trials (integer)
<math>0\leq p \leq 1</math> success probability (real)
Support <math>k \in \{0,\dots,n\}\!</math>
Probability mass function (pmf) <math>{n\choose k} p^k (1-p)^{n-k} \!</math>
Cumulative distribution function (cdf) <math>I_{1-p}(n-\lfloor k\rfloor, 1+\lfloor k\rfloor) \!</math>
Mean <math>np\!</math>
Median one of <math>\{\lfloor np\rfloor-1, \lfloor np\rfloor, \lfloor np\rfloor+1\}</math>
Mode <math>\lfloor (n+1)\,p\rfloor\!</math>
Variance <math>np(1-p)\!</math>
Skewness <math>\frac{1-2p}{\sqrt{np(1-p)}}\!</math>
Excess kurtosis <math>\frac{1-6p(1-p)}{np(1-p)}\!</math>
Entropy <math> \frac{1}{2} \ln \left( 2 \pi n e p (1-p) \right) + O \left( \frac{1}{n} \right) </math>
mgf <math>(1-p + pe^t)^n \!</math>
Char. func. <math>(1-p + pe^{it})^n \!</math>

In probability theory and statistics, the binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. Such a success/failure experiment is also called a Bernoulli experiment or Bernoulli trial. When n = 1, the binomial distribution reduces to the Bernoulli distribution. The binomial distribution is the basis for the binomial test of statistical significance.

Example

A typical example is the following: assume 5% of the population is green-eyed, and you pick 500 people at random. The number of green-eyed people you pick is a random variable X which follows a binomial distribution with n = 500 and p = 0.05, provided the people are picked with replacement. (When sampling without replacement from a large population, the binomial distribution is only an approximation.)

Specification

Probability mass function

In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ B(n, p). The probability of getting exactly k successes is given by the probability mass function:

<math>f(k;n,p)={n\choose k}p^k(1-p)^{n-k}\,</math>

for k=0,1,2,...,n and where

<math>{n\choose k}=\frac{n!}{k!(n-k)!}</math>

is the binomial coefficient (hence the name of the distribution) "n choose k" (also denoted C(n, k) or nCk). The formula can be understood as follows: we want k successes, which contribute <math>p^k</math>, and n − k failures, which contribute <math>(1-p)^{n-k}</math>. However, the k successes can occur anywhere among the n trials, and there are C(n, k) different ways of distributing k successes in a sequence of n trials.
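The formula translates almost directly into code. The following is a minimal sketch in Python using only the standard library; the function name `binomial_pmf` is an illustrative choice, not part of any established API.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    if not 0 <= k <= n:
        return 0.0  # outside the support {0, ..., n}
    # C(n, k) orderings, each with probability p^k (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# A fair coin tossed 4 times: exactly 2 heads has probability 6/16.
print(binomial_pmf(2, 4, 0.5))

# The probabilities over the whole support should sum to 1.
print(sum(binomial_pmf(k, 4, 0.5) for k in range(5)))
```

For very large n the integer binomial coefficient can become too large to convert to a floating-point number, so a production implementation would more likely work with logarithms.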

Reference tables of binomial probabilities are usually filled in only up to n/2 values. This is because for k > n/2, the probability can be calculated from its complement as

<math>f(k;n,p)=f(n-k;n,1-p).\,\!</math>

Thus one looks up a different k (namely n − k) and a different p (namely 1 − p); the binomial distribution is not symmetric in general.
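The complement identity can be checked numerically. The sketch below assumes illustrative values n = 10 and p = 0.3 and compares both sides of the identity for every k, up to floating-point rounding.

```python
from math import comb, isclose

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # illustrative values, not taken from the article
for k in range(n + 1):
    left = binomial_pmf(k, n, p)           # f(k; n, p)
    right = binomial_pmf(n - k, n, 1 - p)  # f(n - k; n, 1 - p)
    assert isclose(left, right)

print("identity holds for all k in 0..n")
```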

Cumulative distribution function

The cumulative distribution function can be expressed in terms of the regularized incomplete beta function, as follows:

<math> F(k;n,p) = \Pr(X \le k) = I_{1-p}(n-k, k+1) \!</math>

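provided k is an integer and 0 ≤ k < n (for k = n the cumulative probability is 1). If x is not necessarily an integer or not necessarily nonnegative, the cumulative distribution function can be written as a sum, where the empty sum for x < 0 equals 0:

<math>F(x;n,p) = \Pr(X \le x) = \sum_{j=0}^{\lfloor x \rfloor} {n\choose j}p^j(1-p)^{n-j}</math>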
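The summation form can be sketched in a few lines of Python. The function name `binomial_cdf` is an illustrative choice, and the handling of x < 0 and x > n is written out explicitly as an assumption about how the sum should be read at the edges.

```python
from math import comb, floor

def binomial_cdf(x: float, n: int, p: float) -> float:
    """Pr(X <= x) for X ~ B(n, p); x may be any real number."""
    if x < 0:
        return 0.0                 # empty sum
    upper = min(floor(x), n)       # terms beyond n are zero anyway
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(upper + 1))

print(binomial_cdf(1.7, 4, 0.5))   # same as Pr(X <= 1) = 5/16
print(binomial_cdf(-0.5, 4, 0.5))  # below the support
print(binomial_cdf(10, 4, 0.5))    # above the support
```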

For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's inequality yields the bound

<math> F(k;n,p) \leq \exp\left(-2 \frac{(np-k)^2}{n}\right), \!</math>

and Chernoff's inequality can be used to derive the bound

<math> F(k;n,p) \leq \exp\left(-\frac{1}{2\,p} \frac{(np-k)^2}{n}\right). \!</math>
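To get a feel for how tight these bounds are, the sketch below compares them with the exact lower tail for the values n = 500 and p = 0.05 from the example section. The function names are illustrative. Comparing the two exponents shows that the Chernoff-type bound is the smaller of the two whenever p < 1/4.

```python
from math import comb, exp

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact lower tail Pr(X <= k) for integer k."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def hoeffding_bound(k: int, n: int, p: float) -> float:
    """exp(-2 (np - k)^2 / n), valid for k <= np."""
    return exp(-2 * (n * p - k) ** 2 / n)

def chernoff_bound(k: int, n: int, p: float) -> float:
    """exp(-(np - k)^2 / (2 p n)), valid for k <= np."""
    return exp(-((n * p - k) ** 2) / (2 * p * n))

n, p = 500, 0.05
for k in (5, 10, 15, 20, 25):
    print(k, binomial_cdf(k, n, p),
          hoeffding_bound(k, n, p), chernoff_bound(k, n, p))
```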

Mean, standard deviation, and mode

If X ~ B(n, p) (that is, X is a binomially distributed random variable), then the expected value of X is

<math>\operatorname{E}(X)=np\,\!</math>

and the variance is

<math>\operatorname{Var}(X)=np(1-p).\,\!</math>

This fact is easily proven as follows. Suppose first that we have exactly one Bernoulli trial. We have two possible outcomes, 1 and 0, with the first having probability p and the second having probability 1 − p; the mean for this trial is given by μ = p. Using the definition of variance, we have

<math>\sigma^2= \left(1 - p\right)^2p + (0-p)^2(1 - p) = p(1-p).</math>

Now consider n such trials (i.e. the general binomial distribution). By linearity of expectation the means of the individual trials add, giving E(X) = np. Since the trials are independent, the variances of the individual trials also add, giving

<math>\sigma^2_n = \sum_{k=1}^n \sigma^2 = np(1 - p). \quad \Box</math>
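The closed forms can also be checked numerically by computing the mean and variance directly from the probability mass function. This is a sketch using n = 500 and p = 0.05 from the example section, for which np = 25 and np(1 − p) = 23.75.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 500, 0.05
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance straight from their definitions.
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))

print(mean, n * p)                # both close to 25
print(variance, n * p * (1 - p))  # both close to 23.75
```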

The most likely value or mode of X is given by the largest integer less than or equal to (n + 1)p; if m = (n + 1)p is itself an integer, then m − 1 and m are both modes.
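The mode rule can be sketched as a small function. Because testing whether (n + 1)p is an integer is fragile in floating point, this sketch assumes p is supplied as an exact `fractions.Fraction`. The separate handling of p = 0 and p = 1 is an addition for illustration, since in those cases all probability sits on a single value.

```python
from fractions import Fraction
from math import floor

def binomial_modes(n: int, p: Fraction) -> list:
    """Return the mode(s) of B(n, p) using the floor((n + 1) p) rule."""
    if p == 0:
        return [0]            # all mass on k = 0
    if p == 1:
        return [n]            # all mass on k = n
    m = (n + 1) * p           # exact rational arithmetic
    if m.denominator == 1:    # (n + 1) p is an integer: two modes
        return [int(m) - 1, int(m)]
    return [floor(m)]

print(binomial_modes(500, Fraction(1, 20)))  # (n + 1) p = 25.05
print(binomial_modes(9, Fraction(1, 2)))     # (n + 1) p = 5
```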

Relations to other distributions

Sums of binomials

If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables, then X + Y is again a binomial variable; its distribution is

<math>X+Y \sim B(n+m, p).\,</math>
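One way to see this numerically is to convolve the two probability mass functions, which gives the distribution of a sum of independent variables, and compare the result with B(n + m, p). The values n = 6, m = 4 and p = 0.3 below are illustrative assumptions.

```python
from math import comb, isclose

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def convolve(a: list, b: list) -> list:
    """Distribution of the sum of two independent integer variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

n, m, p = 6, 4, 0.3
pmf_x = [binomial_pmf(k, n, p) for k in range(n + 1)]
pmf_y = [binomial_pmf(k, m, p) for k in range(m + 1)]

pmf_sum = convolve(pmf_x, pmf_y)
expected = [binomial_pmf(k, n + m, p) for k in range(n + m + 1)]

print(all(isclose(a, b, abs_tol=1e-12) for a, b in zip(pmf_sum, expected)))
```

The same check fails if X and Y have different success probabilities, which is why the statement requires a common p.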

Normal approximation

Binomial PDF and normal approximation for n = 6 and p = 0.5.

If n is large enough, the skew of the distribution is not too great, and a suitable continuity correction is used, then an excellent approximation to B(n, p) is given by the normal distribution

<math> \operatorname{N}(np, np(1-p)).\,\!</math>

Various rules of thumb may be used to decide whether n is large enough. One rule is that both np and n(1 − p) must be greater than 5. However, the specific number varies from source to source, and depends on how good an approximation one wants; some sources give 10. Another commonly used rule holds that the above normal approximation is appropriate only if

<math>\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1-p)} \in [0,n].</math>

The following is an example of applying a continuity correction: Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction. Warning: The normal approximation gives inaccurate results unless a continuity correction is used.
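The effect of the continuity correction can be illustrated with a short sketch. The parameters n = 20 and p = 0.5 are assumptions chosen for illustration (the text above does not fix them), and `statistics.NormalDist` from the Python standard library supplies the normal cumulative distribution function.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr(X <= k) for integer k."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

n, p, k = 20, 0.5, 8  # hypothetical values for illustration
normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

exact = binomial_cdf(k, n, p)
with_correction = normal.cdf(k + 0.5)  # Pr(Y <= 8.5)
without_correction = normal.cdf(k)     # Pr(Y <= 8)

print("exact:             ", exact)
print("with correction:   ", with_correction)
print("without correction:", without_correction)
```

For these values the corrected approximation lies much closer to the exact probability than the uncorrected one.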

This approximation is a huge time-saver (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1733. Nowadays, it can be seen as a consequence of the central limit theorem since B(n, p) is a sum of n independent, identically distributed 0-1 indicator variables.

For example, suppose you randomly sample n people out of a large population and ask them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If you sampled groups of n people repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation σ = <math>\sqrt{p(1-p)/n}</math>. Large sample sizes n are good because the standard deviation gets smaller, which allows a more precise estimate of the unknown parameter p.
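The dependence of the standard deviation on the sample size is easy to tabulate. The sketch below assumes a true proportion p = 0.5 purely for illustration; the function name is likewise an illustrative choice.

```python
from math import sqrt

def proportion_std_dev(p: float, n: int) -> float:
    """Standard deviation of the sample proportion for sample size n."""
    return sqrt(p * (1 - p) / n)

# Quadrupling the sample size halves the standard deviation.
for n in (100, 400, 1600):
    print(n, proportion_std_dev(0.5, n))
```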

Poisson approximation

The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to one rule of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, and also if n ≥ 100 and np ≤ 10.<ref>NIST/SEMATECH, '6.3.3.1. Counts Control Charts', e-Handbook of Statistical Methods, <http://www.itl.nist.gov/div898/handbook/pmc/section3/pmc331.htm> [accessed 25 October 2006]</ref>
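The quality of the approximation can be inspected by comparing the two probability mass functions point by point. The sketch below assumes n = 100 and p = 0.05, values chosen to satisfy the rule of thumb, so that λ = np = 5.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of exactly k events with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 100, 0.05  # illustrative values with n >= 20 and p <= 0.05
lam = n * p

# Largest pointwise gap between the two distributions.
max_diff = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(n + 1))
print(max_diff)
```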

Limits of binomial distributions

  • As n approaches ∞ and p approaches 0 while np remains fixed at λ > 0 (or at least np approaches λ > 0), the Binomial(n, p) distribution approaches the Poisson distribution with expected value λ.
  • As n approaches ∞ while p remains fixed, the distribution of
<math>{X-np \over \sqrt{np(1-p)\ }}</math>
approaches the normal distribution with expected value 0 and variance 1.
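
The first limit can be checked directly from the probability mass function. The following is a sketch of the standard argument for a fixed k, under the simplifying assumption that p = λ/n exactly:

<math>{n\choose k}\left(\frac{\lambda}{n}\right)^k\left(1-\frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^k}{k!}\cdot\frac{n(n-1)\cdots(n-k+1)}{n^k}\cdot\left(1-\frac{\lambda}{n}\right)^{n}\left(1-\frac{\lambda}{n}\right)^{-k} \to \frac{\lambda^k e^{-\lambda}}{k!}</math>

as n approaches ∞, since the second factor tends to 1, the third tends to <math>e^{-\lambda}</math>, and the fourth tends to 1.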

References

<references/>

  • Abdi, H. (2007). "Binomial Distribution: Binomial and Sign Tests". In N.J. Salkind (Ed.), Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage.
  • Cheatam & Steele, "Uniform Distributive Norms", Los Angeles: Time-Warner, 1998.
