
Yule-Simon distribution

From Wikipedia, the free encyclopedia

Yule-Simon
Probability mass function
Image:Yule-Simon distribution PMF.png
Yule-Simon PMF on a log-log scale. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.)
Cumulative distribution function
Image:Yule-Simon distribution CMF.png
Yule-Simon CMF. (Note that the function is only defined at integer values of k. The connecting lines do not indicate continuity.)
Parameters <math>\rho>0\,</math> shape (real)
Support <math>k \in \{1,2,\dots\}\,</math>
Probability mass function (pmf) <math>\rho\,\mathrm{B}(k, \rho+1)\,</math>
Cumulative distribution function (cdf) <math>1 - k\,\mathrm{B}(k, \rho+1)\,</math>
Mean <math>\frac{\rho}{\rho-1}\,</math> for <math>\rho>1\,</math>
Median
Mode <math>1\,</math>
Variance <math>\frac{\rho^2}{(\rho-1)^2\;(\rho-2)}\,</math> for <math>\rho>2\,</math>
Skewness <math>\frac{(\rho+1)^2\;\sqrt{\rho-2}}{(\rho-3)\;\rho}\,</math> for <math>\rho>3\,</math>
Excess kurtosis <math>\rho+3+\frac{11\rho^3-49\rho-22}{(\rho-4)\;(\rho-3)\;\rho}\,</math> for <math>\rho>4\,</math>
Entropy
Moment-generating function (mgf) <math>\frac{\rho}{\rho+1}\;{}_2F_1(1,1; \rho+2; e^t)\,e^t \,</math>
Characteristic function <math>\frac{\rho}{\rho+1}\;{}_2F_1(1,1; \rho+2; e^{i\,t})\,e^{i\,t} \,</math>

In probability and statistics, the Yule-Simon distribution is a discrete probability distribution named after Udny Yule and Herbert Simon. Simon originally called it the Yule distribution.

The probability mass function of the Yule-Simon(ρ) distribution is

<math>f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1), \,</math>

for integer <math>k \geq 1</math> and real <math>\rho > 0</math>, where <math>\mathrm{B}</math> is the beta function. Equivalently the pmf can be written in terms of the falling factorial as

<math>f(k;\rho) = \frac{\rho\,\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}}, \,</math>

where <math>\Gamma</math> is the gamma function. Thus, if <math>\rho</math> is an integer,

<math>f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!}. \,</math>
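The formulas above can be evaluated numerically. The following is a minimal sketch, assuming only Python's standard library; the function names (`log_beta`, `yule_simon_pmf`, `yule_simon_pmf_integer_rho`, `yule_simon_cdf`) are illustrative and are not taken from any particular package. The beta function is computed through log-gamma so that large values of k do not overflow.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def yule_simon_pmf(k, rho):
    """f(k; rho) = rho * B(k, rho + 1) for integer k >= 1 and real rho > 0."""
    if k < 1 or rho <= 0:
        raise ValueError("need integer k >= 1 and rho > 0")
    return rho * math.exp(log_beta(k, rho + 1.0))


def yule_simon_pmf_integer_rho(k, rho):
    """Factorial form of the pmf; only valid when rho is a positive integer."""
    return rho * math.factorial(rho) * math.factorial(k - 1) / math.factorial(k + rho)


def yule_simon_cdf(k, rho):
    """F(k; rho) = 1 - k * B(k, rho + 1), the cdf listed in the summary table."""
    return 1.0 - k * math.exp(log_beta(k, rho + 1.0))
```

For integer ρ the two pmf functions should agree up to floating-point rounding, and the partial sums of the pmf should reproduce the cdf.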

The probability mass function f has the property that for sufficiently large k we have

<math>f(k;\rho) \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}}. \,</math>
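One way to see how quickly the power-law approximation takes hold is to look at the ratio of the exact pmf to the approximating term. The sketch below is illustrative only: the helper names `power_law_tail` and `tail_ratio` are made up for this example, and it assumes nothing beyond Python's standard library.

```python
import math


def yule_simon_pmf(k, rho):
    """Exact pmf rho * B(k, rho + 1), computed via log-gamma."""
    log_b = math.lgamma(k) + math.lgamma(rho + 1.0) - math.lgamma(k + rho + 1.0)
    return rho * math.exp(log_b)


def power_law_tail(k, rho):
    """Large-k approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * math.gamma(rho + 1.0) / k ** (rho + 1.0)


def tail_ratio(k, rho):
    """Exact pmf divided by its power-law approximation; tends to 1 as k grows."""
    return yule_simon_pmf(k, rho) / power_law_tail(k, rho)


if __name__ == "__main__":
    for k in (10, 100, 1000, 10000):
        print(k, tail_ratio(k, 1.0))
```

For ρ = 1 the ratio has the closed form k/(k+1), so the approximation is already within about ten percent at k = 10 and improves like 1/k after that.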

This means that the tail of the Yule-Simon distribution is a realization of Zipf's law: <math>f(k;\rho)</math> can be used to model, for example, the relative frequency of the <math>k</math>th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of <math>k</math>.

Occurrence

The Yule-Simon distribution arises as a continuous mixture of geometric distributions. Specifically, assume that <math>W</math> follows an exponential distribution with rate <math>\rho</math> (equivalently, scale <math>1/\rho</math>):

<math>W \sim \mathrm{Exponential}(\rho)\,</math>
<math>h(w;\rho) = \rho \, \exp(-\rho\,w)\,</math>

Then, conditional on <math>W</math>, a Yule-Simon distributed variable <math>K</math> has the following geometric distribution:

<math>K \sim \mathrm{Geometric}(\exp(-W))\,</math>

The pmf of a geometric distribution is

<math>g(k; p) = p \, (1-p)^{k-1}\,</math>

for <math>k\in\{1,2,\dots\}</math>. The Yule-Simon pmf is then the following exponential-geometric mixture distribution:

<math>f(k;\rho) = \int_0^{\infty} g(k;\exp(-w))\,h(w;\rho)\,dw = \rho \int_0^{\infty} e^{-(\rho+1)w}\,(1-e^{-w})^{k-1}\,dw = \rho\,\mathrm{B}(k, \rho+1). \,</math>
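The mixture representation also suggests a simple way to simulate Yule-Simon variates: draw W from the exponential distribution, then draw K from the geometric distribution with success probability exp(−W). The following is a rough sketch under stated assumptions rather than a reference implementation. The name `sample_yule_simon` is hypothetical, the geometric draw uses inversion of a uniform variate, and ρ is assumed not to be so small that exp(−W) underflows to zero.

```python
import math
import random


def sample_yule_simon(rho, rng):
    """Draw one K via the exponential-geometric mixture (illustrative sketch)."""
    w = rng.expovariate(rho)      # W ~ Exponential with rate rho
    p = math.exp(-w)              # success probability of the geometric stage
    if p >= 1.0:                  # W == 0: success on the first trial
        return 1
    if p == 0.0:                  # exp(-W) underflowed; K would not fit a sensible range
        raise OverflowError("rho too small for this sketch")
    u = 1.0 - rng.random()        # uniform on (0, 1], so log(u) is finite
    # Inversion for the geometric on {1, 2, ...}: P(K > k) = (1 - p)**k
    return int(math.log(u) / math.log1p(-p)) + 1


if __name__ == "__main__":
    rng = random.Random(0)
    rho = 2.0
    draws = [sample_yule_simon(rho, rng) for _ in range(50000)]
    # Compare the empirical frequency of K = 1 with f(1; rho) = rho / (rho + 1)
    print(draws.count(1) / len(draws), rho / (rho + 1.0))
```

Because the variance is infinite for ρ ≤ 2, sample moments converge slowly or not at all in that range; comparing empirical frequencies with the pmf is a more stable check than comparing means.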

Generalizations

Simon also hinted at a two-parameter generalization of the Yule-Simon distribution, in which the beta function is replaced by an incomplete beta function. The probability mass function of the generalized Yule-Simon(ρ, α) distribution is defined as

<math>
f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}} \;
       \mathrm{B}_{1-\alpha}(k, \rho+1)
,
\,</math>

with <math>0 \leq \alpha < 1</math>. For <math>\alpha = 0</math> the ordinary Yule-Simon(ρ) distribution is obtained as a special case.
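The generalized pmf can be evaluated once the incomplete beta function is available. The sketch below approximates B_x(a, b) as the integral of t^(a−1) (1−t)^(b−1) from 0 to x using composite Simpson's rule, which is a substitute chosen here to avoid external dependencies and is not a method from the source. The names `incomplete_beta` and `generalized_yule_simon_pmf` and the step count `n` are assumptions for illustration. Accuracy degrades when α = 0 and ρ is a non-integer below 1, because the integrand is then not smooth at t = 1.

```python
import math


def incomplete_beta(x, a, b, n=2000):
    """Approximate B_x(a, b) with composite Simpson's rule (n must be even)."""
    if x == 0.0:
        return 0.0
    h = x / n

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0   # Simpson weights 4, 2, 4, ..., 4
        total += weight * integrand(i * h)
    return total * h / 3.0


def generalized_yule_simon_pmf(k, rho, alpha):
    """f(k; rho, alpha) = rho / (1 - alpha**rho) * B_{1-alpha}(k, rho + 1)."""
    if k < 1 or rho <= 0 or not (0.0 <= alpha < 1.0):
        raise ValueError("need integer k >= 1, rho > 0 and 0 <= alpha < 1")
    return rho / (1.0 - alpha ** rho) * incomplete_beta(1.0 - alpha, k, rho + 1.0)
```

Setting `alpha=0.0` should reproduce the ordinary Yule-Simon pmf, and summing the generalized pmf over k should give a value close to 1.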

Plot of the Yule-Simon(1) distribution (red) and its asymptotic Zipf law (blue)

References

  • Herbert A. Simon, On a Class of Skew Distribution Functions, Biometrika 42(3/4): 425–440, December 1955.
  • Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)