Yule-Simon distribution
From Wikipedia, the free encyclopedia
Figure: Probability mass function. Yule-Simon PMF on a log-log scale. (The function is defined only at integer values of k; the connecting lines do not indicate continuity.)
Figure: Cumulative distribution function. Yule-Simon CDF. (The function is defined only at integer values of k; the connecting lines do not indicate continuity.)
| Property | Value |
|---|---|
| Parameters | <math>\rho>0\,</math> shape (real) |
| Support | <math>k \in \{1,2,\dots\}\,</math> |
| Probability mass function (pmf) | <math>\rho\,\mathrm{B}(k, \rho+1)\,</math> |
| Cumulative distribution function (cdf) | <math>1 - k\,\mathrm{B}(k, \rho+1)\,</math> |
| Mean | <math>\frac{\rho}{\rho-1}\,</math> for <math>\rho>1\,</math> |
| Mode | <math>1\,</math> |
| Variance | <math>\frac{\rho^2}{(\rho-1)^2\;(\rho-2)}\,</math> for <math>\rho>2\,</math> |
| Skewness | <math>\frac{(\rho+1)^2\;\sqrt{\rho-2}}{(\rho-3)\;\rho}\,</math> for <math>\rho>3\,</math> |
| Excess kurtosis | <math>\rho+3+\frac{11\rho^3-49\rho-22}{(\rho-4)\;(\rho-3)\;\rho}\,</math> for <math>\rho>4\,</math> |
| Moment-generating function (mgf) | <math>\frac{\rho}{\rho+1}\;{}_2F_1(1,1; \rho+2; e^t)\,e^t\,</math> |
| Characteristic function | <math>\frac{\rho}{\rho+1}\;{}_2F_1(1,1; \rho+2; e^{i\,t})\,e^{i\,t}\,</math> |
In probability and statistics, the Yule-Simon distribution is a discrete probability distribution named after Udny Yule and Herbert Simon. Simon originally called it the Yule distribution.
The probability mass function of the Yule-Simon(ρ) distribution is
- <math>f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1), \,</math>
for integer <math>k \geq 1</math> and real <math>\rho > 0</math>, where <math>\mathrm{B}</math> is the beta function. Equivalently the pmf can be written in terms of the falling factorial as
- <math>f(k;\rho) = \frac{\rho\,\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}}, \,</math>
where <math>\Gamma</math> is the gamma function. Thus, if <math>\rho</math> is an integer,
- <math>f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!}. \,</math>
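The two forms of the pmf above can be compared numerically. The following is a minimal Python sketch, not a reference implementation; the function names `yule_simon_pmf` and `yule_simon_pmf_integer_rho` are illustrative choices, and the beta function is evaluated through `math.lgamma` on the assumption that working in log space is an acceptable way to avoid overflow for large k.

```python
import math


def yule_simon_pmf(k, rho):
    """pmf f(k; rho) = rho * B(k, rho + 1) for integer k >= 1 and real rho > 0."""
    if k < 1 or rho <= 0:
        raise ValueError("need integer k >= 1 and rho > 0")
    # log B(a, b) = lgamma(a) + lgamma(b) - lgamma(a + b)
    log_beta = math.lgamma(k) + math.lgamma(rho + 1.0) - math.lgamma(k + rho + 1.0)
    return rho * math.exp(log_beta)


def yule_simon_pmf_integer_rho(k, rho):
    """Factorial form rho * rho! * (k-1)! / (k+rho)!, valid only for integer rho."""
    numerator = rho * math.factorial(rho) * math.factorial(k - 1)
    return numerator / math.factorial(k + rho)


if __name__ == "__main__":
    # The two forms should agree whenever rho is a positive integer.
    for k in (1, 2, 5, 20):
        print(k, yule_simon_pmf(k, 2.0), yule_simon_pmf_integer_rho(k, 2))
```

For ρ = 1 and k = 1 both forms reduce to B(1, 2) = 1/2, which gives a quick hand check.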
The probability mass function f has the property that for sufficiently large k we have
- <math>f(k;\rho) \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}}. \,</math>
This means that the tail of the Yule-Simon distribution is a realization of Zipf's law: <math>f(k;\rho)</math> can be used to model, for example, the relative frequency of the <math>k</math>th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of <math>k</math>.
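The power-law approximation of the tail can be illustrated by computing the ratio of the exact pmf to the approximating term for increasing k. This is a rough sketch under the same assumptions as any floating-point check; the helper names `power_law_tail` and `tail_ratio` are hypothetical and chosen only for this example.

```python
import math


def yule_simon_pmf(k, rho):
    """Exact pmf rho * B(k, rho + 1), computed in log space."""
    log_beta = math.lgamma(k) + math.lgamma(rho + 1.0) - math.lgamma(k + rho + 1.0)
    return rho * math.exp(log_beta)


def power_law_tail(k, rho):
    """Large-k approximation rho * Gamma(rho + 1) / k**(rho + 1)."""
    return rho * math.gamma(rho + 1.0) / k ** (rho + 1.0)


def tail_ratio(k, rho):
    """Ratio of the exact pmf to its power-law approximation."""
    return yule_simon_pmf(k, rho) / power_law_tail(k, rho)


if __name__ == "__main__":
    # The ratio should move towards 1 as k grows.
    for k in (10, 100, 1000, 10000):
        print(k, tail_ratio(k, 1.5))
```

The ratio is noticeably different from 1 for small k, which is why the approximation is stated only for sufficiently large k.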
Occurrence
The Yule-Simon distribution arises as a continuous mixture of geometric distributions. Specifically, assume that <math>W</math> follows an exponential distribution with scale <math>1/\rho</math> or rate <math>\rho</math>:
- <math>W \sim \mathrm{Exponential}(\rho)\,</math>
- <math>h(w;\rho) = \rho \, \exp(-\rho\,w)\,</math>
Then, conditional on <math>W</math>, a Yule-Simon distributed variable <math>K</math> has the following geometric distribution:
- <math>K \sim \mathrm{Geometric}(\exp(-W))\,</math>
The pmf of a geometric distribution is
- <math>g(k; p) = p \, (1-p)^{k-1}\,</math>
for <math>k\in\{1,2,\dots\}</math>. The Yule-Simon pmf is then the following exponential-geometric mixture distribution:
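- <math>f(k;\rho) = \int_0^{\infty} g(k;\exp(-w))\,h(w;\rho)\,dw = \rho \int_0^1 u^{\rho}\,(1-u)^{k-1}\,du = \rho\,\mathrm{B}(k, \rho+1), \,</math>
where the second equality follows from the substitution <math>u = \exp(-w)</math>.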
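The mixture representation suggests a simple two-step way to draw random variates: sample W from the exponential distribution, then sample K from a geometric distribution with success probability exp(-W). The sketch below follows that recipe using only the Python standard library. The function name `sample_yule_simon` and the inverse-transform step for the geometric draw are choices made for this example and are not taken from the references.

```python
import math
import random


def sample_yule_simon(rho, rng=random):
    """Draw one Yule-Simon(rho) variate via the exponential-geometric mixture."""
    if rho <= 0:
        raise ValueError("rho must be positive")
    w = rng.expovariate(rho)      # W ~ Exponential with rate rho
    p = math.exp(-w)              # success probability of the geometric
    if p >= 1.0:                  # w == 0.0: the geometric is degenerate at 1
        return 1
    log_q = math.log1p(-p)        # log(1 - p), which is negative
    if log_q == 0.0:              # p underflowed; the draw is not representable
        raise OverflowError("variate too large for this sketch")
    u = 1.0 - rng.random()        # uniform on (0, 1], so log(u) is finite
    # Inverse transform for a geometric on {1, 2, ...}: P(K > k) = (1 - p)**k
    return int(math.log(u) / log_q) + 1


if __name__ == "__main__":
    rng = random.Random(0)        # fixed seed, only for repeatability
    draws = [sample_yule_simon(2.0, rng) for _ in range(10)]
    print(draws)
```

Since f(1;ρ) = ρ B(1, ρ+1) = ρ/(ρ+1), about two thirds of a large sample drawn with ρ = 2 should equal 1, which is an easy property to check empirically.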
Generalizations
Simon also hinted at a two-parameter generalization of the Yule-Simon distribution, in which the beta function is replaced by an incomplete beta function. The probability mass function of the generalized Yule-Simon(ρ, α) distribution is defined as
- <math>f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}} \; \mathrm{B}_{1-\alpha}(k, \rho+1), \,</math>
with <math>0 \leq \alpha < 1</math>. For <math>\alpha = 0</math> the ordinary Yule-Simon(ρ) distribution is obtained as a special case.
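The normalizing factor in the generalized pmf can be checked with a short derivation. The sketch below assumes the usual integral definition of the incomplete beta function, B_x(a, b) = ∫_0^x t^{a-1} (1-t)^{b-1} dt, which the text above does not spell out, and exchanges sum and integral on the grounds that all terms are non-negative.

```latex
\begin{aligned}
\sum_{k=1}^{\infty} \mathrm{B}_{1-\alpha}(k, \rho+1)
  &= \int_0^{1-\alpha} (1-t)^{\rho} \sum_{k=1}^{\infty} t^{k-1} \, dt
   = \int_0^{1-\alpha} (1-t)^{\rho-1} \, dt \\
  &= \frac{1-\alpha^{\rho}}{\rho},
\qquad\text{so}\qquad
\sum_{k=1}^{\infty} f(k;\rho,\alpha) = 1 .
\end{aligned}
```

Setting α = 0 gives B_1(k, ρ+1) = B(k, ρ+1) and 1 - α^ρ = 1, which recovers the ordinary pmf ρ B(k, ρ+1).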
References
- Herbert A. Simon, On a Class of Skew Distribution Functions, Biometrika 42(3/4): 425–440, December 1955.
- Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)


