Zipf-Mandelbrot law
From Wikipedia, the free encyclopedia
| Parameters | <math>N \in \{1,2,3,\ldots\}</math> (integer), <math>q \in [0,\infty)</math> (real), <math>s>0</math> (real) |
|---|---|
| Support | <math>k \in \{1,2,\ldots,N\}</math> |
| Probability mass function (pmf) | <math>\frac{1/(k+q)^s}{H_{N,q,s}}</math> |
| Cumulative distribution function (cdf) | <math>\frac{H_{k,q,s}}{H_{N,q,s}}</math> |
| Mean | <math>\frac{H_{N,q,s-1}}{H_{N,q,s}}-q</math> |
| Median | N/A |
| Mode | <math>k=1</math>, where the pmf equals <math>\frac{1/(1+q)^s}{H_{N,q,s}}</math> |
In probability theory and statistics, the Zipf-Mandelbrot law is a discrete probability distribution. Also known as the Pareto-Zipf law, it is a power-law distribution on ranked data, named after the Harvard linguistics professor George Kingsley Zipf (1902-1950), who suggested a simpler distribution called Zipf's law, and the mathematician Benoît Mandelbrot (1924-2010), who subsequently generalized it.
The probability mass function is given by:
- <math>f(k;N,q,s)=\frac{1/(k+q)^s}{H_{N,q,s}}</math>
where <math>H_{N,q,s}</math> is given by:
- <math>H_{N,q,s}=\sum_{i=1}^N \frac{1}{(i+q)^s}</math>
which may be thought of as a generalization of a harmonic number. In the limit as <math>N</math> approaches infinity (with <math>s>1</math>, so that the sum converges), this becomes the Hurwitz zeta function <math>\zeta(s,q+1)</math>. For finite <math>N</math> and <math>q=0</math> the Zipf-Mandelbrot law becomes Zipf's law. For infinite <math>N</math> and <math>q=0</math> it becomes a zeta distribution.
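The definitions above translate directly into a few lines of code. The following Python sketch is illustrative only: the function names are invented here, and it assumes the support <math>k = 1, \ldots, N</math> implied by the sum defining <math>H_{N,q,s}</math>.

```python
from itertools import accumulate


def generalized_harmonic(n, q, s):
    """H_{n,q,s} = sum over i = 1..n of 1 / (i + q)**s."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))


def zipf_mandelbrot_pmf(N, q, s):
    """Return the list [f(1), ..., f(N)] of probabilities."""
    weights = [1.0 / (k + q) ** s for k in range(1, N + 1)]
    norm = sum(weights)  # this is H_{N,q,s}
    return [w / norm for w in weights]


def zipf_mandelbrot_cdf(N, q, s):
    """Running sums of the pmf, i.e. H_{k,q,s} / H_{N,q,s} for k = 1..N."""
    return list(accumulate(zipf_mandelbrot_pmf(N, q, s)))


def zipf_mandelbrot_mean(N, q, s):
    """Closed-form mean: H_{N,q,s-1} / H_{N,q,s} - q."""
    return generalized_harmonic(N, q, s - 1) / generalized_harmonic(N, q, s) - q


# Parameter values chosen arbitrarily for illustration.
pmf = zipf_mandelbrot_pmf(N=10, q=2.7, s=1.2)
```

Setting `q=0` in this sketch gives the probabilities of Zipf's law for the same `N` and `s`.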
Applications
The distribution of words ranked by their frequency in a large corpus of text is approximately a power-law distribution, known as Zipf's law.
If one plots the frequency rank of the words in such a corpus against their number of occurrences (or relative frequency) on logarithmic axes, the points fall close to a straight line, which corresponds to a power law with exponent close to one (but see Gelbukh and Sidorov 2001).
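The rank-frequency relationship described above can be checked numerically. The Python sketch below counts tokens, ranks the counts, and estimates the exponent as the negated least-squares slope of log(count) against log(rank). The function names are hypothetical, the input counts are synthetic rather than taken from a real corpus, and a log-log line fit is only a rough heuristic, not necessarily the method used in the cited work.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return occurrence counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def fit_power_law_exponent(counts):
    """Negated least-squares slope of log(count) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic counts obeying count = 1000 / rank exactly (not real corpus data).
synthetic_counts = [1000.0 / r for r in range(1, 51)]
exponent = fit_power_law_exponent(synthetic_counts)  # close to 1 by construction
```

For real text, `rank_frequency` would be applied to a tokenized corpus first, and the fitted value should be read as an approximation.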
External links
- Z. K. Silagadze: Citations and the Zipf-Mandelbrot's law
- NIST: Zipf's law
- W. Li's References on Zipf's law
- Gelbukh and Sidorov 2001: Zipf and Heaps Laws’ Coefficients Depend on Language

