
Zipf-Mandelbrot law

From Wikipedia, the free encyclopedia

Zipf-Mandelbrot
Parameters <math>N \in \{1,2,3,\ldots\}</math> (integer)
<math>q \in [0,\infty)</math> (real)
<math>s>0</math> (real)
Support <math>k \in \{1,2,\ldots,N\}</math>
Probability mass function (pmf) <math>\frac{1/(k+q)^s}{H_{N,q,s}}</math>
Cumulative distribution function (cdf) <math>\frac{H_{k,q,s}}{H_{N,q,s}}</math>
Mean <math>\frac{H_{N,q,s-1}}{H_{N,q,s}}-q</math>
Mode <math>1</math>
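The cdf and mean entries above can be checked numerically. The mean follows from writing <math>k = (k+q) - q</math> inside the sum: <math>E[k] = \sum_{k=1}^N \frac{(k+q) - q}{(k+q)^s H_{N,q,s}} = \frac{H_{N,q,s-1} - q\,H_{N,q,s}}{H_{N,q,s}}</math>. The Python sketch below compares that closed form with brute-force summation; the function names and the parameter values (N = 50, q = 1.5, s = 1.2) are arbitrary illustrative choices, not part of the article.

```python
def generalized_harmonic(n, q, s):
    # H_{n,q,s} = sum_{i=1}^{n} (i + q)^(-s)
    return sum((i + q) ** -s for i in range(1, n + 1))

def zm_cdf(k, n, q, s):
    # F(k) = H_{k,q,s} / H_{N,q,s}, clamped outside the support {1, ..., N}
    if k < 1:
        return 0.0
    if k >= n:
        return 1.0
    return generalized_harmonic(k, q, s) / generalized_harmonic(n, q, s)

def zm_mean(n, q, s):
    # Closed form: H_{N,q,s-1} / H_{N,q,s} - q
    return generalized_harmonic(n, q, s - 1) / generalized_harmonic(n, q, s) - q

def zm_mean_direct(n, q, s):
    # Brute force: E[k] = sum_k k * f(k), used only to cross-check the closed form
    h = generalized_harmonic(n, q, s)
    return sum(k * (k + q) ** -s / h for k in range(1, n + 1))

if __name__ == "__main__":
    n, q, s = 50, 1.5, 1.2
    print(zm_mean(n, q, s), zm_mean_direct(n, q, s))  # the two values agree up to rounding
    print(zm_cdf(n, n, q, s))  # 1.0 at the top of the support
```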

In probability theory and statistics, the Zipf-Mandelbrot law is a discrete probability distribution. Also known as the Pareto-Zipf law, it is a power-law distribution on ranked data, named after the Harvard linguistics professor George Kingsley Zipf (1902-1950), who suggested a simpler distribution called Zipf's law, and the mathematician Benoît Mandelbrot (1924-2010), who subsequently generalized it.

The probability mass function is given by:

<math>f(k;N,q,s)=\frac{1/(k+q)^s}{H_{N,q,s}}</math>

where <math>H_{N,q,s}</math> is given by:

<math>H_{N,q,s}=\sum_{i=1}^N \frac{1}{(i+q)^s}</math>
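The two formulas above translate directly into code. This is a minimal Python sketch; the names `generalized_harmonic` and `zipf_mandelbrot_pmf` are illustrative and not taken from any library, and the normalizer is recomputed on every call for clarity rather than speed.

```python
def generalized_harmonic(n: int, q: float, s: float) -> float:
    """H_{n,q,s} = sum_{i=1}^{n} 1 / (i + q)**s."""
    return sum(1.0 / (i + q) ** s for i in range(1, n + 1))

def zipf_mandelbrot_pmf(k: int, n: int, q: float, s: float) -> float:
    """f(k; N, q, s) = (1 / (k + q)**s) / H_{N,q,s} for k in {1, ..., N}."""
    if not 1 <= k <= n:
        return 0.0  # outside the support {1, ..., N}
    return (1.0 / (k + q) ** s) / generalized_harmonic(n, q, s)

if __name__ == "__main__":
    n, q, s = 10, 2.7, 1.0
    probs = [zipf_mandelbrot_pmf(k, n, q, s) for k in range(1, n + 1)]
    print(sum(probs))  # close to 1.0, up to floating-point rounding
```

As a hand-checkable case, take N = 3, q = 1, s = 1: then <math>H_{3,1,1} = 1/2 + 1/3 + 1/4 = 13/12</math>, so the probabilities of ranks 1, 2, 3 are 6/13, 4/13 and 3/13. When evaluating many ranks for fixed parameters, computing <math>H_{N,q,s}</math> once and reusing it avoids the repeated O(N) sum.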

which may be thought of as a generalization of a harmonic number. In the limit as <math>N</math> approaches infinity (which requires <math>s>1</math> for the sum to converge), this becomes the Hurwitz zeta function <math>\zeta(s,q+1)</math>. For finite <math>N</math> and <math>q=0</math>, the Zipf-Mandelbrot law becomes Zipf's law. For infinite <math>N</math> and <math>q=0</math>, it becomes a zeta distribution.
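The limiting cases can be checked numerically. Under the usual convention <math>\zeta(s,a) = \sum_{n=0}^\infty (n+a)^{-s}</math>, the infinite sum <math>\sum_{i=1}^\infty (i+q)^{-s}</math> equals <math>\zeta(s,q+1)</math>, so for <math>s=2</math> the partial sums should approach <math>\zeta(2,1) = \pi^2/6</math> when <math>q=0</math> and <math>\zeta(2,2) = \pi^2/6 - 1</math> when <math>q=1</math>. The sketch below assumes a truncation point of N = 100,000 and the helper name `zipf_pmf`, both chosen only for illustration.

```python
import math

def generalized_harmonic(n, q, s):
    # H_{n,q,s} = sum_{i=1}^{n} (i + q)^(-s)
    return sum((i + q) ** -s for i in range(1, n + 1))

def zipf_pmf(k, n, s):
    # Zipf's law: the q = 0, finite-N special case, f(k) = k^(-s) / H_{N,0,s}
    return k ** -s / generalized_harmonic(n, 0.0, s)

n = 100_000  # illustrative truncation point; the neglected tail is about 1/n for s = 2

# q = 0, s = 2: partial sums approach zeta(2, 1) = zeta(2) = pi^2 / 6
partial_q0 = generalized_harmonic(n, 0.0, 2.0)
# q = 1, s = 2: partial sums approach zeta(2, 2) = pi^2 / 6 - 1
partial_q1 = generalized_harmonic(n, 1.0, 2.0)

print(math.pi ** 2 / 6 - partial_q0)      # small positive gap, roughly 1e-5
print(math.pi ** 2 / 6 - 1 - partial_q1)  # likewise roughly 1e-5
```

The gap shrinks like <math>1/N</math> because the omitted tail <math>\sum_{i>N} i^{-2}</math> is approximately <math>1/N</math>.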

Applications

The distribution of words ranked by their frequency in a typical corpus of natural-language text is approximately a power-law distribution, known as Zipf's law.
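Ranking words by frequency takes only a word count and a sort. This Python sketch uses `collections.Counter`; the lowercase-and-regex tokenizer and the sample sentence are simplifying assumptions, and real corpora need more careful tokenization.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first (rank 1)."""
    # Crude tokenizer (an assumption): lowercase, keep runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    # most_common() with no argument returns every word, sorted by descending count;
    # ties keep the order in which the words were first seen
    counts = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(counts, start=1)]

sample = "the cat and the dog and the bird"
for rank, word, count in rank_frequency(sample):
    print(rank, word, count)
```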

If one plots the frequency rank of words in a large text corpus against their number of occurrences (or relative frequencies), one obtains a power-law distribution with an exponent close to one (but see Gelbukh and Sidorov 2001).
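A power law appears as a straight line on log-log axes, so a quick, rough estimate of the exponent is the least-squares slope of log(count) against log(rank). The Python sketch below is an illustration of that naive approach, not the method used in the cited work; maximum-likelihood fitting is generally preferred for serious estimation. The counts are synthetic and hypothetical, built to follow an exact power law with exponent 1.

```python
import math

def loglog_slope(counts):
    """Least-squares slope of log(count) versus log(rank).

    `counts` must be positive and sorted in descending order (rank 1 first).
    A slope near -1 corresponds to Zipf's law with exponent s close to 1.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical counts proportional to 1/rank, so the fitted slope should be -1
ideal = [1000.0 / r for r in range(1, 201)]
print(loglog_slope(ideal))
```

With a Zipf-Mandelbrot shift <math>q>0</math>, for example counts proportional to <math>1/(k+q)</math>, the curve flattens at low ranks, so a single straight-line fit over all ranks gives a slope shallower than -1.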

