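Yule–Simon distribution

Figures: probability mass function; cumulative distribution function.

| Parameters | $\rho > 0$ shape (real) |
|---|---|
| Support | $k \in \{1, 2, \dotsc\}$ |
| pmf | $\rho\,\mathrm{B}(k, \rho+1)$ |
| CDF | $1 - k\,\mathrm{B}(k, \rho+1)$ |
| Mean | $\frac{\rho}{\rho-1}$ for $\rho > 1$ |
| Mode | $1$ |
| Variance | $\frac{\rho^{2}}{(\rho-1)^{2}(\rho-2)}$ for $\rho > 2$ |
| Skewness | $\frac{(\rho+1)^{2}\sqrt{\rho-2}}{(\rho-3)\,\rho}$ for $\rho > 3$ |
| Ex. kurtosis | $\rho + 3 + \frac{11\rho^{3} - 49\rho - 22}{(\rho-4)(\rho-3)\,\rho}$ for $\rho > 4$ |
| MGF | $\frac{\rho}{\rho+1}\,{}_2F_1(1, 1; \rho+2; e^{t})\,e^{t}$ for $t \leq 0$ |
| CF | $\frac{\rho}{\rho+1}\,{}_2F_1(1, 1; \rho+2; e^{it})\,e^{it}$ |
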
In probability and statistics, the Yule–Simon distribution is a discrete probability distribution named after Udny Yule and Herbert A. Simon. Simon originally called it the Yule distribution.[1]
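The probability mass function (pmf) of the Yule–Simon(ρ) distribution is
$$f(k;\rho) = \rho\,\mathrm{B}(k, \rho+1),$$
for integer $k \geq 1$ and real $\rho > 0$, where $\mathrm{B}$ is the beta function. Equivalently, the pmf can be written in terms of the falling factorial as
$$f(k;\rho) = \frac{\rho\,\Gamma(\rho+1)}{(k+\rho)^{\underline{\rho+1}}},$$
where $\Gamma$ is the gamma function and $(k+\rho)^{\underline{\rho+1}} = \Gamma(k+\rho+1)/\Gamma(k)$. Thus, if $\rho$ is an integer,
$$f(k;\rho) = \frac{\rho\,\rho!\,(k-1)!}{(k+\rho)!}.$$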
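The pmf above can be evaluated numerically as follows. This is a minimal sketch using only the Python standard library; the function names are illustrative and not part of any established package. The beta function is evaluated through log-gamma to avoid overflow for large $k$.

```python
import math

def yule_simon_log_pmf(k, rho):
    """log f(k; rho) = log(rho) + log B(k, rho + 1), computed via log-gamma."""
    if k < 1 or rho <= 0:
        raise ValueError("requires integer k >= 1 and real rho > 0")
    return (math.log(rho) + math.lgamma(k) + math.lgamma(rho + 1.0)
            - math.lgamma(k + rho + 1.0))

def yule_simon_pmf(k, rho):
    """f(k; rho) = rho * B(k, rho + 1)."""
    return math.exp(yule_simon_log_pmf(k, rho))

def yule_simon_pmf_integer_rho(k, rho):
    """Factorial form rho * rho! * (k - 1)! / (k + rho)!, valid only for integer rho >= 1."""
    return rho * math.factorial(rho) * math.factorial(k - 1) / math.factorial(k + rho)
```

For integer $\rho$ the two forms should agree up to floating-point rounding; the log-gamma form is the one that also works for non-integer $\rho$.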
The parameter $\rho$ can be estimated using a fixed point algorithm.[2]
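As an illustration of parameter estimation, the sketch below maximizes the log-likelihood of $\rho$ directly with a golden-section search. This is not the fixed-point algorithm of [2], only a simple stand-in; the function names, the search bracket `[lo, hi]`, and the tolerance are assumptions, and the search presumes the log-likelihood has a single maximum inside the bracket.

```python
import math
from collections import Counter

def log_likelihood(rho, counts):
    """Log-likelihood of rho; `counts` maps each observed k to its frequency."""
    n = sum(counts.values())
    total = n * (math.log(rho) + math.lgamma(rho + 1.0))
    for k, c in counts.items():
        total += c * (math.lgamma(k) - math.lgamma(k + rho + 1.0))
    return total

def fit_rho(data, lo=1e-6, hi=50.0, tol=1e-6):
    """Approximate maximum-likelihood estimate of rho by golden-section search."""
    counts = Counter(data)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = log_likelihood(c, counts), log_likelihood(d, counts)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = log_likelihood(c, counts)
        else:
            # The maximum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = log_likelihood(d, counts)
    return 0.5 * (a + b)
```

Grouping the observations with `Counter` keeps each likelihood evaluation proportional to the number of distinct values rather than the sample size.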
The probability mass function $f$ has the property that for sufficiently large $k$ we have
$$f(k;\rho) \approx \frac{\rho\,\Gamma(\rho+1)}{k^{\rho+1}} \propto \frac{1}{k^{\rho+1}}.$$

This means that the tail of the Yule–Simon distribution is a realization of Zipf's law: $f(k;\rho)$ can be used to model, for example, the relative frequency of the $k$th most frequent word in a large collection of text, which according to Zipf's law is inversely proportional to a (typically small) power of $k$.
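The power-law tail can be checked numerically by dividing the exact pmf by its asymptotic form $\rho\,\Gamma(\rho+1)/k^{\rho+1}$; the ratio should approach 1 as $k$ grows. The following is a small sketch with illustrative function names, and the value $\rho = 1.5$ is an arbitrary choice.

```python
import math

def log_pmf(k, rho):
    """log of rho * B(k, rho + 1), computed via log-gamma."""
    return (math.log(rho) + math.lgamma(k) + math.lgamma(rho + 1.0)
            - math.lgamma(k + rho + 1.0))

def tail_ratio(k, rho):
    """Exact pmf divided by the asymptotic form rho * Gamma(rho + 1) / k**(rho + 1)."""
    log_asymptote = (math.log(rho) + math.lgamma(rho + 1.0)
                     - (rho + 1.0) * math.log(k))
    return math.exp(log_pmf(k, rho) - log_asymptote)

if __name__ == "__main__":
    for k in (10, 100, 1000, 10000):
        print(k, tail_ratio(k, 1.5))
```

Working with logarithms throughout keeps the comparison stable even where the pmf itself is extremely small.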
Occurrence
The Yule–Simon distribution arose originally as the limiting distribution of a particular stochastic process studied by Yule as a model for the distribution of biological taxa and subtaxa.[3] Simon dubbed this process the "Yule process" but it is more commonly known today as a preferential attachment process. The preferential attachment process is an urn process in which balls are added to a growing number of urns, each ball being allocated to an urn with probability linear in the number of balls the urn already contains.
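A preferential attachment process of this kind can be simulated in a few lines. The sketch below assumes one specific variant, Simon's model: each new ball opens a new urn with probability `p_new` and otherwise joins an existing urn chosen with probability proportional to its size. The parameter name `p_new` and the function name are hypothetical. Under this assumption the urn sizes approach a Yule–Simon distribution with $\rho = 1/(1 - p_{\text{new}})$.

```python
import random

def simulate_preferential_attachment(steps, p_new, rng):
    """Return the list of urn sizes after `steps` further balls have been added.

    Assumed variant (Simon's model): with probability p_new a ball starts a
    new urn; otherwise it joins an existing urn with probability proportional
    to that urn's current size.
    """
    sizes = [1]   # sizes[j] = number of balls in urn j; start with one ball
    owner = [0]   # owner[i] = index of the urn that holds ball i
    for _ in range(steps):
        if rng.random() < p_new:
            sizes.append(1)
            owner.append(len(sizes) - 1)
        else:
            # Picking a ball uniformly at random selects its urn with
            # probability proportional to the urn's size.
            urn = owner[rng.randrange(len(owner))]
            sizes[urn] += 1
            owner.append(urn)
    return sizes
```

With `p_new = 0.5` the implied parameter is $\rho = 2$, so the share of urns holding exactly one ball should be close to $\rho/(\rho+1) = 2/3$ after many steps.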
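The distribution also arises as a compound distribution, in which the parameter of a geometric distribution is treated as a function of a random variable having an exponential distribution. Specifically, assume that $W$ follows an exponential distribution with scale $1/\rho$ or rate $\rho$:
$$W \sim \operatorname{Exponential}(\rho),$$
with density
$$h(w;\rho) = \rho \exp(-\rho w).$$
Then a Yule–Simon distributed variable $K$ has the following geometric distribution conditional on $W$:
$$K \sim \operatorname{Geometric}(\exp(-W)).$$
The pmf of a geometric distribution is
$$g(k; p) = p\,(1-p)^{k-1}$$
for $k \in \{1, 2, \dotsc\}$. The Yule–Simon pmf is then the following exponential-geometric compound distribution:
$$f(k;\rho) = \int_0^\infty g(k; \exp(-w))\,h(w;\rho)\,dw.$$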
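The compound representation gives a direct way to draw Yule–Simon variates: sample $W$ from an exponential distribution with rate $\rho$, then sample $K$ from a geometric distribution with success probability $\exp(-W)$. The sketch below uses only the standard library; the function name is illustrative, and the geometric draw uses inverse-transform sampling, which is one choice among several.

```python
import math
import random

def sample_yule_simon(rho, rng):
    """Draw one Yule-Simon(rho) variate via the exponential-geometric mixture."""
    w = rng.expovariate(rho)        # W ~ Exponential(rate = rho)
    p = math.exp(-w)                # success probability of the geometric
    if p >= 1.0:
        return 1                    # W rounded to 0: success on the first trial
    if p == 0.0:
        # exp(-W) underflowed; the variate would exceed any float range.
        raise OverflowError("W too large for this sketch (very small rho)")
    log_q = math.log1p(-p)          # log(1 - p), accurate for small p
    u = 1.0 - rng.random()          # uniform on (0, 1]
    # Inverse transform for a geometric on {1, 2, ...}: P(K > k) = (1 - p)**k.
    return 1 + int(math.log(u) / log_q)
```

For example, with `rng = random.Random(0)` and $\rho = 2$, the empirical share of draws equal to 1 should be close to $f(1;\rho) = \rho/(\rho+1) = 2/3$.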
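The following recurrence relation holds:
$$\left\{\begin{array}{l}
k\,f(k;\rho) = (\rho + k + 1)\,f(k+1;\rho), \\[10pt]
f(1;\rho) = \rho\,\mathrm{B}(\rho + 1, 1).
\end{array}\right.$$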
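The recurrence $k\,f(k;\rho) = (\rho+k+1)\,f(k+1;\rho)$, started from $f(1;\rho) = \rho\,\mathrm{B}(\rho+1, 1) = \rho/(\rho+1)$, allows the pmf to be tabulated without evaluating any gamma functions. The following is a minimal sketch with illustrative function names; the closed form is included only for comparison.

```python
import math

def yule_simon_pmf_by_recurrence(k_max, rho):
    """Return [f(1), ..., f(k_max)] using f(k + 1) = k / (k + rho + 1) * f(k)."""
    probs = [rho / (rho + 1.0)]     # f(1) = rho * B(rho + 1, 1) = rho / (rho + 1)
    for k in range(1, k_max):
        probs.append(probs[-1] * k / (k + rho + 1.0))
    return probs

def yule_simon_pmf_closed_form(k, rho):
    """rho * B(k, rho + 1), computed via log-gamma, for comparison."""
    return math.exp(math.log(rho) + math.lgamma(k) + math.lgamma(rho + 1.0)
                    - math.lgamma(k + rho + 1.0))
```

Each step multiplies by a factor below 1, so the tabulated values decrease monotonically, consistent with the mode at $k = 1$.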
Generalizations
The two-parameter generalization of the original Yule distribution replaces the beta function with an incomplete beta function. The probability mass function of the generalized Yule–Simon(ρ, α) distribution is defined as
$$f(k;\rho,\alpha) = \frac{\rho}{1-\alpha^{\rho}}\,\mathrm{B}_{1-\alpha}(k, \rho+1),$$
with $0 \leq \alpha < 1$. For $\alpha = 0$ the ordinary Yule–Simon(ρ) distribution is obtained as a special case. The use of the incomplete beta function has the effect of introducing an exponential cutoff in the upper tail.
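The generalized pmf $\frac{\rho}{1-\alpha^{\rho}}\,\mathrm{B}_{1-\alpha}(k, \rho+1)$ can be evaluated numerically once the incomplete beta function $\mathrm{B}_x(a, b) = \int_0^x t^{a-1}(1-t)^{b-1}\,dt$ is available. The sketch below approximates that integral with composite Simpson quadrature, which stands in here for a library routine; the function names and the number of subintervals `n` are assumptions, and accuracy may degrade for non-integer $\rho$ when $\alpha = 0$ because the integrand is then not smooth at $t = 1$.

```python
def incomplete_beta(x, a, b, n=2000):
    """Approximate B_x(a, b) by composite Simpson quadrature with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = x / n

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    total = integrand(0.0) + integrand(x)
    for i in range(1, n):
        # Simpson weights: 4 on odd interior nodes, 2 on even interior nodes.
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return total * h / 3.0

def generalized_yule_simon_pmf(k, rho, alpha):
    """f(k; rho, alpha) = rho / (1 - alpha**rho) * B_{1 - alpha}(k, rho + 1)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("requires 0 <= alpha < 1")
    return rho / (1.0 - alpha ** rho) * incomplete_beta(1.0 - alpha, k, rho + 1.0)
```

Setting `alpha = 0` recovers the ordinary pmf, and for `alpha > 0` the probabilities over $k = 1, 2, \dotsc$ should still sum to 1, which gives a quick sanity check on the normalizing constant.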
Bibliography
- Colin Rose and Murray D. Smith, Mathematical Statistics with Mathematica. New York: Springer, 2002, ISBN 0-387-95234-9. (See page 107, where it is called the "Yule distribution".)
References
- [1] Simon, H. A. (1955). "On a class of skew distribution functions". Biometrika 42 (3–4): 425–440. doi:10.1093/biomet/42.3-4.425.
- [2] Garcia Garcia, Juan Manuel (2011). "A fixed-point algorithm to estimate the Yule-Simon distribution parameter". Applied Mathematics and Computation 217 (21): 8560–8566. doi:10.1016/j.amc.2011.03.092.
- [3] Yule, G. U. (1925). "A Mathematical Theory of Evolution, based on the Conclusions of Dr. J. C. Willis, F.R.S". Philosophical Transactions of the Royal Society B 213 (402–410): 21–87. doi:10.1098/rstb.1925.0002.

