Variance
From Wikipedia, the free encyclopedia
- This article is about the mathematical concept. For the land-use term, see variance (land use).
In probability theory and statistics, the variance of a random variable (or equivalently, of a probability distribution) is a measure of its statistical dispersion, indicating how its possible values are spread around the expected value. Where the expected value shows the location of the distribution, the variance indicates the scale of the values. A more readily interpreted measure is the square root of the variance, called the standard deviation, which is expressed in the same units as the variable and, as its name implies, gives a standard indication of the typical deviation from the mean.
The variance of a real-valued random variable is its second central moment, and it also happens to be its second cumulant.
Definition
If <math>\mu = \operatorname{E}(X)</math> is the expected value (mean) of the random variable X, then the variance is
- <math>\operatorname{var}(X) = \operatorname{E}( ( X - \mu ) ^ 2 ).</math>
That is, it is the expected value of the square of the deviation of X from its own mean. In plain language, it can be expressed as "The average of the square of the distance of each data point from the mean". It is thus the mean squared deviation. The variance of random variable X is typically designated as <math>\operatorname{var}(X)</math>, <math>\sigma_X^2</math>, or simply <math>\sigma^2</math>.
Note that the above definition can be used for both discrete and continuous random variables.
Many distributions, such as the Cauchy distribution, do not have a variance because the relevant integral diverges. In particular, if a distribution does not have an expected value, it does not have a variance either. The converse is not true: there are distributions for which the expected value exists, but the variance does not.
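As a small worked illustration of the definition, the following Python sketch computes the variance of a fair six-sided die directly as the expected squared deviation from the mean. The fair die and the helper names `expectation` and `variance` are assumptions made for this example only.

```python
from fractions import Fraction

def expectation(dist, g=lambda x: x):
    """E(g(X)) for a discrete distribution given as {value: probability}."""
    return sum(p * g(x) for x, p in dist.items())

def variance(dist):
    """var(X) = E((X - mu)^2), straight from the definition."""
    mu = expectation(dist)
    return expectation(dist, lambda x: (x - mu) ** 2)

# Assumed example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expectation(die))  # 7/2
print(variance(die))     # 35/12
```

Exact rational arithmetic is used here so that the result can be compared with a hand calculation without rounding.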
Properties
If the variance is defined, we can conclude that it is never negative because the squares are positive or zero. The unit of variance is the square of the unit of observation. For example, the variance of a set of heights measured in centimeters will be given in square centimeters. This fact is inconvenient and has motivated many statisticians to instead use the square root of the variance, known as the standard deviation, as a summary of dispersion.
It follows easily from the definition that the variance does not depend on the location of the distribution. That is, if the variable is "displaced" by an amount b by taking X + b, the variance of the resulting random variable is unchanged. By contrast, if the variable is multiplied by a scaling factor a, the variance is multiplied by <math>a^2</math>. More formally, if a and b are real constants and X is a random variable whose variance is defined, then
- <math>\operatorname{var}(aX+b)=a^2\operatorname{var}(X).</math>
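The shift and scale rule can be checked numerically. The sketch below treats a short list of equally likely values as the distribution of X; the data, the constants a and b, and the helper name `variance` are assumptions chosen for illustration.

```python
from fractions import Fraction

def variance(values):
    """Variance of equally likely values: the mean squared deviation."""
    n = len(values)
    mu = sum(values) / n
    return sum((v - mu) ** 2 for v in values) / n

# Assumed example data and constants (exact rationals to avoid rounding).
xs = [Fraction(v) for v in (1, 2, 4, 7)]
a, b = Fraction(-3), Fraction(10)

shifted = [x + b for x in xs]
scaled = [a * x + b for x in xs]

print(variance(xs))       # 21/4
print(variance(shifted))  # 21/4, the shift b has no effect
print(variance(scaled))   # 189/4, i.e. a^2 = 9 times var(X)
```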
Another formula for the variance that follows in a straightforward manner from the linearity of expected values and the above definition is:
- <math>\operatorname{var}(X)= \operatorname{E}(X^2 - 2\,X\,\operatorname{E}(X) + (\operatorname{E}(X))^2)</math>
- <math>=\operatorname{E}(X^2) - 2(\operatorname{E}(X))^2 + (\operatorname{E}(X))^2</math>
- <math>=\operatorname{E}(X^2) - (\operatorname{E}(X))^2</math>
This is often used to calculate the variance in practice.
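A minimal sketch comparing the two forms is given below, with `variance_definition` and `variance_shortcut` as hypothetical helper names and made-up data. Both agree on small values. In floating-point arithmetic, however, the shortcut subtracts two large, nearly equal numbers when the mean is large relative to the spread, so it can lose precision.

```python
def variance_definition(values):
    """Two-pass form: mean of squared deviations from the mean."""
    n = len(values)
    mu = sum(values) / n
    return sum((v - mu) ** 2 for v in values) / n

def variance_shortcut(values):
    """One-pass form: E(X^2) - (E(X))^2."""
    n = len(values)
    mean_of_squares = sum(v * v for v in values) / n
    mean = sum(values) / n
    return mean_of_squares - mean ** 2

small = [4.0, 7.0, 13.0, 16.0]      # assumed example data
shifted = [v + 1e9 for v in small]  # same spread, very large mean

print(variance_definition(small), variance_shortcut(small))  # 22.5 22.5
# For the shifted data the shortcut may lose precision; compare the two values.
print(variance_definition(shifted), variance_shortcut(shifted))
```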
One reason for the use of the variance in preference to other measures of dispersion is that the variance of the sum (or the difference) of independent random variables is the sum of their variances. A weaker condition than independence, called uncorrelatedness, also suffices. In general,
- <math>\operatorname{var}(aX+bY) =a^2 \operatorname{var}(X) + b^2 \operatorname{var}(Y) + 2ab\, \operatorname{cov}(X, Y).</math>
Here <math>\operatorname{cov}</math> is the covariance, which is zero for independent random variables (if it exists).
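The general formula for a linear combination can be verified on a small set of paired observations treated as equally likely. The data, the constants, and the helper names `mean`, `cov` and `var` below are assumptions for the sake of the example.

```python
from fractions import Fraction

def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Covariance of paired, equally likely observations."""
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

def var(xs):
    """Variance is the covariance of a variable with itself."""
    return cov(xs, xs)

# Assumed paired data (deliberately correlated) and constants.
xs = [Fraction(v) for v in (1, 2, 3, 5)]
ys = [Fraction(v) for v in (2, 1, 4, 6)]
a, b = Fraction(2), Fraction(-3)

lhs = var([a * x + b * y for x, y in zip(xs, ys)])
rhs = a**2 * var(xs) + b**2 * var(ys) + 2 * a * b * cov(xs, ys)
print(lhs == rhs)  # True
```

Because the covariance of these data is not zero, dropping the last term would make the two sides differ.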
Approximating the variance of a function
The delta method uses a first-order Taylor expansion to approximate the variance of a function of one or more random variables. For example, the approximate variance of a function of one variable is given by
- <math>\operatorname{var}\left[f(X)\right]\approx \left(f'(\operatorname{E}\left[X\right])\right)^2\operatorname{var}\left[X\right]</math>
provided that <math>f(\cdot)</math> is twice differentiable and that the mean and variance of <math>X</math> are finite.
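As a rough illustration, the approximation above can be compared with a simulation. The sketch assumes f(x) = x² and a normally distributed X with mean 10 and standard deviation 0.1; these choices, the seed, and the helper name `delta_variance` are not from the text.

```python
import random

def delta_variance(f_prime, mean, variance):
    """Delta-method approximation: f'(E[X])^2 * var[X]."""
    return f_prime(mean) ** 2 * variance

# Assumed example: f(x) = x^2 with X normal, mean 10, standard deviation 0.1.
mu, sigma = 10.0, 0.1
approx = delta_variance(lambda x: 2 * x, mu, sigma ** 2)

# Monte Carlo comparison (seeded so the run is repeatable).
rng = random.Random(0)
draws = [rng.gauss(mu, sigma) ** 2 for _ in range(200_000)]
m = sum(draws) / len(draws)
simulated = sum((d - m) ** 2 for d in draws) / (len(draws) - 1)

print(approx)     # close to 4.0
print(simulated)  # should be near the approximation for this small sigma
```

The agreement is good here because the spread of X is small compared with its mean; for a wider distribution the approximation would be expected to degrade.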
Population variance and sample variance
In general, the population variance of a finite population of size N is given by
- <math>\sigma^2 = \frac 1N \sum_{i=1}^N
\left(x_i - \overline{x} \right)^ 2 \,</math>
or if the population is an abstract population with probability distribution Pr:
- <math>\sigma^2 = \sum_{i=1}^N
\left(x_i - \overline{x} \right)^ 2 \, \Pr(x_i),</math>
where <math>\overline{x}</math> is the population mean. This is merely a special case of the general definition of variance introduced above, but restricted to finite populations.
In many practical situations, the true variance of a population is not known a priori and must be estimated. When dealing with large finite populations, it is rarely possible to find the exact value of the population variance, due to time, cost, and other resource constraints. When dealing with infinite populations, this is generally impossible.
A common method is estimating the variance of large (finite or infinite) populations from a sample. We take a sample <math>(y_1,\dots,y_n)</math> of n values from the population, and estimate the variance on the basis of this sample. There are several good estimators. Two of them are well known:
- <math>s_n^2 = \frac 1n \sum_{i=1}^n \left(y_i - \overline{y} \right)^ 2 = \frac{1}{n} \sum_{i=1}^{n}y_i^2 - \overline{y}^2,</math>
and
- <math>s^2 = \frac{1}{n-1} \sum_{i=1}^n\left(y_i - \overline{y} \right)^ 2 = \frac{1}{n-1}\sum_{i=1}^n y_i^2 - \frac{n}{n-1} \overline{y}^2,</math>
Both are referred to as sample variance. Most advanced electronic calculators can calculate both <math>s_n^2</math> and <math>s^2</math> at the press of a button, in which case that button is usually labelled <math>\sigma^2</math> or <math>\sigma_n^2</math> for <math>s_n^2</math> and <math>\sigma_{n-1}^2</math> for <math>s^2</math>.
The two estimators differ only by the factor n/(n−1), and for large sample sizes n the difference is negligible. The second one is an unbiased estimator of the population variance, meaning that over a large number of repeated samples its average value tends to the true population variance. The first one may be seen as the variance of the sample considered as a population.
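The two estimators can be sketched in a few lines of Python. The sample values and the function names `biased_sample_variance` and `unbiased_sample_variance` are assumptions for illustration; `statistics.pvariance` and `statistics.variance` are the standard-library functions that divide by n and by n−1 respectively.

```python
import statistics

def biased_sample_variance(ys):
    """s_n^2: divide the sum of squared deviations by n."""
    n = len(ys)
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / n

def unbiased_sample_variance(ys):
    """s^2: divide the sum of squared deviations by n - 1."""
    n = len(ys)
    mean = sum(ys) / n
    return sum((y - mean) ** 2 for y in ys) / (n - 1)

sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # assumed example data

print(biased_sample_variance(sample))    # 4.0
print(unbiased_sample_variance(sample))  # 32/7, about 4.571

# The standard library exposes the same two quantities.
print(statistics.pvariance(sample))  # divides by n
print(statistics.variance(sample))   # divides by n - 1
```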
One common source of confusion is that the term sample variance may refer to either the unbiased estimator <math>s^2</math> of the population variance, or to the variance <math>s_n^2</math> of the sample viewed as a finite population. Both can be used to estimate the true population variance. Apart from theoretical considerations, it matters little which one is used: for small sample sizes both are inaccurate, and for large values of n they are practically the same. Computing the variance by dividing by n instead of n−1 underestimates the population variance on average, by the factor (n−1)/n.
In practice, for large <math>n</math>, the distinction is often a minor one. In the course of statistical measurements, sample sizes so small as to warrant the use of the unbiased variance virtually never occur. In this context Press et al.<ref>Press, W. H., Teukolsky, S. A., Vetterling, W. T. & Flannery, B. P. (1986) Numerical recipes: The art of scientific computing. Cambridge: Cambridge University Press.</ref> commented that if the difference between n and n−1 ever matters to you, then you are probably up to no good anyway, e.g., trying to substantiate a questionable hypothesis with marginal data.
Distribution of the sample variance
Being a function of random variables, the sample variance is itself a random variable, and it is natural to study its distribution. In the case that <math>y_i</math> are independent Gaussian realizations, Cochran's theorem shows that <math>s^2</math> follows a scaled chi-square distribution:
- <math>
(n-1)\frac{s^2}{\sigma^2}\sim\chi^2_{n-1} </math>
As a direct consequence, it follows that <math> \operatorname{E}(s^2)=\sigma^2</math>.
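A simulation can make the chi-square statement plausible. The sketch below assumes n = 5 Gaussian observations with standard deviation 2, repeated 20,000 times with a fixed seed; all of these settings and the helper name `unbiased_variance` are illustrative choices. It relies on the fact that a chi-square variable with k degrees of freedom has mean k and variance 2k.

```python
import random

def unbiased_variance(ys):
    """s^2 with the n - 1 divisor."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / (len(ys) - 1)

# Assumed settings for the simulation.
n, sigma, repetitions = 5, 2.0, 20_000
rng = random.Random(1)

scaled = []
for _ in range(repetitions):
    ys = [rng.gauss(0.0, sigma) for _ in range(n)]
    scaled.append((n - 1) * unbiased_variance(ys) / sigma ** 2)

sim_mean = sum(scaled) / repetitions
sim_var = unbiased_variance(scaled)

print(sim_mean)  # expected to be near n - 1 = 4
print(sim_var)   # expected to be near 2 (n - 1) = 8
```

Matching the first two moments does not prove the distributional claim; it is only a sanity check on the scaling.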
However, even in the absence of the Gaussian assumption, it is still possible to prove that <math>s^2</math> is unbiased for <math>\sigma^2</math>:
An unbiased estimator
We will demonstrate why <math>s^2</math> is an unbiased estimator of the population variance. An estimator <math>\hat{\theta}</math> for a parameter <math>\theta</math> is unbiased if <math>\operatorname{E}( \hat{\theta}) = \theta</math>. Therefore, to prove that <math>s^2</math> is unbiased, we will show that <math>\operatorname{E}( s^2) = \sigma^2</math>. As an assumption, the <math>x_i</math> are drawn independently from a population with mean <math>\mu</math> and variance <math>\sigma^2</math>.
- <math> \operatorname{E} ( s^2 )
= \operatorname{E} \left( \frac{1}{n-1} \sum_{i=1}^n \left( x_i - \overline{x} \right) ^ 2 \right)
</math>
- <math>
= \frac{1}{n-1} \sum_{i=1}^n \operatorname{E} \left( \left( x_i - \overline{x} \right) ^ 2 \right)
</math>
- <math>
= \frac{1}{n-1} \sum_{i=1}^n \operatorname{E} \left( \left( (x_i - \mu) - (\overline{x} - \mu) \right) ^ 2 \right)
</math>
- <math>
= \frac{1}{n-1} \sum_{i=1}^n \left\{ \operatorname{E} \left( (x_i - \mu)^2 \right)
- 2 \operatorname{E} \left( (x_i - \mu) (\overline{x} - \mu) \right)
+ \operatorname{E} \left( (\overline{x} - \mu) ^ 2 \right) \right\}
</math>
- <math>
= \frac{1}{n-1} \sum_{i=1}^n \left[ \sigma^2
- 2 \left( \frac{1}{n} \sum_{j=1}^n \operatorname{E} \left( (x_i - \mu) (x_j - \mu) \right) \right)
+ \frac{1}{n^2} \sum_{j=1}^n \sum_{k=1}^n \operatorname{E} \left( (x_j - \mu) (x_k - \mu) \right) \right]
</math>
- <math>
= \frac{1}{n-1} \sum_{i=1}^n \left( \sigma^2
- \frac{2 \sigma^2}{n}
+ \frac{\sigma^2}{n} \right)
</math>
- <math>
= \frac{1}{n-1} \sum_{i=1}^n \frac{(n-1)\sigma^2}{n} </math>
- <math>
= \frac{(n-1)\sigma^2}{n-1} = \sigma^2
</math>
See also algorithms for calculating variance.
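The result <math>\operatorname{E}(s^2) = \sigma^2</math> can also be checked exactly on a toy case by enumerating every possible sample. The sketch assumes a three-value population sampled uniformly and independently (that is, with replacement) with n = 3; the population and the helper names are made up for the example.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values):
    """Sum of squared deviations from the mean of the given values."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

# Assumed toy population; each draw is uniform and independent (with replacement).
population = [Fraction(v) for v in (0, 1, 5)]
n = 3
sigma2 = sum_sq_dev(population) / len(population)

samples = list(product(population, repeat=n))  # all 27 equally likely samples
mean_s2 = mean([sum_sq_dev(s) / (n - 1) for s in samples])
mean_sn2 = mean([sum_sq_dev(s) / n for s in samples])

print(sigma2)    # 14/3
print(mean_s2)   # 14/3, so s^2 is unbiased here
print(mean_sn2)  # 28/9, i.e. (n - 1)/n times sigma^2
```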
Alternative proof
- <math>\operatorname{E}\left( \sum_{i=1}^n {(X_i-\overline{X})^2}\right)
=\operatorname{E}\left( \sum_{i=1}^n {X_i^2}\right) - n\operatorname{E}\left( \overline{X}^2 \right) </math>
- <math>
=n\operatorname{E}\left(X_i^2\right) - \frac{1}{n} \operatorname{E}\left(\left(\sum_{i=1}^n X_i\right)^2\right) </math>
- <math>
=n(\operatorname{var}\left(X_i\right) + (\operatorname{E}\left(X_i\right))^2) - \frac{1}{n} \operatorname{E}\left(\left(\sum_{i=1}^n X_i\right)^2\right) </math>
- <math>
=n\sigma^2 + \frac{1}{n}\left( n\operatorname{E}\left(X_i\right) \right)^2 - \frac{1}{n}\operatorname{E}\left(\left(\sum_{i=1}^n X_i\right)^2\right) </math>
- <math>
=n\sigma^2 - \frac{1}{n}\left[ \operatorname{E}\left(\left(\sum_{i=1}^n X_i\right)^2\right) - \left(\operatorname{E}\left(\sum_{i=1}^n X_i\right)\right)^2\right] </math>
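- <math>
=n\sigma^2 - \frac{1}{n}\operatorname{var}\left(\sum_{i=1}^n X_i\right) =n\sigma^2 - \frac{1}{n}(n\sigma^2) =(n-1)\sigma^2. </math>
Dividing by <math>n-1</math> gives <math>\operatorname{E}(s^2) = \sigma^2</math>. The last step uses the independence of the <math>X_i</math>, so that the variance of their sum is <math>n\sigma^2</math>.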
Generalizations
If <math>X</math> is a vector-valued random variable, with values in <math>\mathbb{R}^n</math>, and thought of as a column vector, then the natural generalization of variance is <math>\operatorname{E}((X - \mu)(X - \mu)^\operatorname{T})</math>, where <math>\mu = \operatorname{E}(X)</math> and <math>X^\operatorname{T}</math> is the transpose of <math>X</math>, and so is a row vector. This variance is a positive semi-definite square matrix, commonly referred to as the covariance matrix.
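For a concrete picture of the vector case, the sketch below computes the covariance matrix of four equally likely points in the plane using plain Python lists. The points and the function name `covariance_matrix` are assumptions for illustration.

```python
from fractions import Fraction

def covariance_matrix(points):
    """E((X - mu)(X - mu)^T) for equally likely vectors given as tuples."""
    n, dim = len(points), len(points[0])
    mu = [sum(p[i] for p in points) / n for i in range(dim)]
    return [
        [sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
         for j in range(dim)]
        for i in range(dim)
    ]

# Assumed example: four equally likely points in R^2.
points = [(Fraction(x), Fraction(y)) for x, y in [(0, 0), (2, 1), (4, 1), (6, 4)]]
sigma = covariance_matrix(points)

for row in sigma:
    print([str(entry) for entry in row])
# ['5', '3']
# ['3', '9/4']
```

The diagonal entries are the variances of the individual coordinates, and the matrix is symmetric with a nonnegative determinant, as expected of a positive semi-definite matrix.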
If <math>X</math> is a complex-valued random variable, with values in <math>\mathbb{C}</math>, then its variance is <math>\operatorname{E}((X - \mu)(X - \mu)^*)</math>, where <math>X^*</math> is the complex conjugate of <math>X</math>. This variance is a nonnegative real number.
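The complex case can be sketched the same way. The example assumes a variable taking the values 1, i, −1 and −i with equal probability; the function name `complex_variance` is hypothetical.

```python
def complex_variance(values):
    """E((X - mu)(X - mu)*) for equally likely complex values."""
    n = len(values)
    mu = sum(values) / n
    # (z - mu) times its conjugate is |z - mu|^2, so the imaginary part is zero.
    return sum(((z - mu) * (z - mu).conjugate()).real for z in values) / n

# Assumed example: the four points 1, i, -1, -i, each with probability 1/4.
zs = [1 + 0j, 1j, -1 + 0j, -1j]
print(complex_variance(zs))  # 1.0
```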
History
The term variance was first introduced by Ronald Fisher in his 1918 paper The Correlation Between Relatives on the Supposition of Mendelian Inheritance.
Moment of inertia
The variance of a probability distribution is analogous to the moment of inertia in classical mechanics of a corresponding mass distribution along a line, with respect to rotation about its center of mass. It is because of this analogy that such things as the variance are called moments of probability distributions. (The covariance matrix is analogous to the moment of inertia tensor for 2- and 3-D mass distributions.)
See also
- an inequality on location and scale parameters
- expected value
- kurtosis
- law of total variance
- skewness
- semivariance
- standard deviation
- statistical dispersion
- true variance
- explained variance and unexplained variance
References
<references />
External links
- Fisher's original paper (pdf format)

