Metropolis-Hastings algorithm
From Wikipedia, the free encyclopedia
In mathematics and physics, the Metropolis-Hastings algorithm is an algorithm used to generate a sequence of samples from a probability distribution that is difficult to directly sample from. This sequence can be used in Markov chain Monte Carlo to approximate the distribution (as with a histogram), or to compute an integral (such as an expected value). It is named after Nicholas Metropolis, who published it in 1953 for the specific case of the Boltzmann distribution, and W.K. Hastings,[1] who generalized it in 1970. The Gibbs sampling algorithm is a special case of the Metropolis-Hastings algorithm.
The Metropolis-Hastings algorithm can draw samples from any probability distribution <math>p(x)</math>, requiring only that the density can be calculated at <math>x</math>. The algorithm generates a Markov chain in which each state <math>x^t</math> depends only on the previous state <math>x^{t-1}</math>. The algorithm uses a proposal density <math>Q(x'; x^t)</math>, which depends on the current state <math>x^t</math>, to generate a new proposed sample <math>x'</math>. This proposal is 'accepted' as the next value (<math>x^{t+1}</math>=<math>x'</math>) if <math>u</math> drawn from <math>U(0,1)</math> is
- <math>
u < \frac{P(x')Q(x^t|x')}{P(x^t)Q(x'|x^t)} </math>
For example, the proposal density could be a Gaussian function centred on the current state <math>x^t</math>
- <math>
Q( x'; x^t ) \sim N( x^t, \sigma^2 I) </math>
reading <math>Q(x'; x^t)</math> as the probability density function for <math>x'</math> given the previous value <math>x^t</math>. This proposal density would generate samples centred around the current state with variance <math>\sigma^2 I</math>. The original Metropolis algorithm calls for the proposal density to be symmetric ( <math>Q(x; y) = Q(y; x)</math> ); generalization by Hastings lifts this restriction. It is allowed for <math>Q(x', x^t)</math> not to depend on <math>x'</math> at all, in which case the algorithm is called "Independence Chain Metropolis-Hastings" ( as opposed to "Random Walk Metropolis-Hastings" ). Independence chain M-H algorithm with suitable proposal density function can offer higher accuracy than random walk version, but it requires some a priori knowledge of the distribution.
Now, we draw a new proposal state <math>x'</math> with probability <math>Q(x'; x^t)</math> and then calculate a value
- <math>
a = a_1 a_2\, </math>
where
- <math>
a_1 = \frac{P(x')}{P(x^t)} </math>
is the likelihood ratio between the proposed sample <math>x'</math> and the previous sample <math>x^t</math>, and
- <math>
a_2 = \frac{Q( x^t; x' )}{Q(x';x^t)} </math>
is the ratio of the proposal density in two directions (from <math>x^t</math> to <math>x'</math> and vice versa). This is equal to 1 if the proposal density is symmetric. Then the new state <math>x^{t+1}</math> is chosen with the rule
- <math>
x^{t+1} = \left \{
\begin{matrix}
x' & \mbox{if }a > 1 \\
x'\mbox{ with probability }a, & \mbox{if }a < 1
\end{matrix}
\right. </math>
The Markov chain is started from a random initial value <math>x^0</math> and the algorithm is run for many iterations until this initial state is "forgotten". These samples, which are discarded, are known as burn-in. The algorithm works best if the proposal density matches the shape of the target distribution <math>p(x)</math>, that is <math>Q(x'; x^t) \approx p(x')</math>, but in most cases this is unknown. If a Gaussian proposal is used the variance parameter <math>\sigma^2</math> has to be tuned during the burn-in period. This is usually done by calculating the acceptance rate, which is the fraction of proposed samples that is accepted in a window of the last <math>N</math> samples. This is usually set to be around 60%. If the proposal steps are too small the chain will mix slowly (i.e., it will move around the space slowly and converge slowly to <math>p(x)</math>). If the proposal steps are too large the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density so <math>a_1</math> will be very small.
[edit] See also
[edit] References
- Bernd A. Berg. "Markov Chain Monte Carlo Simulations and Their Statistical Analysis". Singapore, World Scientific 2004.
- Chib, Siddhartha and Edward Greenberg: "Understanding the Metropolis–Hastings Algorithm". American Statistician, 49(4), 327–335, 1995
- W.K. Hastings. "Monte Carlo Sampling Methods Using Markov Chains and Their Applications", Biometrika, 57(1):97-109, 1970.
- N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. "Equations of State Calculations by Fast Computing Machines". Journal of Chemical Physics, 21(6):1087-1092, 1953. [2]


