Normal-inverse-Wishart distribution

Notation    (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu)
Parameters  \boldsymbol\mu_0\in\mathbb{R}^D location (vector of reals)
            \lambda > 0 (real)
            \boldsymbol\Psi \in\mathbb{R}^{D\times D} inverse scale matrix (positive definite)
            \nu > D-1 (real)
Support     \boldsymbol\mu\in\mathbb{R}^D ; \boldsymbol\Sigma \in\mathbb{R}^{D\times D} covariance matrix (positive definite)
PDF         f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}(\boldsymbol\mu|\boldsymbol\mu_0,\tfrac{1}{\lambda}\boldsymbol\Sigma)\ \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)

In probability theory and statistics, the normal-inverse-Wishart distribution (or Gaussian-inverse-Wishart distribution) is a multivariate four-parameter family of continuous probability distributions. It is the conjugate prior of a multivariate normal distribution with unknown mean and covariance matrix (the inverse of the precision matrix).[1]

Definition

Suppose

  \boldsymbol\mu|\boldsymbol\mu_0,\lambda,\boldsymbol\Sigma \sim \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right)

has a multivariate normal distribution with mean \boldsymbol\mu_0 and covariance matrix \tfrac{1}{\lambda}\boldsymbol\Sigma, where

\boldsymbol\Sigma|\boldsymbol\Psi,\nu \sim \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)

has an inverse Wishart distribution. Then (\boldsymbol\mu,\boldsymbol\Sigma) has a normal-inverse-Wishart distribution, denoted as

 (\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu)  .

Characterization

Probability density function

f(\boldsymbol\mu,\boldsymbol\Sigma|\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) = \mathcal{N}\left(\boldsymbol\mu\Big|\boldsymbol\mu_0,\frac{1}{\lambda}\boldsymbol\Sigma\right) \mathcal{W}^{-1}(\boldsymbol\Sigma|\boldsymbol\Psi,\nu)
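Since the density factorizes into a multivariate normal term and an inverse-Wishart term, it can be evaluated by combining standard routines. Below is a minimal sketch in Python, assuming SciPy's multivariate_normal and invwishart use the same parameterization as above (the function name is illustrative):

  from scipy.stats import multivariate_normal, invwishart

  def niw_logpdf(mu, Sigma, mu0, lam, Psi, nu):
      # Log-density of NIW(mu0, lam, Psi, nu) evaluated at (mu, Sigma).
      # Normal factor: mu | Sigma ~ N(mu0, Sigma / lam)
      log_normal = multivariate_normal.logpdf(mu, mean=mu0, cov=Sigma / lam)
      # Inverse-Wishart factor: Sigma ~ W^{-1}(Psi, nu)
      log_iw = invwishart.logpdf(Sigma, df=nu, scale=Psi)
      return log_normal + log_iw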

Properties

Marginal distributions

By construction, the marginal distribution over \boldsymbol\Sigma is an inverse Wishart distribution, and the conditional distribution over \boldsymbol\mu given \boldsymbol\Sigma is a multivariate normal distribution. The marginal distribution over \boldsymbol\mu is a multivariate t-distribution.
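In particular, under the parameterization above the marginal over \boldsymbol\mu can be written explicitly (a standard result, stated here for reference):

  \boldsymbol\mu \sim t_{\nu-D+1}\left(\boldsymbol\mu_0,\ \frac{1}{\lambda(\nu-D+1)}\boldsymbol\Psi\right),

a multivariate t-distribution with \nu-D+1 degrees of freedom, location \boldsymbol\mu_0 and scale matrix \boldsymbol\Psi/(\lambda(\nu-D+1)).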

Posterior distribution of the parameters

Suppose the sampling density is a multivariate normal distribution

\boldsymbol{y_i}|\boldsymbol\mu,\boldsymbol\Sigma \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)

where \boldsymbol{y} is an n\times p matrix and \boldsymbol{y_i} (of length p) is row i of that matrix.

With the mean and covariance matrix of the sampling distribution unknown, we can place a normal-inverse-Wishart prior on the mean and covariance parameters jointly


(\boldsymbol\mu,\boldsymbol\Sigma) \sim \mathrm{NIW}(\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu).

The resulting posterior distribution for the mean and covariance matrix will also be a normal-inverse-Wishart distribution


(\boldsymbol\mu,\boldsymbol\Sigma|y) \sim \mathrm{NIW}(\boldsymbol\mu_n,\lambda_n,\boldsymbol\Psi_n,\nu_n),

where


\boldsymbol\mu_n = \frac{\lambda\boldsymbol\mu_0 + n \bar{\boldsymbol y}}{\lambda+n}

\lambda_n = \lambda + n

\nu_n = \nu + n

\boldsymbol\Psi_n = \boldsymbol\Psi + \boldsymbol{S} + \frac{\lambda n}{\lambda+n} (\boldsymbol{\bar{y}-\mu_0})^T(\boldsymbol{\bar{y}-\mu_0}), \quad \mathrm{with}~~ \boldsymbol{S} = \sum_{i=1}^{n} (\boldsymbol{y_i-\bar{y}})^T(\boldsymbol{y_i-\bar{y}}).
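As a concrete illustration, the update above can be computed directly from the data matrix \boldsymbol y. The following Python sketch uses NumPy; the function and variable names are illustrative, and it follows the row-vector convention for \boldsymbol{y_i} used here:

  import numpy as np

  def niw_posterior(y, mu0, lam, Psi, nu):
      # Posterior NIW hyperparameters given an (n, p) data matrix y.
      n, p = y.shape
      ybar = y.mean(axis=0)              # sample mean, length p
      resid = y - ybar
      S = resid.T @ resid                # scatter matrix about the sample mean, (p, p)
      mu_n = (lam * mu0 + n * ybar) / (lam + n)
      lam_n = lam + n
      nu_n = nu + n
      d = ybar - mu0
      Psi_n = Psi + S + (lam * n) / (lam + n) * np.outer(d, d)
      return mu_n, lam_n, Psi_n, nu_n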


To sample from the joint posterior of (\boldsymbol\mu,\boldsymbol\Sigma), one first draws \boldsymbol\Sigma|\boldsymbol y \sim \mathcal{W}^{-1}(\boldsymbol\Psi_n,\nu_n) and then draws \boldsymbol\mu | \boldsymbol{\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu_n,\boldsymbol\Sigma/\lambda_n). To draw from the posterior predictive distribution of a new observation, draw \tilde{\boldsymbol y}|\boldsymbol{\mu,\Sigma,y} \sim \mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma), using the values of \boldsymbol\mu and \boldsymbol\Sigma already drawn.[2]
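A minimal sketch of this sampling scheme in Python, assuming SciPy's invwishart and multivariate_normal (the hyperparameters would come from a posterior update such as the one above; names are illustrative):

  from scipy.stats import invwishart, multivariate_normal

  def sample_posterior_and_predictive(mu_n, lam_n, Psi_n, nu_n, rng=None):
      # One draw of (mu, Sigma) from the NIW posterior, then one predictive draw.
      Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)                 # Sigma | y
      mu = multivariate_normal.rvs(mean=mu_n, cov=Sigma / lam_n, random_state=rng)   # mu | Sigma, y
      y_new = multivariate_normal.rvs(mean=mu, cov=Sigma, random_state=rng)          # y_tilde | mu, Sigma
      return mu, Sigma, y_new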

Generating normal-inverse-Wishart random variates

Generation of random variates is straightforward:

  1. Sample \boldsymbol\Sigma from an inverse Wishart distribution with parameters \boldsymbol\Psi and \nu
  2. Sample \boldsymbol\mu from a multivariate normal distribution with mean \boldsymbol\mu_0 and covariance matrix \tfrac{1}{\lambda}\boldsymbol\Sigma
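The two steps above can be sketched in Python, again assuming SciPy's invwishart and multivariate_normal match the parameterization used here:

  from scipy.stats import invwishart, multivariate_normal

  def sample_niw(mu0, lam, Psi, nu, rng=None):
      # Draw one (mu, Sigma) pair from NIW(mu0, lam, Psi, nu).
      Sigma = invwishart.rvs(df=nu, scale=Psi, random_state=rng)                   # step 1
      mu = multivariate_normal.rvs(mean=mu0, cov=Sigma / lam, random_state=rng)    # step 2
      return mu, Sigma

Repeating these two steps yields independent draws from the prior; replacing (\boldsymbol\mu_0,\lambda,\boldsymbol\Psi,\nu) with the posterior hyperparameters yields draws from the posterior.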

Related distributions

Notes

  1. Murphy, Kevin P. (2007). "Conjugate Bayesian analysis of the Gaussian distribution."
  2. Gelman, Andrew, et al. Bayesian data analysis. Vol. 2, p.73. Boca Raton, FL, USA: Chapman & Hall/CRC, 2014.
