Bayesian Inference from Symplectic Geometric Viewpoint

Abstract

The purpose of this article is to formulate Bayesian updating from a dynamical viewpoint. We prove that Bayesian updating for the population mean vectors of multivariate normal distributions can be expressed as an affine symplectic transformation on a phase space with the canonical symplectic structure.

Noda, T. and Matsuyama, H. (2019) Bayesian Inference from Symplectic Geometric Viewpoint. Advances in Pure Mathematics, 9, 827-831. doi: 10.4236/apm.2019.910039.

1. Introduction

In Bayesian statistics, we regard the parameters of a model as random variables with their own probability distributions, and we calculate the posterior distribution by using Bayes’ theorem.

In a Hamiltonian dynamical system, time evolution is defined by Hamilton’s equations and is expressed by canonical transformations (or symplectic diffeomorphisms) of the phase space. Abstractly, phase spaces are symplectic manifolds and equations of motion are Hamiltonian vector fields. Under the time evolution of a Hamiltonian dynamical system, the Hamiltonian function and the phase volume are preserved. These are direct consequences of the skew-symmetry of the symplectic structure. When the dimension of the phase space is greater than or equal to 4, there are other conserved quantities, called symplectic capacities. Symplectic capacities are far from trivial and are a deep result in symplectic geometry. For details, see the references.

In this paper we prove that Bayesian updating for the mean vector of a multivariate normal population can be expressed by an affine symplectic diffeomorphism (affine canonical transformation). The main result is the following.

Theorem 1. There exists a linear symplectic diffeomorphism ${\Phi }_{1}$ on ${ℝ}^{2n}$ such that the first component of the composition $T\left({\mu }_{n},0\right)\circ {\Phi }_{1}\circ T\left(-{\mu }_{0},0\right)$ maps a prior distribution to the posterior, where $T\left(a,b\right)$ is the parallel translation $T:\left(x,y\right)↦\left(x+a,y+b\right)$ on ${ℝ}^{2n}$ .

To reformulate Bayesian updating from a symplectic geometric viewpoint in this theorem, we consider the cotangent space whose base space contains the population mean $\mu$ . The reason why we use the cotangent space is as follows. Suppose instead that $\mu$ is a point in ${ℝ}^{2n}$ . To express Bayes’ theorem by an affine symplectic transformation on ${ℝ}^{2n}$ in the case where the variance is known, we would have to find an element $\Phi$ in $\text{Sp}\left(2n,ℝ\right)$ such that ${\Phi }^{\text{T}}{\Lambda }_{0}^{-1}\Phi ={\Lambda }_{0}^{-1}+n{\Sigma }^{-1}$ . (Note that here $\Phi ,\Sigma ,{\Lambda }_{0}$ are all $2n×2n$ matrices.) However, in Bayesian updating we usually fix the type of the posterior, which is a section of a density function, and then normalize it:

$p\left(\theta |y\right)=\frac{l\left(\theta |y\right)p\left(\theta \right)}{p\left(y\right)}\propto l\left(\theta |y\right)p\left(\theta \right).$ (1)

It is well known that every canonical transformation is volume-preserving. Hence we cannot expect the desired transformation to exist: since $\mathrm{det}\Phi =1$ , taking determinants in the required identity would give $\mathrm{det}{\Lambda }_{0}^{-1}=\mathrm{det}\left({\Lambda }_{0}^{-1}+n{\Sigma }^{-1}\right)$ , which is impossible because $n{\Sigma }^{-1}$ is positive definite. Moreover, in this case the population mean $\mu$ is in ${ℝ}^{2n}$ , so we can only treat the even dimensional case. The key to getting rid of this drawback is to use Lagrangian submanifolds. Consider a symplectic manifold which has a Lagrangian submanifold containing $\mu$ , and construct the desired canonical transformation on the total space. Canonical transformations on the total space may change a measure on Lagrangian submanifolds.
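The volume-preservation obstruction described above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrices, the sample size 5, and all variable names are illustrative assumptions and do not come from the paper. It takes $2n=2$ , builds one element of $\text{Sp}\left(2,ℝ\right)=\text{SL}\left(2,ℝ\right)$ , and compares determinants.

```python
import numpy as np

# Illustrative values (assumptions): 2n = 2, sample size 5.
prior_precision = np.array([[2.0, 0.3], [0.3, 1.0]])   # plays the role of Lambda_0^{-1}
Sigma = np.array([[1.5, 0.2], [0.2, 0.8]])             # known covariance
data_precision = 5 * np.linalg.inv(Sigma)              # n * Sigma^{-1}

# One element of Sp(2, R) = SL(2, R): a shear composed with a rotation.
shear = np.array([[1.0, 0.7], [0.0, 1.0]])
angle = 0.4
rotation = np.array([[np.cos(angle), np.sin(angle)],
                     [-np.sin(angle), np.cos(angle)]])
Phi = shear @ rotation

# Congruence by a symplectic matrix leaves the determinant unchanged ...
det_transformed = np.linalg.det(Phi.T @ prior_precision @ Phi)
det_prior = np.linalg.det(prior_precision)

# ... whereas adding a positive definite matrix strictly increases it.
det_target = np.linalg.det(prior_precision + data_precision)

print(det_transformed, det_prior, det_target)
```

The first two printed values agree up to rounding, while the third is strictly larger, so no choice of $\Phi$ of this kind can reach the target matrix.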

There is another approach to Bayesian inference from a symplectic-contact geometric viewpoint due to Mori. Mori considers the square of the parameter space of normal distributions and its Lagrangian submanifold to describe Bayes’ theorem by Hamiltonian flows, and he simultaneously gives Bayesian updating for the mean and the variance in the univariate case. Taking Mori’s considerations into account, we should use the Poincaré type symplectic form to express Bayesian updating for covariance matrices, while we use the canonical symplectic structure on ${ℝ}^{2n}$ for mean vectors. For information geometry and its relation to symplectic geometry, see the references.

2. Bayesian Updating

In this section we review Bayes’ theorem for multivariate normal distributions. For details, see the references.

We consider the posterior distribution of the mean vector $\mu$ of a multivariate normal distribution with covariance matrix $\Sigma$ . Fix a positive definite symmetric matrix $\Sigma$ . First we treat the case where the covariance matrix $\Sigma$ is known. Let the prior distribution $p\left(\mu \right)$ of $\mu$ be $N\left({\mu }_{0},{\Lambda }_{0}\right)$ :

$p\left(\mu \right)\propto \mathrm{exp}\left[-\frac{1}{2}{\left(\mu -{\mu }_{0}\right)}^{\text{T}}{\Lambda }_{0}^{-1}\left(\mu -{\mu }_{0}\right)\right].$ (2)

The posterior distribution given a sample $y=\left({y}_{1},\dots ,{y}_{n}\right)$ of size $n$ with sample mean $\bar{y}$ is $\mu |y,\Sigma \sim N\left({\mu }_{n},{\Lambda }_{n}\right)$ and

$p\left(\mu |y,\Sigma \right)\propto \mathrm{exp}\left[-\frac{1}{2}{\left(\mu -{\mu }_{n}\right)}^{\text{T}}{\Lambda }_{n}^{-1}\left(\mu -{\mu }_{n}\right)\right],$ (3)

where

${\mu }_{n}={\left({\Lambda }_{0}^{-1}+n{\Sigma }^{-1}\right)}^{-1}\left({\Lambda }_{0}^{-1}{\mu }_{0}+n{\Sigma }^{-1}\bar{y}\right),\text{ }{\Lambda }_{n}^{-1}={\Lambda }_{0}^{-1}+n{\Sigma }^{-1}.$ (4)
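As a minimal numerical sketch of the update (4), assuming NumPy is available and taking the data term in the standard conjugate form $n{\Sigma }^{-1}\bar{y}$ , the posterior parameters can be computed as follows. The function name `update_known_covariance` and the sample values are illustrative assumptions and are not part of the paper.

```python
import numpy as np

def update_known_covariance(mu0, Lambda0, Sigma, ybar, n_obs):
    """Posterior N(mu_n, Lambda_n) of the mean when the covariance Sigma is known."""
    prior_precision = np.linalg.inv(Lambda0)        # Lambda_0^{-1}
    data_precision = n_obs * np.linalg.inv(Sigma)   # n * Sigma^{-1}
    posterior_precision = prior_precision + data_precision   # Lambda_n^{-1}
    Lambda_n = np.linalg.inv(posterior_precision)
    mu_n = Lambda_n @ (prior_precision @ mu0 + data_precision @ ybar)
    return mu_n, Lambda_n

# Illustrative values (assumptions): 2-dimensional mean, 4 observations.
mu0 = np.array([0.0, 0.0])
Lambda0 = np.eye(2)
Sigma = 2.0 * np.eye(2)
ybar = np.array([1.0, -1.0])

mu_n, Lambda_n = update_known_covariance(mu0, Lambda0, Sigma, ybar, n_obs=4)
print(mu_n)
print(Lambda_n)
```

For these values the posterior precision is $3{I}_{2}$ , so ${\Lambda }_{n}={I}_{2}/3$ and ${\mu }_{n}=\left(2/3,-2/3\right)$ .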

Next we consider the case where the covariance matrix $\Sigma$ is unknown. If the prior distribution is given by $\mu |\Sigma \sim N\left({\mu }_{0},\Sigma /{k}_{0}\right)$ , $\Sigma \sim IW\left({\nu }_{0},{\Lambda }_{0}\right)$ , where $IW$ denotes the inverse-Wishart distribution, then the posterior is $\mu |y,\Sigma \sim N\left({\mu }_{n},\Sigma /{k}_{n}\right)$ , $\Sigma |y\sim IW\left({\nu }_{n},{\Lambda }_{n}\right)$ , where

${\mu }_{n}=\frac{{k}_{0}}{{k}_{0}+n}{\mu }_{0}+\frac{n}{{k}_{0}+n}\bar{y},\quad {k}_{n}={k}_{0}+n,\quad {\nu }_{n}={\nu }_{0}+n,\quad {\Lambda }_{n}={\Lambda }_{0}+S+\frac{{k}_{0}n}{{k}_{0}+n}{S}_{0}.$ (5)

Here, as in the standard conjugate update, $S={\sum }_{i=1}^{n}\left({y}_{i}-\bar{y}\right){\left({y}_{i}-\bar{y}\right)}^{\text{T}}$ is the scatter matrix about the sample mean and ${S}_{0}=\left(\bar{y}-{\mu }_{0}\right){\left(\bar{y}-{\mu }_{0}\right)}^{\text{T}}$ .
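The unknown covariance update can be sketched in the same way. This sketch assumes NumPy and assumes the standard normal-inverse-Wishart conventions, namely that $S$ is the scatter matrix of the sample about $\bar{y}$ and that ${S}_{0}=\left(\bar{y}-{\mu }_{0}\right){\left(\bar{y}-{\mu }_{0}\right)}^{\text{T}}$ . The function name `update_unknown_covariance` and the data are illustrative assumptions.

```python
import numpy as np

def update_unknown_covariance(mu0, k0, nu0, Lambda0, y):
    """Normal-inverse-Wishart update; y has shape (n_obs, d), one sample per row."""
    n_obs = y.shape[0]
    ybar = y.mean(axis=0)
    centered = y - ybar
    S = centered.T @ centered                  # scatter matrix about the sample mean
    S0 = np.outer(ybar - mu0, ybar - mu0)      # assumed meaning of S_0
    k_n = k0 + n_obs
    nu_n = nu0 + n_obs
    mu_n = (k0 * mu0 + n_obs * ybar) / k_n
    Lambda_n = Lambda0 + S + (k0 * n_obs / k_n) * S0
    return mu_n, k_n, nu_n, Lambda_n

# Illustrative values (assumptions): two observations in dimension 2.
y = np.array([[1.0, 0.0], [3.0, 0.0]])
mu_n, k_n, nu_n, Lambda_n = update_unknown_covariance(
    mu0=np.zeros(2), k0=2.0, nu0=4.0, Lambda0=np.eye(2), y=y)
print(mu_n, k_n, nu_n)
print(Lambda_n)
```

With these values the sketch gives ${\mu }_{n}=\left(1,0\right)$ , ${k}_{n}=4$ , ${\nu }_{n}=6$ and ${\Lambda }_{n}=\text{diag}\left(7,1\right)$ .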

3. Symplectic Group and Affine Canonical Transformation

In this section we review properties of the symplectic group and Hamiltonian flows.

Denote the set of all linear symplectic transformations on ${ℝ}^{2n}$ by

$\text{Sp}\left(2n,ℝ\right)=\left\{S\in \text{GL}\left(2n,ℝ\right);{S}^{\text{T}}JS=J\right\}$ (6)

and call it the symplectic group, where $J=\left[\begin{array}{cc}0& {I}_{n}\\ -{I}_{n}& 0\end{array}\right]$ .

Let ${\omega }_{0}$ be the canonical symplectic structure on the 2n-dimensional vector space ${ℝ}^{2n}$ . Then ${\omega }_{0}\left(z,{z}^{\prime }\right)={\omega }_{0}\left(Sz,S{z}^{\prime }\right)$ holds for all vectors $z,{z}^{\prime }$ in ${ℝ}^{2n}$ if and only if ${S}^{\text{T}}JS=J$ . For any $S\in \text{Sp}\left(2n,ℝ\right)$ we have $\mathrm{det}S=1$ , where $\mathrm{det}S$ denotes the determinant of the matrix S. We also have $\text{Sp}\left(2n,ℝ\right)=\text{SL}\left(2,ℝ\right)$ for $n=1$ . In general $\text{Sp}\left(2n,ℝ\right)$ is a connected Lie group of dimension $n\left(2n+1\right)$ , and its Lie algebra is given by

$\mathfrak{s}\mathfrak{p}\left(2n,ℝ\right)=\left\{M\in M\left(2n,ℝ\right);JM+{M}^{\text{T}}J=0\right\}$ .

If we write $S\in \text{Sp}\left(2n,ℝ\right)$ in terms of $n×n$ block matrices as $S=\left[\begin{array}{cc}A& B\\ C& D\end{array}\right]$ , then

${A}^{\text{T}}C={C}^{\text{T}}A,\text{ }{B}^{\text{T}}D={D}^{\text{T}}B,\text{ }{A}^{\text{T}}D-{C}^{\text{T}}B={I}_{n}.$ (7)

Hence the inverse matrix of S is given by ${S}^{-1}=\left[\begin{array}{cc}{D}^{\text{T}}& -{B}^{\text{T}}\\ -{C}^{\text{T}}& {A}^{\text{T}}\end{array}\right]$ . For details, see Abraham-Marsden and de Gosson.
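The defining condition ${S}^{\text{T}}JS=J$ and the block formula for ${S}^{-1}$ can be checked numerically. The following is a minimal sketch assuming NumPy; the helper names `canonical_J`, `is_symplectic` and `symplectic_inverse` and the test matrix are illustrative assumptions.

```python
import numpy as np

def canonical_J(n):
    """The 2n x 2n matrix J = [[0, I_n], [-I_n, 0]]."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, I], [-I, Z]])

def is_symplectic(S, tol=1e-10):
    """Check S^T J S = J up to the tolerance tol."""
    J = canonical_J(S.shape[0] // 2)
    return np.allclose(S.T @ J @ S, J, atol=tol)

def symplectic_inverse(S):
    """Inverse from the block formula [[D^T, -B^T], [-C^T, A^T]]."""
    n = S.shape[0] // 2
    A, B = S[:n, :n], S[:n, n:]
    C, D = S[n:, :n], S[n:, n:]
    return np.block([[D.T, -B.T], [-C.T, A.T]])

# Test matrix (assumption): a shear with symmetric lower-left block,
# composed with a block diagonal matrix diag(M, M^{-T}).
I2, Z2 = np.eye(2), np.zeros((2, 2))
C = np.array([[2.0, 1.0], [1.0, 3.0]])
M = np.array([[1.0, 2.0], [0.0, 1.0]])
shear = np.block([[I2, Z2], [C, I2]])
dilation = np.block([[M, Z2], [Z2, np.linalg.inv(M).T]])
S = shear @ dilation

print(is_symplectic(S))
print(np.allclose(symplectic_inverse(S) @ S, np.eye(4)))
print(np.linalg.det(S))
```

Both factors satisfy the block relations (7), and a product of symplectic matrices is again symplectic, so the check is expected to succeed for this choice.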

If a density ${\rho }_{0}\sim N\left(0,\Sigma \right)$ evolves under a linear Hamiltonian flow, the resulting density is again multivariate normal.

Lemma 1. If a density function ${\rho }_{0}\sim N\left(0,\Sigma \right)$ is evolved by a linear Hamiltonian system with transition matrix ${\Phi }_{t}\in \text{Sp}\left(2n,ℝ\right)$ , then we have ${\rho }_{t}\sim N\left(0,\Sigma \left(t\right)\right)$ , where $\Sigma \left(t\right)={\Phi }_{t}\Sigma {\Phi }_{t}^{\text{T}}$ .

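Consider the Hamiltonian equations

$\left\{\begin{array}{l}{\dot{q}}_{j}=\frac{\partial H}{\partial {p}_{j}}\hfill \\ {\dot{p}}_{j}=-\frac{\partial H}{\partial {q}_{j}}\hfill \end{array}\right.$ (8)

with a quadratic Hamiltonian $H\left(z\right)=\frac{1}{2}{z}^{\text{T}}Kz$ , where $z=\left(q,p\right)$ and $K$ is a symmetric $2n×2n$ matrix. Then the system is linear, $\dot{z}=JKz$ , and the transition matrix is given by ${\Phi }_{t}=\mathrm{exp}\left(tJK\right)$ . Since $J\left(JK\right)+{\left(JK\right)}^{\text{T}}J=0$ , the matrix $JK$ belongs to the Lie algebra of the symplectic group, and hence ${\Phi }_{t}\in \text{Sp}\left(2n,ℝ\right)$ for any t. For instance, for $H=\frac{1}{2}\left({q}^{\text{T}}Aq+{p}^{\text{T}}Bp\right)$ with symmetric $n×n$ matrices $A$ and $B$ , we have $JK=\left[\begin{array}{cc}0& B\\ -A& 0\end{array}\right]$ .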
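To illustrate that the transition matrix of a linear Hamiltonian system is symplectic, the following sketch assumes a quadratic Hamiltonian $H=\frac{1}{2}\left({q}^{\text{T}}Aq+{p}^{\text{T}}Bp\right)$ with symmetric blocks $A$ and $B$ , so that the flow is $\mathrm{exp}\left(tJK\right)$ with $K=\text{diag}\left(A,B\right)$ . It assumes NumPy; the helper `expm_taylor` is a plain truncated Taylor series written for this illustration (not a library routine), and the matrices are hypothetical.

```python
import numpy as np

def expm_taylor(X, terms=40):
    """Truncated Taylor series for exp(X); adequate for small matrices of modest norm."""
    result = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for k in range(1, terms):
        term = term @ X / k        # X^k / k!
        result = result + term
    return result

n = 2
I, Z = np.eye(n), np.zeros((n, n))
J = np.block([[Z, I], [-I, Z]])

A = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical symmetric block acting on q
B = np.array([[1.0, 0.2], [0.2, 3.0]])   # hypothetical symmetric block acting on p
K = np.block([[A, Z], [Z, B]])           # Hessian of H, so J K = [[0, B], [-A, 0]]

t = 0.7
Phi_t = expm_taylor(t * J @ K)
Phi_minus_t = expm_taylor(-t * J @ K)

print(np.allclose(Phi_t.T @ J @ Phi_t, J))            # symplectic condition
print(np.allclose(Phi_minus_t @ Phi_t, np.eye(2 * n)))  # Phi_{-t} = Phi_t^{-1}
print(np.linalg.det(Phi_t))
```

A production computation would normally use a dedicated matrix exponential routine; the series is used here only to keep the sketch self-contained.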

Lemma 1 shows that if a normally distributed density function evolves under linear Hamiltonian equations, then the result is again normally distributed, and its covariance matrix is obtained from the original one by multiplying by the transition matrix from the left and by its transpose from the right. The proof is straightforward. Since ${\Phi }_{-t}={\Phi }_{t}^{-1}$ and $\mathrm{det}{\Phi }_{t}=1$ , we have

$\begin{array}{c}{\rho }_{t}\left(x\right)={\rho }_{0}\left({\Phi }_{-t}x\right)={\left(\frac{1}{2\pi }\right)}^{n}{\left(\mathrm{det}\Sigma \right)}^{-1/2}\mathrm{exp}\left[-\frac{1}{2}{x}^{\text{T}}{\Phi }_{-t}^{\text{T}}{\Sigma }^{-1}{\Phi }_{-t}x\right]\\ ={\left(\frac{1}{2\pi }\right)}^{n}{\left(\mathrm{det}\left({\Phi }_{t}\Sigma {\Phi }_{t}^{\text{T}}\right)\right)}^{-1/2}\mathrm{exp}\left[-\frac{1}{2}{x}^{\text{T}}{\left({\Phi }_{t}\Sigma {\Phi }_{t}^{\text{T}}\right)}^{-1}x\right].\end{array}$
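The identity in the proof can be checked at a single point. The sketch below assumes NumPy and uses the harmonic oscillator $H=\left({q}^{2}+{p}^{2}\right)/2$ with $n=1$ as an example, whose flow is a rotation of the phase plane; the covariance matrix, the time $t$ and the evaluation point are illustrative assumptions.

```python
import numpy as np

def normal_density(x, Sigma):
    """Density of N(0, Sigma) evaluated at the point x."""
    dim = len(x)
    quad = x @ np.linalg.solve(Sigma, x)     # x^T Sigma^{-1} x
    norm = (2 * np.pi) ** (-dim / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * quad)

# Flow of the harmonic oscillator: Phi_t = exp(t J) is a rotation.
t = 0.9
Phi_t = np.array([[np.cos(t), np.sin(t)],
                  [-np.sin(t), np.cos(t)]])
Phi_minus_t = np.linalg.inv(Phi_t)

Sigma = np.array([[2.0, 0.3], [0.3, 0.5]])   # initial covariance (assumption)
Sigma_t = Phi_t @ Sigma @ Phi_t.T            # covariance claimed by Lemma 1

x = np.array([0.4, -1.1])                    # arbitrary evaluation point
lhs = normal_density(Phi_minus_t @ x, Sigma)  # rho_0(Phi_{-t} x)
rhs = normal_density(x, Sigma_t)              # density of N(0, Sigma(t)) at x

print(lhs, rhs)
```

The two printed values agree up to rounding, which relies on $\mathrm{det}{\Phi }_{t}=1$ so that the normalizing constants coincide.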

4. Proof of the Theorem

To prove the theorem, we explicitly construct affine symplectic diffeomorphisms.

First we consider the known variance case. Let ${\omega }_{0}$ be the canonical symplectic structure on ${ℝ}^{2n}={ℝ}^{n}×{ℝ}^{n}$ . We regard $\mu$ as a point in the first factor of ${ℝ}^{2n}$ , and take the matrix $\left[\begin{array}{cc}{\Lambda }_{0}^{-1}& 0\\ 0& {I}_{n}\end{array}\right]$ , which corresponds to the prior distribution. Let

${\Phi }_{1}={S}^{\text{T}}=\left[\begin{array}{cc}{I}_{n}& -{\left({\Lambda }_{0}^{-1}+n{\Sigma }^{-1}\right)}^{-1}{\left(n{\Sigma }^{-1}\right)}^{1/2}\\ {\left(n{\Sigma }^{-1}\right)}^{1/2}& {I}_{n}-{\left(n{\Sigma }^{-1}\right)}^{1/2}{\left({\Lambda }_{0}^{-1}+n{\Sigma }^{-1}\right)}^{-1}{\left(n{\Sigma }^{-1}\right)}^{1/2}\end{array}\right]$ (9)

then we have ${\Phi }_{1}\in \text{Sp}\left(2n,ℝ\right)$ and

$S\left[\begin{array}{cc}{\Lambda }_{0}^{-1}& 0\\ 0& {I}_{n}\end{array}\right]{S}^{\text{T}}=\left[\begin{array}{cc}{\Lambda }_{n}^{-1}& 0\\ 0& {I}_{n}-{\left(n{\Sigma }^{-1}\right)}^{1/2}{\left({\Lambda }_{0}^{-1}+n{\Sigma }^{-1}\right)}^{-1}{\left(n{\Sigma }^{-1}\right)}^{1/2}\end{array}\right].$ (10)

Hence the Bayesian updating can be expressed as

$T\left({\mu }_{n},0\right)\circ {\tilde{\Phi }}_{1}\circ T\left(-{\mu }_{0},0\right),$ (11)

where $T\left(a,b\right)$ denotes the parallel translation $T:\left(x,y\right)↦\left(x+a,y+b\right)$ on ${ℝ}^{2n}$ , and $\circ$ denotes the composition of maps.
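The known variance construction can be explored numerically. The sketch below assumes NumPy and builds the matrix of (9), checks the block identity (10), and applies the composition in the form stated in Theorem 1 (with ${\Phi }_{1}$ itself in the middle, which is an assumption about the notation). As a further assumption, ${\Lambda }_{0}$ and $\Sigma$ are chosen diagonal, hence commuting; the symplectic check in this sketch relies on that choice. The names `d` (dimension of $\mu$ ), `n_obs` (sample size), `sym_sqrt`, `build_phi1` and `affine_update` are hypothetical.

```python
import numpy as np

def sym_sqrt(P):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(P)
    return (V * np.sqrt(w)) @ V.T

def build_phi1(Lambda0, Sigma, n_obs):
    """Matrix Phi_1 of (9) together with the posterior covariance Lambda_n."""
    d = Lambda0.shape[0]
    R = sym_sqrt(n_obs * np.linalg.inv(Sigma))            # (n Sigma^{-1})^{1/2}
    Lambda_n = np.linalg.inv(np.linalg.inv(Lambda0) + R @ R)
    Phi1 = np.block([[np.eye(d), -Lambda_n @ R],
                     [R, np.eye(d) - R @ Lambda_n @ R]])
    return Phi1, Lambda_n

# Illustrative values (assumptions); diagonal matrices commute.
d, n_obs = 2, 3
Lambda0 = np.diag([1.0, 4.0])
Sigma = np.diag([2.0, 0.5])
mu0 = np.array([0.0, 1.0])
ybar = np.array([2.0, -1.0])

Phi1, Lambda_n = build_phi1(Lambda0, Sigma, n_obs)
S = Phi1.T
Z = np.zeros((d, d))
J = np.block([[Z, np.eye(d)], [-np.eye(d), Z]])
Q0 = np.block([[np.linalg.inv(Lambda0), Z], [Z, np.eye(d)]])
Q1 = S @ Q0 @ S.T                                         # left-hand side of (10)

mu_n = Lambda_n @ (np.linalg.inv(Lambda0) @ mu0 + n_obs * np.linalg.inv(Sigma) @ ybar)

def affine_update(z):
    """T(mu_n, 0) o Phi_1 o T(-mu_0, 0) applied to a point z of R^{2d}."""
    shift_in = np.concatenate([mu0, np.zeros(d)])
    shift_out = np.concatenate([mu_n, np.zeros(d)])
    return Phi1 @ (z - shift_in) + shift_out

print(np.allclose(Phi1.T @ J @ Phi1, J))                  # symplectic for this commuting choice
print(np.allclose(Q1[:d, :d], np.linalg.inv(Lambda_n)))   # upper-left block of (10)
print(np.allclose(Q1[:d, d:], 0.0))                       # off-diagonal block of (10)
print(affine_update(np.concatenate([mu0, np.zeros(d)])))
```

In this example the upper-left block of the transformed matrix equals ${\Lambda }_{n}^{-1}$ , the off-diagonal blocks vanish, and the affine map sends $\left({\mu }_{0},0\right)$ to $\left({\mu }_{n},0\right)$ .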

In the unknown variance case, we take the matrix $\left[\begin{array}{cc}{k}_{0}{\Sigma }^{-1}& 0\\ 0& {I}_{n}\end{array}\right]$ as the one corresponding to the prior distribution and set

${\Phi }_{1}={S}^{\text{T}}=\left[\begin{array}{cc}{I}_{n}& {n}^{1/2}{\Sigma }^{-1/2}\\ -{\left(2{k}_{0}\right)}^{-1}{n}^{1/2}{\Sigma }^{1/2}& \left(1-\frac{n}{2{k}_{0}}\right)\Sigma \end{array}\right]\in \text{Sp}\left(2n,ℝ\right),$ (12)

then the desired transformation is given by

$T\left({\mu }_{n},0\right)\circ {\tilde{\Phi }}_{1}\circ T\left(-{\mu }_{0},0\right).$ (13)

5. Conclusion

In this paper we have shown that Bayesian updating can be expressed by an affine symplectic diffeomorphism on ${ℝ}^{2n}$ , whose base space contains the population mean vector. Bayesian updating is widely used in many areas, and nowadays the posterior is usually determined implicitly, by computer. In contrast, our theorem expresses the posterior explicitly and concretely, and gives a dynamical interpretation of Bayesian updating.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Abraham, R. and Marsden, J.E. (1987) Foundations of Mechanics. Second Edition, Addison-Wesley Publishing Company, Inc., Boston.

[2] De Gosson, M.A. (2006) Symplectic Geometry and Quantum Mechanics. Birkhäuser, Basel. https://doi.org/10.1007/3-7643-7575-2

[3] Hofer, H. and Zehnder, E. (1994) Symplectic Invariants and Hamiltonian Dynamics. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-8540-9

[4] Weinstein, A. (1981) Symplectic Geometry. Bulletin of the American Mathematical Society, 5, 1-13. https://doi.org/10.1090/S0273-0979-1981-14911-9

[5] Mori, A. (2018) Information Geometry in a Global Setting. Hiroshima Mathematical Journal, 48, 291-305. https://doi.org/10.32917/hmj/1544238029

[6] Mori, A. A Congruence Theorem for Alpha-Connections on the Space of T-Distributions and Its Application.

[7] Amari, S. and Nagaoka, H. (2000) Methods of Information Geometry. Translations of Mathematical Monographs, Vol. 191, American Mathematical Society, Providence, Oxford University Press, Oxford, Translated from the 1993 Japanese Original by Daishi Harada.

[8] Boumuki, N. and Noda, T. (2016) On Gradient and Hamiltonian Flows on Even Dimensional Dually Flat Spaces. Fundamental Journal of Mathematics and Mathematical Sciences, 6, 51-66.

[9] Noda, T. (2011) Symplectic Structures on Statistical Manifolds. Journal of the Australian Mathematical Society, 90, 371-384. https://doi.org/10.1017/S1446788711001285

[10] Lesaffre, E. and Lawson, A.B. (2012) Bayesian Biostatistics (Statistics in Practice). Wiley, Hoboken. https://doi.org/10.1002/9781119942412