
The purpose of this article is to formulate Bayesian updating from a dynamical viewpoint. We prove that Bayesian updating for the population mean vectors of multivariate normal distributions can be expressed as an affine symplectic transformation on a phase space equipped with the canonical symplectic structure.

In Bayesian statistics, parameters in a model are regarded as random variables with probability distributions, and the posterior distribution is computed by Bayes' theorem.

In a Hamiltonian dynamical system, the time evolution is defined by Hamilton's equations and is expressed by canonical transformations (symplectic diffeomorphisms) on the phase space. The phase space and the equations of motion are, abstractly, a symplectic manifold and a Hamiltonian vector field respectively. Under the time evolution of a Hamiltonian dynamical system, the Hamiltonian function and the phase volume are preserved. These are direct consequences of the skew-symmetry of the symplectic structure. When the dimension of the phase space is greater than or equal to 4, there are further conserved quantities, called symplectic capacities. Symplectic capacities are far from trivial and are a deep result in symplectic geometry. For details, see [

In this paper we prove that Bayesian updating for the population mean vector of a multivariate normal distribution can be expressed by an affine symplectic diffeomorphism (affine canonical transformation). The main result is the following.

Theorem 1. There exists a linear symplectic diffeomorphism $\Phi_1$ on $\mathbb{R}^{2n}$ such that the first component of the composition $T_{(\mu_n,0)} \circ \Phi_1 \circ T_{(-\mu_0,0)}$ maps the prior distribution to the posterior, where $T_{(a,b)}$ is the parallel translation $T\colon (x,y) \mapsto (x+a,\, y+b)$ on $\mathbb{R}^{2n}$.

In this theorem, to reformulate Bayesian updating from a symplectic geometric viewpoint, we consider a cotangent space whose base space contains the population mean $\mu$. The reason for using the cotangent space is as follows. If we assume $\mu$ is a point in $\mathbb{R}^{2n}$, then to express Bayes' theorem by an affine symplectic transformation on $\mathbb{R}^{2n}$ in the known-variance case, we would have to find an element $\Phi \in \mathrm{Sp}(2n,\mathbb{R})$ such that $\Phi^T \Lambda_0^{-1} \Phi = \Lambda_0^{-1} + n\Sigma^{-1}$. (Note that $\Phi$, $\Sigma$, $\Lambda_0$ are all $2n \times 2n$.) However, in Bayesian updating one usually fixes the functional form of the posterior density and then normalizes it:

$$ p(\theta \mid y) = \frac{l(\theta \mid y)\, p(\theta)}{p(y)} \propto l(\theta \mid y)\, p(\theta). \qquad (1) $$

It is well known that any canonical transformation is volume-preserving. Indeed, taking determinants in $\Phi^T \Lambda_0^{-1} \Phi = \Lambda_0^{-1} + n\Sigma^{-1}$ would force $\det(\Lambda_0^{-1}) = \det(\Lambda_0^{-1} + n\Sigma^{-1})$ since $\det \Phi = 1$, which fails for positive definite $\Sigma^{-1}$. Hence we cannot expect such a transformation to exist. Moreover, in this formulation the population mean $\mu$ lies in $\mathbb{R}^{2n}$, so only the even-dimensional case could be treated. The key to removing this drawback is the use of Lagrangian submanifolds: consider a symplectic manifold with a Lagrangian submanifold containing $\mu$, and construct the desired canonical transformation on the total space. A canonical transformation on the total space may change the measure on a Lagrangian submanifold.
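The normalization in (1) can be illustrated with a small one-dimensional grid computation; the prior, likelihood, and grid below are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Grid sketch of (1): the posterior is the normalized product of
# likelihood and prior; p(y) is just the normalizing constant.
theta = np.linspace(-5, 5, 2001)
d_theta = theta[1] - theta[0]

prior = np.exp(-0.5 * theta**2)                # N(0, 1) prior, unnormalized
y = 1.5
lik = np.exp(-0.5 * (y - theta)**2 / 0.5)      # N(theta, 0.5) likelihood at y

post = lik * prior
post /= post.sum() * d_theta                   # divide by p(y)

# Conjugacy check: N(0,1) prior with one N(theta, 0.5) observation
# gives the posterior N(2y/3, 1/3).
mean = (post * theta).sum() * d_theta
var = (post * (theta - mean)**2).sum() * d_theta
assert np.isclose(mean, 2 * y / 3, atol=1e-3)
assert np.isclose(var, 1.0 / 3.0, atol=1e-3)
```

The grid sum plays the role of the integral $p(y)$, so no closed form for the normalizing constant is needed.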

There is another approach to Bayesian inference from a symplectic-contact geometric viewpoint due to Mori [

In this section we review Bayes' theorem for multivariate normal distributions. For details, see e.g., [

Consider the posterior distribution of the mean vector of a multivariate normal distribution. Fix a positive definite symmetric matrix $\Sigma$. First we treat the case where the covariance matrix $\Sigma$ is known. Let the prior distribution $p(\mu)$ of $\mu$ be $N(\mu_0, \Lambda_0)$:

$$ p(\mu) \propto \exp\!\left[ -\frac{1}{2} (\mu - \mu_0)^T \Lambda_0^{-1} (\mu - \mu_0) \right]. \qquad (2) $$

The posterior distribution given the sample $y$ is $\mu \mid y, \Sigma \sim N(\mu_n, \Lambda_n)$ with

$$ p(\mu \mid y, \Sigma) \propto \exp\!\left[ -\frac{1}{2} (\mu - \mu_n)^T \Lambda_n^{-1} (\mu - \mu_n) \right], \qquad (3) $$

where

$$ \mu_n = (\Lambda_0^{-1} + n\Sigma^{-1})^{-1} (\Lambda_0^{-1}\mu_0 + n\Sigma^{-1}\bar{y}), \qquad \Lambda_n^{-1} = \Lambda_0^{-1} + n\Sigma^{-1}, \qquad (4) $$

and $\bar{y} = \frac{1}{n}\sum_{i=1}^n y_i$ is the sample mean.
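A minimal NumPy sketch of the update (4); the dimensions, prior parameters, and simulated data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])    # known covariance
Lambda0 = np.array([[1.0, 0.0], [0.0, 4.0]])  # prior covariance of mu
mu0 = np.array([0.0, 1.0])

n = 10
y = rng.multivariate_normal(mu0, Sigma, size=n)
ybar = y.mean(axis=0)

Lambda0_inv = np.linalg.inv(Lambda0)
Sigma_inv = np.linalg.inv(Sigma)
Lambda_n_inv = Lambda0_inv + n * Sigma_inv                 # posterior precision
Lambda_n = np.linalg.inv(Lambda_n_inv)
mu_n = Lambda_n @ (Lambda0_inv @ mu0 + n * Sigma_inv @ ybar)

# mu_n is the precision-weighted average of mu0 and ybar
assert np.allclose(Lambda_n_inv @ mu_n, Lambda0_inv @ mu0 + n * Sigma_inv @ ybar)
# posterior covariance is symmetric positive definite
assert np.allclose(Lambda_n, Lambda_n.T)
assert np.all(np.linalg.eigvalsh(Lambda_n) > 0)
```

As $n$ grows, the likelihood term $n\Sigma^{-1}$ dominates and $\mu_n$ approaches $\bar{y}$.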

Next we consider the case where the covariance $\Sigma$ is unknown. If the prior distribution is $\mu \mid \Sigma \sim N(\mu_0, \Sigma/k_0)$, $\Sigma \sim IW(\nu_0, \Lambda_0)$, then the posterior is $\mu \mid y, \Sigma \sim N(\mu_n, \Sigma/k_n)$, $\Sigma \mid y \sim IW(\nu_n, \Lambda_n)$, where

$$ \mu_n = \frac{k_0}{k_0+n}\mu_0 + \frac{n}{k_0+n}\bar{y}, \qquad k_n = k_0 + n, \qquad \nu_n = \nu_0 + n, \qquad \Lambda_n = \Lambda_0 + S + \frac{k_0 n}{k_0+n} S_0, \qquad (5) $$

with $S = \sum_{i=1}^n (y_i - \bar{y})(y_i - \bar{y})^T$ and $S_0 = (\bar{y} - \mu_0)(\bar{y} - \mu_0)^T$.
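The normal-inverse-Wishart update (5) can likewise be checked numerically; the prior parameters and simulated data here are illustrative assumptions, and $S$, $S_0$ are the standard scatter matrices of the conjugate update.

```python
import numpy as np

rng = np.random.default_rng(1)

mu0 = np.array([0.0, 0.0])
k0, nu0 = 2.0, 5.0
Lambda0 = np.eye(2)

n = 8
y = rng.multivariate_normal(mu0, np.eye(2), size=n)
ybar = y.mean(axis=0)
S = (y - ybar).T @ (y - ybar)                  # scatter about the sample mean
S0 = np.outer(ybar - mu0, ybar - mu0)          # scatter of ybar about mu0

mu_n = (k0 * mu0 + n * ybar) / (k0 + n)        # convex combination of mu0, ybar
k_n = k0 + n
nu_n = nu0 + n
Lambda_n = Lambda0 + S + (k0 * n / (k0 + n)) * S0

assert np.allclose(mu_n, (k0 / (k0 + n)) * mu0 + (n / (k0 + n)) * ybar)
assert np.allclose(Lambda_n, Lambda_n.T)
assert np.all(np.linalg.eigvalsh(Lambda_n) > 0)
```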

In this section we review properties of the symplectic group and Hamiltonian flows.

Denote the set of all linear symplectic transformations on $\mathbb{R}^{2n}$ by

$$ \mathrm{Sp}(2n,\mathbb{R}) = \{ S \in \mathrm{GL}(2n,\mathbb{R}) \; ; \; S^T J S = J \} \qquad (6) $$

and call it the symplectic group, where $J = \begin{bmatrix} 0 & I_n \\ -I_n & 0 \end{bmatrix}$.

Let $z, z'$ be vectors in $\mathbb{R}^{2n}$ and let $\omega_0$ be the canonical symplectic structure on the $2n$-dimensional vector space $\mathbb{R}^{2n}$; then a necessary and sufficient condition for $\omega_0(z, z') = \omega_0(Sz, Sz')$ is $S^T J S = J$. For any $S \in \mathrm{Sp}(2n,\mathbb{R})$ we have $\det S = 1$, where $\det S$ denotes the determinant of the matrix $S$. We also have $\mathrm{Sp}(2,\mathbb{R}) = \mathrm{SL}(2,\mathbb{R})$ for $n = 1$. In general $\mathrm{Sp}(2n,\mathbb{R})$ is a connected Lie group of dimension $n(2n+1)$, and its Lie algebra is given by

$$ \mathfrak{sp}(2n,\mathbb{R}) = \{ M \in M(2n,\mathbb{R}) \; ; \; JM + M^T J = 0 \}. $$

If we write $S \in \mathrm{Sp}(2n,\mathbb{R})$ in terms of $n \times n$ block matrices as $S = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$, then

$$ A^T C = C^T A, \qquad B^T D = D^T B, \qquad A^T D - C^T B = I_n. \qquad (7) $$

Hence the inverse of $S$ is given by $S^{-1} = \begin{bmatrix} D^T & -B^T \\ -C^T & A^T \end{bmatrix}$. For details, see Abraham-Marsden [
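The relations (6)-(7) and the block formula for $S^{-1}$ are easy to check numerically; in this sketch a symplectic $S$ is assembled from standard generators (shears with symmetric blocks and a $\mathrm{GL}(n)$ block-diagonal factor), which is an illustrative construction not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2
I = np.eye(n)
Z = np.zeros((n, n))
J = np.block([[Z, I], [-I, Z]])

# Symmetric blocks make each shear symplectic; their product is symplectic.
B0 = rng.standard_normal((n, n)); B0 = (B0 + B0.T) / 2
C0 = rng.standard_normal((n, n)); C0 = (C0 + C0.T) / 2
G = rng.standard_normal((n, n)) + 3 * I        # generically invertible
S = (np.block([[I, B0], [Z, I]])
     @ np.block([[I, Z], [C0, I]])
     @ np.block([[G, Z], [Z, np.linalg.inv(G).T]]))

# Defining relation (6) and det S = 1
assert np.allclose(S.T @ J @ S, J)
assert np.isclose(np.linalg.det(S), 1.0)

# Block conditions (7) and the block formula for the inverse
A, B = S[:n, :n], S[:n, n:]
C, D = S[n:, :n], S[n:, n:]
assert np.allclose(A.T @ C, C.T @ A)
assert np.allclose(B.T @ D, D.T @ B)
assert np.allclose(A.T @ D - C.T @ B, I)
S_inv = np.block([[D.T, -B.T], [-C.T, A.T]])
assert np.allclose(S_inv @ S, np.eye(2 * n))
```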

If we evolve a density $\rho_0 \sim N(0, \Sigma)$ by a Hamiltonian flow, the resulting density is again multivariate normal.

Lemma 1. If we evolve a density function $\rho_0 \sim N(0, \Sigma)$ by a linear Hamiltonian system with transition matrix $\Phi_t \in \mathrm{Sp}(2n,\mathbb{R})$, then $\rho_t \sim N(0, \Sigma(t))$, where $\Sigma(t) = \Phi_t \Sigma \Phi_t^T$.

For any Hamiltonian $H$ the equations of motion are

$$ \dot{q}_j = \frac{\partial H}{\partial p_j}, \qquad \dot{p}_j = -\frac{\partial H}{\partial q_j}. \qquad (8) $$

For a quadratic Hamiltonian $H(z) = \frac{1}{2} z^T K z$ with $K$ symmetric, the system (8) is linear, $\dot{z} = JKz$, and the transition matrix is $\Phi_t = \exp(tJK)$, which satisfies $\Phi_t \in \mathrm{Sp}(2n,\mathbb{R})$ for every $t$.

Lemma 1 shows that if we evolve a normally distributed density by a linear Hamiltonian system, the result is again normally distributed, with covariance obtained from the original covariance by multiplying by the transition matrix on the left and its transpose on the right. The proof is straightforward. Since $\Phi_{-t} = \Phi_t^{-1}$ and $\det \Phi_t = 1$, we have

$$ \rho_t(x) = \rho_0(\Phi_{-t} x) = \left(\frac{1}{2\pi}\right)^{n} (\det \Sigma)^{-1/2} \exp\!\left[ -\frac{1}{2} x^T \Phi_{-t}^T \Sigma^{-1} \Phi_{-t} x \right] = \left(\frac{1}{2\pi}\right)^{n} \left(\det(\Phi_t \Sigma \Phi_t^T)\right)^{-1/2} \exp\!\left[ -\frac{1}{2} x^T (\Phi_t \Sigma \Phi_t^T)^{-1} x \right]. $$
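Lemma 1 can be checked by Monte Carlo; the Hamiltonian below (a harmonic oscillator, $H = (q^2 + p^2)/2$, so $\Phi_t$ is a rotation) and the covariance $\Sigma$ are illustrative assumptions.

```python
import numpy as np

# Samples of N(0, Sigma) pushed forward by a linear symplectic flow Phi_t
# stay Gaussian with covariance Phi_t Sigma Phi_t^T.
n = 1
I, Z = np.eye(n), np.zeros((n, n))
J = np.block([[Z, I], [-I, Z]])

t = 0.7
# For H = z^T z / 2 (K = I), Phi_t = exp(t J) is a rotation by t.
Phi_t = np.block([[np.cos(t) * I, np.sin(t) * I],
                  [-np.sin(t) * I, np.cos(t) * I]])
assert np.allclose(Phi_t.T @ J @ Phi_t, J)     # Phi_t is symplectic

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
z = rng.multivariate_normal(np.zeros(2 * n), Sigma, size=200_000)
z_t = z @ Phi_t.T                              # evolve each sample by Phi_t

Sigma_t = Phi_t @ Sigma @ Phi_t.T
assert np.allclose(np.cov(z_t.T), Sigma_t, atol=0.05)
```

The empirical covariance of the evolved samples matches $\Phi_t \Sigma \Phi_t^T$ up to Monte Carlo error, as the proof predicts.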

To prove the theorem, we explicitly construct affine symplectic diffeomorphisms.

First we consider the known-variance case. Let $\omega_0$ be the canonical symplectic structure on $\mathbb{R}^{2n} = \mathbb{R}^n \times \mathbb{R}^n$. We regard $\mu$ as lying in the first factor of $\mathbb{R}^{2n}$, and take the matrix $\begin{bmatrix} \Lambda_0^{-1} & 0 \\ 0 & I_n \end{bmatrix}$ corresponding to the prior distribution. Let

$$ \Phi_1 = S^T = \begin{bmatrix} I_n & -(\Lambda_0^{-1} + n\Sigma^{-1})^{-1} (n\Sigma^{-1})^{1/2} \\ (n\Sigma^{-1})^{1/2} & I_n - (n\Sigma^{-1})^{1/2} (\Lambda_0^{-1} + n\Sigma^{-1})^{-1} (n\Sigma^{-1})^{1/2} \end{bmatrix}; \qquad (9) $$

then we have $\Phi_1 \in \mathrm{Sp}(2n,\mathbb{R})$ and

$$ S \begin{bmatrix} \Lambda_0^{-1} & 0 \\ 0 & I_n \end{bmatrix} S^T = \begin{bmatrix} \Lambda_n^{-1} & 0 \\ 0 & I_n - (n\Sigma^{-1})^{1/2} (\Lambda_0^{-1} + n\Sigma^{-1})^{-1} (n\Sigma^{-1})^{1/2} \end{bmatrix}. \qquad (10) $$
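Equations (9)-(10) can be verified numerically; this sketch takes $\Sigma$ and $\Lambda_0$ diagonal (so that all the square roots and inverses involved commute and the matrix square root is entrywise), which is an illustrative assumption.

```python
import numpy as np

n = 2
I = np.eye(n)
J = np.block([[np.zeros((n, n)), I], [-I, np.zeros((n, n))]])

Sigma = np.diag([2.0, 0.5])
Lambda0 = np.diag([1.0, 4.0])
n_obs = 10

P = np.linalg.inv(np.linalg.inv(Lambda0) + n_obs * np.linalg.inv(Sigma))
W = np.sqrt(n_obs * np.linalg.inv(Sigma))      # (n Sigma^{-1})^{1/2}, diagonal

# Phi_1 from (9), with P = (Lambda0^{-1} + n Sigma^{-1})^{-1}
Phi1 = np.block([[I, -P @ W], [W, I - W @ P @ W]])
S = Phi1.T

assert np.allclose(Phi1.T @ J @ Phi1, J)       # Phi_1 is symplectic

lhs = S @ np.block([[np.linalg.inv(Lambda0), np.zeros((n, n))],
                    [np.zeros((n, n)), I]]) @ S.T
Lambda_n_inv = np.linalg.inv(Lambda0) + n_obs * np.linalg.inv(Sigma)
assert np.allclose(lhs[:n, :n], Lambda_n_inv)  # top-left block is Lambda_n^{-1}
assert np.allclose(lhs[:n, n:], 0)             # off-diagonal blocks vanish
```

The top-left block of (10) is exactly the posterior precision $\Lambda_n^{-1}$ from (4), which is how the transformation realizes the Bayesian update.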

Hence the Bayesian updating can be expressed as

$$ T_{(\mu_n,0)} \circ \tilde{\Phi}_1 \circ T_{(-\mu_0,0)}, \qquad (11) $$

where $T_{(a,b)}$ denotes the parallel translation $T\colon (x,y) \mapsto (x+a,\, y+b)$ on $\mathbb{R}^{2n}$, and $\circ$ denotes composition of maps.

In the unknown-variance case, we take the matrix $\begin{bmatrix} k_0 \Sigma^{-1} & 0 \\ 0 & I_n \end{bmatrix}$ corresponding to the prior distribution and set

$$ \Phi_1 = S^T = \begin{bmatrix} I_n & n^{1/2} \Sigma^{-1/2} \\ -(2k_0)^{-1} n^{1/2} \Sigma^{1/2} & \left(1 - \frac{n}{2k_0}\right) \Sigma \end{bmatrix} \in \mathrm{Sp}(2n,\mathbb{R}), \qquad (12) $$

then the desired transformation is given by

$$ T_{(\mu_n,0)} \circ \tilde{\Phi}_1 \circ T_{(-\mu_0,0)}. \qquad (13) $$

In this paper we have shown that Bayesian updating can be expressed by an affine symplectic diffeomorphism on $\mathbb{R}^{2n}$ whose base space contains the population mean vector. Bayesian updating is widely used in many areas, and nowadays the posterior is typically determined implicitly by computer. Our theorem, by contrast, expresses the posterior explicitly and concretely, and gives a dynamical interpretation of Bayesian updating.

The authors declare no conflicts of interest regarding the publication of this paper.

Noda, T. and Matsuyama, H. (2019) Bayesian Inference from Symplectic Geometric Viewpoint. Advances in Pure Mathematics, 9, 827-831. https://doi.org/10.4236/apm.2019.910039