Accelerated Generation of Spatiotemporal Atlas through Fast Template Matching

A spatiotemporal atlas refers to a standard image sequence that represents the general motion pattern of the targeted anatomy across a group of subjects. Recent years have witnessed an increasing interest in using spatiotemporal atlas for scientific research and clinical applications in image processing, data analysis and medical imaging. However, the generation of spatiotemporal atlas is often time-consuming and computationally expensive due to the nonlinear image registration procedures involved. This research targets at accelerating the generation of spatiotemporal atlas by formulating the atlas generation procedure as a multi-level modulation (M-ary) classification problem. In particular, we have implemented a fast template matching method based on singular value decomposition, and applied it to generate high quality spatiotemporal atlas with reasonable time and computational complexity. The performance has been systematically evaluated on public accessible data sets. The results and conclusions hold promise for further developing advanced algorithms for accelerating generation of spatiotemporal atlas.


Introduction
The understanding of muscle structure and muscular movements is the foundation for many scientific researches and clinical applications in image processing, medical imaging and human physiology. However, it is generally challenging to determine whether an observed anatomical structure, whether it be the brain, the tongue, the heart or the limb, is "normal" because there exists a great difference in the underlying anatomical structures for even a small group of subjects. In addition to the great inter-subject differences, significant variance also exists in the soft tissue anatomy on even a single subject. For instance, a recent imaging-based research has indicated that the size and shape of the human heart vary significantly at multiple cardiac phases of a heartbeat, among different heart beats and across various subjects [1]. Given these potential sources of anatomical differences during muscular motion, it has long been a dream for scientists to establish a standard sequence of images representing the "expected" muscle structure and motion pattern for a targeted group of subjects. If such a standard sequence of images, i.e., a spatiotemporal atlas, were to exist, then it serves as the ground truth for quantitatively interpreting the observed muscle movement and accurately characterizing the motion variability of a specific subject as versus to the general motion pattern [2].
Despite the usefulness of spatiotemporal atlas, its construction often involves procedures that are time-consuming and computationally expensive. In order to create a set of high quality spatiotemporal atlas, initial image formation procedures need to be first performed on each subject from the targeted group. Upon completion of this initial step, it is often essential to define a common space towards which accurate mappings of all subjects can be mapped into [3]. After this common space is successfully defined, group-wise registration techniques are often employed to extract the "expected" anatomical features and motion patterns [2]. Several methods have been proposed to facilitate this step, including group-wise registration with kernel regression [4], group-wise registration with individual growth model [5], template alignment techniques [6] and diffeomorphic image registration [2]. It is worth noting that each of the above mentioned steps involves multiple image formation and image processing procedures that are often nonlinear, memory inefficient and computationally expensive. Specifically in this paper, the author constructed a small training set of spatiotemporal atlas using a diffeomorphic image registration method with Lipschitz-norm-based temporal alignment routine [2].
Construction of high quality spatiotemporal atlas is even more challenging when the underlying image data are undersampled. The quality of the constructed spatiotemporal atlas depends heavily on the data quality from the image formation steps. Therefore, the quality of the resulting atlas images is very likely compromised if they are constructed upon initial images that are contaminated by geometric distortion, imaging artifacts and noise. This situation, unfortunately, is often seen with clinical applications that involve fast medical imaging, where sparse sampling is performed on the image formation step due to certain physical or physiological considerations [2]. Figure 1 illustrates representative atlas images constructed based on sparse-sampled Fourier space (k-space) data from a dynamic magnetic resonance imaging (MRI) experiment. The top and bottom portions of the initial images are cropped to enable better spatial alignment under subject motion. The lower jaw of the subject in time frame 1001 and 2076 and the nose tip of the subject in time frame 4118 suffer from significant geometric distortion. These geometric distortions mainly result from sparse sampling in the image formation step and significantly compromises the ensuing quantitative analysis of anatomical features on the resulting atlas image sequence. As can be seen, the subject's lower jaw and nose tip in the constructed atlas image suffer from significant geometric distortion, which renders the atlas less useful for quantitative analysis on the underlying anatomical features. To remedy the compromised image quality from sparse sampling, anatomical models need to be incorporated to compensate for the image artifacts and distortions from the image formation step [6].
This paper targets at the above mentioned technical challenges and aims to develop a practical method for generating high quality spatiotemporal atlas images within reasonable computation time and memory requirements. The authors formulate the atlas generation problem as multi-level modulation (M-ary) classification problem. Specifically, a small subset of spatiotemporal atlas images is first constructed as training set using a diffeomorphic registration routine as mentioned above. With this training set, atlas construction proceeds as choosing an atlas image that is "closest" to the one in the training set. A fast template matching method based on Fourier space samples has been chosen to perform this task. The principles underlying these two methods are given in the METHODS section and evaluations of their performance are given in the RESULTS section of this paper. Concluding remarks will also be given at the end of the paper with detailed discussion on the theoretical, numerical and algorithmic problems occur during the formation of the atlas.

Constructing the Training Set of Spatiotemporal Atlas
Diffeomorphic group-wise image registration lies at the core of constructing the training set of the spatiotemporal atlas. Specifically, the initial images for registration are first obtained from a numerical phantom constructed for evaluating dynamic MRI data sampling and reconstruction methods [7]. The group-wise image registration procedure is performed with the large deformation diffeomorphic metric mapping (LDDMM) algorithm [8]: let I init represent an initial image sequence obtained from an image formation step, I atlas represent a constructed atlas image sequence on an open and bounded image domain Ω, φ (x, t) denote a continuous differentiable function representing the diffeomorphic transformation from I init to I atlas parameterized over time t. It is also assumed that φ (x, t) can be computed by integrating over a time-dependent velocity field v defined as follows [8], where the function φ (x, t) also satisfies the following integration, Given Equation (1) and Equation (2), it has been indicated in [2] that the diffeomorphic image registration procedure can be formulated as a convex optimization problem that aims at minimizing a energy functional as follows where the first term in the energy functional is a data consistency term and the second term is a regularization term with λ as a regularization parameter. To better leverage the spatial correlation between I init to I atlas , [2] defines a pair of diffeomorphism over φ 1 and φ 2 and leads to an improved energy functional as follows, where Ψ (·) represents a similarity measure. By introducing diffeomorphism into the energy functional, the above formulation guarantees that the forward and inverse mappings between I init to I atlas are "symmetric". This optimization problem is convex and its global optimum can be obtained with a gradient-descentbased routine [2]. Implementation of this registration routine is based on the ANTs open source software library [9] and is revised from an existing implementation in [2].

Template Matching Functional MRI
Template matching often refers to a range of image processing methods that match a subset of a given image to the targeted template image. Although the underlying principles of template matching are not especially advanced, template matching has found great use and has proven effective in many applications such as object tracking, feature recognition and video matching. Multiple components directly interact with the performance of a template matching algorithm-prototype templates, probability distribution, similarity measure and feature space. Details of each of these three components are given in the following paragraphs.
1) Prototype Templates: Prototype templates are usually created in two ways: a) directly generate prototype templates from an existing physiological model. b) extract prototype templates from predetermined representative images [2] [3]. In this report, the author chooses the latter approach because having a set of predetermined template images (which are available from publicly accessible data bases) removes the complicated procedures of developing and validating the quality of physiological models [5]. Specifically in this paper, initial prototype templates are created from the training set of the spatiotemporal atlas with manual adjustment on image orientation and contrast using the image processing toolbox in Matlab 2014b (Mathworks, Natick, MA). Three representative prototype templates are shown in Figure 2. As can be seen, the prototype template images are free from geometric distortion and imaging artifacts.
2) Probability Distribution: The underlying probability distribution is critical towards accurately defining how well the targeted image "matches" with the template. The similarity measure is obviously important in the decision rule because its value directly influences the likelihood ratio and therefore the decision boundary (under Gaussian assumption as introduced in class). The risk associated with a certain decision rule (i.e., generating a incorrect frame of spatiotemporal atlas) can only be minimized on the condition that the underlying probability distribution is properly defined. Despite the importance of probability distribution, defining an appropriate distribution is not an easy task for the problem of generating a spatiotemporal atlas. This is because: 1) There exists a certain level of spatial and temporal misalignment between the targeted images and the prototype templates. 2) The targeted image data are often given as Fourier space (k-space) data.
3) The situation gets worse when the targeted image data are only partially available, or sparsely sampled. Aiming at these difficulties, it has been developed in [10] an estimation scheme for determining the probability distribution from multiple incomplete features in both image and feature domains. Details of the probability density function projection theorem and the extension to the atlas-generation problem in this report are given in the following paragraphs.
The probability density function projection theorem provides a setting where the algorithm can work with features in both domains the image domain I and the feature domain Z. For the proposed problem in this project, we choose the feature domain Z to be the undersampled Fourier domain, i.e., the sparsely sampled k-space data from the MRI machine. The Neyman-Fisher factorization theorem [11] states that if Z = ψ (I) is a sufficient statistics for hypothesis H, then the posterior probability p (I|H) can be factorized into the product of two terms as follows: Figure 2. Representative prototype template frames (template 3001, 3021 and 3041) constructed by performing manual adjustment of image orientation and contrast on the training set of spatiotemporal atlas.
Significance of the Neyman-Fisher factorization theorem lies in that it removes the dependence of p (I|H) and separates it into the product of a function g, whose dependence on H is obtained via ψ (I), and another function h. This allows the cross-talk between the raw data domain and the image data domain to be more manageable. In the case of having M prototype templates, it has been indicated [12] that the posterior probability can be defined as where a specific feature set z j is extracted for each hypothesis H j . It is noticed that the term) p (z j |H j ) lies at the denominator and, therefore, its accuracy may significantly impact the accuracy of the resulting p (I|H j ) even a slight error in p (z j |H 0 ) may cause p (I|H j ) to vary to a large amount. Considering this, it is suggested in [12] to choose an analytical probability density function as a reference hypothesis. A number of potential analytical solutions for a number of statistics have also been provided as references [12]. Also, according to the Neyman-Fisher factorization theorem [11], the reference hypothesis H 0 can be arbitrarily defined as long as z j represents a sufficient statistic for H 0 and H j In practice, however, it has been demonstrated [12] that careful choice over the reference hypothesis H 0 can significantly improve the accuracy of determining the probability distribution. [5] [12], these definitions are invariably based on either the vector norm or the matrix norm. Specifically, if we assume a vector x representing a patch in the incoming image data (an putting it into a column), The performance of various vector norms has been evaluated and the Frobenius norm (with p = 2 in the above expression) has been chosen due to its computational convenience. Also, if we choose to overload the p  notation with matrix norm, the lp norm of a mapping A from the image space to the feature space given by, In the context of this paper, p can take on various non-negative values. In particular, the case of p = 1 corresponds to the template matching approach by finding the mean absolute difference (MAD) and the case of p = 2 corresponds to the template matching approach by finding the mean squared errors (MSE). It Journal of Computer and Communications should be noted that there is no limitation on the form of the mapping A as long as its induced norm can be properly defined. 4) Feature Space: Defining an appropriate feature space is important for a practical template matching technique. As mentioned in the above paragraphs, the incoming image data for construction of the spatiotemporal atlas are provided in forms of k-space. The relation between the image space data and the k-space data are given by the Fourier transform relationship as follows, where d (k, t) represents the acquired data in k-space along time, Ω represents the spatial support of the spatiotem-poral image function I (x, t), k represents the coordinates in the Fourier space (where data are acquired from) and η (x, t) represents the measurement noise. In a more practical setting, sparse sampling is applied to acquire the k-space and the relation between the image space data and the acquired data is given by where U represents an undersampling operator that sparsely collects samples in the k-space along time. Figure 3 illustrates the relationship between a representative image in Ω, its associated image in the k-space and the sampling trajectory in the k-space. As can be seen, the image data in k-space are only partially acquired with a sparse spiral trajectory. Therefore, it is difficult to define an appropriate similarity measure based on the data acquired from this trajectory. As can be seen with Equation (10) and Figure 3, the provided data for the construction of spatiotemporal atlas are acquired in the k-space. Mathematically, the k-space data can be interpreted as the integration of all the "spatial frequency" components in the image space I. In this case, it is challenging to define and separate anatomical features from the acquired data because each sample contains information from all the features from I. Considering this, this project proposes to apply singular value decomposition (SVD) on the acquired k-space data to extract features. The author decides to use SVD to extract anatomical features from the acquired data because: 1) numerical procedures in calculating SVD are independent of the spatial coordinates of the acquired data d (k, t). Using SVD removes the difficulties of interpolating the non-Cartesian sparse Figure 3. The relationship between an image in Ω, its associated image in the k-space and the associated sampling trajectory in the k-space. As can be seen, the sampling trajectory is a sparse spiral trajectory that only collects sparse, non-Cartesian samples in the k-space. Journal of Computer and Communications sampled data onto a Cartesian grid; and 2) SVD has proven useful for extracting the general data features as a variation of the principal component analysis. Mathematically, this can be expressed as where d (k,t) represents the acquired data organized into a matrix, whose column space represents the spatial domain and row space represents the temporal domain, L represents the order of decomposition, d l (k) represents the spatial subspace of d (k, t), σ l represents the singular values corresponding to each index l and d l (t) represents the temporal subspace of d (k, t). Mathematically, Equation ′ , instead of ( ) l d k are chosen as the feature space because the template matching problem mainly attempts to match an incoming image towards the subject motion at a specific time point. However, the template matching problem can also be formulated as one that matches the spatial features towards the prototype template. Therefore, it would be also reasonable to define feature space over ( ) l d k . The evaluation and comparison between these two methods will be evaluated in the future.
In Figure 4, nonlinear image registration procedures that are computationally expensive (left); the proposed fast template matching methods in this report (middle). The absolute difference between these two images is shown on the right. It is obvious that the difference is mainly due to contrast difference between the image-registration-based image and the prototype image. This result demonstrates the effectiveness of the fast template matching. The schematic diagram of how the spatiotemporal atlas was created from numerical phantom and training set of images are shown in Figure 5.

Results
To demonstrate the effectiveness of the method introduced in this paper, a comparison between two spatiotemporal atlas images has been performed. As seen in Figure 4, the image on the left was constructed with expensive nonlinear image processing steps. The image in the middle, however, was constructed using the fast template matching method introduced in this paper. The image on the right illustrates the absolute difference between the previous two images. It is obvious that the image created with template matching inherits the anatomical features from the prototype templates and matches well with the image generated with computationally expensive image processing. In addition, the differences are mainly caused by the contrast difference, but do not contain significant structural misalignment or geometric distortion.
To demonstrate how well the spatiotemporal atlas generated from fast template matching represent the true movements of the subject, a temporal profile is given in Figure 6 across a vertical strip across the tongue tip. As can be seen, the first 600 frames of the spatiotemporal atlas are taken directly as the prototype templates, while the rest of the temporal frames of the spatiotemporal atlas are generated by the fast template matching algorithm. It is obvious that there is no significant temporal blurring or spatial distortion occurred in the generated frames, especially when compared with those in the prototype frames. In addition, the temporal dynamics of the subject are well represented as repetitions of the motion in the prototype templates. This result demonstrates that the template matching algorithm is capable of generating high quality spatiotemporal atlas.

Discussion
An important motivation of this project is to accelerate the generation of spatiotemporal atlas. To illustrate the speed up available from the proposed template matching algorithm, a comparative study on computation time was performed (versus generating the spatiotemporal atlas using expensive nonlinear image registration procedures) using the "tic" and "toc" commands in Matlab2014b (Mathworks, Natick, MA). In particular, comparison was performed on the time spent on generating a total of 100 frames of spatiotemporal atlas. The total computation time for the nonlinear image registration method was 10.53 hours on a personal computer with an Intel i5 CPU and 6 GB of RAM. As contrast, the template matching algorithm took 1.51 hours with an acceleration factor of 7. As Figure 6. The temporal profile of the generated spatiotemporal atlas. the number of frames in the spatiotemporal atlas increase, it is reasonable to expect an even larger number of acceleration factor. This comparative study shows the effectiveness to apply the template matching algorithm to reduce computation time.
An M-ary classification problem lies at the core of the generation of spatiotemporal atlas. There exist many algorithms to perform this task and many have proven useful. For instance, one potential algorithm to realize this goal is the artificial neural networks. However, the proposed template matching algorithm outperforms artificial neural networks in terms of computation complexity for obvious reasons. In addition, the template matching algorithm is much easier to implement-the author has attempted to implement the neural network methods based on a combination of Matlab 2014b (Mathworks, Natick, MA) and ArrayFire (a GPU jacket software for Matlab), but was forced to stop this attempt because of the "segmentation fault" problem during computation. Even though such problems can be solved by updating software versions and computation hardware, the template matching algorithm still stands out as a practical solution to generating spatiotemporal atlases in a reliable fashion.
A representative frame of the spatiotemporal atlas is shown on the left. A vertical strip across the tongue tip (yellow dashed line) is plotted along time to form the temporal profile. As can be seen, the first 600 frames of the spatiotemporal atlas are taken directly as the prototype templates, while the rest of the temporal frames of the spatiotemporal atlas are generated by the fast template matching algorithm. There is no significant temporal blurring or spatial distortion in the generated frames compared with those in the prototype frames.
It will be useful to compare the obtained result with an underlying ground truth. As mentioned in the METHODS section of the report, the probability distribution is defined in an empirical fashion because of the incoming data are sub-optimal-spatial distortion, temporal misalignment, contamination from noise and incomplete data from sparse sampling. Considering these practical difficulties, the author decides to examine the "correctness" of the generated spatiotemporal atlas by directly comparing the result with the traditional method-generating spatiotemporal atlas using computationally-expensive image registration methods. Based on the results in Figure 6, the proposed template matching algorithm is capable of generating spatiotemporal atlas free of spatial distortion and temporal misalignment. Therefore, it is reasonable to assume that the determined probability distribution is at least near optimal.

Conclusion
This paper focuses on applying the principles of statistical learning and pattern recognition to accelerate the generation of spatiotemporal atlas. Specifically, the paper investigates the fast template matching algorithm and applies it to generate high quality spatiotemporal atlas within reasonable time and computation complexity. Unlike existing methods that focus on the image domain in the generation of spatiotemporal atlas, the template matching algorithm introduced in this research allows prototype templates to be matched based on incomplete samples of the image from the Fourier space. In particular, the singular value decomposition is performed to extract features from the sparse-sampled data and template matching is performed in this feature space. Conceptual feasibility of the method has been validated. Practical effectiveness of the method has been evaluated on publicly accessible data sets. The results demonstrate that fast template matching algorithm is capable of generating high quality spatiotemporal atlas from sparse-sampled data in short computation time. This study provides a practical method of accelerating the generation of spatiotemporal atlas. This allows it to serve as the ground truth for quantitatively interpreting the observed muscle movement in medical imaging, as well as accurately characterizing the motion variability of a specific subject as versus to the general motion pattern in medical research and clinical applications.