A spatiotemporal atlas is a standard image sequence that represents the general motion pattern of the targeted anatomy across a group of subjects. Recent years have witnessed increasing interest in using spatiotemporal atlases for scientific research and clinical applications in image processing, data analysis and medical imaging. However, generating a spatiotemporal atlas is often time-consuming and computationally expensive because of the nonlinear image registration procedures involved. This research aims to accelerate the generation of spatiotemporal atlases by formulating the atlas generation procedure as a multi-level modulation (M-ary) classification problem. In particular, we have implemented a fast template matching method based on singular value decomposition and applied it to generate high-quality spatiotemporal atlases with reasonable time and computational complexity. The performance has been systematically evaluated on publicly accessible data sets. The results and conclusions hold promise for developing more advanced algorithms that accelerate the generation of spatiotemporal atlases.

The understanding of muscle structure and muscular movement is the foundation for much scientific research and many clinical applications in image processing, medical imaging and human physiology. However, it is generally challenging to determine whether an observed anatomical structure, be it the brain, the tongue, the heart or a limb, is “normal”, because the underlying anatomy differs greatly even within a small group of subjects. In addition to these inter-subject differences, significant variance also exists in the soft-tissue anatomy of a single subject. For instance, a recent imaging-based study has indicated that the size and shape of the human heart vary significantly across the cardiac phases of a heartbeat, among different heartbeats and across subjects [

Despite the usefulness of spatiotemporal atlases, their construction often involves procedures that are time-consuming and computationally expensive. To create a set of high-quality spatiotemporal atlases, initial image formation procedures must first be performed on each subject from the targeted group. Upon completion of this initial step, it is often essential to define a common space into which all subjects can be accurately mapped [

Construction of a high-quality spatiotemporal atlas is even more challenging when the underlying image data are undersampled. The quality of the constructed atlas depends heavily on the data quality from the image formation steps. Therefore, the quality of the resulting atlas images is very likely compromised if they are constructed from initial images that are contaminated by geometric distortion, imaging artifacts and noise. This situation, unfortunately, is often seen in clinical applications that involve fast medical imaging, where sparse sampling is performed in the image formation step due to certain physical or physiological considerations [

sampling in the image formation step and significantly compromises the ensuing quantitative analysis of anatomical features on the resulting atlas image sequence. As can be seen, the subject’s lower jaw and nose tip in the constructed atlas image suffer from significant geometric distortion, which renders the atlas less useful for quantitative analysis on the underlying anatomical features. To remedy the compromised image quality from sparse sampling, anatomical models need to be incorporated to compensate for the image artifacts and distortions from the image formation step [

This paper addresses the above-mentioned technical challenges and aims to develop a practical method for generating high-quality spatiotemporal atlas images within reasonable computation time and memory requirements. The authors formulate the atlas generation problem as a multi-level modulation (M-ary) classification problem. Specifically, a small subset of spatiotemporal atlas images is first constructed as a training set using a diffeomorphic registration routine as mentioned above. With this training set, atlas construction proceeds by choosing, for each incoming frame, the atlas image in the training set that is “closest” to it. A fast template matching method based on Fourier space samples has been chosen to perform this task. The principles underlying these two methods are given in the METHODS section and evaluations of their performance are given in the RESULTS section of this paper. Concluding remarks are given at the end of the paper, with a detailed discussion of the theoretical, numerical and algorithmic problems that occur during the formation of the atlas.

Diffeomorphic group-wise image registration lies at the core of constructing the training set of the spatiotemporal atlas. Specifically, the initial images for registration are first obtained from a numerical phantom constructed for evaluating dynamic MRI data sampling and reconstruction methods [. Let I_{init} represent an initial image sequence obtained from an image formation step, I_{atlas} represent a constructed atlas image sequence on an open and bounded image domain Ω, and φ (x, t) denote a continuously differentiable function representing the diffeomorphic transformation from I_{init} to I_{atlas}, parameterized over time t. It is also assumed that φ (x, t) can be computed by integrating a time-dependent velocity field v defined as follows [

$$\frac{d\varphi(x,t)}{dt} = v\big(\varphi(x,t),\, t\big) \tag{1}$$

where the function φ (x, t) also satisfies the following integration,

$$\varphi(x,1) = \varphi(x,0) + \int_{0}^{1} v\big(\varphi(x,t),\, t\big)\, dt. \tag{2}$$
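To make the relation between Equation (1) and Equation (2) concrete, the following is a minimal sketch that integrates the flow numerically with a forward-Euler scheme. It is an illustration only and not the registration routine used in this paper: the scalar (one-dimensional) setting, the identity initial condition φ (x, 0) = x, the function name `integrate_flow` and the test velocity field are all assumptions introduced here.

```python
import math

def integrate_flow(velocity, x0, n_steps=1000):
    """Forward-Euler integration of d(phi)/dt = v(phi, t) from t = 0 to t = 1."""
    dt = 1.0 / n_steps
    phi = x0  # assumed initial condition: phi(x, 0) = x (identity map)
    for step in range(n_steps):
        t = step * dt
        phi = phi + dt * velocity(phi, t)  # discrete version of Equation (2)
    return phi

# Hypothetical velocity field v(y, t) = a * y, whose exact flow is x * exp(a * t).
a = 0.5
phi_1 = integrate_flow(lambda y, t: a * y, x0=2.0)
exact = 2.0 * math.exp(a)
print(phi_1, exact)
```

For this simple velocity field the numerical value of φ (x, 1) approaches the closed-form flow as the number of steps grows, which is a convenient sanity check before moving to image-valued fields.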

Given Equation (1) and Equation (2), it has been indicated in [

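$$\varphi^{*} = \arg\min \int_{\Omega} \left\| I_{\mathrm{init}} \circ \varphi^{-1}(x,1) - I_{\mathrm{atlas}} \right\|_{2}^{2} \, d\Omega + \lambda \int_{0}^{1} \left\| v(\varphi,t) \right\|_{\Omega}^{2} \, dt, \tag{3}$$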

where the first term in the energy functional is a data consistency term and the second term is a regularization term with λ as the regularization parameter. To better leverage the spatial correlation between I_{init} and I_{atlas}, [ φ_{1} and φ_{2} and leads to an improved energy functional as follows,

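$$\varphi^{*} = \arg\min \int_{\Omega} \left( \left\| I_{\mathrm{init}} \circ \varphi^{-1}(x,1) - I_{\mathrm{atlas}} \right\|_{2}^{2} + \lambda_{1} \Psi\big( I_{\mathrm{init}} \circ \varphi_{2}^{-1}(x,0.5) \big) \right) d\Omega + \lambda_{2} \int_{0}^{1} \left\| v(\varphi,t) \right\|_{\Omega}^{2} \, dt, \tag{4}$$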

where Ψ (·) represents a similarity measure. By introducing diffeomorphism into the energy functional, the above formulation guarantees that the forward and inverse mappings between I_{init} and I_{atlas} are “symmetric”. This optimization problem is convex and its global optimum can be obtained with a gradient-descent-based routine [

Template matching refers to a range of image processing methods that match a subset of a given image to a targeted template image. Although the underlying principles of template matching are not especially advanced, template matching has found wide use and has proven effective in many applications such as object tracking, feature recognition and video matching. Four components directly affect the performance of a template matching algorithm: prototype templates, probability distribution, similarity measure and feature space. Details of each of these four components are given in the following paragraphs.

1) Prototype Templates: Prototype templates are usually created in one of two ways: a) generating prototype templates directly from an existing physiological model; or b) extracting prototype templates from predetermined representative images [

2) Probability Distribution: The underlying probability distribution is critical for accurately defining how well the targeted image “matches” the template. The similarity measure is important in the decision rule because its value directly influences the likelihood ratio and therefore the decision boundary (under a Gaussian assumption). The risk associated with a certain decision rule (i.e., generating an incorrect frame of the spatiotemporal atlas) can only be minimized if the underlying probability distribution is properly defined.

Despite the importance of probability distribution, defining an appropriate distribution is not an easy task for the problem of generating a spatiotemporal atlas. This is because: 1) There exists a certain level of spatial and temporal misalignment between the targeted images and the prototype templates. 2) The targeted image data are often given as Fourier space (k-space) data. 3) The situation gets worse when the targeted image data are only partially available, or sparsely sampled. Aiming at these difficulties, it has been developed in [

The probability density function projection theorem provides a setting where the algorithm can work with features in both domains: the image domain I and the feature domain Z. For the proposed problem in this project, we choose the feature domain Z to be the undersampled Fourier domain, i.e., the sparsely sampled k-space data from the MRI machine. The Neyman-Fisher factorization theorem [

$$p(I \mid H) = g\big(\psi(I) \mid H\big)\, h(I). \tag{5}$$

The significance of the Neyman-Fisher factorization theorem is that it factors p (I|H) into the product of a function g, which depends on H only through ψ (I), and another function h, which does not depend on H. This makes the cross-talk between the raw data domain and the image data domain more manageable. In the case of having M prototype templates, it has been indicated [

$$p(I \mid H_{j}) = g(I \mid H_{0})\, \frac{p(z_{j} \mid H_{j})}{p(z_{j} \mid H_{0})} \tag{6}$$

where a specific feature set z_{j} is extracted for each hypothesis H_{j}. Note that the term p (z_{j}|H_{0}) appears in the denominator, so its accuracy may significantly affect the accuracy of the resulting p (I|H_{j}): even a slight error in p (z_{j}|H_{0}) may cause p (I|H_{j}) to vary by a large amount. Considering this, it is suggested in [ H_{0} can be arbitrarily defined as long as z_{j} represents a sufficient statistic for H_{0} and H_{j}. In practice, however, it has been demonstrated [ H_{0} can significantly improve the accuracy of determining the probability distribution.
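The decision rule implied by Equation (6) can be sketched as follows. Because the factor g (I|H_{0}) is common to all hypotheses, the most likely hypothesis is the one that maximizes the log-ratio log p (z_{j}|H_{j}) − log p (z_{j}|H_{0}). The sketch below is illustrative only: the scalar Gaussian feature models, the broad Gaussian reference hypothesis H_{0}, the function names and all numerical values are assumptions introduced here, not quantities from this paper.

```python
import math

def gaussian_logpdf(z, mean, std):
    """Log-density of a scalar Gaussian N(mean, std^2) evaluated at z."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (z - mean) ** 2 / (2.0 * std ** 2)

def classify(features, class_models, reference_model):
    """Pick the hypothesis j maximizing log p(z_j|H_j) - log p(z_j|H_0).

    The factor g(I|H_0) in Equation (6) is shared by every hypothesis,
    so it does not affect the arg max and is omitted here.
    """
    scores = []
    for z_j, (mean_j, std_j) in zip(features, class_models):
        log_num = gaussian_logpdf(z_j, mean_j, std_j)      # log p(z_j | H_j)
        log_den = gaussian_logpdf(z_j, *reference_model)   # log p(z_j | H_0)
        scores.append(log_num - log_den)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical example with M = 3 templates, one scalar feature per hypothesis.
features = [0.9, 0.2, 3.1]                        # z_j extracted for each H_j
class_models = [(1.0, 0.5), (2.0, 0.5), (0.0, 0.5)]  # assumed (mean, std) under H_j
reference_model = (0.0, 3.0)                      # assumed broad reference H_0
best, scores = classify(features, class_models, reference_model)
print(best)
```

Working in the log domain avoids dividing by a small p (z_{j}|H_{0}) directly, which is one simple way to limit the numerical sensitivity to the denominator noted above.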

3) Similarity Measure: A similarity measure should be a real-valued function that quantifies the similarity between two objects. In the context of this project, the similarity measure should be a function of spatial location, temporal location, image perspective, image contrast and image orientation. Although various similarity measures have been defined in the literature [

$$\| x \|_{p}^{p} = \sum_{i} | x_{i} |^{p}. \tag{7}$$

The performance of various vector norms has been evaluated and the l2 norm (p = 2 in the above expression, which coincides with the Frobenius norm when the data are arranged as a matrix) has been chosen due to its computational convenience. Also, if we choose to overload the ‖ · ‖_{p} notation with the matrix norm, the lp norm of a mapping A from the image space to the feature space is given by,

$$\| A \|_{p} = \sup_{x \neq 0} \frac{\| A x \|_{p}}{\| x \|_{p}}. \tag{8}$$

In the context of this paper, p can take on various non-negative values. In particular, the case of p = 1 corresponds to template matching by minimizing the mean absolute difference (MAD) and the case of p = 2 corresponds to template matching by minimizing the mean squared error (MSE). It should be noted that there is no limitation on the form of the mapping A as long as its induced norm can be properly defined.
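As an illustration of how the norm in Equation (7) can act as a matching criterion, the sketch below selects the prototype template with the smallest lp distance to a target. It is a minimal example under stated assumptions: the function name `match_template`, the random 8 × 8 templates and the noise level are hypothetical, and the comparison is done directly on image arrays rather than in the feature space used in this paper.

```python
import numpy as np

def match_template(target, templates, p=2):
    """Return the index of the template closest to `target` in the l_p sense.

    p=1 gives a mean-absolute-difference (MAD) criterion and
    p=2 a mean-squared-error (MSE) criterion, up to constant factors.
    """
    target = np.asarray(target, dtype=float).ravel()
    costs = []
    for tpl in templates:
        diff = target - np.asarray(tpl, dtype=float).ravel()
        costs.append(np.sum(np.abs(diff) ** p))  # ||diff||_p^p as in Equation (7)
    costs = np.array(costs)
    return int(np.argmin(costs)), costs

# Hypothetical data: three random prototype templates and a noisy copy of the second.
rng = np.random.default_rng(0)
templates = rng.random((3, 8, 8))
target = templates[1] + 0.01 * rng.standard_normal((8, 8))
best_mad, _ = match_template(target, templates, p=1)
best_mse, _ = match_template(target, templates, p=2)
print(best_mad, best_mse)
```

Since only the arg min matters, the p-th root and the division by the number of pixels are omitted; they do not change which template is selected.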

4) Feature Space: Defining an appropriate feature space is important for a practical template matching technique. As mentioned in the above paragraphs, the incoming image data for construction of the spatiotemporal atlas are provided in the form of k-space data. The relation between the image space data and the k-space data is given by the Fourier transform relationship as follows,

$$d(k,t) = \int_{\Omega} I(x,t)\, e^{-i 2\pi k \cdot x}\, dx + \eta(x,t), \tag{9}$$

where d (k, t) represents the acquired data in k-space along time, Ω represents the spatial support of the spatiotemporal image function I (x, t), k represents the coordinates in the Fourier space (where the data are acquired) and η (x, t) represents the measurement noise. In a more practical setting, sparse sampling is applied to acquire the k-space and the relation between the image space data and the acquired data is given by

$$d(k,t) = U\left( \int_{\Omega} I(x,t)\, e^{-i 2\pi k \cdot x}\, dx + \eta(x,t) \right), \tag{10}$$

where U represents an undersampling operator that sparsely collects samples in the k-space along time.
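A discrete counterpart of Equation (10) for a single time frame can be sketched as follows. This is a simplified illustration rather than the acquisition model used in this paper: it assumes Cartesian sampling, replaces the Fourier integral by a 2-D FFT, models U as a binary mask that zero-fills the unsampled locations, and uses complex Gaussian noise; the function name `acquire_kspace` and the test image are hypothetical.

```python
import numpy as np

def acquire_kspace(image, mask, noise_std=0.0, rng=None):
    """Simulate d = U(F{I} + eta) for one time frame.

    image : 2-D real or complex array, I(x, t) at a fixed t
    mask  : boolean array of the same shape; True where k-space is sampled
    """
    rng = np.random.default_rng() if rng is None else rng
    kspace = np.fft.fft2(image)  # discrete counterpart of the Fourier integral
    noise = noise_std * (rng.standard_normal(image.shape)
                         + 1j * rng.standard_normal(image.shape))
    return np.where(mask, kspace + noise, 0.0)  # U zeroes the unsampled locations

# Hypothetical example: a square object sampled on every other k-space row.
image = np.zeros((16, 16))
image[4:12, 4:12] = 1.0
mask = np.zeros((16, 16), dtype=bool)
mask[::2, :] = True
d = acquire_kspace(image, mask)
print(d.shape, int(mask.sum()))
```

Non-Cartesian trajectories would require gridding or a non-uniform FFT in place of `np.fft.fft2`, which is outside the scope of this sketch.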

As can be seen with Equation (10) and

sampled data onto a Cartesian grid; and 2) SVD has proven useful for extracting the general data features as a variation of the principal component analysis. Mathematically, this can be expressed as

$$d(k,t) = \sum_{l=1}^{L} d_{l}(k)\, \sigma_{l}\, d_{l}(t), \tag{11}$$

where d (k, t) represents the acquired data organized into a matrix, whose column space represents the spatial domain and row space represents the temporal domain, L represents the order of decomposition, d_{l}(k) represents the spatial subspace of d (k, t), σ_{l} represents the singular value corresponding to each index l and d_{l}(t) represents the temporal subspace of d (k, t). Mathematically, Equation (11) can be written in a more condensed form,

$$d(k,t) = \sum_{l=1}^{L} d_{l}(k)\, d'_{l}(t) \tag{12}$$

where d'_{l}(t) = σ_{l} d_{l}(t) are chosen as the feature space in the context of this project. It should be noted that d'_{l}(t), instead of d_{l}(k), are chosen as the feature space because the template matching problem mainly attempts to match an incoming image to the subject motion at a specific time point. However, the template matching problem can also be formulated as one that matches the spatial features to the prototype template. Therefore, it would also be reasonable to define the feature space over d_{l}(k). A comparison between these two choices is left for future work.
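The decomposition in Equation (11) and Equation (12) can be sketched with a truncated SVD as follows. This is a minimal illustration under stated assumptions: the data matrix is arranged with k-space locations along the rows and time frames along the columns, the function name `temporal_features` is hypothetical, and the synthetic rank-2 matrix stands in for acquired data.

```python
import numpy as np

def temporal_features(data, order):
    """Rank-`order` SVD of a (k-space locations x time frames) matrix.

    Returns the spatial basis d_l(k) and the scaled temporal basis
    d'_l(t) = sigma_l * d_l(t), as in Equations (11) and (12).
    """
    u, s, vh = np.linalg.svd(data, full_matrices=False)
    spatial = u[:, :order]                      # columns are d_l(k)
    temporal = s[:order, None] * vh[:order, :]  # rows are d'_l(t)
    return spatial, temporal

# Hypothetical example: a synthetic rank-2 data matrix with 64 locations and 20 frames.
rng = np.random.default_rng(0)
data = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 20))
spatial, temporal = temporal_features(data, order=2)
print(spatial.shape, temporal.shape)
```

Each column of `temporal` is then an L-dimensional feature vector for one time frame, which is the quantity that would be compared against the prototype templates.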

To demonstrate the effectiveness of the method introduced in this paper, a comparison between two spatiotemporal atlas images has been performed. As seen in

To demonstrate how well the spatiotemporal atlas generated from fast template matching represents the true movements of the subject, a temporal profile is given in

An important motivation of this project is to accelerate the generation of spatiotemporal atlases. To illustrate the speed-up available from the proposed template matching algorithm, a comparative study of computation time was performed (versus generating the spatiotemporal atlas using expensive nonlinear image registration procedures) using the “tic” and “toc” commands in Matlab 2014b (Mathworks, Natick, MA). In particular, the comparison was performed on the time spent generating a total of 100 frames of the spatiotemporal atlas. The total computation time for the nonlinear image registration method was 10.53 hours on a personal computer with an Intel i5 CPU and 6 GB of RAM. In contrast, the template matching algorithm took 1.51 hours, an acceleration factor of approximately 7. As the number of frames in the spatiotemporal atlas increases, it is reasonable to expect an even larger acceleration factor. This comparative study shows the effectiveness of applying the template matching algorithm to reduce computation time.
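As a quick check of the reported figure, the acceleration factor is the ratio of the two measured computation times:

$$\text{acceleration factor} = \frac{10.53\ \text{h}}{1.51\ \text{h}} \approx 6.97 \approx 7.$$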

An M-ary classification problem lies at the core of the generation of a spatiotemporal atlas. Many algorithms exist to perform this task and many have proven useful. For instance, one potential algorithm is the artificial neural network. However, the proposed template matching algorithm outperforms artificial neural networks in terms of computational complexity. In addition, the template matching algorithm is much easier to implement. The author attempted to implement the neural network methods based on a combination of Matlab 2014b (Mathworks, Natick, MA) and ArrayFire (a GPU jacket software for Matlab), but was forced to stop this attempt because of a “segmentation fault” problem during computation. Even though such problems can be solved by updating software versions and computation hardware, the template matching algorithm still stands out as a practical solution for generating spatiotemporal atlases in a reliable fashion.

A representative frame of the spatiotemporal atlas is shown on the left. A vertical strip across the tongue tip (yellow dashed line) is plotted along time to form the temporal profile. As can be seen, the first 600 frames of the spatiotemporal atlas are taken directly as the prototype templates, while the rest of the temporal frames of the spatiotemporal atlas are generated by the fast template matching algorithm. There is no significant temporal blurring or spatial distortion in the generated frames compared with those in the prototype frames.

It will be useful to compare the obtained result with an underlying ground truth. As mentioned in the METHODS section of the report, the probability distribution is defined in an empirical fashion because the incoming data are sub-optimal: they suffer from spatial distortion, temporal misalignment, contamination from noise and incompleteness due to sparse sampling. Considering these practical difficulties, the author decided to examine the “correctness” of the generated spatiotemporal atlas by directly comparing the result with the traditional method, i.e., generating the spatiotemporal atlas using computationally expensive image registration methods. Based on the results in

This paper focuses on applying the principles of statistical learning and pattern recognition to accelerate the generation of spatiotemporal atlases. Specifically, the paper investigates the fast template matching algorithm and applies it to generate high-quality spatiotemporal atlases within reasonable time and computational complexity. Unlike existing methods that focus on the image domain, the template matching algorithm introduced in this research allows prototype templates to be matched based on incomplete samples of the image from the Fourier space. In particular, singular value decomposition is performed to extract features from the sparsely sampled data and template matching is performed in this feature space. The conceptual feasibility of the method has been validated, and its practical effectiveness has been evaluated on publicly accessible data sets. The results demonstrate that the fast template matching algorithm is capable of generating high-quality spatiotemporal atlases from sparsely sampled data in a short computation time. This study thus provides a practical method for accelerating the generation of spatiotemporal atlases. The resulting atlas can serve as the ground truth for quantitatively interpreting the observed muscle movement in medical imaging, as well as for accurately characterizing the motion variability of a specific subject relative to the general motion pattern in medical research and clinical applications.

The authors declare no conflicts of interest regarding the publication of this paper.

Zhang, L.J., Zhao, L.F. and Liu, G. (2020) Accelerated Generation of Spatiotemporal Atlas through Fast Template Matching. Journal of Computer and Communications, 8, 16-27. https://doi.org/10.4236/jcc.2020.81002