Computer Tomography and Ultrasonography Image Registration Based on the Cooperation of GPU and CPU

Image registration is wildly used in the biomedical image, but there are too many textures and noises in the biomedical image to get a precise image registration. In order to get the excellent registration performance, it needs more complex image processing, and it will spend expensive computation cost. For the real time issue, this paper proposes edge gradient direction image registration applied to Computer Tomography (CT) image and Ultrasonography (US) image based on the cooperation of Graphic Processor Unit (GPU) and Central Processor Unit (CPU). GPU can significantly reduce the computation time. First, the CT image slice is extracted from the CT volume by the region growing and the interpolation algorithm. Secondly, the image pre-processing is employed to reduce the image noises and enhance the image features. There are two kinds of the image pre-processing algorithms invoked in this paper: 1) median filtering and 2) anisotropic diffusion. Last but not least, the image edge gradient information is obtained by Canny operator, and the similarity measurement based on gradient direction is employed to evaluate the similarity between the CT and the US images. The experimental results show that the proposed architecture can distinctively improve the efficiency and are more suitably applied to the real world.


Introduction
Image registration is wildly used in the biomedical image.The image registration between computed tomography (CT) and ultrasonography (US) has benefits for a variety of clinical application.Especially, the key challenge in surgery of abdominal organ is the registration of pre-operative planning and intra-operative navigation, because the CT and US images of abdominal organ images are obtained from the same patient but at different times and with different imaging modality.Due to the speckle noise and distortion in US images, accurate registration of CT and US is still a challenging problem.
There are two categories of image registration: 1) area-based and 2) feature-based.In the area-based method, the Normalized Cross Correlation (NCC) is the most popular one.NCC is used to evaluate the similarity between the template image and the scene image.Besides, NCC often calculates the similarity pixel by pixel to deal with the image registration problems, which will increase the computation cost.On the contrary, the fea-ture-based method is proposed to reduce the computation cost.Kown et al [1] proposed a robust hierarchical distance to compare edge maps in a multi-level pyramid structure.Steger [2] presented the edge gradient direction as similarity measure to partially recognize occluded objects which is very robust against illumination frustration.The similarity measure in this literature uses the inner product of normalized edge gradients to obtain the similarity value.
For the real time issue, the data size of the CT and US images are abundant, so the area-based and feature-based methods may work well in biomedical image registration applications.However, the computation request is not efficient in biomedical image registration problems.Furthermore, CPU is not suitable to process large amount of data.The parallel processor architecture is proposed to call GPU.It is proposed to deal with the huge amount data.The performance of GPU is more efficient to dispose in biomedical image registration application.Thus, the cooperation of GPU and CPU architecture can increase the efficiency.In this paper, CT and US image registration based on the hybrid architecture of GPU and CPU is proposed.
The signal to noise ratio is different because imaging principle of CT and US technique are different.In this study, the image pre-processing for US image includes median filter and anisotropic diffusion algorithm are used to reduce the noises and enhance the image quality.
The contributions of this paper include: 1) using the hybrid architecture of GPU and CPU to save the computation time, 2) applying the anisotropic diffusion algorithm to reduce the noise and keep the feature of image edge, 3) defining two kinds of the indices, Fiducial registration error (FRE) and target registration error(TRE) to verify the accuracy of image registration.
The rest of the paper is organized as follows: Section 2 provides architecture of proposed image registration.Section 3~6 shows the detail of each procedure in the proposed algorithm.Section 7 discusses the experimental results of the proposed algorithm.Section 8 discusses analysis of registration error.Finally, conclusions are presented in Section 9.

Architecture of the Proposed Image Registration Algorithm
The architecture of the proposed algorithm is shown in Figure 1.There are two phases in this architecture: 1) training phase and 2) matching phase.In the training phase, we used CPU to deal with region growing, slice and canny algorithm.In this study, CT data is three-dimensional information and the specified organ, liver organ, from a set of CT images which had been extracted by using the region growing method.Through extraction on the specified organ image, there is a virtual slice that we need.Consequently, the Canny edge detection is used to extract the edge gradient.Those gradients information will be fed into the matching phase, and gradients information which had been extracted from CT image will be a template in the matching phase.
In the matching phase, we used GPU to deal with median filter, anisotropic diffusion and registration algorithm.The two preprocessing operators, median filter and anisotropic diffusion are used to enhance the feature and reduce the noise on the US image.Sobel edge detection is employed to get the gradients US image.Finally, the similarity is calculated by the gradients of template and search image.

Median Filter
Median filter is often used to reduce the impulse noise on an image.Since there is the speckle noise in US images, it can be mostly improved through the median filter.

Anisotropic Diffusion
The anisotropic diffusion algorithm [3] is adapted to enhance the images and remove noise.Anisotropic diffusion process image based on a linear diffusion from principle of heat conduction is given as where div is the divergence operator, and Δ are respectively the gradient and Laplacian operators.


The linear diffusion can be discretely implemented using eight-neighbors, the formula is shown in

I x y t I x y t C I x y t I x y t
where represent the gradients of eight neighbors, C is diffusion coefficient, the diffusion coefficient function is represented as The parameter k is a constant; it acts as an edge strength threshold.If the k value is too large, the diffusion process will over-smooth and result in a blurred image.On the contrary, if the k value is too small, the diffusion process will stop the smoothing in early iterations and will yield a restored image similar to the original image.

Region Growing Algorithm
This paper adopts the study of Chen et al. [4], who proposes a segmentation method based on the improved region growing method combined with centroid detection and intensity distribution analysis to dismember volume shape of liver from CT volume images.Through the region growing method, the volume region growing method is mainly to combine with sets of continuous CT images to construct its volume character.It is assumed that CT data has m slices.The seed point location, , is selected and set in the first slice by the doctor.In general, this seed point will be set in characteristic of the region, and will start growing from the seed point.When the intensity of a seed point and its neighbor points are similar, those neighbor ones are added into the region.Subsequently, the center of gravity in the previous slice is inherited to be the seed point in the next slice.Through this way, continuous images are able to establish CT volume.The center of gravity of the growing region in nth slice can be calculated as where represents the coordinates of region growing result in n slice.N represents the area of region range.The result of volume region growing is shown in ( , Rx Ry )

Extraction of Any Angle Slice
Hu [5] proposed a slice algorithm, using the normal vector and the interior point to form the virtual clipping plane.As shown in Figure 3, the plane can be represented by the point n  0 0 0 ( , , ) P x y z and a unit normal vector as In 3D space, the arbitrary plane of unit normal vector can be determined by the direction angle . The direction cosine of vector OP can be calculated as

Image Registration Algorithm
The gradient based similarity measure is used to evaluate the similarity between the CT image and the US image.In the matching process, the template model should be compared to the search image at all locations using a similarity measure.The idea behind similarity measure is to take the sum of all normalized dot products of gradient direction of the template image and search the image over all points in the model data set.Extracting the edges of specified organ in CT image, the gradient of the selected edges along with the coordinate information are stored as the template model which contains a set of points ( , ) . These points are relative to center of gravity and the edge points of the object and corresponding gradients , ( , ) . The points and the corresponding gradients are respectively denoted as and , , , , in candidate objects.If there is no rotation angle between template and search image, the similarity measure of gradient based is defined as follows ( , ) where n is amount of the edge point.In contrast, if there is a rotation angle between template and search image, the similarity measure will be revised as , , 1 ( , , ) where  is the rotation angle; the is the gradients after rotation; the points ( , are the rotated location of points in search image.
If it perfectly matched between the template model and the search image, this correction coefficient will be 1.

CUDA Introduction
CUDA (Compute Unified Device Architecture) is a novel technology of general-purpose computing on the GPU, which are SIMD (Single Instruction, Multiple Data) device that is inherently data-parallel.We only take the float-point operation as an example, and GPU's computation speed is several times faster than CPU's.
For the Programming Framework, CUDA also provides the magnitude of performance and simpler software development by using the standard C language [6].

Features of CUDA GPU
CUDA GPU has the following advantages: (1) General programming environment: CUDA uses C programming tools and C compiler, which make the program have better compatibility.
(2) Higher bandwidth: We take GeForce GTX470 as the developing platform in this paper, the bandwidth is 96.4GB/s between GPU and device memory, and 2GB/s between host and device memory.
For hardware, CUDA device contains many stream multi-processors (SM), and each SM also has several stream processors.Each SM can operate four types of memories: constant memory, texture memory and global memory which can communicate with host memory except on-chip shared memory.These cores use on-chip shared memories and cache to accelerate memory access, as shown in Figure 4.
For software, in CUDA program any call to a GPU function from a CPU function must include an execution configuration [7], which determines the block count, thread count in a thread-block, and the amount of shared memory.Each block can only run on one multi-processor, but each multi-processor can allocate multiple blocks.In addition, those blocks can concurrently run on a multi-processor depending on the occupancy of the registers and shared memory of each SM.

Implementation of Parallel Computing
As shown in Figure 1, the median filter, anisotropic diffusion and registration algorithm will be implemented with parallel processing technique.

Median Filter
The US image as input image is divided into subimages and assigned to SM.Each SM stores assigned blocks, consisting a lot of threads, to do the parallel computation of convolution of median filter in global memory.After that, the result of image is kept in global memory.

Anisotropic Diffusion
Using the image of median filter in section 6.3.1, we make thread and pixel to be in one-to-one correspondence.At first, each thread calculates gradients of eight neighbors I  for any located pixel, and then calculates the diffusion coefficient C. Subsequently, each thread is going to operate the equation ( 2

Registration
The edge data of template and search image divided into subimages are fed as input data and input image in this step.We assign each subimage to SM, and the each pixel is processed by the corresponding thread.Then these threads play the role of center of gravity to operate the value of similarity measure in parallel.Subsequently, we use the compact operation to eliminate the low similarity value, and then thread checks whether the value of similarity measure is maximum or not.The maximal value is the target location in the registration process.The operation of registration is shown in Figure 6.

Experimental Results
Experiments were carried out with two kinds of test images for the performance verification.In this study, two cases, shown in Figures 7(a)-(d), are used to verify the registration performance.The results of registration are shown in Figures 8(a)-(b).Here, the performance of efficiency is listed in Table 1.According to the experimental results, the object is successful recognized in the US image, and computation cost is dramatically reduced using GPU.The computation advantage is 23.36 times.

Analysis of Registration Error
Fiducial registration error (FRE) and target registration error (TRE) [8][9][10] are adapted to evaluate the accuracy of registration results.
1) FRE: distance of corresponding fiducial points after registration.
2) TRE: distance of corresponding target points after registration.
Therefore, the location of fiducial points and target points are indicated by doctors.Doctors laid position of vessel, bifurcations, and the lesions to be the target point.Besides, Doctors put other locations to be fiducial points, mostly the abdominal aorta and inferior vena cava.Then the fiducial points and target points are used to calculate Root Mean Square Deviation (RMSD) by 2 , , 1 ( ) where n is number of point pairs,

Conclusions
It is a tremendous challenge in registration of CT and US images because the CT and US images of abdominal organ images are obtained from the same patient but at different times and with different imaging modality.In this paper, through the proposed method, CT and US can

Figure 1 .
Figure 1.Architecture of the proposed algorithm.
) of anisotropic diffusion.The operation of anisotropic diffusion is shown in Figure 5.

Figure 6 .
Figure 6.Procedure of registration in CUDA.

Figure 9 .
Figure 9. (a) Case1-Target and fiducial points of US (b) Case1-Target and fiducial points of CT (c) Case2-Target and fiducial points of US (d) Case2-Target and fiducial points of CT.
The size of these two CT and US images are 512 512, and 680

Table 2 . Registration error in Case1 and Case2.
According to experimental results, the computation consuming is improved by using the configuration of cooperated GPU and CPU to get computation advantage around 23.36 times.That implies the architecture of GPU and CPU can be applied in time critical applications.