Markerless Respiratory Motion Tracking Using Single Depth Camera

The aim of this study is to propose a novel system that detects intra-fractional motion during radiotherapy treatment in real time using a three-dimensional surface captured by a depth camera, Microsoft Kinect v1. Our approach introduces three new aspects for three-dimensional surface tracking in radiotherapy treatment. The first aspect is a new algorithm for noise reduction of depth values: Ueda's algorithm was implemented, enabling a fast least-squares regression of depth values. The second aspect is an application for detecting the patient's motion at multiple points in the thoracoabdominal region. The third aspect is an estimation of the three-dimensional surface from multiple depth values. To evaluate the noise reduction by Ueda's algorithm, two respiratory patterns were measured by the Kinect as well as by a laser range meter. The resulting cross correlation coefficients between the laser range meter and the Kinect were 0.982 for abdominal respiration and 0.995 for breath holding. Moreover, the mean cross correlation coefficients between the signals of our system and the signals of Anzai with respect to the participant's respiratory motion were 0.90 for thoracic respiration and 0.93 for abdominal respiration. These results showed that the performance of the developed system was comparable to that of existing motion monitoring devices. Reconstruction of the three-dimensional surface also enabled us to detect irregular motion and breathing arrest by comparing the averaged depth with predefined threshold values.


Introduction
Intensity Modulated Radiotherapy (IMRT), Volumetric Modulated Arc Therapy (VMAT), and Stereotactic Body Radiotherapy (SBRT), which all deliver a higher dose to a specific target, have recently become widely used techniques. If a higher dose is delivered to normal tissue because of irregular patient motion, the resulting damage may be worse than with conventional radiotherapy techniques. Thus, real-time patient monitoring during radiotherapy treatment is essential for high-precision radiotherapy.
Patient motion is classified into two categories: inter-fractional motion and intra-fractional motion. Inter-fractional patient motion is a setup error or a reproducibility error of the body contour between treatments. Intra-fractional motion is movement during treatment due to respiratory motion, irregular motion, and physiological migration of organs. In prostate IMRT, for example, changes in bladder and rectum filling volume cause intra-fractional motion of the prostate [1] [2].
Patient immobilization and anchorage devices as well as respiratory monitoring systems have been developed to minimize intra-fractional motion. Immobilization such as the abdominal compression used in lung SBRT is a popular technique for reducing the amplitude of respiratory-induced tumor motion [3]. Those techniques, however, demand considerable effort from elderly patients. Additionally, attaching respiratory monitoring sensors is a time-consuming process that prolongs treatment time for patients. These systems can monitor only one point per patient, although the amplitudes and phases of respiration change intricately at multiple locations simultaneously.
Noninvasive patient motion monitoring and re-positioning devices have been used recently, e.g., multiple infrared external markers placed on the chest and abdominal surface of a patient with ceiling-mounted cameras [4] or CCD cameras [5]. Studies on the temporal and spatial accuracy of video-based or laser/camera-based three-dimensional optical surface imaging devices such as Sentinel [6] or VisionRT [7] have been reported. These systems can acquire the surface image of the patient during radiotherapy and compare it with a reference surface for patient re-positioning in real time [8]-[11]. Detection of patient motion and re-positioning in real time without radiation exposure has thus become a hot topic; these studies, however, focused on patient re-positioning accuracy, i.e., reducing inter-fractional patient motion.
We have been developing a noninvasive motion monitoring system for radiotherapy to reduce intra-fractional patient motion using a consumer camera. To develop the system, we selected a depth camera, Kinect version 1 (v1) released by Microsoft [12]. Kinect v1 can measure the distance to a target (called "depth") in real time using infrared random-dot patterns over a range from 0.8 m to 4.0 m. The depth information is obtained as a depth image. Kinect has been used for human detection in an environment, location estimation, and motion tracking [13]-[15]. Therefore, Kinect has strong potential for noncontact and noninvasive motion monitoring during radiotherapy.
Motion monitoring applications based on depth cameras have been reported for radiotherapy and radiation diagnosis. Xia et al. showed that a respiratory curve could be measured from depth images taken by Kinect v1 [16]. They evaluated the cross correlation coefficients between the respiratory curves detected by a Kinect and by a strain gauge system. They also reported an applicator monitoring system for localization in image-guided brachytherapy. That approach, however, needs reflective markers for detection [17].
Aoki et al. [18] measured the volume of a region of interest in the thoracoabdominal area during respiratory motion. They found a correlation between the volume obtained by Kinect and the respiratory curve obtained by a spirometer measuring expiratory flow volume. This approach, however, cannot distinguish between thoracic respiratory motion and abdominal respiratory motion.
Jochen et al. reconstructed a respiratory curve from a three-dimensional surface measured by a time-of-flight (TOF) method [19]. Heß et al. developed a method to detect the global motion of the whole human torso surface using a dual-Kinect setup for respiratory motion correction in PET images [20].
In this paper, we propose a contactless and noninvasive intra-fractional motion monitoring system. Our approach introduces three new aspects for three-dimensional surface tracking. The first aspect is an implementation of an algorithm originally proposed by Ueda [21] [22] for noise reduction and fast regression of depth values. This algorithm calculates the mean depth value of a region of interest (ROI) at equal time intervals and fits a quadratic curve to these values. The second aspect is an application for detecting the patient's respiratory motion at multiple points. Although the amplitudes and phases of the respiratory curve change at multiple points simultaneously, recent respiratory management systems measure the participant's motion or respiratory curve at only one point. The third aspect is an estimation of the three-dimensional surface from depth values at multiple regions over the whole thoracoabdominal area.

Method Overview
In this section, we present a novel approach to intra-fractional motion monitoring using a three-dimensional surface based on the depth data from Kinect. First, Ueda's algorithm was applied to filter the depth values from Kinect for real-time monitoring. We examined the performance of this algorithm by measuring an oscillating phantom. Two breathing patterns measured by our system were also compared with those measured by a laser range meter as a further performance test. As a next step, four types of motion from a participant were acquired to validate that our system can track the respiratory motion of a participant. The cross correlation coefficients of the respiratory curves between our system and a commercially available strain gauge system were calculated for comparison. Finally, we describe a method for three-dimensional surface reconstruction based on measurements in multiple regions.

Setup
Our system was composed of a Kinect v1 sensor and a laptop PC (Windows 8). Kinect has an infrared sensor and a projector emitting a random dot pattern. Kinect detects the shift of the pattern and calculates a depth image based on a triangulation method, which is also known as Light Coding [23]. Using the Kinect SDK, we directly obtained depth values, which are proportional to the distance to the target. The depth image is 640 pixels × 480 pixels in size and is generated in real time (30 frames per second). The system code was written in C# with the Kinect SDK library. For data processing including visualization, Python 2.7.5 with the matplotlib [24] and NumPy [25] modules was used.

Ueda's Algorithm
Detection of intra-fractional patient motion during radiotherapy requires real-time processing and high-precision motion tracking techniques. Intra-fractional patient motion is usually so small that the motion signal can be buried in noise. Efficient real-time noise reduction is therefore one of the key tasks in constructing an intra-fractional motion monitoring system.
In this paper, we introduced Ueda's algorithm as a fast and effective noise reduction method. In his algorithm, the local time profile of depth values is approximated by a quadratic form,
$$f(t) = a_0 + a_1 t + a_2 t^2. \tag{1}$$
Defining the squared residual error $J$ by
$$J = \sum_i \left( y_i - f(t_i) \right)^2, \tag{2}$$
where $y_i$ denotes the measured depth value at time $t_i$, each coefficient $a_k$ is determined by the least-squares method as follows,
$$\frac{\partial J}{\partial a_k} = 0, \quad k = 0, 1, 2. \tag{3}$$
The largest advantage of applying this algorithm is that these coefficients can be solved analytically, resulting in a significant reduction of the computational time. Ueda reported that this algorithm successfully tracked respiratory motion in real time [21] [22].
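A minimal Python sketch of such an analytic quadratic fit is given below. It is not Ueda's published implementation: the function names, the odd-length window of equally spaced samples, and the choice of measuring time in sample units from the window centre are assumptions made here for illustration. With a symmetric time grid the odd moments of the time values vanish, so the normal equations reduce to one division and one 2 × 2 system.

```python
def quadratic_fit_center(window):
    """Least-squares fit of a0 + a1*t + a2*t**2 to an odd-length window.

    Time t is counted in samples from the window centre (assumption),
    so a0 is the fitted value at the centre sample.
    """
    n = len(window)
    if n < 3 or n % 2 == 0:
        raise ValueError("window length must be odd and at least 3")
    m = n // 2
    ts = range(-m, m + 1)
    # Moments of the symmetric time grid; the odd moments are zero.
    s0 = float(n)
    s2 = float(sum(t * t for t in ts))
    s4 = float(sum(t ** 4 for t in ts))
    # Moments of the measured values.
    b0 = sum(window)
    b1 = sum(t * y for t, y in zip(ts, window))
    b2 = sum(t * t * y for t, y in zip(ts, window))
    # Closed-form solution of the normal equations.
    a1 = b1 / s2
    det = s0 * s4 - s2 * s2
    a0 = (s4 * b0 - s2 * b2) / det
    a2 = (s0 * b2 - s2 * b0) / det
    return a0, a1, a2


def smooth_latest(window):
    """Fitted value at the newest sample of the window (t = +m)."""
    a0, a1, a2 = quadratic_fit_center(window)
    m = len(window) // 2
    return a0 + a1 * m + a2 * m * m
```

Evaluating the fitted curve at the newest sample avoids the half-window delay of a centred estimate, at the cost of a somewhat noisier value; which of the two the original system used is not stated in the text.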

Evaluation of the Noise Reduction Performance
To evaluate the performance of Ueda's algorithm for noise reduction, we measured an oscillating object with and without the noise reduction filter. As shown in Figure 1(a), the QUASAR Programmable Respiratory Motion Platform (Modus Medical Devices, Inc., London, ON, Canada) [26] was used for simulating respiratory motion. QUASAR has an insert phantom whose motion is programmable. We set the control parameters to produce a simple sinusoidal oscillation,
$$x(t) = A \sin\left( \frac{2 \pi t}{T} \right), \tag{4}$$
where $A$ and $T$ are the amplitude and the period of the sinusoidal motion of the phantom. In this experiment, the control parameters were set to $A = 11$ mm and $T = 6$ s (10 breaths per minute) to simulate human breathing motion. The Kinect was placed 100 cm away from the centroid of the sinusoidal motion of the insert phantom (Figure 1(b)). To compare with the time profile of the insert phantom, i.e., Equation (4), we calculated the measured time profile by subtracting 100 cm from the measured depth values.
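The comparison described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' analysis script: the phantom is assumed to follow A·sin(2πt/T) with zero phase, depth values are assumed to be in millimetres and to increase with the phantom displacement, and the spread is reported as the sample standard deviation. The function names are hypothetical.

```python
import math


def phantom_position_mm(t_s, amplitude_mm=11.0, period_s=6.0):
    """Programmed phantom displacement (assumed zero-phase sine)."""
    return amplitude_mm * math.sin(2.0 * math.pi * t_s / period_s)


def residual_stats(times_s, depths_mm, offset_mm=1000.0):
    """Mean and sample standard deviation of measured minus programmed motion.

    offset_mm is the camera-to-centroid distance (100 cm in the text)
    that is subtracted from every measured depth value.
    """
    residuals = [
        (d - offset_mm) - phantom_position_mm(t)
        for t, d in zip(times_s, depths_mm)
    ]
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
    return mean, sd
```

The mean of the residuals reflects how well the centroid of the oscillation is recovered, and the standard deviation reflects the precision of the individual measurements.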

Comparison with the Laser Range Meter
We performed an experiment using a laser range meter to evaluate the characteristics of the noise reduction by Ueda's algorithm. We used the IL-2000 laser range meter manufactured by Keyence, which measures distances from 1000 mm to 3000 mm. Both the IL-2000 and the Kinect were set 110 cm above the participant.
We measured two kinds of motion, abdominal respiration and breath holding, with both devices at the same time. The laser range meter continuously monitored the distance changes, sampled every 200 μs, at one point on the participant's abdomen. In contrast, the Kinect measured the average depth of a region of interest (50 pixels × 50 pixels) that included the point monitored by the laser range meter. After data acquisition, the data were normalized using their maximum and minimum values. We then calculated the cross correlation coefficients between the laser range meter and the Kinect.
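The normalization and comparison steps can be illustrated with the short Python sketch below. It assumes that both signals have already been resampled to common time points (the text does not describe how the 200 μs laser samples were matched to the 30 frames per second Kinect data) and uses the Pearson coefficient at zero lag as a stand-in for the coefficient reported here; both function names are hypothetical.

```python
import math


def min_max_normalize(values):
    """Rescale a signal to the range [0, 1] using its minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("signal is constant; cannot normalize")
    return [(v - lo) / (hi - lo) for v in values]


def correlation_coefficient(x, y):
    """Pearson correlation of two equally long, already aligned signals."""
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Min-max normalization removes the difference in offset and scale between the two devices, so that only the shape of the respiratory curves is compared.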

Comparison with a Consumer Respiratory Monitor
We must validate that our system can track the respiratory motion of a participant. To verify whether different types of respiratory motion can be distinguished, a participant was asked to perform four different types of motion: thoracic respiration, abdominal respiration, breathing arrest, and irregular motion. We measured the respiratory curve for each motion using a commercially available strain gauge system, AZ733V, known as the Anzai-belt (Anzai Medical Co., Tokyo, Japan), as well as the Kinect. Time offsets were observed even though the signals of the Kinect and the Anzai-belt were acquired simultaneously. To extract the cross correlation coefficient, we started by defining the cross correlation function
$$C(k) = \sum_n K_n A_{n+k}, \tag{5}$$
where $K_n$ represents the normalized depth values of the Kinect and $A_{n+k}$ represents the normalized signals of the Anzai-belt shifted by time lag $k$. The maximum value of this function is regarded as the cross correlation coefficient. These calculations were performed in 42 regions over the participant's thoracoabdominal area.
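A lag search of this kind can be sketched in Python as follows. The sketch is an assumption-laden illustration rather than the authors' code: each overlapping segment is mean-centred and scaled to unit norm so that the result lies between -1 and 1, the lag range `max_lag` is a free parameter, and the function names are hypothetical.

```python
import math


def _unit_normalize(segment):
    """Mean-centre a segment and scale it to unit Euclidean norm."""
    mean = sum(segment) / len(segment)
    centered = [v - mean for v in segment]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("segment is constant; correlation is undefined")
    return [c / norm for c in centered]


def max_cross_correlation(kinect, anzai, max_lag):
    """Return (coefficient, lag) maximizing sum_n K_n * A_(n+k) over lags k."""
    best_value, best_lag = -math.inf, 0
    for k in range(-max_lag, max_lag + 1):
        # Indices n with 0 <= n < len(kinect) and 0 <= n + k < len(anzai).
        start = max(0, -k)
        stop = min(len(kinect), len(anzai) - k)
        if stop - start < 2:
            continue
        k_seg = _unit_normalize(kinect[start:stop])
        a_seg = _unit_normalize(anzai[start + k:stop + k])
        value = sum(a * b for a, b in zip(k_seg, a_seg))
        if value > best_value:
            best_value, best_lag = value, k
    return best_value, best_lag
```

Applying the same search to each of the 42 regions would yield one coefficient per region; the returned lag corresponds to the time offset between the two devices in units of samples.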

Calculation Method of Three-Dimensional Surface
Figure 2 shows the experimental setup for detecting the participant's motion. The Kinect sensor was placed 100 cm above the participant during a treatment so as to cover the whole thoracoabdominal area of the participant. Each region of interest used for calculating the average depth value is 50 pixels × 50 pixels in size. The 42 regions overlap by 25 pixels in the horizontal and the vertical directions. We reconstructed the three-dimensional surface of the participant by interpolating those 42 average depth values with a thin-plate spline method (Figure 3).
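The region layout can be illustrated with the Python sketch below, which covers only the grid and the per-region averaging, not the thin-plate spline interpolation. The arrangement of 6 columns by 7 rows, the grid origin `(x0, y0)`, and the representation of the depth image as a list of rows are assumptions made for this example; the function names are hypothetical.

```python
def roi_origins(x0, y0, cols=6, rows=7, stride=25):
    """Top-left pixel of each region; a 25-pixel stride gives a 25-pixel
    overlap between neighbouring 50-pixel regions."""
    return [
        [(x0 + i * stride, y0 + j * stride) for i in range(cols)]
        for j in range(rows)
    ]


def roi_means(depth_image, x0, y0, size=50, cols=6, rows=7, stride=25):
    """Average depth per region; the result is indexed as means[row][column].

    depth_image is assumed to be indexable as depth_image[y][x].
    """
    means = []
    for row in roi_origins(x0, y0, cols, rows, stride):
        row_means = []
        for ox, oy in row:
            total = 0.0
            for y in range(oy, oy + size):
                total += sum(depth_image[y][ox:ox + size])
            row_means.append(total / (size * size))
        means.append(row_means)
    return means
```

With these settings the grid spans 175 pixels horizontally and 200 pixels vertically, so it fits well inside the 640 × 480 depth image. The 42 averages, together with the region centres, are the scattered data points that an interpolation such as the thin-plate spline then turns into a surface.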

Intra-Fractional Motion Method
Irregular motion and breathing arrest are major concerns during radiotherapy treatment. Averaging the depth data over the 42 regions of a participant's thoracoabdominal surface makes it possible to detect these motion errors and issue a warning alert. To build the motion error detection system, we started by checking the fluctuation of depth values in time, defining $T(i, j, t)$ as the absolute value (abs) of the change in time of $S(i, j, t)$ for $i = 1, 2, \ldots, 6$ and $j = 1, 2, \ldots, 7$, where $S(i, j, t) \in \mathbb{R}$ denotes the average depth value in the $(i, j)$ region at time $t$. The region index is defined as follows: $(i, j)$ indicates that the region is located $i$-th in the horizontal direction and $j$-th in the vertical direction, and the upper-left region is assigned as $(1, 1)$. The average over all regions is
$$T_{\mathrm{ave}}(t) = \frac{1}{42} \sum_{i=1}^{6} \sum_{j=1}^{7} T(i, j, t).$$
As long as the patient stays still and keeps breathing normally, $T_{\mathrm{ave}}(t)$ is also expected to stay constant with small fluctuations. Once an irregular motion or a breathing arrest occurs, however, $T_{\mathrm{ave}}(t)$ no longer remains constant: it becomes larger for an irregular motion and comes close to zero for a breathing arrest. We set two threshold values, $T_{\mathrm{bd}}$ and $T_{\mathrm{br}}$, for irregular motion and for breathing arrest, respectively. If $T_{\mathrm{ave}}(t)$ goes beyond $T_{\mathrm{bd}}$, our system detects an irregular motion and issues a warning sound and the message "Motion Error!" on the screen. If $T_{\mathrm{ave}}(t)$ goes below $T_{\mathrm{br}}$, our system detects a breathing arrest and issues a warning sound and the message "Breathe Holding!" on the screen.
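The thresholding logic can be sketched in Python as follows. The sketch assumes that the fluctuation is the absolute difference between the region averages of two successive frames, which is one possible reading of the definition above and not confirmed by the text. The default thresholds are the 20 mm and 0.5 mm values reported in the results, the message strings are those shown on screen, and the function names are hypothetical.

```python
def mean_abs_change(prev_means, curr_means):
    """Average over all regions of |S(i, j, t) - S(i, j, t_prev)|.

    Both arguments are grids (lists of rows) of region-averaged depths
    in millimetres; the frame-to-frame difference is an assumption.
    """
    diffs = [
        abs(c - p)
        for prev_row, curr_row in zip(prev_means, curr_means)
        for p, c in zip(prev_row, curr_row)
    ]
    return sum(diffs) / len(diffs)


def classify(t_ave, t_bd=20.0, t_br=0.5):
    """Return the warning message for one frame, or None if no warning."""
    if t_ave > t_bd:
        return "Motion Error!"
    if t_ave < t_br:
        return "Breathe Holding!"
    return None
```

In practice a breathing-arrest warning would probably need the low value to persist over several frames, since a frame-to-frame difference also passes near zero at the end of inhalation and exhalation; the text does not say how this was handled.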

Performance of Ueda's Algorithm
We used the developed motion monitoring system to measure the oscillating QUASAR phantom, set to an amplitude of 11 mm and a period of 6 s and placed 100 cm away from the Kinect. Figure 4 shows that our system with the noise reduction filter based on Ueda's algorithm successfully tracked the sinusoidal oscillation of the phantom, with a mean residual error of 0.00 ± 1.94 mm. In contrast, the time profile measured without Ueda's algorithm showed poor tracking, with a mean residual error of 3.00 ± 4.67 mm. Ueda's algorithm for noise reduction evidently improved not only the accuracy of the centroid of the oscillation but also the precision of the measurements.
We also measured two kinds of breathing patterns, abdominal respiration and breath holding, to examine the performance of our system. The breathing patterns measured by the Kinect and by the laser range meter, IL-2000, are compared in Figure 5. Our results (white circles) were in good agreement with those of the laser range meter (black lines). The cross correlation coefficients between them were 0.982 (p < 0.001) for abdominal respiration and 0.995 (p < 0.001) for breath holding (t-test).

Detection of Respiratory Signal in Multiple Regions
To confirm whether our system can distinguish respiratory motion from irregular motion, we examined our system with four different types of motion: thoracic respiration, abdominal respiration, breath holding, and irregular body motion.
As expected, the difference between thoracic and abdominal respiratory motion was not easy to identify, whereas irregular motion and breath holding were easily distinguished from the other motions because of their characteristic depth patterns. Figure 6 shows that our system clearly distinguishes the four types of motion. It should be noted that our system successfully identified the difference between thoracic and abdominal respiratory motion by comparing the depth curves over one respiratory period (Figure 7).
In Figure 8, the crosses represent the depth values measured by the Kinect and the dotted line represents the signal of the Anzai-belt. The cross correlation coefficients between them were obtained by maximizing the cross correlation function over the time lag k. The resulting cross correlation coefficients were 0.90 for thoracic respiration and 0.93 for abdominal respiration. Figure 9 shows the distributions of the cross correlation coefficients in the 42 regions on the participant. The distributions for thoracic and abdominal respiratory motion are shown in the right and left panels of Figure 9, respectively.

Intra-Fractional Motion Detection Using Three-Dimensional Surface
Following the method described in section 2.6, we demonstrated how our intra-fractional motion detection system worked. First, we set the threshold values to 20 mm for irregular motion and 0.5 mm for breathing arrest. These threshold values were determined by repeated testing. The participant was instructed to perform various kinds of motion, including breath holding. Every time the participant made an irregular motion, our system detected the motion error and issued a warning message and alert sound, as shown in Figure 10.

Discussion
We have developed an intra-fractional patient motion detection system for radiotherapy treatments using a depth camera, Microsoft Kinect. High accuracy and real-time noise reduction of the depth data are essential for detecting intra-fractional motion. First, we introduced Ueda's algorithm for real-time noise reduction, since it allows the analytical solution of the least-squares method to be implemented directly. To validate its performance, two breathing patterns were measured by the laser range meter as well as by the Kinect. The resulting cross correlation coefficients between them, 0.982 for abdominal respiration and 0.995 for breath holding, indicate that the developed system can track human respiratory motion.
Second, we detected patient motion at multiple points, unlike conventional respiratory monitoring systems such as Anzai. There are some reports on measuring the respiratory curve using a Kinect sensor, but the number of monitored regions was limited. In our newly developed system, we set multiple regions to acquire the respiratory curve over the whole thoracoabdominal region of the participant.
The respiratory curve was measured by the Kinect and by Anzai simultaneously, and the cross correlation coefficients between them were calculated. As a result, the average cross correlation coefficients for thoracic and abdominal respiration were 0.90 and 0.93, respectively. These results were consistent with the reports of Xia et al. and Jochen et al., indicating that the performance of our system was comparable to that of an existing monitoring system. Third, we were able to reconstruct the three-dimensional surface by spline interpolation of the depth data obtained from all thoracoabdominal regions. Furthermore, by comparing the average depth value with the predefined threshold values in every frame, this system successfully detected motion errors and issued warning messages on the monitor screen in real time.
Although the Kinect sensor was placed 100 cm above the participant in most cases of this study, this setup is not always possible for daily clinical use in a radiotherapy treatment room. If we place the Kinect on the wall of the treatment room, which is considered a realistic solution for clinical use, the longer distance and the smaller incident angle of the infrared light on the patient may degrade the resolution of the measured depth. This must be seriously considered before adopting this system for daily clinical use.
Even though the Kinect covered only a relatively narrow region of the participant's body, this study showed that the developed system based on the depth camera has the potential to reconstruct the three-dimensional surface of the patient without any contact with the patient. If depth data become available for a wider range of the patient's body, the detection efficiency of intra-fractional patient motion could be increased considerably. A system with multiple depth cameras would make this possible in the near future.

Conclusion
We constructed a three-dimensional, noninvasive, and contactless motion tracking system using a depth camera. The proposed method enabled us to monitor respiratory curves at multiple points simultaneously, unlike conventional devices. Owing to Ueda's algorithm, we obtained noise-reduced depth values rapidly and accurately, so that the patient surface could be reconstructed using the thin-plate spline method.

Figure 1. (a) A QUASAR phantom was used to simulate respiratory motion. (b) The experimental setup. We placed a Kinect sensor 100 cm from a QUASAR phantom.

Figure 2. The experimental setup for detection of participant motion. The Kinect sensor was positioned 100 cm above the patient's surface.

Figure 3. Reconstruction of the three-dimensional surface. The 42 square regions were configured from the participant's chest to abdomen (left). The average depth values were calculated in each region. Subsequently, the three-dimensional surface was calculated by the thin-plate spline interpolation method (right).

Figure 4. The results for tracking an oscillating phantom. In the upper figure, black squares represent the time profile of the raw depth values and white circles represent the time profile with Ueda's algorithm, compared with the sinusoidal oscillation curve shown as a bold dashed line. The sinusoidal oscillation of the phantom was set with an amplitude of 11 mm (thin dashed line) and a period of 6 s. The residual errors between the theoretical sinusoidal oscillation curve and the measured time profile of the phantom are shown in the lower figure. Black crosses and black dots represent the results with and without Ueda's algorithm, respectively.

Figure 5. The comparison of measured respiratory patterns for abdominal respiration (left) and breath holding (right). The white circles represent the respiratory motion patterns measured by the Kinect and the black line those measured by the laser range meter, IL-2000.

Figure 6. Monitoring four types of respiratory motion. The upper left, lower left, upper right, and lower right figures correspond to thoracic respiration, abdominal respiration, breath holding, and irregular motion, respectively.

Figure 7. The comparison of the respiratory patterns for thoracic respiratory motion (left) and abdominal respiratory motion (right).

Figure 8. The comparison between the depth values measured by the Kinect (black crosses) and the Anzai signal (dotted line). The upper figure shows abdominal respiratory motion and the lower figure thoracic respiratory motion.

Figure 9. Distributions of cross correlation coefficients on the surface of the participant (100 s measurement).

Figure 10. Detection of intra-fractional motion. The system promptly shows the warning message on the screen once the averaged depth fluctuation exceeds the predefined threshold.