Real-Time Range of Motion Measurement of Physical Therapeutic Exercises

Physical therapeutic exercise (PTE) is the planned process of performing bodily movements, postures, or physical activities to provide a patient with the ability to remediate or, at a minimum, prevent impairments. Measuring the efficacy of a PTE involves accurately measuring the range of motion (ROM) of joint functions along with parameters that indicate the onset of fatigue, jerky motion, and muscle/joint resistance to the PTE. A physical therapist (PT) typically determines the efficacy of a PTE by measuring joint angles in clinical diagnosis to assess the ROM using a simple device, the Goniometer, since motion capture systems are generally expensive, difficult to use, and currently not suited for real-time operation. Joint angle measurement using a Goniometer suffers from low accuracy, low reliability, and subjectivity. Furthermore, a patient performing PTE at a remote location such as their home or a community center cannot use a Goniometer to determine the efficacy. In this study, we present an approach that uses an inexpensive, simple human motion capture system (HMCS), consisting of a single camera and a graphics processing unit (GPU), to measure the efficacy of the PTE in real-time. The approach uses a general purpose graphics processing unit (GPGPU) computer vision technique to track and record human motion and relate the tracked motion to the prescribed physical therapy regimen in real-time. We have developed a tracking algorithm derived from Klein's algorithm, known as the Modified Klein's Algorithm (MKA), capable of tracking human body parts, whereas the original Klein's algorithm was only capable of tracking objects with sharp edges. The MKA is further modified for parallel execution on a GPU to operate in real-time. Using the GPU, we are able to track multiple markers in a high definition (HD) frame of an HD video in 1.77 msecs, achieving near real-time capability of ROM measurements.
Furthermore, the error in the ROM measurements in comparison to Goniometer measurements is in the range of −2.4˚ to +1.4˚, which is well within the prescribed joint measurement standards. The suitability of the HMCS for elbow PTE is also presented.

How to cite this paper: Kumar, R.R.P., Muknahallipatna, S., McInroy, J., McKenna, M. and Franc, L. (2017) Real-Time Range of Motion Measurement of Physical Therapeutic Exercises. Journal of Computer and Communications, 5, 19-42. https://doi.org/10.4236/jcc.2017.59003. Received: June 1, 2017; Accepted: June 30, 2017; Published: July 3, 2017. Copyright © 2017 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0), http://creativecommons.org/licenses/by/4.0/.


Introduction
A typical therapy session constitutes a patient performing PTE consisting of a series of complicated bodily motions in a pattern at a certain intensity level to achieve a beneficial range of motion. Current therapy sessions are performed by a patient under the supervision of a physical therapist in a clinical setting. The PT at all times is in the immediate proximity of the patient, assisting the patient in performing an exercise or set of exercises properly. Furthermore, the PT ensures the full compliance of the patient in meeting the prescribed motions and number of repetitions through verbal feedback. This individual attention by a PT to a patient during a therapy session is a viable approach if an adequate number of PTs are available.
However, the ever-increasing demand for physical therapy services by the aging population, coupled with the shortage of highly skilled PTs, has made the individual-attention approach by immediate proximity a non-viable solution. Furthermore, in many geographical areas in the US, patients have to travel a long distance to access the services of a PT. The lack of a sufficient number of PTs, long-distance travel, and expenses force many patients to skip entirely or inadequately perform physical therapy sessions. An ad-hoc approach currently used to address the above issues is expecting the patients to perform the exercises remotely on their own using diagrams of various exercises made available by the PT. The ad-hoc approach has a deleterious effect on the efficacy of the physical therapy. Efficacy of physical therapy is directly correlated to the patient's ability and willingness to comply with the assigned work. Furthermore, it is very difficult to measure the outcomes of PTE, due to the lack of evidence-based medicine associated with physical therapy. Physical therapy remains a largely manual process with less objective data generated than most other disciplines. Currently, there is no common, cost-effective way to monitor patients' compliance and outcome. Hence, to address the above issues, there is a need for an alternate approach that would allow a patient to perform PTE remotely but still under the supervision of a PT.
We propose an alternate physical therapy strategy of using a human motion capture system (HMCS) to assist the patient in performing repeated physical therapy sessions with real-time feedback and supervision by a PT. In this paper, we present the ongoing work on the human motion capture system based on the use of a general purpose graphics processing unit (GPGPU) computer vision technique to track and record human motion.
Generally, physical therapy actions are correlated with the geometric motion of specific body parts. The body parts may include, for example, a limb, elbow, foot, wrist, torso, neck, and head. The motion is measured by calculating the position of a body part of interest in relation to another body part [1], which may or may not be stationary. The motion is not measured by determining the position in relation to physical surroundings. Typically, a PT uses a Goniometer to measure body angles to determine the ROM of a specific body part. The ROM is measured and recorded before starting the physical therapy sessions. The measurement may be performed again at the end of each therapy session or after a few sessions to capture the efficacy of treatment. Since the measurements are performed manually using the Goniometer, measurements during the PTE are not possible. For example, there is no objective data recorded which would help the PT to determine the intensity of the prescribed PTE.
In our research related to the HMCS, we envision that a patient performing PTE can have ROM, velocity, acceleration, and other parameters measured accurately in real-time, using a single off-the-shelf high-definition camera in a cost-effective way. The PT, receiving the measured parameters in real-time, can provide feedback to the patient and supervise the PTE. This paper is organized as follows: In Section 2, we present the existing human motion capture approaches. Section 3 presents an overview of the proposed HMCS. In Section 4, computer vision based tracking and Klein's tracking algorithm are discussed. In Section 5, the modified Klein's algorithm suited for tracking the motion of human body parts is presented. Section 6 presents the parallelization of the modified Klein's algorithm for GPGPU computing. In Section 7, the implementation of the MKA on a GPU is presented. The MKA performance and the application of the HMCS to determine the efficacy of elbow PTE are presented in Section 8. Section 9 presents conclusions and future work.

Related Work
Muller, et al., in [2] have used two inertial measurement units (IMU) to perform online measurements of the flexion/extension and the pronation/supination angle of the elbow. They use kinematic constraints of the elbow to estimate the two dominant axes of reference and thereby eliminate the need for alignment of the IMUs. To achieve real-time capability, a window of measurements is chosen, with the window shifted by each new measurement. This approach, however, requires a patient to attach the IMUs at the correct position on the elbow and in the same position repeatedly. Furthermore, the computational time required to estimate the two dominant axes of reference is 20 secs, making this approach not suitable for real-time application.
Chang, et al., have demonstrated the use of the Microsoft Kinect to track the motion of a human body [3] in real-time. They use the Kinect to track the twenty joint positions inherently tracked by the Kinect SDK. They provide feedback to the patient about the number of repetitions a particular geometric motion was performed and the change in the ROM. However, this approach requires the use of a special camera, the Microsoft Kinect, which is typically not owned by elderly patients. The measurements performed using the Kinect are based on the SDK database of joint positions of a large number of healthy human beings. Using the joint positions of healthy human beings as reference positions introduces errors when the joint positions of a patient performing PTE are measured.
Zhao proposes an assessment method using kinematic rules [4] for each rehabilitation exercise to provide specific feedback regarding the quality of exercise.
The Microsoft Kinect is used to measure joint angles, and a fuzzy inference engine is used to assess the overall quality of the execution of a PTE. The fuzzy inference engine's ability to assess the quality of the PTE performed depends on a set of kinematic rules that fully specify the requirements for each PTE and on accurate joint-angle measurement using the Kinect camera.
Two Microsoft Kinect cameras are used by Chen, et al., to construct 3D joint positions [5], upon which they construct centers of voxels using the mean shift algorithm. The joint moveable angles and the motion data are calculated from the centers of voxels frame by frame. They compare their ROM measurements with the VICON motion capture system. This approach of measuring the ROM of a patient first requires the patient to perform the PTE in front of a VICON motion capture system to record the skeleton data of the patient's body as a Golden Standard, or ground truth. The two Kinect cameras have to be positioned at a precise distance from the patient. The reconstruction of 3D joint positions and centers of voxels in each frame makes their approach computationally intensive and not suitable for real-time application.
Liu, et al., have proposed a method to track human motion using multiple low-cost RGBD cameras [6]. They use multiple RGBD cameras to address the missing depth values and noise issues with low-cost Microsoft Kinect cameras.
They also address the problem of body occlusions with a single camera by using multiple cameras. Using the color and depth images and the point cloud acquired in each view, they extract an initial pose. The initial pose is dynamically updated by combining a filtering approach with a Markov model to estimate the new poses and thereby measure the ROM of the entire human body. However, this approach is heavily dependent on correctly aligning the point clouds from three depth cameras. If the alignment is not performed accurately, the tracking will fail. The approach, requiring pose determination, is computationally intensive, making it unsuitable for real-time application.
Penelle, et al., have used multiple Microsoft Kinect cameras to track human motion by fitting a model to the observed data, which are the depth images from the Kinect cameras [7]. They have been able to achieve near real-time execution speed using a GPU. However, the tracking accuracy is dependent on having a perfect 3D model of the patient, which requires pose estimation using multiple cameras.
All of the discussed approaches for tracking human motion require a) the use of a specialized camera or cameras, b) multiple cameras, c) active sensors/markers attached to parts of the human body, or d) complex manual adjustments to get the right output. The patient is expected to position the multiple cameras at precise locations to construct a 3D model of the patient or estimate the pose of the various parts of the human body. Furthermore, the majority of the approaches are computationally intensive. Especially with HD cameras, the time needed to process images is so high that the approaches are unsuitable for real-time use.
Our proposed approach differs significantly from the previous work by using a single camera without the need for pose estimation or a 3D model of the patient. In our approach, we process high definition resolution monocular images with passive optical markers on body parts of interest to measure individual ROM. The camera can be of any RGB type without any depth measuring capability. Furthermore, our approach has a parallel implementation on GPUs, which facilitates real-time processing.

Overview of Proposed HMCS
The primary function of our HMCS is to measure the joint positions or angles when a patient is performing a particular PTE. As an example, Figure 1 shows a typical PTE a patient performs after an elbow injury.
The patient performing the elbow PTE is expected to perform the flexion and extension motions of the arm maintaining the shoulder at a stationary position.
If a patient allows the shoulder to move along with the flexion and extension motions, the elbow muscles and the joint will be subjected to forces that are not prescribed in the PTE, resulting in poor efficacy. A high efficacy of the elbow PTE is achieved when the patient performs smooth flexion and extension motions according to the prescribed joint angles and repetitions without experiencing fatigue. This necessitates measuring the joint angle position with reference to the shoulder horizontal axis during the PTE to determine the number of repetitions, smoothness of motion, ROM, and fatigue-free motion. The markers attached to the patient's arm are shown in Figure 2. The HMCS structure overview is shown in Figure 3.
As shown in Figure 3, an off-the-shelf camera is used to capture the video of a patient performing PTE with passive optical markers attached to the body part performing PTE, as shown in Figure 2. The video can be transmitted in real-time to the therapy clinic or to a laptop/PC with a GPU for computing the ROM. The flow diagram in Figure 4 depicts the computational steps required to process the video and measure the joint angles.
In Figure 4, the loop represents the modified Klein's tracking algorithm, which is executed in parallel for each marker on a GPU. The first input to the algorithm is the initial position of the markers, located approximately at the center of a marker and determined visually by a mouse click. The second and third inputs are the size of a marker in pixels and the number of markers. The output of the algorithm is the centroid of the markers in each video frame. Klein's algorithm and the MKA for tracking the human body are discussed in detail in the following sections.
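The loop described above can be sketched as follows. This is an illustrative Python sketch, not the authors' GPU code; `find_marker` is a hypothetical callback standing in for the transform/segment/define steps applied to one marker.

```python
def track_markers(frames, initial_centroids, marker_size, find_marker):
    # initial_centroids: rough marker centers picked by a mouse click.
    # find_marker: hypothetical callback standing in for the
    # transform/segment/define steps of the MKA; given one frame and a
    # marker's previous centroid, it returns the new centroid.
    centroids = list(initial_centroids)
    history = []
    for frame in frames:
        # Markers are independent of one another, which is what makes
        # the per-marker parallel execution on a GPU straightforward.
        centroids = [find_marker(frame, c, marker_size) for c in centroids]
        history.append(centroids)
    return history
```

Because each marker's update depends only on its own previous centroid, the list comprehension above is the part that maps naturally onto one GPU thread block per marker.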

Klein's Tracking Algorithm
Klein's tracking algorithm [8] focuses on tracking augmented reality workspaces. The algorithm uses the corners of different objects in a workspace as the tracking features. A search template of the marker is generated using the warped conversion.
The image and the marker template are down-sampled by several levels, reducing the dimensions of the image and the template. One level of sub-sampling involves reducing the image or marker template to half its original size, i.e., replacing four adjacent pixels with one. This helps in reducing the overhead of the search. The algorithm begins at the S1 part of the transformation for resilience, followed by segmentation. The segmented image is then subjected to the S2 part of the transformation to obtain the position of the marker on the new image. This is repeated for each image in the sequence.
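One level of sub-sampling can be sketched as follows. Reducing each non-overlapping 2×2 block to its mean is our assumption for the reduction operator; the paper does not specify it.

```python
import numpy as np

def downsample(img):
    # One level of sub-sampling: each non-overlapping 2x2 block of
    # pixels is replaced by its mean, halving each dimension.
    # (Mean reduction is an illustrative choice.)
    h = img.shape[0] // 2 * 2  # trim odd rows/columns
    w = img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[0::2, 1::2]
          + img[1::2, 0::2] + img[1::2, 1::2]) / 4.0
```

Applying the function repeatedly gives the successive levels of the pyramid used to cheapen the initial search.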

Modified Klein's Algorithm
Klein's algorithm has been modified by its author Klein as well as by other researchers, as seen in [9]-[16]. These modifications focus on a) tracking an object with fewer or different object parameters [9] [10], b) using low-power devices [11], which may involve cloud computing for robots [12], c) using different camera configurations/error estimation techniques [13], and d) improving

Transformation
The first part of the transformation in Klein's algorithm provides resilience. The approach here is to make the tracking more resilient not just to lighting changes, but also to the background of the image. An overview of the transformation phase is shown in Figure 6. An RGB camera reproduces a broad array of colors, meaning each RGB frame provides 3 sub-frames of m × n pixels, where the sub-frames belong to the red, green and blue color models respectively. Let each RGB image be given by A_k. Using this constraint, the transformation is formulated as shown in the middle part of Figure 6, producing an intermediate image G_k via a standard RGB-to-gray conversion.

The final step of the transformation is to obtain the output image F_k, as shown in the bottom part of Figure 6, where f_ij ∈ F_k. The output image F_k renders the markers as perfect black spots having zero values. The two-step transformation not only provides resilience to lighting changes by lowering and subtracting the intensity but also helps in identifying the markers more effectively by differentiating them from the rest of the background.
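One plausible instantiation of the two-step transformation for blue markers is sketched below. The paper's exact formulas are not reproduced here, so both the luma conversion and the "excess blue" subtraction are our assumptions; they merely satisfy the stated property that marker pixels come out as zero-valued black spots.

```python
import numpy as np

def transform(rgb):
    # Step 1: a standard RGB-to-gray (luma) conversion.
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    luma = 0.299 * r + 0.587 * g + 0.114 * b
    # Step 2 (assumption): subtract the "excess blue" so that strongly
    # blue pixels -- the markers -- are driven to zero (black spots),
    # while neutral background pixels keep their intensity.
    excess_blue = np.clip(b - (r + g) / 2.0, 0.0, None)
    return np.clip(luma - excess_blue, 0.0, None)
```

A saturated blue marker pixel maps to 0, while a neutral gray background pixel passes through essentially unchanged, which is the behavior the segmentation step relies on.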

Segmentation
The segmentation step is based on the search regions in the original Klein's algorithm. The boundary of the search region is used to segment the image F_k.
The boundary is determined based on the relative motion between the camera and the markers, the dimension of the marker in the image, and the frame rate f_s. Let the velocity of the marker motion relative to the camera be v. Let the dimension of the l-th marker on the image be p_l × q_l. Let the maximum possible displacement of any given marker between two consecutive images be v_x and v_y pixels along the horizontal and vertical axes respectively. As can be observed in Figure 7, the dimension of the search region on an image is then (p_l + 2v_x) × (q_l + 2v_y), which implies that there are (2v_x + 1)(2v_y + 1) marker-sized regions of dimension p_l × q_l within the search region. Each p_l × q_l marker-sized region, called a match block, is possibly the actual marker position, i.e., every match block is a potential marker.
Hence, all match blocks are compared with the marker templates. The technique used to compare these match blocks forms the definition part of the algorithm.
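The search-region boundary described above can be sketched as follows (variable names are illustrative; the region is the marker box grown by the maximum per-frame displacement on each side, clipped to the frame):

```python
def search_region(prev_centroid, marker_size, v, frame_shape):
    # prev_centroid: marker centroid in the previous frame.
    # marker_size: (p, q), marker dimensions in pixels.
    # v: (vx, vy), maximum per-frame displacement of the marker.
    cx, cy = prev_centroid
    p, q = marker_size
    vx, vy = v
    x0 = max(int(cx - p / 2 - vx), 0)
    x1 = min(int(cx + p / 2 + vx), frame_shape[0])
    y0 = max(int(cy - q / 2 - vy), 0)
    y1 = min(int(cy + q / 2 + vy), frame_shape[1])
    # When not clipped by the frame, the region is (p + 2vx) x (q + 2vy).
    return x0, x1, y0, y1
```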

Definition
In order to find a marker, a search template s is used. The squared summation of differences (SSD) between each element of the template s and the corresponding element in the match block is computed. Letting b represent a match block, the SSD is given by:

SSD = Σ_{i,j} (s_ij − b_ij)²

In the MKA's transformation phase, all s_ij are reduced to zeros; therefore, the SSD for the MKA can be computed using:

SSD = Σ_{i,j} b_ij²

Figure 7. Segmentation phase of the algorithm.
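The two SSD forms can be written as a minimal NumPy sketch (illustrative variable names):

```python
import numpy as np

def ssd(template, block):
    # Klein's SSD between template s and match block b.
    return float(np.sum((template - block) ** 2))

def ssd_mka(block):
    # MKA's SSD: the transformation drives every template entry s_ij
    # to zero, so the score collapses to the sum of squared block
    # elements -- no template access or subtraction is needed.
    return float(np.sum(block ** 2))
```

With an all-zero template the two scores coincide, which is exactly why the MKA can drop the template from the inner loop.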
This SSD can be implemented in a number of different ways. A technique adopted in [8] is described in Figure 8. The search begins at the first possible match block location, i.e., the first row and first column, as shown in the first row of the search in Figure 8. The search continues by performing the SSD for subsequent match block locations in the same row but in subsequent columns. This continues till the end of the row is reached, and then the search moves to the next row, as seen in the second row of the search in Figure 8. This is continued row-wise till the end of the search region is reached, as shown in the bottom part of Figure 8. A match block is accepted as the marker when its SSD satisfies a threshold T. The threshold T is based on the illumination and background of the image; it is determined on a per-scenario basis by trial and error. If the SSD provides one match, then the centroid of the marker location is taken to be the centroid of the object. If multiple marker positions provide an SSD value satisfying the threshold, then the average of all the locations is taken as the centroid of the object.
Once the centroid of one marker is obtained, the definition part of the algorithm moves on to the next marker until the centroids of all markers in an image are obtained. Once all markers in an image are processed, the tracking algorithm moves on to the next image in the sequence.
Figure 8. The brute force technique to compute SSD for all match blocks within the search boundary.

Parallelization of Modified Klein's Algorithm
The MKA needs to be redesigned for a parallel implementation. The re-design is performed in two phases:
a. Computationally Intensive Region Identification: identify the computational bottleneck in the code.
b. Algorithm Modification: once the region is identified, modify the algorithm to achieve maximum parallelism.

Computationally Intensive Region Identification
The SSD operation was given in Equation 6. Consider a search region as described in Figure 8. The search region contains multiple match blocks, and the SSD is performed on each of the match blocks within the search region. Since the dimension of a match block is p_l × q_l, using the brute force technique to perform the SSD on one match block requires p_l q_l multiplications and (p_l q_l − 1) additions, leading to (2 p_l q_l − 1) operations. Using the number of possible match blocks in a search region given in Equation (4), the total number of operations to perform the SSD within a search region is (2v_x + 1)(2v_y + 1)(2 p_l q_l − 1). This leads to O(v_x v_y p_l q_l) complexity for one search region. Hence, this is identified as the bottleneck of the algorithm due to the large number of operations.

Algorithm Modification
The reduce and re-use guidelines are merged, as the following techniques not only reduce the number of operations but also aid in the re-use of elements. The SSDs of the match blocks have repeated operations. For example, consider the first two SSDs, SSD_1 and SSD_2, shown in Figure 8: the two match blocks overlap in all but one column, so the overlapping elements are squared and added in both summations. These repeated computations increase with an increase in the size of the search region and the number of markers.
A five-step modification is introduced that reduces the number of operations for the SSD. In the first step, all the elements of the search region are squared. In steps 2 and 3, a row-wise summation of the search region is computed such that the summation window is limited to the marker width. Steps 4 and 5 are similar to steps 2 and 3 but are executed column-wise, limiting the summation window to the marker height. These steps are computed as follows:

Step 1: Squaring: each element of the search region is squared, requiring one multiplication per element of the search region.
Step 2: Prefix row summation: for every row, perform a prefix sum. If each element is computed independently, the number of additions grows with the row length. However, if a dependency is assumed between column summations, i.e., the third column is not computed until the second column is computed, then a given row requires 2v_y additions.
Hence, overall there are 4v_x v_y additions.
Step 3: Marker dimension row subtraction: for every row, based on q_l, each column (say the r-th column, with r ≥ q_l) has the value of the (r − q_l)-th column of the same row subtracted from it, so that each entry holds the sum of q_l consecutive squared elements of that row.

Step 4: Prefix column summation: starting from the q_l-th column, perform a column-wise prefix sum, similar to step 2.

Step 5: Object dimension column subtraction: similar to the row-wise subtraction, starting from the p_l-th row, perform the column-wise subtraction so that each entry holds the sum over a p_l × q_l block.

The summary of the computations required is given in Table 1.
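The five steps amount to a sliding-window sum of squares computed with prefix sums. A minimal NumPy sketch, checked against the brute-force SSD (illustrative only, not the authors' GPU code):

```python
import numpy as np

def block_ssd_bruteforce(region, p, q):
    # Sum of squares over every p x q match block (the MKA's SSD),
    # computed directly: O(vx * vy * p * q) work.
    h, w = region.shape
    out = np.empty((h - p + 1, w - q + 1))
    for i in range(h - p + 1):
        for j in range(w - q + 1):
            out[i, j] = np.sum(region[i:i+p, j:j+q] ** 2)
    return out

def block_ssd_prefix(region, p, q):
    # The five-step modification: square once, then form each block
    # score from a few additions/subtractions of prefix sums.
    sq = region ** 2                          # step 1: squaring
    rows = np.cumsum(sq, axis=1)              # step 2: prefix row sums
    rows[:, q:] = rows[:, q:] - rows[:, :-q]  # step 3: width-q window per row
    rows = rows[:, q - 1:]
    cols = np.cumsum(rows, axis=0)            # step 4: prefix column sums
    cols[p:, :] = cols[p:, :] - cols[:-p, :]  # step 5: height-p window
    return cols[p - 1:, :]
```

Each match-block score is now formed from a constant number of operations per block, independent of the marker dimensions, which is the source of the complexity reduction.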
Since these operation counts no longer depend on the marker dimensions, the overall SSD per search region is reduced to a computational complexity of O(v_x v_y), which is significantly lower than the O(v_x v_y p_l q_l) cost of Equation 8 calculated using the brute force technique shown in Figure 8.

GPU Implementation
General-Purpose Computing on GPU (GPGPU) is a new parallelization domain to accelerate scientific and engineering computations in vision algorithms [17].
NVIDIA's Compute Unified Device Architecture (CUDA) is the earliest and most widely adopted programming model for GPGPU [18]. Hence, for the implementation of the parallelized MKA, we have used an NVIDIA GPU with the CUDA programming model.
The GPUs, at a high level, are a collection of Streaming Multiprocessors (SMs), with each SM having the ability to concurrently execute a large number of threads grouped in blocks, and with each SM and thread having access to dedicated shared and register memory. Hence, the implementation of parallel algorithms has many parameters to be determined, such as the number of threads, the number of blocks, and the shared memory size.
In our previous work [19], we proposed and implemented a parallel version of the Scale Invariant Feature Transform (SIFT) algorithm on GPUs. In [19], we describe the architecture of GPUs (Section 4) and a framework for calculating the parameters (Section 5) to obtain a near optimal implementation of the parallel algorithm. We have adopted the same framework in this paper to calculate the amount of shared and register memory and to determine the number of blocks and threads to be used for execution of the algorithm.

Table 1. Summary of the computations required for all match blocks' SSD.
The implementation of the MKA's SSD operation can be written such that it uses different amounts of register and shared memory. For example, the elements of the search region can be placed completely in shared memory, or a row or column of the search dimension can be moved to register memory. The blocks and threads will vary based on the implementation. While the memory access latency of shared memory is higher than that of register memory, the execution times for code using shared memory or a shared and register memory combination are comparable. For this paper, the code was written to minimize the number of registers and maximize the shared memory, since this resulted in a cleaner, single source code for different search region dimensions without compromising much performance. Using the minimized-registers code and using the NVIDIA profiler to determine the total amount of shared and register memory, the number of threads per block and the number of blocks were determined using Equations 28 and 30, respectively, in [19] for each search region. In general, the number of threads per block was found to be 256 for search dimensions having fewer than 2000 pixels and 512 for search dimensions having more than 2000 pixels.
The number of blocks varied based on the search region dimension.

Results and Analysis
In this section, we present and analyze the effectiveness of the HMCS in tracking the motion of a human body part while performing PTE. First, to determine the suitability of the HMCS for real-time applications, we compare the performance of the sequential Klein's algorithm with the MKA. Second, we compare the performance of parallel implementations of Klein's algorithm and the MKA, parallelized using standard libraries. Lastly, we look at the performance of the parallel MKA, parallelized using our framework.
For analysis, the programs are executed on an Intel Xeon CPU with two E5620 processors operating at 2.40 GHz and running a 64-bit Windows 7 Pro operating system. The sequential programs are executed using 64-bit Matlab R2014b. Matlab's original core was developed from LINPACK and EISPACK [20], which have proven to be computationally effective for solving linear algebra problems [21]. Hence, we use this software to obtain the reference time for sequential programs. For parallel program execution, the CPU is equipped with an NVIDIA Tesla C2075 card, which has 448 cores and 6 GB of memory for general computations. Though K20x and K40m cards seem to be a good option for GPU parallelization, we limit ourselves to a low-cost GPU such as the C2075.
For developing the parallel Klein's algorithm and MKA code using standard libraries, a combination of the cuLA and cuBLAS library packages was used. cuLA is a CUDA-based accelerated Linear Algebra (LA) package for GPUs. Similarly, cuBLAS is an accelerated Basic Linear Algebra Subprograms (BLAS) library provided by NVIDIA for GPUs. Lastly, NVIDIA's Visual Profiler (NVVP), compatible with CUDA 6.0 and Microsoft Visual Studio 2015, was used to profile the code.
We use execution time as a measure of performance; a lower execution time is better. Each implementation of an algorithm for a given search dimension is considered one simulation, and the execution time is collected.
Each simulation is executed one thousand times and the average execution time (AET) is computed. The standard deviation of the AET across all simulations was observed to be under 6.5%.
Each simulation has been verified by manually inspecting the tracked positions on the image. Data for verification of the algorithms are taken from videos recorded under indoor computer lab conditions at 60 fps and 1080p resolution. For the tracking algorithm, the marker size is generally small and the search region varies from image to image. For the simulations, p_l = q_l = 8 and the search dimensions are taken from {32, 64, 128, 256, 512}.

Application of HMCS to Elbow PTE
The video frames of a patient performing the elbow flexion and extension exercise shown in Figure 2 are presented again in Figure 12 for completeness. Based on the measured ROM, the PT can see that the patient has some physical issues with elbow flexion and provide appropriate feedback. Furthermore, in Figure 13, the ROM measurements between markers (1, 3) indicate that the patient is not maintaining the arm in the rigid position. In Figure 14, it can be seen that the patient is performing the elbow motion with a significantly large variation in angular velocity, and correspondingly with the angular acceleration and deceleration shown in Figure 15, which together indicate whether the patient is experiencing fatigue. In Figure 16, the error in the ROM measurement between the HMCS and a Goniometer is shown. The error varies from −2.6˚ to +1.4˚, which is well within the accepted error value of 5˚ [1]. This low error when compared with the Goniometer measurements indicates that our technique of measuring the patient's performance in near real-time can be used in clinical settings. Furthermore, our implementation on the GPU allows us to track all of the markers in an HD frame in 1.77 msecs, making the HMCS capable of operating in real-time.
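For reference, the ROM angle between two tracked marker centroids relative to the horizontal zero position can be computed as below. The marker roles, the y-up coordinate assumption, and the sign convention are illustrative, not the paper's exact definition.

```python
import math

def rom_angle(elbow, wrist):
    # Forearm angle relative to the horizontal zero position:
    # roughly 0 deg at full extension, +90 deg at full flexion.
    # Assumes y-up coordinates; marker roles are illustrative.
    return math.degrees(math.atan2(wrist[1] - elbow[1],
                                   wrist[0] - elbow[0]))
```

Differencing this angle across consecutive frames (divided by the frame interval) gives the angular velocity, and a second difference gives the angular acceleration plotted in Figures 14 and 15.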

Conclusions & Future Work
An analysis of Klein's algorithm indicated that there was a potential to reduce its computational cost. We compared the effectiveness of our algorithm with Klein's algorithm for sequential implementations, and of Klein's algorithm and the MKA for parallel implementations. For the sequential implementation, the MKA performed much better than its predecessor algorithm. To understand the effectiveness of our parallel implementations, Klein's algorithm and the MKA were parallelized using the cuLA and cuBLAS libraries and compared with our parallel implementation. The results showed that our parallel implementation of the MKA has the lowest execution time. Hence, our parallel implementation of the MKA is the most suitable for real-time operation of the HMCS.

Figure 3. HMCS structure overview for real-time application.
d. Definition: The tracking is performed by evaluating the sum of the squared differences (SSD) between the image and the marker template at all locations within a given search region of an image. A search region is a fixed boundary around the occurrence of the marker in the previous image. The boundary is pre-determined for each workspace. The location within the search region having the smallest SSD value is taken as the location of the marker in the new image. A flow diagram representing the different phases of Klein's algorithm is shown in Figure 5.
Figure 6. An overview of the transformation phase of the tracking algorithm.
the pixel values (intensity) of the whole image, including the markers. The markers are assumed to be uniformly colored objects. Since the human body lacks uniform coloring, small uniformly colored stickers are assumed to be present on the relevant body part; this is considered a tracking feature. This aids in segmenting an image more efficiently, reducing noise in the images. Considering blue stickers as markers, the blue frame would have higher numerical pixel values at the marker locations than the red or green frames; the transformation therefore reduces the blue content in G_k.

The displacements v_x and v_y are measured along the horizontal and vertical axes respectively and satisfy a relationship determined by the relative velocity v and the frame rate f_s. Using this relationship and the known marker location (the centroid of the tracking feature) on the (k − 1)-th image, the search for the marker in the k-th image can be limited within the boundary estimated using the v, f_s, p_l and q_l values provided. If this estimation fails to track the markers on any frame, then v_x and v_y are incremented from the initial estimation to widen the boundaries of the search region. The new v_x, v_y values are stored for successive frames. Hence, the image F_k will be segmented as shown in Figure 7.

Figure 9 shows the speed-up of the sequential execution of the MKA over Klein's algorithm for tracking a single marker. The speed-up is obtained by dividing Klein's algorithm's AET by the MKA's AET. As expected, the MKA outperforms Klein's algorithm by a significant margin, since the MKA requires significantly fewer computations than Klein's algorithm.

Figure 10 shows the AET of Klein's algorithm and the MKA parallelized using standard libraries.

Figure 10. Performance comparison of parallel implementations of Klein's algorithm and MKA using standard libraries.
In Figure 12, the patient's elbow has five markers stuck on the hand, which are tracked by the HMCS. The patient in Figure 12 starts at full extension, or the supine position, and bends the elbow to reach the full flexion position. When performing the elbow exercise, the patient is expected to maintain the arm (the structure between the shoulder and elbow joint) rigid and in the horizontal position. The ROM is measured from the zero value position. The ROM angle for flexion ranges from 0˚ to +90˚ to the left of the zero value position, and the ROM angle for extension ranges from 0˚ to −90˚ to the right of the zero value position. Using the HMCS, the markers are tracked between subsequent frames.

Figure 11. AET of MKA parallelized using the framework for different region dimensions.

Figure 12. Video frames of flexion and extension exercises.

In Figure 14 and Figure 15, the angular velocity and acceleration of the elbow motion while performing the flexion and extension exercise are shown.

Figure 13. ROM measurements between pairs of markers.

Figure 14. Angular velocity of the elbow motion.

Figure 15. Angular acceleration of the elbow motion.