On a Feature Extraction and Classification Study for PPG Signal Analysis

Photoplethysmography (PPG) is a low-cost, non-invasive optical technology for detecting volumetric changes in blood circulation at the surface of the skin. While the medical significance of the components of PPG signals, in the form of the pulse wave, is not yet fully understood, it is widely agreed that they carry valuable pathophysiological information related to the cardiovascular system. Going beyond the frequency and time domain features of the pulse wave and its first and second derivatives, which are commonly used in related work, we applied an improved K-MEANS algorithm for feature extraction based on selected time domain parameters: K1 (systolic area), K2 (diastolic area) and K (entire pulse wave area). The extracted characteristic waveforms under the same light intensity achieved an average confidence level of 90% or higher. The stationary wavelet transform was adopted to further analyze the characteristic waveform by calculating the wavelet entropy. We then trained a Probabilistic Neural Network (PNN) model using the wavelet entropy and other time domain characteristic parameters. The trained PNN model performs well in analyzing the characteristic waveform to distinguish between a healthy condition and severe arterial stenosis.


Introduction
Photoplethysmography (PPG) is an electro-optic technology that generates a cardiovascular pulse wave by measuring the volumetric changes of blood circulation at the surface of the skin [1]. PPG is used both clinically and by individuals in a wide variety of application scenarios, from professional diagnostics to community or home health monitoring. Numerous studies [2] [3] [4] have recently emerged on how to extract valuable information from the PPG pulse wave beyond the intuitive heart rate count and pulse oximetry estimation. It is believed that the second derivative of the pulse wave contains essential health-related information; hence pulse wave analysis could be of significant value in evaluating cardiovascular diseases, facilitating early detection and recognition of illnesses, and continuous health monitoring.
However, due to the electro-optic nature of PPG, many factors can affect PPG signal detection [5]. For example, sensor displacement due to body movement and variation of the applied pressure change the magnitude of the received signal. In practice, PPG measurement usually collects excess data to average out noise for better signal quality. Nevertheless, this inevitably makes the PPG pulse wave harder for a human reader to interpret. A peculiarity in a given pulse wave may arise simply because sampling was affected by sensor displacement, yet it still distracts human readers. It is therefore of practical use to extract a feature waveform from the large volume of PPG pulse wave data in order to improve the productivity of human readers.
In the first part of this paper we propose a clustering approach to extract PPG pulse wave characteristics using three time domain feature parameters: K1, K2 and K, where K1 represents the systolic area, K2 the diastolic area, and K the entire pulse wave area. An improved K-MEANS algorithm is adopted to extract the feature waveforms from sets of pulse waves recorded at the same light intensity. We present the algorithm implementation and the average confidence level achieved, which is more than 90%.
We calculate the wavelet entropy of the characteristic waveform using the sta-

Time Domain Feature Parameters
We adopt three time domain feature parameters: K1, K2 and K, where K1 represents the systolic area, K2 the diastolic area, and K the entire pulse wave area. In a medical sense, the pulse wave area K represents characteristics of microcirculation in general, but does not reflect the correlation between other feature points and areas of the pulse wave. We therefore divide the pulse wave area into two parts: the systolic area K1 and the diastolic area K2.
We calculate K1, K2 and K with reference to Figure 1. K is the ratio of the area $S_{ABCDE}$ to the area of the rectangle AHFE:

$$K = \frac{S_{ABCDE}}{S_{AHFE}}, \qquad S_{ABCDE} = \int_{x_A}^{x_E} G(t)\,dt,$$

where $x_A$ denotes the start of the pulse wave segment, $x_E$ denotes the end of the segment, and $G(t)$ is the pulse wave as a function of time. Consequently, K1 and K2 can be calculated as follows:

$$K_1 = \frac{S_{ABC}}{S_{AHFE}}, \qquad K_2 = \frac{S_{CDE}}{S_{AHFE}},$$

where $S_{ABC}$ is the area from $x_A$ to the dicrotic notch, and $S_{CDE}$ is the area from the dicrotic notch to $x_E$.
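The area ratios can be illustrated numerically. The following is only a minimal sketch under stated assumptions, not the authors' implementation: it assumes a uniformly sampled segment, a baseline at the segment minimum, a rectangle AHFE whose height is the peak amplitude, a dicrotic notch position that is already known, and that K1 and K2 are normalized by the same rectangle as K. The names `trapezoid_area`, `pulse_area_features`, `notch_index` and `dt` are hypothetical.

```python
def trapezoid_area(samples, dt=1.0):
    """Area under uniformly sampled values (trapezoidal rule)."""
    return sum((a + b) / 2.0 * dt for a, b in zip(samples, samples[1:]))


def pulse_area_features(wave, notch_index, dt=1.0):
    """Return (K, K1, K2) for one pulse-wave segment.

    Assumptions (not from the source): baseline = segment minimum,
    rectangle height = peak amplitude, notch_index already detected.
    """
    baseline = min(wave)
    g = [v - baseline for v in wave]              # G(t) above the baseline
    duration = (len(g) - 1) * dt                  # x_E - x_A
    rect_area = duration * max(g)                 # assumed S_AHFE
    if rect_area == 0:
        raise ValueError("flat or single-sample segment")
    s_abc = trapezoid_area(g[: notch_index + 1], dt)  # start .. notch
    s_cde = trapezoid_area(g[notch_index:], dt)       # notch .. end
    k1 = s_abc / rect_area
    k2 = s_cde / rect_area
    return k1 + k2, k1, k2
```

Because the two sub-areas share the notch sample, K1 + K2 equals K by construction in this sketch.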

Improved K-MEANS
It is well understood that dirty data affects K-MEANS clustering results, and PPG measurement is prone to noise from many sources. We propose an improved K-MEANS algorithm that updates the sample centers and thresholds after each round of clustering in order to achieve more accurate clustering. The improved algorithm is less sensitive to noise and dirty data, at the expense of more computation. Since our data set is small, the extra cost is acceptable. Figure 2(a) depicts the improved K-MEANS results, whereas Figure 2(b) depicts the standard K-MEANS results.
The confidence level of the improved K-MEANS is much higher than that of the standard K-MEANS algorithm.
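The text does not spell out how the centers and thresholds are updated, so the following is only one plausible sketch of the idea, not the authors' algorithm. It assumes that, after each assignment round, samples farther from their center than the cluster's mean distance plus `n_std` standard deviations are treated as noise and left out of the center update. The function name `robust_kmeans` and the parameters `init_centers`, `rounds` and `n_std` are hypothetical.

```python
import math


def _dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))


def robust_kmeans(points, init_centers, rounds=10, n_std=2.0):
    """K-means with a per-round distance threshold (assumed variant).

    Returns (centers, labels); label -1 marks samples rejected as noise.
    """
    centers = list(init_centers)
    labels = [0] * len(points)
    for _ in range(rounds):
        # Assignment step: nearest center for every sample.
        dists = []
        for i, p in enumerate(points):
            d = [_dist(p, c) for c in centers]
            labels[i] = d.index(min(d))
            dists.append(min(d))
        # Threshold step: recompute threshold and center per cluster.
        for j in range(len(centers)):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                continue
            md = [dists[i] for i in members]
            mu = sum(md) / len(md)
            sd = math.sqrt(sum((x - mu) ** 2 for x in md) / len(md))
            thresh = mu + n_std * sd
            kept = [i for i in members if dists[i] <= thresh]
            for i in members:
                if dists[i] > thresh:
                    labels[i] = -1        # excluded from the center update
            centers[j] = _mean([points[i] for i in kept])
    return centers, labels
```

Here each sample could be a feature vector such as (K, K1, K2). The extra distance statistics in every round are the additional computation mentioned above.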

Stationary Wavelet Transform
The wavelet transform [6] combines the time and frequency domains to describe localized variations of power. Wavelets provide a multi-resolution analysis of the pulse wave, which makes the result more insightful for feature extraction. We adopted the stationary wavelet transform, a.k.a. the binary wavelet transform or non-decimated wavelet transform, which omits downsampling; each level of the transform therefore keeps the same length as the original signal and preserves most of the valuable information (Figure 3).
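To make the "no downsampling" property concrete, here is a minimal sketch of a stationary transform. It assumes the Haar wavelet and periodic boundary handling, neither of which is stated in the text; the wavelet actually used may differ. The function name `haar_swt` is hypothetical.

```python
import math


def haar_swt(signal, levels):
    """Undecimated Haar transform with periodic boundaries (assumed setup).

    Returns (details, approx): details[j - 1] holds the level-j detail
    coefficients; every output list has the same length as `signal`.
    """
    n = len(signal)
    approx = list(signal)
    details = []
    for j in range(levels):
        step = 2 ** j  # spread the filter taps instead of downsampling
        shifted = [approx[(i + step) % n] for i in range(n)]
        detail = [(a - b) / math.sqrt(2) for a, b in zip(approx, shifted)]
        approx = [(a + b) / math.sqrt(2) for a, b in zip(approx, shifted)]
        details.append(detail)
    return details, approx
```

Because no samples are discarded, a 512-sample segment decomposed into 5 layers would yield 5 detail sequences of 512 coefficients each.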

Wavelet Entropy
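We calculate the wavelet entropy as follows:

$$E_j = \sum_{k} |d_j(k)|^2, \qquad P_j = \frac{E_j}{\sum_{j} E_j}, \qquad W_E = -\sum_{j} P_j \log P_j,$$

where j denotes the layer of the signal decomposition (j = 1, 2, 3, ..., 5); k indexes the samples of the original signal (k = 1, 2, 3, ..., 512); $d_j(k)$ is the wavelet coefficient of layer j at sample k; $W_E$ denotes the wavelet entropy; $E_j$ denotes the total energy at layer j; and $P_j$ is the ratio of layer j's energy to the total energy.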
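The quantities defined above can be computed in a few lines. This is a minimal sketch that assumes the usual Shannon form of the entropy with the natural logarithm (the logarithm base is an assumption) and takes the per-layer coefficient lists as input; the function name `wavelet_entropy` is hypothetical.

```python
import math


def wavelet_entropy(detail_layers):
    """Shannon entropy of the relative energy per decomposition layer.

    detail_layers: list of coefficient lists, one per layer j.
    Natural logarithm is assumed.
    """
    energies = [sum(c * c for c in layer) for layer in detail_layers]  # E_j
    total = sum(energies)
    if total == 0:
        return 0.0
    probs = [e / total for e in energies]                              # P_j
    return -sum(p * math.log(p) for p in probs if p > 0)               # W_E
```

The entropy is zero when all energy sits in a single layer and reaches its maximum when the energy is spread evenly over the layers.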

Data Preparation
PPG measurement can be affected by many factors, including the pathophysiological condition of the subject and the environmental conditions during the test, among which age and blood pressure are key factors. We selected 23 healthy participants (normal coronary artery or mild stenosis) and 23 unhealthy participants (severe coronary artery stenosis), all aged 50-70.

Test Results
The mean, variance and standard deviation of the wavelet entropy for healthy and unhealthy participants are listed in Table 1.
The mean wavelet entropy of healthy people is lower than that of unhealthy people, which implies that the PPG pulse wave of healthy people is more stable than that of unhealthy people.

Probabilistic Neural Networks (PNN)
A Probabilistic Neural Network (PNN) [7] is a simple network that can be implemented using linear algebra computations and is applicable to classification. As depicted in Figure 4, it has five layers: input layer, normalization layer, hidden layer, summation layer, and output layer. Healthy people's parameters are listed in Table 2. Unhealthy people's parameters are listed in Table 3. The time domain parameters listed above vary to different degrees between healthy and unhealthy people; hence it is difficult to derive valuable information from any one of them alone. As a result, we use all these time domain parameters together with the wavelet entropy as inputs to the PNN for classification of the PPG pulse wave.
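The layer structure can be sketched as follows. This is an illustrative sketch rather than the authors' model: it assumes unit-length normalization of the inputs, a Gaussian kernel in the hidden layer, and an averaged per-class sum in the summation layer. The smoothing parameter `sigma` and the function name `pnn_classify` are hypothetical, and the labels "N" and "P" only mirror the healthy/unhealthy notation used for the results.

```python
import math


def pnn_classify(train, x, sigma=0.5):
    """Classify feature vector x with a minimal PNN (assumed design).

    train: dict mapping a class label to a list of training vectors.
    Returns (predicted_label, per-class scores).
    """
    def unit(v):
        # Normalization layer: scale the vector to unit length.
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        return [c / norm for c in v]

    xq = unit(x)
    scores = {}
    for label, patterns in train.items():
        # Hidden layer: one Gaussian unit per stored training pattern.
        acts = []
        for p in patterns:
            d2 = sum((a - b) ** 2 for a, b in zip(xq, unit(p)))
            acts.append(math.exp(-d2 / (2 * sigma ** 2)))
        # Summation layer: average activation of the class.
        scores[label] = sum(acts) / len(acts)
    # Output layer: the class with the largest score wins.
    return max(scores, key=scores.get), scores
```

There is no iterative training in this sketch: the training samples are stored as hidden units, which is why such a network needs little more than vector arithmetic.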

PPG Classification with PNN
Our test uses 13 samples as training input and 10 samples for classification. The classification results are listed in Table 4.
The classification accuracy is 60% for healthy people and 80% for unhealthy ones. The reason is that there is a clear standard for defining unhealthy (stenosis) cases but not for healthy ones. In Table 4, N denotes healthy people and P denotes unhealthy people; "+" denotes a normal coronary artery; "-" denotes severe coronary artery stenosis; "_" denotes misclassification.

Conclusion
The feature extraction and classification methodology for PPG signals, using an improved K-MEANS algorithm, the stationary wavelet transform and PNN modelling, is easy to implement and effective to use. Time domain parameters and the frequency domain wavelet entropy form an appropriate data set for PNN modelling to achieve acceptable classification results. We see this as a starting point for further work to gain more insight into the pathophysiological indications of the PPG pulse wave.