A Modified CNN Network for Automatic Pain Identification Using Facial Expressions

Pain is a strong symptom of disease. Being an involuntary unpleasant feeling, it can be considered a reliable indicator of health issues. Pain has traditionally been expressed verbally, but in some cases patient self-reporting is not practical. On one side, there are patients with neurological disorders who cannot express themselves accurately, as well as patients who suddenly lose consciousness due to abrupt faintness. On the other side, medical staff working in crowded hospitals need to focus on emergencies and would welcome the automation of looking after hospitalized patients during their entire stay, in order to notice any pain-related emergency. These issues can be tackled with deep learning. Since pain is generally accompanied by spontaneous facial behaviors, facial expressions can be used as a substitute for verbal reporting of pain. In this paper, a convolutional neural network (CNN) model was built and trained to detect pain through patients' facial expressions, using the UNBC-McMaster Shoulder Pain dataset. First, faces were detected from images using the Haar cascade frontal face detector provided by OpenCV, and preprocessed through grayscaling, histogram equalization, cropping, mean filtering, and normalization. Next, the preprocessed images were fed into a CNN model built on a modified version of the VGG16 architecture. The model was finally evaluated and fine-tuned iteratively based on its accuracy, which reached 92.5%.


Introduction
How to cite this paper: Karamitsos, I., Seladji, I. and Modak, S. (2021) A Modified CNN Network for Automatic Pain Identification Using Facial Expressions. Journal of Software Engineering and Applications.

Humans move their facial muscles, either spontaneously or purposefully, to convey a certain emotional state (e.g., sadness, happiness, fear, disgust, pain) in a nonverbal way. These facial moves are called facial expressions. Facial expressions vary between different species and between individual humans; they can be affected by a person's age, gender, psychological state, personality, and social situation.
Moreover, they can either be innate or acquired through the influence of others. Humans have the ability to discern hidden feelings and fake emotions in some cases, especially when these are expressed by someone with whom they have a strong relationship. However, the automation of such tasks is very laborious and challenging.
Facial expressions can be regarded as an effective alternative to verbal communication. For instance, paralyzed people can communicate through eye contact and eye movements. Therefore, facial expressions are very important and worth interpreting by machines, and one of the applications in which they are involved is the detection of pain.
Pain is an unpleasant feeling which is triggered by an anomaly in the body. This anomaly can either be medical (e.g., an injury), or emotional (e.g., stress and depression which can cause terrible headaches). When nerves detect tissue damage or irritation, they send information through the spinal cord to the brain, thus causing humans to react to that anomaly. Pain is either expressed verbally or physically, through facial expressions.
Pain can vary from being slightly annoying to debilitating. Regardless of its intensity, it gives a strong and reliable message that something within the body is malfunctioning and needs to be cured. Additionally, it can affect a person's behavior, memory, concentration, and intellectual capacities. Hence, it should never be neglected and needs to be taken seriously and treated promptly.
In this regard, many entities around the world work on improving pain relief with the help of researchers and professionals involved in the diagnosis and treatment of pain, because they believe that the relief of pain is a human right. The International Association for the Study of Pain (IASP) [1] is one of them and is very active in this sector; one of its main events is the biennial World Congress on Pain.
The main motivation for this research is to help provide quality healthcare services: managing patients' conditions and making the right diagnoses at the right time is pivotal to saving lives.
COVID-19 has been a paramount motivator for this study. With an unmanageable number of daily cases and the necessity of physical distancing, many countries found the perfect opportunity to switch to tech-enabled and AI-empowered solutions. From online businesses and chatbot-assisted services, to robots working as assistants, to automatic temperature checks in public areas and real-time identification of undisciplined citizens, numerous innovative solutions have emerged to contain the spread of the virus. Though differing in impact, each solution proved its worth during these unprecedented times.
Being among the leading research topics of the decade, computer vision-empowered pain assessment is in full effervescence. Yet, it is still not widespread in medical centers due to its incompleteness and the necessity of fusing it with other disciplines, such as psychology, which makes it a very intriguing and challenging topic that deserves intensive research. This has also been a great motivator for starting this research project.
The contribution of this study is to build a reliable pain assessment system based on patients' facial expressions, which could become a game-changer in healthcare.
This study can be broken down into three major contributions:
- Giving all patients, regardless of the language they speak or their psychological and physical condition, the opportunity to express their pain accurately and receive the right medical care at the right time.
- Constantly looking after patients and notifying medical staff about emergencies, while keeping the staff focused on their main tasks.
- Avoiding the need to hire additional medical staff whose sole task is to ceaselessly watch patients and notify doctors only in case of an emergency.
The novelty of this study is the design of a deep learning classification model based on a tailored CNN architecture to be used as a pain assessment tool, in the following way: a camera would be recording patients when they are not under any supervision and sending input frames to the classification model which will process them in real-time and classify them into pain/no-pain images, and ultimately, notify doctors if the pain is detected.

Related Work
The field of image processing is in constant development and research. Whether it is applied in medicine, security, satellite imagery analysis or seismic imaging, thousands of researchers are pioneering daily and coming across new methods and approaches to give the advantage of computer-aided imaging. Medical imaging is one such field that requires daily improvements and growth, due to its complexity, diversity of cases and importance of accuracy. Numerous studies on pain detection using facial expressions have been conducted on diverse data sets using different data analytics approaches.
Sourav Dey Roy et al. [2] conducted their research on the UNBC-McMaster Shoulder Pain Expression Archive Database [3] in the following way: first, they converted all image frames into grayscale images. After that, they performed shape alignment using Generalized Procrustes Analysis (GPA). Next, they applied texture warping, in which the texture of all images is warped with respect to the base shape using an affine warp. Their methodology achieved an accuracy of 82.43% for pain level estimation.
Xiaojing Xu et al. [4] developed an ensemble learning model based on the UNBC-McMaster Shoulder Pain Expression Archive Database. They performed face detection using the cascade DPM face detector and used five metrics to predict the level of pain: the Prkachin and Solomon Pain Intensity (PSPI) score, the Visual Analog Scale (VAS) score, the Observers Pain Rating (OPR) score, the Affective Motivational Scale (AMS) score and the Sensory Scale (SEN) score.
Lijun Yin et al. [5] developed their own 3D dynamic facial expression database using the Di3D (Dimensional Imaging 3D) face capturing system, covering the six universal expressions: anger, disgust, fear, happiness, sadness and surprise.
One of the methodologies proposed by Laduona Dai et al. [7] is an Action Unit based method, in which features (AUs) were first extracted using dedicated software.
Zhanli Chen et al. [11] came up with a different approach from the ones cited previously, which they implemented on the UNBC-McMaster Shoulder Pain Expression Archive Database. Instead of relying on pain intensity scores to classify images and video sequences, they detected individual pain-related AUs directly and combined them using two different structures (compact and clustered). After that, they used two different frameworks, namely Multiple Instance Learning (MIL) and Multiple Clustered Instance Learning (MCIL), to train their models on low-dimensional features. Their classifier achieved an accuracy of 87%.
Reza Kharghanian et al. [12] used an unsupervised learning approach to classify unlabeled images. First, they extracted features of shape and appearance separately from faces, using a Convolutional Deep Belief Network (CDBN). Those extracted features were then used to train an SVM model with a linear kernel that has two output classes (pain/no-pain). Their model was tested on the UNBC-McMaster Shoulder Pain Expression Archive Database, and it achieved an accuracy of 87.2%.

Proposed CNN Modified Architecture
CNNs have been used previously to solve the automatic pain assessment problem using the UNBC-McMaster Shoulder Pain dataset, and some of these models were inspired by the well-known VGG16 architecture [13], shown in Figure 1.
Based on existing architectures and after many trials and hyperparameter tunings, the modified architecture shown in Figure 2 has been selected for this study.
This proposed architecture was inspired by VGG16 and modified to fit the pain detection problem, by reducing the number of neurons in the second fully connected layer (FC2) from 4096 to 1000 and setting the number of outputs to 2, for the two classes pain and no-pain.
In this proposed architecture (Figure 2), images are initially preprocessed and resized to 224 × 224 pixels. After that, they are fed into the CNN model, which processes them as follows:
- Images first go through two convolutional layers with 64 (3 × 3) filters, a stride of 1 and a padding of 1, followed by a max-pooling layer with a 2 × 2 filter and a stride of 2.
- The remaining convolutional blocks follow the same pattern with 128, 256 and 512 filters, as in VGG16 (Figure 2).

The model's hyperparameters have been fixed at the following values:
- Activation function: ReLU, with Softmax in the last fully connected layer.
- Loss function: Cross-entropy loss.
- Optimizer: Adam.
- Batch normalization: Batch normalization normalizes the activations within each mini-batch, which stabilizes training and reduces the number of epochs required to train the model.
- Dropout regularization: In our case, since the number of examples is limited, dropout regularization is used to reduce overfitting. Dropout drops random nodes during training; thus, a single model can be seen as a number of different simulated architectures, at less computing cost than training those models separately.

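The architecture and hyperparameters above can be sketched as follows. This is an illustrative PyTorch reconstruction (the paper does not name its framework, and the `ModifiedVGG16` and `conv_block` names are introduced here): only FC2 (1000 neurons instead of 4096) and the 2-class output differ from VGG16, batch normalization is added as described, and Softmax is left to the cross-entropy loss used during training.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """n_convs 3x3 convolutions (stride 1, padding 1) with batch norm
    and ReLU, followed by 2x2 max-pooling with stride 2."""
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return layers

class ModifiedVGG16(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            *conv_block(3, 64, 2),     # 224 -> 112
            *conv_block(64, 128, 2),   # 112 -> 56
            *conv_block(128, 256, 3),  # 56  -> 28
            *conv_block(256, 512, 3),  # 28  -> 14
            *conv_block(512, 512, 3),  # 14  -> 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(4096, 1000), nn.ReLU(inplace=True), nn.Dropout(0.5),  # FC2: 1000 (vs 4096)
            nn.Linear(1000, num_classes),  # 2 logits: pain / no-pain; softmax in the loss
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ModifiedVGG16()
out = model(torch.randn(1, 3, 224, 224))  # one preprocessed 224x224 RGB frame
print(out.shape)  # torch.Size([1, 2])
```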
Dataset
The data used to fulfill this study is the UNBC-McMaster Shoulder Pain Expression Archive Database [3]. It has been collected by researchers from McMaster University and the University of Northern British Columbia and shared for research purposes in collaboration with Carnegie Mellon University and the University of Pittsburgh [14].
A total of 100 patients who were suffering from shoulder pain caused by arthritis, bursitis, tendonitis, subluxation, rotator cuff injuries, impingement syndromes, bone spurs, capsulitis and dislocation underwent the following range-of-motion tests on both affected and unaffected limbs:
- Abduction: in abduction movements, the arm is lifted sideways and up, away from the midline of the body.
- Flexion: in flexion, the humerus (or upper arm) moves forward from the rest of the body, in the sagittal plane (i.e., the plane which divides the body into right and left parts).
- Internal (resp. external) rotation of each arm separately: involves bending the arm 90 degrees at the elbow, abducting it by 90 degrees and finally turning it internally (resp. externally).
All those tests were performed under active and passive conditions. In active tests, patients had to perform movements by themselves to the best of their ability (i.e., until pain would prevent them from doing further movements), whereas in passive tests, a physiotherapist would move patients' limbs until the maximum range would be reached or they would be stopped by patients themselves who would not bear the pain anymore. For each test, a video sequence was recorded with frequent changes in pose, as we can see in Figure 3.
The pitch, yaw and roll of the head in frames taken from all sequences range between −40 and 30 degrees. According to Lucey et al. [14], head movements coincide with painful facial expressions.
A total of 200 video sequences containing spontaneous facial expressions related to genuine pain were recorded for the 100 patients. Those sequences were rated by observers and pain was self-reported by patients using four different types of assessments, to make sure that the rating is as accurate as possible: 1) The Sensory Scale (SEN): Used to reflect the pain intensity. It starts at "extremely weak" and finishes at "extremely intense".
2) The Affective-Motivational Scale (AFF): Used to reflect the unpleasantness incurred by the pain. It starts at "bearable" and finishes at "excruciating".
3) The Visual Analogue Scale (VAS): Gives more flexibility to the patient to rate pain in the most accurate way possible by providing a 10 cm scale anchored at each end with the words "No Pain" and "Worst Pain", on which patients can select the most accurate intensity, even if it ranges between two specific intensities (e.g., between "moderately strong" and "very strong").

4) Observers Pain Intensity (OPI): Trained observers rated patients' pain using a 6-point scale ranging from 0 (no pain) to 5 (strong pain). A number of those ratings were rated by a second rater to assess their reliability, and the Pearson correlation between both ratings was 80%. In addition, the correlation between the OPI and the patient self-reported VAS was 74%, which is higher than the high concurrent validity threshold (70%) [14]. Thus, sequence-level ratings can be considered trustworthy. A total of 48,398 frames were captured across all sequences and coded into Action Units (AUs) by certified Facial Action Coding System (FACS) coders.

Figure 3. Pitch, yaw, and roll of frames in the UNBC-McMaster dataset (Source [14]).

Pain Assessment
We need to introduce three basic components for pain assessment, namely Action Units (AUs), the Facial Action Coding System (FACS), and the Prkachin and Solomon Pain Intensity (PSPI) score [16].
Action Units are the fundamental actions involving one or multiple muscles in response to a certain feeling, such as cheek raising, lip stretching and head left turning. AUs are encoded using the FACS. Prkachin and Solomon [16] found that the most representative AUs for pain are brow lowering (AU4), orbital tightening (AU6 and AU7), levator contraction (AU9 and AU10) and eye closure (AU43). Based on that, they defined the following pain formula:

Pain = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43    (1)

where AU4 and AU43 always enter the score, along with one of AU6 and AU7 and one of AU9 and AU10; the highest intensity is selected when both members of a pair are present. Each AU is scored on a 6-point scale based on its intensity, ranging from "a = 0" for absent to "f = 5" for maximum intensity, except for AU43, which has a binary intensity: 0 (absent) or 1 (present) [16]. For instance, AU6d refers to cheek raising with an intensity of 3.
The Prkachin and Solomon Pain Intensity score (PSPI score) [16] was introduced as the only metric for pain intensity based on facial expressions.
From Formula (1), the PSPI score is computed as PSPI = AU4 + max(AU6, AU7) + max(AU9, AU10) + AU43. Consequently, if a frame is coded, for example, as AU4c + AU6d + AU43, then its PSPI score is derived as PSPI = 2 + 3 + 0 + 1 = 6 (since, under the above convention, c = 2 and d = 3, and the absent AU9/AU10 pair contributes 0).
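Formula (1) translates directly into code. The following is a minimal sketch; the `pspi` helper and its dictionary input are illustrative names, not from the paper:

```python
def pspi(au):
    """Prkachin-Solomon Pain Intensity from FACS action-unit intensities.

    `au` maps AU number -> intensity (0-5 under the a=0..f=5 convention;
    AU43 is binary 0/1). Missing AUs are treated as absent (0)."""
    g = au.get
    return (g(4, 0)                    # brow lowering, always counted
            + max(g(6, 0), g(7, 0))    # orbital tightening: stronger of AU6/AU7
            + max(g(9, 0), g(10, 0))   # levator contraction: stronger of AU9/AU10
            + g(43, 0))                # eye closure, always counted

print(pspi({4: 2, 6: 3, 43: 1}))  # 6  (AU4c + AU6d + AU43)
print(pspi({}))                   # 0  (neutral face)
```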

Exploratory Data Analysis
As mentioned in the previous section, the UNBC-McMaster dataset has 100 subjects (patients) who underwent different kinds of tests in different sequences, from which a sample of frames has been captured. The average size of each frame is 320 × 240 pixels. Figure 5 shows the number of frames captured for each patient (subject).
Based on Figure 5, we can observe that subjects 14, 16 and 19 have the highest numbers of recorded frames, as opposed to subjects 23, 7 and 12, who have the lowest. This means that if the training set comprises frames of subjects 16 and 23, for example, the model would perform better on frames of subject 16 than on those of subject 23, because it was trained on more of the former. The distribution of frames across the two classes is shown in Figure 6. We can note from Figure 6 that the dataset has imbalanced classes: there is a clear dominance of "No Pain" frames over "Pain" ones. If the model is trained on more "No Pain" examples, it will be biased towards detecting "No Pain" faces, which means that it has more chances of misclassifying a "Pain" example as "No Pain", because it got accustomed to "No Pain" examples. In order to have a reasonably fair prediction model, we will train our model on as many pain examples as no-pain ones.
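The balancing step just described can be sketched as random undersampling of the majority class; the `balance_classes` helper and the toy data below are illustrative, not from the paper's code:

```python
import random

def balance_classes(samples, seed=0):
    """samples: list of (frame, label) pairs, label 1 = pain, 0 = no pain.
    Returns a class-balanced subset: all minority-class frames plus an
    equally sized random sample of the majority class."""
    pain = [s for s in samples if s[1] == 1]
    no_pain = [s for s in samples if s[1] == 0]
    minority, majority = sorted((pain, no_pain), key=len)
    rng = random.Random(seed)
    subset = minority + rng.sample(majority, len(minority))
    rng.shuffle(subset)  # avoid all-one-class runs in training batches
    return subset

# Toy imbalanced set: 40 "No Pain" frames vs 8 "Pain" frames.
data = [("frame%d" % i, 0) for i in range(40)] + [("frame_p%d" % i, 1) for i in range(8)]
balanced = balance_classes(data)
print(len(balanced))  # 16 (8 pain + 8 no-pain)
```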

Data Processing
Real-world data is often subject to errors, noise and outliers. Before it can be visualized or processed, it needs to be preprocessed and cleaned. Preprocessing is a crucial phase in data analysis: it allows us to assess data quality and resolve any issues which might affect the performance of the model. For numerical and categorical data, dimensions such as completeness, consistency and accuracy are assessed. The process is quite different with images. Images usually undergo a number of transformations, such as filtering, cropping, resizing, color grading, rotating and mirroring, mainly to be simplified for faster processing, to be formatted, or to be adapted to a certain machine learning algorithm. The following preprocessing steps have been applied in this study using the OpenCV library [17], as depicted in Figure 7: grayscaling, histogram equalization, face detection and cropping, mean filtering, and normalization.
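The image pipeline can be sketched as follows. This is a numpy-only sketch of the grayscale, histogram-equalization, mean-filtering and normalization steps; the study itself uses OpenCV (e.g. cv2.cvtColor, cv2.equalizeHist and cv2.blur, plus Haar-cascade face detection and cropping, which are omitted here):

```python
import numpy as np

def preprocess(img):
    """img: uint8 RGB array of shape (H, W, 3). Returns a filtered,
    equalized grayscale image normalized to [0, 1]."""
    # 1. Grayscale: luminance-weighted average of the RGB channels.
    gray = (0.299 * img[..., 0] + 0.587 * img[..., 1]
            + 0.114 * img[..., 2]).astype(np.uint8)
    # 2. Histogram equalization: spread intensities over the full 0-255 range.
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum().astype(np.float64)
    cdf = (cdf - cdf.min()) * 255.0 / max(cdf.max() - cdf.min(), 1)
    eq = cdf[gray].astype(np.uint8)
    # 3. Mean filtering: 3x3 box blur to suppress noise.
    padded = np.pad(eq.astype(np.float32), 1, mode="edge")
    h, w = eq.shape
    blurred = sum(padded[dy:dy + h, dx:dx + w]
                  for dy in range(3) for dx in range(3)) / 9.0
    # 4. Normalization to [0, 1]; resizing to 224x224 would follow
    #    (cv2.resize in the real pipeline).
    return blurred / 255.0

img = (np.random.rand(240, 320, 3) * 255).astype(np.uint8)  # a UNBC-sized frame
out = preprocess(img)
print(out.shape, out.min() >= 0.0, out.max() <= 1.0)  # (240, 320) True True
```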

Data Modeling
The CNN model was built based on the architecture described in Section 3, trained on equally distributed images for each class, using batches of 10 images, and evaluated using the cross-entropy loss and the accuracy of each batch in each epoch, as shown in Figure 8 and Figure 9.
From Figure 8, we can notice random variation in the accuracies while the model was being trained on each batch for the first time (first epochs). After about 550 iterations, the training accuracies started ranging between 80% and 100%, with two dips to 70%, which indicates successful hyperparameter tuning.
Similarly, losses started from around 0.7 in the first epochs, as we can see in Figure 9.
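The per-batch training and evaluation loop described above can be sketched as follows. For brevity, a tiny stand-in linear model and random tensors replace the modified VGG16 and the preprocessed frames; all names here are illustrative, but the loss, optimizer and batch size match the paper's settings:

```python
import torch
import torch.nn as nn

# Stand-ins: random features/labels in place of preprocessed frames,
# and a small linear classifier in place of the modified VGG16.
model = nn.Sequential(nn.Flatten(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()  # softmax is folded into this loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(100, 32)
y = torch.randint(0, 2, (100,))  # balanced-ish pain / no-pain labels

batch_accuracies = []
for epoch in range(3):
    for i in range(0, len(x), 10):      # batches of 10 images
        xb, yb = x[i:i + 10], y[i:i + 10]
        logits = model(xb)
        loss = loss_fn(logits, yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Per-batch accuracy, as plotted in Figure 8.
        acc = (logits.argmax(dim=1) == yb).float().mean().item()
        batch_accuracies.append(acc)

print(len(batch_accuracies))  # 30 batch-level accuracy readings (3 epochs x 10 batches)
```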

Performance Analysis
After the training, the model was evaluated on a sample of 400 images based on its accuracy, sensitivity and specificity.

Table 3. Comparative study with other researchers' results.
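These three metrics are derived from confusion-matrix counts. The counts below are illustrative only, chosen to be consistent with the reported figures (92.5% accuracy, 86.96% sensitivity and 100% specificity on 400 test images); the paper does not publish its confusion matrix:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (pain recall) and specificity (no-pain recall)
    from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # fraction of pain frames detected
    specificity = tn / (tn + fp)   # fraction of no-pain frames kept alarm-free
    return accuracy, sensitivity, specificity

# Hypothetical counts for 400 test images, consistent with the reported results.
acc, sens, spec = metrics(tp=200, tn=170, fp=0, fn=30)
print(acc, round(sens, 4), spec)  # 0.925 0.8696 1.0
```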

Conclusions and Future Work
In this paper, we have presented a modified CNN model for the automatic pain detection problem. It was built and trained to detect pain through patients' facial expressions, using the UNBC-McMaster Shoulder Pain Expression Archive database. Our proposed model can alert medical staff in a timely manner about patients' conditions based on their facial expressions. The model achieved an accuracy of 92.5%, which is competitive with other researchers' results.
As we can see, the model is very specific, with an average specificity of 100% on the testing sample. This means that the model performs perfectly when no pain is expressed, which in turn means that the pain detection system is very unlikely to raise false alarms to the medical staff. As for the sensitivity, the model detects pain in 86.96% of pain situations, which is still excellent, since weak and mild pain does not always imply an emergency; thus, occasional misses are tolerable in cases where pain is not expressed with strong intensity. In future improvements, more emphasis will be put on testing different architectures to extract more features from the facial frames. Additionally, we will consider using Conditional Generative Adversarial Networks (CGANs) to generate synthetic facial images, in order to strengthen the training set.

Strengths & Limitations
Despite the fact that our training data contained very limited examples (frames from 25 patients only), the model could still generalize well and gave reliable results on newly seen faces, using the least complex architecture possible for such a complex classification problem. However, we would obtain a better-performing model if we could combine our dataset with other, more diversified datasets and try different architectures that are tailored to the pain detection problem, rather than a simple general-purpose architecture that has proven its success on many computer vision problems.