Automatic Detection of Learner Engagement Using Machine Learning and Wearable Sensors

Training can now be delivered at large scale through mobile and web-based platforms in which the learner is often distanced from the instructor and their peers. To optimize learner engagement and maximize learning in these contexts, instructional content and strategies must be engaging. Key to the development and study of such content and strategies, and to the adaptation of instructional techniques when learners become disengaged, is the ability to objectively assess engagement in real time. Existing self-report metrics and expensive EEG-based engagement measures are not appropriate for large-scale platforms due to their complexity and cost. Here we describe the development and testing of a measurement and classification technique that uses non-invasive physiological and behavioral monitoring technology to directly assess engagement in classroom, simulation, and live training environments. An experimental study was conducted with 45 students and first responders in an unmanned aircraft systems (UAS) training program to assess the ability to accurately measure learner engagement, and discriminate between levels of learner engagement, within classroom, simulation, and live environments via physiological and behavioral inputs. A series of engagement classifiers was developed using cardiovascular, respiratory, electrodermal, movement, and eye-tracking features; these classifiers successfully classified engagement levels with 85% accuracy with eye-tracking features included, or 81% without eye-tracking features. This approach is capable of monitoring, assessing, and tracking learner engagement across learning situations and contexts, and of providing real-time and after-action feedback to support instructors in modulating learner engagement.


Introduction
Digital technologies are increasingly being exploited in self-directed or guided educational settings to provide individualized opportunities for learning. Emerging technology can now deliver training on a larger scale through platforms such as massive open online courses (MOOCs), instructional use of games, and virtual classrooms [1]. These emerging training environments rely heavily on learner-initiated involvement and motivation before, during, and after training [2]. Self-motivation guides personal effort and the allocation of resources toward work [3], and leads to significantly greater declarative knowledge and skill acquisition, higher post-training self-efficacy, and improved performance [4]. Personal motivation represents the driving force behind learning activities and can lead to learner engagement, a person's active involvement in a learning activity [5]. High levels of engagement have been associated with a state of flow, in which an individual becomes completely engaged in a task marked by high levels of interactivity, challenge, and feedback; a state which has been found to lead to improved task performance and learning outcomes [6]. Learner engagement is influenced by a range of factors related to the individual learner, the learning tasks, and the learning environment [7]. The ability to measure and optimize learner engagement during training holds the potential to increase the transfer of training to practice, leading to enhanced digital training effectiveness [6].
A number of approaches to measure learner engagement have been reported previously. Self-report questionnaires and interviews provide non-invasive and inexpensive methods to assess learner engagement, but do not provide results in real-time [8], and are associated with bias which affects the reliability and validity of such approaches [9]. Analysis of facial expressions using computer vision can determine affective components of learner engagement and can be applied to groups [10], but may be considered obtrusive and suffer from privacy concerns [11]. Cognitive engagement classifiers have been reported using laboratory grade physiological sensors, but require high cost sensor technology, and are complex from a usability and data analysis perspective. For example, a previous electroencephalography (EEG)-based engagement classifier returns a 3-level score between relaxed wakefulness and high engagement based on individual performance in a protracted psychomotor vigilance task [12]. However, the expense associated with EEG equipment, complex data interpretation, and sensor montaging have limited the use of EEG in learning settings. Alternatively, there are a range of physiological measures that provide an opportunity for validly and non-invasively measuring engagement. Measures such as electrodermal activity (EDA) [13], heart rate and heart rate variability (HRV) [14], and gross body movement [15] have been used to monitor learner engagement and learning performance. Previous research has also indicated that analysis of eye tracking data can offer insight into engagement during gaming [16], online search [17], and online conversations [18]. While there is theoretical support for the use of these measures to capture engagement, to date there is limited empirical research identifying valid measures of engagement that are noninvasive, easy to use, and scale to large populations.
There is a need to evaluate emerging sensor technology to measure learner engagement objectively in real-time, via noninvasive, wireless sensors. Here we describe the development and testing of a measurement and classification technique that utilizes non-invasive physiological monitoring technology with subjective self-report measures to directly assess engagement in classroom, simulation, and live training environments to support instructor interventions to increase engagement. Using unmanned aircraft systems (UAS) instructional materials, we delivered instructional content via video-based, simulated flight, and live flight training tasks that were designed to vary in cognitive engagement based on varying levels of interactivity, challenge, feedback, and immersion. We hypothesized that participant engagement, and associated participant physiology and behavior, would differ between tasks, with increasing levels from video to simulation to live training. This was also hypothesized to provide features necessary to develop a high accuracy classifier of engagement state.

Participants
All methods involving participants were approved by an Institutional Review Board. Participation was voluntary, and all student participants were given the option to perform an alternate task if they were not interested in participating in the study. Student participants were given extra credit in the UAS course for their participation.

Materials
A demographic survey was used to determine participant gender, age, and UAS experience. The Flow Short Scale (FSS) was used to assess self-reported engagement [19], as a state of flow is proposed to represent a high level of learner engagement [7]. An Equivital wearable sensor system was used to capture cardiovascular, respiratory, electrodermal, and movement data. A Dell Precision M2800 laptop PC, with 8 GB RAM and a quad-core CPU, was used to deliver video and simulation training, and an EyeTech eye-tracking system was used to capture gaze data during computer-based tasks (video training and simulation, described below).

Experimental Procedure
Upon arrival participants provided written, informed consent and were briefed on the experimental learning tasks. Participants then responded to a demographic questionnaire and were fitted with the Equivital device. Participants were instructed to stand comfortably and read a magazine for 5 minutes while a physiological baseline was collected. After the baseline, participants were directed to the adjacent room for a series of computer-based UAS tasks with varying levels of engagement. Participants stood in front of a workstation, with height adjusted to ensure adequate alignment with the eye tracker, and the eye tracker was calibrated. Participants then watched a series of videos on how to use the UAS controllers while holding the UAS controller and mimicking the movements displayed in the videos for a duration of six minutes. The video training task was hypothesized to have low levels of engagement, due to low levels of interactivity, challenge, goal clarity, feedback, and immersion. The next task consisted of performing a series of UAS obstacle courses in the Real Flight 7.5 simulator, and was hypothesized to have moderate to high levels of engagement due to high levels of interactivity, challenge, goal clarity, and feedback, and moderate levels of immersion. Finally, participants moved to a live, indoor, high-bay, learning environment where they completed the live training task, which included flying a UAS quadcopter through a series of physical obstacle courses. The live task was hypothesized to have high levels of engagement due to high levels of interactivity, challenge, goal clarity, feedback, and immersion. All participants completed the tasks in the same order. Eye-tracking was not captured during the live training task.

Data Modeling
Physiological data analysis and classifier development consisted of synchronization between data capture systems, feature extraction, and data modeling. Data analysis was implemented in Python 3.7 with scikit-learn [20]. Data from each sensor was graphically visualized and synchronized with events of interest including: baseline; video training; simulation training; and live training. Any epochs of signal loss, such as when participants were not looking at the screen for eye tracking, were also identified.
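The segmentation step described above, dividing each sensor stream into labeled epochs and identifying spans of signal loss, can be sketched as follows. The event-list format and the `segment_epochs` helper are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

def segment_epochs(timestamps, values, events):
    """Split a sensor stream into labeled epochs and flag signal loss.
    `events` is a hypothetical list of (label, start_s, end_s) tuples;
    NaN samples stand in for dropout (e.g., gaze off-screen)."""
    epochs = {}
    for label, start, end in events:
        mask = (timestamps >= start) & (timestamps < end)
        seg = values[mask]
        epochs[label] = {
            "data": seg,
            "pct_signal_loss": float(np.mean(np.isnan(seg))) if seg.size else 1.0,
        }
    return epochs

# Toy stream: 1 Hz samples over 20 s, with a 3-sample dropout.
t = np.arange(20.0)
v = np.sin(t)
v[5:8] = np.nan
events = [("baseline", 0, 10), ("video", 10, 20)]
epochs = segment_epochs(t, v, events)
```

Epochs with a high signal-loss fraction could then be excluded from feature extraction.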
In order to preserve temporal relationships among variables, timestamps for data arising from the Equivital system were synchronized. Respiration rate and heart rate were upsampled with linear interpolation to 0.133 Hz. The Equivital and EyeTech systems were synchronized with an NIST clock, and the eye-tracker was adjusted to synchronize with the Equivital clock. Classification was performed on 30-second blocks of data, and synchronization errors were constrained below 5 seconds.
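A minimal sketch of the linear-interpolation resampling step, assuming irregularly timestamped heart-rate samples and using NumPy's `np.interp`; the paper's exact resampling code is not given, so the function name and sample values are illustrative.

```python
import numpy as np

def resample_linear(src_t, src_v, target_hz, t_start, t_end):
    """Resample a timestamped signal onto a uniform grid at `target_hz`
    using linear interpolation."""
    grid = np.arange(t_start, t_end, 1.0 / target_hz)
    return grid, np.interp(grid, src_t, src_v)

# Heart-rate samples reported every ~15 s, resampled to 0.133 Hz (~7.5 s steps).
src_t = np.array([0.0, 15.0, 30.0, 45.0])
src_v = np.array([70.0, 72.0, 74.0, 76.0])
grid, hr = resample_linear(src_t, src_v, 0.133, 0.0, 45.0)
```

Resampling all channels onto a common grid makes it straightforward to align them into the 30-second classification blocks.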
Features were extracted from the raw physiological data, baseline-normalized, and updated every 30 seconds to provide input to an engagement classifier. Features which were not predictive of user state were removed, and dimensionality reduction was performed to combine features as needed. A random forest algorithm was utilized to rank the importance of features collected from the Equivital and eye-tracking systems. Features from the Equivital system included mean EDA, respiration rate, accelerometer signal magnitude area (SMA), mean heart rate (HR), QRS duration, QRS sum [21], and heart rate variability (HRV). Eye-tracking features included fixation metrics, with fixations defined by gaze dispersion in visual degrees and a minimum duration of >100 ms [22].
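The random-forest feature-ranking step can be illustrated with scikit-learn's `feature_importances_` on synthetic data; the feature names and signal strengths below are made up for demonstration and do not reflect the study's actual importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300

# Synthetic two-class labels; only "eda" and "hr" carry class signal.
y = rng.integers(0, 2, n)
X = np.column_stack([
    y + 0.3 * rng.normal(size=n),   # eda (informative)
    y + 0.5 * rng.normal(size=n),   # hr (informative)
    rng.normal(size=n),             # resp_rate (noise)
    rng.normal(size=n),             # sma (noise)
])
names = ["eda", "hr", "resp_rate", "sma"]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranked = sorted(zip(names, rf.feature_importances_), key=lambda p: -p[1])
```

Low-ranked features can then be dropped before training the final classifier.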
Once the features were selected, two separate models were trained: one in which the participants' data were divided by task into two states (low and high engagement) corresponding to the video and simulation tasks, given the lack of eye-tracking data in the live task; and one in which the participants' data were divided by task into two states (low and high engagement) corresponding to the video and live tasks, for situations in which eye-tracking is not feasible. To predict the engagement state of the participants, the extracted features were subjected to a logistic regression analysis. A series of models was trained to map the five selected continuous physiological variables to two discrete states (low/high engagement) using 5-fold cross-validation, iteratively repeated 100 times, to minimize bias and variance. Final model parameters were taken from the model with the highest classification accuracy across both training and testing sets.
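The repeated cross-validation scheme described above can be sketched with scikit-learn's `RepeatedStratifiedKFold`; the synthetic dataset is a stand-in, since the study's features are not published, and the 5 features / 5 folds / 100 repeats mirror the text.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Stand-in for the 5 selected continuous physiological features
# and a binary low/high engagement label.
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           random_state=0)

# 5-fold cross-validation, repeated 100 times (500 fits total).
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=100, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
mean_acc = scores.mean()
```

Averaging accuracy over many shuffled folds reduces the variance of the performance estimate relative to a single train/test split.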

Data Analysis and Statistics
Participant performance was assessed using two measures. First, outcome performance was measured based on the number of challenge levels that a participant successfully completed, out of a maximum of 10, with participants receiving 10 points per level completed. Second, participant errors were assessed using a count of the number of times that the UAS bumped into an object or crashed.
Differences in self-reported engagement between training tasks were evaluated using repeated-measures ANOVA with α = 0.05. Differences in performance between the video and simulation tasks were analyzed using repeated-measures t-tests with α = 0.05. Statistical analyses were performed in IBM SPSS Statistics version 23.
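The study ran its statistics in SPSS; an equivalent paired comparison can be sketched in Python with `scipy.stats.ttest_rel`. The FSS scores below are hypothetical toy values, not the study's data.

```python
import numpy as np
from scipy import stats

# Hypothetical paired FSS engagement scores for 8 learners
# (video task vs. simulation task).
video = np.array([3.1, 2.8, 3.5, 3.0, 2.9, 3.2, 3.4, 3.0])
sim = np.array([4.0, 3.9, 4.2, 3.8, 3.6, 4.1, 4.3, 3.9])

# Paired (repeated-measures) t-test at alpha = 0.05.
t_stat, p_value = stats.ttest_rel(video, sim)
significant = p_value < 0.05
```

A negative t statistic here indicates higher scores in the simulation condition, matching the hypothesized ordering of engagement.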

Results
The sociodemographic factors in the study sample are listed in Table 1. Four participants were removed from the study due to missing physiological data, leaving forty-five participants for data analysis and modeling. Most participants were male, of an average age 25.7 ± 9.9 years, and had some previous UAS experience. Average, session-level physiological and behavioral data are shown in Table 2.
Features were extracted from individual, baseline-normalized physiological and behavioral data and used to train an algorithm of engagement state. All data used to train the model were labeled either as high (simulation or live) or low (video) engagement.

Discussion
The ability to measure and ultimately optimize learner engagement is considered one of the most effective countermeasures against student dropout, disaffection, and low performance [11]. Our results suggest that low-cost, non-invasive physiological sensing of electrodermal and cardiovascular activity can provide a window into the real-time state of learner engagement. As predicted, results indicated that learner engagement levels significantly increased with increasing levels of interactivity, challenge, goal clarity, feedback, and immersion. In addition, features derived from physiological sensing and eye-tracking equipment were able to successfully classify engagement with high accuracy.
In the current effort we developed a two-class model of engagement using physiological and behavioral features, which classified engagement as high or low with 85% accuracy with eye-tracking features included, and 81% accuracy without eye-tracking features. Both models are based on significant differences observed in self-reported engagement, user physiology, and task performance between video, simulated, and live environments. Both the simulated and live environments showed significant increases in engagement as compared to video, and were labeled as high engagement in data modeling. Similar levels of accuracy have been reported recently in measuring engagement using features derived from wearable physiological sensors, touchscreens, and optical biomechanics [8], or using EDA only [11]. Given the physiological and behavioral differences observed between scenarios of increasing engagement, we leveraged a random forest classifier to select classifier features. As compared to other feature selection techniques, random forest approaches have shown the highest accuracy in modeling user state, including engagement [8]. Engagement-related differences have also been reported previously in cardiovascular measures [26] and in eye-tracking data [27]. Our results suggest a decrease in heart rate and an increase in HRV occur during increasingly engaging tasks, likely associated with increasing parasympathetic activity [24]. Our results also indicate a decrease in fixations and an increase in fixation duration with increasing engagement. These results agree with theory positing that individuals direct visual attention to salient images under engaging conditions because they are thinking more deeply about those images [27]. Individuals reported increasing levels of engagement across the video, simulation, and live tasks, as measured using the FSS. Underlying these results are aspects of interactivity, task challenge, goal clarity, feedback, and immersion.
Interactivity has been shown to impact engagement by providing the individual with control in the learning task through technology, leading to increased learner motivation, interest, and learning gains [28] [29]. The level of challenge associated with a learning task is related to engagement as the presence, level, and appropriateness of a challenge can lead to intrinsic motivation, flow and micro-engagement [30] [31] [32]. However, the level of challenge must be appropriate to an individual's capabilities to increase engagement. A task where the challenge is too advanced leads to a stress response which can reduce the motivation of the individual [26], [33] and reduce motor and cognitive performance [34]. If a task is not challenging enough or too easy it may produce apathy and a reduced attention due to a lack of motivation [33] [35]. An important facet in challenge is that as the individual learns, the challenge generally becomes easier, thus requiring the challenge to grow with the individual's progress [36] [37] [38]. Increasing challenge between the simulation and live flight tasks was associated with decreasing performance. Further, engagement levels can also be positively influenced by presenting clear goals and performance feedback as these elements can improve learner motivation and strategy use [39]. Finally, immersion has been shown to influence engagement through increased interest and motivation towards learning tasks, ultimately leading to higher engagement [40] [41].
In the current effort, we were able to achieve high-accuracy classification of engagement, similar to recent reports [8] [11]. However, the results of this study and the implications for use in educational settings should be interpreted with caution. The eye-tracking features included in this modeling effort are difficult to scale, as they require calibration and restriction of user head movements [10]. Similarly, physiological sensors add cost, are difficult to scale, have varying levels of data quality [42], and may be associated with privacy issues [43].
In summary, this research demonstrated the capability to monitor and assess individualized learner engagement across learning situations and contexts using physiological and behavioral inputs. Application of such an approach can support instructors in determining where they need to adjust training to optimize learner engagement.