Automatic Detection of Learner Engagement Using Machine Learning and Wearable Sensors
1. Introduction
Digital technologies are increasingly being exploited in self-directed or guided educational settings to provide individualized opportunities for learning. Emerging technology can now deliver training on a larger scale through platforms such as massive open online courses (MOOCs), instructional use of games, and virtual classrooms [1]. These training environments rely heavily on learner-initiated involvement and motivation before, during, and after training [2]. Self-motivation guides personal effort and the allocation of resources toward work [3], and leads to significantly greater declarative knowledge and skill acquisition, higher post-training self-efficacy, and improved performance [4]. Personal motivation represents the driving force behind learning activities and can lead to learner engagement, a person’s active involvement in a learning activity [5]. High levels of engagement have been associated with a state of flow, in which an individual becomes completely absorbed in a task marked by high levels of interactivity, challenge, and feedback; this state has been found to lead to improved task performance and learning outcomes [6]. Learner engagement is influenced by a range of factors related to the individual learner, the learning tasks, and the learning environment [7]. The ability to measure and optimize learner engagement during training holds the potential to increase the transfer of training to practice, leading to enhanced digital training effectiveness [6].
A number of approaches to measuring learner engagement have been reported previously. Self-report questionnaires and interviews provide non-invasive and inexpensive methods to assess learner engagement, but do not provide results in real time [8], and are associated with bias that affects the reliability and validity of such approaches [9]. Analysis of facial expressions using computer vision can determine affective components of learner engagement and can be applied to groups [10], but may be considered obtrusive and suffers from privacy concerns [11]. Cognitive engagement classifiers have been reported using laboratory-grade physiological sensors, but these require high-cost sensor technology and are complex from a usability and data analysis perspective. For example, a previous electroencephalography (EEG)-based engagement classifier returns a 3-level score between relaxed wakefulness and high engagement based on individual performance in a protracted psychomotor vigilance task [12]. However, the expense associated with EEG equipment, complex data interpretation, and sensor montaging have limited the use of EEG in learning settings. Alternatively, there is a range of physiological measures that provide an opportunity to measure engagement validly and non-invasively. Measures such as electrodermal activity (EDA) [13], heart rate and heart rate variability (HRV) [14], and gross body movement [15] have been used to monitor learner engagement and learning performance. Previous research has also indicated that analysis of eye-tracking data can offer insight into engagement during gaming [16], online search [17], and online conversations [18]. While there is theoretical support for the use of these measures to capture engagement, to date there is limited empirical research identifying valid measures of engagement that are non-invasive, easy to use, and scale to large populations.
There is a need to evaluate emerging sensor technology that can measure learner engagement objectively and in real time via non-invasive, wireless sensors. Here we describe the development and testing of a measurement and classification technique that combines non-invasive physiological monitoring technology with subjective self-report measures to directly assess engagement in classroom, simulation, and live training environments, in support of instructor interventions to increase engagement. Using unmanned aircraft systems (UAS) instructional materials, we delivered instructional content via video-based, simulated flight, and live flight training tasks that were designed to vary in cognitive engagement based on varying levels of interactivity, challenge, feedback, and immersion. We hypothesized that participant engagement, and the associated participant physiology and behavior, would differ between tasks, with increasing levels from video to simulation to live training. We further hypothesized that these differences would provide the features necessary to develop a high-accuracy classifier of engagement state.
2. Methods
2.1. Participants
All methods involving participants were approved by an Institutional Review Board (IRB; Florida Institute of Technology [FIT]). An a priori power analysis with α = 0.05, power (1 − β) = 0.8, and d = 0.4 indicated that a sample size of 41 participants was needed; an additional 20% were recruited to address participant attrition. Forty-nine adult participants from the Melbourne, FL, USA area were recruited from an FIT UAS course, the FIT drone club, and UAS operators from local first-responder agencies. Participation was voluntary and all student participants were given the option to perform an alternate task if they were not interested in participating in the study. Student participants were given extra credit in the UAS course for their participation.
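For readers who wish to reproduce this type of calculation, the sketch below uses the statsmodels power module with the parameters above. It approximates the repeated-measures design with a single paired comparison, which is an assumption of this illustration rather than the calculation used in the study; a full repeated-measures calculation, which credits the correlation among the three conditions, generally requires fewer participants than this approximation suggests.

```python
# Illustrative a priori power calculation (paired-comparison approximation).
from statsmodels.stats.power import TTestPower

required_n = TTestPower().solve_power(
    effect_size=0.4,        # Cohen's d reported above
    alpha=0.05,
    power=0.8,              # 1 - beta
    alternative="two-sided",
)
recruit_n = required_n * 1.2   # add 20% to allow for attrition, as in the text
print(round(required_n), round(recruit_n))
```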
2.2. Materials
A demographic survey was used to determine participant gender, age, and UAS experience. The Flow Short Scale (FSS) was used to assess self-reported engagement [19], as a state of flow is proposed to represent a high level of learner engagement [7]. A Dell Precision M2800 laptop PC with 8 GB RAM and a quad-core CPU was used to provide video and simulation training. Simulation training used Real Flight 7.5 software, along with a U818A-1 quadcopter controller (UDIRC Technology, Guangdong, China). For live UAS training, the U818A-1 quadcopter was flown through a series of physical obstacles in a high-bay learning environment (Figure 1).
Figure 1. Experimental tasks. Top panel: participants viewed instructional flight videos while holding UAS controllers and mimicking movements in the video task. Center panel: participants controlled a virtual UAS in a number of flight tasks within Real Flight 7.5 software as part of the simulation task. Bottom panel: participants controlled a UAS through a series of physical obstacles as part of the live flight task. Depicted participants provided written informed consent for publication of this figure.
The Equivital EQ02 system (Hidalgo; Cambridge, UK) was used to collect electrocardiography (ECG; 256 Hz), electrodermal activity (EDA; 2 Hz), respiratory rate (0.0667 Hz), and accelerometry (25.6 Hz). The VT3 mini eye tracker (EyeTech; Mesa, AZ, USA) was used to quantify participants’ gaze location (45 Hz) during computer-based tasks (video training and simulation, described below).
2.3. Experimental Procedure
Upon arrival participants provided written informed consent and were briefed on the experimental learning tasks. Participants then responded to a demographic questionnaire and were fitted with the Equivital device. Participants were instructed to stand comfortably and read a magazine for 5 minutes while a physiological baseline was collected. After the baseline, participants were directed to the adjacent room for a series of computer-based UAS tasks with varying levels of engagement. Participants stood in front of a workstation, with height adjusted to ensure adequate alignment with the eye tracker, and the eye tracker was calibrated. Participants then watched a series of videos on how to use the UAS controller while holding the controller and mimicking the movements displayed in the videos for a duration of six minutes. The video training task was hypothesized to produce low levels of engagement, due to low levels of interactivity, challenge, goal clarity, feedback, and immersion. The next task consisted of navigating a series of UAS obstacle courses in the Real Flight 7.5 simulator, and was hypothesized to produce moderate to high levels of engagement due to high levels of interactivity, challenge, goal clarity, and feedback, and moderate levels of immersion. Finally, participants moved to a live, indoor, high-bay learning environment where they completed the live training task, which involved flying a UAS quadcopter through a series of physical obstacle courses. The live task was hypothesized to induce high levels of engagement due to the high levels of interactivity, challenge, goal clarity, feedback, and immersion of the learning environment. In both the simulation and live training tasks, participants completed as many obstacle course challenge levels as they could within the six-minute session, up to a maximum of 10 levels. Challenge levels increased in difficulty, requiring increasing levels of skill such as hovering, flying in various orientations, and landing. Participants in simulation and live training also received clear goals related to their objective prior to each challenge and received performance feedback after each failed attempt at a level. Two performance measures were captured during the simulation and live tasks. Outcome performance was assessed based on the number of levels completed. Errors were assessed based on the number of times the UAS bumped into an object or crashed. Participant performance was not assessed during the video task. Participants responded to the FSS survey immediately following each task. All participants completed the tasks in the same order. Eye tracking was not captured during the live training task.
2.4. Data Modeling
Physiological data analysis and classifier development consisted of synchronization between data capture systems, feature extraction, and data modeling. Data analysis was implemented in Python 3.7 with scikit-learn [20]. Data from each sensor were graphically visualized and synchronized with events of interest, including the baseline, video training, simulation training, and live training. Any epochs of signal loss, such as when participants were not looking at the screen for eye tracking, were also identified.
In order to preserve temporal relationships among variables, timestamps for data arising from the Equivital system were synchronized. Respiration rate and heart rate were upsampled to 0.133 Hz using linear interpolation. The Equivital and EyeTech systems were synchronized with an NIST clock, and the eye tracker was adjusted to synchronize with the Equivital clock. Classification was performed on 30-second blocks of data, and synchronization errors were constrained to below 5 seconds.
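As a minimal sketch of this preprocessing step, the snippet below linearly interpolates a slowly sampled signal onto the common 0.133 Hz grid and assigns samples to 30-second blocks. The array names and synthetic values are placeholders for illustration, not the study's data.

```python
import numpy as np
import pandas as pd

FS_TARGET = 0.133   # Hz, common grid for respiration rate and heart rate
WINDOW_S = 30       # length of each classification block, in seconds

def upsample(timestamps_s, values, fs=FS_TARGET):
    """Linearly interpolate an irregular or slow signal onto a uniform grid."""
    grid = np.arange(timestamps_s[0], timestamps_s[-1], 1.0 / fs)
    return grid, np.interp(grid, timestamps_s, values)

# Synthetic stand-in for an Equivital respiration-rate series (hypothetical values).
rr_times = np.arange(0.0, 360.0, 15.0)          # one reading every 15 s
rr_values = 14.0 + np.sin(rr_times / 60.0)      # breaths per minute

grid, rr_uniform = upsample(rr_times, rr_values)
blocks = pd.DataFrame({
    "t": grid,
    "rr": rr_uniform,
    "window": (grid // WINDOW_S).astype(int),   # 30 s block index
})
block_means = blocks.groupby("window")["rr"].mean()   # one feature value per block
print(block_means.head())
```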
Features were extracted from the raw physiological data, baseline normalized, and updated every 30 seconds to provide input to an engagement classifier. Features that were not predictive of the user state were removed, and dimensionality reduction was performed to combine features as needed. A random forest algorithm was utilized to rank the importance of features collected from the Equivital and eye-tracking systems. Features from the Equivital system included mean EDA, respiration rate, accelerometer signal magnitude area (SMA), mean heart rate (HR), QRS duration, QRS sum [21], and heart rate variability (HRV) measured by the ratio of low-frequency (LF) HRV, defined as the sum of the power spectral density (PSD) of the inter-beat interval (IBI) signal in the 0.04 to 0.15 Hz range, over high-frequency (HF) HRV, defined as the sum of the PSD over the 0.15 to 0.4 Hz range. Features derived from the eye tracker included the mean fixation number and mean fixation duration, where a fixation was defined as maintenance of visual gaze on a single location within 3 visual degrees for more than 100 ms [22].
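The HRV calculation and feature-ranking steps can be sketched as follows. The Welch PSD parameters, the 4 Hz IBI resampling rate, and all synthetic inputs are assumptions for illustration, since the text does not specify them.

```python
import numpy as np
from scipy.interpolate import interp1d
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier

def lf_hf_ratio(beat_times_s, ibi_s, fs_resample=4.0):
    """LF/HF HRV: sum of IBI power spectral density in 0.04-0.15 Hz over 0.15-0.4 Hz."""
    grid = np.arange(beat_times_s[0], beat_times_s[-1], 1.0 / fs_resample)
    ibi_uniform = interp1d(beat_times_s, ibi_s, kind="linear")(grid)
    ibi_uniform = ibi_uniform - ibi_uniform.mean()        # remove the DC component
    freqs, psd = welch(ibi_uniform, fs=fs_resample, nperseg=min(256, len(ibi_uniform)))
    lf = psd[(freqs >= 0.04) & (freqs < 0.15)].sum()
    hf = psd[(freqs >= 0.15) & (freqs <= 0.40)].sum()
    return lf / hf

rng = np.random.default_rng(0)
ibi = 0.8 + 0.05 * rng.standard_normal(400)               # synthetic inter-beat intervals (s)
beat_times = np.cumsum(ibi)
print("LF/HF:", lf_hf_ratio(beat_times, ibi))

# Rank feature importance with a random forest (synthetic feature matrix and labels).
X = rng.standard_normal((200, 7))                         # rows: 30 s windows, cols: features
y = rng.integers(0, 2, size=200)                          # 0 = low, 1 = high engagement
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
print("Features ranked by importance:", np.argsort(rf.feature_importances_)[::-1])
```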
Once the features were selected, two separate models were trained: one in which the participants’ data were broken down by task into 2 states (low and high engagement) corresponding to the video and simulation tasks, due to the lack of eye-tracking data in the live task; and one in which the participants’ data were broken down by task into 2 states (low and high engagement) corresponding to the video and live tasks, for situations in which eye tracking is not feasible. To predict the engagement state of the participants, the extracted features were subjected to a logistic regression analysis. A series of models was trained to map the selected continuous physiological and behavioral variables to two discrete states (low/high engagement) using 5-fold cross-validation, iteratively repeated 100 times, to minimize bias and variance. Final model parameters were taken from the model with the highest classification accuracy across both the testing and training sets.
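A minimal sketch of this training procedure in scikit-learn is shown below. The synthetic X and y, the regularization defaults, and the exact rule for combining training and testing accuracy are assumptions for illustration, not the study's implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate

rng = np.random.default_rng(0)
X = rng.standard_normal((270, 5))   # baseline-normalized features per 30 s window (synthetic)
y = rng.integers(0, 2, size=270)    # 0 = low engagement, 1 = high engagement

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=100, random_state=0)
results = cross_validate(
    LogisticRegression(max_iter=1000), X, y,
    cv=cv, scoring="accuracy",
    return_train_score=True, return_estimator=True,
)

# Keep the fitted model that scored best across both the training and testing folds,
# mirroring the selection rule described above.
best = int(np.argmax(results["train_score"] + results["test_score"]))
final_model = results["estimator"][best]
print("weights:", final_model.coef_, "intercept:", final_model.intercept_)
print("mean test accuracy:", results["test_score"].mean())
```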
2.5. Data Analysis and Statistics
Participant performance was assessed using two measures. First, outcome performance was measured based on the number of challenge levels that a participant successfully completed, out of a maximum of 10, with participants receiving 10 points per level completed. Second, participant errors were assessed using a count of the number of times that the UAS bumped into an object or crashed.
Differences in self-reported engagement between training tasks were evaluated using a repeated-measures ANOVA with α = 0.05. Differences in performance between the simulation and live tasks were analyzed using paired t-tests with α = 0.05. Statistical analyses were performed in IBM SPSS Statistics version 23.
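The analyses were run in SPSS; a rough Python equivalent, with synthetic data and hypothetical variable names, might look like the sketch below (post-hoc pairwise comparisons and any multiple-comparison correction are not shown).

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
n = 45

# Hypothetical long-format FSS scores: one row per participant x task.
fss_long = pd.DataFrame({
    "participant": np.repeat(np.arange(n), 3),
    "task": np.tile(["video", "simulation", "live"], n),
    "fss": rng.normal(loc=[4.0, 4.5, 5.0] * n),
})
anova = AnovaRM(data=fss_long, depvar="fss", subject="participant", within=["task"]).fit()
print(anova)

# Paired comparison of flight performance between the simulation and live tasks.
sim_score = rng.normal(70, 15, size=n)
live_score = rng.normal(45, 15, size=n)
print(ttest_rel(sim_score, live_score))
```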
3. Results
The sociodemographic factors in the study sample are listed in Table 1. Four participants were removed from the study due to missing physiological data, leaving forty-five participants for data analysis and modeling. Most participants were male, with an average age of 25.7 ± 9.9 years, and had some previous UAS experience, defined as UAS usage 1 - 3 times per year.
Users evaluated each training task immediately following exposure using the FSS (Figure 2). Differences were observed in self-reported engagement [F(2, 88) = 13.697, p = 0.0000066]. Post-hoc testing indicated significant differences between video and simulation training (p = 0.003), between video and live training (p = 0.0001), and between simulation and live training (p = 0.007).
Table 1. List of sociodemographic factors in the study sample.
Figure 2. Boxplots overlaid with raw data of self-reported engagement using the FSS survey. Participants rated the simulated training session significantly higher than the video training, and the live training significantly higher than both the simulated and video training sessions. *p < 0.05.
Users performed significantly better in simulated UAS flying compared to live flying [t(44) = 6.961, p = 0.000000013], and made a similar number of errors in simulation as compared to live flight [t(44) = −1.539, p = 0.131] (Figure 3).
Average, session-level physiological and behavioral data are shown in Table 2. Features were extracted from individual, baseline-normalized physiological and behavioral data and used to train a classifier of engagement state.
Table 2. Average (SD) session-level physiological and behavioral (eye-tracking) data from the cohort.
Figure 3. Boxplots overlaid with raw data of performance in simulated (white) and live (gray) UAS flight. *p < 0.05.
All data used to train the model were labeled as either high (simulation or live) or low (video) engagement. These data were then preprocessed to ensure that the model: 1) generalizes across individuals; and 2) varies on a time scale useful for learning interventions. This was accomplished by training the model with feature data from 45 different subjects and breaking the data into 30-second windows. A logistic regression model with both behavioral (eye tracker) and physiological (Equivital) features was capable of predicting whether a 30-second physiological data set came from a high or low engagement condition with a training accuracy of 79% and a testing accuracy of 85% (Figure 4). Parameters included fixation number (w = 0.002), EDA (w = 0.063), fixation duration (w = −0.030), QRS duration (w = −0.205), and heart rate (w = 1.125), with an intercept of 0.0164. An additional logistic regression classifier was developed without eye-tracking features to account for situations when eye tracking is not available, and was capable of predicting whether a 30-second physiological data set came from the high or low engagement condition with a training accuracy of 76% and a testing accuracy of 81% (Figure 4). Parameters included heart rate (w = −0.021), QRS sum (w = 0.041), LF/HF HRV (w = 0.074), SMA (w = 0.047), EDA (w = 0.050), and respiration rate (w = 0.091), with an intercept of 7.163.
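To illustrate how such a fitted classifier would be deployed, the sketch below applies the reported eye-tracking model coefficients to one 30-second feature vector via the logistic function. The feature scaling and the assignment of the positive class to "high engagement" are assumptions here, since they are not fully specified above.

```python
import numpy as np

# Coefficients reported for the model with eye-tracking features, in the order:
# fixation number, EDA, fixation duration, QRS duration, heart rate.
WEIGHTS = np.array([0.002, 0.063, -0.030, -0.205, 1.125])
INTERCEPT = 0.0164

def predict_engagement(features_30s):
    """Return (probability of high engagement, label) for one 30-second window.

    `features_30s` must be baseline-normalized and ordered as above; the exact
    scaling used in the study is not specified here, so this is illustrative only.
    """
    z = float(np.dot(WEIGHTS, features_30s) + INTERCEPT)
    p_high = 1.0 / (1.0 + np.exp(-z))                 # logistic (sigmoid) function
    return p_high, "high" if p_high >= 0.5 else "low"

# Hypothetical baseline-normalized feature vector for one window.
print(predict_engagement([1.2, 0.4, -0.1, 0.0, 0.3]))
```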
4. Discussion
The ability to measure and ultimately optimize learner engagement is considered to be one of the most effective countermeasures in combating student dropout, disaffection, and low performance [11]. Our results suggest that utilizing low-cost, non-invasive physiological sensors that capture electrodermal and cardiovascular measures can provide a window into the real-time state of learner engagement. As predicted, results indicated that learner engagement levels significantly increased with increasing levels of interactivity, challenge, goal clarity, feedback, and immersion. In addition, features derived from physiological sensing and eye-tracking equipment were able to successfully classify engagement with high accuracy.
Figure 4. Logistic regression model accuracy. The average model accuracy was 81% without eye-tracking features and 85% with eye-tracking features.
In the current effort we developed a two-class model of engagement using physiological and behavioral features, which classified engagement as high or low with 85% accuracy when eye-tracking features were included and 81% accuracy without eye-tracking features. Both models are based on significant differences observed in self-reported engagement, user physiology, and task performance between the video, simulated, and live environments. Both the simulated and live environments showed significant increases in engagement as compared to video, and were labeled as high engagement in data modeling. Similar levels of accuracy have been reported recently in measuring engagement using features derived from wearable physiological sensors, touchscreens, and optical biomechanics [8], or using EDA only [11]. Given the physiological and behavioral differences observed between scenarios of increasing engagement, we leveraged a random forest classifier to select classifier features. As compared to other feature selection techniques, random forest approaches have shown the highest accuracy in modeling user state, including engagement [8].
The engagement classifier reported here is associated with features derived from physiological and behavioral data. Previous research has indicated that various physiological metrics are associated with engagement, including features derived from EDA [23] [24] [25], from ECG [24] [26], and from eye tracking [27]. Our results suggest that a decrease in heart rate and an increase in HRV occur during increasingly engaging tasks, likely associated with increasing parasympathetic activity [24]. Our results also indicate a decrease in the number of fixations and an increase in fixation duration with increasing engagement. Such results agree with theory positing that individuals will direct visual attention to salient images under engaging conditions because they are thinking more deeply about those images [27].
Individuals reported increasing levels of engagement across the video, simulation, and live tasks, as measured using the FSS. Underlying these results are aspects of interactivity, task challenge, goal clarity, feedback, and immersion. Interactivity has been shown to impact engagement by providing the individual with control over the learning task through technology, leading to increased learner motivation, interest, and learning gains [28] [29]. The level of challenge associated with a learning task is related to engagement, as the presence, level, and appropriateness of a challenge can lead to intrinsic motivation, flow, and micro-engagement [30] [31] [32]. However, the level of challenge must be appropriate to an individual’s capabilities to increase engagement. A task where the challenge is too advanced leads to a stress response, which can reduce the motivation of the individual [26] [33] and reduce motor and cognitive performance [34]. If a task is not challenging enough, or too easy, it may produce apathy and reduced attention due to a lack of motivation [33] [35]. An important facet of challenge is that, as the individual learns, the task generally becomes easier, thus requiring the challenge to grow with the individual’s progress [36] [37] [38]. Increasing challenge between the simulation and live flight tasks was associated with decreasing performance. Further, engagement levels can also be positively influenced by presenting clear goals and performance feedback, as these elements can improve learner motivation and strategy use [39]. Finally, immersion has been shown to influence engagement through increased interest and motivation towards learning tasks, ultimately leading to higher engagement [40] [41].
In the current effort, we were able to achieve high-accuracy classification of engagement, similar to recent reports [8] [11]. However, the results of this study and the implications for use in educational settings should be interpreted with caution. The eye-tracking features included in this modeling effort are difficult to scale, and require calibration and restriction of user head movements [10]. Similarly, physiological sensors add cost, are difficult to scale, have varying levels of data quality [42], and may be associated with privacy issues [43].
In summary, this research demonstrated the capability of monitoring and assessing individualized learner engagement across learning situations and contexts using physiological and behavioral inputs. Application of such an approach can support instructors in determining where they need to adjust training to optimize learner engagement.
Acknowledgements
This work is supported by the US Air Force Research Laboratory (AFRL) under Contract No. FA8650-17-P-6852.