EEG-Based Estimation and Classification of Mental Fatigue

Mental fatigue was associated with increased power in frontal theta (θ) and parietal alpha (α) EEG rhythms. A statistical classifier can use these effects to model EEG-fatigue relationships accurately. Participants (n = 22) solved math problems on a computer until either they felt exhausted or 3 h had elapsed. Pre- and post-task mood scales showed that fatigue increased and energy decreased. Mean response times rose from 6.7 s to 7.9 s but accuracy did not change significantly. Mean power spectral densities (PSDs) of θ and α bands rose by 29% and 44%, respectively. A kernel partial least squares classifier trained to classify PSD coefficients (1-18 Hz) of single 13-s EEG segments from alert or fatigued task periods was 91% to 100% accurate. For EEG segments from other task periods, the classifier outputs tracked the time course of the development of mental fatigue. By this measure, most subjects became substantially fatigued after 60 min of task performance. However, the trajectories of individual classifier outputs showed that EEG signs of developing fatigue were present in all subjects after 15-30 minutes of task performance. The results show that EEG can track the development of mental fatigue over time with accurate updates on a time scale as short as 13 seconds. In addition, the results agree with the notion that growing mental fatigue produces a shift away from executive and attention networks to default mode and is accompanied by a shift in alpha frequency to the lower alpha band.


Introduction
The purpose of this study was two-fold: a) to identify the features of the electroencephalogram (EEG) that change in an orderly and reliable manner with the development of mental fatigue; and b) to construct a practical system that can use EEG features to accurately estimate the instantaneous degree of mental fatigue over a performance period of up to three hours. Our approach consisted of five key steps. First, we adopted a clinical-theoretical framework that allowed us to distinguish mental fatigue from general fatigue, sleepiness or other factors that influence task performance. Second, we adopted a task and a set of procedures for inducing mental fatigue in the participants of the study. Third, we adopted psychometric tests that can independently validate the ability of the task to induce mental fatigue. Fourth, we selected an EEG measurement system that is practical enough to use in operational settings and sensitive enough to detect changes in EEG features with the development of fatigue. Finally, we adapted an algorithm for estimation of mental fatigue from EEG features that is robust in the face of signal corruption from noise or sensor loss and accurate enough to classify states of mental fatigue in individual human subjects with a few seconds of EEG recordings.
There are countless occupations today in which fatigued individuals routinely operate complex, high-risk systems. Such systems may be as common as driving a car or operating a forklift, or as rare as controlling traffic at a busy airport or docking two spacecraft in orbit. This undesirable state of affairs has contributed to many well-publicized, and some not so well-publicized, disasters. For example, operator fatigue was involved in the Three Mile Island and Chernobyl nuclear power plant disasters, the Exxon Valdez grounding, and the Guantanamo Bay airline crash (U.S. Congress, Office of Technology Assessment, 1991;Hall, 2000;Dinges, 1995). "There is [also] much evidence that [operator] fatigue has contributed to serious incidents and accidents in industrial operations … and to accidents in all [emphasis added] modes of transportation" (Dinges, 1995: p. 8).
Analyses of crash data confirm that fatigue, drowsiness and inattention pose the greatest known risks to automobile driver and passenger safety, surpassing all other known risks including alcohol use and secondary tasks such as cell phone usage (Dingus et al., 2006;Sagberg et al., 2004). Accordingly, there continues to be much scientific interest in assessing, monitoring, and predicting operator fatigue (Gevins et al., 1995;Kennedy, 1953;Wilson & Fisher, 1991;Russo, Stetz, & Thomas, 2005).
Unfortunately, there is no standard definition of fatigue. Clinically, fatigue is best defined as difficulty in performing voluntary activities (Chaudhuri & Behan, 2004). The main psychological symptoms are subjective feelings of low energy and motivation, which may be accompanied by physical weakness. Drowsiness aside, fatigue is still a broad concept, which includes mental, motor, metabolic, endocrine, sensory, and other physiological or environmental components (Chaudhuri & Behan, 2004;Winningham, 1996). We prefer the point of view that fatigue is distinct from drowsiness, or the feeling of a need to sleep, although fatigue and drowsiness are often felt at the same time (Hurd, 2007). Hockey (2013) considered that mental fatigue arises not so much from excessive work and effort as from a lack of motivation to continue tasks that become less and less rewarding over time. In this view, fatigue is healthy in the sense that it can help prevent motivational fixation on current activities. For example, the onset of fatigue can force a break in attention to a monotonous and unrewarding task, and allow a person to weigh the benefit of continuing that task or redirecting behavior towards activities that may be more rewarding. Nevertheless, our society often requires people to engage in long-duration tasks with low immediate reward value, such as long-haul truck driving, piloting aircraft or various monitoring activities. A better understanding of the neurophysiology of alertness-fatigue transitions may aid in task design and analysis. Ultimately, knowing precisely what changes in an individual's EEG during such transitions could be used with biofeedback to train operators of fatigue-inducing systems to self-regulate these transitions and avoid accidents (e.g., Beatty et al., 1974).
As with all mental states, clinicians and neuroscientists cannot sharply define mental fatigue in terms of specific physiological processes. So mental fatigue is usually defined as a subjective assessment that is associated with various behavioral and physiological symptoms. For example, we can define mental fatigue as the unwillingness of alert, motivated subjects to continue performing a specific mental task (Montgomery, Montgomery, & Guisado, 1995). Then at times before, during, or after performance of the task, we can validate this definition with psychometric tests.
Although mental fatigue is currently assessed subjectively, there is a pressing practical need for objective measures to estimate, monitor, and predict it. Changes in the EEG clearly signal the onset and stages of sleep, but more recent studies have shown that EEG patterns can also signal the onset of mental fatigue in fully awake persons. For example, a separate analysis of data from this study found evidence for three statistically separable stages of mental fatigue development: an alert state, a "normal" state (neither alert nor fatigued), and a mental fatigue state. However, despite many studies to date, the EEG signs of fatigue are still not sharply defined and using EEG for operational fatigue estimation is still a formidable challenge.
This state of affairs motivated us to rigorously test the accuracy and robustness of EEG-based estimation of mental fatigue. We used a two-pronged approach: first we assessed the common across-subjects pattern of changes in power spectral density or PSD of multichannel EEG over extended periods of mental arithmetic performance. Second, we tested the feasibility of using the PSDs of EEG to classify mental fatigue states at a sub-minute time scale in individual subjects. More specifically, we trained, cross-validated, and tested statistical learning algorithms for classifying brief segments of EEG recordings as pertaining to self-reported states of alertness or mental fatigue.
To assess EEG changes common to all subjects, we tested the hypothesis that the PSDs of theta, alpha, and other EEG bands would co-vary regularly with the degree of mental fatigue that subjects experienced. While many studies have considered EEG correlates of general fatigue, drowsiness and sleepiness, our study focused on inducing mental fatigue through sustained performance of a cognitive task for up to three hours, while minimizing other effects. For example, a study of EEG classification of mental fatigue using support vector machines from subjects who performed an auditory vigilance task reported classification accuracies ranging from 87.2% to 91.2% (Shen et al., 2008). However, these results may have confounded sleepiness and mental fatigue, as the subjects had endured 25 hours of sleep deprivation before performing the vigilance task. Other studies lead us to expect significant fatigue-related changes in PSDs of the frontal midline theta band and the parietal alpha band (e.g., Lorist et al., 2005;Gevins et al., 1998;Barwick et al., 2012). So we specified null hypotheses that EEG power in specific theta and alpha bands would remain constant over the course of a fatigue-inducing task.
To test the feasibility of using EEG to monitor mental fatigue in individuals, we trained a supervised machine learning algorithm to extract information-rich features from brief segments of EEG recordings and classify them as pertaining to states of alertness or mental fatigue. The algorithm is known as kernel partial least squares or KPLS, which we have applied successfully to classification of a wide range of data types including linear and nonlinear classification of EEG and event-related potentials (Wallerius et al., 2005;Trejo, Matthews, & Rosipal, 2006). The features we extract using KPLS are a set of orthogonal basis vectors, which exhibit high covariance with states of alertness or fatigue. Except for this covariance constraint, the basis vectors are computed using methods similar to those used in principal components analysis or factor analysis. We find the necessary and sufficient set of basis vectors using algorithms (de Jong, 1993;Rosipal, Trejo, & Matthews, 2003) that extract each vector sequentially, in descending rank of its covariance with an output measure, which in this case is a distinction between alertness and fatigue. Each extracted basis vector is a unique linear combination of multichannel EEG power spectral densities. We can then iterate this feature extraction procedure with cross-validation methods to optimize the number of extracted components by striking a robust balance of classification accuracy and generality. We apply the KPLS algorithm in this study to accurately and robustly estimate mental fatigue in the framework of developing a useful system for continuous monitoring of mental fatigue under working conditions.
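The sequential extraction described above can be illustrated with a minimal sketch. It is not the KPLS implementation used in the study: it is ordinary linear PLS with a single output (the NIPALS PLS1 procedure), which corresponds to the linear-kernel case only. The function name, the toy feature matrix, and the label coding (-1 for alert, +1 for fatigued) are assumptions made for illustration.

```python
import math

def pls1_scores(X, y, n_components):
    """Extract PLS1 weight and score vectors one at a time (NIPALS with deflation)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xr = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    yr = [v - y_mean for v in y]
    weights, scores = [], []
    for _ in range(n_components):
        # Direction in (deflated) feature space with maximal covariance with y.
        w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left that covaries with y
            break
        w = [v / norm for v in w]
        # Scores: projection of each epoch onto the weight vector.
        t = [sum(Xr[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        loading = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        # Deflate X so the next component is orthogonal to this one.
        Xr = [[Xr[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        weights.append(w)
        scores.append(t)
    return weights, scores

# Toy data: 6 "epochs" x 3 spectral features (made-up values).
X = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7], [0.9, 2.2, 0.4],
     [2.0, 1.0, 1.5], [2.2, 0.8, 1.4], [1.9, 1.1, 1.7]]
y = [-1, -1, -1, 1, 1, 1]  # assumed coding: -1 = alert, +1 = fatigued
weights, scores = pls1_scores(X, y, n_components=2)
```

In this sketch the first score vector already separates the two toy classes by sign, and successive score vectors are mutually orthogonal, which mirrors the orthogonal-basis property described in the text. Weight vectors after the first are expressed relative to the deflated data.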

Experimental Design
We tested several hypotheses about the association of subjective moods, observed behavior, performance, and physiological measures during continuous performance of mental arithmetic for up to three hours. We manipulated a single factor, that is, time on task, and used a repeated measures design. Subjective moods were indexed by the Activation Deactivation Adjective Checklist (AD-ACL; Thayer, 1986;1989) and the Visual Analogue Mood Scales (VAMS; Stern, 1997) questionnaires. Observed behavior included ratings of activity and alertness from videotaped recordings of each participant's performance. The performance measures were response time and response accuracy, recorded in real time by the task computer. The physiological measures were derived from spontaneous EEGs, including: a) theta activity (4-8 Hz) at Fz (both average power and peak amplitude in the theta band) and b) alpha activity (8-13 Hz) at Pz (both average power and peak amplitude in the alpha band).

Participants
We collected data from 33 individuals from the NASA Ames Research Center community. All participants gave their informed consent using a form approved by the NASA Ames Research Center Institutional Review Board and were paid for their participation. We report only the data from the 22 participants who met the following inclusion criteria: a) they were able to wear an EEG electrode cap without significant discomfort; b) they stayed awake for at least 90 minutes; c) they provided stable EEG recordings within a tolerance of ±100 μV. We dismissed participants who failed to meet these criteria. Six of the 22 eligible participants were excluded from analyses. One was excluded because he violated experimental protocol by wearing a watch during testing and attending to it repeatedly. Five more were excluded because their response times were extremely slow (and consequently they provided too few EEG epochs for analysis). The remaining 16 participants included 12 males and 4 females with a mean age of 26.9 (SD = 7.4) years. According to self-reports, all of the participants had normal vision and hearing. Fourteen of the 16 participants reported being right-handed and two reported being left-handed.

Mental Arithmetic Task
Participants sat in front of a computer with the right hand resting on a 4-button keypad (Neuroscan STIM pad, Compumedics USA, El Paso, TX). Arithmetic summation problems, consisting of four randomly generated single digits, three operators, and a target sum (e.g., 4 + 7 - 5 + 2 = 8), were displayed on a computer monitor (cathode ray tube) continuously until the subject responded (Figure 1). Only addition and subtraction were used, and equations for which answers were obvious (such as those including several repeated digits) were excluded. The participants: a) solved the problems; b) decided whether their calculated sums were less than, equal to, or greater than the target sums provided; c) indicated their decisions by pressing the appropriate key on the keypad. The keypad buttons were labeled "<", "=", and ">", respectively. Subjects were instructed to answer as quickly as possible without sacrificing accuracy. After a response, there was a 1 s inter-trial interval, during which the monitor was blank. Participants performed the task until either they quit from exhaustion or three hours had elapsed. As can be seen in Table 1, all participants completed at least 6 of the 12 15-min blocks and 11 participants completed all 12 of the 15-min blocks (i.e., the total 3-hour period).
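To make the trial logic concrete, the following sketch generates a problem of the form described above and determines the correct key. It is an illustration only, not the original task software. The digit range (1-9), the rule used to screen out "obvious" problems (at least three distinct digits), and the way target sums are offset from the true sum are all assumptions, since the text does not specify them.

```python
import random

def evaluate(digits, ops):
    """Evaluate d0 op1 d1 op2 d2 op3 d3 from left to right (+ and - only)."""
    total = digits[0]
    for op, d in zip(ops, digits[1:]):
        total += d if op == "+" else -d
    return total

def correct_key(digits, ops, target):
    """Key for 'calculated sum is less than, equal to, or greater than the target'."""
    total = evaluate(digits, ops)
    if total < target:
        return "<"
    if total > target:
        return ">"
    return "="

def make_problem(rng, max_offset=2):
    """Draw four digits, three operators, and a target near the true sum."""
    while True:
        digits = [rng.randint(1, 9) for _ in range(4)]
        if len(set(digits)) >= 3:  # assumed screen for 'obvious' problems
            break
    ops = [rng.choice("+-") for _ in range(3)]
    # Assumed rule: the displayed target is within max_offset of the true sum.
    target = evaluate(digits, ops) + rng.randint(-max_offset, max_offset)
    return digits, ops, target

rng = random.Random(0)
digits, ops, target = make_problem(rng)
key = correct_key(digits, ops, target)
```

For the example given in the text, `correct_key([4, 7, 5, 2], ["+", "-", "+"], 8)` returns `"="`, because the calculated sum equals the target.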

Activation Deactivation Adjective Checklist
Thayer's AD-ACL (Thayer, 1986;1989) is a multi-dimensional checklist reflecting perceptions of activation. Individuals respond to 20 items using a 4-point rating scale (definitely feel, feel slightly, cannot decide, and definitely do not feel). The scoring procedure includes four subscales: energy (reflects general activation), tiredness (reflects general deactivation), tension (reflects high preparatory arousal), and calmness (reflects low preparatory arousal). The AD-ACL is a reliable and valid subjective method (Thayer, 1986;1989).

Visual Analogue Mood Scales
The VAMS (Stern, 1997) measure eight specific mood states, including afraid, confused, sad, angry, energetic, tired, happy, and tense. The VAMS have a neutral schematic. That is, they have a "mood-neutral" face (and word) at the top of a 100 mm vertical line and they have a "mood-specific" face (and word) at the bottom of the line. Individuals mark the point along the line that best illustrates how they feel at present. Scores range from 0 to 100, with 100 indicating the maximum level of the mood and 0 indicating the minimum level of a mood. Like the AD-ACL, the VAMS are also reliable and valid (Nyenhuis et al., 1997;Stern, 1997).

Observed Activity and Alertness
Activity and alertness were measured by visual inspection of videotapes of each participant's performance. The videotapes showed combined overall scene and facial views of the participants. For each 15-min interval, a single rater judged levels of alertness and activity (unnecessary motion) on a five-point scale (Table 2). The alertness scale considered the subjective appearance of sleep, sleepiness, dozing off, distraction, yawning, and general alertness. The activity scale considered the frequency of subject motion, including moving in the chair, fidgeting, tapping, or shaking. The rater was blind to the specific aims of the study, although not to the time interval being rated (as the videotaped segments were viewed sequentially). However, changes in activity and alertness were related to obvious changes in behavior and, as a result, were fairly easy to quantify. Moreover, a second rater randomly rated a subset of the videotapes and found good agreement with the ratings. Ratings of activity and alertness were tested for correlations with response time, accuracy, and EEG spectral measures.

EEG Activity
EEG was recorded continuously using 32 Ag/AgCl electrodes embedded in an elastic fabric cap (Quik-Cap™, Compumedics USA, El Paso, TX). The electrode cap was placed on the participant according to the manufacturer's instructions. The reference electrodes were digitally averaged mastoids and the ground electrode was located at AFz. Vertical and horizontal electrooculograms (VEOG and HEOG) were recorded using bipolar pairs of 10 mm Ag/AgCl electrodes (one pair superior and inferior to the left eye; another pair to the right and to the left of the orbital fossae). Impedances were maintained at less than 5 kΩ for EEG electrodes and 10 kΩ for EOG electrodes. The EEG was amplified and digitized with a 64-channel Neuroscan Synamps™ system (Compumedics USA, El Paso, TX), with a gain of 1000, sampling rate of 500 s⁻¹ and a pass band of 0.1 to 100 Hz. Amplifiers were calibrated with a 50 µV signal prior to each testing session. The signals were digitized and stored on hard disk drives by a computer equipped with Neuroscan Scan 4.2 software (Compumedics USA, El Paso, TX) and archived on optical media (CD-R).

Procedures
Participants: a) were given an orientation to the study; b) read and signed an informed consent document; c) completed a brief demographic questionnaire (age, handedness, hours of sleep, etc.); d) practiced the mental arithmetic task for 10 minutes; and e) were prepared for data collection by having the electrode cap, EOG, and reference electrodes applied. They then completed the pretest self-report measures (i.e., the AD-ACL and VAMS) and performed the mental arithmetic task until either three hours had elapsed or volitional exhaustion

Data Processing
The EEGs, initially processed using Neuroscan Scan 4.2 Edit™ (Compumedics USA) software, were: a) submitted to an algorithm for the detection and elimination of eye-movement artifact; b) visually examined and blocks of data containing artifact were manually rejected; c) epoched around the stimulus (i.e., from -5 s prestimulus to +8 s post-stimulus); d) low pass filtered (50 Hz; zero phase shift; 12 dB/octave roll off); and e) submitted to an automated artifact rejection procedure (i.e., absolute voltages > 100 µV). The overall single-epoch rejection rate was 47%. The cleaned and filtered epochs were resampled at 128 Hz using a non-aliasing method. EOG artifact was removed by using wavelet-denoised VEOG and HEOG signals as predictors of the artifact voltages at each EEG electrode in a multivariate linear regression. The ocular artifact rejection algorithm consisted of four main steps: a) wavelet smoothing of the EOG channels; b) blink detection; c) computing the linear regression of the EEG on the EOG channels; and d) subtracting the resulting weighted EOG from the EEG channels. These steps were repeated for each single EEG epoch. The wavelet smoothing process decomposed the EOG channels by using a Daubechies (DWT-8) wavelet filter. The lowest 2% of the coefficients were retained and the signal was reconstructed from the retained coefficients (e.g., Trejo & Shensa, 1999). The voltage of each sample in the VEOG signal was then compared to an absolute voltage threshold set at 100 μV and marked as a blink or non-blink accordingly. A multivariate linear regression algorithm was used to calculate the regression weights for the EEG on the VEOG and HEOG channels. Separate regression weights were estimated for blink and non-blink epochs. The final step was a linear subtraction of the weighted VEOG and HEOG signals from each EEG channel. EEG PSDs were estimated with Welch's periodogram method at 833 frequencies from 0-64 Hz.
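The blink-marking, regression, and subtraction steps of the ocular correction described above can be sketched for a single EEG channel as follows. This is a simplified illustration under stated assumptions, not the processing code used in the study: the wavelet smoothing step is omitted, one set of weights is fitted per call rather than separately for blink and non-blink data, no intercept is fitted, and all function names and sample values are hypothetical.

```python
def mark_blinks(veog, threshold_uv=100.0):
    """Flag VEOG samples whose absolute voltage exceeds the threshold."""
    return [abs(v) > threshold_uv for v in veog]

def fit_eog_weights(eeg, veog, heog):
    """Least-squares weights (b_v, b_h) for eeg ~ b_v*veog + b_h*heog."""
    svv = sum(v * v for v in veog)
    shh = sum(h * h for h in heog)
    svh = sum(v * h for v, h in zip(veog, heog))
    sve = sum(v * e for v, e in zip(veog, eeg))
    she = sum(h * e for h, e in zip(heog, eeg))
    det = svv * shh - svh * svh
    if det == 0:
        raise ValueError("VEOG and HEOG are collinear; weights are not unique")
    # Solve the 2x2 normal equations by Cramer's rule.
    b_v = (sve * shh - she * svh) / det
    b_h = (she * svv - sve * svh) / det
    return b_v, b_h

def subtract_eog(eeg, veog, heog, b_v, b_h):
    """Remove the weighted EOG signals from the EEG channel."""
    return [e - b_v * v - b_h * h for e, v, h in zip(eeg, veog, heog)]

# Toy epoch (microvolts): the 'EEG' here is pure ocular artifact.
veog = [0.0, 50.0, 150.0, 20.0, -30.0, 10.0]
heog = [5.0, -10.0, 0.0, 20.0, -5.0, 15.0]
eeg = [0.2 * v + 0.1 * h for v, h in zip(veog, heog)]

blinks = mark_blinks(veog)
b_v, b_h = fit_eog_weights(eeg, veog, heog)
cleaned = subtract_eog(eeg, veog, heog, b_v, b_h)
```

Because the toy EEG contains only propagated EOG, the fitted weights recover the propagation factors (0.2 and 0.1) and the cleaned channel is essentially zero. With real data the residual would contain the brain signal plus noise.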
Peak and average power in the theta and alpha bands were measured at electrodes Fz and Pz, respectively.
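The figure of 833 frequencies is consistent with one simple reading, which we state as an assumption because the text does not give the segment length used in Welch's method: a full 13-s epoch at 128 Hz contains 1664 samples, and its one-sided spectrum has 1664/2 + 1 = 833 bins spanning 0 to 64 Hz. The sketch below builds that frequency grid and computes band-average and peak values from a PSD. The synthetic spectrum and the inclusive band edges are illustrative choices only.

```python
import math

def rfft_frequencies(n_samples, fs_hz):
    """Frequencies (Hz) of the one-sided spectrum of an n_samples-long segment."""
    return [k * fs_hz / n_samples for k in range(n_samples // 2 + 1)]

def band_stats(freqs, psd, lo_hz, hi_hz):
    """Return (mean PSD, peak PSD, peak frequency) within [lo_hz, hi_hz]."""
    in_band = [(p, f) for f, p in zip(freqs, psd) if lo_hz <= f <= hi_hz]
    peak, peak_freq = max(in_band)
    mean = sum(p for p, _ in in_band) / len(in_band)
    return mean, peak, peak_freq

fs_hz = 128                 # resampled rate given in the text
n_samples = 13 * fs_hz      # 13-s epoch -> 1664 samples (assumed segment length)
freqs = rfft_frequencies(n_samples, fs_hz)

# Synthetic PSD with a flat floor and a bump centered at 6 Hz (made-up values).
toy_psd = [1.0 + 10.0 * math.exp(-((f - 6.0) ** 2) / 0.5) for f in freqs]
theta_mean, theta_peak, theta_peak_freq = band_stats(freqs, toy_psd, 4.0, 8.0)
```

Under this assumption the frequency resolution is 1/13 Hz (about 0.077 Hz), so the theta band (4-8 Hz) contains 53 bins, and the band average and peak are taken over those bins.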

Classification Procedures
We classified single EEG epochs using kernel partial least squares decomposition of multichannel EEG spectra coupled with a discrete-output linear regression classifier, or KPLS-DLR. Through extensive side-by-side testing of EEG data, we found that KPLS-DLR is just as accurate as KPLS-SVC, which uses a support vector classifier for the classification step (Rosipal, Trejo, & Matthews, 2003). KPLS selects the reduced set of orthogonal basis vectors or components in the space of the input variables (EEG spectra) that maximizes covariance with the experimental conditions. DLR finds the linear hyperplane in the space of KPLS components that separates the classes. In a pilot study (Montgomery, Montgomery, & Guisado, 1995) and in the present study, we found that the first 15 minutes on task did not produce substantial mental fatigue, whereas mental fatigue was substantial in the final 15 minutes. So we randomly split EEG epochs from the first and last 15-min periods into equal-sized training and testing partitions for classifier estimation. Only the training partition was used to build the final models. The number of KPLS components in the final models was set by five-fold cross-validation. The criterion for KPLS-DLR model selection was the minimum classification error rate summed over all (five) cross-validation subsets.
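The partitioning and model-selection protocol described above can be sketched independently of the classifier itself. In the sketch below, `error_count` is a hypothetical callback standing in for "fit a KPLS-DLR model with a given number of components and count validation errors"; the interleaved fold assignment and the tie-breaking rule (the smallest candidate wins) are assumptions, and the stand-in error function in the usage lines is artificial.

```python
import random

def split_half(indices, rng):
    """Randomly split epoch indices into equal-sized training and testing partitions."""
    idx = list(indices)
    rng.shuffle(idx)
    half = len(idx) // 2
    return idx[:half], idx[half:]

def k_folds(indices, k=5):
    """Assign indices to k folds by interleaving."""
    return [indices[i::k] for i in range(k)]

def select_n_components(train_idx, candidates, error_count, k=5):
    """Pick the component count with the lowest error summed over k folds.

    error_count(fit_idx, val_idx, n_components) must return the number of
    misclassified validation epochs for a model fitted on fit_idx.
    """
    folds = k_folds(train_idx, k)
    best, best_err = None, None
    for n_comp in candidates:
        total = 0
        for f, val in enumerate(folds):
            fit = [i for g, fold in enumerate(folds) if g != f for i in fold]
            total += error_count(fit, val, n_comp)
        if best_err is None or total < best_err:  # ties keep the earlier candidate
            best, best_err = n_comp, total
    return best, best_err

rng = random.Random(0)
train_idx, test_idx = split_half(range(40), rng)

# Artificial stand-in: pretend errors are lowest with three components.
def fake_errors(fit_idx, val_idx, n_comp):
    return abs(n_comp - 3) * len(val_idx)

best_n, best_err = select_n_components(train_idx, [1, 2, 3, 4, 5], fake_errors)
```

Only the training partition enters the selection loop, so the testing partition remains untouched until the final model is evaluated, as in the procedure described in the text.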

Statistical Analyses
The data were analyzed using either singly or, when appropriate, doubly multivariate repeated measures analyses of variance with time of measurement as a within-subjects factor. Doubly multivariate analyses were carried out for AD-ACL subscale scores (energy, tension, calmness, and tiredness), VAMS subscale scores (afraid, confused, sad, angry, energetic, tired, happy, and tense), behavioral observation data (observed activity and alertness), theta activity measures (peak and band-average amplitudes), and alpha activity measures (peak and band-average amplitudes). Singly multivariate analyses of variance were carried out for response time and accuracy. For the doubly multivariate analyses, significant multivariate F-ratios were decomposed using single degree of freedom, within-subjects contrasts. For the singly multivariate analyses, Huynh-Feldt-corrected degrees of freedom and p-values were reported (i.e., to correct for violations of sphericity). In both analyses, partial η² values were reported as effect size estimators.

Self-Report Analyses
The . There was also a non-significant linear trend for tension (F(1, 8) = .92, p < .37). Thus, the AD-ACL data indicate that our manipulation decreased general activation (i.e., self-reported energy) and preparatory arousal (i.e., self-reported calmness) and increased general deactivation (i.e., self-reported tiredness). The VAMS subscale scores (i.e., for afraid, confused, sad, angry, energetic, tired, happy, and tense) were analyzed in a doubly multivariate ANOVA with time of measurement (i.e., pretest vs. posttest) as a within-subjects factor. The main effect of time of measurement was non-significant (multivariate F(8, 1) = 1.31, p = .59). This analysis suggests that our manipulation, despite its effects on activation and arousal, did not influence other, more durable, moods.

Behavior Analyses
The behavioral observations (i.e., observed activity and alertness) were analyzed in a doubly multivariate ANOVA with time of measurement (i.e., 15-min periods) as a within-subjects factor. The main effect of time of measurement was significant (F(18, 178) = 3.70, p < .0005, η² = .27). This analysis suggests that time on task influenced behavior (i.e., observed activity and alertness levels). Moreover, time on task had a progressive effect on behavior. Within-subjects contrasts showed a linear decrease in alertness (F(1, 10) = 10.4, p < .009, η² = .51) and a linear increase in activity (F(1, 10) = 5.88, p < .04, η² = .51). Alertness decreased from a mean of 5.00 (SD = .00) in the first 15-min period to a mean of 2.43 (SD = .98) in the last 15-min period. Activity increased from a mean of 1.36 (SD = .51) to a mean of 2.45 (SD = 1.30), respectively.
The response times (RT) were analyzed in an ANOVA with time of measurement (i.e., 15-min periods) as a within-subjects factor. The main effect of time of measurement was significant (Huynh-Feldt corrected F(3, 39) = 3.78, p < .03, η² = .24). This analysis suggests that time on task influenced performance. Moreover, time on task had a progressive effect on performance (Figure 2). Within-subjects contrasts showed a significant linear increase in RT (F(1, 12) = 8.29, p < .01, η² = .41) rising from a mean of 6.70 s (SD = 2.18) in the first 15-min period to a mean of 7.87 s (SD = 2.64) in the last 15-min period. We found the same pattern of significant effects for RT analyzed in an ANOVA with fraction of artifact-free trials (i.e., first 100, middle 100, and last 100) as a within-subjects factor.
Response accuracy was analyzed in an ANOVA with time of measurement (i.e., 15-min periods) as a within-subjects factor. The main effect of time of measurement was not significant (Huynh-Feldt corrected F(5, 43) = 1.74, p = .14). Response accuracy was also analyzed in an ANOVA with fraction of artifact-free trials (i.e., first 100, middle 100, and last 100) as a within-subjects factor. The main effect of number of trials was not significant (Huynh-Feldt corrected F(2, 19) = 2.84, p = .09). This analysis suggests that, despite its effects on other aspects of behavior, time on task did not have a substantial influence on response accuracy.

EEG Analyses
Because the small number of participants precluded analysis of the full montage of electrodes, we examined the average spectra for pronounced effects (Figure 3) and focused our analyses on the electrodes that had maximal average voltages for the alpha and theta bands (i.e., Fz for theta and Pz for alpha). Electrode Fz shows a marked increase in theta power spectral density near 6 Hz. Electrode Pz shows marked increases in alpha PSD with peaks near 9 Hz (alpha 1) and 10 Hz (alpha 2). The increase in theta power is steady over the three time periods, whereas the increase in alpha 1 and alpha 2 is large between early and middle time periods and smaller between middle and late time periods. The relative distribution of average alpha 1 power is more anterior than alpha 2, with alpha 1 showing a maximum at Cz and alpha 2 showing a maximum at Pz.
The changes in frontal midline theta (i.e., average PSDs and peak amplitudes at Fz) were analyzed in a doubly multivariate ANOVA with time of measurement (i.e., 15-min periods) as a within-subjects factor. Each subject's theta peak was measured individually. The main effect of time of measurement was significant (multivariate F(18, 178) = 2.05, p < .01, η² = .17). Average power in the theta band (4-8 Hz) increased from a mean of 199.36 µV²/Hz (SD = 97.50) in the first 15-min period to a mean of 256.58 µV²/Hz (SD = 135.57) in the last 15-min period. Peak amplitude in the theta band increased from a mean of 272.4 µV²/Hz (SD = 146.0) in the first 15-min period to a mean of 390.8 µV²/Hz (SD = 227.1) in the last 15-min period. This analysis suggests that theta increased with time on task. Moreover, this analysis suggests that time on task had a progressive effect on frontal midline theta activity. Within-subjects contrasts showed significant linear increases in average theta PSDs (F(1, 10) = 7.42, p < .01, η² = .48) and in peak theta amplitudes (F(1, 10) = 9.31, p < .01, η² = .48).
The changes in midline parietal alpha activity (i.e., average PSDs and peak amplitudes at Pz) were analyzed in a doubly multivariate ANOVA with time of measurement (i.e., 15-min periods) as a within-subjects factor. For each subject the peak alpha power was measured individually without respect to distinctions of alpha 1 and alpha 2 bands shown in Figure 3. The main effect of time of measurement was significant (multivariate F(18, 178) = 2.20, p < .005, η² = .18). Average power in the alpha band (8-13 Hz) increased from a mean of 307.4 µV²/Hz (SD = 434.3) in the first 15-min period to a mean of 459.0 µV²/Hz (SD = 593.9) in the last 15-min period. This analysis suggests that alpha power increased progressively with time on task. Within-subjects contrasts showed significant linear increases in average alpha PSDs (F(1, 10) = 6.07, p < .03, η² = .38). Peak alpha amplitudes increased and trended similarly, but not significantly so (F(1, 10) = 4.11, p = .07).
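The relative size of the theta increases can be read directly from the group means reported above. The short calculation below is only arithmetic on those published means, not a reanalysis of the data; the function name is illustrative.

```python
def percent_change(first, last):
    """Relative change from the first to the last period, in percent."""
    return 100.0 * (last - first) / first

# Group means at Fz from the text (first vs. last 15-min period).
theta_avg_change = percent_change(199.36, 256.58)   # band-average theta power
theta_peak_change = percent_change(272.4, 390.8)    # peak theta amplitude

print(f"average theta: {theta_avg_change:.1f}%")    # about 28.7%
print(f"peak theta:    {theta_peak_change:.1f}%")   # about 43.5%
```

Note that a ratio of group means is not the same as the mean of per-subject ratios, so these figures describe the group-level change only.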

Classification
We applied our classification procedure to EEG recordings from 14 of the 16 subjects (two subjects had too few EEG epochs for model estimation). The EEG epochs were synchronized with the onset of each math problem, extending from -5 s to +8 s relative to each stimulus onset. As such, there was some overlap among the EEG segments. However, a second analysis of 3.5 s segments with no overlap produced highly similar results, so we will focus only on the long-epoch results. We also reduced the likelihood of electromyogram artifacts by low-pass filtering the EEG with an 18-Hz cutoff.
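The overlap among segments follows from the epoch geometry: each epoch spans 13 s, whereas successive problems began on average less than 13 s apart. The sketch below computes epoch windows and their overlap. The 7.7-s onset spacing in the example is an assumption derived from the early-period mean response time (6.70 s) plus the 1-s inter-trial interval; actual spacing varied from trial to trial, and the function names are illustrative.

```python
def epoch_window(onset_s, pre_s=5.0, post_s=8.0):
    """Start and end times (s) of an epoch from -pre_s to +post_s around onset."""
    return onset_s - pre_s, onset_s + post_s

def overlap_seconds(onset_a, onset_b, pre_s=5.0, post_s=8.0):
    """Duration (s) shared by the epochs around two stimulus onsets."""
    a0, a1 = epoch_window(onset_a, pre_s, post_s)
    b0, b1 = epoch_window(onset_b, pre_s, post_s)
    return max(0.0, min(a1, b1) - max(a0, b0))

def epoch_samples(onset_s, fs_hz, pre_s=5.0, post_s=8.0):
    """Half-open sample range [start, stop) of the epoch at sampling rate fs_hz."""
    start = round((onset_s - pre_s) * fs_hz)
    return start, start + round((pre_s + post_s) * fs_hz)

# Two consecutive problems, 7.7 s apart (assumed typical spacing).
shared = overlap_seconds(100.0, 107.7)        # about 5.3 s of shared EEG
start, stop = epoch_samples(100.0, fs_hz=128)  # 1664 samples per epoch
```

With onsets 13 s or more apart the function returns zero, which is the condition met by construction in the shorter, non-overlapping segments mentioned in the text.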

Figure 3.
Grand average EEG power spectra and topographic maps at peak frequencies in the theta band (peak 6.25 Hz), alpha 1 band (peak 9 Hz) and alpha 2 band (peak 10 Hz) across all subjects as a function of time on task. Time periods for each spectrum or plot were early = first 15-min block, late = last 15-min block, middle = all other blocks. Electrode Fz shows a marked increase in theta power spectral density (PSD) near 6 Hz. Electrode Pz shows a marked increase in alpha PSD with peaks near 9 Hz (alpha 1) and 10 Hz (alpha 2). The color scales of the topographic maps are in units of dB/Hz and range from blue to red with the following limits: theta = −3.6 to 2.4, alpha 1 = −4.5 to 2.7, alpha 2 = −5.3 to 3.0. All topographic maps in this report used a spherical spline interpolation algorithm (Perrin et al., 1989) and the APECSgui toolbox (Pacific Development & Technology, 2012) with an adaptation of the eeg_topoplot function from EEGLAB (Delorme & Makeig, 2004).
For each subject we made a KPLS-DLR model using either linear or Gaussian (nonlinear) kernels and selected the best model as described above. On average, linear kernels had slightly better results than Gaussian kernels, so we focus on linear kernels here. Classification accuracies (Figure 4) across both classes for 18-Hz filtered EEG ranged from 91.12 to 100% (mean = 98.30%; Table 3). The corresponding range for 11-Hz filtered EEG was 89.53 to 98.89% (mean = 98.30%). The number of KPLS components ranged from 1 to 4 (mean 2.77) for 18-Hz EEG and from 1 to 5 (mean 3.76) for 11-Hz EEG (Table 3).
With as few as two components, the separation of classes was usually evident from the distribution of KPLS scores for single EEG epochs. The test-set data for the first and last 15-min blocks occupied distinct regions in the space of the KPLS scores (Figure 5). The scalp topography of the KPLS weights can serve as an indicator of which regions or electrodes strongly influence classification. For example, by plotting the KPLS model coefficients at the theta and alpha band peaks of the EEG spectra in one subject (Figure 6), we see that frontocentral midline sites are important for classification in the theta band, with a maximum near Fz. In the alpha band, the discriminating electrodes were concentrated near Pz.

KPLS Model Prediction
We also examined the predictive validity of the KPLS-DLR models by testing them with data from the first nine intervening 15-min periods (between first and last). The behavior of the classifiers for these periods was consistent with an orderly, progressive migration of single-trial KPLS predictions from the non-fatigued to the fatigued class. This observation agrees with the trends we observed in response times, EEG measures, and behavioral observations. We examined these patterns of migration by inspecting graphs of the predicted scores for the first two components of the KPLS models for single subjects (Figure 7). Initially, the predicted points overlapped with the region occupied by the non-fatigued training set. Over time, the predicted points shifted towards the fatigue region. For each of 11 subjects, we computed the center of mass (mean) of the KPLS classification scores for each of 10 fifteen-minute blocks (Figure 8). Negative scores represent alert states and positive scores represent fatigue states. For most subjects there was a progressive shift of the means toward the fatigue state, with some intermittent reversals. Three subjects entered fatigue between 15 and 30 min, whereas one subject did not enter fatigue until between 105 and 120 min. Although the majority of subjects (n = 6) did not cross the zero line between classified alertness and fatigue until between 60 and 75 min, all subjects showed progressive transitions towards the fatigue state in the first 15-30 minutes.
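The block-wise summary described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the scores are invented, the function names are hypothetical, and the block numbering (block 2 = 15-30 min) is assumed to follow the convention of the Figure 7 caption.

```python
from statistics import mean


def block_means(scores_by_block):
    """Center of mass (mean) of the classifier scores in each 15-min block."""
    return [mean(scores) for scores in scores_by_block]


def first_fatigue_block(means, first_block_number=2):
    """Number of the first block whose mean score is positive (fatigue side).

    Returns None if the mean never crosses the zero line.
    """
    for offset, value in enumerate(means):
        if value > 0:
            return first_block_number + offset
    return None


def block_window_minutes(block_number, block_length=15):
    """Start and end of a block in minutes on task (block 2 = 15-30 min)."""
    return ((block_number - 1) * block_length, block_number * block_length)


# Invented single-epoch scores for blocks 2-6 of one hypothetical subject.
# Negative values are on the alert side, positive values on the fatigue side.
scores = [[-0.8, -0.6], [-0.5, -0.1], [-0.2, 0.1], [0.1, 0.3], [0.4, 0.6]]
means = block_means(scores)
onset = first_fatigue_block(means)
window = block_window_minutes(onset) if onset is not None else None
```

With these invented scores the block mean first becomes positive in block 5, that is, between 60 and 75 min on task. Intermittent reversals of the kind noted above would appear as later blocks whose mean falls back below zero, which this simple first-crossing rule ignores.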

Discussion
Our results showed that the power spectral densities of theta and alpha EEG bands co-varied with the degree of fatigue that subjects experienced. Power in both theta and alpha bands increased significantly between the beginning and end of the task. Behavioral and subjective reports confirmed that these increases corresponded to alert and fatigued conditions. Moreover, we found that we could accurately model and estimate fatigue as a function of EEG spectral features using KPLS-DLR, a classifier based on statistical learning theory.

Behavioral Measures
Time on task produced decreased general activation (i.e., self-reported energy) and preparatory arousal (i.e., self-reported calmness) and increased general deactivation (i.e., self-reported tiredness) but did not significantly influence other moods. These effects support the assertion that our task produced a state of substantial mental fatigue. Observations of the participants showed that behavioral activity progressively increased while alertness progressively decreased over time. Moreover, there was a progressive slowing of response times. However, time on task did not significantly influence response accuracy. Together, these results suggest that our subjects experienced mental fatigue, but did not sacrifice accuracy as might be expected if motivation had waned. The moderate, general increases in response time (RT) over time also indicate increasing mental fatigue; they were not the severe increases that might be expected if lapses or microsleep episodes had occurred frequently.

Figure 7.
Estimating the development of fatigue over time in one subject (S3). KPLS scores were predicted for EEG epochs from nine 15-minute blocks between the training set blocks (B2-B10). Block 2 = 15-30 min, block 3 = 30-45 min, block 4 = 45-60 min, …, block 10 = 135-150 min. Black circles and purple crosses are the KPLS C1 and C2 scores of single EEG epochs from non-fatigued (B1) and fatigued (B12) training sets, respectively. Colored diamonds are the KPLS C1 and C2 scores (x = C1, y = C2) of single EEG epochs for intervening 15-minute blocks 2-10. In this subject, the drift of the orange diamonds in B3 away from the black circles and towards the purple crosses marks the onset of fatigue after 30-45 minutes on task. By the tenth block most predicted scores fell in the fatigue region, as defined by the training data set.

Prolonged vigilance appears to have different effects on error rates than fatigue arising from prolonged cognitive effort such as mental arithmetic. For example, in a simulated sonar task, an EEG-based alertness index followed closely the time course of detection error rates (Makeig & Inlow, 1993). However, we found that cognitively fatigued subjects continued to perform with low and stable error rates for up to three hours. Our task differed from the sonar task by virtue of having strong cognitive demands, whereas the demands of the sonar task were mainly sensory or perceptual (detecting sonar sounds). The main behavioral impact of cognitive fatigue in our arithmetic task appeared to be a slowing of mental processes, because response times trended significantly and progressively higher over time. Because the response demands of our task were simple and highly practiced, we do not consider that cognitive fatigue substantially slowed response selection processes. Instead, our results are more consistent with the view that cognitive fatigue slows down central executive functions such as working memory and decision making as compared to perception and response selection.

EEG Measures
Our EEG analyses indicate that time on task had a progressive influence on PSDs of frontal midline theta and parietal alpha activity. PSDs in both of these EEG bands increased as a function of time on task. Our inspection of the EEG spectra did not indicate effects outside the theta and alpha bands. In particular, there were no indications of effects at 14 Hz or in the beta band. The lack of effects in bands above 14 Hz also discounts electromyographic artifacts or other broad-band noise sources as determinants of the theta and alpha effects. Progressive increases in alpha and theta band powers were also observed in a study of extended performance of an air traffic control task (Krishnan, Dasari, & Ding, 2014). In general agreement with our findings, the EEG changes in the air traffic control task indicated alertness-to-fatigue "transition times" after about 70 minutes of task performance. In our mental arithmetic task such transitions may occur earlier, in as few as 15-30 minutes of continuous performance. Task differences may account for the shorter transition time observed here, or the KPLS-derived EEG measures we used may be more sensitive than those used by Krishnan et al. Our results do not support an overall slowing of the EEG in mental fatigue, as much as they indicate specific increases in frontal midline theta and midline parietal alpha power. However, the scope of our analyses does not allow us to rule out specific fatigue-related decreases in individual peak alpha frequency or changes in the relative proportions of power in lower and upper alpha bands. More recent analyses across several studies of sustained attention tasks suggest that fatigue may induce a shift in the frequency of the alpha rhythm accompanied by a relative shift of the alpha sources towards more anterior locations (Rosipal et al., 2009; Trejo et al., 2010; Rosipal et al., 2013; Zaidel et al., 2013). A similar finding was reported by Barwick et al. (2012) when mental fatigue developed in a Stroop interference task. Studies of individual alpha frequency changes should be a fertile area in future mental fatigue research.
One explanation of the fatigue-induced increases in alpha band power derives from a neurophysiological model about the significance of varying alpha band power levels. The model states that "when patches of neurons display coherent activity in the alpha band, a depressed state of active processing of information in the underlying cortical neuronal populations can be assumed to exist" (Pfurtscheller & Lopes da Silva, 2005: pp. 1008-1009). This model is consistent with the idea that the likelihood of depressed or reduced processing surrounding the onset of a task-relevant stimulus increases as a function of time on task or in relation to subjectively perceived mental fatigue.
Another possible explanation for increasing alpha power concerns the timing of our EEG spectral measures. Our EEG epochs were synchronized with the onset of each stimulus presented in the mental arithmetic task. This means that the EEG features associated with task event processing will tend to dominate the analysis, as compared to EEG features that are randomly selected over time. It is possible that over time on task, alpha oscillations became progressively more synchronized with task events. For example, bursts of alpha may have occurred randomly early on but become regularly linked to stimuli or responses later. Our present analyses provide no evidence for or against this explanation, as we did not look at event-related synchronization measures, but this again seems a fertile area for future analyses.
The hypothesized fatigue-related increase in frontal midline theta band PSDs was supported by our analyses. One current hypothesis states that increases in theta band PSDs are associated with increases in the demands of cognitive processes (Klimesch, 1999; Pfurtscheller & Lopes da Silva, 2005). In our mental arithmetic task we consider that the critical cognitive processes are working memory, long-term memory, and selective attention. A large and diverse body of experimental reports has substantiated that working memory (WM) capacity and speed of access to long-term memory (LTM) are critical limiting processes in cognitive performance (Anderson, 1995). In particular, a theoretical framework has emerged, which relates alpha, theta, and beta activity in the anterior cingulate cortex and the hippocampus to WM and LTM (Gevins et al., 1997; Klimesch et al., 1997; Mecklinger, Kramer, & Strayer, 1992). Distinctions are now recognized among the spatial and frequency parameters that characterize: spatially diffuse theta rhythms associated with sleepiness, spatially focused theta rhythms associated with cognitive effort and WM load, lower alpha rhythms (8-10 Hz) associated with relaxed wakefulness, and focused higher alpha rhythm (10-12 Hz) suppression associated with spatial and non-spatial shifts of attention (reviewed in Klimesch, 1999). These experimental and theoretical results are generally consistent with the theta band effects we observed here.
Another model for the increase in frontal midline theta band power states that the frontal executive network may go "off-line" during states of mental fatigue. In this view, the observed increases in theta power would indicate that rhythmic theta represents an "idling" state and the frontal executive network is in a standby mode. In support of this view, Lorist et al. (2005) found evidence of reduced frontal event-related potential amplitude with time on task in subjects performing a mentally fatiguing switch task for two hours. Another study, in which functional magnetic resonance images were recorded while participants performed a psychomotor vigilance test for 20 minutes, also found evidence of reduced fronto-parietal activity with time on task (Lim et al., 2010). Activity in the fronto-parietal network was lower during post-task rest compared to pre-task rest, and cerebral blood flow decreased in this network in a manner correlated with a decline in task performance. The view that increasing theta signals reduced activity in the frontal executive network goes hand in hand with the view that increasing alpha activity signals reduced activity in parietal regions. The increase in both theta and alpha rhythms could also suggest a withdrawal from executive and attentive processing toward greater activity in the default mode network. Modeling of EEG-fMRI correlations at the network level is still a young field that must explain a complex and subtle set of effects (Murta et al., 2015), but some evidence shows that default mode network activation appears to be more highly related to beta-band activity than to alpha or theta (Laufs et al., 2003). However, where correlations between default mode network activation and alpha were obtained, it was with low-frequency alpha (Jann et al., 2009), which would agree with the notion that growing mental fatigue produces a shift away from executive and attention networks to default mode and is accompanied by a shift in alpha frequency to the lower alpha band. Perhaps it is in such flights away from task performance to default mode that a person finds time to re-evaluate goal priorities, as predicted by Hockey's (2013) motivational model of mental fatigue. We strongly encourage new research to sort out the reasons for the significant effects of mental fatigue on increased alpha and theta band powers, as well as their relationship to functional brain networks.

Classification and Prediction
Overall, the accuracy of our KPLS-DLR classification of single-trial EEG epochs ranged from 90% to 100% with a mean of 97% to 98%. These classifiers are highly accurate for single trials, and may serve as the basis for predictive models of mental fatigue in operational settings. Inspections of the predictive behavior of the KPLS models showed an orderly relationship between the scores and time on task as well as between the scores and correlated behavioral, subjective, and performance measures. In the 11 subjects that we examined for model predictions, our classifier generalized in an orderly manner to data from the 15-minute blocks in between the initial and final 15-minute blocks used to train the classifier. Specifically, the ensembles of KPLS scores used to estimate mental states from single EEG epochs in these intervening blocks exhibited an orderly progression from alertness to fatigue. According to the estimates of our classifier, subjects fatigued at widely different rates, but most subjects entered the region of KPLS scores corresponding to fatigue after about 60 min on task. Although the focus of the present study was on binary classification of alert vs. fatigue states, we have also assessed the clustering of KPLS-derived features with these data and have found evidence that the development of mental fatigue comprises three distinct states of alertness: alert, normal, and fatigued. Future studies should examine whether such states correspond to physiologically distinct network activities or represent levels of activation within a fixed set of networks.
The present study also shows that the individual EEG spectral signatures and patterns of change are useful for estimation of cognitive fatigue. The general trend across subjects in EEG spectral change is explained by an alpha + theta model, with both alpha and theta PSDs being greater in fatigued states than in normal states. However, the estimation of momentary cognitive fatigue states from single EEG trials of 13 s duration was most effective with multi-electrode frequency spectra of individual subjects. The KPLS algorithm correctly identified features of EEG spectra for each individual, and accurately classified over 90% of single EEG trials as having come from a period of alertness or mental fatigue in most subjects.
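To make the notion of per-segment spectral features concrete, the Python sketch below estimates band power from one 13-s segment with a plain periodogram. It is a simplified stand-in, not the feature extraction pipeline used in this study: the sampling rate (32 Hz), the band limits (4-8 Hz for theta, 8-12 Hz for alpha), the synthetic test signal, and the function name are all assumptions made for illustration.

```python
import math


def band_power(segment, fs, lo_hz, hi_hz):
    """Power in [lo_hz, hi_hz) from a one-sided periodogram of one segment.

    Valid for bands strictly between 0 Hz and the Nyquist frequency.
    """
    n = len(segment)
    df = fs / n  # frequency resolution, 1/13 Hz for a 13-s segment
    k_lo = math.ceil(lo_hz * n / fs)
    k_hi = math.ceil(hi_hz * n / fs)  # exclusive upper bin
    total = 0.0
    for k in range(k_lo, k_hi):
        # Naive discrete Fourier transform at bin k.
        re = sum(x * math.cos(2 * math.pi * k * i / n)
                 for i, x in enumerate(segment))
        im = sum(x * math.sin(2 * math.pi * k * i / n)
                 for i, x in enumerate(segment))
        psd = 2.0 * (re * re + im * im) / (fs * n)  # one-sided PSD estimate
        total += psd * df
    return total


# Synthetic 13-s "EEG" segment: a 6 Hz component (amplitude 2) plus a
# 10 Hz component (amplitude 1), sampled at an assumed 32 Hz.
fs = 32
n_samples = 13 * fs
segment = [2.0 * math.sin(2 * math.pi * 6 * i / fs)
           + 1.0 * math.sin(2 * math.pi * 10 * i / fs)
           for i in range(n_samples)]

theta_power = band_power(segment, fs, 4, 8)
alpha_power = band_power(segment, fs, 8, 12)
```

For a pure sinusoid of amplitude A the band power is A²/2, so the two estimates here come out close to 2.0 and 0.5. Real recordings would normally call for windowing or averaging (for example Welch's method), artifact handling, and one such feature vector per electrode.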
Our results suggest several important implications for future EEG studies of mental fatigue. First, as others have found (e.g., Galbraith & Wong, 1993; Smith et al., 2001), EEG classification algorithms benefit greatly from being both individualized and sensitive to multiple sensors and frequency bands. The development of simpler and more general models, which apply to a broad set of subjects, will require considerable additional research. Another well-known problem in the applied EEG community is that the performance of classification algorithms is unstable from day to day or at different times of day (Trejo, Matthews, & Allison, 2007; Wilson & Russell, 2003a). Additional research is needed to develop methods for stabilizing the link between EEG features and mental states such as fatigue or alertness over long periods of time. Finally, some type of saliency analysis, such as our maps of spectral features (Figure 6), is important in order to determine which fundamental aspects of the EEG yield the most accurate predictions (Wilson & Russell, 2003b). Additional research is also needed to develop quantitative methods for saliency analyses that are meaningful across studies and across feature extraction or classification algorithms.