TREFACE: A New Computerized Test of Emotional Stroop with Facial Expressions
1. Introduction
Among the behavioral manifestations of an emotion, facial expressions are highly relevant for signaling externally what the individual is feeling [1], displaying emotions and supporting social interaction, besides having adaptive value for the organism that performs them [2] [3] [4].
Recognizing facial properties is not only an important mechanism for maintaining survival, but also refers to a capacity of the brain’s biological system to establish and maintain social life [5] [6] [7]. The face transmits a large amount of information, which is processed in the order of milliseconds [3] [4] [7] [8]. These signals have been recognized as emotional gestures and are translated into a universal language: happiness, fear, disgust, sadness, surprise, and anger [9] [10] [11].
Several biological mechanisms are involved in the processing of emotional meaning. There is a broad consensus in the literature on the existence of a neural activity between the amygdala and the orbitofrontal cortex (OFC) in this type of processing and in the recognition of emotional facial expressions [2] [6] [12] [13] [14].
Neurophysiologically, the face is perceived as an image that follows a direct visual recognition pathway, which runs from the retina to the lateral geniculate nucleus in the thalamus. The information continues to areas of the primary and secondary cortex, located in the occipital cortex and in the medial sulcus of the temporal lobe. There is also an indirect route, which starts at the retina and goes to the superior colliculi in the midbrain and from there to the amygdala, where it would generate signals for the central and peripheral structures, in addition to continuing on to the visual cortex and specialized areas, such as the inferior temporal lobe and the superior temporal sulcus, where a perceptual analysis begins [7] [15].
Haxby, Hoffman and Gobbini [16] proposed a cognitive model of facial processing and analysis. The first of the mechanisms would be responsible for processing the invariant features of the face: eyes, nose, mouth and their arrangement, which would provide efficient resources for recognizing identity. The second mechanism would handle the aspects that change with the movement of the eyes and mouth, such as emotional expression [1]. This processing model includes a core system and an extended system. The core system performs visual analysis and is composed of the inferior occipital cortex, involved in the perception of facial features; the lateral fusiform gyrus, which encodes the invariant aspects; and the superior temporal sulcus, which handles the changeable details of the face.
The extended system, in turn, involves, among its connections, structures such as the amygdala, the insula and components of the limbic system that modulate the emotional attributes of facial expressions; the intraparietal sulcus, related to spatial attention; the auditory cortex, involved in speech processing; and the anterior temporal cortex, in charge of processing identity and biographical information [13] [14] [16] [17].
Cognitive processes such as memory, language, attention control and basic components of executive functions are strongly involved in the efficiency of emotional facial processing [2] [11] [15] [18] [19] [20].
At the same time, there are mechanisms or disorders that change the efficiency of the emotional recognition system, as in cases of generalized anxiety [21] [22] [23] [24], depression [22] [25], states of sadness [26] [27], post-traumatic stress [28] [29], autism [30] [31], Alzheimer’s disease [32], multiple sclerosis [33], insomnia, and expression of nocturnal cerebral hyper-metabolism [34].
Other studies have shown a relationship between hormonal variables and the phases of the woman’s menstrual cycle and a selective level in the emotional recognition of the faces [35]. Additionally, differences have been identified by age, comparing young and elderly adults [32] [36].
There are different instruments for assessing the monitoring of emotional conflict. The classic Stroop color and word task uses the interference effect to assess inhibitory control, preferably through the comparison between control and conflicting tasks [37]. In general, versions of this test differ in some dimensions, such as the number of colors used; the type of stimuli used to present ink stains on the page; the presentation of items in sequences of rows or columns; and the correction method [38].
The color and word Stroop is usually presented in three stages. In the first, called Word, participants are required to read the written words. In the second stage, known as Color, participants are required to name the colors of the words. In the last part, called Color and Word, the colors of the ink in which the words are printed should be named as quickly as possible, without considering the word itself [39] [40]. The conflicting presentation of the stages aims to generate interference and distracting stimuli, evaluating the individual’s ability to inhibit an automatic response in favor of a less habitual one [40]. The effect that the test captures is one of the most robust cognitive phenomena available in neuropsychological assessment [27] [41].
Currently, different versions of the Stroop test are used to evaluate various aspects of executive functions, including constructs such as attention, interference, and inhibition [42] [43].
Emotional Stroop, a variant of the classic Stroop task, has been used for more than two decades. It is characterized by the presentation of words with affective content (for example, sadness, anger or happiness, among others of positive and negative valence) and words with neutral content. The words are printed in color, and the subject must name the color while ignoring the semantic content [44] [45]. Thus, the emotional Stroop paradigm has become a popular measure of attentional bias in anxious and depressed patients [27] [46].
From this paradigm, it is possible to recognize how emotional attributes prove to be of great importance, since they exert an influence on cognitive processes. Some studies, in addition to verifying significant delays in naming colors in the classic Stroop, have also shown that the speed to name emotional words could be an indicator of the subjects’ concerns or anxieties: thus, words referring to emotional contents, such as death or sadness, produce greater interference than neutral words (for example, table or tree) [12] [47] [48].
To provide a measure of emotional conflict monitoring, which can consider more details, not only for the identification, but also for the resolution of the conflict, the scientific literature shows a wide variety of experimental designs using the emotional paradigm to study the neural mechanisms that may be associated with the efficiency of perceptual processing. In this direction, for example, photographs with images of faces accompanied by congruent and incongruous emotional words have been used [12] [20] [49], facial expressions of half of the face and its congruent and incongruent complement [19], emotional facial expression and complementary body expression in the congruent and incongruent condition [8] [50] and emotional expressions in the right vertical position and facing downwards [51].
As a result of these experimental variations, the emotional-face Stroop task proposed by [12] presupposes that the disturbance of the attention system, required for control and resolution, is altered by emotional interference, as for example, a word unrelated to the emotional expression of the face [12] [19] [20] [52]. Thus, the emotional face Stroop test model could assess the monitoring of emotional conflict, and, in turn, can be used to explore the quality of the active cognitive control mechanisms that shape the conflict, as well as those that offer the resources to identify and solve the problem [27] [49].
It is worth pointing out that, although there are instruments for evaluating executive functions and their relationship with emotional components, no instrument that allows the monitoring of emotional face/word conflict is available in the Brazilian context. From this perspective, the present work aims to develop, for the Brazilian population, a computerized instrument capable of reproducing the main attributes of the emotional Stroop paradigm, as formulated in previous works by Etkin et al. [12]. Thus, it is expected to make available a new assessment tool, implemented in a programming language, suitable for the Portuguese language, and of theoretical and practical relevance for future research in neuropsychology.
2. Methods
2.1. Ethical Considerations
The study was approved by the Human Subjects Ethics Committee (protocol 56466216.0.0000.5084). Consent was obtained from all participants, in accordance with the ethical guidelines for research with human subjects (196/96 CNS/MS Resolution).
2.2. Participants
Forty-two participants aged between 18 and 30 years (25 women with a mean age of 29.3 ± 2.4 years and 17 men with a mean age of 26.5 ± 2.3 years) took part in the study. They were recruited on the Darcy Ribeiro Campus of the University of Brasília, DF, Brazil. All participants were native speakers of Brazilian Portuguese, reported no history of neurological disorder, and did not score above the cut-off points of the State-Trait Anxiety Inventory (cut-off point 50) or the Beck Depression Inventory (BDI-II; cut-off point 20) (see Table 1). All had normal or corrected-to-normal visual acuity and had not consumed drugs or alcoholic beverages in the 24 hours before the study.
2.3. Development of the TREFACE Instrument
Stimuli Selection
First, the original photographs of the set by Ekman and Friesen [53] [54] were purchased from the Paul Ekman Group, Copyright for academic and research use, known in the market as Pictures of Facial Affect (POFA)®.
The set of 110 POFA photographs (original stimuli) was converted to a digital version and numbered according to the original classification by Ekman, Friesen and Hager [55]. The digital formatting used the following parameters: original dimensions of 1411 × 2398 pixels, file size of 232 KB, resolution of 400 dpi and color depth of 24 bits.
Then, the evaluation of the stimuli regarding the quality of emotional expression was carried out by five Brazilian specialists (judges) in the study of emotional facial stimuli processing in humans.
The set of original POFA photographs was presented on a computer screen positioned at a 90-degree angle to the support surface (table), with the judge always seated 80 cm in front of the screen. None of the judges had visual problems. The judgment of the identified emotion (happiness, fear, sadness, anger, surprise, disgust, neutral, or “I cannot identify it”) was recorded on a formatted paper sheet. The presentation was run in Microsoft Office PowerPoint at a 100% zoom level on a 20-inch screen.
Table 1. Demographic and clinical characteristics.
Note. STAI = State-Trait Anxiety Inventory-Trait/State Version; cut-off score 50. BDI-II = Beck Depression Inventory-II; cut-off score 20. SEM = standard error of the mean.
Finally, the judges positively judged a total of 97 stimuli from the original POFA set. Only stimuli with a level of agreement between 80% and 100% were retained.
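The agreement criterion above can be illustrated with a minimal sketch. It assumes that agreement is the proportion of the five judges whose label matches the original POFA classification; the text does not state the exact formula. The stimulus identifiers and ratings are hypothetical, not the study's data.

```python
def agreement(judgments, target):
    """Proportion of judges whose label matches the target emotion."""
    return sum(1 for label in judgments if label == target) / len(judgments)


def select_stimuli(ratings, threshold=0.80):
    """Keep stimuli whose agreement with the original label is >= threshold.

    ratings maps a stimulus id to (original_label, list_of_judge_labels).
    The 0.80 threshold mirrors the 80% criterion described in the text.
    """
    kept = []
    for stim_id, (target, judgments) in ratings.items():
        if agreement(judgments, target) >= threshold:
            kept.append(stim_id)
    return kept


# Hypothetical ratings from five judges (illustrative only).
ratings = {
    "photo_001": ("happiness", ["happiness"] * 5),
    "photo_002": ("fear", ["fear", "fear", "surprise", "fear", "fear"]),
    "photo_003": ("anger", ["anger", "disgust", "disgust", "anger",
                            "I cannot identify it"]),
}
selected = select_stimuli(ratings)
print(selected)
```

With five judges, an 80% threshold means that at least four of the five must agree with the original label, so a single dissenting judgment is tolerated.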
2.4. Computerized Model Formulation
The task is structured as a sequence of three stages: 1) Guided Recognition (GR); 2) Word Reading (WR); and 3) Recognition of Face Expression (RFE). Seventy stimuli were selected pseudo-randomly for each stage, ensuring that photographs of all emotional categories (happiness, fear, sadness, anger, surprise and disgust) and of both genders (male and female) were included. In total, 210 stimuli were presented, 70 for each stage.
In addition, a list of emotional words in Portuguese was created to accompany the stimuli where necessary. These were: alegria, medo, tristeza, raiva, surpresa and nojo (happiness, fear, sadness, anger, surprise and disgust). The words were set in Arial, size 26, bold, in red. Previous works have also used red lettering in their protocols [12]. Red superimposed on black, with white and gray backgrounds, is known to increase the contrast of the image.
Thus, all the working material (stimuli) was loaded, configured and executed in the Stroop Test software - version 1.0.0.0.0. This software is a tool built in the C# programming language in the Microsoft Visual Studio IDE, compatible with the Windows Vista operating system or higher.
For the GR stage, each face stimulus was presented paired with a word stimulus. All 70 trials (face-word pairs) were matched. The delay between stimuli was 100 milliseconds. For the WR and RFE stages, the word was placed in the center of the face (on the central line between the eyes and the nose), without covering the central details of the face, an important aspect for recognizing its emotional properties. Similar examples can be found in previous works [12]. A red dot on a white background was inserted as a fixation point, supporting a pattern of attention or vigilance during the test, as described in [12]. The delay and presentation times were standardized at 1000 milliseconds in these last two stages. For all presentations, the size of the stimuli (photographs) was 7.55 cm × 11.29 cm on a white background (see Figure 1).
Considering the details previously formulated in the literature [12], two conditions were developed, pseudo-randomly, within each of the WR and RFE stages: Congruent (C) and Incongruent (I). The congruent condition indicates agreement or correspondence between the elements (face expression and the word/emotion); in contrast, the incongruent condition reflects a lack of correspondence between them. Thus, four fixed styles of presentation were generated according to the condition: C-WR (Congruent Word Reading), I-WR (Incongruent Word Reading), C-REF (Congruent Recognition of Face Expression), I-REF (Incongruent Recognition of Face Expression), counterbalanced in terms of each face expression, word and gender of the photographed model.
Figure 1. Representation of the TREFACE task in the word reading and recognition of face expression stages. (A) The model pre-established within the stage. Condition: C = Congruent, I = Incongruent. A total of 70 stimuli per stage and 7 for each set were considered. (B) Conditions within the stages: C-WR = Congruent Word Reading, I-WR = Incongruent Word Reading, C-REF = Congruent Recognition of Face Expression, I-REF = Incongruent Recognition of Face Expression.
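The congruent and incongruent pairing described above can be sketched as a pseudo-random trial list. This is a hypothetical illustration, not the Stroop Test software itself: the function name, the even 35/35 split between conditions, and the cyclic assignment of face emotions are assumptions, and the gender of the photographed model is not represented.

```python
import random

EMOTIONS = ["happiness", "fear", "sadness", "anger", "surprise", "disgust"]
# Portuguese words listed in the text, keyed by the emotion they name.
WORDS = {"happiness": "alegria", "fear": "medo", "sadness": "tristeza",
         "anger": "raiva", "surprise": "surpresa", "disgust": "nojo"}


def build_trials(n_trials=70, seed=0):
    """Build one stage of face/word trials (assumed even C/I split)."""
    rng = random.Random(seed)  # fixed seed gives a reproducible order
    conditions = ["C", "I"] * (n_trials // 2)
    rng.shuffle(conditions)
    trials = []
    for i, condition in enumerate(conditions):
        face = EMOTIONS[i % len(EMOTIONS)]  # cycle through the six emotions
        if condition == "C":
            word_emotion = face  # the word names the expressed emotion
        else:
            # The word names any emotion other than the one on the face.
            word_emotion = rng.choice([e for e in EMOTIONS if e != face])
        trials.append({"face": face, "word": WORDS[word_emotion],
                       "condition": condition})
    return trials


trials = build_trials()
print(trials[0])
```

Seeding the generator is one way to obtain a fixed presentation order that is identical for every participant, which matches the "fixed styles of presentation" mentioned in the text.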
2.5. Procedure
Data were collected individually, in a single session, in a spacious, bright, and noise-controlled room. First, the participant was evaluated against the inclusion criteria. Immediately afterwards, the Stroop task was administered in the sequence: TREFACE 01: GR, TREFACE 02: WR, and TREFACE 03: RFE.
During the TREFACE task, the participant had to respond verbally according to the objective of each stage, following the instructions presented on the monitor screen. The number of correct answers (scores), the number of errors, and the number of omissions (when not answered) were analyzed. To analyze the responses of the participants, the audio saved in the video file in WAV format was used.
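The three counts described above (correct answers, errors and omissions) can be sketched as follows, assuming the verbal responses have already been transcribed from the audio. The function name and the use of `None` for an unanswered trial are assumptions, and the percentage field is an added convenience rather than a measure named in the text.

```python
def score_stage(expected, responses):
    """Count correct answers, errors and omissions for one stage.

    expected: the target word for each trial.
    responses: the transcribed answer for each trial, or None if the
    participant did not answer (omission).
    """
    if len(expected) != len(responses):
        raise ValueError("expected and responses must have the same length")
    correct = errors = omissions = 0
    for target, answer in zip(expected, responses):
        if answer is None:
            omissions += 1
        elif answer.strip().lower() == target:
            correct += 1
        else:
            errors += 1
    return {"correct": correct, "errors": errors, "omissions": omissions,
            "percent_correct": 100.0 * correct / len(expected)}


# Hypothetical four-trial example (illustrative only).
expected = ["medo", "alegria", "nojo", "raiva"]
responses = ["medo", "tristeza", None, "Raiva"]
scores = score_stage(expected, responses)
print(scores)
```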
2.6. Data Analysis
To analyze the participants’ correct answers, a Wilcoxon test for paired samples was used to compare the TREFACE stages, and a two-way repeated-measures ANOVA was used to compare the conditions within the TREFACE stages. Factor one was stage, with two levels (WR and RFE); factor two was condition, with two levels (congruent and incongruent); and the dependent variable was the number of correct answers. Data analysis was performed using the SigmaStat 3.5 statistical program. The level of significance established for the analyses was p < 0.05.
3. Results
3.1. Performance in the TREFACE Stages
When comparing the scores in WR and RFE, a Wilcoxon test for paired samples identified a statistically significant difference between them (Z = −5.648, p < 0.001). The mean of correct answers was greater for reading emotional words written on the photograph (97.76 ± 1.44) than for recognizing the emotional expression of the face while ignoring the written word (43.98 ± 2.43) (see Figure 2).
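As a rough illustration of the paired comparison above, the sketch below computes the Z statistic of the Wilcoxon signed-rank test with the normal approximation, using only the standard library. It is not the SigmaStat procedure: the handling of zero differences, the tie correction and the absence of a continuity correction are assumptions, and the scores are hypothetical. With 42 pairs that all differ in the same direction, the statistic comes out near −5.65, which is the scale expected for the reported value when it is read with a decimal point.

```python
import math


def wilcoxon_z(x, y):
    """Z of the Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped, tied absolute differences receive
    average ranks, and no continuity correction is applied.
    At least one non-zero difference is required.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    return (w - mean) / math.sqrt(variance)


# Hypothetical scores for 42 participants: reading always above recognition.
wr_scores = [100.0] * 42
rfe_scores = [40.0 + i for i in range(42)]
z = wilcoxon_z(wr_scores, rfe_scores)
print(round(z, 3))
```

Ties among the differences reduce the variance term and therefore push the magnitude of Z slightly above this untied value.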
3.2. Performance According to Conditions within TREFACE Stages
When comparing the performance achieved by the participants in the congruent and incongruent conditions within the TREFACE stages, a two-way repeated-measures ANOVA showed a statistically significant effect of the condition factor (F(1, 41) = 69.923, p < 0.001) and of the stage factor (F(1, 41) = 813.446, p < 0.001). In addition, the ANOVA showed a statistically significant interaction between condition and stage (F(1, 41) = 58.785, p < 0.001).
The post hoc analysis for multiple comparisons (Bonferroni t test) showed that, for the condition factor, the congruent condition differed from the incongruent (t = 8.362, p < 0.001); thus, the number of correct answers was greater in the congruent (81.29 ± 2.69) than in the incongruent condition (60.44 ± 4.28). Additionally, for the stage factor, WR differed from RFE (t = 28.521, p < 0.001), showing that the number of correct answers was greater for WR (97.76 ± 0.92) than for RFE (43.98 ± 3.16).
Figure 2. Correct answers of the participants in the TREFACE stages (Mean ± SEM). WR = Word Reading. RFE = Recognition of Facial Expression. *Statistically significant difference = WR > RFE. Wilcoxon test for paired samples (p < 0.05).
Moreover, the analysis identified that, for the stage factor within the congruent condition, the performance of the participants in WR differed from the performance in RFE (t = 10.925, p < 0.001), with more correct answers for WR (98.50 ± 0.97) than for RFE (64.08 ± 4.47). The same result was observed in the incongruent condition (t = 23.211, p < 0.001), with an average of 97.01 ± 1.94 for WR and 23.88 ± 2.98 for RFE.
Finally, for the condition factor within the RFE stage, the analysis showed a difference between congruent and incongruent (t = 11.331, p < 0.001), where the participants’ performance was better when the word was congruent with the image (64.08 ± 4.47) than when it was incongruent (23.88 ± 2.98). However, within the WR stage, no statistically significant difference was observed (t = 0.422, p = 0.674) between the congruent (98.50 ± 0.97) and the incongruent condition (97.01 ± 1.94) (see Figure 3).
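A quick internal check on the main effects: for an effect with one numerator degree of freedom, the squared t statistic equals the F statistic when both use the same error term. The sketch below applies this to the reported condition and stage effects, reading the post hoc statistics as t = 8.362 and t = 28.521. That decimal reading is an assumption about how the values were printed, and the helper name is hypothetical.

```python
# Reported main-effect statistics (t read with a decimal point; assumption).
reported = {
    "condition": {"t": 8.362, "F": 69.923},
    "stage": {"t": 28.521, "F": 813.446},
}


def t_squared_matches_f(t, f, tol=0.01):
    """True if t**2 agrees with F within a small rounding tolerance."""
    return abs(t * t - f) <= tol


checks = {name: t_squared_matches_f(s["t"], s["F"])
          for name, s in reported.items()}
print(checks)
```

Both pairs agree to within rounding, which supports reading the t values on the scale of units rather than thousands.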
Figure 3. Correct answers of the participants according to the conditions within the TREFACE stages (Mean ± SEM). C-WR = Congruent Word Reading. I-WR = Incongruent Word Reading. C-REF = Congruent Recognition of Face Expression. I-REF = Incongruent Recognition of Face Expression. *Statistically significant difference = C-WR > C-REF. **Statistically significant difference = I-WR > I-REF. ***Statistically significant difference = C-REF > I-REF. Two-way ANOVA for repeated measures, followed by the Bonferroni t test (p < 0.05).
4. Discussion
The aim of this study was to build a computerized instrument capable of reproducing the main attributes of the emotional Stroop paradigm for the Brazilian population. In this direction, the behavioral performance of young university students was collected as a first step toward validating the instrument in the Brazilian context.
The results regarding the overall performance in the TREFACE stages revealed that the rate of correct answers was significantly higher in the WR stage than in the RFE stage. Previous studies have reported that the ability to read is a learned mechanism, which becomes automated, especially in people who are assiduous readers [56] [57]. The visual processing of faces, in contrast, involves the participation of a set of deeper brain structures, resulting in greater difficulty in responding to this type of task [12] [20] [49] [58] [59] [60].
Functional magnetic resonance imaging studies indicate patterns of functional specialization of the prefrontal cortex (PFC) related to two types of processes: sequential and monitoring [12] [20] [58] [59] [60]. Thus, sequential processes in word reading would involve the left-hemisphere PFC, whereas more alternating or simultaneous monitoring processes would rely on the right-hemisphere PFC during recognition of emotional facial expressions [20] [52] [57] [59].
The analysis of the conditions revealed that the rate of correct answers was higher when the word coincided with the image, indicating that the congruent mode makes the task easier. These data agree with previous studies (e.g., [59]), where it is pointed out that the ability to solve tasks in a related sequence favors their resolution with a synergy phenomenon instead of a competition phenomenon.
Regarding the reading of words congruent with the image (C-WR), a better performance was observed compared to congruent recognition (C-REF). This finding suggests that reading words in the congruent condition was not impaired, whereas recognition was. It is possible that a cognitive conflict, even of low intensity, can compromise recognition, bearing in mind that emotional recognition is relevant for human behavior, especially when details in the context can hinder processing [48] [52].
On the other hand, for WR when the image did not match the word (I-WR), a better rate of correct answers was observed compared to RFE in the same condition (I-REF). Here it is possible to argue that the word itself does not constitute a conflict when it comes to reading. However, in recognizing the emotional attributes of the face, the word generates an interference effect that makes this processing difficult [57] [59], similar to what was described for C-WR and C-REF.
Finally, comparisons between congruent and incongruent recognition indicate a higher rate of correct answers in the congruent condition (C-REF) than in the incongruent condition (I-REF). This phenomenon was not observed in the comparison between congruent and incongruent reading, where the maximum number of correct answers (ceiling effect) was observed in both cases. This result may indicate the presence of an emotional conflict when participants had to judge the emotional expressions in an incongruent situation. It is similar to others previously reported [12] [20] [49], which indicate an interference effect on the processing of visual facial stimuli in unrelated contexts, a phenomenon described by Etkin et al. [12] [23] in their model of emotional conflict.
Functionally, it is possible that the TREFACE task, specifically in the stage of incongruent emotional recognition, compromises the performance of the detection and control functions, increasing the difficulty of exerting inhibition (for example, a face of happiness with the word “fear”) [12] [23] [52] [60] [61]. In this way, the (verbal) inhibition capacity would become temporarily affected, which in turn makes it difficult to redirect new and appropriate responses to adapt quickly to the objective indicated in the task instruction: “Speak the name of the emotion as quickly as possible, ignoring the word” [12] [23] [62].
In addition to the effects on cognitive control, it is important to highlight that the conflict also involved the temporary maintenance of information in working memory. According to Baddeley [63] and Diamond [62], this type of memory plays a very broad role within the model of the components of executive functions, since in addition to maintaining representations relevant to an ongoing activity over time, it also allows this content to be processed in relation to a purpose. Working memory also facilitates sustained attention, the monitoring of ongoing activity against its objectives, and flexibility in the use of these elements during the performance of the task. These characteristics were observed in the participants’ performance during the TREFACE, suggesting this protocol as a way of evaluating emotional working memory.
Functional neuroimaging studies have revealed that emotional stimuli activate the amygdala [12] [23] [64]. However, when incongruent images are presented, amygdala activity is inhibited by specific activation of the cingulate cortex in its anterior portion (ACC). These data indicate that ACC may be exercising a type of (inhibitory) control over amygdala activity, which would lead to improving its efficiency in dealing with emotional conflicts [12] [23] [64]. In contrast, patients with post-traumatic stress disorders and depression, resistant to treatment, cannot adapt to the conflict [65] [66]. A reduction in ACC activity could compromise efficiency during emotional processing, making them unable to control the emotional intrusion in their thoughts [59] [64] [67].
Here it is important to mention that the participants in the present study did not have neurological or psychiatric disorders or signs of anxiety or depression. Thus, the results of the present study can be considered as a standard reference for future research with clinical samples.
5. Conclusions
The results of the present study demonstrate that the TREFACE paradigm can be considered a good assessment tool for monitoring facial emotional conflict. It highlights the role that emotional aspects can play in the functioning of executive processes, which, in turn, requires sustained attention, working memory, cognitive control and mental flexibility, including motor-verbal control.
Thus, it is also possible to suggest TREFACE as an experimental design that reproduces the phenomenon of the emotional Stroop effect with faces, for use in research with clinical samples, in addition to being a low-cost tool, easy to access and with technical conditions to fit any other assessment protocol.
In addition, our results are complementary to previous works carried out in our laboratories, where various properties of visual stimuli and brain substrates related to facial recognition were analyzed [32] [36] [68] [69] [70].
Acknowledgements
This work was supported partly by a FAPEMA grant to CT (FAPEMA no 01102/16). EP was recipient of a PhD fellowship from the Pontifical Bolivarian University, Bucaramanga, Colombia, and LM was recipient of a PhD fellowship from the Student Program—Graduate Studies Plan (PEC-PG), CAPES/CNPq, Brazil. MCHT was recipient of a research fellowship from CNPq/Brazil (311582/2015-0).