A New Association Evaluation Stage in Cartoon Apprehension: Evidence from an ERP Study

The aim of this study was to investigate the temporal cortical activation patterns underlying different stages of humor comprehension (e.g., the detection of incongruity, resolution of incongruity, and affective stages). Event-related potentials (ERPs) were measured while 16 subjects apprehended cartoon pictures comprising humorous, non-humorous and unrelated items. Results showed that both humorous and unrelated items elicited a more negative ERP deflection (N500-800) than non-humorous ones between 500-800 ms, which might reflect the detection of incongruent elements during humor apprehension. Then, both humorous and non-humorous items elicited a more positive ERP deflection (P800-1000) than unrelated ones between 800-1000 ms, which might reflect a classification process that preliminarily evaluates whether the pictures contain attainable cues for forming a possible association between context and picture (we named it the "association evaluation" stage). Furthermore, between 1000-1600 ms, humorous items elicited a more positive slow wave than non-humorous items, which in turn elicited a more positive wave than unrelated items; this component might be involved in the forming of novel associations (resolution of incongruity). Lastly, between 1600-2000 ms, humorous items elicited a more positive ERP deflection (P1600-2000) than both non-humorous and unrelated items, which might be related to emotion processing during humor apprehension. Based on these results, we further subdivided the second stage (resolution of incongruity) into two stages: association evaluation and incongruity resolution.


Introduction
Humor is a high-level cognitive activity that plays a crucial role in social life. The ability to comprehend humor is considered by many investigators to be a significant component of what makes us unique as human beings [1], and a good sense of humor may represent an important coping strategy [2]. Suls proposed an "Incongruity-Resolution theory" [3], according to which humor processing can be divided into two stages: detection and resolution of incongruity [4,5]. The detection stage refers to the perception of an incongruous element, which is then resolved in the incongruity resolution stage [6]. The resolution stage involves a frame-shifting process, in which the perceiver activates a new frame from long-term memory to reinterpret information already active in working memory [7]. In addition, Gardner et al. (1975) proposed that humor comprehension could be divided into cognitive and affective elements [8][9][10]. The cognitive element refers to the moments in which people attempt to comprehend disparities between punch lines and prior experience [11]. The affective element refers to the moments in which people experience pure visceral and emotional responses dependent upon the exhilaration of the experience [8]. Taken together, humor apprehension could be separated into three sequential processing stages: incongruity detection, incongruity resolution, and affective experience.
Early research came from studies of patients [10][11][12][13], which showed that some brain regions play a key role in humor comprehension. Later fMRI studies investigated brain activity during the comprehension of humorous information, such as the dissociation between cognitive and affective elements [8,14,15]. However, it is difficult to distinguish detection from resolution of incongruity using fMRI because there is no clear behavioral transition marker [8]; the exception is a recent study that used a more elaborate design [16].
Fortunately, ERPs, which are time-locked to the presentation of an external stimulus, allow a more precise examination of the time course of activation for the different stages of humor. Using sentences with either joke endings or equally surprising non-joke endings that did not entail frame-shifting, Coulson and Kutas attempted to differentiate the incongruity resolution stage from the incongruity detection stage of joke comprehension using ERPs [5].
Coulson and colleagues also investigated the relationship between handedness, hemispheric asymmetries, and joke or pun comprehension [17][18][19][20]. However, their results did not provide a simple mapping onto the two cognitive stages of humor apprehension [5]. Recently, Du et al. designed funny or unfunny endings for stories, and their ERP results suggested a dissociation among the three stages of humor apprehension [21]. In addition, using an "oddball" procedure, Gierych et al. investigated ERP correlates of processing funny pictures that were not preceded by a "context-setting" phase. Their results showed that funny pictures elicited more positive ERP waves within broad latency windows, which they attributed to emotional arousal [22].
To date, studies have used word jokes [5,19,21,23], episodes of television sitcoms [8], cartoons [16,22,24,25], or even laughter [26][27][28] as materials to study the mechanism of humor, yet the results remain inconclusive, because some studies did not directly target the processing stages of humor and others had technical limitations. In the present study, we used cartoon pictures as materials to study the processing stages of humor apprehension, and devised three experimental conditions (humorous/non-humorous/unrelated) so as to differentiate the time courses of the three stages effectively with ERPs. Specifically, subjects were asked to judge whether a cartoon preceded by a context-setting caption was humorous, non-humorous or unrelated. The reasons for using this paradigm are as follows. First, humorous cartoons in real life usually contain captions, and it is the relation between the caption and the cartoon that evokes the humorous feeling. Second, we separated the caption from the picture to avoid confounding word processing with picture processing, so that the ERPs elicited by humor comprehension could be analyzed effectively. We hypothesized that, first, both the humorous and unrelated conditions would elicit different ERP components from the non-humorous condition because of the detection of incongruity; second, the ERPs elicited by the three conditions would differ from each other because of the different extent of cognitive resources required (resolution of incongruity); third, the ERPs elicited by the non-humorous and unrelated conditions would differ from those elicited by the humorous condition because of the exhilaration of personal experience (emotion stage).

Subjects
Sixteen adults (8 women, 8 men) aged 18-24 years (mean age, 21.6 years) from Southwest University in China participated in the experiment as paid volunteers. All subjects gave written informed consent, were right-handed, had no current or past neurological or psychiatric illness, and had normal or corrected-to-normal vision.

Stimuli
Prior to the experiment, 300 cartoon pictures were selected from the Internet. All the pictures were converted to black and white and slightly altered so that they did not include any captions. First, we selected the 100 most humorous cartoon pictures and adapted or created a caption context (4-6 Chinese characters) for each of them, such that each picture was funny when interpreted in light of its caption context (humorous condition). Second, we selected 100 cartoon pictures from the remaining 200 and removed any humorous elements they contained. Similarly, we made a caption context for each of these 100 pictures, but here the pictures were logically consistent with their caption context (non-humorous condition). Third, we removed any humorous elements from the remaining 100 pictures and made an unrelated caption context for each picture (unrelated condition). Examples of the three conditions are shown in Figure 1.
Then, another 20 people, who did not take part in the ERP experiment, were asked to rate each cartoon on a scale of 1 to 4 (1-humorous; 2-non-humorous; 3-unrelated; 4-unclear) prior to the formal ERP experiment. These raters were asked to judge the cartoons together with the context-setting caption. Finally, for the humorous condition, 60 cartoon pictures were chosen that had been rated as humorous more than 14 times. In the same way, we chose 60 pictures for each of the other two conditions. All the cartoons measured 8 cm × 6 cm and were presented centrally, subtending a visual angle of 6.6° in width and 4.9° in height.
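The reported visual angles can be checked against the stimulus size and the viewing distance of approximately 70 cm given in the Procedure. The following is a minimal sketch, assuming the standard formula 2·atan(size / (2·distance)); the function name and the exact 70 cm value are illustrative assumptions, not details taken from the study.

```python
import math

def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle (in degrees) subtended by a stimulus of the given size."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

# Assumption: the nominal viewing distance; the paper says "approximately 70 cm".
VIEWING_DISTANCE_CM = 70.0

width_deg = visual_angle_deg(8.0, VIEWING_DISTANCE_CM)   # 8 cm wide cartoon
height_deg = visual_angle_deg(6.0, VIEWING_DISTANCE_CM)  # 6 cm high cartoon
print(f"{width_deg:.2f} x {height_deg:.2f} degrees")
```

At exactly 70 cm this gives roughly 6.5° by 4.9°, which is close to the reported 6.6° by 4.9°; the small difference in width is compatible with a viewing distance slightly under 70 cm.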

Procedure
The flow of stimulus presentation in each trial is shown in Figure 2. The subjects were asked to place their left hand on the space bar and their right hand on the numeric keypad. First, a fixation point (+) appeared at the center of the screen for 300 ms, and then the caption context was presented. Subjects were asked to press the space bar once they had understood its meaning; the context disappeared as soon as the space bar was pressed. Then, after an asterisk (*) had appeared for a random duration of 300-500 ms, the cartoon picture was presented. Subjects were required to make a "humorous/non-humorous/unrelated" judgment about the picture (the "1", "2" and "3" keys stood for humorous, non-humorous and unrelated judgments, respectively) based on the relationship between the picture and the caption context. The cartoon disappeared if the subject did not press any key within 6000 ms. The next trial began after an interval of 1.5 s. In addition, the stimulus-response key assignments ("1", "2" and "3" keys) were counterbalanced across subjects.
The whole test was divided into two parts. First, there was a pre-test with six trials to familiarize the subjects with the procedure. Then the formal ERP experiment started. It comprised 3 blocks of 60 trials each, with 20 trials per condition (humorous/non-humorous/unrelated). The conditions within each block were presented in random order. Between blocks, subjects could take a rest. Subjects were seated in a quiet room facing a screen placed approximately 70 cm from their eyes and were instructed to respond as quickly and accurately as possible by pressing the corresponding key on the keyboard. Subjects were also asked to move and blink as little as possible.
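The block structure described above (3 blocks, 60 trials per block, 20 trials per condition, random order within a block) could be generated along the following lines. This is only a sketch: the paper does not say how the 60 items of each condition were assigned to blocks, so the consecutive assignment used here, the function name, and the seed are assumptions made for illustration.

```python
import random

CONDITIONS = ("humorous", "non-humorous", "unrelated")

def build_blocks(n_blocks=3, trials_per_condition=20, seed=None):
    """Return a list of blocks; each block is a shuffled list of
    (condition, item_index) pairs with equal numbers per condition."""
    rng = random.Random(seed)
    blocks = []
    for b in range(n_blocks):
        # Assumption: items 0-19 go to block 1, 20-39 to block 2, and so on.
        first_item = b * trials_per_condition
        trials = [(cond, first_item + i)
                  for cond in CONDITIONS
                  for i in range(trials_per_condition)]
        rng.shuffle(trials)  # random order of conditions within the block
        blocks.append(trials)
    return blocks

blocks = build_blocks(seed=1)
print([len(block) for block in blocks])
```

Each of the 60 items per condition appears exactly once across the three blocks, so the full session contains 180 trials.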

ERP Recording and Analysis
Brain electrical activity was recorded from 64 scalp sites using Ag/AgCl electrodes mounted in an elastic cap (Brain Products), with the reference on the left and right mastoids. The vertical electrooculogram (VEOG) was recorded with electrodes placed above and below the right eye, and the horizontal electrooculogram (HEOG) with electrodes placed to the right of the right eye and to the left of the left eye. All inter-electrode impedances were maintained below 5 kΩ. The EEG and EOG were amplified using a 0.05-80 Hz bandpass and continuously sampled at 500 Hz/channel for off-line analysis. Eye movement artifacts (blinks and eye movements) were rejected off-line. Trials with EOG artifacts (mean EOG voltage exceeding ±80 μV) and those contaminated with artifacts due to amplifier clipping, bursts of electromyographic activity, or peak-to-peak deflection exceeding ±80 μV were excluded from averaging.
The ERP waveforms were time-locked to the onset of the pictures. The averaged epoch, including a 200-ms pre-picture baseline, was 2900 ms long. An item was classified as belonging to the humorous, non-humorous or unrelated condition if it was rated in the same way in both the pilot study and the formal ERP study, and the EEG for each condition was averaged separately. At least 30 trials were available for each condition for each subject. On the basis of the grand-averaged ERPs and the voltage maps of the difference waves (Figures 3-5), the ERP component amplitudes were analyzed in a series of two-way repeated-measures ANOVAs with the factors task type (humorous/non-humorous/unrelated conditions) and electrode site, conducted separately for the central-anterior and central-posterior electrode sites and for each ERP component. Because using data from multiple electrode sites may lead to a violation of the sphericity assumption, all ANOVA results were corrected using the Greenhouse-Geisser procedure.
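The amplitude-based rejection criteria can be illustrated with a small sketch. It assumes that "mean EOG voltage" refers to the absolute mean of each EOG channel within a trial and that the peak-to-peak criterion means a maximum-minus-minimum range above 80 μV in any EEG channel; both readings, the function name, and the toy data are assumptions. Amplifier clipping and EMG bursts are not modeled.

```python
from statistics import mean

THRESHOLD_UV = 80.0  # rejection threshold reported in the text

def is_clean(eeg_channels, eog_channels, threshold_uv=THRESHOLD_UV):
    """Return True if the trial passes both amplitude criteria."""
    for eog in eog_channels:
        # Assumed reading: absolute mean EOG voltage over the trial.
        if abs(mean(eog)) > threshold_uv:
            return False
    for eeg in eeg_channels:
        # Assumed reading: peak-to-peak range within the trial.
        if max(eeg) - min(eeg) > threshold_uv:
            return False
    return True

# Hypothetical toy trials (values in microvolts).
clean_trial = {"eeg": [[1.0, -3.0, 5.0], [0.5, 2.0, -1.5]], "eog": [[2.0, 4.0, -1.0]]}
blink_trial = {"eeg": [[1.0, -3.0, 5.0]], "eog": [[90.0, 120.0, 95.0]]}
drift_trial = {"eeg": [[-50.0, 0.0, 45.0]], "eog": [[1.0, 0.0, -1.0]]}

trials = [clean_trial, blink_trial, drift_trial]
kept = [t for t in trials if is_clean(t["eeg"], t["eog"])]
print(len(kept), "of", len(trials), "trials kept")
```

In this toy set only the first trial survives: the second exceeds the EOG criterion and the third has a 95 μV peak-to-peak range.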
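To make the Greenhouse-Geisser correction concrete, the sketch below computes the Box epsilon estimate from the covariance matrix of the repeated measures and scales the degrees of freedom by it. It handles a single within-subject factor only, whereas the analyses reported here were two-way, and the amplitude values are hypothetical; it illustrates the procedure and is not the authors' analysis code.

```python
def covariance_matrix(data):
    """Sample covariance of the k condition columns across n subject rows."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1)
             for j in range(k)]
            for i in range(k)]

def gg_epsilon(cov):
    """Greenhouse-Geisser (Box) epsilon for a k x k covariance matrix."""
    k = len(cov)
    mean_diag = sum(cov[i][i] for i in range(k)) / k
    grand_mean = sum(sum(row) for row in cov) / (k * k)
    row_means = [sum(row) / k for row in cov]
    sum_sq = sum(v * v for row in cov for v in row)
    numerator = (k * (mean_diag - grand_mean)) ** 2
    denominator = (k - 1) * (sum_sq
                             - 2 * k * sum(m * m for m in row_means)
                             + (k * grand_mean) ** 2)
    return numerator / denominator

def corrected_df(epsilon, k, n):
    """Degrees of freedom for the within-subject effect after correction."""
    return epsilon * (k - 1), epsilon * (k - 1) * (n - 1)

# Hypothetical mean amplitudes (microvolts): rows are subjects,
# columns are humorous / non-humorous / unrelated.
amplitudes = [[1.0, 2.5, 0.5],
              [2.0, 3.0, 1.5],
              [0.5, 3.5, 0.0],
              [1.5, 2.0, 2.5]]

epsilon = gg_epsilon(covariance_matrix(amplitudes))
print(round(epsilon, 3), corrected_df(epsilon, k=3, n=len(amplitudes)))
```

With three conditions and 16 subjects, the uncorrected degrees of freedom are (2, 30); epsilon lies between 1/(k-1) and 1, so the correction can only reduce them.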

Behavioral Performance
The reaction times (RTs) for humorous, non-humorous and unrelated responses were 3335.9 ± 1272. The hit accuracies, defined as the proportion of items rated in the same way (humorous, non-humorous or unrelated) in both the pilot study and the formal ERP study, were 0.71 ± 0.19, 0.73 ± 0.17, and 0.76 ± 0.19 for the humorous, non-humorous and unrelated conditions, respectively. The effect of condition on hit accuracy was not significant, F(2, 30) = 0.64, p > 0.05.

Electrophysiological Scalp Data
From the grand-averaged ERP waveforms and the voltage maps of the difference waves (Figures 3-5), it is apparent that the condition effects (in the 500-800 ms, 800-1000 ms, 1000-1600 ms and 1600-2000 ms windows) were similar across the central-anterior electrode sites, whereas at the central-posterior electrode sites the waveforms for the three conditions were almost identical and the differences appeared negligible. Nine electrodes (FPz, AF3, AF4, Fz, F1, F2, FCz, FC1, FC2) at the central-anterior site and seven electrodes (Cz, C1, C2, CPz, CP1, CP2, Pz) at the central-posterior site were selected for analysis. Mean amplitudes in the time windows 400-500 ms, 500-800 ms, 800-1000 ms, 1000-1600 ms and 1600-2000 ms were analyzed using two-way repeated-measures ANOVAs, with task type and electrode site as factors. In addition, the peak amplitudes of the N100 (around 100 ms), P170 (around 170 ms), and N250 (around 250 ms) were also analyzed.
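The mean-amplitude measure for each time window can be sketched as follows, using the 500 Hz sampling rate, the 2900-ms epoch and the 200-ms pre-stimulus baseline reported in the recording section. The half-open window convention (start included, end excluded), the function names and the synthetic epoch are assumptions made for illustration.

```python
SAMPLING_RATE_HZ = 500
BASELINE_MS = 200   # pre-picture interval included in the epoch
EPOCH_MS = 2900     # total epoch length
WINDOWS_MS = [(400, 500), (500, 800), (800, 1000), (1000, 1600), (1600, 2000)]

def ms_to_sample(t_ms):
    """Sample index of latency t_ms, measured from picture onset."""
    return (t_ms + BASELINE_MS) * SAMPLING_RATE_HZ // 1000

def baseline_correct(epoch):
    """Subtract the mean of the pre-stimulus interval from every sample."""
    n_base = ms_to_sample(0)
    base = sum(epoch[:n_base]) / n_base
    return [v - base for v in epoch]

def window_means(epoch):
    """Mean baseline-corrected amplitude in each analysis window."""
    corrected = baseline_correct(epoch)
    result = {}
    for start, end in WINDOWS_MS:
        segment = corrected[ms_to_sample(start):ms_to_sample(end)]
        result[(start, end)] = sum(segment) / len(segment)
    return result

n_samples = EPOCH_MS * SAMPLING_RATE_HZ // 1000
# Synthetic single-channel epoch: constant 2 uV offset plus a 3 uV
# step confined to the 500-800 ms window.
epoch = [2.0] * n_samples
for i in range(ms_to_sample(500), ms_to_sample(800)):
    epoch[i] += 3.0

means = window_means(epoch)
print(n_samples, means[(500, 800)], means[(800, 1000)])
```

In practice the same computation would be applied to each subject's condition average at each selected electrode before the values enter the ANOVA.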

Discussion
In the present study, we used cartoon pictures as experimental materials to distinguish the electrophysiological correlates of the processing stages of humor apprehension. The ERP results revealed differences among conditions in the time windows of 500-800 ms, 800-1000 ms, 1000-1600 ms and 1600-2000 ms, which indicates that the three processing stages in cartoon comprehension might be distinguishable on the millisecond scale of event-related potentials. We discuss the implications of these ERP components below.
First, both humorous and unrelated items elicited a more negative ERP deflection (N500-800) than did non-humorous items between 500-800 ms, which might be involved in detecting the incongruent elements in cartoon apprehension. Previous studies have shown that the N400 is a good marker of incongruity and appears when participants respond to incongruous sentence endings [29,30]. Similar components were found in response to pictures of objects that were semantically unrelated to previously displayed pictures or sentence contexts [31][32][33][34]. In these studies, the anomalous final pictures generated a larger N400 than did congruous ones. However, the scalp distribution of the N400 differed between pictures and words. Specifically, the N400 effect for pictures was largest over frontal midline sites rather than posterior sites [31,32,34]. In addition, few studies have examined the relation of the N400 to humor, although Coulson and Kutas (2001) found that joke sentence endings elicited a more negative N400 than did equally unexpected non-joke sentence endings. In the present study, cues in the non-humorous pictures could be perceived as consistent with the expectation induced by the preceding caption context, whereas cues in the humorous and unrelated pictures could be perceived as inconsistent with that expectation, even though subjects had not yet apprehended the pictures in detail. Therefore, the N500-800 might be related to the N400 and reflect the registration of surprise in humorous cartoon apprehension.
Second, humorous and non-humorous items elicited a more positive ERP deflection than did unrelated items between 800-1000 ms at central-anterior electrode sites. We consider that this positivity might be a late positive component (LPC). A previous study showed that the LPC was associated with task classification [35]. Other studies also suggested that this positive component, with a latency in the range of 500-900 ms post-stimulus (sometimes called the P600 or P800), was related to recollection processes of a more elaborative nature, based on information stored in long-term memory [36,37]. In the present study, the delayed latency might be due to the relation between texts and images, which demands recollection of the previously presented text, as well as to the complexity of the cartoons [38]. This positivity might reflect a classification process that preliminarily evaluates whether the pictures contain attainable cues for forming a possible association between context and picture (association evaluation), before the relationships are apprehended in detail. Because it is difficult for subjects to find any cues for forming an association in the unrelated condition, they had to pay more attention to determine whether any association existed; the smaller amplitude for the unrelated condition might therefore index the greater attentional resources employed [38,39]. The detailed processing of the picture and the forming of associations between context and picture might be reflected in the following processing stage.
Third, humorous items elicited a more positive slow wave than did non-humorous items between 1000-1600 ms, which might be involved in the forming of novel associations (resolution of incongruity). Resolution of incongruity involves a process of frame-shifting, in which the perceiver activates a new frame from long-term memory to reinterpret the information already active in working memory [7]. Coulson and Kutas (2001) found that ERPs to jokes were more positive over medial posterior sites between 500 and 900 ms post-onset, during which frame-shifting was thought to occur. Previous studies also indicated that slow waves correlate with rehearsal/retention operations in working memory [40,41]. It has been suggested that a larger slow wave indicates greater processing demands for retaining object information in working memory [42]. In our study, following the preceding stage of preliminary association evaluation, subjects might process the pictures in detail and recheck any possibility of forming an association. Presumably, they needed more cognitive resources to form a novel association when understanding the humorous items than when understanding the non-humorous items. Therefore, this slow wave might be related to the extent to which working memory is required to form a novel association between context and picture; that is, the larger the amplitude, the more cognitive resources are used. In addition, we found that the non-humorous items elicited a more positive slow wave than unrelated items between 1000-1600 ms, which might reflect the forming of a consistent association in the non-humorous condition, but only processing of the pictures in the unrelated condition because of the lower possibility of forming associations between contexts and pictures.
Finally, humorous items elicited a more positive ERP deflection (P1600-2000) than did both non-humorous and unrelated items between 1600-2000 ms, which might be related to emotional processing in humorous cartoon apprehension. Many studies have found that emotional pictures (i.e., pleasant and unpleasant ones) elicit a larger late positive potential than neutral pictures, which starts around 300-400 ms after picture onset and lasts for several hundred milliseconds [43][44][45][46]. In addition, using a slow time constant, an extended late positive slow wave was observed that was significantly larger for emotional pictures than for neutral pictures and was sustained over a 6-s picture viewing period [47,48]. In humor studies, funny items elicited a more positive ERP wave than unfunny items within broad latency windows [21,22], which was correlated with emotional arousal. In our experiment, the humorous condition involved positive emotion compared with the non-humorous and unrelated conditions. Therefore, it is reasonable to postulate that the significant difference between the humorous condition and the non-humorous and unrelated conditions between 1600-2000 ms might reflect the processing of humorous emotion in cartoon apprehension.
Taken together, the ERP results might indicate the electrophysiological correlates of three processing stages in humorous cartoon apprehension. Moreover, these results suggest that the incongruity resolution stage might be subdivided into two stages, association evaluation and incongruity resolution, yielding a four-stage model of humor apprehension. This proposal might be consistent with the finding that a general resolution process is dissociated from the incongruity resolution process in an fMRI study using the same paradigm [16]. However, our conclusions are drawn only from the "context-setting" paradigm with cartoons as materials. Future studies adopting different kinds of materials and paradigms are necessary to better understand the processing stages in humor apprehension.

Figure 2. The flow of stimulus presentation in each trial.

Figure 3. Grand average ERPs at central-anterior sites. Both humorous and unrelated items elicited a more negative ERP deflection (N500-800) than did non-humorous items between 500-800 ms; both humorous and non-humorous items elicited a more positive ERP deflection (P800-1000) than did unrelated items between 800-1000 ms; humorous items elicited a more positive ERP deflection (P1000-1600) than did non-humorous items, which in turn elicited a more positive wave than unrelated items; humorous items elicited a more positive ERP deflection (P1600-2000) than did both non-humorous and unrelated items between 1600-2000 ms.

Figure 4. Grand average ERPs at central-posterior sites, where the ERP effects tended to disappear.

Figure 5. Voltage maps of the difference waves, which were distributed primarily over the central-anterior sites between 500-800 ms and 800-1000 ms.