Ostensive Cues Orient 10-Month-Olds’ Attention toward the Task But Delay Learning

The aim of this study is to investigate how ostensive cues modify infants’ visual attention to task demonstration, and the extent to which this enhances the performance in an imitative learning task. We hypothesized that ostensive cues would help orient infants’ attention toward relevant parts of the demonstration. We investigated the looking behavior of 41 10-month-old infants while observing an adult demonstrating a novel target action after having either provided ostensive cues or not. Infants’ looking behavior was measured using an eye tracker. Two areas of interest were analyzed: the targeted object and the adult’s face. Infants’ performance after demonstration was also analyzed. The results show that infants’ looking behavior varied across groups. When ostensive cues were not provided, infants looked mainly at the experimenter’s face. However, when ostensive cues were provided, infants oriented their attention toward the targeted object. These results suggest that ostensive cues help infants orient their attention toward task-relevant parts of the scene. Surprisingly, infants in the non-ostensive group improved their performance faster after demonstration than infants in the ostensive group. These results are discussed in terms of a video effect and dissociation between separate cognitive systems for social and non-social cognition.


Introduction
Over the last five years, an emerging body of research has been showing how infants' imitative learning capacities are enhanced when the experimenter communicates with the infant before or during the demonstration of a target action. The communication used in these studies can be social interaction unrelated to the task (e.g., playing with the infant before testing) (Nielsen, Simcock, & Jenkins, 2008) or ostensive cues directly related to the target action (Brugger, Lariviere, Mumm, & Bushnell, 2007; Carpenter, Call, & Tomasello, 2005; Esseily, Rat-Fischer, O'Regan, & Fagard, 2013; Southgate, Chevallier, & Csibra, 2009; Topàl, Gergely, Miklosi, Erdohegyi, & Csibra, 2008). In this paper we focus exclusively on how ostensive cues modify the infant's visual attention to the demonstration and the resulting effects on performance. Ostensive cues include visual and auditory cues such as eye contact, eyebrow raising, infant-directed speech, saying the infant's name, illustration of the experimenter's intention, etc. Csibra and Gergely (2009) hypothesized that ostensive cues enhance infants' performance by guiding them to the information to be learned, hence leading infants to pay more attention to the demonstration. However, to our knowledge, it has not yet been empirically shown that ostensive cues guide infants' attention in an imitative learning task. One way to test this hypothesis is to measure infants' looking behavior in an imitative learning task while the experimenter is demonstrating a target action after having provided ostensive cues (or not). The goal of this study was to see whether ostensive cues actually help infants orient their attention to the demonstrated target action.
A number of studies have focused on the effect of ostensive cues on infants' reproduction of a demonstration. Some studies have found that infants primarily imitate effective ways of achieving goals, ignoring apparently unnecessary actions unless the demonstrator makes it manifest to them through ostensive cues that these actions are relevant to the task (Brugger, Lariviere, Mumm, & Bushnell, 2007; Carpenter, Call, & Tomasello, 2005; Carpenter, Nagell, & Tomasello, 1998; Gergely, Bekkering, & Kiràly, 2002; Nielsen, 2006; Southgate, Chevallier, & Csibra, 2009). In the study by Brugger et al. (2007), for example, infants saw a model perform a multi-step action in which the first action was either necessary or unnecessary to attain the goal. They showed that when the experimenter provided ostensive cues by speaking directly to the infant before performing the first action, thus marking that action as an important step, the action was imitated in both the necessary and unnecessary conditions, showing the importance of ostensive cues in a learning context. In a slightly different procedure, showing the experimenter's intention or goal before demonstration of the target action was shown to improve infants' performance in an imitative learning task (Esseily, Rat-Fischer, O'Regan, & Fagard, 2013). In that study, 16-month-old infants were shown a novel means-end action (retrieving an out-of-reach toy using a tool). The authors observed that infants tended to ignore the demonstration and tried to reach directly for the object with their bare hand. However, in the condition where the experimenter tried to reach for the object with a bare hand while saying "I can't get it" before providing the demonstration, infants reproduced the action of using the tool significantly more frequently.
The question we raise in this paper is why infants' imitative learning performance is better when ostensive cues are provided. Do these cues guide infants' attention? Some studies using joint attention tasks have tried to answer this question. These studies have investigated what infants attend to, depending on eye contact conditions, using gaze direction measurements (Senju & Csibra, 2008; Senju, Csibra, & Johnson, 2008). The results showed that infants follow an experimenter's gaze toward an object only if the experimenter makes eye contact with the infant before a gaze shift toward the object. However, few existing studies have investigated what infants attend to in an imitative learning task when ostensive cues are provided before or during demonstration.
Thus, in the study presented here, we sought to test infants' attention through gaze direction measurements during an adult's demonstration of a novel target action after either providing ostensive cues or not. We suppose here that an infant's gaze reflects what the infant is attending to in a scene. We hypothesized that ostensive cues would direct infants' gaze to the part of the demonstration that is relevant for learning, namely the action performed or the object manipulated. By contrast, when no ostensive cues are provided, infants may be less likely to know where to look in the demonstration, and their attention may be attracted to salient but less relevant targets such as the experimenter's face (e.g., Frank, Vul, & Johnson, 2009; Henrichs, Elsner, Elsner, & Gredebäck, 2012).
Because ostensive cues partly rely on infants' joint attention capacities (Gergely & Csibra, 2006) and because infants' joint attention has been shown to emerge around 10 months of age (Carpenter, 1998), we decided to investigate 10-month-old infants' looking behavior when presented with an adult demonstrating a complex target action. The target action consisted of holding an opaque wooden container with one hand and pulling out an inserted transparent tube with the other hand, an action that infants are known to be spontaneously successful at around the age of 11 months (Fagard, 1998). Infants observed a movie of the adult demonstrating the target action, either preceded by ostensive cues or not. The use of video was based on a previous study in which we showed that 10-month-old infants were capable of imitating novel means-end actions from video models (Esseily & Fagard, 2012).
Infants' looking behavior was measured using an eye tracker. We also compared infants' performance before and after demonstration in each of the two groups. We expected to observe better learning capacities in the ostensive group as compared to the non-ostensive group.

Method

Participants
A total of 41 healthy full-term infants participated in this experiment (22 females). The mean age at the time of testing was 10 months (range: 9 months 7 days to 10 months 13 days). The infants were recruited from a local list of families who expressed interest in participating in the study. Parental consent was granted before observing the infants.

Materials
We chose a target action that is rarely successfully performed spontaneously at the age tested (Fagard, 1998). It consisted in pulling a tube out of a container with one hand while holding the container with the other hand, thus requiring bimanual coordination of complementary movements. The transparent plastic cylindrical tube (12 cm long × 1.5 cm diameter) with an orange cap was inserted into a wooden container 9 cm long × 2.5 cm wide.

Procedure
Testing occurred in the university infant testing room. The infant sat at a table on a parent's lap. Parents were asked not to interfere with their infants' activity. Once the infants were judged to be accustomed to their surroundings and comfortable, testing began. Infants were randomly assigned to one of the two groups: a non-ostensive group (n = 20) where infants observed a movie of an adult performing the target action; and an ostensive group where infants observed a movie of an adult looking at them and addressing them in infant-directed speech saying "Hi baby, look!" before performing the target action (n = 21). Videos were used to ensure that ostensive cues were comparable for all participants. The experimenter who modelled the target action was a stranger to the infants, and different from the experimenter testing the infants.
In both groups, the model repeated the demonstration three times in a row for a total duration of approximately 12 seconds. The video was displayed on a 17" LCD screen placed on the table at 70 cm from the infant. In both groups, the video began with an attractive image of a cartoon character with music to draw infants' attention toward the screen.
Each infant was assigned randomly to one of the two groups, and went through four trials. The first was a spontaneous trial where infants played with the object during one minute of free manipulation. The second, third and fourth trials were test trials, each consisting in one minute of manipulation and each preceded by the same video demonstration, corresponding to the infant's group (ostensive or non-ostensive). We decided to show each infant the demonstration three times because of the video deficit effect observed in many other studies: additional exposure is needed when the demonstration is presented on video rather than live (Barr, Dowden, & Hayne, 1996; Barr & Hayne, 1999; Barr, Muentener, Garcia, Fujimoto, & Chavez, 2007). During the demonstration, the experimenter stood behind the infant and the parents, and the real object was put out of the infant's sight. During testing, the experimenter stood facing the infant and handed the object to her. A video camera recorded the infant's behaviour during the whole experiment. The whole session lasted a maximum of 10 minutes.

Eye Tracking
A Tobii X120 eye tracker and a screen were placed at a distance of 70 cm from the infant's eyes. Gaze direction was recorded using the Tobii Studio program. The infant's line of gaze was computed by the eye tracker based on the pupil-corneal reflection at a sampling rate of 120 Hz.
The experiment started with a calibration. The experimenter turned on the calibration stimulus, a bouncing ball, whenever the infant was looking at the screen. Five points of calibration were used, one at each corner of the screen and one at the centre. If the infant looked away during the calibration, an animated stimulus popping on the screen was used to redirect the infant's gaze toward the screen, and the experimenter calibrated the missing points.

Coding

Eye Tracking Analysis
Data from the eye tracker were analyzed using the Tobii Studio software. Fixation times were first calculated on three areas of interest (AOI): face, tube and container. At the beginning of the video, the tube is inside the container and only the cap of the tube is visible. The model holds the container still during the demonstration and pulls the tube out of it in a linear movement. As the tube and the container are both part of the same object, we decided to pool the two parts together in the same area of interest, called "object". Thus, two areas were ultimately considered: the face and the object (tube + container). In the two groups, the face occupied 1.32% of the screen and the object 0.73%. Fixations away from both the face and the object were considered to be out of areas of interest (OAOI). Fixation points were easy to code on relatively static areas like the face and the container. To code dynamic areas like the tube, we did a frame-by-frame analysis to mark each fixation's correspondence to the movement of the tube.
We analyzed the following data: total fixation time on the demonstration (including both AOI and OAOI fixation times), and fixation time on each AOI.
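As an illustration of how looking time per area can be derived from raw gaze samples, the following is a minimal sketch rather than the procedure run in Tobii Studio. It assumes static rectangular AOIs in screen pixels and counts raw 120 Hz samples instead of filtered fixations; the `Rect` class, the function name, the input format and all coordinates are hypothetical. A moving AOI such as the tube would need one rectangle per video frame.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 120  # sampling rate reported for the eye tracker


@dataclass(frozen=True)
class Rect:
    """Axis-aligned AOI in screen pixels (hypothetical coordinates)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def looking_time_per_aoi(samples, aois, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return seconds of gaze in each AOI, plus 'OAOI' and 'total'.

    samples: iterable of (x, y) gaze points; None marks a lost sample.
    aois: dict mapping AOI name to Rect; the first matching AOI wins.
    """
    counts = {name: 0 for name in aois}
    counts["OAOI"] = 0
    for sample in samples:
        if sample is None:  # tracker lost the eyes: not counted at all
            continue
        x, y = sample
        for name, rect in aois.items():
            if rect.contains(x, y):
                counts[name] += 1
                break
        else:  # on screen but outside every AOI
            counts["OAOI"] += 1
    times = {name: n / sample_rate_hz for name, n in counts.items()}
    times["total"] = sum(times.values())
    return times
```

Here the total corresponds to the total fixation time on the demonstration (AOI plus OAOI), and lost samples are excluded from every measure.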
As mentioned in the procedure section, each demonstration was repeated three times on each trial, for a total duration of approximately 12 seconds. Thus, each demonstration lasted approximately four seconds. We first analysed data separately for the three demonstrations. We found no significant difference in AOI and OAOI fixations between the three demonstrations, and thus, to simplify the results, we present the mean AOI and OAOI fixations over the three demonstrations in the first trial. We had eye tracking data for all three trials, but because of substantial data loss from the second and third trials, only the eye tracking data from the first trial will be presented here. This was because infants were more distracted in the second and third trials than in the very first trial where attention was at its maximum.
When the effects were not significant, we calculated the effect size using Cohen's d (Cohen, 1977).
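For readers unfamiliar with this measure, a minimal sketch of Cohen's d for two independent groups follows. It assumes the pooled-standard-deviation form of d; the text does not state which variant was used, and the function name and input format are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD.

    Assumption: pooled-SD variant; each group needs at least two values.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)
```

Applied to two lists of per-infant fixation times, a value near 0.2 is conventionally read as a small effect, 0.5 as medium and 0.8 as large.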

Behavioral Analysis
The video recordings were coded by two independent observers. Infants' spontaneous activity and behaviour after each demonstration was coded in relation to the target action. A behaviour was coded as the target action if the infant removed the tube from the container bimanually (holding the container with one hand and pulling the tube with the other hand). Non-target actions with the tube included shaking the tube, putting it into the mouth, or striking the table with it. If by chance these manipulations led to the tube leaving the container, or if the infants pulled the tube out of the container unimanually, which happened only three times, we gave the object back to the infant to check whether the action would be intentionally repeated. If the infant re-enacted the action bimanually, it was coded as a success.
For each group, the arcsine transformation of the percentage of infants who produced the target action during spontaneous activity and during the test trials following demonstration was compared. If infants imitatively learned the target action, then a significant increase was expected in the percentage of infants producing the target action after the demonstrations as compared to the spontaneous trial. There was 100% agreement on the possible outcomes between the two observers.
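The arcsine transformation mentioned above can be sketched as follows. This assumes the common arcsine square-root form, asin(sqrt(p)); some authors use twice this value, and the text does not say which form was applied. The function name is hypothetical.

```python
from math import asin, sqrt


def arcsine_transform(proportion):
    """Arcsine square-root transform of a proportion in [0, 1], in radians.

    Assumption: the asin(sqrt(p)) form; it stretches values near 0 and 1
    so that the variance of proportions is roughly stabilized.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return asin(sqrt(proportion))
```

A percentage is divided by 100 before being transformed, so a success rate of 50% maps to asin(sqrt(0.5)), roughly 0.785 radians.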

Results

Eye Tracking
We obtained eye tracking data for 37 out of the 41 infants (20/20 in the non-ostensive video group and 17/21 in the ostensive video group); data from the remaining four infants were lost because of technical problems with the eye tracker during the experiment.
We will first present fixations on the screen, then fixations on the areas of interest: the face and the object (AOI), as well as fixations out of areas of interest (OAOI).

Total Fixation Time on the Demonstration
The time infants spent looking at the screen in the non-ostensive group and the ostensive group was 3.7 seconds (SD = 0.7) and 3.5 seconds (SD = 1.2) respectively. An ANOVA on fixation time with group as an independent measure showed no main effect of group. Thus, infants in both groups looked equally at the overall demonstration.

Fixation Time on AOI and OAOI
As can be seen in Figure 1, infants in the ostensive group looked less out of the area of interest (OAOI) than infants in the non-ostensive group. In both groups, infants looked more at the model's face than at the object. However, in the non-ostensive group, infants looked more than twice as much at the face (2.06) as at the object (0.8), whereas in the ostensive group, the difference between time of fixation on the face (1.78) and on the object (1.34) was much smaller. An ANOVA with fixation time on the face, on the object and OAOI as dependent measures and group as independent measure showed a main effect of group (F(2, 32) = 3.9; p = .01). A post hoc LSD test showed that the main effect was due to the difference in fixation times on the object (p = .03) and OAOI (p = .01). There was no significant difference for fixations on the face. A one-sample t-test conducted separately for each group on time spent looking at the face and at the object showed that the difference was significant in the non-ostensive (t(18) = 3.19, p < .01) but not in the ostensive group.
Thus, infants in the ostensive group looked less outside the areas of interest and fixated the object more than infants in the non-ostensive group.

Percentage of Infants Producing the Target Action
The percentage of infants who produced the target action in the spontaneous and test trials is presented in Table 1. A generalized linear model was used to compare the percentage of infants who spontaneously produced the target action, and no effect of group was found. This suggests that the groups were equivalent in terms of spontaneous manipulation, and therefore can be compared for the trials after demonstration.
To check for an effect of ostensive cues on performance after demonstration, we compared the spontaneous trial with the three trials after demonstration by performing a 4 (trial, repeated measures) × 2 (group) ANOVA on the arcsine transformation of the percentage of infants who produced the target action. We found a main effect of trial (F(3, 114) = 8.6; p < .01), no main effect of group, and no Group × Trial interaction. The percentage of infants who produced the target action increased significantly after demonstration compared to the spontaneous trial. A post-hoc LSD analysis indicates that the trial effect is due to a significant change between the spontaneous trial and the first test trial in the non-ostensive group (p = .01), and a significant change between the spontaneous trial and the third test trial (p = .002) in the ostensive group.
Thus, infants improved their performance right after the first set of demonstrations in the non-ostensive group, but only after the third set of demonstrations in the ostensive group.

Discussion
The goal of this experiment was to investigate how ostensive cues modify infants' visual attention to demonstrations, and the extent to which this enhances performance. Our hypothesis was that the role of ostensive cues provided before demonstration is to orient infants' attention to the actions to be learned. In the absence of ostensive cues, instead of looking at the demonstration, infants would be attracted by salient targets. In particular, the face is known to attract infants' attention when observing a complex scene as early as 3 months of age (Frank, Vul, & Johnson, 2009; Henrichs, Elsner, Elsner, & Gredebäck, 2012). We tested our hypothesis using gaze direction measurement to examine infants' looking behaviour while they were shown a demonstration of a target action, either preceded by ostensive cues or not.
Eye tracking results show that infants spent at least half of the time of the demonstration looking at the face of the model whether ostensive cues were provided or not. One interpretation could be that infants are seeking information about the novel task by looking at the model's eyes to establish the direction of her gaze, or through her emotional expressions. This behaviour, known as social referencing, is typically seen in ambiguous situations, when strangers are present for example (e.g., Feinman & Lewis, 1983). Social referencing studies demonstrate that infants look at adults and use some of the ostensive cues that adults provide to guide their behaviour. Indeed, our results confirm these conclusions: even though infants in the ostensive group looked at the object manipulated, most of them first made eye contact with the model before directing their gaze to the object, showing social referencing.
Our eye tracking data also show that infants looked more at the targeted object in the presence of ostensive cues than in their absence, thus confirming our hypothesis. In addition, infants in the ostensive group looked less outside the areas of interest (face and object) than infants in the non-ostensive group, suggesting that ostensive cues help infants focus their attention on the demonstration. These results are in accordance with studies on joint attention showing that infants look at a manipulated object more if the experimenter provides ostensive cues such as eye contact (Senju & Csibra, 2008; Senju, Csibra, & Johnson, 2008).
Considering these eye tracking results as well as studies showing the positive effect of ostensive cues on learning, we expected better imitative learning performance when the experimenter provided ostensive cues before performing the demonstration. Surprisingly, we found instead that infants in the non-ostensive group learned the target action significantly faster than infants in the ostensive group.
Even though these results may initially seem surprising, they may be partly explained by a video effect. Indeed, 10-month-olds may be surprised to see an adult on a video making eye contact and addressing them with infant-directed speech. Some studies have shown that infants and older children do not always believe that characters on a screen can engage in real communicative interaction (Claxton & Ponto, 2013) and do not always use information from videos to solve a real-world problem. For example, Troseth, Saylor, & Archer (2006) showed that 2-year-old children who were told face-to-face where to find a hidden toy found it, but children who were given the same information by a person on video did not. In the same study, children who engaged in a 5-minute contingent interaction with a person (including social cues and personal references) through closed-circuit video before the hiding task used information provided to find the toy. Taken together, these studies suggest that the video effect is due to a lack of interaction between the experimenter and the infant. Indeed, the results of the two additional test trials in our study show that the performance of infants in the ostensive group improved gradually over the second and third trials, and their results eventually became comparable to those of the infants in the non-ostensive group. These results favour a surprise effect that fades away with repeated exposure to the video and to ostensive cues, leading to better performance. It would have been interesting to see how looking behaviour changed across trials.
Another non-exclusive explanation may contribute to explaining this apparent contradiction between the greater attention that infants in the ostensive group pay to the object and their less successful performance in comparison to infants in the non-ostensive group. It may be that social and non-social cognition depend on separate cognitive systems, as some authors have claimed (Gelman & Spelke, 1981; Legerstee, 2006; Spelke & Kinzler, 2006). Thus, when the social system faces a load of ostensive cues, infants may need time to process the social information at the cost of neglecting the cognitive aspects, in this case pulling apart the tube and the container using bimanual coordination. The results fit with this alternative explanation, given that infants in the ostensive group needed more time to succeed at performing the task than infants in the non-ostensive group: this may reflect the time needed to process the social information provided by the experimenter. Thus, it would be interesting to test this hypothesis by varying the social and the cognitive loads in a single experiment, to see whether a trade-off could be observed between the two systems.
Finally, even though the rate of success after the first demonstration was higher in the non-ostensive group, it remains low, since only 20% out of the 75% of infants who failed spontaneously reproduced the target action after demonstration. Two reasons may explain this low success rate. First, some studies have shown that 10-month-old infants have limited imitative learning capacities, and it is not until 12 to 15 months of age that infants begin to learn novel tasks by imitation (Elsner, Hauf, & Aschersleben, 2007; Esseily, Nadel, & Fagard, 2010; Fagard & Lockman, 1998). Second, even though demonstration via video has been tested in previous studies (Esseily & Fagard, 2012), others have shown a video deficit effect (Barr, Dowden, & Hayne, 1996; Barr & Hayne, 1999; Barr, Muentener, Garcia, Fujimoto, & Chavez, 2007; Zack, Barr, Gerhardstein, Dickerson, & Meltzoff, 2009). This effect may have contributed to the low success rate.
In conclusion, this is the first eye tracking study to show that ostensive cues can serve as a pointer directing infants' attention to important elements of a demonstration. However, when ostensive cues are provided, infants may be "distracted" by the social information and ignore the cognitive task. This might be particularly true at young ages when infants' social and cognitive capacities are limited. Thus, it would be interesting to pursue this study with older infants to see whether resolving the task becomes easier with improvement in the capacity to process social and cognitive stimuli at the same time.

Figure 1. Fixation times on the face, on the object and OAOI as a function of group.

Table 1. Percentage of infants performing the target action as a function of trials and groups.