Reading Aloud Performance and Listening Ability in an L2: The Case of College-Level Japanese EFL Users

Recent advances in neuropsychology have provided evidence of a facilitative role of sensorimotor activity in the development of L2 speech perception. The present study examined the relationships among reading aloud (RA) performance, grammatical knowledge, and listening ability in 31 college-level Japanese EFL users. The results demonstrated highly significant correlations among all the variables, and a subsequent multiple regression analysis indicated that RA significantly accounted for listening ability. Supplementary analyses dividing the participants by listening ability demonstrated that while the significant correlation between L2 knowledge and listening was maintained among less-proficient listeners, it disappeared among proficient listeners; in contrast, a significant correlation between RA and listening performance was maintained in both groups, indicating that production accuracy/fluency still played an important role in advanced L2 listening.


Introduction
The motor theory of speech perception proposed by Liberman, Cooper, Shankweiler, and Studdert-Kennedy (1967), which holds that speech perception depends on access to the speech motor system, and the later work by Liberman and Mattingly (1985), which argues that speech perception is specifically facilitated by coordination between the perceived gestures of the speaker's vocal tract and matching intended gestures on the part of the listener (p. 3), have not been greatly influential in research into the L2 listening process. However, recent developments in neuroscience, in particular the discovery of the mirror neuron system, have prompted a reconsideration of what the motor theory implies for the mechanism of speech perception. This system was first reported by a research team led by Giacomo Rizzolatti, which, while studying the activation of the premotor cortex in macaque monkeys during the perception of hand movements, unexpectedly found neurons that fired both when the monkeys performed an action themselves and when they watched a researcher performing the same action (e.g., di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992). This finding has provided evidence that "in addition to action observation eliciting concurrent performance of that same action, the performance of an action influences the concurrent perception of that action" (Oberman, Winkielman, & Ramachandran, 2007: p. 167).
Further, mounting empirical evidence from speech-related brain imaging studies has indicated the facilitative involvement of specific motor circuits during speech perception (e.g., Casserly & Pisoni, 2010; Gandour et al., 2007; Iacoboni, 2008; Skipper, Nusbaum, & Small, 2005). This insight is leading to a critical paradigm shift in our understanding of the L2 listening process: from the longstanding view of a perception-to-production sequence (e.g., Derwing, Thomson, Foote, & Munro, 2012; Thomson, 2012), which treats perception and production as separate modules, to one in which these two processes interact and facilitate each other (e.g., Casserly & Pisoni, 2010; Cogan, Thesen, Carlson, Doyle, Devinsky, & Pesaran, 2014).
However, the contribution of production performance to listening ability has yet to be examined in an L2 context. The present study thus attempts to directly investigate the impact of L2 production performance, as measured by the accuracy and fluency of reading aloud by Japanese university-level EFL learners, on their L2 listening proficiency, with L2 grammatical knowledge as an additional explanatory variable.

Implications of Neuroimaging Studies
The growing number of neuroimaging studies focusing on the activation of brain regions during listening can provide critical evidence for our understanding of speech perception. More specifically, the discovery of sensorimotor neurons that are active during both action execution and the perception of the corresponding action (e.g., Rizzolatti & Craighero, 2004; Rizzolatti & Sinigaglia, 2008) can yield robust empirical support for the idea that the ability to articulate the target language's sound system has a positive impact on the acquisition of L2 perception ability. Such studies have indicated that the brain areas traditionally linked to speech production (i.e., the pars opercularis and pars triangularis of the inferior frontal gyrus, approximately BA 44/45, or Broca's area) are also activated in speech perception (for an overview of these findings, see Price, 2012). For example, Londei et al. (2010) used functional magnetic resonance imaging (fMRI) to investigate the brain activity of participants passively listening to words, pseudo-words, and reverse-played words in their L1. The results suggested that the reproducibility of an incoming speech stimulus is a critical feature in the regulation of the speech perception network. The authors further proposed that "learning and development of speech production might shape the mapping between sensory and motor maps that later might become useful in predicting and generating hypotheses on the incoming information" (p. 578). Indeed, Meister, Wilson, Deblieck, Wu, and Iacoboni (2007) demonstrated that disruption of the premotor cortex impairs speech perception, and hence that activity in motor areas is causally linked to speech perception.
Turning to L2 sound perception, Wilson and Iacoboni (2006) used fMRI to investigate neural responses to familiar (native) and unfamiliar (non-native) phonemes among adult monolingual English speakers. Their results indicated that the motor areas play an important role in distinguishing native from non-native phonemes and, more interestingly, that the motor system engages in repeated attempts to perceive heard non-native phonemes, leading to greater, and hence likely more costly, motor activity (ibid., p. 322). Callan, Jones, Callan, and Akahane-Yamada (2004) used event-related fMRI to investigate brain activity related to the perception of the English /r/-/l/ phonetic contrast, which does not exist in the Japanese sound system, by native Japanese speakers who were late-onset English learners (that is, who began learning English after childhood). Their findings suggest that, for these Japanese L1 participants, brain regions involved in speech production planning (specifically, Broca's area and the sensorimotor cortex) showed greater differential activity for the perceptual identification of /r/ and /l/ than for that of vowels. Gandour et al. (2007), with a sample of Chinese-English bilinguals with late English onset and upper-intermediate ability (TOEFL scores of 600 or more), provided similar findings for sentence-level prosodic features in English: brain regions responsible for phonological processing and for speech motor planning and execution played an essential role in the performance of an auditory sentence-focus judgment task (specifically, judging sentence-initial vs. sentence-final position of contrastive stress). Similar findings were reported by Wang, Sereno, Jongman, and Hirsch (2003), in which the learning of Mandarin lexical tone by native English speakers was associated with the emergence of new activity in Broca's area. All these findings provide robust support for the assertion that motor areas play a critical role in speech perception.
Equally important, these studies provide evidence for developmental changes in cortical representation as a function of language proficiency. More specifically, greater activity in the relevant brain regions was evident in Japanese L1 speakers than in English L1 speakers during perceptual identification of /r/ and /l/ (Callan et al., 2004), and when participants performed a target task in an L2 (English) as compared to an L1 (Chinese) (Gandour et al., 2007). Notably, the participants in both of these studies were late-onset bilinguals. Furthermore, in Gandour et al. (2007), differential activation in the relevant brain regions was observed for participants with lower L2 proficiency (see also Chee, Hon, Lee, & Soon, 2001; Hasegawa, Carpenter, & Just, 2002; and Xue, Dong, Jin, & Chen, 2004, for similar findings). These findings have important implications for our understanding of how the ability to articulate L2 phonological features accurately (including subvocally) affects L2 listening ability, in particular among less proficient listeners.

Research into the L2 Listening Process
Second language (L2) listening comprehension is a multifaceted, multilayered skill involving various component operations, including but not necessarily limited to phoneme-level perception, word recognition, lexical access, morphological and syntactic processing, activation of prior knowledge, and use of contextual information, all interacting to produce a final representation (e.g., Field, 2008; Rost, 2002; Vandergrift, 2011). Few studies have examined how, and to what relative degrees, these factors explain L2 listening performance. Of the few, one pivotal work is Vandergrift (2006), whose results for a group of adolescent (14- to 15-year-old) English L1 speakers learning French in an L2 setting in Canada indicated that L1 listening ability and L2 proficiency together accounted for about 39% of the variance in L2 listening ability, the former explaining about 14% and the latter about 25%. Another study examining the contribution of L2 proficiency to L2 listening ability is Mecartty (2000), whose multiple regression analysis of data from college-level L1 speakers of English learning Spanish indicated that lexical knowledge, but not grammatical knowledge, significantly predicted listening comprehension, explaining 14% of the total variance.
In general, the broad range of studies examining the impact of L1 literacy and L2 proficiency on L2 reading has yielded outcomes similar to those above for L2 listening: while both L1 reading ability and L2 proficiency play important roles in successful L2 reading, the latter makes a greater contribution among beginning readers, and the predictive value of L1 literacy increases as the learner's L2 reading level advances (e.g., Bossers, 1991). As for the importance of specific components of reading ability for reading ability as a whole, research findings rooted in the concepts of automaticity (e.g., Segalowitz, 2003) and working memory capacity (e.g., Baddeley, 1986) suggest that once readers have reached a certain level of ability in bottom-up processing, encompassing for example orthographic, phonological, and lexical processing, more attentional resources can be allocated to top-down processing (e.g., Crosson & Lesaux, 2010; Farnia & Geva, 2011). The exact level of bottom-up processing ability needed is a function of L2 reading proficiency (e.g., Carrell, 1991), task type (e.g., Bernhardt & Kamil, 1995), task complexity (e.g., Taillefer, 1996), and the linguistic relationship between L1 and L2 (e.g., Koda & Zehler, 2008).
As processes of receptive comprehension, listening and reading share a significant number of component processes, such as decoding, parsing, and meaning-building (e.g., Mecartty, 2000; Vandergrift, 2006). Indeed, findings from L2 listening research suggest that the contribution of basic L2 processing skills, such as efficient phoneme-level recognition and word segmentation, is much greater for beginning listeners than for their advanced counterparts (e.g., French, 2003), particularly when listening to concatenated speech (e.g., Goh, 2000; Graham, 2006). A range of research on listening strategies has indicated that skilled listeners use a wider range of both cognitive and metacognitive strategies, in particular strategies for self-regulating their listening process, such as elaborating, questioning, and monitoring comprehension, and have higher awareness of their own listening problems (e.g., Goh, 2000), whereas less skilled listeners are likely to rely on bottom-up processing such as word-by-word lexical access (e.g., Graham, 2006; Osada, 2001) and on-line translation (e.g., Vandergrift, 2003). Findings from Graham (2006), in which intermediate-level adolescent (aged 16-18) English L1 speakers learning French reflected on their own listening problems, indeed suggest that their listening difficulties appeared to stem from a lack of awareness of the pronunciation and intonation features that are crucial for accurate French perception. Goh (2000) reported that less proficient listeners had considerably greater difficulty in phoneme- and word-level recognition than more proficient listeners. These insights and those cited above collectively suggest that in listening, as in reading, the less effortful bottom-up processing becomes, the more attentional resources can be allocated to top-down processing (e.g., Field, 2008).

L2 Listening Pedagogy
A range of practical teaching techniques has been proposed to develop lower-level processing efficiency in L2 listening, including focused analysis of the target script (Goh, 2002), word-spotting tasks (Al-Jasser, 2008), dictation (e.g., Kiany & Shiramiry, 2002), dictogloss (e.g., Wilson, 2003), exposure to "i-1"-level passages accompanied by their scripts (Hulstijn, 2001), and various other remedial exercises. However, most of these techniques depend on repeated exposure to aural text, sometimes with varied speed control, and they mainly aim to provide learners with opportunities to "accumulate and categorize acoustic, phonemic, syllabic, morphological and lexical information" (Hulstijn, 2003: p. 422). By contrast, the neuroimaging studies cited above provide a robust empirical basis for instructional techniques aimed at developing articulatory accuracy and fluency in order to establish reliable productive, as opposed to receptive, phonology in the L2 (see Walter, 2008, for a similar argument), in particular for beginning L2 users whose L2 pronunciation is not yet well developed.
Although, as several reading researchers have noted, reading aloud has not always been favorably regarded in L2 classrooms, possibly because of "the misuse of the technique [of reading out loud] around the class" (Nation, 2009: p. 66), its usefulness for developing lower-level processing efficiency has been widely confirmed in L2 reading research (e.g., Birch, 2007; Janzen, 2007; Gibson, 2008). In particular, reading aloud helps L2 learners establish accurate phonological representations (e.g., Gibson, 2008) and, because it uses connected texts rather than decontextualized vocabulary items, encourages awareness of not only segmental but also suprasegmental features, such as rhythm, stress, and intonation (e.g., Kato, 2012). Research in this regard among Japanese EFL learners has generally found that reading aloud significantly improves silent reading rate (Suzuki, 1998), reading performance (Miyasako, 2008), and reproduction of key words and phrases (Shichino, 2006). Miyasako (2008), for instance, investigated the effect of six weeks of reading aloud practice on the L2 reading performance of upper-secondary Japanese EFL users, finding that the practice significantly improved phonological decoding and reading comprehension, and that this effect was more pronounced among less proficient readers.
A range of relevant studies has been conducted using the Elicited Imitation Test (EIT) or Elicited Oral Response (EOR) testing, with the aim of exploring the validity and reliability of these measures as assessment tools, particularly for L2 oral proficiency (e.g., Vinther, 2002; Cox & Davies, 2012; Wu & Ortega, 2013). These tests require participants to listen to a number of sentences of either constant or increasing length and then repeat each one as exactly as possible, often after a brief interval of silence; performance on them has been shown to be significantly related to L2 speaking ability. Cox and Davies (2012) examined the relationship between speaking proficiency and several measures among English L2 users with various L1 backgrounds: EOR scores calculated by an automatic speech recognition (ASR) system, and scores on listening, reading, writing, and grammar tests. Although their results indicated significant correlations for all the variables, the highest correlation was found between EOR performance and the listening test. Their EOR scores, however, were based on the number of words repeated correctly enough to be recognized by the ASR system, not on the actual quality of the productions compared with the stimuli. Nevertheless, their conclusion that "listening ability is an important component of both conversational speaking and the ability to process and repeat phrases in a second language" (p. 614) is supported; as argued above, the reverse is also true: repetition ability can be a pre- or co-requisite of listening ability.

Participants
The participants were 33 adult native speakers of Japanese (17 women and 16 men) enrolled in undergraduate and graduate courses in the faculty of education at a university in Japan. Their mean age was 21.7 years (range: 20-24).

Procedure
The tests were carried out in two separate sessions. The listening comprehension test and the grammaticality judgment task were administered in the first session, in which all participants took the tests together in a dedicated language laboratory at the university. After an interval of 4 to 7 days, the reading aloud task was conducted individually in a dedicated room at the university as the second session. The first session lasted about 30 minutes, and the second about 15 minutes per participant.

• Listening Comprehension Test
The material used for this test was adapted from the listening section of an exercise book for the TOEIC Test (Iwamura & Smillie, 2007). The test comprised three parts, each of which contained 15 multiple-choice questions (for a total of 45 questions). The duration of this test was approximately 25 minutes, including time for instructions. The number of correct answers (out of 45) was used as data in the subsequent analyses.

• Grammaticality Judgment Task
For this task, 45 items were developed by the authors. Sentence length ranged from 4 to 13 words, with a mixture of simple, compound, and complex sentences. All the vocabulary and grammatical structures were covered in the national English curriculum at the lower secondary level (i.e., junior high school) or in the first year of the upper secondary (high school) level (corresponding to grades 7-9 and grade 10, respectively, in North American schools). The goal was to provide test sentences of varying difficulty so as to ensure sufficient discriminative power. The grammatical error types tested included number agreement, as in "Kevin has three blue shirt"; declension, as in "Carol is cooking dinner for hers family"; inflection, as in "He wore his new hat yesterday" and "The people looked surprising when they heard her song"; word order, as in "I do not know what do you mean"; word choice, as in "This is a book what I bought in the United States"; and tense, as in "They have known each other for ten years when they got married." Approximately half of the target sentences were grammatically correct.
All the test sentences were presented as a list on a sheet of A4-sized paper; participants were asked to read each sentence silently and judge whether it was grammatically correct, without specifying the nature of any error. They were asked to start from the beginning and to answer as quickly and accurately as possible within a time limit of 5 minutes for the full task. To discourage random guessing, they were instructed to skip a sentence only when they had no idea of the answer, and otherwise to proceed in order down the list.
• Reading Aloud Task
The material was taken from New Edition Unicorn English Course 1 (Ichitani, 2011), an English textbook used in the national curriculum for tenth grade. The test text was 127 words long; its Flesch-Kincaid Grade Level was 8.04 and its Flesch Reading Ease score was 62.4 (Kincaid, Fishburn, Rogers, & Chissom, 1975). The topic of the text was Natsume Soseki (1867-1916), a major Japanese novelist who was also a researcher in English literature. The text briefly presents an episode from Natsume's research fellowship in the United Kingdom from 1900 to 1902, a period widely familiar in Japan. Given this text selection, it is reasonably unlikely that the language level or topic of the text led to significant variation in comprehension among the participants, since all of them had completed the national secondary curriculum in English (covering the vocabulary and grammar used in the text) and Japanese (covering the content). The text is presented in the Appendix.
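The two readability indices reported above are computed from word, sentence, and syllable counts. The following minimal Python sketch applies the standard published coefficients; it assumes the counts have been obtained separately (syllable counting, which readability tools approximate in different ways, is not shown), and the function names are illustrative only.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (standard coefficients)."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from raw counts; higher values mean easier text."""
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts, for illustration only (not the study's text).
print(round(flesch_kincaid_grade(100, 5, 150), 2))  # about 9.91
print(round(flesch_reading_ease(100, 5, 150), 1))   # about 59.6
```

Because both indices depend only on average sentence length and average syllables per word, shorter sentences and shorter words lower the grade level and raise the reading-ease score.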
The overall testing procedure and assessment scheme were adopted from Shimizu (2009). The participants were first asked to read the text silently for 1 minute and then to read it aloud at a normal speed. They were asked to perform an expressive reading that would make the content comprehensible to an imagined audience. Performances were rated on three criteria: segmental features, suprasegmental features, and phrasing. The segmental criterion focused on the correctness of individual phonemes; the suprasegmental criterion focused on sound features such as intonation, rhythm, and sentence-level stress; and the phrasing criterion focused on the correctness of segmenting, or chunking, within each sentence in accordance with meaning. The rating scale for these criteria, adapted from Shimizu (2009, p. 183), is presented in Table 1. The maximum score for each criterion was 5 points, so a total of 15 points was possible for each reading aloud performance. All performances were audio-recorded. Each was scored by two raters, one a native speaker of English with an MA in education and the other a native speaker of Japanese with a PhD in applied linguistics; each had more than 10 years' experience teaching English language teaching (ELT) courses at a Japanese university. The findings of Shimizu (2009) suggest that using two experienced raters with a three-criterion rating scale provides the best reliability and cost efficiency for measuring reading aloud performance (p. 189). The raters in the current study first scored each participant's recorded performance independently. They then met to review all the performances for differences of 2 points or more between their scores on any criterion, and discussed these cases further to reduce the gaps to 1 point or less.
The final scores used in the subsequent analyses were calculated by summing the two raters' scores, and hence could range from 6 to 30.
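The scoring and reconciliation rules described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: it assumes that each criterion is scored on an integer scale from 1 to 5 (consistent with the possible total range of 6 to 30), and the criterion keys and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical criterion keys; each rater scores every criterion from 1 to 5
# (an assumption consistent with a possible total range of 6 to 30).
CRITERIA = ("segmental", "suprasegmental", "phrasing")
MAX_ALLOWED_GAP = 1  # gaps of 2 points or more must be resolved by discussion


def flag_discrepancies(rater_a: Dict[str, int], rater_b: Dict[str, int]) -> List[str]:
    """Return the criteria on which the two raters differ by 2 points or more."""
    return [c for c in CRITERIA if abs(rater_a[c] - rater_b[c]) > MAX_ALLOWED_GAP]


def final_score(rater_a: Dict[str, int], rater_b: Dict[str, int]) -> int:
    """Sum both raters' scores over the three criteria (possible range 6-30)."""
    for rater in (rater_a, rater_b):
        for c in CRITERIA:
            if not 1 <= rater[c] <= 5:
                raise ValueError(f"{c} score out of range: {rater[c]}")
    unresolved = flag_discrepancies(rater_a, rater_b)
    if unresolved:
        raise ValueError(f"Discuss and re-rate before totalling: {unresolved}")
    return sum(rater_a[c] + rater_b[c] for c in CRITERIA)


a = {"segmental": 4, "suprasegmental": 3, "phrasing": 5}
b = {"segmental": 4, "suprasegmental": 4, "phrasing": 4}
print(final_score(a, b))  # 24
```

Refusing to total a pair of ratings until all flagged gaps are resolved mirrors the order of steps described above: independent scoring, discussion of discrepancies, then summation.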
To sum up, the present study examines the explanatory power of two independent variables, namely, reading aloud performance and L2 grammatical knowledge, for L2 listening ability as the dependent variable. Following Shimizu (2009), reading aloud performance was measured in terms of three categories: individual sound-production accuracy (henceforth, segmental features), suprasegmental sound-production fluency (henceforth, suprasegmental features), and chunking (or phrasing) accuracy (henceforth, phrasing). Grammatical knowledge was measured with a grammaticality judgment task. The specific research questions for this study were as follows:
1) How are L2 reading aloud performance, as measured in terms of three components (the production performance of segmental features, suprasegmental features, and phrasing), and L2 grammatical knowledge, as measured by the grammaticality judgment task, related to L2 listening performance?
2) Do the relationships found for the first research question, if any, change as a function of L2 listening proficiency?

Results
Descriptive statistics for all measures for the participants as a whole appear in Table 2. Figures 1-3 present boxplots of the listening comprehension, reading aloud, and grammaticality judgment scores, respectively. Boxplots are used instead of bar plots following Larson-Hall and Herrington (2010), who note that boxplots provide richer information about the dispersion and skewness of scores and the presence of outliers (p. 370). The bold line in each box indicates the median, and the lines extending from the top and bottom of each box (which Larson-Hall and Herrington call "whiskers") show the minimum and maximum scores of the distribution excluding outliers. The ends of the box mark the 25th and 75th percentiles, whose difference is the interquartile range (IQR); scores lying more than 1.5 times the IQR above or below the box are treated as outliers (Larson-Hall & Herrington, 2010: p. 371). As the boxplot for the reading aloud task shows, two participants were outliers on the reading aloud measure; one of them was also an outlier on the listening comprehension test and the other on the grammaticality judgment test. All data for these two participants were therefore excluded from the subsequent analyses, leaving 31 participants. Figure 4 shows a scatterplot matrix of all the variables, with loess curves. Because the sample size of the current study is under 50, the Shapiro-Wilk test rather than the Kolmogorov-Smirnov goodness-of-fit test was used to check the normality of the distributions. The results indicated normal distributions for listening, segmental features, and grammaticality judgment; however, the phrasing scores departed significantly from normality (W = .910, df = 31, p < .05), and the scores for suprasegmental features were marginal (W = .936, df = 31, p = .082). The corresponding scatterplots in Figure 4 show that these two measures were negatively skewed, which is consistent with the rejection of the null hypothesis of normality for the phrasing scores (e.g., Bachman, 2004).
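The 1.5 times IQR boxplot rule described above can be expressed as a short function. This sketch uses Python's `statistics.quantiles` with the `inclusive` method to estimate the quartiles; statistical packages differ in their quartile definitions, so borderline cases may be flagged differently, and the function name and example scores are hypothetical.

```python
import statistics
from typing import List, Sequence


def tukey_outliers(scores: Sequence[float], k: float = 1.5) -> List[int]:
    """Return indices of scores lying more than k * IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, s in enumerate(scores) if s < lower or s > upper]


# Hypothetical scores for nine participants.
scores = [10, 11, 12, 12, 13, 13, 14, 15, 30]
print(tukey_outliers(scores))  # [8]
```

Returning indices rather than values makes it straightforward to drop the same participants from every other measure, as was done for the two excluded participants.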
To address this problem, a log transformation was applied to the scores for all three components of the reading aloud measure; the segmental scores were transformed along with the suprasegmental and phrasing scores to keep the data consistent across the three components. Only the transformed values were used in the subsequent analyses (following, e.g., LaFrance & Gottardo, 2005).
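The text does not specify how the log transformation handled the negative skew. One common approach for negatively skewed scores, shown here only as an assumption and not as the authors' procedure, is to reflect the scores (subtract each from a constant larger than the maximum) before taking logarithms, because a log transformation applied directly to negatively skewed data generally does not reduce, and can increase, the skew. A minimal sketch with a hypothetical function name:

```python
import math
from typing import List, Optional, Sequence


def reflect_log10(scores: Sequence[float], k: Optional[float] = None) -> List[float]:
    """Reflect negatively skewed scores about a constant, then take log10.

    With the default k = max(scores) + 1, every reflected value (k - score)
    is at least 1, so the logarithm is defined and non-negative.
    """
    if k is None:
        k = max(scores) + 1
    if any(k - s <= 0 for s in scores):
        raise ValueError("k must exceed every score")
    return [math.log10(k - s) for s in scores]


# Hypothetical rating totals on the 6-30 scale, bunched near the top.
print(reflect_log10([30, 29, 29, 27, 20]))
```

Reflection reverses the order of the scores (the highest raw score becomes the lowest transformed value), so correlations and regression coefficients computed on reflected values change sign and must be interpreted, or re-reflected, accordingly.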
After the data transformation, Pearson product-moment correlation coefficients were computed to examine the interrelationships among all the measures, including the components of the reading aloud scores. Table 3 shows the results.
Next, a hierarchical regression analysis was conducted to examine the predictive value of the four explanatory variables (L2 grammaticality judgment scores and the three components of reading aloud performance) for L2 listening comprehension performance as the dependent variable. In the first model (Model 1), grammaticality scores were entered first to control for the variance attributable to this variable. The order of entry of the three components in the following three models (Models 2 to 4) was based on the hierarchical level of the phonological unit in focus: the segmental component, focusing on individual phonemes, is at the lowest level; next comes the suprasegmental component, covering rhythm, stress, and intonation; and finally the phrasing component, covering segmentation/chunking and sentential stress. Cook's distance and Mahalanobis distance values were checked for influential outliers in this analysis, but none were indicated: the mean Cook's distance was .075 and the mean Mahalanobis distance was 3.871 (see, e.g., Field, 2005). The variance inflation factor (VIF) was also checked for possible multicollinearity; despite the high intercorrelations among the explanatory variables, the values did not indicate a problem: the highest was 4.186, for suprasegmental features (fluency), in the fourth model (cf. Larson-Hall, 2010; Heiberger & Holland, 2004). Table 4 shows the results of the analysis. The first regression model found L2 grammatical knowledge to be a significant predictor of L2 listening ability, accounting for 40% of the variance: β = .632, t(29) = 4.396, p < .001. In the second model, the predictive value of L2 grammatical knowledge dropped to marginal significance, β = .191, t(28) = 1.798, p = .083, while the segmental component accounted for an additional 39% of the variance in L2 listening, β = .764, t(28) = 7.187, p < .001. The third model indicated that the suprasegmental component accounted for an additional 8% of the variance, β = .502, t(27) = 4.076, p < .001, and the fourth model showed that the phrasing component accounted for an additional 5% of the variance: β = .429, t(26) = 3.688, p < .01.
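Two of the quantities reported above, the R² change at each step of a hierarchical regression and the VIF, can be illustrated with a dependency-free sketch. It fits ordinary least squares through the normal equations, which is adequate for a handful of predictors but less numerically stable than the QR-based solvers used by statistical packages; it does not compute the t-tests or standardized coefficients reported in Table 4, and all names are illustrative.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = _solve(xtx, xty)
    fitted = [sum(bk * col[i] for bk, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r2_changes(ordered_predictors: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Delta R^2 as each predictor enters the model in the given order."""
    changes, entered, previous = [], [], 0.0
    for predictor in ordered_predictors:
        entered.append(predictor)
        current = r_squared(entered, y)
        changes.append(current - previous)
        previous = current
    return changes


def vif(predictors: Sequence[Sequence[float]], j: int) -> float:
    """Variance inflation factor of predictor j: 1 / (1 - R^2 of j on the others)."""
    others = [p for k, p in enumerate(predictors) if k != j]
    return 1.0 / (1.0 - r_squared(others, predictors[j]))


# Hypothetical usage with per-participant score lists, in the study's entry order:
# r2_changes([grammar, segmental, suprasegmental, phrasing], listening)
```

Because each Delta R² is the gain over the model containing all previously entered predictors, the values depend on the entry order, which is why the order of entry was justified above in terms of phonological unit size.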
To investigate the second research question, namely how the relationships between the explanatory variables and L2 listening ability change as a function of L2 listening proficiency, the participants were divided into two groups (proficient and less proficient listeners) on the basis of their listening comprehension test scores. The proficient group consisted of 15 participants with scores of 30 and above, and the less proficient group consisted of 16 participants with scores of 29 and below. Table 5 presents mean scores and standard deviations for each variable by group. Pearson product-moment correlation coefficients were computed to examine how each of the five measures (the three reading aloud components, the total reading aloud score, and the grammaticality judgment score) was correlated with listening comprehension performance; Tables 6 and 7 show the results. A remarkable contrast was found between the two groups: while all the explanatory variables except phrasing showed significant correlations with L2 listening ability among less proficient listeners, none showed a significant correlation among proficient listeners.
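The group split and within-group correlations can be sketched as follows, using the cutoff reported above (scores of 30 and above classed as proficient). The function names and inputs are hypothetical, and significance testing is not shown. One point worth keeping in mind is that each subgroup spans only part of the listening-score range, and correlations computed over a restricted range tend to be smaller than the full-sample value.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_by_listening(listening: Sequence[float], cutoff: float = 30) -> Dict[str, List[int]]:
    """Participant indices for proficient (>= cutoff) and less proficient listeners."""
    return {
        "proficient": [i for i, s in enumerate(listening) if s >= cutoff],
        "less_proficient": [i for i, s in enumerate(listening) if s < cutoff],
    }


def within_group_r(
    listening: Sequence[float], predictor: Sequence[float], cutoff: float = 30
) -> Dict[str, float]:
    """Correlation of one explanatory measure with listening inside each group."""
    groups = split_by_listening(listening, cutoff)
    return {
        name: pearson_r([predictor[i] for i in idx], [listening[i] for i in idx])
        for name, idx in groups.items()
    }
```

Calling `within_group_r` once per explanatory measure reproduces the structure of the two group-wise correlation tables described above.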

Discussion
To answer the first research question, the intercorrelations between the explanatory and dependent variables were first investigated for the participants as a whole. The results revealed that all the explanatory variables were significantly correlated with L2 listening ability. In particular, reading aloud performance, calculated as the total of the three component scores, was very strongly correlated with L2 listening (r = .95, p < .001). The corresponding scatterplots in Figure 4, that is, (LIS, RA) and (RA, LIS), show that most of the data points lie close to the regression line, indicating a nearly linear relationship between the two variables. Although only a limited amount of previous literature has examined the direct relationship between these variables, so that there are very few sources for comparison, this remarkably high correlation offers strong support for the idea of a close link between L2 speech production and perception. A hierarchical regression analysis was then conducted to examine the degree to which L2 listening ability is explained by L2 grammatical knowledge and by each of the three reading aloud components. The results revealed that L2 grammatical knowledge explained 40% of the variance in L2 listening ability, and that after it was accounted for, each of the three components made a significant additional contribution, jointly explaining a further 52% of the total variance.
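The proportions above follow from the reported values by simple arithmetic (rounded figures from the Results; the squared correlation is given only as an approximate indication of the variance shared by the two measures):

```latex
\Delta R^2_{\text{RA}} = .39 + .08 + .05 = .52, \qquad
R^2_{\text{Model 4}} \approx .40 + .52 = .92, \qquad
r^2_{\text{LIS,RA}} = .95^2 \approx .90
```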
This strong connection between L2 speech production ability and L2 listening (e.g., Walter, 2008; Wilson & Iacoboni, 2006) provides solid evidence that accurate and fluent L2 production ability contributes critically to L2 speech perception. This interpretation is reinforced by the growing body of empirical findings supporting a (partial) causal relationship between production and perception, for example, findings of impaired perception following disrupted motor activity (e.g., Meister et al., 2007), a positive impact of non-word repetition practice on auditory activation (e.g., Rauschecker, Pringle, & Watkins, 2008), and improved phonological coding as a result of production practice (e.g., Miyasako, 2008), as well as by the wide range of studies suggesting that articulatory-to-auditory/-orosensory forward mapping (in terms of the level of cortical activation in the relevant brain regions) is a function of L2 proficiency (e.g., Callan et al., 2004; Gandour et al., 2007).
However, the current results also contain a few novel points. The first concerns the sheer predictive value of L2 grammatical knowledge, which accounted for 40% of the total variance in L2 listening. This is much higher than in previous studies such as Vandergrift (2006), where L2 proficiency explained 25% of L2 listening comprehension performance, or Mecartty (2000), where it was not a significant predictor of L2 listening. In prior L2 reading studies, however, estimates of the variance shared by L2 knowledge and L2 reading have been comparable to that found in the current study. For instance, Bernhardt and Kamil (1995) found that L2 proficiency accounted for 38% of the variance, and Lee and Schallert (1997) found that it accounted for approximately 40%. Although a closer look at the relationship between L2 listening and reading is beyond the scope of the present study and its research questions, this contrast may stem from factors such as differences in the participants' linguistic profiles: those in Vandergrift (2006) were learning French in an L2 setting in Canada, and some had been enrolled in French immersion programs or even had francophone parents, which may have reduced the impact of L2 grammatical knowledge relative to that seen in the current participants. Another candidate explanation for the pronounced contribution of L2 grammatical knowledge in this study is the relationship between the participants' general L2 proficiency and their scores on the listening comprehension test in particular.
As mentioned in the literature review section of this paper, the impact of general L2 proficiency on L2 reading ability depends on factors such as the complexity of the reading text, and this pattern may well apply to the present case: the difficulty level of the listening materials used in this study (the practice TOEIC texts) may have been high enough to strain the participants' L2 proficiency. A third possible factor relates to differences in specific L1-L2 relationships: whereas the participants in Vandergrift (2006) and Mecartty (2000) were L1 English speakers learning French or Spanish as an L2, those in the current study were L1 speakers of Japanese learning English, which leads us to speculate that the greater L1-L2 distance may have contributed to the greater impact of L2 proficiency in this case (e.g., Jarvis & Pavlenko, 2007; Koda, 2008). Indeed, the participants in the reading studies mentioned above were either at the beginner level (in Bernhardt and Kamil's study) or had an L1-L2 distance much like that of the present participants (in Lee and Schallert's study, which involved L1 Korean speakers learning English). These arguments are all relevant to the analysis of the second research question and will be returned to later in this section.
The second point of note in the findings, and one more closely related to the aim of this study, is that the predictive value of L2 grammatical knowledge became non-significant, though still marginal (sig. = .083), when the production performance of segmental features was entered into the regression. This may be explained by a range of empirical evidence from cognitive psychology suggesting that the efficiency of encoding and maintaining phonological information bears a causal relationship to overall language acquisition (e.g., Baddeley, Gathercole, & Papagno, 1998), on the assumption that accurate phonological representations entail phonological awareness. In particular, robust L2 research evidence has indicated a significant contribution of phonological awareness and phonological memory to the acquisition of vocabulary (e.g., Ellis & Sinclair, 1996; Service, 1992) and grammar (e.g., Robinson, 1997; Williams & Lavatt, 2005). These empirical findings support the assumption that the predictive power of pronunciation accuracy had already been reflected in the predictive value assigned to L2 grammatical knowledge in regression Model 1. Previous studies of the impact of L2 proficiency on L2 listening performance might have yielded a different picture had L2 pronunciation accuracy been included as an explanatory variable; in this sense, the current study serves as a precursor.
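The mechanism invoked here, in which a predictor loses significance once a correlated variable is entered, can be illustrated with simulated data. The sketch below assumes a hypothetical data-generating model in which a single phonological factor drives both grammatical knowledge and segmental accuracy, while listening depends on segmental accuracy alone; it is not a reanalysis of the study's data, and `ols_t_stats` and `simulate` are illustrative names. NumPy is assumed, and t statistics are reported instead of p-values to avoid further dependencies.

```python
import numpy as np


def ols_t_stats(X, y):
    """t statistics (intercept first) for an OLS regression of y on X."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    df = design.shape[0] - design.shape[1]
    sigma2 = (resid @ resid) / df
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta / np.sqrt(np.diag(cov))


def simulate(n, seed=0):
    """Hypothetical data-generating model with one shared phonological factor.

    Grammar and segmental accuracy both load on the factor, but listening
    depends on segmental accuracy only, so grammar has no unique effect.
    """
    rng = np.random.default_rng(seed)
    phon = rng.normal(size=n)
    segmental = phon + 0.5 * rng.normal(size=n)
    grammar = phon + 1.0 * rng.normal(size=n)
    listening = segmental + 0.5 * rng.normal(size=n)
    return grammar, segmental, listening


if __name__ == "__main__":
    grammar, segmental, listening = simulate(2000)
    t_alone = ols_t_stats(grammar, listening)[1]  # Model 1: grammar only
    t_with_seg = ols_t_stats(np.column_stack([grammar, segmental]), listening)[1]
    print(f"grammar t alone: {t_alone:.1f}; with segmental entered: {t_with_seg:.1f}")
```

Under this model, grammar appears to be a strong predictor on its own (Model 1), but its t statistic collapses toward zero once segmental accuracy is entered, because the variance it shared with listening was carried by the segmental factor.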
The third point of note is the magnitude of the predictive value of production performance on individual speech sounds, which accounted for as much as 39% of the variance, underscoring the impact of accurate production of segmental features on L2 listening ability. This finding invites further speculation: for instance, are orthographic and phonological processing skills, which are presumably reflected in accurate and fluent production ability, involved here? Recall that production performance was measured using a reading aloud task, which inevitably involves orthographic and phonological processing (knowledge of L2 grapheme-phoneme correspondences). A body of empirical evidence has suggested that (visually dependent) acquisition of L2 orthography does modify L2 phonological awareness (e.g., Morais, Bertelson, Cary, & Alegria, 1986) and therefore significantly affects online L2 speech processing (e.g., Dehaene et al., 2010; Nation & Hulme, 2011). This suggests that L2 orthographic experience affects the development of L2 phonological awareness and, therefore, L2 speech perception, and that efficient orthographic-phonological processing based on accurate letter-sound mapping is at least partially reflected in phoneme-level pronunciation accuracy, which may have amplified the predictive value of production performance on segmental features for L2 listening.
A closer consideration from a crosslinguistic perspective may provide further support for the crucial role of lower-level processing skills in the remarkable impact of the segmental feature component on L2 listening. The development of L2 speech production is closely linked to the acquisition of L2 phonological awareness (e.g., Price, 2012), and crosslinguistic studies of how transfer from the L1 sound system affects the acquisition of L2 speech perception have demonstrated that the development of L2 phonological awareness depends crucially on the overlap between pronunciation features in the listener's L1 and L2 (e.g., Jarvis & Pavlenko, 2007). Some studies report that participants had difficulty perceiving distinctions between sound pairs that were phonemic (contrastive) in the L2 but not in the L1 (e.g., Aoyama, 2003; Escudero & Boersma, 2004). Japanese, the L1 of the current participants, has a mora-based rhythm with a consonant-vowel (CV) open syllable as its canonical syllable, whereas English is stress-timed and features a wide variety of syllable types, both open (including CV and V) and closed (such as CVC, CCVC, and CCVCC) (see Laver, 1994; Ohata, 2004, for a detailed review). At the level of segmental features, whereas Japanese distinguishes only five vowels, (Standard US) English distinguishes roughly 15, including diphthongs (e.g., Mannell, Cox, & Harrington, 2009); other varieties may distinguish more (Roach, 2004, for instance, identifies 11 monophthongs and 19 diphthongs in Received Pronunciation). With regard to consonants, English has a considerably wider variety of fricatives and affricates than Japanese. It is reasonable to assume that the distinctions between these sounds are difficult for Japanese L1 learners to perceive.
Although several other factors likely affect crosslinguistic transfer (e.g., phonetic environment, universal phonological constraints), L1-L2 distance in terms of the (supra)segmental sound system may have the greatest impact on the efficiency of learning and the development of L2 phonological representations (e.g., Hancin-Bhatt & Bhatt, 1997), which may be a crucial factor in explaining the pronounced predictive value of production performance on individual speech sounds found in the current study.
Returning to the regression analysis, each of the other two components (the production performance of suprasegmental features and phrasing) was also found to be a significant predictor of L2 listening ability, though less so than the segmental feature component, with the former explaining an additional 8% of the variance and the latter a further 5%. This finding indicates that suprasegmental production efficiency, including intonation, sentence-level stress, and rhythm, and accurate segmenting/phrasing in production contributed significantly to L2 listening. Also noteworthy is that as these components were added to the model, the predictive value of L2 grammatical knowledge for L2 listening remained non-significant and weakened further (sig. = .465 after the suprasegmental component was entered [Model 3] and sig. = .740 after the phrasing component was entered [Model 4]). Because both of these components tap semantic and syntactic processing (e.g., Nuttall, 2005), it seems reasonable to assume that a large part of their explanatory value, as well as that of the segmental feature component, had initially been absorbed into the L2 grammatical knowledge variable. That is, the predictive power that initially seemed to be explained by L2 knowledge proved in fact to be accounted for, step by step, by the three components as each was entered into the regression.
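Taking the percentages reported in this section as successive R² increments, with grammatical knowledge in Model 1 followed by the segmental (Model 2, inferred from the order of entry described above), suprasegmental (Model 3), and phrasing (Model 4) components, the variance decomposition adds up as follows. This is simple arithmetic on the reported figures, not a new result.

```latex
\begin{aligned}
R^2_{\text{Model 4}}
  &= R^2_{\text{GJ}} + \Delta R^2_{\text{seg}} + \Delta R^2_{\text{supra}} + \Delta R^2_{\text{phr}} \\
  &= .40 + .39 + .08 + .05 = .92, \\
\Delta R^2_{\text{RA components}}
  &= .39 + .08 + .05 = .52.
\end{aligned}
```

The increments depend on the order of entry: with correlated predictors, entering the reading aloud components before grammatical knowledge would assign the shared variance to them instead.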
Regarding the second research question, that is, how the respective contributions of the explanatory variables change depending on L2 listening proficiency, the results revealed a distinct contrast between the two groups of participants (defined by their L2 listening comprehension performance). Specifically, while highly significant intercorrelations were maintained for all variables among less proficient listeners (see Table 6), more than half of these correlations (8 out of 15) lost significance among proficient listeners (see Table 7). Even where significance was maintained, it was at lower levels, except for the correlation between listening and reading aloud. Perhaps the most important point here is that while L2 grammatical knowledge was highly significantly correlated with all other variables in the less proficient group, all of these intercorrelations disappeared in the proficient group. (This contrast can be observed in the rightmost and bottom scatterplots in Figure 2, all of which visualize the relationships between grammatical knowledge and the other variables: for each variable, the loess curves become almost flat among proficient performers.) The finding that the correlations of listening with reading aloud and with the segmental and suprasegmental feature components remained highly significant among proficient listeners indicates that these variables remain closely linked even as listening proficiency improves, suggesting that pronunciation accuracy and fluency still play an important role in explaining L2 listening ability among fairly advanced listeners. The non-significance of the correlations between L2 grammatical knowledge and both listening and reading aloud performance among proficient listeners may be explained by a linguistic threshold for L2 listening/reading ability (e.g., Vandergrift, 2006, for listening; Bossers, 1991, for reading).
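The figure of 15 correlations follows from the number of variables in Tables 6 and 7, assuming (as the five measures correlated with listening suggest) that each table covers six variables: listening, grammaticality judgment, the reading aloud total, and the three reading aloud components.

```latex
\binom{6}{2} = \frac{6 \times 5}{2} = 15 \ \text{pairwise correlations}, \qquad
15 - 8 = 7 \ \text{still significant among proficient listeners}
```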
That is, once L2 listeners/readers have attained a certain level of L2 knowledge, that knowledge may no longer be a significant predictor. The finding of Mecartty (2000) that L2 grammatical knowledge was not a significant predictor of L2 listening ability is also compatible with this argument. Indeed, Mecartty argues that the non-significant contribution of L2 grammar knowledge to L2 listening in her study could have been due to her participants being college students with enough listening proficiency to mask the contribution of L2 knowledge (p. 337).
Thus, the overall predictive value of L2 knowledge for L2 listening (as much as 40% of the total variance) stemmed from the strong relationship between these variables among less proficient listeners. Therefore, the argument that factors such as L1-L2 distance or the interaction between L2 proficiency and the difficulty of the listening material may have amplified the impact of L2 grammatical knowledge can apply only to the less proficient group; the proficient participants, in contrast, may have reached a level at which the impact of L2 knowledge is negligible. Importantly, however, the three components of reading aloud remain significantly correlated with listening ability in the proficient group, suggesting that pronunciation accuracy and fluency still matter for these listeners. Additional support for this argument comes from the finding that among proficient listeners, the correlation of the phrasing component with listening is significant at a lower level (r = .64, p < .05) than those of the other two components (r = .68, p < .01, for segmental features and r = .67, p < .01, for suprasegmental features), indicating that phrasing may be more closely linked to grammatical knowledge than the other two components are.
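The significance levels reported above can be checked against the critical value of r for the proficient group. This sketch assumes two-tailed tests with n = 15 (so df = 13) and uses standard tabled t values, converting a critical t into a critical r.

```latex
\begin{aligned}
r_{\text{crit}} &= \frac{t_{\text{crit}}}{\sqrt{t_{\text{crit}}^{2} + df}}, \qquad df = n - 2 = 13, \\
\alpha = .05:\quad r_{\text{crit}} &= \frac{2.160}{\sqrt{2.160^{2} + 13}} \approx .514, \\
\alpha = .01:\quad r_{\text{crit}} &= \frac{3.012}{\sqrt{3.012^{2} + 13}} \approx .641.
\end{aligned}
```

A correlation of .64 thus falls just below the .01 threshold (significant at p < .05 only), whereas .67 and .68 exceed it, which is consistent with the significance levels reported for the three components.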
The present study attempted to examine the relationship between reading aloud (RA) performance, grammatical knowledge, and listening ability in a sample of 31 college-level Japanese EFL learners. The results demonstrated highly significant intercorrelations between all the explanatory variables (grammatical knowledge and the three components of reading aloud) and listening ability. The subsequent regression analyses showed remarkably high contributions to listening from all the variables, with production accuracy of segmental features showing an especially pronounced impact. Supplementary analyses dividing the participants by listening ability revealed a remarkable contrast: on the one hand, a significant correlation between grammatical knowledge and listening was maintained among less proficient listeners but disappeared among more proficient listeners; on the other hand, a significant correlation between RA and listening performance was maintained in both groups, indicating that production accuracy/fluency still plays an important role in L2 listening at an advanced level.

Conclusion
Newborns start to smile at their caregivers around one month after birth (e.g., Bertin & Striano, 2006). Initially, their smiles are merely an innate motor imitation of maternal smiles; only later do they become associated with a feeling of pleasure through face-to-face interaction involving mutually contingent smiling with their caregivers (though exactly when this transition occurs has yet to be clarified; e.g., Wörmann, Holodynski, Kärtner, & Keller, 2012). This "motor imitation first" paradigm can also be observed in the speech-related behavior of newborns, as evidenced by, inter alia, Mampe, Friederici, Christophe, and Wermke (2009), who found that newborns' cry melody was already influenced by native-language intonation patterns in the first week of life, and Chen, Striano, and Rakoczy (2004), who found that newborns 1 to 7 days old were able to produce mouth movements corresponding to both vowel and consonant vocal models; both findings suggest an innate, closely linked auditory-articulatory mapping.
In both L1 and L2 research, explanations of causality in language acquisition have traditionally rested on the largely tacit consensus that "listening is the basis for overall […] proficiency" (e.g., Krashen & Terrell, 1983; Feyten, 1991). L2 studies investigating the impact of perceptual training on production suggest that improvement in L2 listening perception can indeed serve as a "pre-requisite or co-requisite to making connections to meaning and use" (Derwing et al., 2012: p. 252) and thus lead to more accurate production (e.g., Derwing et al., 2012; Thomson, 2012). However, few of these studies have considered the possible involvement of subvocal repetition or rehearsal during or after perception training, which could not have been ruled out as a factor unless, for instance, the training had been conducted under irrelevant background speech, which has been shown to impair verbal recall by interfering with the maintenance of auditorily presented target stimuli (e.g., Salamé & Baddeley, 1982; Kato, 2009a), or under articulatory suppression, which requires repeated articulation of irrelevant speech sounds (such as the words "blah" or "the") during a serial memory task, preventing participants from rehearsing the items to be recalled (e.g., Levy, 1971; Kato, 2009b). The present study thus attempted to investigate whether the "production facilitates perception" paradigm holds in the acquisition of L2 listening ability by examining how L2 reading aloud performance, along with L2 grammatical knowledge, is related to L2 listening ability.
The first correlational analyses of the participants as a whole and the subsequent multiple regression analyses addressed the first research question. The correlational analyses demonstrated highly significant intercorrelations between all the explanatory variables (L2 grammatical knowledge, the three components of reading aloud, and the total reading aloud score) and L2 listening ability. The regression analyses showed remarkably high contributions from all the explanatory variables to L2 listening, with production performance on individual speech sounds showing an especially pronounced impact. Based on previous findings on the relationship between phonological awareness and overall L2 acquisition (e.g., Service, 1992; Williams & Lavatt, 2005), it was also suggested that pronunciation accuracy of segmental features, along with that of suprasegmental features and phrasing (the other two components), accounted for part of the variance in L2 listening initially attributed to L2 grammatical knowledge.
The second correlational analysis, with the participants divided into two groups on the basis of L2 listening test scores, demonstrated a remarkable contrast in the strength of the correlation between L2 grammatical knowledge and L2 listening ability: while this correlation remained highly significant among less proficient listeners, it was non-significant among the more proficient listeners, suggesting that the latter group's L2 knowledge had reached a level at which it no longer affected L2 listening performance, whereas that of the less proficient listeners still contributed significantly to L2 listening ability. In contrast, among proficient listeners, significant correlations were maintained between the three components of reading aloud and listening ability, as was the highly significant relationship between the reading aloud total score and listening ability, indicating that pronunciation accuracy and fluency played an important role in L2 listening ability for both groups.
These findings provide robust support for the argument that establishing pronunciation accuracy/fluency is crucial for the development of listening ability and that this impact of production ability may persist into a fairly advanced stage of L2 listening development, depending in particular on factors such as the participants' L1-L2 relationship and the relationship between their L2 proficiency and the familiarity and difficulty of the listening materials. Moreover, the differences in significance among the three components of reading aloud suggest that the acquisition of pronunciation accuracy is paramount to the successful development of L2 listening ability. This supposition fits well with the concept of the "bottom-up primacy" of successful listening proposed in an L1 context by Marslen-Wilson (1989), in which accurate perception of individual sounds plays a critical role in narrowing down candidate words in the mental lexicon during listening. Wilson (2003) also argued for an important role of bottom-up processing in L2 listening, noting that although compensatory top-down strategies are widely encouraged across all proficiency levels of L2 learners, the ultimate goal in developing L2 listening ability is to rely less on contextual guesswork and more on hearing what was actually said, and further suggesting that "better bottom-up processing ought to lead in turn to better top-down processing, and that teaching should reflect this" (p. 342). The proposal of "bottom-up primacy" by Marslen-Wilson (1989) is also compatible with the theoretical framework presented by Robinson (2005), which uses concentric circles to represent a hierarchy of aptitude factors whose relative contributions to overall L2 acquisition change over the course of development.
Robinson placed ability factors such as processing speed, phonological working memory capacity, phonological working memory speed, and rote memory in the inner circle of the diagram, meaning that they make an initial contribution to L2 acquisition, after which groups of peripheral factors such as task aptitude and pragmatic/interactional abilities/traits may affect overall development (p. 52). Hulstijn (2011) proposed a definition of language proficiency (LP) in which the phonetic-phonological, morphophonological, morphosyntactic, and lexical domains form the core components of LP, whereas areas of a less linguistic or non-linguistic nature, such as strategic or metacognitive abilities, constitute peripheral components (p. 242). These arguments are in line with the idea of an "acquisition hierarchy" in L2 listening development that is adopted in the present study, whereby the acquisition of pronunciation accuracy should come first as a basis for the development of higher-level processing ability.
The findings of the current study have important implications for the usefulness of reading aloud activities in the development of L2 listening. Kato (2012) described a range of reading aloud activities, such as choral/unison reading, in which learners read a text aloud together at the same speed, following a model reading; individual reading, in which learners read the text aloud at their own speed; and other variations such as overlapping or synchronized reading, in which learners read the text aloud while listening to the model reading, trying to keep up as accurately as possible. The shadow talk technique, in which learners "speak out" a text without access to a script, again following a model reading, has also been reported to be useful in establishing an L2 sound repertoire (e.g., Mochizuki, 2004).
Future research should adopt a longitudinal design in order to focus on the cause-effect relationship between long-term reading aloud practice and the development of L2 listening ability, ideally with a larger number of participants than the present study and certainly with more varied L2 proficiency levels and L2 learning contexts. Research examining the long-term effects of practice could also incorporate more refined designs to examine the relationship between training programs targeting particular (supra)segmental articulations and improvements in perception ability.
Given the high levels of intercorrelation found in the present study between reading aloud performance, grammatical knowledge, and listening ability, as well as the previously evidenced strong link between reading aloud performance and reading ability (e.g., Miyasako, 2008), future research should also explore the potential of reading aloud as an effective and practical tool for assessing overall L2 proficiency, especially for screening and placement in various L2 research and educational settings. A variety of assessment tools are used to measure L2 proficiency in SLA research, ranging from major testing systems (e.g., for English, IELTS, TOEFL, and TOEIC) to tests targeting particular linguistic domains (such as the Peabody Picture Vocabulary Test and Nation's Vocabulary Levels Test) and more general testing techniques such as the cloze test and the C-test (see Hulstijn, 2012, for a comprehensive review of the measurement of language proficiency). Elicited Imitation Test (EIT) and Elicited Oral Response (EOR) testing have also been proposed as effective assessment tools, particularly for L2 oral proficiency (e.g., Vinther, 2002; Cox & Davies, 2012; Wu & Ortega, 2013). In contrast to EIT and EOR, which examine the correctness of repetitions of target sentences (on a single four-point correctness scale in the former, and by the number of automatically recognized words in the latter), the scoring procedure for reading aloud performance in the current study is more fine-grained, tapping three different components (the segmental feature component, focusing on individual sounds; the suprasegmental feature component, covering rhythm, sentential stress, and intonation; and the phrasing component), which have been shown to reflect L2 grammatical knowledge as well as the accuracy and fluency of segmental and suprasegmental features.
Furthermore, reading aloud obligatorily involves basic component skills such as phonological and orthographic processing, which have been empirically shown to play important roles in reading development (e.g., Kato, 2009b; see also Nassaji, 2014, for a recent comprehensive review). These features suggest considerable potential for reading aloud as an effective, viable, and practical tool for assessing overall L2 proficiency, covering listening, reading, and speaking, as well as underlying L2 grammatical knowledge. Extensions of this research with more refined designs, more varied participant profiles, and more diverse educational settings are needed to establish whether such a tool can be used validly and reliably.
To sum up, the present study addressed two research questions: 1) How are L2 reading aloud performance, measured in terms of three components (the production performance of segmental features, suprasegmental features, and phrasing), and L2 grammatical knowledge, measured by the grammaticality judgment task, related to L2 listening performance? and 2) Do the relationships found for the first research question, if any, change as a function of L2 listening proficiency? Regarding the first research question, the correlational analyses demonstrated highly significant intercorrelations between all the explanatory variables (grammatical knowledge and the three components of reading aloud) and listening ability. The subsequent regression analyses showed remarkably high contributions from all the explanatory variables to listening ability, with production accuracy of segmental features showing an especially pronounced impact. The second correlational analysis, with the participants divided into two groups on the basis of listening test scores, was carried out to examine the second research question, and the results demonstrated a remarkable contrast: while a significant correlation was maintained between grammatical knowledge and listening among less proficient listeners, it disappeared among more proficient listeners. In contrast, a significant correlation between RA and listening performance was maintained in both groups, indicating that production accuracy/fluency still plays an important role in L2 listening at an advanced level. These findings provide robust support for the argument that establishing pronunciation accuracy/fluency is crucial for the development of listening ability, even at a fairly advanced stage of L2 listening development.