Application of Form-Focused Instruction in English Pronunciation: Examples from Mandarin Learners

Training L2 learners' pronunciation through controlled perception procedures has long been the mainstream of L2 speech pedagogy research. However, efforts have also been made to explore more communicative teaching methods. The current study presents a paradigm that compares a communicative approach to pronunciation pedagogy with form-focused instruction, which is less often used in pronunciation teaching. The authors applied a focus-on-form method to pronunciation teaching, giving participants information about the speech articulators, to help five Mandarin-speaking students improve the accuracy of their production of English /r/. For comparison, another group of five students received communicative training for the same amount of time. In both the pre-test and the post-test, the stimulus words for both groups were embedded in a discourse that participants read aloud. The productions were recorded and evaluated for nativeness through both acoustic analysis and native speaker perception. Results showed that the focus-on-form method was more effective, at least for these participants, in improving segmental pronunciation performance.


Introduction
Pedagogical researchers and practitioners have spent the past century searching for an effective method to tackle pronunciation, the "most difficult and persistent part of language learning". The discussion of pronunciation teaching has passed through the era of the direct method before the reform movement, and developed into behavioral approaches (e.g., audiolingualism, Lado, 1964; total physical response, Asher & Price, 1967; Asher, 1969) and more cognitive approaches (e.g., the silent way, Gattegno, 2010; desuggestopedia, Lozanov & Gateva, 1988). Both kinds of approach held that pronunciation should be taught through imitation and pattern drills, such as minimal pairs and listening to tape recordings. Overall, pronunciation teaching was treated in isolation, as a purely psychological matter requiring either a behavioral stimulus-response cycle or a cognitive-developmental process that proceeds through stages (Stevick, 1976).
However, the rise of Communicative Language Teaching (CLT) brought with it the belief that pronunciation teaching should involve more meaningful contexts (Celce-Murcia, 1996). CLT has also been called "meaning-focused" instruction, a view later challenged by "form-focused" instruction (Focus on Form, FonF), which re-emphasizes the formal, linguistic aspects of language learning (Long, 1998). The debate has been intense for reading and speaking classrooms, but very few studies have empirically compared the two approaches in pronunciation teaching.
According to the Speech Learning Model (SLM; Flege, 1995), L2 sounds that are similar to L1 categories are difficult to acquire, because learners easily overlook the differences and classify the L2 and L1 sounds as equivalent. English has /r/ in its phonological inventory, and /r/ is also an allophone of the Mandarin "ri mu", the "r" initial traditionally transcribed as /ʐ/. This initial has two allophonic variants: an English-like /r/ and a retroflex fricative (Shi, 2012). Because the English phoneme corresponds to only one allophonic variant of an L1 category, L1 influence is likely to make the sound difficult to categorize (Flege, 1987).
This study compares the outcomes of CLT and FonF instruction on Mandarin ESL learners' pronunciation of English /r/, especially in clusters, and aims to show that FonF instruction is more targeted to students' needs and thus more desirable in the pronunciation classroom.

Two Teaching Methods: A Review of Literature
Pronunciation Learning in CLT
CLT was a very influential teaching method (or, more precisely, an approach to teaching) that emphasizes involving real communication in language teaching. In its supporters' view, only communicative teaching gives students authentic input, since the function of language is communicative (Widdowson, 1978). CLT was widely discussed in pronunciation education; a representative statement of its position is the following.
"This focus on language as communication brings renewed urgency to the teaching of pronunciation, since both empirical and anecdotal evidence indicates that there is a threshold level of pronunciation for nonnative speakers of English" (Hinofotis & Bailey, 1980).
The importance of communicative intelligibility was highlighted by Derwing and Munro (2005). Their work argued that learning can only be effective when communication takes place.
Many other studies echoed this idea, and it became an important message of CLT for L2 pronunciation teaching. Morley (1994) pointed out that heavy cognitive loads in independent speech tasks lead to more pronunciation errors, so guided practice and spontaneous speech should be combined in an integrated curriculum. These curriculum guidelines for planning pronunciation courses introduce a practice sequence that moves from dependent practice through guided practice to independent practice; the last stage is represented by extemporaneous speech, which gives students full freedom to generate their own meaningful speech.
Communicative pronunciation practice should be built into the curriculum, and more authentic materials, as well as native-speaker teachers, should be part of the communicative logistics of the classroom (Celce-Murcia, 1996) to approximate real-time language contact.
The core of CLT is creating a genuine context that involves the negotiation of information (Widdowson, 1978). However, this seems useful mainly for information stored in long-term memory that has discrete functions, such as the various semantic and pragmatic functions of an expression. For lower-level elements such as L2 phonology, the "function" is not inherently defined by the meaningful context, and these elements are more easily affected by L1 interference than lexical or syntactic constraints are (Lado, 1961). Moreover, from a UG perspective on L2 acquisition, phonology may also fossilize more easily during development (Krashen & Terrell, 1988).
The proponents of CLT may have already noticed its limitations in guiding pronunciation teaching. Guidelines for CLT-based pronunciation curricula have always blended formal and meaningful tasks. As Celce-Murcia (1996) identified, the major activities in a communicative pronunciation classroom include "listen and imitate, phonetic training, minimal pair drills, contextualized minimal pairs, visual aids, development approximation drills, reading aloud or recitation", which simply echo the methods of the behavioral and cognitive eras. The innovation lies in placing the drills in a real-time communicative framework, as in contextualized minimal pair practice, to stimulate students' pronunciation learning through meaningful contexts.
Interestingly, Foote et al. (2013) recently found, in a corpus-based study, that communicative language teaching can be more effective for pronunciation when real-time contextualized feedback is given to students. Learning was shown to take place when spontaneous recasts were given. Such feedback is in fact a core concept of form-focused instruction, which suggests that this method, though less discussed in the context of pronunciation learning, may have some merit.

Pronunciation Learning in Focus on Form
The primitives of speech are still linguistic: whether those primitives are acoustic (Kuhl, 2000) or gestural (Fowler, 1986), we all rely on the linguistic form of speech sounds to get meaning across.
The FonF method was born from this consideration. Its proponents emphasize classroom feedback in the form of recasts (Long, 1998; Doughty & Williams, 1998). FonF instruction is a method of L2 instruction that focuses on both linguistic forms and communication. It is an alternative to the meaning-focused instruction of the Natural Approach (Krashen & Terrell, 1988), which prohibits direct grammar teaching and promotes only natural input from L2 texts and listening materials. The defect of meaning-focused instruction becomes clear "when classroom second language learning is entirely experiential and meaning-focused" (Doughty & Williams, 1998).
FonF enables students to reflect on the language knowledge they have absorbed, a valuable process of self-learning after class, and it also fosters interaction between instructor and students, because discussing linguistic forms is more inclusive, interesting and inspiring than mere imitation. The FonF method has been widely used for linguistic forms other than phonetics in L2 English teaching. For example, in syntax, instead of only presenting sentences for students to pattern after, instructors often draw on generative grammar to introduce abstract terms such as NP, VP and clauses. However, no previous empirical studies have tested experimentally whether focus on form can be applied to pronunciation instruction. This study was proposed to fill that gap, and it may help broaden the method's application in Greater China.
However, the biggest challenge of the method is avoiding excessive theoretical dogmatism in instruction. Language learners are neither expected to learn, nor (in most cases) interested in, the complex anatomical, phonetic and phonological aspects of speech sounds. Simply giving them form-focused linguistic information can harm their motivation and increase anxiety, which in turn affects learning outcomes (MacIntyre & Gardner, 2006).

Method
Two groups of participants, the control group and the experiment group, received 15 minutes of CLT and FonF instruction, respectively, on English /r/ clusters. Both groups took a pre-test before and a post-test after the instruction. The task was to read aloud a short discourse with the stimuli embedded in it. The productions were examined by both acoustic measurement and native speaker perception.

Participants
Participants were ten adults enrolled in the Master of Arts in Translation program at the Department of Chinese, Translation and Linguistics, City University of Hong Kong (6 females, 4 males; mean age = 25.5). They had over 20 years of English learning experience, having started before the age of 6. They used English as their working language and in daily communication. All were raised in Beijing by parents or caretakers who spoke monolingual Beijing Mandarin. None had exposure to foreign languages other than English. All were right-handed, with no reported hearing or motor-control deficits, and none had prior phonetic or musical training. As controls, two monolingual native English speakers from California, US (1 female, 1 male; mean age = 26.5) also participated and went through the same pre-test and post-test procedure.

Instruction
The two instruction methods were carried out as described below. Both were given in the same classroom for the same amount of time (15 minutes). For form-focused instruction the time was more approximate, because exact timing is difficult to control in this non-syllabus-based method.

• CLT instruction. For communicative language teaching, the plan follows the major activities proposed by Celce-Murcia (1996), applied as far as possible to the teaching of /r/.
Students first read aloud the test material for five minutes, with guidance from the teacher, taking turns in a role play. Next, students do contextualized minimal pair exercises for five minutes. The minimal pairs cover the /r/-/w/ contrast, /r/-deletion and the /r/-/l/ contrast, which have been documented in many studies (Hung, 2002; Chan, 2006; Au, 2002). First, a listening exercise is conducted by playing native English speakers' productions of sentences such as "I would like to buy the toy ring" and "I would like to buy the toy wing", or "Could you help us watch the load?" and "Could you help us watch the road?"
• FonF instruction. For form-focused instruction, recasts were used whenever students had problems pronouncing /r/, because recasting, rather than deliberate syllabus-based learning, is the core of form-focused pedagogy. Nevertheless, a detailed general guideline is given below to summarize how formal instruction may be given effectively when students need it.
Firstly, knowledge of basic articulatory phonetics can help students whose pronunciation is already partly fossilized. Such students may struggle to produce the right sounds because they do not know which variables to control in articulating a sound. If students show problems controlling the articulators for /r/, the following is a sample recast. First, the teacher describes /r/: it is an approximant, and initial /r/ has two variants, curled /r/ and bunched /r/, used by different speakers of English. It involves three major gestures: tongue-tip curling or raising (Tongue Tip), tongue-body retraction (Tongue Body) and lip rounding, and the teacher demonstrates them with his own voice and with a computer-aided demonstration (Browman & Goldstein, 1992). The gestures of /l/ (Tongue Tip and Tongue Body, but with contact at the alveolar ridge) and /w/ (more lip protrusion and no Tongue Tip gesture) are also demonstrated for contrast.
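The contrasts in this sample recast can be organized as a small lookup table that lists, articulator by articulator, how /r/ differs from /l/ or /w/. The sketch below is hypothetical: the names `GESTURES`, `describe` and `contrast` are illustrative, and the table encodes only what the recast states. `None` marks a gesture the recast says is absent, and an articulator left out of a sound's entry is one the recast does not describe.

```python
from typing import Optional

# Articulators named in the sample recast, in the order they are reported.
ARTICULATORS = ("tongue tip", "tongue body", "lips")

# Hypothetical encoding of the recast's gestural descriptions.
# None = the recast says the gesture is absent;
# a missing key = the recast does not describe that articulator.
GESTURES: dict[str, dict[str, Optional[str]]] = {
    "r": {
        "tongue tip": "curled or raised",
        "tongue body": "retracted",
        "lips": "rounded",
    },
    "l": {
        "tongue tip": "contact at the alveolar ridge",
        "tongue body": "gesture present",
    },
    "w": {
        "tongue tip": None,
        "lips": "more protruded than for /r/",
    },
}


def describe(sound: str, articulator: str) -> str:
    """Return the recast's description of one articulator for one sound."""
    spec = GESTURES[sound]
    if articulator not in spec:
        return "(not described)"
    value = spec[articulator]
    return "no gesture" if value is None else value


def contrast(a: str, b: str) -> list[str]:
    """List the articulators on which sounds a and b are described differently."""
    lines = []
    for articulator in ARTICULATORS:
        desc_a, desc_b = describe(a, articulator), describe(b, articulator)
        if desc_a != desc_b:
            lines.append(f"{articulator}: /{a}/ {desc_a} vs /{b}/ {desc_b}")
    return lines
```

For example, `contrast("r", "w")` returns three lines, the first being `tongue tip: /r/ curled or raised vs /w/ no gesture`, which mirrors the order in which the recast contrasts the two sounds.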
Secondly, visual phonetics can make the theory easier to understand. After all, the whole system of phonetics may be too dense to memorize easily, and a purely textual introduction can seem tedious and scattered. A pioneering example is the Spanish pronunciation aid created by Barrutia (1970), which can easily be adapted for English pronunciation learners and incorporated into our instruction activities. Here, to help students understand movements in the vocal tract that are not easily seen, a mid-sagittal MRI display with the places of the gestures highlighted is shown to students. To provide further visual aid, hand gestures can be used. Unlike previous approaches that used arbitrary symbols, we use hand gestures that imitate tongue positions, which may be more iconic and easier for students to recall when they become confused again in later production (see Figure 1).
Thirdly, basic knowledge of syllable structure helps students improve pronunciation with more targeted awareness of errors involving the L2 syllable. For example, mainland students tend to insert a schwa within initial consonant-r clusters. This affects /r/ production in the cluster, partly because students lack knowledge of the difference in syllable structure (the syllable of Mandarin being [(C)V(C)] and English being

Stimuli for Tests
Identical stimulus words containing initial clusters were used for both Mandarin and English speakers. The stimuli consisted of three types of words: target words and two types of control words. Target words were real words with initial C-r clusters (e.g., break, treat and great). All stimulus words were of high frequency, chosen from a primary school word list (Graham et al., 1993), to ensure that all participants could read them fluently and easily.
Target words were evaluated on four measures of /r/ quality, three acoustic and one perceptual:
(a) The third formant frequency (F3), a measure used by Lehiste (1962).
(b) The distance between F3 and the second formant (F3-F2), also used to distinguish /r/ from other approximants (Ladefoged & Johnson, 2010).
(c) The duration of /r/, which indicates whether /r/ has been fully articulated.
(d) Native speakers' perception, collected through a binary (0/1) forced choice on whether each sound was native-like.
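The three acoustic measures can be made concrete with a minimal Python sketch. It assumes F2, F3 and /r/ duration have already been measured for each token (the paper does not say with which tool); the record type `RToken` and the function `summarize` are hypothetical names, and the perceptual measure (d) is not modeled here. The 2000 Hz cut-off for a canonical /r/ is the one stated in the figure captions.

```python
from dataclasses import dataclass

# Criterion stated in the figure captions: F3 below 2000 Hz
# indicates a canonical /r/ production.
CANONICAL_F3_MAX_HZ = 2000.0


@dataclass
class RToken:
    """One measured /r/ token (hypothetical record type)."""
    word: str
    f2_hz: float        # second formant of the /r/
    f3_hz: float        # third formant of the /r/, measure (a)
    duration_ms: float  # duration of the /r/, measure (c)

    @property
    def f3_minus_f2(self) -> float:
        """Measure (b): distance between F3 and F2 in Hz."""
        return self.f3_hz - self.f2_hz

    @property
    def is_canonical(self) -> bool:
        """Apply the 2000 Hz F3 criterion from the figure captions."""
        return self.f3_hz < CANONICAL_F3_MAX_HZ


def summarize(tokens: list[RToken]) -> dict[str, float]:
    """Group means of the three acoustic measures plus the share of canonical tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    n = len(tokens)
    return {
        "mean_f3_hz": sum(t.f3_hz for t in tokens) / n,
        "mean_f3_minus_f2_hz": sum(t.f3_minus_f2 for t in tokens) / n,
        "mean_duration_ms": sum(t.duration_ms for t in tokens) / n,
        "prop_canonical": sum(t.is_canonical for t in tokens) / n,
    }
```

With two made-up tokens (not data from this study), `summarize([RToken("break", 1100.0, 1700.0, 60.0), RToken("treat", 1300.0, 2300.0, 45.0)])` gives a mean F3 of 2000.0 Hz, a mean F3-F2 of 800.0 Hz, a mean duration of 52.5 ms and a canonical proportion of 0.5.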
Filler words were chosen to contrast with the target words in two ways. To contrast C-r clusters with a non-/r/ environment, words with a single stop onset, such as bit, tip and keen, were chosen; to contrast C-r clusters with an environment without an onset plosive, and to see whether cluster /r/ is assimilated to other approximants, words with a single liquid or glide onset, such as rear, win and lean, were chosen. These words were not included in the data analysis.
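The stimulus design can be summarized as a rule over word-initial consonants. A minimal sketch, assuming onsets are supplied as hand-coded phoneme tuples and that the C in a C-r target is a plosive, as in break, treat and great. Reading the "place category" mentioned in the stimulus design as labial, alveolar or velar is also an assumption, and all names are hypothetical.

```python
# Plosives with an assumed place category, in a simplified phonemic notation.
PLOSIVE_PLACE = {
    "p": "labial", "b": "labial",
    "t": "alveolar", "d": "alveolar",
    "k": "velar", "g": "velar",
}
# Liquids and glides used for the second filler type.
APPROXIMANTS = {"r", "l", "w", "j"}


def classify_onset(onset: tuple[str, ...]) -> str:
    """Assign a word to a stimulus type from its word-initial consonants."""
    if len(onset) == 2 and onset[0] in PLOSIVE_PLACE and onset[1] == "r":
        return "target"
    if len(onset) == 1 and onset[0] in PLOSIVE_PLACE:
        return "filler: single stop"
    if len(onset) == 1 and onset[0] in APPROXIMANTS:
        return "filler: single approximant"
    return "unused"


def place_of_target(onset: tuple[str, ...]) -> str:
    """Place of articulation of the plosive in a C-r target cluster."""
    if classify_onset(onset) != "target":
        raise ValueError(f"not a C-r target onset: {onset}")
    return PLOSIVE_PLACE[onset[0]]
```

Under these assumptions, break `("b", "r")` is a target, keen `("k",)` is a single-stop filler, and rear `("r",)` and win `("w",)` are single-approximant fillers.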
All stimuli were embedded in a coherent short passage (see Appendix) so that all participants could read them fluently and naturally in a conversational style, without difficulty decoding the spelling. Some originally included stimulus tokens were excluded because their vowels were pronounced markedly wrong, e.g., "grind" pronounced as [grInd].
We originally designed a set of stimuli with six tokens per place category, varied by vowel. However, it proved extremely hard to control the vowels while keeping word frequency high, so the vowel distribution was not strictly controlled. As a result, there were 55 tokens in total for each speaker in the pre-test and post-test, giving 55 × 10 = 550 tokens for the Mandarin participants, or 55 × 5 = 275 for each instruction group.

Procedure for Tests
All participants were recorded in the Research Laboratory for Phonetics and Cognitive Studies at City University of Hong Kong, at a sampling rate of 44,100 Hz in mono. Participants were instructed to read the passage three times, and for each stimulus the steadiest of the three repetitions was selected for analysis.
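The paper does not define "steadiest". One possible operationalization, offered only as an assumption, is to keep the repetition whose F3 track over the /r/ interval fluctuates least. The function name `steadiest_repetition` and its input format (one list of F3 samples in Hz per repetition) are hypothetical.

```python
import statistics


def steadiest_repetition(f3_tracks: list[list[float]]) -> int:
    """Index of the repetition whose F3 track varies least.

    Assumption (not from the paper): "steadiest" means the smallest
    population standard deviation of F3 samples (Hz) over the /r/ interval.
    """
    if not f3_tracks or any(len(track) == 0 for track in f3_tracks):
        raise ValueError("need at least one non-empty F3 track")
    spreads = [statistics.pstdev(track) for track in f3_tracks]
    # Smallest spread wins; ties go to the earliest repetition.
    return min(range(len(spreads)), key=spreads.__getitem__)
```

For three illustrative tracks `[[1800.0, 1900.0, 2000.0], [1850.0, 1860.0, 1855.0], [1700.0, 2100.0, 1900.0]]`, the function returns 1, i.e. the second reading. Other criteria (for example, formant stability at the midpoint only) would plug in by changing the spread measure.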
Speakers sat in front of the recording apparatus, with the chair height adjusted for comfortable reading. The microphone was placed 10 cm from the speaker's mouth at a 45-degree angle. The stimulus sheet (both pages on a paper stand, 24-point font) was placed in front of the speaker at eye level and could be adjusted by the speaker. Speakers had the chance to try the equipment and familiarize themselves with it. The researcher left the room while recording was in progress. Duration data were not collected for the native English speech for technical reasons, so duration is compared only between the Mandarin groups in the groupwise results.

Difference by Groups
The comparison of pre-test and post-test results in the control and experiment groups is shown in Figures 3, 4 and 5.
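The paper reports significance levels without naming the test. If, as an assumption, each speaker's mean on a measure (for example F3) is compared between pre-test and post-test within a group, a paired t statistic could be computed as in this sketch; `paired_t` is a hypothetical helper built on Python's standard `statistics` module.

```python
import math
import statistics


def paired_t(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Paired t statistic and degrees of freedom for post - pre differences."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

With five speakers per group the test has 4 degrees of freedom. SciPy's `scipy.stats.ttest_rel` performs the same paired test and also returns a p-value.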

General Discussion
Overall, the acoustic comparison of Mandarin and English speakers showed that their productions of /r/ differ significantly. The pre-test and post-test, taken separately, show the same level of significance.
The acoustic comparison of the control (CLT) and experiment (FonF) groups showed different patterns of improvement. In the control group, only the improvement in duration was significant, and F3-F2 was near-significant. However, F3 is the most important indicator of /r/ quality, and a significant change in F3-F2 without a significant change in F3 reflects mainly a change in F2, which corresponds to more global tongue position in the vocal tract and is not a desirable result for improving /r/. The spectral aspect therefore did not improve much in the control group. This is probably because in a communicative instruction environment students are more sensitive to prosodic factors, and hence to temporal rather than spectral cues of speech sounds. Celce-Murcia (1996) describes this as the main effect of applying CLT to pronunciation, and suprasegmental features are indeed more emphasized in CLT-based syllabi (Morley, 1994).
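The step from a change in F3-F2 to a change in F2 can be made explicit. Writing Δ for the post-test minus pre-test difference in a group mean (bars denote group means), the change in the F3-F2 distance splits into the two formant changes:

```latex
\Delta(F_3 - F_2)
  = \left(\bar{F}_3^{\text{post}} - \bar{F}_2^{\text{post}}\right)
  - \left(\bar{F}_3^{\text{pre}} - \bar{F}_2^{\text{pre}}\right)
  = \Delta F_3 - \Delta F_2,
\qquad
\Delta F_3 \approx 0 \;\Rightarrow\; \Delta(F_3 - F_2) \approx -\Delta F_2 .
```

So when F3 does not change reliably, any shift in F3-F2 is essentially a shift of F2 in the opposite direction.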
In contrast, the experiment group improved significantly on all three acoustic measures, indicating full-range improvement in pronunciation. This is probably due to the ability of form-focused instruction to give direct and comprehensive recasts immediately after a pronunciation error (Long, 1998), whether the error is spectral or temporal. Moreover, errors are addressed not just by repeating the sound stimulus, which can only affect the sensory store, but also through linguistic knowledge and articulatory mechanisms that can be stored in the learner's long-term memory, or rote-learned, and serve as self-generated feedback when similar errors recur. FonF can therefore be considered a useful method for teaching speech sounds. In addition, native English speakers' perception of the pre-test and post-test productions improved more significantly in the experiment group than in the control group. Both acoustically and perceptually, then, the FonF method produced more improvement in this case.
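Measure (d) is a binary forced choice, so the perceived accuracy rate plotted in Figure 5 can plausibly be read as the proportion of tokens judged native-like, and a group's improvement as the post-test rate minus the pre-test rate. A minimal sketch, assuming one 0/1 judgment per token (the paper does not say how judgments from several listeners were pooled); the function names are hypothetical.

```python
def accuracy_rate(judgments: list[int]) -> float:
    """Proportion of tokens judged native-like (1) in a 0/1 forced choice."""
    if not judgments or any(j not in (0, 1) for j in judgments):
        raise ValueError("judgments must be a non-empty list of 0s and 1s")
    return sum(judgments) / len(judgments)


def accuracy_gain(pre: list[int], post: list[int]) -> float:
    """Change in perceived accuracy from pre-test to post-test."""
    return accuracy_rate(post) - accuracy_rate(pre)
```

For illustrative judgments `[0, 0, 1, 0]` (pre-test) and `[1, 0, 1, 1]` (post-test), the rates are 0.25 and 0.75 and the gain is 0.5. Comparing the gain across the two groups corresponds to comparing the left and right bar pairs of Figure 5.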
The results show clearly that the FonF method was more effective than CLT here. However, this result must be interpreted cautiously, given the limitations of the procedure.
As noted in the literature review, FonF is inherently a highly spontaneous teaching method that does not follow a pre-designed syllabus, so its effect depends heavily on the instructor's experience and competence. The effect of both methods should therefore also be examined with different instructors. However, owing to limited resources and the difficulty of controlling variation in teaching performance, especially in FonF recasts, which vary with the specific situation, only one instructor, the researcher, taught both groups.

Conclusion
This study has examined the effect of two methods of teaching the pronunciation of /r/ in clusters, from the perspectives of both acoustic quality and perceived native-likeness. At least in the current case, pronunciation teaching was better assisted by form-focused instruction than by a communicative context, because FonF can capture both temporal and spectral errors in speech.

Figure 1. Hand gestures imitating tongue position, performed by the instructor and also to be performed by students: the gesture for /r/ is on the left and for /l/ on the right. The fingers represent the tongue tip and the palm represents the tongue dorsum.

Figure 2. Overall comparison of F3 and F3-F2 values (in Hz) in English and Mandarin speakers' productions (irrespective of group and test). An F3 value lower than 2000 Hz indicates a canonical /r/ production. The blue bar represents F3 and the green bar F3-F2.

Figure 3. Comparison of pre-test and post-test F3, F3-F2 (in Hz) and duration (in ms) values in the Mandarin experiment group's productions. An F3 value lower than 2000 Hz indicates a canonical /r/ production. The blue bar represents F3 and the green bar F3-F2.

Figure 4. Comparison of pre-test and post-test F3, F3-F2 (in Hz) and duration (in ms) values in the Mandarin control group's productions. An F3 value lower than 2000 Hz indicates a canonical /r/ production. The blue bar represents F3 and the green bar F3-F2.

Figure 5. Comparison of native-speaker perceived accuracy rates in the control group's pre-test and post-test (two bars on the left) and the experiment group's pre-test and post-test (two bars on the right).