Analysis of Lexical Bundles in Academic Writing among Expert Writers and Student Writers
1. Introduction
English is widely used around the world. Most countries use English as the official language of television programming, and almost half of all emails are written in English. As a common language, English is of great importance to communication among countries, and it remains the dominant language globally. Academic publication not only serves as a measure of a country’s knowledge building and academic achievement, but is also an important criterion for evaluating teaching and research in universities and for deciding whether individuals can pursue further education. In international journals, most papers are written by Western scholars. Although China channels substantial resources and funding into academic writing research and development, it remains challenging to catch up with Western countries in publication in international journals.
Academic English is essential for learners who want to publish articles in foreign journals. Learners may encounter many difficulties in publishing abroad if they do not have adequate knowledge of academic English. For instance, Chinese scholars have to follow Western standards and write in English if they want to achieve their career-related academic goals. In academic English writing, scholars need, on the one hand, to search databases for papers relevant to their research topics and organize their ideas. On the other hand, they have to translate texts from their mother tongue into English, which requires them to employ precise academic vocabulary to express their views. This is challenging not only for Chinese scholars but also for scholars from all non-native English-speaking countries.
Academic English is often used at the postgraduate or doctoral level. Gao Jun & Yang (2018) investigated the structure of four-word lexical bundles and the similarities and differences between the academic texts of science and engineering graduate students and those of native English speakers. Si Yuanyuan (2020) conducted a comparative analysis of the use of four-word lexical bundles in academic papers by Chinese master’s students of English and native English writers. In addition, most researchers have been more concerned with the use of lexical bundles in Chinese students’ academic English writing.
Pan Fan (2016) compared the structure and function of four-word lexical bundles used by native English speakers and Chinese English learners in mechanical journals. From both structural and functional perspectives, Li Mengxiao & Liu (2016) explored the similarities and differences in four-word lexical bundles used by Chinese scholars and native speakers in empirical journal papers in applied linguistics. Hu Yuanjiang et al. (2020) analyzed the structural and functional features of three-word lexical bundles used by Chinese English learners and native speakers. These studies show that non-native speakers lack sufficient knowledge and ability in using lexical bundles, with issues such as overuse, underuse or misuse of certain types of lexical bundles or their variants.
However, little research has compared expert writers and student writers in Western countries, which is a gap that deserves further investigation. The purpose of this study is to explore the use of lexical bundles by expert writers and student writers in Western countries in order to provide insights into improving students’ academic English writing abilities. Hopefully, the findings will contribute to the understanding of the features of lexical bundles and inform the teaching of academic article writing.
2. Literature Review
Academic English (Bailey, 2007; Scarcella, 2003) is often referred to in the literature as “the language of education”, “the language of school/schooling” (Schleppegrell, 2001), or “scientific language”. It mainly refers to language used for acquiring new knowledge and skills or disseminating new information in an educational or academic environment (Chamot & O’Malley, 1994: p. 40).
In academic English writing, the most prominent feature is the use of nouns, especially noun phrases with pre- and post-modifiers (Chen & Wen, 2020). Hyland (2006) pointed out that the most obvious feature of academic style is its high degree of formality, which is mainly realized through extensive nominalization and impersonal structures, and these are often manifested through lexical bundles.
Lexical bundles have been found to be of considerable significance in academic texts. They are not only a means of expressing solidarity with fellow members of the community (Cortes, 2006) but also indicators of credible communal voices (Pang, 2010). However, it is difficult for many Chinese English learners to use lexical bundles as proficiently as native speakers. A large number of domestic writing studies have found that English learners in China have not fully mastered the use of lexical bundles and tend to overuse, underuse and misuse them. Learners’ excessive use of lexical bundles reflects traces of L1 transfer. In terms of the structure of lexical bundles, Wen Qiufang et al. (2003) found that most Chinese students use “we” and “you” as sentence subjects in their writing, which differs significantly from the practice of native English speakers. In addition, students used too many modal verb collocations to express their views, and almost all core modal verbs were involved, which gave the articles a strong emotional color (Wang & Zhang, 2006). From the functional perspective, Zhao Xiaolin and Wei Naixing (2010) found that students used most lexical bundles to express explicit stance, and Zhao (2009) found that students prefer to rely on a few specific lexical bundles when using stance adverbs.
Learners’ insufficient use, in terms of the structure of lexical bundles, is mainly manifested in fewer bundle types, fewer verb collocation types, and fewer formal expressions typical of written language. Ma Guanghui (2009) analyzed students’ timed second-language writing and found that the weaknesses in lexical bundle output include verb bundles containing the past tense, noun and prepositional phrase bundles, and bundles containing appositives or attributive clauses. In their research on active structures headed by “I”, Wang Lifei and Zhang Yan (2006) found that native speakers use a wide variety of verbs with “I”, while Chinese students rely heavily on a limited set of verbs. In terms of function, Sun Li (2020) found that the overall frequency of identity, evaluator identity and interlocutor identity constructed by Chinese postgraduate students in their papers is significantly lower than that of international scholars, which is mainly reflected in less use of judgement markers, appreciation markers, self-mentions and code glosses, but a significantly higher proportion of frame markers and endophoric markers. In addition, Xu Fang (2011) found that learners use too few lexical bundles to highlight the author’s identity and few means of implicit expression (Zhao & Wei, 2010), while students also make insufficient use of stance adverbs in their articles (Zhao, 2009).
Lexical bundles have also been reported to vary across disciplines. Cortes (2004), comparing expressions in history and biology, found that while the former employed only noun and prepositional phrases, the latter made use of a wide range of bundle structures. Another important difference between these two disciplines was the higher frequency of bundles with a hedging and mitigating function in biology texts. Hyland (2008) also examined the use of lexical bundles in four university disciplines and showed that electrical engineering texts used the highest number of bundles, followed by business, applied linguistics, and biology. While electrical engineering and biology used a wide range of passive bundles, the two soft knowledge fields used more prepositional bundles. In terms of function, while the two hard science fields used more research-oriented bundles, business and applied linguistics made wider use of text-oriented and stance bundles.
As briefly reviewed above, the majority of studies have focused on the use of lexical bundles among Chinese students across different disciplines. This study aims to examine how differently lexical bundles are used by expert writers and student writers abroad. It is therefore guided by the following questions:
1) How are lexical bundles used in academic English writing by expert writers and student writers in terms of structure and function?
2) What are the similarities and differences in usage between expert writers and student writers in academic English writing?
3) What are the most frequently used lexical bundles in academic English writing by expert writers and student writers?
3. Methods
3.1. Corpora
This corpus-based study was carried out in May 2023. A corpus approach was chosen because many authentic and representative corpora are available and can be analyzed with corpus software.
Using the search keywords “linguistics” and “language”, we compiled two self-built corpora. The first is the student corpus, which consists of 40 papers downloaded from the British Academic Written English Corpus (BAWE). The second is a control corpus of 20 articles from AntCorGen. The texts in both corpora were selected randomly, which to some extent reduces the risk of chance results in the data analysis.
3.2. Lexical Bundles
3.2.1. Identification
The definition of lexical bundles varies across the literature. Cortes (2004) defined lexical bundles as extended collocations, that is, sequences of three or more words that statistically co-occur in a register. They have also been described as chunks of language of varying lengths with remarkable formal, functional, and statistical attributes (Saadatara, Kiany, & Talebzadeh, 2023).
From the perspective of structure, most four-word lexical bundles contain three-word lexical bundles (Biber et al., 2004; Cortes, 2004). In addition, four-word lexical bundles are used more often than five-word and six-word bundles (Biber et al., 2004) and have a more complete structure and meaning (Biber et al., 1999). Four-word lexical bundles were therefore chosen for the present analysis.
As for the definition of overuse, Chen (2006) and Lei (2012) pointed out that there is no precise definition and that the approach used by previous studies was to compare the “frequency figures to determine the overall usage pattern”. Since the identification of overuse is somewhat arbitrary, the present study adopts a quantitative criterion: a difference of 10 between the frequency of occurrence per million words in the student corpus and that in the control corpus.
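The criterion above can be illustrated with a minimal Python sketch. The function names, the symmetric "underuse" label, and the reading of the threshold as "at least 10" are assumptions made for illustration only, and the counts in the example are hypothetical rather than taken from the study.

```python
def per_million(raw_count, corpus_tokens):
    """Normalize a raw frequency to occurrences per million words."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_count / corpus_tokens * 1_000_000


def usage_label(student_pm, control_pm, threshold=10.0):
    """Label a bundle by the gap between the two normalized frequencies.

    Assumption: a gap of at least `threshold` per million words counts as
    overuse (student higher) or underuse (student lower).
    """
    diff = student_pm - control_pm
    if diff >= threshold:
        return "overuse"
    if diff <= -threshold:
        return "underuse"
    return "comparable"


# Hypothetical counts: 30 hits in a 100,000-token student corpus
# versus 25 hits in a 100,000-token control corpus.
student_pm = per_million(30, 100_000)
control_pm = per_million(25, 100_000)
print(usage_label(student_pm, control_pm))
```

Normalizing before comparing matters here because the two corpora differ in size, so raw counts alone would not be comparable.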
3.2.2. Structural and Functional Classification
Based on previous research (Biber et al., 1999), lexical bundles have generally been identified in two registers: conversation and academic prose. Based on their grammatical structures, the majority of lexical bundles in these two registers were divided into 14 and 12 categories, respectively. The structural classification in the two registers is shown in the following table:
Lexical Bundles in Conversation | Lexical Bundles in Academic Prose
1) personal pronoun + verb phrase | 1) noun phrase with of-phrase fragment
2) pronoun/noun phrase + be + … | 2) noun phrase with other post-modifier fragment
3) active verb phrase | 3) prepositional phrase with embedded of-phrase fragment
4) yes-no question fragments | 4) other prepositional phrase fragment
5) wh-question fragments | 5) anticipatory it + verb phrase/adjective phrase
6) lexical bundles with wh-clauses | 6) passive verb + prepositional phrase fragment
7) lexical bundles with to-clauses | 7) copula be + noun phrase/adjective phrase
8) verb + that-clause fragments | 8) (verb phrase +) that-clause fragment
9) adverbial clause fragments | 9) (verb/adjective +) to-clause fragment
10) noun phrase fragments | 10) adverbial clause fragments
11) prepositional phrase fragments | 11) pronoun/noun phrase + be (+ …)
12) quantifier expressions | 12) other expressions
13) other expressions |
14) meaningless sound bundles |
In terms of the functional classification of lexical bundles, Cortes (2004) divided them into three categories, as follows.
Referential bundles are mainly used as time or place markers, or to introduce quantities or amounts.
Text organizers indicate comparison/contrast or inference or, in the case of framing bundles, identify textual conditions used to interpret prior or forthcoming discourse.
Stance bundles are a category of expressions that convey a certain degree of probability or lack of probability. They are mainly used as hedges, introducing a degree of tentativeness to what is being reported.
3.3. Corpus Extraction
The collected data were processed with corpus software (MAT and AntConc), and the results are presented in tables. To compile the two corpora, we used whole texts and removed author and publication information, tables of contents, acknowledgements and dedications, tables, figures, footnotes, endnotes, references, and appendices.
In the study by Xu Fang (2012), the student corpus comprises 2.7 million tokens, while the expert corpus comprises nearly 2 million tokens. In that study, extracted lexical bundles had to appear 20 times per million words and be present in 5 different texts. Compared with previous research, the corpora in this study are smaller, which makes it necessary to adjust the extraction criteria. Specifically, the extracted lexical bundles need to appear at least 5 times and in 3 different texts to ensure that the results are meaningful.
The specific analysis steps are as follows:
1) Use the corpus keyword analysis software MAT to tag the texts and AntConc (Version 3.5.7) to extract all four-word lexical bundles in the two corpora, then clean the results.
2) Following the previous extraction standards (Biber et al., 2004; Xu, 2012) for high-frequency lexical bundles, adjusted for the smaller corpus size as described above, this paper extracts the lexical bundles that appear 5 or more times and across 3 different texts.
3) According to Cortes’ (2004) classification criteria of structure and function, the four-word lexical bundles in the corpora were classified by the researcher.
4) Based on the structural and functional classification of lexical bundles, this paper conducts a comparative analysis of the characteristics of the high-frequency lexical bundles used by expert writers and student writers.
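The extraction criteria in step 2 can be sketched in a few lines of Python. This is not the AntConc procedure used in the study; it is a simplified illustration that assumes lowercase word tokens, ignores sentence boundaries and tagging, and uses the hypothetical names `tokenize` and `extract_bundles`.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens (a simplification)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def extract_bundles(texts, n=4, min_freq=5, min_range=3):
    """Return (bundle, frequency, range) for n-word sequences meeting both cut-offs.

    frequency = total occurrences across all texts
    range     = number of different texts in which the bundle occurs
    """
    freq = Counter()
    text_range = defaultdict(set)
    for text_id, text in enumerate(texts):
        tokens = tokenize(text)
        for i in range(len(tokens) - n + 1):
            bundle = " ".join(tokens[i:i + n])
            freq[bundle] += 1
            text_range[bundle].add(text_id)
    results = [
        (bundle, count, len(text_range[bundle]))
        for bundle, count in freq.items()
        if count >= min_freq and len(text_range[bundle]) >= min_range
    ]
    # Most frequent first; ties broken alphabetically.
    return sorted(results, key=lambda row: (-row[1], row[0]))


# Toy texts (invented for illustration), with lowered thresholds.
toy_texts = [
    "On the other hand, it works. On the other hand, it fails.",
    "On the other hand, we stop.",
    "It is, on the other hand, fine.",
]
for bundle, count, text_count in extract_bundles(toy_texts, min_freq=4, min_range=3):
    print(bundle, count, text_count)
```

The range cut-off is what keeps a bundle repeated many times by a single writer from being counted as a high-frequency bundle of the whole corpus.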
4. Results and Discussion
As shown in Table 1, the student corpus used in this study is drawn from the BAWE corpus and contains 40 articles with a total of 84,785 tokens, while the expert corpus is drawn from the AntCorGen corpus and contains 20 articles with a total of 114,311 tokens. All texts were retrieved with the keyword “language” and obtained by random sampling.
Table 1. Lexical bundles statistics of the corpora.
Corpus | Article Number | Tokens
Students | 40 | 84,785
Expert | 20 | 114,311
4.1. Structural and Functional Characteristics of High-Frequency Lexical Bundles in Expert Corpora
According to the statistics, 49 academic four-word lexical bundles appear at least 5 times and across 3 different texts in the expert corpus. The structure is dominated by prepositional phrases and other expressions, and the largest share of these lexical bundles function as text organizers.
According to the structural classification of academic lexical bundles by Cortes (2004), the 49 high-frequency lexical bundles in applied linguistics can be divided into 3 categories. As shown in Table 2, prepositional phrases account for 36.8% and other expressions for 32.6%; the latter include “(noun phrase/pronoun/adjective) + verb + (complement)” lexical bundles, “it + be verb + adjective + (clause fragment)” lexical bundles, “(modality) be verb + noun-complement/shape bundle” and verb-centered phrases. Finally, noun phrases account for 30.6% of the total.
Based on the functional classification of lexical bundles (Cortes, 2004) and as shown in Table 3, the high-frequency lexical bundles used by expert writers are distributed as follows: referential bundles (40.8%), text organizers (48.9%), which cover four functions (contrast/comparison, inference, focus and framing), and stance bundles (10.3%).
Table 2. Structural classification of the two corpora.
Category | Student (n) | Student (%) | Expert (n) | Expert (%)
Noun phrases | 11 | 26.1% | 15 | 30.6%
Prepositional phrases | 16 | 38.1% | 18 | 36.8%
Other expressions | 15 | 35.8% | 16 | 32.6%
Table 3. Functional classification of the two corpora.
Category | Student (n) | Student (%) | Expert (n) | Expert (%)
Referential bundles | 18 | 42.8% | 20 | 40.8%
Text organizers | 11 | 26.2% | 24 | 48.9%
Stance bundles | 13 | 31.0% | 5 | 10.3%
4.2. Structural and Functional Characteristics of High-Frequency Lexical Bundles in Student Corpora
The comparative analysis of the two corpora shows that the total number of lexical bundles extracted from the student corpus is 42, fewer than the number used by expert writers. As shown in Table 2, noun phrases account for 26.1%, while prepositional phrases and other expressions account for 38.1% and 35.8%, respectively.
In terms of structure, the proportions of prepositional phrases used by student writers and expert writers are very similar (a difference of 1.3 percentage points). The most obvious difference between the two groups lies in noun phrases (4.5 percentage points): student writers use a narrower range of noun phrases than expert writers.
In terms of function, underuse and overuse in the student corpus are more prominent. As shown in Table 3, the lexical bundles that function as text organizers are insufficient in the student corpus, with a share about 22.7 percentage points lower than that of expert writers. This indicates that the student writers are unable to fully highlight the key points of their papers, and that they rely heavily on only a few lexical bundles to draw readers’ attention to the content on which they intend to elaborate.
In addition, stance bundles are relatively overused, indicating that there are too many subjective expressions in students’ academic English writing, which lowers the objectivity of the articles.
Compared with text organizers and stance bundles, the shares of lexical bundles that function as referential expressions are similar, with a difference of 2 percentage points between the two corpora. This shows that both student writers and expert writers place considerable emphasis on limiting conditions in their papers and want to give a clearer explanation of the content and register to make it easier for readers to understand.
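The gaps quoted in this section can be checked with a short Python sketch that subtracts the expert share from the student share for each category. The percentages are copied as reported in Table 2 and Table 3; the helper name `point_gaps` is an assumption made for illustration.

```python
# (student %, expert %) as reported in Table 2 and Table 3.
structure = {
    "Noun phrases": (26.1, 30.6),
    "Prepositional phrases": (38.1, 36.8),
    "Other expressions": (35.8, 32.6),
}
function = {
    "Referential bundles": (42.8, 40.8),
    "Text organizers": (26.2, 48.9),
    "Stance bundles": (31.0, 10.3),
}


def point_gaps(table):
    """Student share minus expert share, in percentage points."""
    return {
        label: round(student - expert, 1)
        for label, (student, expert) in table.items()
    }


for name, table in (("Structure", structure), ("Function", function)):
    print(name)
    for label, gap in point_gaps(table).items():
        # A positive gap means the student share is larger.
        print(f"  {label}: {gap:+.1f} points")
```

Reporting the gaps in percentage points rather than percent avoids reading a figure such as 22.7 as a relative change.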
4.3. Frequency Analysis of Student Writers and Expert Writers
Comparing the frequencies and the shared lexical bundles in the two corpora, the researcher found that the overall frequency of bundles used by students is lower than that of expert writers. Five shared lexical bundles were found: “can be found in”, “in the case of”, “it is important to”, “on the other hand” and “the fact that the”. Of these, “the fact that the” is the only one that appears among the top 10 most frequently used lexical bundles of both expert writers and student writers, as shown in Table 4 and Table 5.
Table 4. Frequency of lexical bundles of expert corpora.
Rank | Lexical Bundles | Frequency | Range
1 | with respect to the | 21 | 9
2 | are more likely to | 19 | 3
3 | the total number of | 18 | 3
4 | as a function of | 17 | 9
5 | in the case of | 16 | 7
6 | it is important to | 13 | 7
7 | is the number of | 12 | 4
8 | the fact that the | 12 | 7
9 | the size of the | 10 | 4
10 | a function of the | 9 | 7
Table 5. Frequency of lexical bundles of student corpora.
Rank | Lexical Bundles | Frequency | Range
1 | to be able to | 20 | 12
2 | on the other hand | 18 | 11
3 | as a result of | 16 | 9
4 | the way in which | 15 | 7
5 | it is possible to | 13 | 7
6 | the use of the | 12 | 4
7 | at the beginning of | 9 | 5
8 | of the target language | 9 | 3
9 | the fact that the | 9 | 8
10 | the meaning of the | 9 | 7
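The overlap between the two top-10 lists can be reproduced with a minimal Python sketch. The bundles and raw frequencies are copied from Table 4 and Table 5, and the token counts from Table 1; the variable names and the per-million normalization step are assumptions added for illustration, not part of the reported procedure.

```python
# Raw frequencies as reported in Table 4 (expert) and Table 5 (student).
expert_top10 = {
    "with respect to the": 21,
    "are more likely to": 19,
    "the total number of": 18,
    "as a function of": 17,
    "in the case of": 16,
    "it is important to": 13,
    "is the number of": 12,
    "the fact that the": 12,
    "the size of the": 10,
    "a function of the": 9,
}
student_top10 = {
    "to be able to": 20,
    "on the other hand": 18,
    "as a result of": 16,
    "the way in which": 15,
    "it is possible to": 13,
    "the use of the": 12,
    "at the beginning of": 9,
    "of the target language": 9,
    "the fact that the": 9,
    "the meaning of the": 9,
}
EXPERT_TOKENS = 114_311   # Table 1
STUDENT_TOKENS = 84_785   # Table 1


def per_million(raw_count, corpus_tokens):
    """Normalize a raw frequency to occurrences per million words."""
    return raw_count / corpus_tokens * 1_000_000


# Bundles that appear in both top-10 lists.
shared = sorted(set(expert_top10) & set(student_top10))
for bundle in shared:
    expert_pm = per_million(expert_top10[bundle], EXPERT_TOKENS)
    student_pm = per_million(student_top10[bundle], STUDENT_TOKENS)
    print(f"{bundle}: expert {expert_pm:.1f}, student {student_pm:.1f} per million")
```

Because the student corpus is smaller, the lower raw count for the shared bundle corresponds to a normalized frequency close to the expert one (roughly 105 versus 106 per million words).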
These results suggest that the lexical bundles used by students were seldom employed by experts. The possible reasons are as follows. First, students do not have as large a vocabulary as expert writers. Second, students are unable to distinguish clearly between academic English and general English. Finally, students are not familiar with the functional characteristics of lexical bundles and cannot use them flexibly in their papers. Therefore, student writers choose phrases that they are familiar with to organize their papers, which is probably also the reason why students rely on a few specific lexical bundles as text organizers.
5. Conclusion
This study found differences in the use of lexical bundles between student writers and expert writers. In terms of structure, the proportion of prepositional phrases is similar, and the most obvious difference lies in the use of noun phrases. In terms of function, the results show a marked difference between the two corpora, especially in the use of text organizers.
Additionally, students overuse stance bundles. It can be seen that Western students still have a gap in their mastery of academic English writing, and their excessive use of subjective expressions calls the objectivity of their texts into question. They need to further improve their understanding of lexical bundles.
However, when the differences from experts in the use of lexical bundles were compared for Western students and domestic students (Xu, 2012), it was found that Western students perform better than domestic students in selecting and using lexical bundles. This may be due to the influence of interlanguage and the teaching environment. Teachers should strive to find similarities between the two languages in their teaching, leveraging the positive influence of the mother tongue to promote student learning.
There is still a gap in academic English writing education between domestic students and Western standards. It is recommended that lexical bundle teaching be incorporated into classroom instruction, with examples and explanations that help students understand which lexical bundles are appropriate and how to apply them in suitable contexts. This will enhance students’ ability to structure their writing and choose appropriate words.
The limitation of this study is that the range of texts selected for the student corpus and the expert corpus is relatively narrow. Future research should therefore pay more attention to corpus selection to overcome this limitation. Additionally, this study is limited to the field of English. In today’s era of information integration and globalization, researchers should pay more attention to academic English research in other disciplines and interdisciplinary fields.
Through the study of corpora from abroad, this paper also offers some insights for adjusting the teaching model of academic English writing classes in China. For example, researchers should further investigate teaching methods that enable students to organize their academic English writing as expert writers do and to use lexical bundles appropriately. This could be a direction for future research.