A Corpus Study on “Begin”/“Start” in Academic Writing: A VARBRUL Approach


The present study investigates the factors that affect the choice between the aspectual verbs “begin” and “start”, the factors that affect the choice of their complements, and the possible relation between these two choices in English academic writing. Using the VARBRUL approach, a model is developed that accounts for all three questions. The choice of “begin” or “start” is affected by the preceding syntactic tags; the choice of complement is in turn controlled by the choice of aspectual verb; and both choices are affected by a social factor. In addition, the resulting VARBRUL model can be applied to Natural Language Processing and to the teaching of, or assistance with, English academic writing, which makes a VARBRUL analysis highly economical. Further findings compare the predictions of Logistic Regression, Decision-Tree-based Logistic Regression and Neural Network models on linguistic data.
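A VARBRUL analysis is, in effect, a binary logistic regression whose sum-coded coefficients are reported as "factor weights" on a 0 to 1 scale, with 0.5 meaning no effect. The sketch below illustrates that conversion under strong simplifying assumptions: a single factor (here a hypothetical coding of the preceding syntactic tag), "start" treated arbitrarily as the application value, and no empty cells. With only one sum-coded factor the model is saturated, so the input probability is the inverse logit of the unweighted mean of the per-level logits, and each factor weight is the inverse logit of that level's deviation from the mean. The function names, level labels and counts are invented for illustration and are not data or code from the study. A real multi-factor analysis needs iterative maximum-likelihood fitting, as performed by Goldvarb or Rbrul.

```python
import math
from typing import Dict, Tuple


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def one_factor_weights(
    counts: Dict[str, Tuple[int, int]]
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical helper: VARBRUL-style input probability and factor
    weights for ONE factor.

    counts maps each factor level to (n_start, n_begin). With a single
    sum-coded factor the logistic model is saturated, so no iterative
    fitting is needed. Every level must contain both variants.
    """
    level_logits = {}
    for level, (n_start, n_begin) in counts.items():
        if n_start == 0 or n_begin == 0:
            # A categorical level (a "knockout") has an infinite logit.
            raise ValueError(f"level {level!r} has an empty cell (knockout)")
        level_logits[level] = logit(n_start / (n_start + n_begin))

    # Sum coding: the intercept is the unweighted mean of the level logits,
    # and each level's effect is its deviation from that mean.
    intercept = sum(level_logits.values()) / len(level_logits)
    weights = {lvl: inv_logit(lg - intercept) for lvl, lg in level_logits.items()}
    return inv_logit(intercept), weights


if __name__ == "__main__":
    # Invented counts of ("start", "begin") per hypothetical tag level.
    toy = {"tag_A": (30, 70), "tag_B": (50, 50), "tag_C": (70, 30)}
    input_prob, weights = one_factor_weights(toy)
    print(f"input probability: {input_prob:.3f}")
    for level, w in weights.items():
        print(f"{level}: factor weight {w:.3f}")
```

With these invented counts the factor weights come out as 0.300, 0.500 and 0.700 around an input probability of 0.500: a weight above 0.5 favours "start" and one below 0.5 disfavours it.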

Share and Cite:

Ruan, J. (2014) A Corpus Study on “Begin”/“Start” in Academic Writing: A VARBRUL Approach. Open Journal of Modern Linguistics, 4, 260-274. doi: 10.4236/ojml.2014.42021.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2022 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.