Is This a Paraphrase? What Kind? Paraphrase Boundaries and Typology


No precise, commonly accepted definition of paraphrasing exists. This is one of the reasons computational linguistics has not achieved real success in handling the phenomenon in its systems and applications. To help overcome this difficulty, this article provides new insights into paraphrase characterization. We first review what linguistics has said about paraphrasing and the new light that computational linguistics has shed on the phenomenon. In light of the shortcomings observed, we study paraphrasing from two perspectives. On the one hand, we set out insights on paraphrase boundaries by analyzing borderline cases and the interaction of paraphrasing with related linguistic phenomena. On the other hand, we present a new paraphrase typology that goes beyond a simple list of types and is embedded in a linguistically based hierarchical structure. This typology has been empirically validated through corpus annotation and through its application in the plagiarism-detection domain.

Cite as:

Vila, M., Martí, M., & Rodríguez, H. (2014). Is This a Paraphrase? What Kind? Paraphrase Boundaries and Typology. Open Journal of Modern Linguistics, 4, 205-218. doi: 10.4236/ojml.2014.41016.

Conflicts of Interest

The authors declare no conflicts of interest.



Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.