A New Method for Calculating Similarity between Sentences and Application on Automatic Abstracting
Wenqian JI, Zhoujun LI, Wenhan CHAO, Xiaoming CHEN
DOI: 10.4236/iim.2009.11007

Abstract

Sentence similarity computation plays an important role in question-answering systems, machine translation systems, information retrieval, and automatic abstracting systems. This article first reviews several methods for calculating the similarity between sentences and then proposes a new method that combines several factors: critical words, semantic information, sentential form, and sentence length. On this basis, an automatic abstracting system based on the LexRank algorithm is implemented, with several improvements in both sentence weight computation and redundancy resolution. The system can handle single-document and multi-document summarization in both English and Chinese. In evaluations on two corpora, the system produced better summaries to a certain degree. We also show that the system is quite insensitive to noise in the data that may result from imperfect topical clustering of documents. Finally, existing problems and development trends in automatic summarization technology are discussed.
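
As a rough illustration of the LexRank idea the abstract refers to, the sketch below scores sentences by running a power iteration over a thresholded sentence-similarity graph. It is a minimal sketch under stated assumptions, not the authors' system: the bag-of-words cosine measure is only a stand-in for the combined similarity (critical words, semantic information, sentential form, sentence length) that the paper proposes, and the function names, the `threshold`, `damping`, and `iterations` parameters, and the whitespace tokenization are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity between two token lists.

    Assumption: a simple stand-in for the paper's combined similarity measure.
    """
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Return one centrality score per sentence (scores sum to 1)."""
    # Assumption: naive lowercase whitespace tokenization.
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    if n == 0:
        return []

    # Unweighted graph: link two sentences when their similarity
    # reaches the threshold (each non-empty sentence also links to itself).
    adjacency = [
        [1.0 if cosine_similarity(tokens[i], tokens[j]) >= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]

    # Row-normalize into transition probabilities; a sentence with no
    # links at all falls back to a uniform row.
    transitions = []
    for row in adjacency:
        total = sum(row)
        transitions.append([v / total for v in row] if total else [1.0 / n] * n)

    # PageRank-style power iteration with a uniform jump term.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[j] * transitions[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores


if __name__ == "__main__":
    docs = [
        "summarization selects salient sentences",
        "salient sentences share words",
        "similar words link documents",
    ]
    for sentence, score in zip(docs, lexrank(docs)):
        print(f"{score:.3f}  {sentence}")
```

In this toy input the middle sentence overlaps with both of the others, so it receives the highest score. A full summarizer would then select top-ranked sentences while filtering out redundant ones, a step this sketch does not attempt.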

Share and Cite:

W. JI, Z. LI, W. CHAO and X. CHEN, "A New Method for Calculating Similarity between Sentences and Application on Automatic Abstracting," Intelligent Information Management, Vol. 1 No. 1, 2009, pp. 36-42. doi: 10.4236/iim.2009.11007.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.