Multi-Document Summarization Model Based on Integer Linear Programming

Abstract

This paper proposes an extractive generic text summarization model that generates summaries by selecting sentences according to their scores. Sentence scores are calculated from how extensively each sentence covers the main content of the text, and summaries are created by extracting the highest-scoring sentences from the original document. The model is formalized as a multiobjective integer programming problem. An advantage of this model is that it covers the main content of the source(s) while reducing redundancy in the generated summaries. To extract sentences that together form a summary with extensive coverage of the main content and little redundancy, the model uses two measures: the similarity of each sentence to the original document and the similarity between sentences. Performance is evaluated by comparing the summarization outputs with the manual summaries of the DUC2004 dataset. Experiments showed that the proposed approach outperforms the related methods.
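The abstract does not give the exact objective functions, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' formulation. It assumes that (a) similarity is cosine similarity between raw term-frequency vectors, (b) the two objectives (coverage = similarity of a sentence to the whole document; redundancy = pairwise similarity of selected sentences) are combined into one weighted sum with a hypothetical weight `redundancy_weight`, and (c) the summary must fit a hypothetical word budget `max_words`. The names `tokenize`, `cosine`, and `summarize` are illustrative. In a real integer linear program, the product x_i x_j would be replaced by an auxiliary 0-1 variable y_ij with y_ij >= x_i + x_j - 1 and passed to a MILP solver (for example GLPK); here the 0-1 program is solved exactly by brute-force enumeration to keep the sketch dependency-free, which is exponential and suitable only for toy inputs.

```python
import itertools
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (no stemming or stop-word removal in this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def summarize(sentences, max_words, redundancy_weight=1.0):
    """Exactly solve a small 0-1 program by enumerating all selections x:

    maximize   sum_i cov_i * x_i - w * sum_{i<j} red_ij * x_i * x_j
    subject to sum_i len_i * x_i <= max_words,  x_i in {0, 1}
    """
    vecs = [Counter(tokenize(s)) for s in sentences]
    doc = sum(vecs, Counter())                    # term frequencies of the whole source
    cov = [cosine(v, doc) for v in vecs]          # coverage: sentence-to-document similarity
    lengths = [len(tokenize(s)) for s in sentences]
    n = len(sentences)
    red = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]  # sentence-to-sentence similarity

    best_score, best = 0.0, ()                    # the empty summary scores 0
    for x in itertools.product((0, 1), repeat=n):
        chosen = [i for i in range(n) if x[i]]
        if sum(lengths[i] for i in chosen) > max_words:
            continue                              # violates the length budget
        score = sum(cov[i] for i in chosen)
        score -= redundancy_weight * sum(red[i][j] for i, j in itertools.combinations(chosen, 2))
        if score > best_score:
            best_score, best = score, tuple(chosen)
    return [sentences[i] for i in best], best_score


docs = [
    "The storm hit the coast on Monday.",
    "On Monday the storm hit the coast.",
    "Thousands of residents were evacuated.",
]
summary, score = summarize(docs, max_words=20)
print(summary, round(score, 3))
```

With this toy input, the word budget would allow all three sentences, but the first two are word-for-word paraphrases, so the redundancy penalty keeps only one of them alongside the third sentence.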

Cite this paper:

R. Alguliev, R. Aliguliyev and M. Hajirahimova, "Multi-Document Summarization Model Based on Integer Linear Programming," Intelligent Control and Automation, Vol. 1, No. 2, 2010, pp. 105-111. doi: 10.4236/ica.2010.12012.

Conflicts of Interest

The authors declare no conflicts of interest.

Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.