Using AdaBoost Meta-Learning Algorithm for Medical News Multi-Document Summarization

Abstract

Automatic text summarization reduces a text document, or a larger corpus of multiple documents, to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss multi-document summarization, which differs from single-document summarization in that the issues of compression, speed, redundancy and passage selection are critical to forming useful summaries. Because the number and variety of online medical news items make it difficult for experts in the medical field to read all of them, automatic multi-document summarization can make information on the web easier to study. We therefore propose a new summarization approach based on AdaBoost, a machine learning meta-learning algorithm. We treat a document as a set of sentences, and the learning algorithm must learn to classify each sentence as a positive or negative example based on its score. For this learning task, we apply the AdaBoost meta-learning algorithm with a C4.5 decision tree as the base learner. In our experiment, we use 450 news items downloaded from different medical websites, and we compare our results with some existing approaches.
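
The sentence-classification setup described in the abstract can be illustrated with a minimal sketch of discrete AdaBoost. This is not the paper's implementation: single-feature threshold rules (decision stumps) stand in for the C4.5 base learner to keep the example short, and the two sentence features (a position score and a keyword score), the toy data, and all function names are assumptions made up for illustration.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) for the best threshold rule."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] >= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] >= thr else -sign

def adaboost(X, y, rounds=3):
    """Discrete AdaBoost: reweight the examples after every round."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, stump))
        # Misclassified sentences gain weight, correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all stumps: +1 = include sentence, -1 = exclude."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: [position_score, keyword_score] per sentence.
# A sentence is labeled +1 (summary-worthy) only when both scores are high.
X = [[0.9, 0.8], [0.8, 0.9], [0.9, 0.1], [0.1, 0.9], [0.2, 0.2], [0.7, 0.7]]
y = [1, 1, -1, -1, -1, 1]

ensemble = adaboost(X, y, rounds=3)
print([predict(ensemble, row) for row in X])
```

No single threshold rule separates this toy data, which is why the weighted vote of several weak learners is needed. In a setup closer to the one the abstract describes, the stump would be replaced by a C4.5-style decision tree trained on the weighted examples.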

 

Share and Cite:

M. Mehr, "Using AdaBoost Meta-Learning Algorithm for Medical News Multi-Document Summarization," Intelligent Information Management, Vol. 5 No. 6, 2013, pp. 182-190. doi: 10.4236/iim.2013.56020.

Conflicts of Interest

The authors declare no conflicts of interest.


Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.