Chunk Parsing and Entity Relation Extracting to Chinese Text by Using Conditional Random Fields Model
Junhua Wu, Longxia Liu
DOI: 10.4236/jilsa.2010.23017


Currently, large amounts of information exist on Web sites and in various digital media. Most of it is written in natural language, which people can easily browse but computers find difficult to understand. Chunk parsing and entity relation extraction are important steps toward understanding the semantics of natural-language information. Chunk analysis is a shallow parsing method, and entity relation extraction establishes relationships between entities. Because full syntactic parsing of Chinese text is complex, many researchers are more interested in chunk analysis and relation extraction. The conditional random fields (CRFs) model is an effective probabilistic model for segmenting and labeling sequence data. This paper models the chunking and entity relation problems in Chinese text. By transforming them into sequence-labeling problems, we can use CRFs to perform chunk analysis and entity relation extraction.
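The central step in the abstract is recasting chunking as sequence labeling, so that a CRF assigns one tag to each word. The following is a minimal sketch of that idea, not the authors' implementation. It makes several assumptions. It uses the common BIO (begin/inside/outside) chunk encoding. The pre-segmented sentence and the chunk types `NP`/`VP` are hypothetical. Hand-set toy scores stand in for trained CRF feature weights. The helper names `chunks_to_bio`, `bio_to_chunks` and `viterbi` are illustrative only. The paper's actual tag set, features and relation-labeling scheme may differ.

```python
from typing import Dict, List, Optional, Tuple

# A chunk is (start, end_exclusive, chunk_type) over a word-segmented sentence.
Chunk = Tuple[int, int, str]


def chunks_to_bio(n_words: int, chunks: List[Chunk]) -> List[str]:
    """Encode chunk spans as one B-/I-/O label per word (BIO scheme)."""
    labels = ["O"] * n_words
    for start, end, ctype in chunks:
        labels[start] = f"B-{ctype}"
        for i in range(start + 1, end):
            labels[i] = f"I-{ctype}"
    return labels


def bio_to_chunks(labels: List[str]) -> List[Chunk]:
    """Decode a BIO label sequence back into chunk spans.

    Lenient: an I- tag with no matching open chunk starts a new chunk.
    """
    chunks: List[Chunk] = []
    start: Optional[int] = None
    ctype: Optional[str] = None
    for i, lab in enumerate(labels + ["O"]):  # trailing "O" closes the last chunk
        # The open chunk ends at "O", at any "B-", or at an "I-" of another type.
        if lab == "O" or lab.startswith("B-") or lab[2:] != ctype:
            if start is not None:
                chunks.append((start, i, ctype))
            start, ctype = (None, None) if lab == "O" else (i, lab[2:])
    return chunks


def viterbi(
    emit: List[Dict[str, float]],
    trans: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Highest-scoring tag path for a linear-chain CRF.

    emit[i][t] is the summed weight of features firing for tag t at word i;
    trans[(p, t)] is the weight of moving from tag p to tag t. Missing entries
    score 0.0. Scores are unnormalized log-potentials, and the normalizer does
    not depend on the path, so the argmax path is also the most probable one.
    """
    score = {t: emit[0].get(t, 0.0) for t in tags}
    backptrs: List[Dict[str, str]] = []
    for pos in range(1, len(emit)):
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for t in tags:
            prev = max(tags, key=lambda p: score[p] + trans.get((p, t), 0.0))
            new_score[t] = score[prev] + trans.get((prev, t), 0.0) + emit[pos].get(t, 0.0)
            ptr[t] = prev
        score = new_score
        backptrs.append(ptr)
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical segmented sentence: 他 / 访问 / 北京 / 大学 ("He visits Peking University").
    words = ["他", "访问", "北京", "大学"]
    gold = [(0, 1, "NP"), (1, 2, "VP"), (2, 4, "NP")]
    print(chunks_to_bio(len(words), gold))
    # ['B-NP', 'B-VP', 'B-NP', 'I-NP']

    # Toy scores standing in for trained CRF weights (made up for illustration).
    tags = ["B-NP", "I-NP", "B-VP", "O"]
    emit = [{"B-NP": 2.0}, {"B-VP": 2.0}, {"B-NP": 1.0, "I-NP": 0.8}, {"I-NP": 1.5, "B-NP": 1.0}]
    trans = {("B-VP", "I-NP"): -5.0, ("O", "I-NP"): -5.0}  # penalize I-NP after non-NP tags
    best = viterbi(emit, trans, tags)
    print(best, bio_to_chunks(best))
    # ['B-NP', 'B-VP', 'B-NP', 'I-NP'] [(0, 1, 'NP'), (1, 2, 'VP'), (2, 4, 'NP')]
```

Viterbi decoding runs in O(n·|T|²) time for n words and |T| tags. Learning the weights is omitted here. In a CRF they are typically fit by maximizing the conditional log-likelihood of labeled training sentences.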

Share and Cite:

J. Wu and L. Liu, "Chunk Parsing and Entity Relation Extracting to Chinese Text by Using Conditional Random Fields Model," Journal of Intelligent Learning Systems and Applications, Vol. 2 No. 3, 2010, pp. 139-146. doi: 10.4236/jilsa.2010.23017.

Conflicts of Interest

The authors declare no conflicts of interest.


Copyright © 2023 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.