International Journal of Intelligence Science

Volume 2, Issue 4 (October 2012)

ISSN Print: 2163-0283   ISSN Online: 2163-0356

Sequence Validation Based Extraction of Named High Cardinality Entities

PP. 190-202
DOI: 10.4236/ijis.2012.224025

ABSTRACT

Named Entity Recognition (NER) is one of the most useful Information Extraction (IE) solutions for harnessing Web information. Hand-coded rule methods are still the best performers. Both these and statistical methods exploit Natural Language Processing (NLP) features and characteristics (e.g. capitalization) to extract Named Entities (NEs) such as personal and company names. For non-speech entities with multiple sub-entities of higher cardinality (e.g. a Linux command or a citation), these systems fail to deliver efficiently. We call these entities Named High Cardinality Entities (NHCEs). Promising Machine Learning (ML) methods would require large amounts of training examples, which are impossible to produce manually. We propose a sequence validation based approach for the extraction and validation of NHCEs. In this approach, sub-entities of NHCE candidates are statistically and structurally characterized during a top-down annotation process, and their transformation into either value types (v-type) or user-defined types (u-type) is guided by an ML model. Treated as sequences of sub-entities, NHCE candidates with transformed sub-entities are then validated (and subsequently labeled) using a series of validation operators. We present a case study to demonstrate the approach and show how it helps to bridge the gap between IE and Intelligent Systems (IS) through the use of transformed sub-entities in supervised learning.
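
The two stages named in the abstract, type transformation followed by sequence validation, might look roughly like the Python sketch below for a Linux command candidate. It is an illustration under stated assumptions, not the authors' method. The abstract does not specify the type labels, the ML model, or the validation operators, so `KNOWN_COMMANDS`, `to_type`, the `u:`/`v:` labels, and the three operators are all hypothetical. A few hand-written rules stand in for the ML model.

```python
import re
from typing import Callable, List

# Hypothetical vocabulary for a user-defined type (u-type). In the paper the
# mapping to types is guided by an ML model; a fixed set stands in for it here.
KNOWN_COMMANDS = {"ls", "grep", "chmod", "tar"}


def to_type(token: str) -> str:
    """Map one sub-entity to a coarse type label (rule-based stand-in)."""
    if token in KNOWN_COMMANDS:
        return "u:command"   # assumed u-type
    if re.fullmatch(r"-{1,2}[A-Za-z][\w-]*", token):
        return "u:option"    # assumed u-type
    if re.fullmatch(r"\d+", token):
        return "v:int"       # assumed v-type
    if "/" in token or "." in token:
        return "v:path"      # assumed v-type
    return "v:string"        # fallback v-type


def transform(candidate: str) -> List[str]:
    """Turn a candidate string into a sequence of typed sub-entities."""
    return [to_type(tok) for tok in candidate.split()]


# A validation operator is a predicate over the typed sequence.
Operator = Callable[[List[str]], bool]


def non_empty(seq: List[str]) -> bool:
    return len(seq) > 0


def starts_with_command(seq: List[str]) -> bool:
    return bool(seq) and seq[0] == "u:command"


def single_command(seq: List[str]) -> bool:
    return seq.count("u:command") == 1


# Hypothetical operator series; the paper's actual operators are not given here.
OPERATORS: List[Operator] = [non_empty, starts_with_command, single_command]


def validate(candidate: str) -> bool:
    """Accept a candidate only if every operator accepts its typed sequence."""
    seq = transform(candidate)
    return all(op(seq) for op in OPERATORS)


if __name__ == "__main__":
    for text in ["chmod 755 /tmp/file.txt", "the file 755", "ls grep"]:
        print(text, transform(text), validate(text))
```

Because the operators see only type labels rather than raw tokens, the same typed sequences could also serve as features for a supervised learner, which is the bridge to Intelligent Systems that the abstract mentions.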

Share and Cite:

K. Kalegele, H. Takahashi, K. Sasai, G. Kitagata and T. Kinoshita, "Sequence Validation Based Extraction of Named High Cardinality Entities," International Journal of Intelligence Science, Vol. 2 No. 4A, 2012, pp. 190-202. doi: 10.4236/ijis.2012.224025.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.