E. Agichtein and V. Ganti, “Mining reference tables for automatic text segmentation,” Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004, pp. 20-29.


has been cited by the following article:

TITLE: Sequence Validation Based Extraction of Named High Cardinality Entities

AUTHORS: Khamisi Kalegele, Hideyuki Takahashi, Kazuto Sasai, Gen Kitagata, Tetsuo Kinoshita

KEYWORDS: Entity Recognition; Supervised Learning; Sequence Validation; Intelligent Systems; Text Mining

JOURNAL NAME: International Journal of Intelligence Science, Vol.2 No.4A, November 1, 2012

ABSTRACT: Named Entity Recognition (NER) is one of the most useful Information Extraction (IE) solutions for harnessing Web information. Hand-coded rule methods are still the best performers. These methods and statistical methods exploit Natural Language Processing (NLP) features and characteristics (e.g., capitalization) to extract Named Entities (NEs) such as personal and company names. For entities that have multiple sub-entities of higher cardinality (e.g., Linux commands, citations) and that are non-speech, these systems fail to deliver efficiently. Promising Machine Learning (ML) methods would require large numbers of training examples, which are impossible to produce manually. We call these entities Named High Cardinality Entities (NHCEs). We propose a sequence validation based approach for the extraction and validation of NHCEs. In the approach, sub-entities of NHCE candidates are statistically and structurally characterized during a top-down annotation process and guided toward transformation into either value types (v-type) or user-defined types (u-type) using an ML model. Treated as sequences of sub-entities, NHCE candidates with transformed sub-entities are then validated (and subsequently labeled) using a series of validation operators. We present a case study to demonstrate the approach and show how it helps to bridge the gap between IE and Intelligent Systems (IS) through the use of transformed sub-entities in supervised learning.
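The two stages named in the abstract (transforming sub-entities into types, then validating the resulting sequence) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the type names, the lexicon, and the two validation operators below are hypothetical, and a simple rule-based lookup stands in for the ML model the abstract mentions.

```python
import re
from typing import Callable, List

# Hypothetical value types (v-types), recognised here by simple patterns.
V_TYPES = [
    ("YEAR", re.compile(r"^(19|20)\d{2}$")),
    ("PAGES", re.compile(r"^\d+-+\d+$")),
    ("NUMBER", re.compile(r"^\d+$")),
]

# Hypothetical user-defined types (u-types), backed by small lexicons.
U_TYPES = {
    "VENUE_WORD": {"proceedings", "conference", "journal"},
}


def transform(token: str) -> str:
    """Map a sub-entity to a v-type, a u-type, or the fallback label WORD.

    Assumption: this rule-based lookup replaces the ML model in the abstract.
    """
    for name, pattern in V_TYPES:
        if pattern.match(token):
            return name
    lowered = token.lower()
    for name, lexicon in U_TYPES.items():
        if lowered in lexicon:
            return name
    return "WORD"


# A validation operator inspects the sequence of transformed types.
Operator = Callable[[List[str]], bool]


def has_year(types: List[str]) -> bool:
    """Hypothetical operator: a citation candidate must contain a year."""
    return "YEAR" in types


def pages_follow_year(types: List[str]) -> bool:
    """Hypothetical operator: a page range, if present, comes after the year."""
    if "PAGES" not in types:
        return True
    return "YEAR" in types and types.index("YEAR") < types.index("PAGES")


OPERATORS: List[Operator] = [has_year, pages_follow_year]


def validate(candidate: List[str], operators: List[Operator]) -> bool:
    """Transform each sub-entity, then apply every operator in series."""
    types = [transform(token) for token in candidate]
    return all(op(types) for op in operators)
```

For instance, `validate(["Proceedings", "2004", "20--29"], OPERATORS)` accepts the candidate, while `validate(["ls", "20-29"], OPERATORS)` rejects it because no year is present. Keeping the operators as plain functions over the type sequence means new checks can be appended without touching the transformation step.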

Open Access

Articles

The Exception of Text and Data Mining from the Academic Libraries Standpoint

Maria-Daphne Papadopoulou, Krystallenia Kolotourou, Maria Bottis

Open Journal of Social Sciences Vol.9 No.5, May 25, 2021

DOI: 10.4236/jss.2021.95028
Open Access

Articles

Detection of Knowledge on Social Media Using Data Mining Techniques

Aseel Abdullah Alolayan, Ahmad A. Alhamed

Open Journal of Applied Sciences Vol.14 No.2, February 29, 2024

DOI: 10.4236/ojapps.2024.142034
Open Access

Articles

Text and Data Mining in Directive 2019/790/EU Enhancing Web-Harvesting and Web-Archiving in Libraries and Archives

Maria Bottis, Marinos Papadopoulos, Christos Zampakolas, Paraskevi Ganatsiou

Open Journal of Philosophy Vol.9 No.3, August 28, 2019

DOI: 10.4236/ojpp.2019.93024
Open Access

Articles

Related Research on “The Belt and Road” Initiative Based on Big Data Text Mining: Taking the Domestic Area and the Korean Peninsula as an Example

Hongyi Li, Zhezhi Jin

Open Access Library Journal Vol.6 No.9, September 16, 2019

DOI: 10.4236/oalib.1105742
Open Access

Articles

Empirical Research on Web Harvesting in the Process of Text and Data Mining in National Libraries of EU Member States

Marinos Papadopoulos, Maria Botti, M. A. Paraskevi (Vicky) Ganatsiou, Christos Zampakolas

Open Journal of Philosophy Vol.10 No.1, February 7, 2020

DOI: 10.4236/ojpp.2020.101007
