Journal of Computer and Communications

Volume 5, Issue 14 (December 2017)

ISSN Print: 2327-5219   ISSN Online: 2327-5227

Google-based Impact Factor: 1.12

Mathematical Expression Extraction in Text Fields of Documents Based on HMM

PP. 1-13
DOI: 10.4236/jcc.2017.514001

ABSTRACT

Mathematical expressions in the unstructured text fields of documents are hard to extract automatically, quickly, and effectively. To address this problem, a method based on the Hidden Markov Model (HMM) is proposed. First, the HMM is trained using the symbol combination features of mathematical expressions. Then, preprocessing steps such as removing labels and filtering words are carried out. Finally, the preprocessed text is converted into an observation sequence that serves as the input of the HMM, which determines which parts of the text are mathematical expressions and extracts them. The experimental results show that the proposed method can effectively extract mathematical expressions from the text fields of documents, with relatively high accuracy and recall.
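
The pipeline described in the abstract (text converted to an observation sequence, then labelled by an HMM) can be illustrated with a minimal sketch. This is not the authors' implementation: the two hidden states (`TEXT`, `MATH`), the four observation classes produced by the hypothetical `observe` function, and all probabilities are hand-picked assumptions for illustration only. In the paper the model is trained on symbol combination features, which this page does not specify. Decoding uses the standard Viterbi algorithm.

```python
import math
import re

# Assumed hidden states; the paper's actual state set is not given here.
STATES = ("TEXT", "MATH")


def observe(token):
    """Map a token to a coarse observation class (hypothetical classes)."""
    if re.fullmatch(r"[A-Za-z]{2,}", token):
        return "word"
    if re.fullmatch(r"\d+(\.\d+)?", token):
        return "number"
    if re.fullmatch(r"[A-Za-z]", token):
        return "letter"
    return "symbol"


# Illustrative, hand-picked parameters (NOT trained, NOT from the paper).
START = {"TEXT": 0.8, "MATH": 0.2}
TRANS = {
    "TEXT": {"TEXT": 0.8, "MATH": 0.2},
    "MATH": {"TEXT": 0.3, "MATH": 0.7},
}
EMIT = {
    "TEXT": {"word": 0.85, "number": 0.05, "letter": 0.05, "symbol": 0.05},
    "MATH": {"word": 0.05, "number": 0.30, "letter": 0.30, "symbol": 0.35},
}


def viterbi(observations):
    """Return the most likely state sequence, computed in log space."""
    if not observations:
        return []
    first = observations[0]
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][first]) for s in STATES}]
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for obs in observations[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            row[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][obs])
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def extract_expressions(text):
    """Return maximal runs of whitespace-separated tokens labelled MATH."""
    tokens = text.split()
    path = viterbi([observe(t) for t in tokens])
    spans, current = [], []
    for token, state in zip(tokens, path):
        if state == "MATH":
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


if __name__ == "__main__":
    print(extract_expressions("the area is a = b * c for all rectangles"))
```

Whitespace tokenization stands in for the preprocessing step (label removal and word filtering) mentioned in the abstract. A trained model would replace the fixed tables `START`, `TRANS`, and `EMIT` with probabilities estimated from annotated text.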

Share and Cite:

Tian, X.D., Bai, R.H., Yang, F., Bai, J.Y. and Li, X.F. (2017) Mathematical Expression Extraction in Text Fields of Documents Based on HMM. Journal of Computer and Communications, 5, 1-13. doi: 10.4236/jcc.2017.514001.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.