Intelligent Information Management

Volume 2, Issue 2 (February 2010)

ISSN Print: 2160-5912   ISSN Online: 2160-5920

Google-based Impact Factor: 1.6

Text Extraction in Complex Color Document Images for Enhanced Readability

PP. 120-133
DOI: 10.4236/iim.2010.22015

ABSTRACT

Documents are often printed with text on a complex color background. The text in such documents is hard to read because the background is complex and the colors of the foreground text blend with the colors of the background. Automatic segmentation of the foreground text is therefore essential for smooth reading of the document contents, whether by humans or by machines. In this paper we propose a novel approach to extracting the foreground text from color document images with complex backgrounds. The approach is a hybrid that combines connected component analysis with texture feature analysis of potential text regions. It uses the Canny edge detector to detect all possible text edge pixels, and connected component analysis is then performed on these edge pixels to identify candidate text regions. Because the background is complex, a non-text region may be misidentified as a text region; this problem is overcome by analyzing the texture features of the potential text region corresponding to each connected component. An unsupervised local thresholding method is devised to segment the foreground in the detected text regions. Finally, noisy text regions are identified and reprocessed to further enhance the quality of the retrieved foreground. The approach can handle document images whose backgrounds vary in color and texture, and foreground text in any color, font, size, and orientation. Experimental results show that the proposed algorithm detects, on average, 97.12% of the text regions in the source document. The readability of the extracted foreground text is illustrated through optical character recognition (OCR) for text in English. The proposed approach is compared with some existing methods of foreground separation in document images, and the experimental results show that it performs better.
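The edge detection, connected component, and local thresholding stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a grayscale image stored as nested lists, substitutes a plain intensity-difference threshold for the Canny detector, uses the region mean as the local threshold, and omits the texture verification and noisy-region reprocessing steps. All function names and parameter values are hypothetical.

```python
from collections import deque

def edge_map(gray, thresh):
    """Mark pixels whose intensity jump from the left or upper neighbour
    exceeds thresh. A simple stand-in for the Canny detector (assumption)."""
    h, w = len(gray), len(gray[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = abs(gray[y][x] - gray[y][x - 1]) if x > 0 else 0
            gy = abs(gray[y][x] - gray[y - 1][x]) if y > 0 else 0
            edges[y][x] = max(gx, gy) > thresh
    return edges

def component_boxes(edges):
    """Group edge pixels into 8-connected components with a breadth-first
    search and return one inclusive (top, left, bottom, right) box each."""
    h, w = len(edges), len(edges[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not edges[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and edges[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def local_threshold(gray, box):
    """Binarize one candidate region with its own mean intensity as the
    threshold. The smaller class is assumed to be text (assumption)."""
    top, left, bottom, right = box
    rows = [gray[y][left:right + 1] for y in range(top, bottom + 1)]
    pixels = [p for row in rows for p in row]
    t = sum(pixels) / len(pixels)
    mask = [[p < t for p in row] for row in rows]
    if 2 * sum(sum(row) for row in mask) > len(pixels):
        mask = [[not v for v in row] for row in mask]  # text is the bright class
    return mask

if __name__ == "__main__":
    # Toy image: bright background (200) with one dark block (50).
    image = [[200] * 8 for _ in range(6)]
    for y in (2, 3):
        for x in (2, 3, 4):
            image[y][x] = 50
    for box in component_boxes(edge_map(image, thresh=100)):
        print(box, local_threshold(image, box))
```

Thresholding each candidate region separately, rather than the whole page at once, lets regions with different background colors each receive their own cut-off. This is the motivation for a local threshold in this setting.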

Share and Cite:

P. Nagabhushan and S. Nirmala, "Text Extraction in Complex Color Document Images for Enhanced Readability," Intelligent Information Management, Vol. 2 No. 2, 2010, pp. 120-133. doi: 10.4236/iim.2010.22015.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.