An Improved Fuzzy Based Algorithm for Detecting Text from Images Using Stroke Width Transform

The rapid growth of multimedia content in information indexing and retrieval has created strong demand for extracting text from videos. In this paper, a new approach is proposed that combines the Stroke Width Transform (SWT) with intuitionistic fuzzy set theory for detecting text in images. The proposed methodology uses a fuzzy edge detection method, built on a fuzzy inference system, instead of the conventional edge detection methods. The edge detection is based on a membership-related degree called the hesitation degree and a distance measure called the intuitionistic fuzzy divergence. For text detection, the SWT operator is used to capture only the effective text features. Geometric features of the text are used to differentiate between text and non-text regions of the image. The resulting system is tested on the ICDAR 2003 dataset and produces promising results.


Introduction
Edge detection is an important task in image processing. In this paper, two techniques, image processing and soft computing, are combined. Soft computing is an emerging field for problems that deal with uncertainty; it includes fuzzy logic, genetic algorithms, evolutionary computation and neural networks. In images, the uncertainties may be due to noise, complex backgrounds, or different orientations or intensities. Here, fuzzy logic is used to detect the edges of images. Edge detection refers to the process of locating sharp discontinuities in an image. In fuzzy set theory, every element is identified with a degree of membership, and the degree of non-membership is equal to one minus the degree of membership. Text detection using SWT [1] usually relies on the conventional Canny edge detector. In our system, we use a fuzzy edge detection method followed by SWT. The image is first divided into 3 × 3 windows by means of a 3 × 3 binary template matrix. The fuzzy inference system has eight inputs, which correspond to the eight neighboring pixels of the template matrix, and one output. During fuzzification, the input image is transformed to the fuzzy domain [0, 1]. Then the hesitation degree, or intuitionistic fuzzy index, is computed. Next, the maximum of the divergence values between the sixteen fuzzy rule templates and the original image window of the same size is computed. During defuzzification, the edge image is transformed back to the image pixel domain in the interval [1, 255]. The PSNR metric is computed to compare the proposed fuzzy edge detection method with the Canny edge detection method, and the proposed method is found to be more effective. SWT is then used to find the similarity between strokes based on their width.
The initial value of each element in the SWT output is set to infinity. Letters in an image have nearly parallel sides, so we first find the edges of the text using the fuzzy edge detector. Next, we calculate the gradient at each edge pixel. Letter candidates are then formed and filtered. The eligible pairs are then merged to give the detected text.

Related Work
The authors in [2] proposed a modified rule-based technique using fuzzy logic. In [3], a fuzzy technique was designed for detecting edges without setting a threshold value. In [4], the edges of a gray scale image are determined using fuzzy logic, and the method is demonstrated alongside the existing Sobel and Prewitt edge detectors. The authors of [5] proposed image edge detection based on a soft computing approach, in which the histogram is combined with fuzzy logic to enhance edge detection. Edge detection using fuzzy logic in MATLAB was proposed in [6]; this work is based on sixteen rules that differentiate the target pixel. An edge detection method that does not require a threshold value was proposed in [7]; it uses a small 2 × 2 mask that scans the entire image pixel by pixel. A fuzzy inference system and traditional edge operators are combined in [8]. In [9], noise is removed at different levels of processing, and first and second derivatives are applied to the FIS output image. In [10], heuristic rules were applied and a fuzzy approach was used to perform filtering and edge extraction simultaneously. Three linear spatial filters and spatial convolution techniques are used in [11]. A fuzzy edge detection filter that removes noise from grayscale images in a two-stage process is used in [12]. SWT is used in [13] to identify connected components, together with unsupervised clustering to detect the text locations. The ICDAR dataset [14] [15] is the standard benchmark for text detection in natural images. In [16], a method that identifies text regardless of its scale, direction, font and size is proposed. The method in [17] is based on corner points in text. In [18], a Laplacian approach is used to detect text of any orientation in video; K-means clustering is then applied to the text. The authors of [19] employed a novel stroke-like edge detection method and a temporal feature for extracting text.
A machine learning technique for scene text detection was used in [20], and the system was tested on the ICDAR 2005 and ICDAR 2011 datasets. Maximally stable extremal regions (MSERs) are extracted using a pruning algorithm in [21]. A single-link clustering algorithm was proposed for grouping text candidates, and a text classifier was used to detect the text. In [22], part-based tree-structured models (TSMs) were proposed for the simultaneous detection and recognition of text. The Viterbi algorithm was used to improve word recognition, and the experiments were conducted on the ICDAR 2003 dataset. An unsupervised method was used in [23] to detect scene text, with the text object modelled as a pictorial structure. Three new character features were proposed, and the performance was evaluated on the ICDAR 2003/2005 datasets. In our earlier work [24], we proposed a hybrid approach combining region-based and connected component (CC) based methods, where an artificial neural network (ANN) was used as the classifier.

Methodology
An RGB image is taken as input. The original image is converted to gray scale and mapped into the fuzzy domain [0, 1] using fuzzy rules. For edge detection, the hesitation degree is computed from the membership degree and the non-membership degree. Then the divergence value is computed and defuzzification is applied. Next, the stroke width at every pixel is extracted. Connected components are formed by comparing the stroke widths of neighboring pixels. Letter candidates are then grouped together to give the detected text. The overall architecture of the proposed system is shown in Figure 1.
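The first and last steps of this pipeline, fuzzification and defuzzification, can be illustrated with a minimal sketch. It assumes a simple linear mapping between gray levels and membership degrees; the text states only that fuzzy rules are used, so the linear map, the luma weights and the function names `rgb_to_gray`, `fuzzify` and `defuzzify` are illustrative assumptions rather than the authors' implementation.

```python
def rgb_to_gray(pixel):
    """Gray level of an (R, G, B) pixel using the common BT.601 luma weights (assumed)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def fuzzify(gray, g_min=0.0, g_max=255.0):
    """Map gray levels linearly to membership degrees in [0, 1] (assumed mapping)."""
    span = g_max - g_min
    return [[(g - g_min) / span for g in row] for row in gray]


def defuzzify(mu, g_min=0.0, g_max=255.0):
    """Map membership degrees back to integer gray levels."""
    span = g_max - g_min
    return [[int(round(g_min + m * span)) for m in row] for row in mu]


# Tiny 2 x 2 RGB image used only for illustration.
rgb = [[(0, 0, 0), (255, 255, 255)],
       [(255, 0, 0), (128, 128, 128)]]
gray = [[rgb_to_gray(p) for p in row] for row in rgb]
mu = fuzzify(gray)
restored = defuzzify(mu)
```

The gray-level bounds are parameters so that a different pixel interval can be substituted without changing the functions.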

Intuitionistic Fuzzy Index
Fuzzy sets are sets whose boundaries are not precise. The input pixels are divided into two fuzzy sets, black and white, whereas the output pixel is divided into three fuzzy sets, black, white and edge. Membership in a fuzzy set is a matter of degree: if the fuzzy set is A and the relevant object is x, the proposition "x is a member of A" need not be simply true or false, but may be true to some degree. A fuzzy set can be represented as

$$A = \{(x, \mu_A(x)) \mid x \in E\},$$

where E is the universal set and $\mu_A : E \to [0, 1]$ defines the degree of membership.
An intuitionistic fuzzy set (IFS) A is an object of the form

$$A = \{(x, \mu_A(x), \nu_A(x)) \mid x \in E\},$$

where E is the universal set, and $\mu_A : E \to [0, 1]$ and $\nu_A : E \to [0, 1]$ define the membership and non-membership degrees respectively, with $0 \le \mu_A(x) + \nu_A(x) \le 1$. The quantity

$$\pi_A(x) = 1 - \mu_A(x) - \nu_A(x)$$

is called the hesitation degree, or intuitionistic fuzzy index.
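As a small worked illustration, the hesitation degree of a pixel is one minus the sum of its membership and non-membership degrees. The text does not state how the non-membership degree is obtained, so the sketch below assumes a Sugeno-type complement with a hypothetical parameter `lam`; both the generator and the function names are assumptions made for illustration.

```python
def sugeno_non_membership(mu, lam=1.0):
    """Assumed Sugeno-type complement: nu = (1 - mu) / (1 + lam * mu), with lam >= 0."""
    return (1.0 - mu) / (1.0 + lam * mu)


def hesitation_degree(mu, nu):
    """Intuitionistic fuzzy index: pi = 1 - mu - nu, valid when mu + nu <= 1."""
    if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0 + 1e-12):
        raise ValueError("mu and nu must lie in [0, 1] with mu + nu <= 1")
    return 1.0 - mu - nu


# With lam = 0 the complement is the ordinary 1 - mu, so there is no hesitation.
pi_crisp = hesitation_degree(0.5, sugeno_non_membership(0.5, lam=0.0))
# With lam > 0 the non-membership shrinks and a positive hesitation appears.
pi_ifs = hesitation_degree(0.5, sugeno_non_membership(0.5, lam=1.0))
```

Setting `lam` to zero recovers the ordinary fuzzy set, in which non-membership is one minus membership.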

Divergence Value
The maximum of the divergence values between the 16 fuzzy rule templates and the original image window of the same size is computed. Each fuzzy template is treated as a 3 × 3 mask that is slid over the fuzzy image matrix. At each position, the divergence between the 3 × 3 image window and each of the fuzzy rule templates 1 to 16 is computed, giving one 3 × 3 divergence matrix per template. The minimum element of each of these 16 matrices is then taken, and the maximum of these minima is taken as the divergence value at that position. In the divergence computation, $D_i$ is the divergence between the original image matrix and fuzzy rule template $i$, where $i$ varies from 1 to 16, $A$ is the original image in a 3 × 3 matrix, $t_i$ is the fuzzy template, and $\pi_{t_i}$ is the intuitionistic fuzzy index.
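The max-min selection over templates can be sketched as follows. This is a minimal illustration under stated assumptions: the pixel-wise divergence is passed in as a function, and the default `pixel_divergence` is only a stand-in (absolute difference), not the intuitionistic fuzzy divergence. The two templates and the values `a = 0.3` and `b = 0.8` are hypothetical.

```python
def pixel_divergence(a, b):
    """Stand-in pixel-wise divergence (absolute difference); a hypothetical placeholder."""
    return abs(a - b)


def max_min_divergence(window, templates, divergence=pixel_divergence):
    """Max over templates of the min element-wise divergence with a 3 x 3 window."""
    best = float("-inf")
    for t in templates:
        smallest = min(divergence(window[r][c], t[r][c])
                       for r in range(3) for c in range(3))
        best = max(best, smallest)
    return best


def edge_strength_map(mu, templates, divergence=pixel_divergence):
    """Slide the 3 x 3 window over the fuzzy image; border pixels are left at 0."""
    rows, cols = len(mu), len(mu[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [row[c - 1:c + 2] for row in mu[r - 1:r + 2]]
            out[r][c] = max_min_divergence(window, templates, divergence)
    return out


a, b = 0.3, 0.8                       # hypothetical template values
t1 = [[a, a, b], [a, a, b], [a, a, b]]
t2 = [[b, b, b], [b, b, b], [b, b, b]]
flat = [[a] * 3 for _ in range(3)]    # a uniform 3 x 3 window
value = max_min_divergence(flat, [t1, t2])
```

Passing the divergence as a parameter keeps the template search separate from the choice of distance measure.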
Consider an image of size M × M with L distinct gray levels having probabilities $p_0, p_1, \ldots, p_{L-1}$. The exponential entropy is defined over these probabilities, and the corresponding fuzzy entropy is defined with n = M and $i, j = 0, 1, 2, \ldots, M-1$, where $\mu_A(a_{ij})$ is the membership degree of the $(i,j)$th pixel $a_{ij}$ in the image A. If A and B are two images, the membership degrees at the $(i,j)$th pixels of the images A and B are given by Equation (8). Based on fuzzy entropy, the divergence of A against B due to $m_1(A)$ and $m_1(B)$ is defined first, and the divergence of B against A is defined similarly. The sum of the two is the total divergence between the pixels $a_{ij}$ and $b_{ij}$ of the images A and B due to $m_1(A)$ and $m_1(B)$, given in Equation (10).
Likewise, the total divergence between the pixels $a_{ij}$ and $b_{ij}$ of the images A and B due to $m_2(A)$ and $m_2(B)$ is obtained. The overall intuitionistic fuzzy divergence (IFD) between the images A and B, given in Equation (12), combines the total divergences due to both.
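For orientation, the quantities named in this paragraph are commonly written in the following forms in the fuzzy divergence literature. These are assumed standard forms, not expressions recovered from this paper: in particular, it is an assumption that $m_1$ denotes the membership degree $\mu$, and the normalizing constant $n$ is left symbolic.

```latex
% Assumed standard forms; symbols follow the surrounding text.
% Exponential entropy of the gray-level distribution:
H = \sum_{i=0}^{L-1} p_i \, e^{1 - p_i}

% Fuzzy exponential entropy of image A (n is the normalizing count):
H(A) = \frac{1}{n(\sqrt{e} - 1)} \sum_{i=0}^{M-1} \sum_{j=0}^{M-1}
  \left[ \mu_A(a_{ij}) \, e^{1 - \mu_A(a_{ij})}
       + \bigl(1 - \mu_A(a_{ij})\bigr) \, e^{\mu_A(a_{ij})} - 1 \right]

% Divergence of A against B due to m_1 (the reverse swaps A and B):
D_1(A, B) = \sum_{i} \sum_{j}
  \left[ 1 - \bigl(1 - \mu_A(a_{ij})\bigr) \, e^{\mu_A(a_{ij}) - \mu_B(b_{ij})}
           - \mu_A(a_{ij}) \, e^{\mu_B(b_{ij}) - \mu_A(a_{ij})} \right]

% Total divergence due to m_1, i.e. D_1(A, B) + D_1(B, A):
\mathrm{Div}_{m_1}(A, B) = \sum_{i} \sum_{j}
  \left[ 2 - \bigl(1 - \mu_A(a_{ij}) + \mu_B(b_{ij})\bigr) \, e^{\mu_A(a_{ij}) - \mu_B(b_{ij})}
           - \bigl(1 - \mu_B(b_{ij}) + \mu_A(a_{ij})\bigr) \, e^{\mu_B(b_{ij}) - \mu_A(a_{ij})} \right]
```

The last expression follows term by term from adding the divergence of A against B to that of B against A.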
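A minimal numerical sketch of an intuitionistic fuzzy divergence is given here, assuming the exponential form commonly used in the literature. It is an assumption that $m_1$ is the membership degree $\mu$ and $m_2$ is $\mu + \pi$, and that the overall divergence is the sum of the two symmetric terms; the function names are illustrative.

```python
import math


def symmetric_term(d):
    """2 - (1 - d) * e^d - (1 + d) * e^(-d) for a membership difference d."""
    return 2.0 - (1.0 - d) * math.exp(d) - (1.0 + d) * math.exp(-d)


def ifd(mu_a, pi_a, mu_b, pi_b):
    """Assumed IFD: sum over pixels of the m1 term (mu) and the m2 term (mu + pi)."""
    total = 0.0
    for row_ma, row_pa, row_mb, row_pb in zip(mu_a, pi_a, mu_b, pi_b):
        for ma, pa, mb, pb in zip(row_ma, row_pa, row_mb, row_pb):
            d1 = ma - mb                   # difference of m1 values
            d2 = (ma + pa) - (mb + pb)     # difference of m2 values
            total += symmetric_term(d1) + symmetric_term(d2)
    return total


mu_a = [[0.2, 0.7], [0.5, 0.9]]
pi_a = [[0.1, 0.1], [0.2, 0.0]]
mu_b = [[0.3, 0.6], [0.5, 0.4]]
pi_b = [[0.0, 0.2], [0.1, 0.1]]

same = ifd(mu_a, pi_a, mu_a, pi_a)       # identical images
forward = ifd(mu_a, pi_a, mu_b, pi_b)
backward = ifd(mu_b, pi_b, mu_a, pi_a)
extreme = ifd([[1.0]], [[0.0]], [[0.0]], [[0.0]])
```

Under these assumptions the measure is zero for identical images, symmetric in its arguments, and non-negative, since each term has its minimum at a difference of zero.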

Fuzzy Inference Rules
The inference rules are based on the weights of the eight neighboring gray-level pixels. These rules are able to extract the edges in the processed gray image effectively. The resulting image consists of black and white regions. The input, which undergoes fuzzification, and the output pixel, which undergoes defuzzification, range from 0 to 255. Finally, the black, white and edge pixels are detected. The following rules decide which of the three fuzzy sets, white, black or edge, the output pixel belongs to. Figure 2 shows the rules for the fuzzy inference system. The values of a and b are chosen by trial and error.

R E T R A C T E D

Text Detection
The Stroke Width Transform (SWT) computes, for each pixel, the width of the most likely stroke containing that pixel. The stroke-width variance within a candidate is computed and should be neither too large nor too small; the bounds on the variance are set by trial and error. The stroke width is the key parameter in forming the letter candidates. A connected component algorithm is used for grouping the text. The SWT detector can detect letters of different languages, such as English, Hebrew and Arabic. The text can be of varying sizes and orientations, and SWT can also detect handwritten text. The algorithm depicting the overall process for detecting text is given below.
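Separately from the authors' algorithm listing, the core ray-casting step of SWT can be illustrated with a minimal sketch. It assumes that an edge map and per-pixel gradient components are already available (for example from the fuzzy edge detector), that rays follow the gradient direction, and that two edge pixels face each other when their gradients are opposite to within π/6, a tolerance commonly quoted for SWT. The function name and the `max_len` parameter are hypothetical, and later refinement and grouping steps are omitted.

```python
import math


def stroke_width_transform(edges, gx, gy, max_len=50):
    """Assign to each pixel the width of the shortest stroke ray passing through it."""
    rows, cols = len(edges), len(edges[0])
    swt = [[math.inf] * cols for _ in range(rows)]   # every element starts at infinity
    for r in range(rows):
        for c in range(cols):
            norm = math.hypot(gx[r][c], gy[r][c])
            if not edges[r][c] or norm == 0:
                continue
            dx, dy = gx[r][c] / norm, gy[r][c] / norm
            ray = [(r, c)]
            for step in range(1, max_len + 1):
                rr = int(round(r + dy * step))
                cc = int(round(c + dx * step))
                if not (0 <= rr < rows and 0 <= cc < cols):
                    break                            # ray left the image
                if (rr, cc) == ray[-1]:
                    continue                         # rounding repeated a pixel
                ray.append((rr, cc))
                if edges[rr][cc]:
                    n2 = math.hypot(gx[rr][cc], gy[rr][cc])
                    if n2 > 0:
                        dot = (dx * gx[rr][cc] + dy * gy[rr][cc]) / n2
                        if dot < -math.cos(math.pi / 6):   # roughly opposite gradient
                            width = math.hypot(rr - r, cc - c)
                            for pr, pc in ray:
                                swt[pr][pc] = min(swt[pr][pc], width)
                    break                            # stop at the first edge pixel hit
    return swt


# Synthetic vertical stroke: edges at columns 1 and 5 with facing gradients.
edges = [[c in (1, 5) for c in range(7)] for _ in range(3)]
gx = [[1.0 if c == 1 else (-1.0 if c == 5 else 0.0) for c in range(7)] for _ in range(3)]
gy = [[0.0] * 7 for _ in range(3)]
swt = stroke_width_transform(edges, gx, gy)
```

Pixels that no accepted ray crosses keep the value infinity, so they can be discarded before letter candidates are formed.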

Evaluation and Discussion
The ICDAR dataset [14] [15] is the standard benchmark for text detection in natural images. It consists of 258 images in the training set and 251 images in the test set. The images are in full color and vary in size from 307 × 93 to 1280 × 960 pixels. We follow [1] for the evaluation of the proposed system. Our method is compared with respect to the f-measure, which is in turn a combination of two other measures: precision and recall. Ground-truth boxes, called targets, are made available in the dataset. The output bounding boxes (the green rectangles in Figure 6) for the detected words are called estimates. The match $m_p$ between two rectangles is defined as the area of their intersection divided by the area of the minimum bounding box containing both rectangles. The best match $m(r; R)$ for a rectangle $r$ among a set of rectangles $R$ is defined as

$$m(r; R) = \max \{\, m_p(r, r') \mid r' \in R \,\}.$$

Precision and recall are then computed from the best matches, where T is the set of ground-truth rectangles and E is the set of estimated rectangles. Two more metrics for performance evaluation are taken from [25], namely Character Recall (CR) and Word Recall (WR). CR is defined as the ratio of the number of characters detected to the total number of characters.

Figure 2. Rules for fuzzy inference system.

WR is defined as the ratio of the number of words detected to the total number of words:

$$\mathrm{WR} = \frac{\text{no. of words detected}}{\text{total no. of words}}$$

In [25], five experiments were carried out; CR ranges from 58.7% to 64.9% and WR ranges from 77.4% to 85.6%. In the ICDAR 2003 competition [15], the WR for the best-performing system was 52%. In [1], the WR is 79.04%. For our system, WR is 82% and CR is 63%. The results suggest that the fuzzy edge detector is more effective than other edge detectors. For example, the Canny operator cannot differentiate between real and spurious edges, which leads to lower performance, whereas the fuzzy edge detector reduces noise in the image and enhances blurred pixels so that the edges of letters can be detected.
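The rectangle-matching evaluation described earlier in this section can be sketched as follows. This is a minimal illustration that assumes the usual ICDAR 2003 style definitions: precision is the mean best match of the estimates against the targets, recall is the mean best match of the targets against the estimates, and the f-measure is their weighted harmonic mean with a hypothetical weight `alpha = 0.5`. Rectangles are assumed to be `(x0, y0, x1, y1)` tuples.

```python
def area(rect):
    """Area of an (x0, y0, x1, y1) rectangle; empty rectangles have area 0."""
    x0, y0, x1, y1 = rect
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def match(r1, r2):
    """Intersection area divided by the area of the minimum box containing both."""
    inter = (max(r1[0], r2[0]), max(r1[1], r2[1]), min(r1[2], r2[2]), min(r1[3], r2[3]))
    hull = (min(r1[0], r2[0]), min(r1[1], r2[1]), max(r1[2], r2[2]), max(r1[3], r2[3]))
    hull_area = area(hull)
    return area(inter) / hull_area if hull_area > 0 else 0.0


def best_match(r, rects):
    """Largest match between r and any rectangle in rects (0 if rects is empty)."""
    return max((match(r, other) for other in rects), default=0.0)


def precision_recall_f(targets, estimates, alpha=0.5):
    """Precision, recall and f-measure from best matches (assumed ICDAR-style)."""
    precision = (sum(best_match(e, targets) for e in estimates) / len(estimates)
                 if estimates else 0.0)
    recall = (sum(best_match(t, estimates) for t in targets) / len(targets)
              if targets else 0.0)
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    f = 1.0 / (alpha / precision + (1.0 - alpha) / recall)
    return precision, recall, f


targets = [(0, 0, 10, 10), (20, 0, 30, 10)]   # two ground-truth words
estimates = [(0, 0, 10, 10)]                  # one detected word
p, r, f = precision_recall_f(targets, estimates)
```

In this toy case the single estimate matches one target exactly, so precision is 1 while recall is 0.5 because the second target is missed.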
The smudged image pixels are improved by applying different threshold values to the image. As a result, the fuzzy edge detector detects edges more accurately. Figure 3 compares the Canny edge detector with our proposed fuzzy edge detector. For the performance evaluation of our system, we first highlight the fuzzy detector output on the input image. Our fuzzy detector is evaluated using the PSNR metric and compared against the Canny edge detector. We compare with the Canny edge detector because most algorithms, such as [1], use the Canny edge detector along with SWT.
From Figure 3 it can be observed that obscure, blurred, slanted and inclined text is accurately handled by the fuzzy edge detector. To assess the quality of the edge-detected image, the PSNR value is used as the metric.
The PSNR is given by

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{R^2}{\mathrm{MSE}}\right)$$

where R is the pixel range and MSE is the mean square error. The PSNR value is computed against the original image for both the Canny and the proposed fuzzy edge detector outputs. PSNR is the peak signal-to-noise ratio between two images, expressed in decibels. The higher the PSNR, the better the quality of the reconstructed image. The PSNR calculation shows that the fuzzy edge image has a higher PSNR value than the Canny edge image, which indicates that the proposed fuzzy edge detection method performs well. We performed the PSNR calculation for almost 256 images taken from the ICDAR 2003 dataset. In Figure 4, the higher peaks denote better edge detection by the fuzzy detector than by the Canny edge detector.
We then apply SWT to the image obtained from the fuzzy edge detector. Our system was able to detect text in many challenging scenarios, such as non-homogeneous backgrounds, shadowy and blurry images, slanted text, inclined text, non-horizontal text, curved text and fonts of varying size. The examples shown in Figure 5 are taken from the ICDAR 2003 dataset. We have few false negatives in Figure 5. Figure 5 shows the results of our system on complex scenarios such as shadowy text, text overlapped on other text, text with small font size, non-horizontal text, and text with big font size. Green rectangles mark the detected text. Figure 6 depicts cases of fancy fonts, where the green rectangles in (a)-(g) mark the text detected and (h)-(n) show the final detected text produced by our algorithm. Figure 7 depicts examples of complex scenarios, which include shadowy text, text overlapped on other text, text with small font, non-horizontal text and text with big font. Figure 8 shows failure cases. The text detection failures were due to excessive blur, unclear text, very small font sizes and strongly curved text.
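The PSNR comparison used above can be reproduced with a short sketch. It assumes 8-bit images, so the pixel range R defaults to 255, and images given as equally sized lists of rows; the function names are illustrative.

```python
import math


def mse(img_a, img_b):
    """Mean square error between two equally sized gray-level images."""
    count = 0
    total = 0.0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(img_a, img_b, pixel_range=255.0):
    """Peak signal-to-noise ratio in decibels: 10 * log10(R^2 / MSE)."""
    err = mse(img_a, img_b)
    if err == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(pixel_range ** 2 / err)


black = [[0, 0], [0, 0]]
white = [[255, 255], [255, 255]]
identical = psnr(black, black)
opposite = psnr(black, white)
close = psnr([[100]], [[110]])
```

A higher value means the two images are closer, which is how the edge images of the two detectors are ranked against the original.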
We also tested our system on the cute80 dataset [26], which contains 80 images exclusively of curved text. Example results of our system on the cute80 dataset are shown in Figure 9. On this dataset, our system achieves a precision of 67% and a recall of 68%.
A performance comparison of text detection algorithms is provided in Table 1, which compares the precision and recall of various algorithms tested on the ICDAR dataset.

Figure 9. Examples of curved text detection. Samples are taken from the cute80 dataset [26].

Conclusion
In this paper, we presented a new approach for text detection using fuzzy edge detection along with the Stroke Width Transform (SWT). We first highlight the fuzzy detector output on the input image. It can be observed that obscure, blurred, slanted and inclined text is accurately handled by the fuzzy edge detector. Our fuzzy detector is evaluated using the PSNR metric and compared against the Canny edge detector, because most algorithms use the Canny edge detector along with SWT. We then apply SWT to the image obtained from the fuzzy edge detector. We tested our system on the ICDAR 2003 dataset. It was able to detect text in many challenging scenarios, such as non-homogeneous backgrounds, shadowy and blurry images, fancy fonts, slanted text, inclined text, non-horizontal text, curved text and fonts of varying size. Our system is evaluated using precision and recall, together with character recall and word recall. We also tested our system on the cute80 dataset, which contains 80 images exclusively of curved text. For our system, WR is 82% and CR is 63%, which demonstrates good potential for detecting text from images.