Pattern recognition is the task of searching for particular patterns or features in a given input. Data mining, computer networks, genetic engineering, chemical structure analysis and web services are a few of the rapidly growing applications in which pattern recognition is used. Graphs are a very powerful model applied in various areas of computer science and engineering. This paper proposes a graph based algorithm for graphical symbol recognition. In the proposed approach, graph based filtering is performed prior to matching, which significantly reduces the computational complexity. The proposed algorithm is evaluated using a large number of input drawings, and the simulation results show that it outperforms the existing algorithms.

In pattern matching applications, retrieving input patterns from a database of model patterns is the most important issue. In graph based recognition techniques, the model symbols and the input images are represented using a primitive graph or a set of sub graphs. These sub graphs may be considered as patterns for symbol recognition. In this context, graph based symbol recognition is a pattern matching task. A graph is a versatile tool for representing and modeling patterns. Because of its representational power, a graph can hold a large amount of data, so graphs are preferred for representing objects with a large number of attributes. Finding object similarity is an important issue in many applications such as pattern recognition, information retrieval and data mining. The design of any pattern recognition or matching algorithm depends on the representation of the object under consideration and on its attributes. If graphs are used for object representation, the problem of determining the similarity of objects becomes the problem of graph matching.

The general form of the graph matching problem includes graph and subgraph matching algorithms. An inevitable disadvantage of most graph matching techniques is that they suffer from high computational complexity [

In all classes of graph matching, subgraph isomorphism is an NP-complete problem [

In [

Although several statistical techniques have been proposed for symbol recognition [

The basic concept behind our algorithm is to save expensive graph matching operations by filtering the graph database beforehand. The main purpose of our work is to propose a novel graph based algorithm for graphical symbol recognition. A major application of the proposed work is the reconstruction or restoration of historical or archaeological survey plans or drawings that are available as input images and may be distorted or corrupted by aging. Another application is automated symbol recognition in online web applications, such as copyright protection for trademark symbols, product review or sign language interpretation. In brief, we propose an algorithm for model symbol recognition in which the model symbol may have a regular or irregular shape, such as a window, shelf, bed or any other architectural symbol. These model symbols can be represented using primitive graphic structures such as triangles, rectangles or irregular shapes. This representation describes the original object in a way that eases pattern extraction and pattern matching. The only assumption is that any regular or irregular shape can be represented by a graph.

In the context of the above discussion, we use a graph to represent any model symbol in our input image. Throughout this work, the terms primitive graphic symbol and model symbol are used interchangeably. Either term denotes a single closed region or entity, or a set of closed regions and entities, that can be used as a model graph when the given graphical symbol object or input drawing is decomposed and represented by a graph. In short, features or attributes are extracted from a given object and described using the graph. For example, a simple triangle has three sides and three angles, which form an entity or closed region with the sides and angles as features or attributes of the object. This triangle becomes a graph with a single vertex that represents the closed region, with the angles, area and sides as attributes of the vertex.

In our proposed algorithm, such graphical symbol objects or input drawings are taken as input, decomposed into a set of closed regions or entities as described above, and stored in the database. For each entity we extract features such as the area of the shape, its angles (if any), its sides and its centroid; these are the attributes of an entity or primitive graph model symbol. The database of decomposed objects is called a graph database. Another important feature extracted from the input is the region adjacency graph (RAG), which provides neighborhood information for a set of closed regions or entities, that is, how they are interconnected. This information is also stored as an attribute of a vertex along with the other attributes. RAG formation is the process of establishing relationships between regions with the help of the extracted features.
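To make the representation concrete, the following is a minimal sketch of how a region adjacency graph with attributed vertices might be stored. The class names `Region` and `RAG`, their fields, and the ring-shaped adjacency chosen for the shelf are illustrative assumptions, not the data structures used in this work; only the counts (four triangular regions, four edges) are taken from the symbol tables in this paper.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple


@dataclass
class Region:
    """One closed region of a symbol; it becomes a vertex of the RAG."""
    shape: str                      # e.g. "triangle", "rectangle"
    area: float
    centroid: Tuple[float, float]


@dataclass
class RAG:
    """Region adjacency graph: vertices are regions, edges join neighbours."""
    name: str
    regions: List[Region] = field(default_factory=list)
    edges: Set[FrozenSet[int]] = field(default_factory=set)

    def add_region(self, region: Region) -> int:
        self.regions.append(region)
        return len(self.regions) - 1      # index of the new vertex

    def connect(self, i: int, j: int) -> None:
        self.edges.add(frozenset((i, j)))  # undirected adjacency

    def neighbours(self, i: int) -> List[int]:
        return sorted(j for e in self.edges if i in e for j in e if j != i)

    def shape_counts(self) -> Counter:
        return Counter(r.shape for r in self.regions)


# Hypothetical shelf: four triangles, assumed here to be adjacent in a ring.
# Areas and centroids are placeholder values.
shelf = RAG("Shelf")
for k in range(4):
    shelf.add_region(Region("triangle", area=1.0, centroid=(float(k), 0.0)))
for k in range(4):
    shelf.connect(k, (k + 1) % 4)
```

The `shape_counts` summary is the kind of per-symbol attribute that a shape based filter can consult without touching the adjacency structure.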

Our main task is to use these features and attributes to filter the graph database and so reduce the number of model graph comparisons with a test input. To do this, we index the database using the features and attributes of a vertex. Here, indexing means reorganizing the graph database by grouping model symbols that contain similar entities or closed regions. Searching is then performed on this indexed database, which amounts to filtering the database. The process is illustrated with two model symbols: S_{1}, which is a staircase, and S_{2}, which is a shelf. S_{1} is decomposed into a set of rectangles and S_{2} into four triangles, each set forming the closed regions or entities of the symbol. These model symbols are preprocessed and represented as region adjacency graphs.

In a RAG, each closed region is a vertex. The attributes of adjacent regions are represented by the edge between them. In the depicted example, S_{1} is a model staircase, S_{2} is a shelf and S_{n} is a bed. A vertex of the RAG represents a closed region in the symbol. An edge between two vertices represents the neighbourhood attributes of the two adjacent regions in the symbol. The staircase consists entirely of regions of the same kind, all rectangular. The RAG of the staircase is drawn symbolically in red: all its vertices are of the same type, so all of them are shown in red. The adjacency of the vertices is shown by the edges. As all the vertices have similar attributes, all the edges also carry similar edge properties; hence, all edges are red as well.

The symbol S_{2} is a shelf, which contains four similar closed regions which are triangular in nature. The RAG of shelf in _{n} shown in

After the model symbols are represented as RAGs, the objects are indexed in the database based on their shapes. We propose two variants, linear filtering and hierarchical filtering. The linear filter filters the symbols in a linear manner, based on the number of shapes. If the model symbols contain mutually exclusive shapes, the linear filter performs well. If the input image contains almost all symbols, the linear filter performs no better than no filtering of the database. The hierarchical filter arranges the symbol indices based on a hierarchy of shapes. It performs well even if the input image is a good mix of all types of shapes.

For example, S_{1} and S_{2} are indexed using rectangle and triangle, respectively, which allows further filtering. If a given input object is decomposed such that one of its closed regions is a rectangle, the search goes directly to the subgroup of database objects indexed under rectangle, which avoids a linear comparison with all objects stored in the database. If an object has different types of closed regions, it is indexed under each type of closed region it contains. The advantage of this representation is that, when searching for a given object, the algorithm explores only the subgroups whose index matches, which filters out a large number of database objects from comparison. The filtering algorithms are elaborated in the next section. It is also important to note that the features or attributes can be of great help in filtering the database. Given a graph database, the salient features are extracted by a feature extraction procedure, which should itself be fast and efficient. The selection of features should be based on the nature of the application. Once the features of the graphs are extracted, the database is filtered and only the selected graphs are matched against the input graph, so matching the input graph against all prototypes in the database is avoided. If the filtering process is fast enough, the graph matching algorithm saves a considerable number of comparisons.
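The indexing idea described above can be sketched as an inverted index from shape type to the model symbols that contain it. This is only an illustration under assumptions: the function names `build_shape_index` and `candidates` are hypothetical, and the shape counts are copied from the constituent shape table given later in this paper.

```python
from collections import defaultdict

# Constituent shapes per model symbol (zero counts omitted).
MODEL_SHAPES = {
    "Bed": {"rectangle": 1, "triangle": 2},
    "Pillar": {"triangle": 2, "parallelogram": 1},
    "Shelf": {"triangle": 4},
    "Staircase": {"rectangle": 6},
    "Stove": {"square": 1, "triangle": 4},
    "Table": {"rectangle": 9},
    "WC": {"rectangle": 2},
    "Window": {"square": 1, "rectangle": 3},
    "Door": {"square": 2, "trapezoid": 1},
    "TV": {"rectangle": 2, "trapezoid": 2},
}


def build_shape_index(models):
    """Index every model symbol under each shape type it contains."""
    index = defaultdict(set)
    for name, shapes in models.items():
        for shape in shapes:
            index[shape].add(name)
    return index


def candidates(index, query_shapes):
    """Union of the subgroups indexed under the shapes found in the query."""
    selected = set()
    for shape in query_shapes:
        selected |= index.get(shape, set())
    return selected


index = build_shape_index(MODEL_SHAPES)
rectangle_group = candidates(index, {"rectangle"})
```

A query whose regions are all rectangles is then compared only with the six symbols in `rectangle_group` rather than with all ten.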

Moreover, feature extraction must be efficient and fast. The choice of features to extract and match affects both the filtering and the graph matching procedure. The features should be able to discriminate between as many graphs in the database as possible. The more graphs a feature can distinguish, the fewer candidate graphs remain after filtering, which makes the expensive matching task more efficient. For example, consider the constituent shapes of the model symbol as a feature. Consider two symbols S_{1} and S_{2}, where S_{1} contains only rectangles while S_{2} contains only triangles. If the query image contains only rectangles, it has to be compared with S_{1}. If the query image contains only triangles, it has to be compared with S_{2}. If the query image contains both rectangles and triangles, it has to be compared with both S_{1} and S_{2}. Hence, selecting shape as the feature can play a significant role in database filtering and matching. For filtering, graph, sub graph or error tolerant sub graph isomorphism [

For graph based symbol recognition, the query image is given as input to the algorithm. The input drawing may consist of many graphical symbols. Feature extraction is performed, the closed regions are extracted, and the input image is represented as a RAG. The graph based symbol database contains the dictionary of all symbols used in the application; these are called database model symbols. Graph based filtering and matching are performed to identify occurrences of model symbols in the input image. Given the input graph G_{1} and the model symbol graph G_{M}, graph based filtering is performed. The filtering algorithm selects a few promising symbols from the model database and filters out the non-promising ones. The graph matching algorithm then matches the input graph with the selected promising model symbol graphs.

In our approach, we propose two types of filtering for graphical symbol recognition, namely graph based linear filtering and graph based hierarchical filtering. The overall symbol recognition is achieved in three steps: preprocessing, graph filtering and graph matching. The filtering algorithm filters the model symbol database. If the model symbol G_{M} is likely to be similar to the symbols of the input image, G_{M} is selected as a promising symbol; otherwise it is non-promising. We perform shape based filtering of the graph database, for which we consider two approaches: 1) shape based linear filtering of the graph database and 2) shape based hierarchical filtering of the graph database.

After the graph database filtering, the number of model symbols to be matched with the input graph reduces considerably, which reduces graph matching time. We use region adjacency graph (RAG) based string growing algorithm [

use RAG based string growing algorithm [

In this step, the input image and the model symbols of the database are preprocessed. Primitive operations such as thinning, bridging, identification of junctions and end points, and region separation are performed on the model symbols and the input image. Features such as edges, corners, junctions, end points and closed regions are extracted from the image. Each closed region is treated as a basic constituent entity of the symbol or input image. Structural features such as the area, centroid and shape of the region are extracted for each region. After region-wise feature extraction, the symbols and the input image are represented as RAGs. Each closed region is represented as a vertex of the graph. Neighbouring regions are adjacent to each other in the graph and carry their extracted attributes. The relationship between these regions or vertices and their attributes is important for further processing.
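As an illustration of the per-region structural features mentioned above, the area and centroid of a closed polygonal region can be computed from its ordered corner points with the standard shoelace formula. This is a minimal sketch under the assumption that each region is available as a simple polygon; the function name `polygon_area_centroid` is hypothetical and this is not necessarily how the features are computed in this work.

```python
def polygon_area_centroid(points):
    """Return (area, (cx, cy)) of a simple polygon given its ordered corners.

    Uses the shoelace formula; works for clockwise or counter-clockwise order.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]   # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate region with zero area")
    signed_area = twice_area / 2.0
    cx /= 6.0 * signed_area
    cy /= 6.0 * signed_area
    return abs(signed_area), (cx, cy)


# A 4 x 2 rectangular region with one corner at the origin.
area, centroid = polygon_area_centroid([(0, 0), (4, 0), (4, 2), (0, 2)])
```

For this rectangle the sketch gives an area of 8 and a centroid of (2, 1), values that would be stored as attributes of the corresponding RAG vertex.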

The extracted features of the regions are represented as labels or attributes of the vertices in the RAG. After RAG formation, each symbol or image is treated as a region adjacency graph with regions as vertices and their inter-relationships as edges. More accurate feature extraction leads to more accurate graph matching. The results of preprocessing the input image are shown in

RAGid | Square | Rectangle | Triangle | Parallelogram | Quadrilateral | Rhombus | Trapezoid |
---|---|---|---|---|---|---|---|
Bed | 0 | 1 | 2 | 0 | 0 | 0 | 0 |
Pillar | 0 | 0 | 2 | 1 | 0 | 0 | 0 |
Shelf | 0 | 0 | 4 | 0 | 0 | 0 | 0 |
Staircase | 0 | 6 | 0 | 0 | 0 | 0 | 0 |
Stove | 1 | 0 | 4 | 0 | 0 | 0 | 0 |
Table | 0 | 9 | 0 | 0 | 0 | 0 | 0 |
WC | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
Window | 1 | 3 | 0 | 0 | 0 | 0 | 0 |
Door | 2 | 0 | 0 | 0 | 0 | 0 | 1 |
TV | 0 | 2 | 0 | 0 | 0 | 0 | 2 |

RAGid | No. of Regions | No. of Edges | Symbol Name | Priority |
---|---|---|---|---|
Shelf | 4 | 4 | Shelf | 4 |
Staircase | 6 | 7 | Staircase | 3 |
Bed | 3 | 3 | Bed | 2 |
Window | 4 | 5 | Window | 1 |
Pillar | 3 | 2 | Pillar | 7 |
Table | 9 | 8 | Table | 5 |
Stove | 5 | 4 | Stove | 6 |
TV | 4 | 3 | TV | 8 |
Door | 3 | 2 | Door | 9 |

The prototype symbols and the input image are both preprocessed and structural features like edges, junctions, corners, end points, closed regions, area, centroid, shape of region, neighborhood information of regions etc. are extracted from those images.

For example,

After preprocessing, we apply the graph based filtering algorithm, which filters the database of model symbols with respect to the input image. This reduces the number of model symbols that have to be compared with the input image in the later graph matching phase. The filtering is based on the constituent shapes of the input image and the constituent shapes of the symbols in the model graph database. The filtering algorithms are explained below.

Linear Filter

In linear filtering, we identify the constituent shapes in the input image. The model symbol database already holds all the features of all the model symbols, which are extracted when the database is preprocessed. The shapes of each prototype symbol are compared with the shapes of the input image one prototype at a time. Graph matching is performed only if the constituent shapes match; if the constituent shapes of the prototype graph and the input graph do not match, further graph matching is avoided.

Consider symbol S_{1} as a “staircase”, symbol S_{2} as a “stove” and symbol S_{n} as a “bed”. During preprocessing, the database records that S_{1} comprises six rectangles, S_{2} comprises four triangles and S_{n} comprises one rectangle and two triangles. Suppose “bed” is the input image. Preprocessing identifies that the input image contains one rectangle and two triangles. At the time of graph matching, all three model symbols S_{1}, S_{2} and S_{n} are therefore compared with the input image, since each of them contains either a rectangle or a triangle, both of which occur in the input image. If the input image is a “staircase”, preprocessing concludes that the input comprises six rectangles. Matching is then performed with S_{1} and S_{n} only, as both have a rectangle as one of their constituents. Comparison of the input with S_{2} is avoided, as S_{2} does not have a rectangle as a constituent. If the input image is a “stove”, comparison of the input with S_{1} is avoided, as S_{1} does not have a triangle as a constituent.
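The worked example above can be reproduced with a short sketch of the linear filter: every model symbol is visited in turn and kept only if it shares at least one constituent shape with the input. The function name `linear_filter` and the dictionary layout are assumptions made for illustration; the shape counts are those stated in the example.

```python
# Constituent shapes of the three model symbols in the example.
MODELS = {
    "S1": {"rectangle": 6},
    "S2": {"triangle": 4},
    "Sn": {"rectangle": 1, "triangle": 2},
}


def linear_filter(models, input_shapes):
    """Scan all model symbols and keep those sharing a shape with the input."""
    present = set(input_shapes)
    promising = []
    for name, shapes in models.items():   # one pass over the whole database
        if present & set(shapes):
            promising.append(name)
    return promising


bed_input = {"rectangle": 1, "triangle": 2}
staircase_input = {"rectangle": 6}

for_bed = linear_filter(MODELS, bed_input)
for_staircase = linear_filter(MODELS, staircase_input)
```

With a bed as input all three symbols remain candidates, while a staircase input excludes S2, as described in the text. The scan still touches every model once, which is why the benefit disappears when the input contains nearly every shape type.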

Hierarchical Filter

In hierarchical filtering, the basic constituent shapes are arranged hierarchically in the database. The individual basic constituent shapes are at level 0. At level 1, combinations of any two shapes are considered for filtering; at level 2, combinations of any three shapes are considered, and so on up the hierarchy. The advantage of this type of hierarchy is that shapes that do not occur in the input image are skipped by the graph matching algorithm, which further reduces high cost graph matching operations.
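One possible reading of this hierarchy is sketched below. It assumes that each model symbol is stored under the exact set of shape types it contains, that the level of an entry is the size of that set minus one, and that an entry is visited only when all of its shapes occur in the input. These rules, and the names `build_hierarchy` and `hierarchical_filter`, are assumptions for illustration rather than the exact procedure of this work; the shape sets are taken from the constituent shape table in this paper.

```python
from collections import defaultdict

# Shape types per model symbol.
MODEL_SHAPE_SETS = {
    "Bed": {"rectangle", "triangle"},
    "Pillar": {"triangle", "parallelogram"},
    "Shelf": {"triangle"},
    "Staircase": {"rectangle"},
    "Stove": {"square", "triangle"},
    "Table": {"rectangle"},
    "WC": {"rectangle"},
    "Window": {"square", "rectangle"},
    "Door": {"square", "trapezoid"},
    "TV": {"rectangle", "trapezoid"},
}


def build_hierarchy(models):
    """levels[k] maps a combination of k + 1 shapes to the symbols built from it."""
    levels = defaultdict(dict)
    for name, shapes in models.items():
        key = frozenset(shapes)
        levels[len(key) - 1].setdefault(key, []).append(name)
    return levels


def hierarchical_filter(levels, input_shapes):
    """Keep symbols whose shape combination occurs entirely in the input."""
    present = frozenset(input_shapes)
    promising = []
    for level in sorted(levels):
        if level + 1 > len(present):
            break                      # deeper levels need more shape types
        for key, names in levels[level].items():
            if key <= present:         # skip combinations with absent shapes
                promising.extend(names)
    return promising


levels = build_hierarchy(MODEL_SHAPE_SETS)
selected = hierarchical_filter(levels, {"rectangle", "triangle"})
```

Under these assumptions, an input containing rectangles and triangles selects the rectangle-only and triangle-only symbols at level 0 and the bed at level 1; symbols that need a square, parallelogram or trapezoid are never passed to graph matching.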

approach provides an efficient solution for graph based matching. Few model symbols from the database are shown in

After the preprocessing and database construction, the graph based symbol matching is performed. For graph based matching, the RAG based string growing algorithm is applied for the architectural symbol recognition. An architectural plan containing various architectural symbols is given as an input. The identification of various model symbols from the database is performed in this step by using the string growing algorithm [

1) Read each prototype symbol linearly from the database.

2) For each closed input region perform polygon matching. Find substitution, shift and scaling cost.

3) Consider the regions, which are below threshold level for further matching.

4) Continue region matching till the prototype symbol is identified otherwise no match is found.

5) Conclude that the symbol is matching.
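Steps 2 and 3 above can be illustrated with a deliberately simplified stand-in for polygon matching. The sketch compares perimeter-normalised side lengths over all cyclic shifts, so that the starting corner (shift) and the size (scaling) of a region do not affect the cost, and keeps the input regions whose cost falls below a threshold. It is not the string growing algorithm used in this work: the cost definition, the threshold value of 0.1 and the function names are assumptions, and side lengths alone cannot separate, for example, a rectangle from a parallelogram with the same sides.

```python
import math


def side_lengths(points):
    """Lengths of the polygon's sides, in corner order."""
    n = len(points)
    return [math.dist(points[i], points[(i + 1) % n]) for i in range(n)]


def polygon_cost(model_pts, input_pts):
    """Smallest side-length mismatch over all cyclic shifts (0 = same proportions)."""
    a = side_lengths(model_pts)
    b = side_lengths(input_pts)
    if len(a) != len(b):
        return math.inf                # different number of sides: no match
    pa, pb = sum(a), sum(b)
    a = [s / pa for s in a]            # normalise by perimeter: scale invariant
    b = [s / pb for s in b]
    n = len(a)
    return min(
        sum(abs(a[i] - b[(i + k) % n]) for i in range(n))
        for k in range(n)              # try every starting corner
    )


def candidate_regions(model_pts, input_regions, threshold=0.1):
    """Indices of input regions whose cost is below the (assumed) threshold."""
    return [
        idx for idx, pts in enumerate(input_regions)
        if polygon_cost(model_pts, pts) <= threshold
    ]


model_rect = [(0, 0), (2, 0), (2, 1), (0, 1)]           # 2 x 1 rectangle
input_regions = [
    [(5, 5), (5, 7), (1, 7), (1, 5)],                   # 4 x 2 rectangle, shifted start
    [(0, 0), (3, 0), (0, 4)],                           # triangle
    [(0, 0), (1, 0), (1, 1), (0, 1)],                   # square
]
kept = candidate_regions(model_rect, input_regions)
```

Only the first input region survives the threshold here; the triangle is rejected for its side count and the square for its proportions.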

We evaluated our proposed algorithm using architectural drawings. The input architectural plans are line drawing images. We consider two types of architectural images: 1) normal images, in which all symbols are properly visible, and 2) distorted images, in which the symbols are error-prone. Our implementation uses a model symbol database that contains a large number of model symbols. A few such architectural model symbols are depicted in

Scenario 1: symbol detection with graph matching without database filtering; Scenario 2: symbol detection with linear filtering of the graph database; and Scenario 3: symbol detection with hierarchical filtering of the graph database.

During preprocessing and feature extraction, we introduce a tolerance value to determine accurate features of the image.

vertices with increasing tolerance value is shown plotting the values in

After preprocessing, the model symbol database is reorganized in a hierarchical manner based on the constituent shapes. The model symbols are indexed on the basis of their constituent shapes. When an input image is given, the graph isomorphism algorithm is applied to recognize the model symbols in it.

We observe a considerable variation in string edit cost [

system gets a particular level of stability for that tolerance limit. The appropriate error tolerance value depends on the accuracy expected by the application. No automatic inference method for the tolerance can suit all pattern matching applications globally. Its estimation is a challenging task that depends purely on the type of application.

number of symbols take more number of comparisons. The conclusion is that, on average, about two thirds of the comparisons are avoided with filtering compared with matching without filtering, which is a significant achievement.

There is an acceleration gain from hierarchical graph filtering. Including the filtering stage enhances the overall efficiency of graph based symbol recognition compared with graph matching without filtering.

We compared the time taken for graph processing without using filtering [

and with filtering.

The hierarchical filtering algorithm takes maximum graph processing time in all the three scenarios. We compared the time taken for graph matching without using filtering [

matching time taken for symbol recognition with graph filtering and without graph filtering [

In this work we have proposed and evaluated graph filtering and matching for symbol recognition. Architectural plan images are provided as input to the algorithm. The graph matching is done by region adjacency graph matching [

The authors declare no conflicts of interest regarding the publication of this paper.

Pawar, V. and Zaveri, M. (2018) Graph Based Filtering and Matching for Symbol Recognition. Journal of Signal and Information Processing, 9, 167-191. https://doi.org/10.4236/jsip.2018.93010