Error Searching System with Keyword Extraction and Keyword Fuzzy Matching

Abstract

This paper proposes an error searching method for finding solutions to errors that occur in the unified commanding platform mix-deployed software (UCPMD). These errors arise at different stages and in different services, applications, IP ports, system software, or software versions, and they can be classified into different types, so it is necessary to locate the exact cause of an error as well as its solution. The proposed error searching system applies Chinese keyword extraction and Chinese fuzzy matching between keywords, and uses the processed keywords as the index for finding solutions. In addition, the system establishes the correspondence among errors, reasons, and solutions, and assigns them to categories according to their characteristics, which makes them easy to manage, search, and use. We have also added a specialized thesaurus as an index of keywords, which enriches and completes the search results. Because the system involves both keyword extraction and keyword fuzzy matching, it finds the solutions users are interested in more accurately.

Share and Cite:

Yang, F. , Dong, Z. and Liu, L. (2017) Error Searching System with Keyword Extraction and Keyword Fuzzy Matching. International Journal of Communications, Network and System Sciences, 10, 219-226. doi: 10.4236/ijcns.2017.105B022.

1. Introduction

The unified commanding platform mix-deployed software (UCPMD) integrates 22 sub-systems from 4 different institutes, comprising 92 different pieces of software. Because the underlying protocols and standards differ, many errors occur during setup, configuration, and operation, and they seriously affect usage. Moreover, because these errors are varied and may occur in different operation phases, stages, TCP/IP protocol layers, and sub-system software, it is necessary to design a database system that can manage them. The proposed method provides a design for an error searching database that can search the errors occurring during setup, configuration, and operation, and that provides the reason for each error together with the corresponding solution. In this way, the proposed method effectively finds solutions to errors that occur on the UCPMD platform.

2. Related Work

2.1. Keyword Extraction

We applied IK Analyzer [6] to extract keywords. IK Analyzer is a lightweight, open-source Chinese word segmentation toolkit written in Java that combines dictionary-based segmentation with semantic segmentation. It adopts the “forward iteration finest-grained segmentation algorithm” [7] and supports two segmentation modes: fine-grained and intelligent. The intelligent mode supports simple ambiguity resolution [8] and merged output for quantifiers. In addition, IK Analyzer adopts a multi-processor analysis mode [7] that supports English letters, digits, Chinese characters, and so on.

However, this method can only separate words from text, and its output includes uninformative words such as “a”, “as”, and “of”; it cannot by itself pick out the meaningful words among the separated words. Fortunately, it allows the user to configure a self-defined “extension stop dictionary”, which makes the separation more intelligent.
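The effect of a stop dictionary can be illustrated independently of IK Analyzer. The sketch below is written in Python rather than Java and assumes the text has already been segmented into tokens. The `EXTENSION_STOP_WORDS` set and the `filter_keywords` helper are hypothetical names standing in for a configured extension stop dictionary; they are not part of IK Analyzer's API.

```python
# Hypothetical stand-in for a user-configured extension stop dictionary.
EXTENSION_STOP_WORDS = {"a", "as", "of", "的", "了", "在"}


def filter_keywords(tokens, stop_words=EXTENSION_STOP_WORDS):
    """Drop stop words and duplicates from segmenter output, keeping order."""
    seen = set()
    keywords = []
    for token in tokens:
        word = token.strip().lower()
        if not word or word in stop_words or word in seen:
            continue
        seen.add(word)
        keywords.append(word)
    return keywords


# Tokens as a segmenter might emit them for an English error description.
print(filter_keywords(["failure", "of", "a", "service", "as", "of", "startup"]))
```

Only the tokens that survive this filter would be used as index keywords.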

2.2. Keyword Fuzzy Search

Lucene is a development toolkit for full-text search engines [7] that supports Java development. It provides a fuzzy searching function (FuzzyQuery) [9]. This paper applies FuzzyQuery because it uses similarity matching, which can recognize two similar words. FuzzyQuery uses string matching based on the Damerau-Levenshtein distance algorithm [10] to compute the number of edit steps needed to transform one word into another, and this number is the basis for scoring similarity. If the similarity is not less than a set value (normally 0.5), the two words are considered similar.
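To make the similarity test concrete, the following Python sketch computes the optimal string alignment variant of the Damerau-Levenshtein distance and converts it to a score between 0 and 1. It is not Lucene's internal implementation. Normalizing by the longer word's length and treating a score at or above the threshold as a match are assumptions made for illustration, and the function names are hypothetical.

```python
def osa_distance(a, b):
    """Edit distance with insert, delete, substitute, and adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def similarity(a, b):
    """Assumed normalization: 1 minus distance over the longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - osa_distance(a, b) / max(len(a), len(b))


def is_similar(a, b, threshold=0.5):
    """Two words count as similar when the score reaches the threshold."""
    return similarity(a, b) >= threshold


print(similarity("网络连接", "网络链接"))  # one substituted character out of four
```

With the default threshold of 0.5, two four-character keywords that differ in one character are treated as similar, while keywords that share no characters are not.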

3. Proposed Method

3.1. System Function Design

Figure 1 shows the system function design. The error database system contains two modules, a search engine and a database. The database has three sub-function modules, explained as follows.

1) Data import/enter

This function supports two ways of importing data: importing an Excel file directly, and manual entry by an administrator.

2) Keyword fuzzy search

Figure 1. System function diagram.

have similar meaning, then they are still similar. Using both ways of fuzzy matching makes the search results more accurate; otherwise, some search results, which might include the solution to the searched error, would be missing.

The results are “百姓” and “基层” (“common people” and “grass roots”) if we apply the second way of fuzzy matching.
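Matching by meaning rather than by spelling can be sketched as a lookup in a small thesaurus. The Python example below is an illustration only: the `THESAURUS` groups and the `synonyms` helper are hypothetical and do not reproduce the specialized thesaurus used by the system.

```python
# Hypothetical thesaurus: each set groups words treated as similar in meaning.
THESAURUS = [
    {"故障", "错误", "异常"},  # fault, error, exception
    {"安装", "部署"},          # install, deploy
]


def synonyms(word, thesaurus=THESAURUS):
    """Return the other words sharing a group with `word`, sorted for stable output."""
    related = set()
    for group in thesaurus:
        if word in group:
            related |= group
    related.discard(word)
    return sorted(related)


print(synonyms("安装"))
```

A query keyword would then be searched together with its synonyms, so that a record using a different word for the same concept is still found.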

The search engine includes 4 sub-function modules.

1) Keyword extraction

2) Sorting of the search results

This function sorts the search results according to their matching degree.

3) Second search

4) Visualization of the search results

This function displays the search results visually.

3.2. Flowchart

Figure 2 gives the complete flowchart of the error searching system, which contains 5 stages: keyword extraction, fuzzy matching, finding the error ID, sorting the search results, and visualization of the search results. Because the searching process is the focus of this paper and involves only stages 1, 2, and 3, we explain these 3 stages in more detail in Table 1.

Figure 2. System flowchart.
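The stages of the flowchart from fuzzy matching to sorting can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's implementation: `KEYWORD_INDEX` and its contents are hypothetical, the query keywords are assumed to come from the extraction stage, Python's `difflib.SequenceMatcher` stands in for Lucene's FuzzyQuery, and the matching degree is taken to be the number of matched keywords.

```python
from difflib import SequenceMatcher

# Hypothetical index: keyword -> IDs of errors whose error, reason, or
# solution description contains that keyword.
KEYWORD_INDEX = {
    "系统": {1, 2},    # system
    "端口": {2},       # port
    "配置": {1, 3},    # configuration
    "数据库": {3},     # database
}


def fuzzy_match(keyword, index, threshold=0.5):
    """Stage 2: return indexed keywords whose similarity reaches the threshold."""
    return [k for k in index
            if SequenceMatcher(None, keyword, k).ratio() >= threshold]


def search(query_keywords, index=KEYWORD_INDEX):
    """Stages 2 to 4: match keywords, collect error IDs, sort by matching degree."""
    hits = {}
    for keyword in query_keywords:            # output of stage 1 (extraction)
        for matched in fuzzy_match(keyword, index):
            for error_id in index[matched]:   # stage 3: find the error IDs
                hits[error_id] = hits.get(error_id, 0) + 1
    # Stage 4: errors matching more keywords first; ties broken by error ID.
    return sorted(hits, key=lambda error_id: (-hits[error_id], error_id))


print(search(["系统", "端口"]))
```

In this example, error 2 is ranked ahead of error 1 because it matches both query keywords.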

4. Experiments and Prototype

We have implemented a prototype of the searching system to verify that the search is valid and useful.

The prototype is shown in Figures 3-6. Figure 3 shows the search input interface, where the user can type an error description in the edit box in the middle of the page and click “故障诊断” (error diagnosis). Alternatively, the user can use advanced search to refine the search by selecting advanced options, including the error stage, the error type, the error layer, the system in which the error occurred, the software in which it occurred, etc. Figure 4 shows the search results sorting page: after a search for the keyword “系统” (system), the search results are listed as in Figure 4. Clicking any item in the list, such as the 3rd one, shows the reason that causes this error, as in Figure 5. Clicking the reason shown in Figure 5 shows the corresponding solution, as in Figure 6. In this way the user can find the results of interest.

Table 1. Explanation on Database searching flowchart.

Figure 3. Search input interface.

Figure 4. Search results sorting page.
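The advanced search options can be pictured as optional equality filters applied to the matched records. The Python sketch below assumes a flat record with one field per option. The field names (`stage`, `type`, `layer`, `system`, `software`), the sample records, and the `advanced_filter` helper are all hypothetical.

```python
# Hypothetical records; each field corresponds to one advanced search option.
RECORDS = [
    {"id": 1, "stage": "setup", "type": "network", "layer": "transport",
     "system": "A", "software": "X"},
    {"id": 2, "stage": "operation", "type": "network", "layer": "application",
     "system": "A", "software": "Y"},
    {"id": 3, "stage": "setup", "type": "permission", "layer": "application",
     "system": "B", "software": "X"},
]


def advanced_filter(records, **criteria):
    """Keep records whose fields equal every criterion that is not None."""
    active = {field: value for field, value in criteria.items() if value is not None}
    return [record for record in records
            if all(record.get(field) == value for field, value in active.items())]


# Options left unselected are passed as None and therefore ignored.
selected = advanced_filter(RECORDS, stage="setup", software="X", system=None)
print([record["id"] for record in selected])
```

Leaving every option unselected returns all records, which corresponds to a plain keyword search.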

5. Conclusion

This paper has proposed an error searching method for finding solutions to errors that occur in the UCPMD. The method applies Chinese keyword extraction and Chinese keyword fuzzy matching to find the search results users are interested in. The search results are drawn from errors, reasons, and solutions: as long as an indexed keyword appears in any of the descriptions of an error, its reason, or its solution, the corresponding set of error, reason, and solution is listed in the search results. We also provide a prototype to show the effectiveness and correctness of the method.

Figure 5. A selected error and the reason that causes the error.

Figure 6. A selected error, the reason that causes the error, and the corresponding solution.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Wang, J.H., Fang, M.D., Gao, J.D., Lu, H.Y. and Dai, C.B. (2006) Basic Principle and Application of On-Board Diagnostics for Gasoline Fuelled Vehicles. Automotive Engineering, 28, 491-494.
[2] Lu, C., Yang, Y.-H. and Xu, G.-M. (2008) Exploitation of Computer Problem Repair and Require on Web System.
[3] Wang, L.-X. and Huai, X.Y. (2012) Semantic-Based Keyword Extraction Algorithm for Chinese Text. Computer Engineering, 38.
[4] Fang, J., Guo, L. and Wang, X.D. (2008) Semantically Improved Automatic Keyphrase Extraction. Computer Science, 35.
[5] Wang, J.-F., Wu, X.-J., Xia, Y.Q. and Zheng, F. (2007) An Approximate String Matching Algorithm for Chinese Information Retrieval Systems. Journal of Chinese Information Processing, 21.
[6] Bai, Y.-C., Fu, W. and Xin, Y. (2014) Research and Simulation of Distributed Search Engine Based on Hadoop and Nutch. The 19th National Young People Communication Academic Annual Symposium.
[7] Gao, C.J. (2013) Research on Lucene Search Engine Based on PSP-BP Neural Network. China University of Petroleum (East China), Master Degree Thesis.
[8] Liu, Y.Z. (2005) Research on Chinese Auto Participle Exclude Ambiguity Algorithm. Chongqing University, Master Degree Thesis.
[9] Hu, H.B. (2015) The Implementation of a Variety of Sorting Methods Based on Lucene. Computer Knowledge and Technology, 11, 57-59.
[10] http://en.wikipedia.org/wiki/Damerau-levenshtein_distance

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.