Error Searching System with Keyword Extraction and Keyword Fuzzy Matching
1. Introduction
The unified commanding platform mix-deployed software (UCPMD) integrates 22 sub-systems from 4 different institutes, comprising 92 different pieces of software. Because the underlying protocols and standards differ, many errors occur during setup, configuration, and operation, and these seriously affect usability. Moreover, these errors are diverse: they can arise in different operation phases, stages, TCP/IP communication protocol layers, and sub-system software. A database system that manages them is therefore necessary. The proposed method provides a design for an error searching database that can search the errors occurring during setup, configuration, and operation, and that returns the reason that causes each error together with the corresponding solution. The proposed method effectively finds solutions to errors that occur on the UCPMD platform.
2. Related Work
2.1. Keyword Extraction
We applied IK Analyzer [6] to extract keywords. IK Analyzer is a lightweight, open-source Chinese word segmentation toolkit based on Java, which combines dictionary-based segmentation with semantic segmentation. It adopts the “forward iteration finest-grained segmentation algorithm” [7] and supports two segmentation modes: fine-grained and intelligent. The intelligent mode supports simple ambiguity resolution [8] and combined output for quantifiers. In addition, IK Analyzer adopts a multi-processor analysis mode [7], which supports English letters, digits, Chinese characters, and so on.
However, this method can only split text into words, including unnecessary ones such as “a”, “as”, and “of”; it cannot pick out the meaningful words from the segmented ones. Fortunately, it allows the user to configure a self-defined “extension stop dictionary”, which makes the segmentation more intelligent.
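The effect of a stop dictionary can be illustrated with a minimal Python sketch. It does not use IK Analyzer itself: the token list stands in for a segmenter's output, and the `STOP_WORDS` set and the `filter_keywords` helper are hypothetical names introduced only for this illustration.

```python
# Hypothetical stop-word set (an assumption for illustration); a real
# deployment would load it from the analyzer's configured stop dictionary.
STOP_WORDS = {"a", "as", "of", "the", "is", "的", "了"}


def filter_keywords(tokens, stop_words=STOP_WORDS):
    """Drop blanks and stop words; keep the first occurrence of each token, in order."""
    seen = set()
    keywords = []
    for token in tokens:
        word = token.strip().lower()
        if not word or word in stop_words or word in seen:
            continue
        seen.add(word)
        keywords.append(word)
    return keywords


# Tokens as a segmenter might emit them for "failure of the database connection".
tokens = ["failure", "of", "the", "database", "connection"]
keywords = filter_keywords(tokens)
```

Here only "failure", "database", and "connection" survive as keywords, so the later matching stage is not diluted by function words.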
2.2. Keyword Fuzzy Search
Lucene is a development toolkit for full-text search engines [7] that supports Java development. It provides a fuzzy searching function (FuzzyQuery) [9]. This paper applies FuzzyQuery because it uses similarity matching, which can recognize two similar words. FuzzyQuery uses a string matching technique based on the Damerau-Levenshtein distance algorithm [10] to compute the number of edit steps needed to transform one word into another, and this count serves as the basis for the similarity score. If the similarity is not less than a set threshold (normally 0.5), the two words are considered similar.
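The idea of turning an edit distance into a similarity score can be sketched in Python as follows. This is not Lucene's implementation: it uses the optimal string alignment variant of the Damerau-Levenshtein distance, and the normalization by the longer word's length, the function names, and the way the 0.5 threshold is applied are assumptions made for illustration.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def similarity(a, b):
    """Assumed normalization: 1 minus distance over the longer word's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


def is_similar(a, b, threshold=0.5):
    """Two words count as similar when the similarity reaches the threshold."""
    return similarity(a, b) >= threshold
```

For example, "system" and "sytsem" differ by one adjacent transposition, so their distance is 1 and they pass the 0.5 threshold, whereas "abc" and "xyz" share nothing and do not.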
3. Proposed Method
3.1. System Function Design
Figure 1 shows the system function design. The error database system contains two modules: the search engine and the database. The database has three sub-function modules, explained as follows.
1) Data import/enter
This function supports two ways of adding data: importing an Excel file directly, or manual entry by an administrator.
2) Keyword fuzzy search
have similar meanings, then they are still considered similar. Using both ways of fuzzy matching makes the search results more accurate; otherwise, some results would be missed, and these might contain the solution to the searched error.
The results are “百姓” and “基层” (“common people” and “grass roots”) if we apply the second way of fuzzy matching.
The search engine includes 4 sub-function modules.
1) Keyword extraction
2) Sorting for the searching results
This module sorts the search results by matching degree.
3) Second search
4) Visualization of searching results
This module displays the search results visually.
3.2. Flowchart
Figure 2 gives the whole flowchart of the error searching system, which contains 5 stages: keyword extraction, fuzzy matching, finding the error ID, sorting the search results, and visualization of the search results. Because the searching process is the focus of this paper and involves only stages 1, 2, and 3, we explain these 3 stages in more detail in Table 1.
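The fuzzy matching, error ID lookup, and sorting stages can be sketched end to end in Python. Everything here is an assumption for illustration rather than the system's actual implementation: the sample `RECORDS` are invented, whitespace splitting stands in for keyword extraction, `difflib.SequenceMatcher.ratio()` from the standard library stands in for the Damerau-Levenshtein based similarity (hence the different threshold of 0.8), and the matching degree is taken to be the number of query keywords that hit a record.

```python
from difflib import SequenceMatcher

# Invented sample records: each error has a reason and a solution.
RECORDS = [
    {"id": 1, "error": "database connection timeout",
     "reason": "wrong port configuration",
     "solution": "correct the port and restart the service"},
    {"id": 2, "error": "login failed",
     "reason": "database account locked",
     "solution": "unlock the account"},
    {"id": 3, "error": "map display is blank",
     "reason": "missing tile files",
     "solution": "reinstall the map package"},
]
FIELDS = ("error", "reason", "solution")


def build_index(records):
    """Map each word to the IDs of the records whose fields contain it."""
    index = {}
    for record in records:
        for field in FIELDS:
            for word in record[field].lower().split():
                index.setdefault(word, set()).add(record["id"])
    return index


def fuzzy_terms(keyword, index, threshold=0.8):
    """Indexed words whose similarity to the keyword reaches the threshold."""
    return [word for word in index
            if SequenceMatcher(None, keyword, word).ratio() >= threshold]


def search(keywords, records, threshold=0.8):
    """Return error IDs sorted by assumed matching degree (ties by ID)."""
    index = build_index(records)
    degree = {}
    for keyword in keywords:
        matched_ids = set()
        for term in fuzzy_terms(keyword.lower(), index, threshold):
            matched_ids |= index[term]
        for error_id in matched_ids:
            degree[error_id] = degree.get(error_id, 0) + 1
    return sorted(degree, key=lambda error_id: (-degree[error_id], error_id))


ranked_ids = search(["databse", "port"], RECORDS)
```

With the misspelled keyword "databse" and the keyword "port", record 1 matches both keywords and record 2 matches one, so record 1 is ranked first. Because the index covers the error, reason, and solution fields together, a keyword found in any of them brings back the whole record.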
4. Experiments and Prototype
We implemented a prototype of the searching system to verify that the search is valid and useful.
Figures 3-6 show the prototype. Figure 3 shows the search input interface, where the user can type an error description in the edit box in the middle of the page and click “故障诊断” (error diagnosis). Alternatively, the user can use advanced search to refine the search by selecting advanced options, including the error stage, the error type, the error layer, the system where the error occurred, the software where it occurred, etc. Figure 4 shows the page of sorted search results: after searching for the keyword “系统” (system), the results are listed as in Figure 4. Clicking any item in the list, such as the 3rd one, shows the reason that causes this error, as in Figure 5. Clicking the reason shown in Figure 5 shows the corresponding solution, as in Figure 6. In this way, the user can find the results of interest.
Table 1. Explanation of the database searching flowchart.
5. Conclusion
This paper has proposed an error searching method for finding the solutions to errors that occur in the UCPMD. The method applies Chinese keyword extraction and Chinese keyword fuzzy matching to find the results the user is interested in. The search results are drawn from errors, reasons, and solutions: as long as an indexed keyword appears in any of the descriptions of an error, its reason, or its solution, the corresponding set of error, reason, and solution is listed in the search results. We also provide a prototype of the method to show its effectiveness and correctness.
Figure 5. A selected error and the reason that causes the error.
Figure 6. A selected error, the reason that causes the error, and the corresponding solution.