Text Rank for Domain Specific Using Field Association Words

Text Rank is a popular tool for obtaining the important words or phrases needed by many Natural Language Processing (NLP) tasks. This paper presents a practical approach to domain-specific Text Rank using Field Association (FA) words. We present a keyphrase extraction technique not for a single document but for a particular domain. First, a field for the specific domain is built. Second, a list of ideal FA terms and compound FA terms is collected from the specific domain; these are treated as candidate keyphrases. We then combine two word node weights and field tree relationships into a new approach for generating keyphrases from a particular domain. Experiments with the modified approach show that Text Rank with FA terms outperforms Text Rank with normal words, and that its precision reaches 90%.


Introduction
The knowledge available through the web today is practically unlimited. It frequently includes high-quality data in the form of online pages. However, automatically identifying relevant information and choosing the best set of data for a specific information need is not an easy task. Text Rank is a graph-based ranking algorithm for natural language text [1] [2] [3]. Essentially, it runs PageRank on a graph designed for a particular NLP task. For keyphrase extraction [4] [5], it builds a graph using some set of text units as vertices. Keyphrase extraction is essential for many NLP problems. To obtain good candidate keyphrases, stop words should be removed using a stop-word list [8] [9]. Keyphrases are then chosen from the list of candidates using supervised or unsupervised methods.
The remainder of the paper is organized as follows: Section 2 describes FA words and the methodology for extracting them. Section 3 defines Text Rank for the extraction of FA words. Section 4 presents the corpus construction and experimental results.

Field Association Words
All traditional methods of text classification and document similarity are based on word information across whole documents. The key idea in our study is to extract terms called Field Association (FA) words, which identify fields through specific words without reading the whole document. For example, the word "election" can indicate the document field "Politics". Document fields can be decided efficiently if a document contains many FA words and their frequency is high. Five levels of FA words can be described.
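As a rough illustration of deciding a document field from FA words alone, the sketch below counts dictionary hits per field and returns the most frequent one. The dictionary contents (apart from "election" indicating "Politics"), the function name guess_field, and the simple whitespace tokenization are assumptions made for this example, not the paper's implementation.

```python
from collections import Counter

# Hypothetical FA-word dictionary (word -> field). Only "election" -> "Politics"
# comes from the text; the other entries are invented for illustration.
FA_DICTIONARY = {
    "election": "Politics",
    "parliament": "Politics",
    "transistor": "Electronics",
}


def guess_field(text):
    """Return the field whose FA words occur most often in text, or None."""
    votes = Counter()
    for token in text.lower().split():
        word = token.strip(".,;:!?\"'()")  # crude punctuation stripping
        field = FA_DICTIONARY.get(word)
        if field is not None:
            votes[field] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Calling guess_field on a sentence that mentions "election" and "parliament" selects "Politics" without inspecting any other words, which is the behaviour the paragraph above describes.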
Traditionally, FA words were built by manually adding new FA word candidates to an FA word dictionary, but many FA words were never appended to the dictionary, and revising it took considerable time. A method for selecting English FA terms from compound words, together with a method for appending these FA words to the dictionary automatically, is presented in [10]-[17]. Using these specific words and the new FA word dictionary, our goal is to open new research across established information retrieval areas (e.g., document classification and  [20] and passage retrieval [21]); the approach holds considerable potential for applications in natural language processing and information retrieval. This section therefore presents a method for extracting FA word candidates from large domain-specific corpora.

FA Words
Definition 1: A standard FA word is a minimal unit (word) whose meaning identifies a given field. Our analysis uses a field tree containing 11 superfields, 70 median fields, and 321 terminal fields (subfields). In Figure 1, for example, the path defines superfield "electronics".
Every FA word is attached to a particular field within a field tree such as the one shown in Figure 1. Since an FA word may relate to more than one field, the same FA word may be attached to the field tree at more than one node. In the FA word database, the level of an FA word reflects whether or not it belongs to more than one field.
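One possible way to store this relationship is a mapping from each FA word to the set of field-tree paths it is attached to. The sketch below is only an assumed data layout; the names fa_paths, attach, fields_of, and is_multi_field are hypothetical, and the example paths are the ones quoted for "Microelectronics" and "Biocontrol" in this paper.

```python
from collections import defaultdict

# FA word -> set of field-tree paths. Each path is a tuple that starts at
# the superfield and descends toward a terminal field.
fa_paths = defaultdict(set)


def attach(word, path):
    """Attach an FA word to one node of the field tree."""
    fa_paths[word.lower()].add(tuple(path))


def fields_of(word):
    """Return every path the word is attached to, in sorted order."""
    return sorted(fa_paths.get(word.lower(), set()))


def is_multi_field(word):
    """True if the word is attached to more than one node of the tree."""
    return len(fa_paths.get(word.lower(), set())) > 1


attach("Microelectronics", ["technology", "electronics"])
attach("Biocontrol", ["technology", "biological science"])
attach("Biocontrol", ["technology", "agriculture"])
```

Storing paths as tuples keeps them hashable, so a word attached twice to the same node is recorded only once.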

Levels of FA Words
Some FA words identify a single field on their own, while others refer to several fields. Each FA word therefore has a different scope of association with a field. Taking this into consideration, FA words are graded into five distinct levels [11] depending on how well they represent specific fields, as illustrated in Table 1. Table 1 shows some examples of FA words and their levels. The word "Microelectronics", in the field association path <technology\electronics>, is considered a Perfect-FA (PFA) word. The word "Biocontrol", in the field association paths <technology\biological science> and <technology\agriculture>, is considered a semi-perfect FA (SPFA) word.

Comparison with Traditional Words
By traditional words we mean either index terms or terminology. An index term is a term that captures the essence of a document's subject matter and is commonly used in document retrieval [22]. Index terms constitute a standardized vocabulary and are used as keywords for retrieving documents and text in an information system such as a catalog or search engine.
Terminology studies terms in context for the purpose of documenting and promoting correct usage [23]. It is commonly used in translation and in the representation of knowledge in a given domain. However, the words "terminology" and "index terminology" are often used interchangeably, as in [22] [24].

Improving Text Rank Using FA Word Extraction
The units to be ranked are sequences of one or more lexical units extracted from the text, and these form the vertices added to the text graph [4] [21]. Any relation that can be established between two lexical units is a potentially useful link (edge) between two such vertices. We use a co-occurrence relation controlled by the distance between word occurrences: two vertices are joined if their corresponding lexical units co-occur within a window of a fixed maximum number of words. Co-occurrence links express relations between syntactic elements.
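The co-occurrence graph described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name build_cooccurrence_graph, the window parameter, and the choice to count repeated co-occurrences as the edge weight are all illustrative and not taken from the paper.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted graph from a token sequence.

    Two distinct tokens are joined if they occur at most window - 1
    positions apart; the edge weight counts how often that happens.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for i, left in enumerate(tokens):
        # Look ahead only, so each pair of positions is counted once.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                graph[left][right] += 1
                graph[right][left] += 1
    return graph
```

With window=2 only adjacent tokens are linked; larger windows produce denser graphs. Stop-word removal and part-of-speech filtering would normally be applied to the token list before this step.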

FA Word Weights
If FA words are used to index each document in a collection, a document $D_i$ may be represented as a vector of terms whose entries are the document-term weights [12]. A weight of 0 is assumed for terms not assigned to a given document.
FA word weighting has two key elements. The Text Rank formula proposed in [12] is shown in formula (1).
The damping factor $d$ is normally set to 0.85. $FAw_{jk}$ is the weight of the edge from the prior node $V_j$ to the current node $V_k$.

$In(V_k)$ is the set of nodes that point to $V_k$ (predecessors).
$Out(V_j)$ is the set of nodes that node $V_j$ points to (successors).
The normalizing term is the sum of all edge weights leaving the prior node $V_j$.
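Formula (1) itself is not reproduced in the text above. As a hedged sketch, the standard weighted TextRank recurrence from the literature, written with FA edge weights, is $S(V_k) = (1 - d) + d \sum_{V_j \in In(V_k)} \frac{FAw_{jk}}{\sum_{V_m \in Out(V_j)} FAw_{jm}} S(V_j)$; that this agrees with formula (1) term by term is an assumption. The Python below iterates that recurrence. The score symbol $S$, the function name text_rank, the fixed iteration count, and the dictionary-of-dictionaries graph layout are illustrative choices.

```python
def text_rank(graph, d=0.85, iterations=50):
    """Iterate the weighted TextRank recurrence on a weighted graph.

    graph[j][k] is the weight of the edge from node j to node k.
    Returns a dict mapping each node to its score.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Sum of the weights on all edges leaving each node.
    out_sum = {j: sum(graph.get(j, {}).values()) for j in nodes}

    score = {k: 1.0 for k in nodes}
    for _ in range(iterations):
        new_score = {}
        for k in nodes:
            incoming = 0.0
            for j in nodes:
                weight = graph.get(j, {}).get(k, 0.0)
                if weight and out_sum[j]:
                    incoming += weight / out_sum[j] * score[j]
            new_score[k] = (1 - d) + d * incoming
        score = new_score
    return score
```

A fixed number of iterations keeps the sketch short; a convergence threshold on the change in scores is the more common stopping rule. For an undirected co-occurrence graph, each edge is simply stored in both directions.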

Text Rank Domain Specific Algorithm
Algorithm 1 describes our method for ranking text for specific fields. The inputs to the algorithm are the selected set of FA words and a threshold α. The algorithm calculates the concentration ratio as follows. For a parent field <S> and a child field <C>, the concentration ratio Concentration(w, <C>) of the FA word w in the field <C> is defined as in line 4. For the root <S> and its child field <S/C> in the field tree, the formula in line 5 is used to judge whether or not the word w is a Perfect-FA word. The weight of these words is then calculated in step 6. These steps are repeated for semi-perfect FA words in steps 7, 8, and 9.
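The lines of Algorithm 1 are not reproduced here, so the following is only a sketch under an assumed definition: the concentration ratio is taken to be the share of a word's normalized frequency that falls in one child field, relative to all child fields of the same parent. The function names, the data layout, and the default threshold value are hypothetical, and the weighting step (step 6) is deliberately left out because its formula is not given in the text.

```python
def concentration(word, child, freq, totals):
    """Assumed concentration ratio of word in child.

    freq[field][word] is the raw count of word in that child field and
    totals[field] is the total word count of the field. All keys of
    totals are taken to be children of the same parent field.
    """
    def normalized(field):
        return freq.get(field, {}).get(word, 0) / totals[field]

    parent_level = sum(normalized(field) for field in totals)
    if parent_level == 0:
        return 0.0
    return normalized(child) / parent_level


def perfect_fa_field(word, freq, totals, alpha=0.9):
    """Return the child field whose ratio reaches alpha, else None.

    alpha stands in for the threshold of Algorithm 1; the value 0.9 is
    an arbitrary placeholder, not a value reported in the paper.
    """
    best = max(totals, key=lambda field: concentration(word, field, freq, totals))
    if concentration(word, best, freq, totals) >= alpha:
        return best
    return None
```

Under this assumed definition, a word that occurs in only one child field has ratio 1.0 there and is accepted, while a word spread evenly across two fields has ratio 0.5 in each and is rejected at the Perfect-FA stage.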

Corpus
Field association terms serve as a highly simplified description of a domain, which can be used as domain labels and in text classification, retrieval of in-

Experimental Evaluation
Our experiment compares Text Rank using normal keywords with Text Rank using FA words; see Figure 4.
The evaluation results show that the best performance is obtained by Text Rank with FA words, as shown in Table 2. Moreover, the F-measure computed separately for each class is higher with FA words than with normal keywords.
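The evaluation protocol is not detailed here, so the sketch below simply uses the standard set-based definitions of precision, recall, and F-measure for one class, comparing extracted keyphrases against a reference list. The function name and the exact-match comparison are assumptions made for illustration.

```python
def precision_recall_f1(extracted, gold):
    """Compute precision, recall, and F-measure for one class.

    extracted: keyphrases returned by the ranker.
    gold: reference keyphrases for the same class.
    """
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Running the function once per class and once per ranking variant gives the kind of per-class comparison between normal keywords and FA words described above.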

Conclusions
Extraction of domain keyphrases is important for many tasks in natural language processing. This method extracts keyphrases for entire domains. We studied an enhancement of Text Rank that uses field association words and node weights. Experiments indicate that Text Rank accuracy improves markedly with Algorithm 1. They also show that the PFA and SPFA weights give the highest precision when the top-ranked words are extracted from a domain corpus.
Experiments further show that the extraction precision of FA Text Rank reaches 90 percent when keyphrases are extracted from our corpus.