Development of a Best Answer Recommendation Model in a Community Question Answering (CQA) System

In this work, a best answer recommendation model is proposed for a Question Answering (QA) system, and a Community Question Answering (CQA) system was subsequently developed based on the model. The system applies the Brouwer Fixed Point Theorem to prove the existence of the desired voter scoring function, and Normalized Google Distance (NGD) to measure the closeness between words before an answer is suggested to users. Answers to each question are ranked according to their Fixed-Point Score (FPS), and the highest-scoring answer is chosen as the FPS Best Answer (BA). For each question asked by a user, the system applies NGD to check whether similar or related questions with a best answer have already been asked and stored in the database. When no similar or related question with a best answer is found in the database, the Brouwer fixed point is used to compute the best answer from the pool of answers to the question, and that best answer is then stored in the NGD data table for recommendation purposes. The system was implemented using the PHP scripting language, MySQL for database management, jQuery, and Apache, and was evaluated using standard metrics: Reciprocal Rank, Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG). The system eliminates the long waiting time faced by askers in a community question answering system, and the developed system can be used for research and learning purposes.


Introduction
Current automatic QA frameworks have limited performance, which can be enhanced by a framework of aggregate knowledge called Community Question Answering (CQA). In a CQA system, users can ask and answer questions in different categories [1]. A significant number of web indices and interfaces, including Answers.com and Stack Overflow, have introduced distinct variants of the CQA service. The process involves an asker posting a question in a CQA framework, after which other users provide answers to it. Once a certain number of answers have been gathered, the most appropriate answer can be picked (voted) by users. The questions and their accompanying answers are archived in a database. The database supplements online search, as in Naver's Ji-Sik-In (Knowledge iN), which had collected around 70 million entries [2]. In an ideal scenario, a search engine can serve similar questions or use the best answers as search result snippets to handle similar queries. It is assumed that the best answers from CQA services are good, and that relevant answers are useful for pairing with these questions.
Posting and getting answers to a question in a CQA is an important procedure [3]. A user posts a question by selecting a category, and then enters the question subject (title) and, optionally, details (description) as shown in Figure 1.
A question remains valid in a CQA if it belongs to a category; it can then be answered, commented on and voted for. A question can only be answered by members of the CQA other than the asker. A question can be closed to further answers once an answer satisfies the asker. If the asker is satisfied with any of the answers, he can choose it as the best answer and provide feedback, ranging from assigning stars or a rating to the best answer to, possibly, textual feedback. CQA assumes that in such cases the asker is likely satisfied with at least one of the responses, usually the one the asker chooses as the best answer.
The components of CQA services available to users include: 1) a mechanism for question submission; 2) a complementary mechanism to deliver answers; 3) a web-based platform to facilitate users' interactions. In a CQA system, users can ask or answer questions on different topics, which generally attracts numerous responses to a single inquiry [4]. These kinds of services exhibit large-scale participation with the inherent challenges of ensuring answer quality [5], finding comparable inquiries [1], and predicting question quality [6]. Research on CQA services involves examining clients' experience, intentions, and the strategies by which individuals seek and share information. It may likewise include framework development for supporting such activities.
This research leveraged the characteristic advantages and limitations of different existing CQA approaches to derive a model that combines votes and the closeness among words to select the best answer in a CQA system, and to implement the resultant model.

Overview of Some Existing CQA
Yahoo! Answers is a community-driven QA site that allows users to submit questions and answer questions from other users. It was launched in beta in 2005 and made generally available in 2006. Members earn points, a scheme modeled on Naver's Knowledge iN, as an encouragement to participate on the platform. However, the platform is characterized by poorly formed questions and inaccurate answers. The quality of answers given in Yahoo! Answers cannot be verified and can be misleading owing to the missing relationship between the answers given and the votes received on individual answers. The voting pattern in the Best Answer Recommendation Model is a means to identify misleading information, thereby providing community users with answers whose relevance is confirmed by votes. Figure 2 depicts the simplified lifecycle of a question in Yahoo! Answers.
eHow is a how-to guide consisting of more than 1 million articles that provide users with step-by-step instructions for various tasks. Any eHow user can comment on a submitted article, but only the article's writer has the privilege to change its content. eHow's content delivery approach results in low-quality content and in the site operating as a content farm.

Related Works
Many question answering recommender systems have been developed over time, each unique in the domain addressed, the information filtered and the data set used. Participant reputation was used in [8] to address two research questions: first, by reviewing different link analysis schemes, particularly discussing the use of PageRank-based methods since they are less commonly utilized in user reputation modeling; and second, by introducing Topical PageRank analysis for modeling user reputation on different topics. Comparative experiments on data released to the team from Yahoo! Answers show that PageRank-based approaches are more effective than HITS-like schemes and other heuristics, and that topical link analysis can improve performance. In the HITS scheme, [9] identifies two important properties of a web page, hubness and authority, and proposes a mechanism to calculate them effectively. The basic idea behind HITS is that pages functioning as good hubs have hyperlinks pointing to good authority pages, and good authorities are pages to which many good hubs point. PageRank, implemented by [10], is a static ranking of web pages based on a measure of prestige in social networks, and can be seen as a random surfer model. Although the system is good at displaying the remarks of the most highly ranked users on a question, it does not take into consideration the asker's opinion, users' voting methods and other factors that can influence voting behavior.
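As background, the PageRank computation discussed above can be sketched as a simple power iteration. This is a minimal, generic illustration; the node names and the damping factor of 0.85 are illustrative assumptions, not details taken from [10]:

```python
def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank.

    links: dict mapping each node to a list of its outbound neighbors.
    d:     damping factor (probability the random surfer follows a link).
    """
    nodes = list(links)
    n = len(nodes)
    pr = {u: 1.0 / n for u in nodes}          # start uniform
    for _ in range(iters):
        new = {u: (1 - d) / n for u in nodes}  # teleportation mass
        for u in nodes:
            out = links[u]
            if not out:
                # dangling node: spread its rank uniformly over all nodes
                for v in nodes:
                    new[v] += d * pr[u] / n
            else:
                for v in out:
                    new[v] += d * pr[u] / len(out)
        pr = new
    return pr
```

On a symmetric three-node ring (a → b → c → a), every page ends up with equal rank, as the random surfer model predicts.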
Michael et al. [11] developed a system that focused on question generation (QG) for the creation of educational materials for reading practice and assessment. The goal was to generate fact-based questions about the content of a given article. The top-ranked questions could be filtered and revised by educators, or given directly to students for practice. The investigation was restricted to questions about factual information in texts. This type of system makes it possible to generate thousands of questions from Wiki documents, thereby forming a larger data set. Its disadvantage is that, despite the generation of multiple questions and the relating of those questions to retrieved answers, the scope is limited by the restriction to certain subject matters.
Shuo et al. [12] aimed at enhancing question routing algorithms by constructing a word correlation matrix in which each entry indicates the degree of similarity of the two corresponding words. One of the advantages of this system is that Wikipedia documents were chosen for constructing the word correlation matrix, since they were written by more than 89,000 authors with different writing styles, using various terminologies that cover a wide range of topics, and with diverse word usage and content. Furthermore, the words in the matrix are common words in the English language that appear in various online English dictionaries. However, because of the multiple authors, a change or modification in an archived Wikipedia document will affect the ranking and selection of the answers recommended to users.

The Proposed System
The proposed system in this work is described as follows: for each question q in a set of questions Q, q ∈ Q, with corresponding set of answers A_q, there exists a group of community members V who are engaged in voting for the best answers. Each member of V selects a set of questions to consider for voting from a pool. Subsequently, for each question, each voter casts a vote for only one of the answers that the question received.
For a question q ∈ Q with a voter v_i making a choice of his answer a_i, the voter score r_i is obtained as in Equation (1):

Activity Panel
The activity panel represents the upper layer of the application, whose main focus is to provide authentication security for the CQA system. The layer presents to users all available questions and answers related to the category selected by the user.

Process Control Panel (PCP)
PCP is the application layer that handles all forms of requests and content parsing that relate to the database. This security measure allows the application to control Movement of Data, Data Authenticity, User Control, Content Control, Content Filtering and Integrity Control of the application software (as shown in Figure 4).

Algorithm Panel
The algorithm panel has two different layers used to process every operation of this CQA.
The algorithms are: 1) Brouwer Fixed Point (BFP) Algorithm; 2) Normalized Google Distance (NGD). Consider a scenario in which we have a question q with two different voters, v_i and v_j, each making a choice of their answers, a_i and a_j. We find the total sums, R_i and R_j, of the votes cast for a_i and a_j:
The system also provides the total number of users U_q voting on a question, denoted |U|.
Substituting into Equation (1) to find the fixed-point values of the voter scores r_i and r_j, we obtain voter scores satisfying 0 < r_i < 1 and 0 < r_j < 1. The Fixed-Point Score (FPS) is then given in Equation (6) as a summation, over the users U voting for each answer a, of a function F; r is the fixed point of the function F.

Voter Score
Voter Score = (summation of all users that selected the answer on the question) / (summation of all answers across the question). To simplify Equations (4) and (7): to determine the Answer Score for the selection above, we must specify the Fixed-Point Score (FPS) of individual answers based on the distribution of votes across answers and the scores of the voters who cast them. Given a question q and its corresponding set of answers A_q, where |A_q| is the size of A_q, we calculate the FPS of each answer. For each question q we rank the answers according to their FPS and set the highest-scoring answer as the FPS best answer.
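Since Equations (1)-(7) are not fully reproduced here, the following is a minimal sketch of one mutual-reinforcement reading of the scheme: each answer's score is the sum of the scores of the voters who chose it, each voter's score is renormalized by the support for their chosen answer, and iteration proceeds until a fixed point is reached. The vote data and the exact update rule are illustrative assumptions, not the paper's Equation (6):

```python
def fps_best_answer(votes, tol=1e-9, max_iter=1000):
    """Iterate voter scores and answer scores to a fixed point.

    votes: dict mapping each voter to the single answer they chose
           (one vote per voter on this question).
    Returns (best_answer, answer_scores).
    """
    voters = sorted(votes)
    answers = sorted(set(votes.values()))
    # Start with uniform voter scores in (0, 1).
    r = {v: 1.0 / len(voters) for v in voters}
    for _ in range(max_iter):
        # Answer score: sum of the scores of the voters who chose it.
        s = {a: sum(r[v] for v in voters if votes[v] == a) for a in answers}
        # Voter score: proportional to the support for their chosen answer.
        total = sum(s[votes[v]] for v in voters)
        r_new = {v: s[votes[v]] / total for v in voters}
        converged = max(abs(r_new[v] - r[v]) for v in voters) < tol
        r = r_new
        if converged:
            break
    s = {a: sum(r[v] for v in voters if votes[v] == a) for a in answers}
    return max(s, key=s.get), s
```

With two voters backing one answer and one voter backing another, the iteration concentrates the score on the majority answer, which is then returned as the FPS best answer.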

Normalized Google Distance (NGD)
Within the scope of this work there is a recurring need to measure the distance, or relationship, between different words. Shannon information theory was introduced with the aim of providing a means of measuring information [14]. More precisely, the amount of information in an object may be measured by its entropy and may be interpreted as the length of the description of the object in some description language [15].
The application adopts the mathematical model used by Google to discover relationships between words in indexed pages. This mathematical model is based on Kolmogorov complexity: the classical notion of Kolmogorov complexity is an objective measure of the information in a single object [16], and information distance measures the information between a pair of objects [17]. Assuming we have search terms x and y to be used in the NGD engine, the search engine discovers the meaning of words and phrases relative to other words and phrases, in the sense of producing a relative semantics between x and y. This is given by

NGD(x, y) = (max{log f(x), log f(y)} − log f(x, y)) / (log N − min{log f(x), log f(y)})

where f(x) denotes the number of returned data records containing occurrences of x, f(x, y) denotes the number of records containing occurrences of both x and y, and N denotes the total number of records saved in the database or indexed for the occurrences of x and y.
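The NGD of two terms can be computed directly from the counts f(x), f(y), f(x, y) and N defined above; this is a minimal sketch (the function name and inputs are illustrative):

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from record counts.

    fx, fy -- number of records containing x and y respectively
    fxy    -- number of records containing both x and y
    n      -- total number of indexed records
    """
    if fxy == 0:
        return float("inf")  # terms never co-occur: maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

Terms that always occur together get distance 0, and the distance grows as co-occurrence becomes rarer relative to the individual term frequencies.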
Let X denote a finite multiset of n finite binary strings, defined by {x_1, …, x_n}. We use a multiset rather than a set, since in a set all elements are different, whereas here we are interested in the situation where some or all of the elements may be equal.
For the Google Distribution computation, let the set of singleton Google search terms be denoted by S, with s ∈ S. The application environment (as described in Figure 5) is divided into different programming modules; each module performs different functions depending on the view being passed. The available modules are shown in the block diagram in Figure 6.
User Information Record Base: In the CQA application there exists a database of users, in order to enforce security and provide an environment where robots are not allowed to post questions or respond to answers in place of humans. The record base is divided into two parts, including a Personal Data Record Base.
These two record bases are linked together by a unique ID that performs a one-to-one relationship mapping, as shown in Table 1. Table 2 and Table 3 describe the different types of users found in the system and their levels of privileges, respectively. A sample of users and their privileges is presented in Table 4.
A user can be an administrator, a member or a content moderator. Table 5 and Table 6 describe the voting pattern of user U on answer A for question q: the relationship is presented in Table 5, while Table 6 shows the mapping analysis.

Results and Discussion
The proposed system was implemented with Hyper-Text Markup Language 5 (HTML5), Cascading Style Sheets 3 (CSS3), jQuery, AJAX and PHP. Figure 8 and Figure 9 present screenshots of some of the different modules.
Normalized Google Distance Table: Figure 7 shows the relationships between the words used in the CQA system under the Normalized Google Distance computation. The NGD module uses this table, together with the other tables, to suggest answers based on content counts and word mappings.

Performance Evaluation
The performance of best answer selection was evaluated using standard metrics: Reciprocal Rank, Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG). The data set generated from users' interactions, which was used in evaluating the developed system, and the criteria for selecting an answer in a CQA system are also discussed.

Data Set
Experimental evaluation was carried out on a data set collected within and outside the developed system; the data were generated from users' interactions with the CQA system over time. 400 students of Adekunle Ajasin University, Akungba-Akoko, Nigeria, participated, and records were generated with more than 600 unique questions and series of answers mapped to every question asked in real time. The data consist of the individual questions and answers generated from the interactions and the votes received from participating members of the community.

Evaluation Metrics
The system evaluation was carried out using three different metrics (Reciprocal Rank (RR), Mean Reciprocal Rank (MRR) and Discounted Cumulative Gain (DCG)) to test the effectiveness of the developed system.

1) Reciprocal Rank (RR) and Mean Reciprocal Rank (MRR)
The mean reciprocal rank is a statistical measure for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness, while the reciprocal rank is the multiplicative inverse of the rank of the first correct answer. The mean reciprocal rank is the average of the reciprocal ranks of the results over the sample queries Q. Table 7 shows the reciprocal ranks of the selected answers. The Mean Reciprocal Rank (MRR) of the best answer(s) selected, with regard to the asker's agreement with any answer selected from the pool of answers given, is 0.61, which is a fair value for this evaluation.
The MRR of a system is at most 1, that is, 0 ≤ MRR ≤ 1.
It was observed that the system performs better with a limited number of selected answers ranked from highest to lowest. MRR is given by

MRR = (1/|Q|) Σ_{i=1}^{|Q|} 1/rank_i

where rank_i is the rank position of the first correct answer for the i-th query. Table 8 shows that performance reduced when more results were recommended.
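The RR and MRR computations can be sketched as follows; the sample ranked lists in the test are illustrative, not drawn from Table 7 or Table 8:

```python
def reciprocal_rank(ranked, correct):
    """1/rank of the first correct answer in a ranked list; 0 if none appears."""
    for i, item in enumerate(ranked, start=1):
        if item in correct:
            return 1.0 / i
    return 0.0

def mean_reciprocal_rank(queries):
    """Average RR over queries; each query is (ranked_list, set_of_correct_answers)."""
    return sum(reciprocal_rank(r, c) for r, c in queries) / len(queries)
```

For three queries whose first correct answers sit at ranks 1, 2 and 3, the MRR is (1 + 1/2 + 1/3)/3 ≈ 0.61, matching the order of magnitude reported above.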
2) Discounted Cumulative Gain (DCG): Discounted cumulative gain (DCG) is a measure of ranking quality. In information retrieval, it is often used to measure the effectiveness of web search engine algorithms or related applications. Using a graded relevance scale for the documents in a search engine result set, discounted cumulative gain measures the usefulness, or gain, of a document based on its position in the result list [18], as shown in Table 9.
DCG_p = Σ_{i=1}^{p} rel_i / log_2(i + 1), where p denotes the rank position and rel_i is the graded relevance of the vote at position i.
Also, the idealized discounted cumulative gain (IDCG) is used to normalize the DCG. IDCG rests on the basic assumption that items are ordered by decreasing relevance.
Thus the normalized discounted cumulative gain (nDCG) is given by nDCG_p = DCG_p / IDCG_p. The relevance scores provided across the given answers are: 3, 4, 6, 2, 1, 0. DCG rewards rankings in which highly relevant answers appear early in the list.
With a perfect ranking algorithm, the DCG is the same as the IDCG, producing an nDCG of 1.0. All nDCG values are relative, lying on the interval 0.0 to 1.0.
Two assumptions are made in using DCG and its related measures: firstly, highly relevant items are more useful when appearing earlier in a search engine result list (have higher ranks), and secondly, highly relevant items are more useful than marginally relevant items, which are in turn more useful than irrelevant items.
Thus DCG, normalized by IDCG, measures the degree to which the ranked items meet the users' choice; the higher the value, the better.
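For the relevance scores listed above (3, 4, 6, 2, 1, 0), DCG, IDCG and nDCG can be computed with the following minimal sketch:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over positions i."""
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

scores = [3, 4, 6, 2, 1, 0]  # relevance scores from the evaluation above
```

Because the highest-relevance answer (6) sits at position 3 rather than position 1, the nDCG of this ranking falls short of 1.0; reordering the answers by decreasing relevance would raise it to exactly 1.0.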
The Vote Score and corresponding DCG values are shown in Figure 10.

Comparison of Result Obtained with Related Works
Table 11 and Figure 11 present a comparison of the MRR result of our proposed system with the Question Answering Refinement (QAR) system [4]. We considered the final results (MRR) obtained by QAR on the dataset used to recommend the best answer to the asker, and related them to the results obtained by our system. The results show that the proposed system performs better than QAR.

Conclusion
In this work, a web-based Question Answering System that displays and recommends the best answer to the user was developed. The system gives users the privilege to ask questions and receive answers to the questions asked. The work distinguishes different categories of users, from Registered Member to Member as a Moderator; this division of users helped the system to maintain the integrity of content and information delivery. The system adopted two algorithms: Normalized Google Distance (to suggest relevant answers to the user) and the Brouwer Fixed Point theorem (to calculate voting scores on the answers received). The Fixed-Point Score (FPS) of each answer was derived from the distribution of votes across answers and the scores of the voters who cast them, giving the Answer Score (AS). The results obtained and plotted show that highly ranked answers meet the user's choice and appear earlier in searches. This Question Answering System will help community users to quickly find relevant answers to questions without wasting much time. It will continue to grow in size and supply highly relevant information to members of the QA community.