Accuracy of Ebola Information in a Knowledge Exchange Social Website (KESW)

Background: Misinformation on interactive Knowledge Exchange Social Websites (KESWs) is concerning since it can influence Internet users’ health behaviors, especially during an infectious disease outbreak. Objective: The present study seeks to examine the accuracy and characteristics of health information posted to a Knowledge Exchange Social Website (KESW). Methods: A sample of 204 answers to Ebola questions was extracted and rated for accuracy. Multiple logistic regression modeling was used to examine whether answer characteristics (best answer, professional background, statistical information, source disclosed, link, and word count) predicted accuracy. Results: Overall, only 27.0% of the posted answers were rated as “accurate”. Accuracy varied across question topics, with between 11.8% and 45.5% of answers being rated as accurate. When Yahoo Answers’ “best answers” were examined, the overall accuracy was substantially higher, with 80.0% of “best answers” being rated as accurate compared to 16.0% of all other answers. Conclusion: There is a need for tools to help Internet users navigate health information posted on these dynamic user-generated knowledge exchange social websites.


Introduction
How to cite this paper: Gorman, F., Yadegarians, D., Islam, T., Tongco, S., Johnston, E., Estrada, E. and Gorman, N. (2017)
The popularity of the Internet as a discreet, readily available source of health information is evidenced by data showing that up to 75% of adults in the U.S. report having used a search engine to look up health information [1], with over half of adults (56.8%) having reported seeking health information online in the past month [2]. Daily, approximately three times as many people search for health information online as consult physicians at office visits [3].
Despite the popularity of online health resources, past research has shown that both the quality [4] [5] and readability level [4] of health-related websites vary widely, with incomplete information and inaccuracies compromising the information available to readers [6].
The presence of poor, incomplete, or misleading information is troubling given the low levels of electronic health literacy (eHealth Literacy) reported among Internet users [6] [7]. Most Internet users studied reported that they knew how to find health information, but their confidence in distinguishing high quality from low quality resources online was significantly lower [8] [9].
Even among students of health science, low levels of eHealth Literacy have been documented, with few knowing about free, credible health databases [10].
The impact of misinformation online is at least twofold.
First, the mere presence of such information can influence Internet users' search and browsing habits. For instance, confirmation biases, in which people tend to seek out information that confirms their preexisting beliefs [11], have been observed to operate in online searches. In the presence of conflicting information on a health topic, individuals actively and preferentially access information that reinforces their beliefs while avoiding information that challenges their beliefs [12]. Similarly, research has shown that Internet users' search strategies are systematically biased towards examining only the top search results from search engines and following links related to more serious health conditions when trying to self-diagnose [13]. Despite these biases, Internet users tend to believe the information they find online is accurate and trustworthy, regardless of the actual accuracy of the information [14].
Previous research has examined the quality and accuracy of health information posted to professional, static websites [25] [26] and online supplement retailers [27]. While poor, incomplete, or misleading information is present on many of these websites, researchers have made efforts to address this through the development of tools for assessing the veracity of such pages' content [28] and noted several characteristics of higher quality sources such as the presence of disclaimers, availability of references, and authorship disclosure [29] as well as the length of information and frequent external links [30].
Accurate and timely online information is particularly important during an outbreak of a (re)emerging infectious disease. Slow dissemination of information through official channels and confusing or conflicting messages in the media generate high levels of panic in the general public and drive them to seek answers on the Internet [32] [33] [34]. The current study focuses on Ebola virus disease (EVD), as the response to the 2014 outbreak in West Africa was impacted by the presence of misinformation and highlighted the effects of such information on outbreak containment, support of proper quarantine procedures, and social stigmatization of patients [35] [36].
While the 2014 outbreak never made significant inroads into the United States, research has highlighted several characteristics of the US populace that could exacerbate the spread of an infectious disease during a global pandemic. Specifically, knowledge and utilization of official channels for government health communication remain low [36]. At the same time, in the case of Ebola, overall knowledge about the disease is low [36] [37] while generalized mistrust and conspiracy beliefs related to the medical industry [35] are prevalent. Under conditions in which Internet users are underutilizing official health communication channels, harboring mistrust towards the medical establishment, and carrying factual inaccuracies about a disease, KESWs, with their anonymity, pose a risk for fueling the spread of misinformation.
The present study seeks to address some of the knowledge gap on the accuracy of health information posted to KESWs by examining the types of Ebola questions being posted on a popular KESW and rating the accuracy of the anonymous users' answers to these questions. In addition, the relationship between answer characteristics, such as inclusion of links to references, and answers' accuracy was examined in order to determine whether answer characteristics could be used to identify higher quality answers.

Data Collection
The decision was made to focus the study on a single KESW. Of the KESWs reviewed, Yahoo Answers was selected due to the interface's ease of searching and retrieving questions and answers as well as for its reach; in 2016 Yahoo was ranked as the third most popular multi-platform web property in the United States with 206 million unique visitors in a single month (https://www.statista.com/statistics/271412/most-visited-us-web-properties-based-on-number-of-visitors/).
On March 25, 2015, a total of 23 posts with the keyword "ebola" were extracted from Yahoo Answers for analysis (see Figure 1). Upon initial review, 5 posts were excluded as they asked subjective questions whose answers could not be rated for accuracy (see Table 1's excluded category for an example question).
In addition to questions and answers, six accompanying data points were extracted from each answer:
1) Best Answer: Since March 2014, the person who posted a question on Yahoo Answers has been able to mark one of the answers provided as the Best Answer. All sets of answers had a Best Answer marked.
2) Professional Background: This variable captured whether or not each answerer indicated that their answer was based on their professional background in the health sciences (e.g., the answerer indicated that they were a nurse with 10 years of experience with infectious diseases).
3) Statistical Information: This variable captured whether or not each answer included the use of statistics.
4) Source Disclosure: This variable captured whether or not each answer contained a disclosure that the information presented came from an external source, as it was discovered that many answers contained unmodified copied and pasted content from other websites.
5) Link: This variable captured whether the answer contained a link to an external website for additional information.
6) Word Count: A count of the words used in each answer.
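The six data points listed above can be pictured as one record per answer. The following is a minimal sketch only: the class name `AnswerRecord`, its field names, the whitespace-based word count, and the example text are illustrative assumptions, not the authors' actual coding instrument.

```python
from dataclasses import dataclass


@dataclass
class AnswerRecord:
    """One coded answer. All names here are hypothetical."""
    text: str
    best_answer: bool              # 1) marked Best Answer by the asker
    professional_background: bool  # 2) answerer cites a health-science background
    statistical_information: bool  # 3) answer includes statistics
    source_disclosed: bool         # 4) answer discloses an external source
    link: bool                     # 5) answer links to an external website

    @property
    def word_count(self) -> int:
        # 6) assumed definition: number of whitespace-separated tokens
        return len(self.text.split())


# Made-up example answer, used only to exercise the record type.
example = AnswerRecord(
    text="Ebola spreads through direct contact with bodily fluids.",
    best_answer=True,
    professional_background=False,
    statistical_information=False,
    source_disclosed=False,
    link=False,
)
print(example.word_count)
```

Deriving the word count from the text, rather than storing it separately, keeps the two from drifting apart if an answer is re-extracted.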

Answer Accuracy
In order to evaluate the accuracy of each posted answer, answers were coded into one of five categories:
1) Accurate: Accurate answers contained no factual errors and addressed the question that was asked.
2) Inaccurate: Inaccurate answers contained one or more factual errors. Note that, given the severe consequences of misinformation on infectious diseases, it was decided to rate answers as inaccurate even if the answer contained accurate information as well as inaccurate information.
3) Subjective: Subjective answers included any response whose accuracy could not be rated, such as statements of opinion.
The accuracy of all answers was assessed independently by two of the authors. The authors then examined each other's ratings and discussed the answers they disagreed upon. A physician was available as the tiebreaker in case the authors could not agree upon an answer's accuracy rating after discussion, though all disagreements were resolved with discussion between the authors without need for the physician's intervention.
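The reconciliation procedure described above (independent ratings, discussion of disagreements, physician tiebreaker as a fallback) can be sketched as follows. The function names, answer IDs, and ratings are hypothetical and serve only to illustrate the workflow.

```python
from typing import Dict, List, Optional


def find_disagreements(rater_a: Dict[str, str], rater_b: Dict[str, str]) -> List[str]:
    """Return the IDs of answers that the two raters coded differently."""
    return sorted(aid for aid in rater_a if rater_a[aid] != rater_b.get(aid))


def resolve(rating_a: str, rating_b: str,
            after_discussion: Optional[str] = None,
            tiebreaker: Optional[str] = None) -> str:
    """Pick the final rating: agreement first, then discussion, then tiebreaker."""
    if rating_a == rating_b:
        return rating_a
    if after_discussion is not None:
        return after_discussion
    if tiebreaker is not None:
        return tiebreaker
    raise ValueError("disagreement has not been resolved")


# Made-up ratings for three hypothetical answers.
rater_a = {"a1": "accurate", "a2": "inaccurate", "a3": "subjective"}
rater_b = {"a1": "accurate", "a2": "accurate", "a3": "subjective"}

to_discuss = find_disagreements(rater_a, rater_b)
print(to_discuss)
```

In the study as reported, the tiebreaker branch would never have been reached, since every disagreement was settled in discussion.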
A thematic analysis was conducted in order to establish a codebook of the types of questions being asked about Ebola on Yahoo Answers [39]. In the first stage, two of the authors read through the entirety of the set of questions in order to familiarize themselves with the data. Following the read-through, both readers independently developed a set of emergent themes to organize the types of questions asked. These emergent themes were then shared with the full research team, who helped to reconcile differences in the two authors' coding schemes and arrive at a final coding scheme.
Simple descriptive statistics (frequency and valid percent) and histograms were employed to examine the types of Ebola questions being asked, the accuracy of answers to these questions, and the role of answers voted "best answer" by the KESW user who posted each question.
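As an illustration of the descriptive statistics named above, the sketch below computes a frequency and a valid percent for each category, where valid percent is taken to mean the percentage among non-missing cases. The function name and the sample ratings are assumptions made up for the example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def frequency_table(values: List[Optional[str]]) -> Dict[str, Tuple[int, float]]:
    """Map each category to (frequency, valid percent).

    Missing values (None) are left out of the denominator.
    """
    valid = [v for v in values if v is not None]
    if not valid:
        return {}
    counts = Counter(valid)
    return {cat: (n, round(100 * n / len(valid), 1)) for cat, n in counts.items()}


# Made-up ratings; None stands for a missing code.
ratings = ["accurate", "inaccurate", "inaccurate", None, "subjective"]
table = frequency_table(ratings)
for category, (freq, pct) in sorted(table.items()):
    print(category, freq, pct)
```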
Multiple logistic regression modeling was used to examine whether answer characteristics (best answer, professional background, statistical information, source disclosed, link, and word count) predict accuracy (re-coded to a dichotomous accurate vs. inaccurate). Answers that fundamentally failed to address the question asked (i.e., subjective, trolling, or unanswered) were excluded from the logistic regression model, as readers looking for an answer to a health question could reasonably be expected to disregard these answers. As there were no a priori predictions regarding which variables would emerge as significant predictors of answers' accuracy, five of the six predictors were force-entered into the final logistic regression model. The sixth predictor, professional background, was ultimately removed from the model, as only three answers came from respondents citing a professional background, which precluded meaningful analysis of this variable.
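The exclusion and recoding step that precedes the regression can be sketched as below. The category label strings and the function name are hypothetical; only the rule itself (drop subjective, trolling, and unanswered responses, then code accurate vs. inaccurate as a dichotomy) comes from the description above.

```python
from typing import List

# Categories dropped before modeling; the label strings are assumed.
EXCLUDED = {"subjective", "trolling", "unanswered"}


def recode_for_regression(ratings: List[str]) -> List[int]:
    """Drop excluded categories and code accurate = 1, inaccurate = 0."""
    return [1 if r == "accurate" else 0 for r in ratings if r not in EXCLUDED]


# Made-up ratings for six hypothetical answers.
ratings = ["accurate", "trolling", "inaccurate", "subjective", "accurate", "unanswered"]
outcome = recode_for_regression(ratings)
print(outcome)
```

In practice the predictor columns would have to be filtered with the same mask so that each outcome stays aligned with its answer's characteristics.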

Types of Ebola Questions Asked
A total of seven themes were identified during the thematic analysis of types of Ebola questions posted to Yahoo Answers. Table 1 presents each theme along with a representative example question drawn from the dataset.
The topics of Yahoo Answers visitors' questions showed significant heterogeneity, with each of the question categories capturing only between 4.9% and 27.5% of the question totals (see Figure 2).

Accuracy of Ebola Answers
Overall, only 27.0% of the posted answers were rated as "accurate" (i.e., answering the question asked and containing no factual errors; see Figure 3). However, when accuracy was compared between answers to differing topics, substantial heterogeneity was observed, with between 11.8% and 45.5% of answers being rated as accurate (see Table 2). When Yahoo Answers' "best answers" were examined, the overall accuracy was substantially higher, with 80.0% of "best answers" being rated as accurate compared to 16.0% of all other answers (see Figure 4).
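A rough arithmetic check shows how the two subgroup percentages combine into the overall figure. This is a back-of-the-envelope sketch, not a reanalysis: it assumes exactly one best answer for each of the 35 questions mentioned in the limitations, which the paper does not state as a count, and it treats the reported percentages as exact.

```python
total_answers = 204
best_answers = 35                                # assumption: one per question
other_answers = total_answers - best_answers     # 169 under that assumption

accurate_best = 0.80 * best_answers              # 80.0% of best answers
accurate_other = 0.16 * other_answers            # 16.0% of all other answers

# Weighted combination of the two subgroups.
overall_pct = 100 * (accurate_best + accurate_other) / total_answers
print(round(overall_pct, 1))
```

Under these assumptions the weighted figure agrees with the reported overall accuracy to one decimal place, which suggests the three percentages are mutually consistent.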

Predictors of Answer Accuracy
Logistic regression modeling found that the overall model with all five predictors together served as a statistically significant predictor of answers' accuracy (χ²(5) = 25.08, p < 0.001; Nagelkerke R² = 0.37). Examining the individual predictors revealed only a single statistically significant predictor of accurate answers (see Table 3). Specifically, answers that were voted "best answer" had approximately 21 times the odds of being rated accurate (OR = 21.32, 95% CI = 1.47-310.02, p = 0.03).
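Because an odds ratio is easily misread as a ratio of probabilities, a short worked example may help. The sketch below first computes a crude (unadjusted) odds ratio and a risk ratio from the descriptive percentages of 80.0% and 16.0%; this is not the adjusted model estimate, which comes from the five-predictor regression. It then backs out the standard error and z statistic implied by the reported OR and confidence interval, assuming a Wald interval on the log-odds scale, which is an assumption about how the interval was constructed.

```python
import math


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1 - p)


# Crude comparison from the descriptive percentages (not the adjusted model).
crude_or = odds(0.80) / odds(0.16)   # ratio of odds
risk_ratio = 0.80 / 0.16             # ratio of probabilities

# Back out the Wald quantities implied by the reported OR and 95% CI.
reported_or, ci_low, ci_high = 21.32, 1.47, 310.02
beta = math.log(reported_or)                                  # log-odds coefficient
se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)      # implied standard error
z = beta / se
p_two_sided = math.erfc(z / math.sqrt(2))                     # two-sided normal p-value

print(round(crude_or, 1), round(risk_ratio, 1))
print(round(se, 2), round(z, 2))
```

The crude odds ratio comes out close to 21 while the corresponding risk ratio is 5, so "21 times" describes odds rather than probability. The implied two-sided p-value falls between 0.02 and 0.03, in line with the reported p = 0.03.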

Discussion
Overall, the accuracy of Ebola information posted to Yahoo Answers was quite low, with only 27.0% of all answers providing fully accurate information. More troubling, the questions that would be most relevant during an infectious disease outbreak, namely transmission, symptoms, and treatment, were each answered accurately less than a third of the time. In light of Internet users' low electronic health literacy [6] [7], susceptibility to search biases [12] [14] [20], and tendency to base health behaviors off of online information [1] [13] [15]-[21] [23] [24], these data suggest that KESWs could serve as a source of misinformation and a driver of high risk behaviors during an infectious disease outbreak.
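The finding that people who posted questions on the KESW later selected "best answers" that had approximately 21 times the odds of being accurate helps to allay some of the concerns raised about visitors' eHealth literacy. In aggregate, it seems that KESW users were able, to some degree, to discern accurate information from the various responses given. In fact, 80.0% of the answers voted "best answer" were accurate, while only 2.9% of these answers were categorically inaccurate. That said, there remain significant unknowns.
Several limitations should be considered when examining these results. First, in the absence of further data, it is worth noting that the culture of KESW users may differ widely from website to website, limiting the generalizability of these findings. Further research is needed to explore not only how KESW users differ across different sites such as Yahoo Answers versus Reddit, but also how the culture of users differs across different health topics. For instance, the participation of vociferous groups like the anti-vaxxer community could radically change the distribution of accurate to inaccurate posted answers on topics like childhood vaccination recommendations. Likewise, it seems plausible that trolling may be more prevalent in posts related to topics being popularized by the media. In addition, due to the high ratio of answers to questions, although 204 answers were available to code, only 23 posts with 35 total questions were examined. This raises the possibility that other types of questions are being asked about Ebola on KESWs, or that the ratio of question topics being addressed may differ from those presented here. These data nonetheless take the first steps towards filling the knowledge gap on KESW answers' accuracy, and future replication research will help to verify the types of questions being asked.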
Ultimately, these data highlight the risks posed by seeking health information related to emerging infectious disease online through KESWs. Although those posting questions selected "best answers" that were often accurate, too little is known about the browsing habits of other KESW users. The presence of frequent misinformation among the posted responses and the high volume of unhelpful information (unanswered, subjective, or trolling responses) suggest that these sites may pose special risks to users with low health literacy or medical misperceptions. In the context of Ebola, this misinformation could translate into challenges to outbreak containment, opposition to proper quarantine procedures, or social stigmatization of patients.
Further research is needed in order to explore the landscape of different KESWs and health topics, though these preliminary results raise concerns. If these patterns of inaccurate information hold true in other contexts, it may be necessary to provide users with tools to help them ascertain the veracity of user-generated claims, work directly with KESW providers to develop quality control mechanisms on their websites, and direct practitioners' attention to these sites both to drive further research and to prepare practitioners to work with populations using these sites as a source of medical information.

F. Gorman et al. DOI: 10.4236/ojpm.2017.710017. Open Journal of Preventive Medicine
ranked as the third most popular multi-platform web property in the United States with 206 million unique visitors in a single month (https://www.statista.com/statistics/271412/most-visited-us-web-properties-based-on-number-of-visitors/).

Figure 1. Flow chart of question/answer inclusion and exclusion.

5) Trolling: Upon working with the data, it became clear that a fifth category was needed in order to capture responses that not only didn't answer the question asked, but which also took on the characteristics of online trolling, which Merriam-Webster defines as "to antagonize (others) online by deliberately posting inflammatory, irrelevant, or offensive comments or other disruptive content".

Figure 2. Frequency of question topics.

Table 1. Types of Ebola Questions Posted to a KESW (n = 209).

Table 2. Percent of Answers in Each Accuracy Category by Type of Question Asked (n = 204).

Table 3. Summary of Logistic Regression Analysis for Variables Predicting Answer Accuracy (n = 81).
it seems like KESW users were able, to some degree, to discern accurate information from the various responses given. In fact, 80.0% of the answers voted "best answer" were accurate while only 2.9% of these answers were categorically inaccurate. That said, there remain significant unknowns. For instance, while the
Several limitations should be considered when examining these results. First, in the absence of further data, it is worth noting that the culture of KESW users may differ widely from website to website, limiting the generalizability of these findings. Further research is needed to explore not only how KESW users differ across different sites such as Yahoo Answers versus Reddit, but also how the culture of users differs across different health topics. For instance, the participation of vociferous groups like the anti-vaxxer community could radically change the distribution of accurate to inaccurate posted answers on topics like childhood vaccination recommendations. Likewise, it seems plausible that trolling may be more prevalent in posts related to topics being popularized by the media. In addition, due to the high ratio of answers to questions, although 204 answers were available to code, only 23 posts with 35 total questions were examined. This raises the possibility that other types of questions are being asked about Ebola on