Machine Learning Approaches for Classifying the Distribution of Covid-19 Sentiments - Open Journal of Statistics

Open Journal of Statistics

Volume 11, Issue 5 (October 2021)

ISSN Print: 2161-718X ISSN Online: 2161-7198

Google-based Impact Factor: 0.53

Machine Learning Approaches for Classifying the Distribution of Covid-19 Sentiments

PP. 620-632

DOI: 10.4236/ojs.2021.115037

Author(s)

M. Kuyo¹, S. Mwalili², E. Okang’o³

Affiliation(s)

¹Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya.
²Department of Statistics and Actuarial Sciences, JKUAT, Nairobi, Kenya.
³Department of Mathematics and Actuarial Sciences, Murang’a University of Technology, Murang’a, Kenya.

ABSTRACT

Previously, rapid disease detection and prevention were difficult because disease modeling and prediction depended on manually collected data, such as surveys. With the increased use of social media platforms such as Twitter, Facebook, and Instagram, data mining and sentiment analysis can support disease prevention. Sentiment analysis is a powerful tool for analyzing people’s perceptions, emotions, value assessments, attitudes, and feelings as expressed in text. The purpose of this research is to use machine learning techniques to classify and predict the spatial distribution of positive and negative sentiments about the Covid-19 pandemic. The data for this study were geo-tagged tweets concerning Covid-19, which were live-streamed using the streamR package. The key terms used for streaming the data were: Corona, Covid-19, sanitizer, virus, lockdown, quarantine, and social distance. The classification used Naive Bayes algorithms with n-gram approaches. An n-gram model is a probabilistic language model that predicts the next item in a sequence in the form of an (n - 1)-order Markov model. It relies on the Markov assumption: the probability of a word depends only on the preceding n - 1 words (in the bigram case, only the previous word), without looking further into the past. The steps followed in this research were: cleaning and preprocessing the data; tokenizing the text into n-grams (1-gram, 2-gram, and 3-gram); converting the tweets into a weighted matrix of numeric vectors using Term Frequency-Inverse Document Frequency (TF-IDF); and dividing the data 80:20 between training and test sets. A confusion matrix was used to evaluate the classification accuracy, precision, and recall of the algorithms tested. Prediction was done using the best-performing Naive Bayes algorithm.
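The n-gram tokenization and Naive Bayes classification steps described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `ngrams` helper, the `MultinomialNB` class, and the four toy tweets are hypothetical, and the sketch uses raw n-gram counts with add-one smoothing, omitting the TF-IDF weighting and the 80:20 split.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n):
    """Lower-case the text, split on whitespace, and return its word n-grams."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def __init__(self, n=2):
        self.n = n  # n-gram order: 1 = unigram, 2 = bigram, 3 = trigram

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngrams(text, self.n))
        self.vocab = {g for c in self.feature_counts.values() for g in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for gram in ngrams(text, self.n):
                if gram in self.vocab:  # skip n-grams never seen in training
                    score += math.log((counts[gram] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical toy tweets, for illustration only (not the study's data).
train_texts = [
    "stay safe and wash hands",
    "grateful for our health workers",
    "lockdown is making me miserable",
    "so scared of this virus",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNB(n=2).fit(train_texts, train_labels)
print(ngrams("stay home stay safe", 2))
print(model.predict("wash hands and stay safe"))
print(model.predict("scared of lockdown"))
```

Changing `n` to 1 or 3 switches the same classifier to unigram or trigram features, which is the comparison the study reports.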
The results showed that, with Multinomial Naive Bayes, unigram accuracy was 92.02%, bigram accuracy was 97.37%, and trigram accuracy was 94.40%. With Bernoulli Naive Bayes, unigram accuracy was 89.34%, bigram accuracy was 96.80%, and trigram accuracy was 94.90%. With Gaussian Naive Bayes, unigram accuracy was 90.43%, bigram accuracy was 95.67%, and trigram accuracy was 92.89%. Bigram tokenization outperformed unigram and trigram tokenization under all three classifiers. Bigram Multinomial Naive Bayes was used to predict the test data because it was the most accurate in classifying the training data. Prediction accuracy was 84.92%, precision 85.50%, recall 81.02%, and F1 measure 83.20%. TF-IDF was employed to increase prediction accuracy, which reached 87.06%. The predicted sentiments were then plotted on a global map. The study indicates that machine learning can identify patterns and emotions in public tweets, which may then be used to steer targeted intervention programs aimed at limiting disease spread.
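As a quick arithmetic check, the reported F1 measure is the harmonic mean of the reported precision and recall. The Python sketch below recomputes it. The function names are hypothetical, and the confusion-matrix counts in the second call are illustrative only, since the abstract does not report the underlying counts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions."""
    return 2 * precision * recall / (precision + recall)


def metrics_from_confusion(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall, f1_score(precision, recall)


# Precision and recall as reported in the abstract.
reported_f1 = f1_score(0.8550, 0.8102)
print(f"F1 = {reported_f1 * 100:.2f}%")

# Illustrative counts only (not from the study).
print(metrics_from_confusion(tp=40, fp=10, fn=10, tn=40))
```

The recomputed value agrees with the 83.20% stated above to two decimal places.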

KEYWORDS

Machine Learning, Sentiment Analysis, Natural Language Processing, Covid-19, Naive Bayes, N-Gram

Share and Cite:

Kuyo, M., Mwalili, S. and Okang’o, E. (2021) Machine Learning Approaches for Classifying the Distribution of Covid-19 Sentiments. Open Journal of Statistics, 11, 620-632. doi: 10.4236/ojs.2021.115037.
