The rise of social media paves the way for unprecedented benefits or risks for organisations, depending on how they adapt to its changes. This rise comes with the great challenge of gaining insights from such big data for effective and efficient decision making that can improve quality, profitability, productivity, competitiveness and customer satisfaction. Sentiment analysis is the field concerned with the classification and analysis of user-generated text under defined polarities. Despite the upsurge of research in sentiment analysis in recent years, there is a dearth of literature on sentiment analysis applied to banks' social media data, especially on African datasets. Against this background, this study applied a machine learning technique (support vector machine) for sentiment analysis of Nigerian banks' Twitter data over a 2-year period, from 1st January 2017 to 31st December 2018. After crawling and preprocessing the data, the LibSVM algorithm in WEKA was used to build the sentiment classification model from the training data. The performance of this model was evaluated on a pre-labelled test dataset generated from the five banks. The results show that the accuracy of the classifier was 71.8367%. The precision for both the positive and negative classes was above 0.7, the recall for the negative class was 0.696 and that of the positive class was 0.741, which, in addition to other measures, shows the prediction did better than chance. Applying the model to predict the sentiments of the five Nigerian banks' Twitter data reveals that the number of positive tweets within this period was slightly greater than the number of negative tweets. The scatter plots of the sentiment series indicated that the majority of the data falls between 0 and 100 sentiments per day, with few outliers above this range.

In the dynamic world we now live in, the ideas, thoughts, beliefs, opinions and decisions of people are shared in real time on platforms like Twitter, Facebook and LinkedIn, to name a few. However, gaining insights from these textual data for effective decision making can be a herculean task, partly due to the huge volume of information being shared by a large population, and partly due to the complexity of quantifying text data for modelling purposes. For instance, Twitter had over 310 million active users as of the first quarter of 2017, who posted over 500 million tweets per day [

Sentiment analysis, also called opinion mining, is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes [

The importance of sentiment analysis cannot be overemphasised, as its applications and impact span diverse fields and domains. Organisations, companies, agencies and governments can leverage sentiment analysis to gain insights that enhance efficient and effective decision making. Also, by implementing sentiment analysis, organisations can take appropriate measures to remain competitive in the marketplace, by determining products and services that customers are not satisfied with and improving them through price reallocation, quality improvement or the addition of new features. Similarly, in the academic domain, according to [

Sentiment analysis has been shown to be beneficial in the finance sector with works such as [

However, despite the upsurge of research in sentiment analysis, there is a dearth of literature on sentiment analysis applied to banks' social media data, especially on African datasets. To the best of our knowledge, this paper is the first work to apply a machine learning technique for sentiment analysis of Nigerian banks' social media data. It is against this background that this research was carried out. This work will be of great benefit not only in expanding the domain of sentiment analysis, but will also be of profound help to Nigerian banks in customer intelligence and education, so as to improve customer satisfaction. It will also emphasise the utilisation of social media analytics in promoting their products and services, risk management, business forecasting, competitive analysis, and product and service design.

The rest of the paper is structured as follows: Section 2 presents our proposed framework for sentiment analysis and the data and software used in the study. In Section 3, we present the Twitter data preprocessing techniques employed in cleaning the data, while in Section 4 we give a general overview of the support vector machine, the mathematical theory behind its solution, the LibSVM algorithm in WEKA for its implementation, and the model evaluation parameters. Section 5 presents the results and discussion of the Twitter classification model and its output when applied to the five Nigerian banks considered in the study. Finally, Section 6 presents the conclusion and recommendations of the study.

We present our proposed framework for sentiment analysis, illustrated in

training of the SVM classification model in Weka, evaluation of the model performance and deployment in the unlabelled Nigerian Banks twitter datasets, resulting in the classification of tweets as positive and negative.

Data Used in the Study

1) Nigerian banks Twitter data

The data used for this study was retrieved via Twitter's API, covering a 2-year period from 1st January 2017 to 31st December 2018. Since the crawled data was publicly available and its usage was within Twitter's stipulated data-usage terms, there was no need for an ethical review. The data was retrieved for the following Nigerian banks using their specific Twitter filter operators, as shown in

Bank Name | Stock Symbol | Filter Operator (Twitter Handle) | Number of Tweets |
---|---|---|---|
Access Bank | ACCESS | @myaccessbank | 33,821 |
First Bank of Nigeria Holdings | FBNH | @FirstBankngr | 51,995 |
Guaranty Trust Bank | GUARANTY | @gtbank | 79,745 |
United Bank for Africa | UBA | @UBAGroup | 21,048 |
Zenith Bank | ZENITHBANK | @ZenithBank | 29,007 |
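The handle-based filter operators above can be composed into search queries scoped to the study window. The sketch below is illustrative only (the helper name and query format are assumptions, not the authors' crawler); Twitter's `until:` operator is assumed to be exclusive of the given date, hence the 2019-01-01 bound.

```python
HANDLES = ["@myaccessbank", "@FirstBankngr", "@gtbank", "@UBAGroup", "@ZenithBank"]

def build_query(handle, since="2017-01-01", until="2019-01-01"):
    """Compose a Twitter search query for one bank's handle over the study
    window, using the `since:` and `until:` date operators."""
    return f"{handle} since:{since} until:{until}"

# One query per bank in the study
queries = [build_query(h) for h in HANDLES]
```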

2) Training data

The training data used was the Sentiment140 dataset by [

3) Test data

The test dataset used in this study was created from the crawled Nigerian banks' Twitter datasets. We randomly selected 100 tweets from each bank; since there are five banks, this gives a test set of 500 tweets. We then manually labelled these 500 tweets as positive or negative. This test data was used to evaluate the performance of the SVM classifier, which was trained on the 150,000 tweets obtained from the Sentiment140 dataset.
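The per-bank sampling step can be sketched as follows; this is a minimal illustration in which the function name, data layout and seed are assumptions, not the authors' code.

```python
import random

def make_test_set(bank_tweets, n_per_bank=100, seed=2019):
    """bank_tweets maps a bank name to its list of (preprocessed) tweet texts.
    Draws n_per_bank tweets from each bank without replacement and pools
    them into one test set for manual labelling."""
    rng = random.Random(seed)
    pooled = []
    for bank in sorted(bank_tweets):          # deterministic iteration order
        pooled.extend(rng.sample(bank_tweets[bank], n_per_bank))
    return pooled
```

With five banks and 100 tweets each, the pooled test set contains 500 tweets, matching the study's setup.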

4) Software used in the study

This study utilized the Waikato Environment for Knowledge Analysis (WEKA) software for training the SVM classification model and deploying it on the unlabelled datasets. WEKA is a data mining and machine learning software package developed by the University of Waikato, New Zealand. It is widely used by researchers for data mining and machine learning problems because of its general public license and its graphical user interface for functionalities such as data analysis, classification, predictive modelling and visualization. In addition, the data preprocessing described in Section 3 was implemented in Python, a powerful open-source programming language that is effective in solving diverse problems across different domains.

In order to obtain accurate and reliable results, it is paramount that the Twitter data be preprocessed so that the classifier runs smoothly. Data preprocessing is a data mining technique that seeks to clean raw datasets by removing their noisy and uninformative parts, thereby making them suitable for analysis and helping to achieve accurate and reliable results.

Furthermore, [

This phase involves removing unimportant parts of the tweets. They are tagged unimportant since they contain no sentiment, and removing them does not change the meaning of the tweet. Basic Twitter data cleaning involves:

1) removing Twitter handles such as @user,

2) removing Uniform Resource Locators (URLs) such as http://www.twitter.com,

3) removing the hashtag symbol (#): only the symbol is removed and not the word, since the words after hashtags often carry sentiment,

4) removing stock market tickers like $GE,

5) removing the old-style retweet marker "RT",

6) collapsing multiple whitespace characters into a single space,

7) removing punctuation, numbers, and special characters, and

8) converting all text to lower case.
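The cleaning steps above can be sketched as a single Python function. This is an illustrative sketch, not the authors' exact preprocessing code; the regular expressions are assumptions.

```python
import re

def clean_tweet(text):
    """Basic Twitter text cleaning following the listed steps."""
    text = re.sub(r"\bRT\b", "", text)                 # old-style retweet marker
    text = re.sub(r"@\w+", "", text)                   # Twitter handles
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # URLs
    text = re.sub(r"\$[A-Za-z]+", "", text)            # stock tickers like $GE
    text = text.replace("#", "")                       # keep the word, drop the symbol
    text = re.sub(r"[^a-zA-Z\s]", "", text)            # punctuation, numbers, specials
    text = re.sub(r"\s+", " ", text).strip()           # collapse whitespace
    return text.lower()                                # lower-case everything
```

For example, `clean_tweet("RT @myaccessbank: Great service!! #happy http://t.co/xyz $GE")` yields `"great service happy"`.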

Stopwords are words which occur very frequently in a language and carry very little meaning. They are mostly function words such as pronouns and articles, for example "is", "and", "was", "the", etc. We also created a stop list of frequent Nigerian Pidgin stopwords such as "na", "ooo", "kuku", etc. These words are filtered out of the tweets in order to save both space and time.
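Stopword filtering can be sketched as below; the stop lists shown are small illustrative samples, not the full lists used in the study.

```python
# Illustrative samples only; the study used fuller lists.
ENGLISH_STOPWORDS = {"is", "and", "was", "the", "a", "to", "in", "of"}
NIGERIAN_STOPWORDS = {"na", "ooo", "kuku"}
STOPWORDS = ENGLISH_STOPWORDS | NIGERIAN_STOPWORDS

def remove_stopwords(tokens):
    """Drop any token that appears in the combined stop list."""
    return [t for t in tokens if t not in STOPWORDS]
```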

Stemming is a technique that seeks to reduce a word to its word stem. For example, words like “quick”, “quickly”, “quickest”, and “quicker” are considered to be from the same stem “quick”. This helps in decreasing entropy and thereby increasing the relevance of the word. Before stemming is applied, the text has to be tokenized. Tokenization is the process of splitting a string of text into individual words.
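Tokenization and stemming can be illustrated as follows. The suffix-stripping stemmer here is a deliberately naive sketch for exposition; a real pipeline would use an established stemmer such as Porter's (e.g. from NLTK), whose output stems often differ from these.

```python
def tokenize(text):
    """Split a cleaned tweet into individual word tokens."""
    return text.split()

def naive_stem(word):
    """Naive suffix stripping for illustration only: drop a common suffix
    when a stem of at least three characters remains."""
    for suffix in ("est", "er", "ly", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

On the example in the text, "quick", "quickly", "quickest" and "quicker" all reduce to the stem "quick".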

After performing the required preprocessing, some tweets may become empty. For example, if a tweet contains only a URL and a mention (@user), then once URLs and handles are removed the tweet becomes empty. It is therefore important to remove all empty fields, since an empty field contains no sentiment.

The Support Vector Machine (SVM) is a supervised machine learning approach that has proven very successful in tackling regression and classification problems. It was developed by [

Consider a training set $D = \{ (\vec{x}_i, y_i) \mid \vec{x}_i \in \mathbb{R}^n, y_i \in \{-1, 1\} \}_{i=1}^{m}$, where $\vec{x}_i$ is a feature vector and $y_i$ is the associated class label, which can take the value $+1$ or $-1$. The goal of the SVM is to find the optimal hyperplane that best separates the data between the two classes. This goal is achieved by maximizing the margin between the two classes (e.g. the blue class and the red class in

Definition 4.1 (Hyperplane). A hyperplane is a subspace of one dimension less than its ambient space. It is the set of points satisfying the equation $\vec{w} \cdot \vec{x} + b = 0$.

From the foregoing, the goal of the SVM is to find the optimal separating hyperplane that best segregates the data. Consider a vector $\vec{w}$ of any length constrained to be perpendicular to the median line, and an unknown vector $\vec{u}$. We wish to know whether $\vec{u}$ belongs to class A or class B, as illustrated in

Data | Number of tweets before preprocessing | Number of tweets after preprocessing |
---|---|---|
Training | 1,600,000 | 1,588,648 |
Test | 500 | 490 |
ACCESS | 33,821 | 33,046 |
FBNH | 51,995 | 50,995 |
GUARANTY | 79,745 | 77,999 |
UBA | 21,048 | 20,395 |
ZENITHBANK | 29,007 | 28,096 |

the line, and we have that $\vec{w} \cdot \vec{u} \geq c$, where $c$ is a constant. Setting $c = -b$, we obtain the decision rule:

$y_i (\vec{w} \cdot \vec{x}_i + b) \geq 1, \quad \forall i$ (1)

in which the variable $y_i$ is the associated class label, such that $y_i = +1$ for positive samples and $y_i = -1$ for negative samples.

Definition 4.2 (Point on a Hyperplane). Given a data point $\vec{x}_i$ and a hyperplane defined by a vector $\vec{w}$ and bias $b$, we have $\vec{w} \cdot \vec{x}_i + b = 0$ if $\vec{x}_i$ lies on the hyperplane.

Accordingly, for the samples lying closest to the separating hyperplane (the support vectors), Equation (1) holds with equality:

$y_i (\vec{w} \cdot \vec{x}_i + b) = 1$ (2)

Therefore, we want the distance (margin) between the positive and negative samples to be as wide as possible.

The distance (margin) between the two support vectors (illustrated by the two green lines in

$\text{Margin} = (\vec{x}_+ - \vec{x}_-) \cdot \dfrac{\vec{w}}{\|\vec{w}\|}$ (3)

Utilizing Equation (2) by substituting for a positive sample ($y_i = 1$) and a negative sample ($y_i = -1$) respectively, and then substituting both results into Equation (3), we obtain

$\text{Margin} = \dfrac{2}{\|\vec{w}\|}$ (4)
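The margin formula can be checked numerically; for example, with an illustrative weight vector $\vec{w} = (3, 4)$ we have $\|\vec{w}\| = 5$, giving a margin of $2/5$.

```python
import math

# For a hyperplane w.x + b = 0 under the canonical scaling of Equation (2),
# the margin between the two support hyperplanes is 2 / ||w||.
w = (3.0, 4.0)            # illustrative weight vector, not from the study
norm_w = math.hypot(*w)   # ||w|| = 5
margin = 2.0 / norm_w     # = 0.4
```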

Rescaling the objective to $\text{Margin} = \dfrac{1}{\|\vec{w}\|}$ (dropping the constant factor 2) does not affect the optimization result. Therefore, the SVM optimization problem becomes:

$\underset{\vec{w}, b}{\text{maximize}} \; \dfrac{1}{\|\vec{w}\|} \quad \text{subject to} \quad y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \geq 0, \; i = 1, \dots, m.$ (5)

This maximization problem is equivalent to the following minimization problem:

$\underset{\vec{w}, b}{\text{minimize}} \; \|\vec{w}\| \quad \text{subject to} \quad y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \geq 0, \; i = 1, \dots, m.$ (6)

This minimization problem gives the same result as the following:

$\underset{\vec{w}, b}{\text{minimize}} \; \dfrac{1}{2}\|\vec{w}\|^2 \quad \text{subject to} \quad y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \geq 0, \; i = 1, \dots, m.$ (7)

The strategy for finding the local maxima and minima of a function subject to equality constraints was developed by the Italian-French mathematician Giuseppe Lodovico Lagrangia, better known as Joseph-Louis Lagrange.

From Equation (7) our objective function is:

$f(\vec{w}) = \dfrac{1}{2}\|\vec{w}\|^2$ (8)

and m constraint functions:

$g_i(\vec{w}, b) = y_i (\vec{w} \cdot \vec{x}_i + b) - 1, \quad i = 1, \dots, m.$ (9)

Introducing the Lagrangian function, we have

$\mathcal{L}(\vec{w}, b, \alpha) = f(\vec{w}) - \sum_{i=1}^{m} \alpha_i g_i(\vec{w}, b)$ (10)

$\Rightarrow \mathcal{L}(\vec{w}, b, \alpha) = \dfrac{1}{2}\|\vec{w}\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i (\vec{w} \cdot \vec{x}_i + b) - 1 \right]$ (11)

where $\alpha_i$ is the Lagrange multiplier for each constraint function.

The SVM Lagrangian problem in Equation (11) can be rewritten using the duality principle to aid its solvability.

$\underset{\vec{w}, b}{\text{minimize}} \; \underset{\alpha}{\max} \; \mathcal{L}(\vec{w}, b, \alpha) \quad \text{subject to} \quad \alpha_i \geq 0, \; i = 1, \dots, m.$

To obtain the solution of the primal problem, we need to solve the Lagrangian problem. The duality principle tells us that an optimization problem can be viewed from two perspectives. The first one is the primal problem, a minimization problem in our case, and the other one is a dual problem, which will be a maximization problem. Since we are solving a convex optimization problem, then Slater’s condition holds for affine constraints, and Slater’s theorem tells us that strong duality holds. This implies that the maximum of the dual problem is equal to the minimum of the primal problem.

The minimization problem is solved by taking the partial derivatives of $\mathcal{L}(\vec{w}, b, \alpha)$ in Equation (11) with respect to $\vec{w}$ and $b$. Differentiating $\mathcal{L}(\vec{w}, b, \alpha)$ partially with respect to $\vec{w}$ and setting the result to zero, we have:

$\vec{w} = \sum_{i=1}^{m} \alpha_i y_i \vec{x}_i$ (12)

Equation (12) shows that the vector $\vec{w}$ is a linear combination of the samples.

Also, differentiating $\mathcal{L}(\vec{w}, b, \alpha)$ partially with respect to $b$ and setting the result to zero, we obtain:

$\sum_{i=1}^{m} \alpha_i y_i = 0$ (13)

Substituting Equation (12) into Equation (11), we have:

$\mathcal{L}(\vec{w}, b, \alpha) = \dfrac{1}{2} \left( \sum_{i=1}^{m} \alpha_i y_i \vec{x}_i \right) \cdot \left( \sum_{j=1}^{m} \alpha_j y_j \vec{x}_j \right) - \sum_{i=1}^{m} \alpha_i \left[ y_i \left\{ \left( \sum_{j=1}^{m} \alpha_j y_j \vec{x}_j \right) \cdot \vec{x}_i + b \right\} - 1 \right]$

$= \dfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j - \sum_{i=1}^{m} \alpha_i y_i \left[ \left( \sum_{j=1}^{m} \alpha_j y_j \vec{x}_j \right) \cdot \vec{x}_i + b \right] + \sum_{i=1}^{m} \alpha_i$

$= \dfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j - \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j - b \sum_{i=1}^{m} \alpha_i y_i + \sum_{i=1}^{m} \alpha_i$

$\mathcal{L}(\vec{w}, b, \alpha) = \sum_{i=1}^{m} \alpha_i - \dfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j - b \sum_{i=1}^{m} \alpha_i y_i$ (14)

Substituting Equation (13) into Equation (14) gives:

$\mathcal{L}(\vec{w}, b, \alpha) = \sum_{i=1}^{m} \alpha_i - \dfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j$ (15)

Equation (15) is called the Wolfe dual Lagrangian function. This shows that the optimization depends only on the dot products of pairs of samples (i.e. $\vec{x}_i \cdot \vec{x}_j$).

The optimization problem is now called the Wolfe Dual problem:

$\underset{\alpha}{\text{maximize}} \; \sum_{i=1}^{m} \alpha_i - \dfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j \quad \text{subject to} \quad \alpha_i \geq 0 \text{ for any } i = 1, \dots, m, \quad \sum_{i=1}^{m} \alpha_i y_i = 0$ (16)

The main advantage of the Wolfe dual problem over the Lagrangian problem is that the objective function now depends only on the Lagrange multipliers. Also, the optimization depends only on the dot products of pairs of samples (i.e. $\vec{x}_i \cdot \vec{x}_j$). This aids computation using software.
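The Wolfe dual objective of Equation (16) is straightforward to evaluate numerically. The sketch below uses a two-point toy dataset (illustrative, not from the study) for which $\alpha = (0.5, 0.5)$ is optimal, so the dual value equals the primal minimum $\frac{1}{2}\|\vec{w}\|^2 = 0.5$ with $\vec{w} = (1, 0)$, illustrating strong duality.

```python
def dual_objective(alpha, X, y):
    """Evaluate sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    m = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j]
        * sum(a * b for a, b in zip(X[i], X[j]))   # dot product x_i . x_j
        for i in range(m) for j in range(m)
    )
    return sum(alpha) - 0.5 * quad

# Toy linearly separable data: one point per class on the x-axis.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
alpha = [0.5, 0.5]   # satisfies sum_i alpha_i y_i = 0
```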

The Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient conditions for an optimal point of a positive definite Quadratic Programming problem. Thus, for a solution to be optimal, it has to satisfy the KKT conditions. According to [

The hard-margin SVM demands that the data be linearly separable. However, real-life data is often noisy, due to issues such as mistyped values or the presence of outliers. To solve this problem, [

Therefore, the soft margin formulation becomes:

$\underset{\vec{w}, b, \zeta}{\text{minimize}} \; \dfrac{1}{2}\|\vec{w}\|^2 + C \sum_{i=1}^{m} \zeta_i \quad \text{subject to} \quad y_i (\vec{w} \cdot \vec{x}_i + b) \geq 1 - \zeta_i, \; \zeta_i \geq 0, \; i = 1, \dots, m$

The parameter $C$ is called the SVM hyperparameter; it determines how much weight the slack variables $\zeta_i$ should carry, since sometimes we may wish to approach the hard margin.

Therefore, the Wolfe dual problem is:

$\underset{\alpha}{\text{maximize}} \; \sum_{i=1}^{m} \alpha_i - \dfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j \quad \text{subject to} \quad 0 \leq \alpha_i \leq C \text{ for any } i = 1, \dots, m, \quad \sum_{i=1}^{m} \alpha_i y_i = 0$ (17)

When the data is not linearly separable in its original space and SVM is to be applied, it becomes pertinent to transform the data to a higher-dimensional space in which it can be separated. This can be done with the aid of a kernel function. A kernel is a function that returns the result of a dot product performed in another space.

Let $K(\vec{x}_i, \vec{x}_j)$ be a kernel function replacing the dot product $\vec{x}_i \cdot \vec{x}_j$; the soft-margin dual problem of Equation (17) can then be rewritten as:

$\underset{\alpha}{\text{maximize}} \; \sum_{i=1}^{m} \alpha_i - \dfrac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \, K(\vec{x}_i, \vec{x}_j) \quad \text{subject to} \quad 0 \leq \alpha_i \leq C \text{ for any } i = 1, \dots, m, \quad \sum_{i=1}^{m} \alpha_i y_i = 0$ (18)

In Equation (18) above, $C$ is the SVM hyperparameter and $K(\vec{x}_i, \vec{x}_j)$ is the kernel function; these are provided by the user, while the variables $\alpha_i$ are Lagrange multipliers. This change to the dual problem is called the kernel trick: applying the kernel trick amounts to replacing the dot product of two examples by a kernel function.

There are different types of kernels that can be used to achieve the goal of the SVM optimization.

1) Linear kernel: the simplest kernel. Given two vectors $\vec{x}$ and $\vec{x}'$, the linear kernel is defined as $K(\vec{x}, \vec{x}') = \vec{x} \cdot \vec{x}'$.

2) Polynomial kernel: given two vectors $\vec{x}$ and $\vec{x}'$, the polynomial kernel is defined as $K(\vec{x}, \vec{x}') = (\vec{x} \cdot \vec{x}' + c)^d$, where $c \geq 0$ is a constant term and $d \geq 2$ represents the degree of the kernel.

3) Radial basis function (RBF) or Gaussian kernel: a radial basis function is a function whose value depends only on the distance from the origin or from some fixed point. Given two vectors $\vec{x}$ and $\vec{x}'$, the RBF (Gaussian) kernel is defined as $K(\vec{x}, \vec{x}') = \exp(-\gamma \|\vec{x} - \vec{x}'\|^2)$. The RBF kernel returns the result of a dot product performed in $\mathbb{R}^\infty$.
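The three kernels can be written directly from their definitions; this is a plain-Python sketch with illustrative default parameters.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, c=1.0, d=2):
    """K(x, z) = (x . z + c)^d, with c >= 0 and degree d >= 2."""
    return (linear_kernel(x, z) + c) ** d

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

For example, with $\vec{x} = (1, 2)$ and $\vec{x}' = (3, 4)$: the linear kernel gives 11, and the polynomial kernel with $c = 1$, $d = 2$ gives $(11 + 1)^2 = 144$; the RBF kernel of any vector with itself is 1.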

We utilized the LibSVM algorithm running in the WEKA environment to solve the SVM optimization problem. LibSVM is a library for SVMs developed by [
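Independently of the WEKA tooling, a LibSVM-style classifier predicts with the kernel expansion $f(\vec{x}) = \sum_i \alpha_i y_i K(\vec{x}_i, \vec{x}) + b$ over the support vectors. Below is a minimal sketch with illustrative toy values (support vectors, multipliers and bias are made up, not learned from the study's data).

```python
def svm_predict(x, support_vectors, alphas, labels, b, kernel):
    """Sign of the kernel expansion f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    score = sum(a * yl * kernel(sv, x)
                for sv, a, yl in zip(support_vectors, alphas, labels)) + b
    return 1 if score >= 0 else -1

def dot(u, v):
    """Linear kernel."""
    return sum(a * b for a, b in zip(u, v))

# Toy model: support vectors at (1, 0) and (-1, 0), equal multipliers, b = 0,
# which corresponds to the separating hyperplane x1 = 0.
svs, alphas, labels, b = [(1.0, 0.0), (-1.0, 0.0)], [0.5, 0.5], [1, -1], 0.0
```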

After classification using SVM, it is necessary to evaluate the performance of the classifier. In this work we do this using a test dataset. Measures commonly used by text categorization algorithms for performance evaluation are Precision, Recall, F-measure and Accuracy. These parameters are calculated from the elements of the confusion matrix, or contingency table.

From

With the values provided by the confusion matrix it is possible to calculate the performance evaluation parameters such as:

1) Precision: measures the exactness of the classifier with respect to each class. It is given as:

$\text{Precision} = \dfrac{TP}{TP + FP}$ (19)

2) Recall: measures the completeness of the classifier with respect to each class. It is given as:

$\text{Recall} = \dfrac{TP}{TP + FN}$ (20)

3) F-Measure: is the harmonic mean of precision and recall. It is given as:

$\text{F-measure} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$ (21)

4) Accuracy: is the ratio of correctly classified example to total number of examples. It is given as:

$\text{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (22)

5) Kappa Statistic: Cohen's Kappa coefficient measures how much better the classifier performs compared with guessing by a random classifier. To do this, it compares the observed accuracy with an expected accuracy (random chance).

$\text{Kappa} = \dfrac{P_o - P_e}{1 - P_e}$ (23)

where $P_o = \dfrac{TP + TN}{TP + TN + FP + FN}$ is the observed accuracy and $P_e$ is the expected accuracy (the hypothetical probability of chance agreement).

Predicted \ Actual | Positive | Negative |
---|---|---|
Positive | True Positive (TP) | False Positive (FP) |
Negative | False Negative (FN) | True Negative (TN) |

6) Error Estimates: in general, error estimates measure how close forecasts or predictions are to the eventual outcomes. Let $\theta$ denote the true value, $\hat{\theta}$ the value estimated by the algorithm, and $\bar{\theta}$ the mean value of $\theta$. The following error estimates are calculated as follows:

a) Mean absolute error (MAE)

$\text{MAE} = \dfrac{1}{N} \sum_{i=1}^{N} |\hat{\theta}_i - \theta_i|$ (24)

b) Root mean square error (RMSE)

$\text{RMSE} = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (\hat{\theta}_i - \theta_i)^2}$ (25)

c) Relative absolute error (RAE)

$\text{RAE} = \dfrac{\sum_{i=1}^{N} |\hat{\theta}_i - \theta_i|}{\sum_{i=1}^{N} |\bar{\theta} - \theta_i|}$ (26)

d) Root relative square error (RRSE)

$\text{RRSE} = \sqrt{\dfrac{\sum_{i=1}^{N} (\hat{\theta}_i - \theta_i)^2}{\sum_{i=1}^{N} (\bar{\theta} - \theta_i)^2}}$ (27)

7) Matthews Correlation Coefficient (MCC): measures the quality of binary classifications. The MCC is a correlation coefficient whose value lies in the interval $-1 \leq \text{MCC} \leq 1$. A coefficient of $+1$ represents a perfect prediction, $0$ a prediction no better than random, and $-1$ total disagreement between prediction and observation.

From the confusion matrix, the MCC can be calculated as:

$\text{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ (28)
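The measures in Equations (19)-(22) and (28) can be computed directly from the confusion-matrix counts. The sketch below uses illustrative counts, not the study's values.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Compute Precision, Recall, F-measure, Accuracy and MCC
    (Equations (19)-(22) and (28)) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy, "mcc": mcc}
```

For example, with illustrative counts TP = 70, FP = 30, FN = 20, TN = 80, precision is 0.7 and accuracy is 0.75.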

From

Summary | Result |
---|---|
Correctly Classified Instances (Accuracy) | 352 (71.8367%) |
Incorrectly Classified Instances | 138 (28.1633%) |
Kappa statistic | 0.4369 |
Mean absolute error | 0.2816 |
Root mean squared error | 0.5307 |
Relative absolute error | 56.3265% |
Root relative squared error | 106.1381% |
Total Number of Instances | 490 |

Class | Precision | Recall | F-Measure | MCC |
---|---|---|---|---|
Negative | 0.732 | 0.696 | 0.714 | 0.437 |
Positive | 0.706 | 0.741 | 0.723 | 0.437 |
Weighted Avg. | 0.719 | 0.718 | 0.718 | 0.437 |

performed better than chance. In a similar vein, the MCC (Matthews Correlation Coefficient) measures the quality of binary (two-class) classification and is mostly employed when the classes are of different sizes. Since our MCC of 0.437 is positive and above 0, it indicates that our prediction is good.

From the foregoing, we have established that the model performed well on our test data, and hence we employed it to predict the sentiments of the unlabelled Twitter data of the five banks, obtaining the results below.

Figures 5-9 display plots of the number of positive and negative sentiments for each bank. For each day within the period 1st January 2017 to 31st December 2018, we obtained the number of positive and negative sentiments; these series show intuitively the distribution of sentiments for each bank within the period considered. From the plots, it can be seen that the majority of the data falls between 0 and 100 sentiments per day, with few outliers above this range.

BANK | ACCESS | FBNH | GUARANTY | UBA | ZENITHBANK |
---|---|---|---|---|---|
Number of Positive Sentiments | 18,026 | 27,028 | 41,616 | 12,588 | 16,375 |
Number of Negative Sentiments | 15,020 | 23,967 | 36,383 | 7807 | 11,721 |
Total Number of Sentiments | 33,046 | 50,995 | 77,999 | 20,395 | 28,096 |

This study proposed a framework for sentiment analysis of five Nigerian banks' tweets using a Support Vector Machine. The data used for the study was crawled from the Twitter API and spans a 2-year period, from 1st January 2017 to 31st December 2018. Based on the number of tweets crawled, GUARANTY had the greatest number of tweets, followed by FBNH, then ACCESS, then ZENITHBANK, with UBA having the fewest crawled tweets within this period. In order to obtain accurate and reliable results, and to ensure that the classifier runs smoothly, these datasets were preprocessed in Python: Twitter handles, URLs, hashtag symbols, stock market tickers, the retweet marker "RT", multiple whitespaces, punctuation, numbers, and special characters were all removed. Similarly, stopwords were removed, and the tweets were stemmed so as to decrease word entropy. Thereafter, empty fields were removed.

After preprocessing of the training, test and five banks' datasets, the LibSVM algorithm in WEKA was used to build the sentiment classification model from the training data. The performance of this model was evaluated on a pre-labelled test dataset generated from the five banks. Our results show that the accuracy of the classifier was 71.8367%, and the Kappa statistic was greater than 0, implying that the classifier performed better than chance. The precision for both the positive and negative classes was above 0.7. The recall for the negative class was 0.696 and that of the positive class 0.741, both greater than 0.5, again showing that the prediction did better than chance. Similarly, the F-measure scores for both classes were greater than 0.7. Since the Matthews Correlation Coefficient (MCC) was 0.437, which is positive and above 0, it indicates that our prediction is good.

Since the model performed well on the test data, it was deployed to predict the sentiments of the five Nigerian banks' Twitter data. Our results show that the number of positive tweets within this period was slightly greater than the number of negative tweets for each of the five banks. Plots of the sentiment series indicated that the majority of the data falls between 0 and 100 sentiments per day, with few outliers above this range.

This research will assist Nigerian banks in better understanding their customers and will foster risk management, business forecasting, competitive analysis, and product and service improvements. Future studies can apply several machine learning techniques and compare their performance on these datasets to ascertain the best classifier. Another research direction is classifying the tweets into polarities other than positive and negative.

The first author acknowledges the African Union Commission for the research grant to carry out this research under the Pan African University Scholarship Scheme.

The authors have no competing interests to declare.

Onwuegbuche, F.C., Wafula, J.M. and Mung’atu, J.K. (2019) Support Vector Machine for Sentiment Analysis of Nigerian Banks Financial Tweets. Journal of Data Analysis and Information Processing, 7, 153-173. https://doi.org/10.4236/jdaip.2019.74010