Identification of Influential Users in Online Social Network: A Brief Overview

Information networks in which users join, publish their own content, and create links to other users are called Online Social Networks (OSNs). OSNs have become one of the major platforms for promoting new and viral applications and for disseminating information. Social network analysis is the study of these information networks, and it uncovers patterns of interaction among the entities. In this regard, finding influential users in OSNs is very important, as they play a key role in the success of the above phenomena. Various approaches exist to detect influential users in OSNs, ranging from simply counting immediate neighbors to more complex machine-learning and message-passing techniques. In this paper, we review recent research works that focus on identifying influential users in OSNs.


Introduction
Since its creation, the Internet has become a major source of news. The Internet has spawned many information-sharing networks, the most well-known of which is the World Wide Web. Recently, a new category of information networks known as Online Social Networks (OSNs) has exploded in popularity and now rivals the traditional Web in terms of usage. The advent of OSNs has been one of the most exciting phenomena of the past decade, partially due to the increasing proliferation and affordability of Internet-enabled devices such as personal computers, mobile devices, smartphones, and tablets. This is evidenced by the immense popularity of many online social networks. The analysis of OSN data sets is turning out to be non-trivial: on the one hand, they are typically large-scale, so it is difficult to apply traditional analytics methods to them directly; on the other hand, some analytic methods borrowed from other domains fail to measure the characteristics and properties of OSNs accurately.
Each social networking site includes different groups where users can share their insights, spread information, or pass it on to others. This has created a new way to mirror genuine human interaction. For example, Twitter has become one of the biggest and best-known microblogging sites on the Internet. Twitter allows registered users to post and receive short messages of up to 140 characters. This feature, which lets Twitter users publish short messages quickly and in summarized form, makes it the preferred tool for the quick dissemination of information over the web. These messages are called tweets, and they can be posted via the Twitter website, the Short Message Service (SMS), or third-party applications. Importantly, a large fraction of tweets are posted from mobile devices and services, such as SMS. A user's messages are displayed as a stream on the user's Twitter home page.
A social networking site such as Twitter connects a massive number of users, and in most cases users send follow requests to others who have similar topical interests; this idea is known as homophily [10]. Such a social platform can therefore serve as a robust marketing platform. The efficiency of online social interactions has drawn the attention of researchers to the discovery of influential users. Finding influential users can help in:
• Highlighting products in viral marketing.
• Analyzing the trends and behavior of people towards any event.
• Tracking down the collective identity and role models in social movements.
In this paper, we review existing research works on finding influential users in OSNs. The rest of the paper is organized as follows: Section 2 gives a brief overview of the relevant research works in this field. Section 3 covers the common recent approaches and methodologies for finding influential users in OSNs. Section 4 compares some existing research methods with respect to different attributes and performance strategies, and Section 5 concludes our review with directions for future research.

Related Work
Social networks are becoming more complex, and reaching users directly is becoming more challenging. As a result, finding influential users has become a key issue in viral marketing. An online relationship refers to a user's virtual social relationships in a social network, such as the following relationships on Twitter.
These relationships are typically expressed as directed edges between pairs of users in a social graph. Figure 1 shows a directed social network comprising four nodes with their related messages. This representation reveals, for example, that the user named "$u_1$" is exposed to the content produced by "$u_2$" and "$u_3$". It also indicates that none of the three other nodes is exposed to the information shared by "$u_4$".
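The directed representation described above can be sketched as follows. This is a minimal illustration, assuming that an edge (a, b) means "a follows b" and is therefore exposed to b's content. The edge list is hypothetical: it encodes only the two facts stated in the text about Figure 1, and the third edge is invented for the example.

```python
from collections import defaultdict

# Hypothetical follow edges (follower, followee). Only the facts stated in
# the text are taken from the paper: u1 follows u2 and u3, and nobody
# follows u4. The edge ("u4", "u1") is an assumption made for illustration.
FOLLOWS = [("u1", "u2"), ("u1", "u3"), ("u4", "u1")]


def build_exposure(edges):
    """Map each user to the set of users whose content they are exposed to."""
    exposure = defaultdict(set)
    for follower, followee in edges:
        exposure[follower].add(followee)
    return exposure


def audience(edges, author):
    """Return the users who are exposed to the content produced by `author`."""
    return {follower for follower, followee in edges if followee == author}


exposure = build_exposure(FOLLOWS)
print(sorted(exposure["u1"]))   # users whose content u1 sees
print(audience(FOLLOWS, "u4"))  # users who see the content of u4
```

Storing the edges from follower to followee keeps both questions cheap to answer: who a user listens to, and who listens to a user.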
Every piece of user-generated content contains one or more topics. Usually, the content published by the users of OSNs is viewed as a stream of messages. Figure 1 represents the stream produced by the members of the network depicted in the previous example. That stream can be viewed as a sequence of users' activities.
Social Influence: A social phenomenon that individuals can undergo or exert, also called imitation, reflecting the fact that the actions of a user can induce his or her connections to behave in a similar way. Influence appears explicitly when, for example, someone shares or likes content posted by someone else.
Existing methods of finding influential users in OSNs can be generally grouped into two categories: structural and hybrid methods. In the following, we review these two categories.

Structural Methods
Social connections among users can be used to measure the popularity or influence of users; a high-degree node (user) is assumed to be the authority that disseminates information most widely [11]. These social connections can be either directed (for example, follower/followee) or undirected (for example, friendship relations on Facebook). In-degree centrality refers to the number of edges that point to the node, whereas out-degree centrality indicates the number of edges that originate from the node. In directed networks, in-degree centrality usually reflects the popularity of a user, whereas out-degree centrality typically indicates the sociality of a user [12].
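A minimal sketch of the two degree measures is given below. It assumes that each directed edge (a, b) means "a follows b"; the toy edge list and the function name are hypothetical.

```python
from collections import Counter


def degree_centrality(edges):
    """Return (in_degree, out_degree) counters for a directed edge list.

    Each edge (a, b) is assumed to mean "a follows b", so the in-degree of b
    counts its followers and the out-degree of a counts the users a follows.
    """
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg


# Hypothetical toy network.
edges = [("a", "c"), ("b", "c"), ("d", "c"), ("c", "a"), ("d", "a")]
in_deg, out_deg = degree_centrality(edges)
most_popular = max(in_deg, key=in_deg.get)     # highest in-degree
most_sociable = max(out_deg, key=out_deg.get)  # highest out-degree
print(most_popular, in_deg[most_popular])
print(most_sociable, out_deg[most_sociable])
```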
The betweenness centrality of a user, computed as the number of shortest paths that pass through that user, is also used to identify influential users in OSNs. For example, Catanese et al. [13] applied betweenness centrality to the Facebook social graph to identify the central nodes of the network. Similarly, Katz [14] measured the influence of a node through all the paths, of any length, that reach the node, with longer paths contributing less.
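As an illustration of shortest-path betweenness, the sketch below uses Brandes' algorithm on an unweighted, undirected graph. This is a standard technique chosen here for the example and is not necessarily the procedure used in [13]; the toy graph is hypothetical and the scores are left unnormalized.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    `adj` maps each node to an iterable of its neighbours.
    """
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse breadth-first order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}


# Hypothetical path graph a - b - c - d.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
bc = betweenness_centrality(graph)
print(bc)
```

On the path graph, the two inner nodes lie on every shortest path that crosses the middle of the chain, so they score higher than the endpoints.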
The TURank algorithm [15] studies the relationships among users together with the users' posts by taking into account a user-to-tweet relationship graph. The TwitterRank algorithm [16] uses the topics discussed on Twitter along with the network structure to rank user influence on Twitter.
• Kitsak et al. [18] chose k-shell centrality, in which each node is assigned a k-shell index that indicates its distance from the network core. Nodes with higher k-shell indices are considered more influential, as they are closer to the core of the graph.
• Sheikhahmadi et al. [19] proposed a hybrid measure of spreading capability that considers a user's connections in different shells along with the k-shell and degree measures.
• Chen et al. [20] proposed an improved version of the degree centrality measure. The algorithm is iterated k times to select k nodes; in each iteration, the node with the highest degree is selected and added to the seed set. The edges between the selected nodes and the other network nodes are then disregarded when determining the spreading capability of the remaining nodes.
• In [21], the authors proposed a method in which nodes are first colored such that nodes of the same color are separated by a distance greater than a certain threshold. The nodes are then grouped and ranked based on their color and degree. Finally, the top-k nodes with the highest degree within the group are selected as the most influential nodes.
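Two of the ideas in the list above can be sketched briefly: the k-shell index and the discounted-degree seed selection. Both functions are simplified readings of the descriptions given here, not the authors' reference implementations; the function names, the alphabetical tie-breaking rule, and the toy graph are assumptions.

```python
def k_shell(adj):
    """Return the k-shell index of every node of an undirected graph.

    Nodes of minimum remaining degree are peeled off repeatedly; a node
    removed while the threshold is k receives shell index k.
    """
    degree = {v: len(set(ns)) for v, ns in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        k = max(k, min(degree[v] for v in remaining))
        peel = [v for v in remaining if degree[v] <= k]
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue
            remaining.discard(v)
            shell[v] = k
            for w in set(adj[v]):
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        peel.append(w)
    return shell


def single_discount(adj, k):
    """Select k seeds by degree, ignoring edges to already chosen seeds."""
    degree = {v: len(set(ns)) for v, ns in adj.items()}
    seeds = []
    for _ in range(min(k, len(adj))):
        # Ties are broken alphabetically (an assumption of this sketch).
        candidates = sorted(v for v in adj if v not in seeds)
        best = max(candidates, key=lambda v: degree[v])
        seeds.append(best)
        for w in set(adj[best]):
            degree[w] -= 1  # the edge to a chosen seed no longer counts
    return seeds


# Hypothetical graph: a triangle (a, b, c) with a chain d - e - f attached to a.
graph = {
    "a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"],
    "d": ["a", "e"], "e": ["d", "f"], "f": ["e"],
}
shells = k_shell(graph)
seeds = single_discount(graph, 2)
print(shells)
print(seeds)
```

In this toy graph the discount matters: after the first seed is chosen, its neighbours lose one unit of degree, so the second seed is picked from a different part of the network instead of next to the first one.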

Hybrid Methods
Awan et al. [22] predicted the stock market using big data retrieved from social media and news sources such as Yahoo!, daily newspapers, and Twitter. Even the policing of protests in the United Kingdom has been analyzed using social media data [23]. Additionally, cyber risk management [24], mental health conditions [25], suicide rates and causes [26], box office profits [27], etc. have been predicted through social media analysis. The impact of prominent influencers on consumer behavior is evaluated in the work of Pick [28]. The effectiveness of sponsors and influential users in multiple domestic and business sectors is analyzed in the work of Feng et al. [29]. The study of Anuar et al. [30] identified why Instagram influencers affect the intention to purchase fashion items. Earlier approaches focused on immediate neighbors (for example, counting the neighbors) to detect influential users. One of the first studies that attempted to find the parameters of this approach was conducted by Zhang et al. [31].
They considered users' retweet behavior patterns to investigate how friends in one's ego network influence retweet behavior. Their model incorporates social influence locality into a factor graph model, further leveraging network-based correlation. Weng et al. [16] suggested a measure named TwitterRank, based on the idea of PageRank, to compute users' topical influence on Twitter. This approach is based on a topic query set and takes into account both the link structure and the similarity of users' interests. Al-garadi et al. [32] quantified users' interactions and modeled the social graph using a weighted k-core decomposition method to identify influential spreaders in OSNs. To identify the top-k significant users in social networks, Alshahrani et al. [33] proposed an efficient algorithm based on centrality measures. Moreover, Zareie et al. [34] proposed a method that selects influential users based on the interests of their friends and their connected neighborhood. Wang et al. [35] proposed an approach named Temporal Topic Influence (TTI), arguing that analytical applications in online social networks can be generalized as an influence evaluation problem, which aims at finding the most influential users. This model is time-interval-dependent, content-aware, and structure-aware. Othman et al. [36] investigated the effect of topic familiarity on listening comprehension and how far certain aspects of language are likely to be influenced by topic familiarity. The UIRank algorithm [37] is based on the contribution of a user's tweets and the characteristics of information dissemination in microblog networks; it computes a user influence score iteratively over the user follower graph. Most of these approaches ignore the time factor.
To discover highly reliable domain-based influencers at different time intervals, Abu-Salih et al. suggested a framework built on semantic analysis and machine learning modules [38]. Zang et al. proposed an on-Demand Influencer Discovery (DID) model, which employs an iterative learning process that incorporates a language attention network as a subject filter and can identify influential users on any subject regardless of its demand on social media [39]. Their influence convolution network is built on user interaction, but they did not suggest any ranking for the influential users. The research of Mittal et al. discovered and ranked significant users topic-wise [40]. Their proposed Aggregation Consensus Rank Algorithm (ACRA) is applied over time intervals to generate lists of top-ranked influential users using different Twitter metrics. They analyze the connections between users in a graph database to find these significant users. A framework named Personalized PageRank also identifies influential topical users based on both the information gathered from the network and the data retrieved from user actions [41]. Additionally, fake influencers can be a threat to marketing and advertising. A trust-based method for identifying these inorganic users is proposed by Dewan [42]. Among recent research, Li et al. [43] work on sensitive influence maximization over the different topics of interest of different users. Their proposed algorithm is based on graph pruning and a three-stage heuristic optimization strategy. Mandal et al. proposed a Social Promoter Score (SPS)-based recommendation [44]. Kumar et al. [45] found the top-k influential nodes in a community using label propagation and supported their claims with several real-life data sets. Another approach to influence maximization is proposed by Li et al. [46]; their framework is based on a meta-heuristic search algorithm. For social networks, Shi et al. [47] proposed a community detection algorithm based on Quasi-Laplacian centrality peaks clustering.
However, most of the existing approaches overlook the combination of trending-topic analysis and the temporal factor, which significantly affects the ranking of influential users. Our current proposed method is an extension of the Topical Influential Users Detection (TIUD) algorithm [48]. The task is defined as finding significant users for a set of trending topics and listing the top significant users at specific time intervals, considering familiar neighbors.
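To make the idea of listing top users per time interval concrete, a minimal sketch follows. It is not the TIUD algorithm of [48]: it assumes that influence in a window can be approximated by counting the reactions (for example, retweets) a user receives on the selected topics, and the event format, function name, and sample data are hypothetical.

```python
from collections import Counter, defaultdict


def top_users_per_interval(events, topics, interval, k):
    """Return {interval_index: [(user, score), ...]} with the k best users.

    events: iterable of (timestamp, author, topic) tuples, one per reaction
        received by `author` on `topic`.
    topics: set of trending topics to consider.
    interval: window length, in the same unit as the timestamps.
    """
    buckets = defaultdict(Counter)
    for timestamp, author, topic in events:
        if topic in topics:
            buckets[int(timestamp // interval)][author] += 1
    ranking = {}
    for index, counts in sorted(buckets.items()):
        # Highest score first; ties are broken by user name.
        ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        ranking[index] = ordered[:k]
    return ranking


# Hypothetical reaction log: (timestamp, author, topic).
events = [
    (1, "u1", "health"), (2, "u1", "health"), (3, "u2", "health"),
    (4, "u3", "sports"), (11, "u2", "health"), (12, "u2", "health"),
    (13, "u1", "health"),
]
ranking = top_users_per_interval(events, {"health"}, interval=10, k=1)
print(ranking)
```

The sample data shows why the temporal factor matters: the top user for the topic changes from one window to the next, which a single static ranking would hide.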

Problem Formulation
We first give some fundamental concepts related to the task of identifying influential users in OSNs.
Topic: Any specific keyword, or a set of related words that express the same idea, can be regarded as a topic [52] [53] [54]. For instance, when health is the topic, related words include doctors, hospital, pandemic, corona, etc.
Trending Topic: A trending topic is a subject that experiences a surge in popularity, often driven by widespread contemporaneous phenomena.
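One simple way to operationalize a "surge in popularity" is to compare how often a hashtag appears in two consecutive time windows. The sketch below is an illustration under that assumption only; the function name and the thresholds `min_count` and `min_ratio` are hypothetical and are not taken from the surveyed works.

```python
from collections import Counter


def trending_topics(previous_window, current_window, min_count=3, min_ratio=2.0):
    """Flag hashtags whose frequency surged between two time windows.

    Both arguments are lists of hashtags observed in a window. `min_count`
    and `min_ratio` are illustrative thresholds, not values from the paper.
    """
    before, now = Counter(previous_window), Counter(current_window)
    trending = []
    for tag, count in now.items():
        # Add-one smoothing keeps brand-new hashtags from dividing by zero.
        ratio = count / (before[tag] + 1)
        if count >= min_count and ratio >= min_ratio:
            trending.append((tag, ratio))
    # Strongest surge first; ties are broken by hashtag.
    return sorted(trending, key=lambda item: (-item[1], item[0]))


previous = ["#health", "#health", "#football", "#football", "#football"]
current = ["#health"] * 9 + ["#football"] * 3 + ["#pandemic"] * 4
result = trending_topics(previous, current)
print(result)
```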

Topic Modelling
Social media such as Twitter contain compact messages, so generally one tweet reflects one topic. Here, users normally tweet using hashtags (for example, #Obama, #Ronaldo, etc.). Topic modeling is an unsupervised learning approach. Twitter-LDA (T-LDA), an effective extension of Latent Dirichlet Allocation (LDA), is used here for topic distillation. Twitter-LDA achieves better topic semantic coherence by assuming that the ratio between topic words and background words is the same for each user's tweets. The graphical representation of T-LDA is shown in Figure 2. The formulation of T-LDA [63] is given below:
• Every individual user's topical interest $\phi_i$ is represented by a distribution over N topics.
• Each word is drawn either from a background word distribution $\theta_{bw}$ or from a topic word distribution $\theta_{kw}$.
• If the latent value $y = 0$, the word comes from the background word distribution $\theta_{bw}$; if $y = 1$, it comes from the topic word distribution $\theta_{kw}$.
• $y$ is drawn according to the ratio of background words to topic words, denoted by $\pi$. $\pi$ is a common factor, that is, the ratio between $\theta_{kw}$ and $\theta_{bw}$ is the same for all users.
• DT is a D × T matrix, where D is the number of Twitter users and T is the number of topics. $DT_{ij}$ contains the number of times a word in twitterer $s_i$'s tweets has been assigned to topic $t_j$ [16].
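The generative story in the list above can be sketched as a small sampler. This is an illustration only, not an inference procedure for T-LDA: it assumes that each tweet has a single topic, that `pi` is the probability of a topic word ($y = 1$), and that only topic words are counted in the user's row of DT. All distributions and names below are hypothetical.

```python
import random


def generate_tweet(rng, user_topics, topic_words, background_words, pi, length):
    """Sample one tweet following the generative story sketched above.

    user_topics: topic -> probability for this user (phi_i).
    topic_words: topic -> {word: probability} (theta_kw).
    background_words: {word: probability} (theta_bw).
    pi: assumed probability that a word is a topic word (y = 1).
    """
    def draw(dist):
        items = sorted(dist)
        return rng.choices(items, weights=[dist[i] for i in items])[0]

    topic = draw(user_topics)  # one topic per tweet
    words = []
    for _ in range(length):
        y = 1 if rng.random() < pi else 0
        source = topic_words[topic] if y == 1 else background_words
        words.append((draw(source), y))
    return topic, words


def update_dt_row(dt_row, topic, words):
    """Add the topic words of one tweet to the user's row of the DT matrix."""
    dt_row[topic] = dt_row.get(topic, 0) + sum(y for _, y in words)
    return dt_row


rng = random.Random(0)
phi_i = {"health": 0.7, "sports": 0.3}
theta_kw = {
    "health": {"doctor": 0.5, "hospital": 0.5},
    "sports": {"goal": 0.6, "match": 0.4},
}
theta_bw = {"the": 0.5, "today": 0.5}
dt_row = {}
for _ in range(20):
    topic, words = generate_tweet(rng, phi_i, theta_kw, theta_bw, pi=0.6, length=5)
    update_dt_row(dt_row, topic, words)
print(dt_row)
```

In practice the topic assignments are latent and must be inferred from observed tweets; the sampler only shows how the switch variable $y$ separates background words from topic words.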

Influential User Detection Approach
Recent methodologies usually apply a PageRank-like algorithm [64] to find influential users in an online social attributed graph G for a given query Q. The framework has the following steps (depicted in Figure 3):
• Identify the set of topics using tweets of different users over different time periods.
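A PageRank-like ranking biased towards a query can be sketched as below. This is a generic personalized PageRank under stated assumptions, not the specific algorithm of [64] or TwitterRank [16]: influence flows from a follower to the users they follow, and the teleport distribution is proportional to a hypothetical per-user relevance score for the query Q.

```python
def topic_pagerank(follows, topic_weight, damping=0.85, iterations=100):
    """Rank users with a PageRank-style iteration biased towards a query.

    follows: user -> list of users they follow.
    topic_weight: user -> non-negative relevance of the user to the query.
    """
    users = sorted(set(follows) | {v for vs in follows.values() for v in vs})
    total = sum(topic_weight.get(u, 0.0) for u in users)
    if total <= 0:
        raise ValueError("at least one user must be relevant to the query")
    teleport = {u: topic_weight.get(u, 0.0) / total for u in users}
    rank = dict(teleport)
    for _ in range(iterations):
        nxt = {u: (1.0 - damping) * teleport[u] for u in users}
        for u in users:
            followees = follows.get(u, [])
            if followees:
                # A user passes rank equally to everyone they follow.
                share = damping * rank[u] / len(followees)
                for v in followees:
                    nxt[v] += share
            else:
                # A user who follows nobody returns rank via the teleport set.
                for v in users:
                    nxt[v] += damping * rank[u] * teleport[v]
        rank = nxt
    return rank


# Hypothetical follow graph and uniform relevance to the query.
follows = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
relevance = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
rank = topic_pagerank(follows, relevance)
print(sorted(rank, key=rank.get, reverse=True))
```

Replacing the uniform relevance scores with topic-specific ones (for example, derived from a user-topic matrix) shifts the ranking towards users who are both well followed and active on the queried topic.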

Topic Distribution
One needs to apply Twitter-LDA (T-LDA) [62] in order to extract the topics discussed by users on Twitter. Table 1 presents a sample word-topic distribution.

Comparison
In this section, we present a very brief comparison of some existing methods in Table 2 by considering different characteristics, such as identification algorithms, performance evaluation (artificial/manual/direct comparison) and attributes.

Conclusion
In this paper, we briefly review existing research works on finding influential users in OSNs. Earlier methods mostly focused on social connections, with less attention to the content generated by social users. Later approaches considered both network and content in order to obtain topic-oriented influential users. However, the major limitation of those approaches is that they overlook the combination of trending-topic analysis and the temporal factor, which significantly affects the ranking of influential users. The most recent works pay more attention to dynamic networks and track time-based user activities to rank the most influential users in OSNs. In addition, social users have different degrees of interest in different topics, and these vary over time; as a result, users' social influence also changes over time. Researchers need to focus more on dynamic social graphs, where the temporal factor has a great impact on both social connections and users' activities, in order to find temporally influential users on trending topics.