Evaluation of Microblog Users’ Influence Based on PageRank and User Behavior Analysis

This paper explores users' influence on microblogs. First, drawing on social network theory, we analyze the structure of the information transmission network formed by the following/followed relationships among microblog users. Informed by an analysis of microblog user behavior, the paper then presents a model for calculating the weights of users' influence. It proposes a U-R model that evaluates users' influence by combining the PageRank algorithm with user behavior analysis. The U-R model incorporates the effect of user behaviors and applies PageRank-style iteration of each user's U-R value to evaluate the importance and influence of every user in a microblog network. Users can then be ranked by their U-R values. Finally, the validity of the U-R model is demonstrated with a real-life numerical example.


Introduction
A microblog is a platform built on user relationships for sharing, transmitting and acquiring information, on which users can form individual communities, post updates of around 140 characters, and share them in real time via the Web, WAP and a variety of clients [1]. With the development of Web 2.0, the microblog has grown rapidly into a booming information-sharing platform. Because of its large number of active users and its hotspot information, the microblog's influence on information transmission, changes in living habits and so on cannot be ignored. Microblog users' influence refers to a user's influence within the microblog community: the greater a user's influence, the more attention netizens pay to that user, and the more remarkable the user's impact on the network. The microblog therefore has vast theoretical and practical potential, especially in fields such as word-of-mouth marketing, information mining and public opinion control.

Reviews
At present, scholars around the world have begun to study the microblog, or Twitter (in China the service is usually called "microblog", so that term is used throughout this paper). Research hotspots include the motivations and behaviors of microblog users as well as the structure of microblog social networks. The evaluation of microblog users' influence (microblog influence) has also become a new research focus in social network analysis. Studies outside China mainly discuss Twitter, which is regarded as the pioneering prototype of the microblog. AKSHAY JAVA et al. (2007) [2] studied Twitter data from 1 April 2007 to 30 May 2007 and found that the main user intentions are daily chatter, conversations, sharing information and reporting news; they then analyzed the growth, degree distribution and geographical distribution of Twitter users, among other properties. TEUTTE (2010) [3] analyzed Twitter from the perspective of network dynamics, describing changes in the microblog network through the growth of in-degree and out-degree, network density, betweenness and so forth. KRISHNAMURTHY et al. (2008) [4] explored the structural characteristics of the microblog network and identified distinct classes of microblog users and their behaviors. Chinese scholars, meanwhile, have focused mainly on hotspot discovery in microblog networks, propagation mechanisms and user behavior characteristics. Cai Shuqin and Zhang Jing (2012) [5] designed metadata models for microblog content using structured metadata acquired from open APIs, treated hotspot discovery as a value-adding process that turns original materials into clusters of hot products, and finally established a complete production and processing model.
However, there is little academic research on microblog users' influence at present. GABRIEL W (1994) [6] evaluated Twitter users' influence by drawing on the PageRank algorithm and treated the number of friends as an important indicator of influence: the more friends a user has, the greater his or her influence and the more easily he or she affects others. The basic equations of this approach are consistent with PageRank. However, because large numbers of microblog fans are bought and sold in China, the model is not fully applicable there. YUTO Y (2010) [7] proposed TURank (Twitter User Rank), which ranks users based on a User-Tweet Graph; it places great emphasis on content quality but ignores the influence of fans' retweeting. KLOUT [8], a well-known service for assessing influence on social network sites, uses relationship data from Facebook, LinkedIn and Twitter together with user behavior data (initiating a session, commenting, forwarding, etc.). Klout holds that everyone has influence in the era of social media; its algorithm measures a user's influence across social networks, gives insight into whom the user influences and on which topics, and reports an influence index on a scale of 0 to 100. Kang (2011) [9] proposed a new algorithm for evaluating the influence of nodes in a microblog social network from users' behavior and relationships, based on SINA microblog. They used the frequency of posting microblogs to measure users' activity and combined this activity with PageRank to obtain the Behavior-Relationship Rank algorithm.
However, they considered only posting frequency when evaluating users' activity and did not account for interaction behaviors, such as mentioning friends and commenting on and forwarding microblogs, which also affect users' influence.
Building on previous studies, this paper draws on PageRank, the algorithm search engines use to rank webpages. Taking into account users' activity, represented by the frequency of posting microblogs, and their interactive positiveness, we propose the U-R model, an algorithm that evaluates users' influence based on PageRank and microblog user behavior analysis. The model addresses some shortcomings of the models above.

Hypotheses
In microblog networks, the name of the friend relationship varies with the service provider. For instance, SINA microblog uses "follow and followed", whereas Tencent microblog uses "listen and listened". This paper adopts "follow and followed", as shown in Figure 1: if user A follows user B, A is a Follower of B, and B is a Followee of A.
Based on the characteristics of microblogs and on social network analysis theory, we propose the following definitions:
• Node: every user is a node in a microblog network, such as user A and user B (see Figure 1).
• Edge: the "follow and followed" relationship between two microblog users; edges are directed.
• In-degree and out-degree: a user node's out-degree is its number of Followees, and its in-degree is its number of Followers.
In addition, the PageRank algorithm rests on the following two assumptions [10]:
• If a page is referenced many times, it is likely to be important. Even if a webpage is not referenced frequently, it may still be important if it is referenced by important webpages. The importance of a webpage is distributed evenly among the pages it references.
• A surfer starts by visiting a random page in a webpage collection and then keeps browsing by following links from the current page; a page's PageRank value is the probability of browsing to that page.
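The node, edge and degree definitions above can be made concrete with a small sketch. It assumes a hypothetical edge list in which each pair (follower, followee) means "follower follows followee"; the user names are illustrative only.

```python
from collections import defaultdict

# Hypothetical follow edges: (follower, followee) means follower follows followee.
edges = [("A", "B"), ("C", "B"), ("B", "D"), ("A", "D")]

def degrees(edges):
    """Return (in_degree, out_degree) dicts for a directed follow graph.

    in-degree  = number of Followers (edges pointing into the node)
    out-degree = number of Followees (edges leaving the node)
    Nodes with a degree of zero simply do not appear in the corresponding dict.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for follower, followee in edges:
        out_deg[follower] += 1   # one more Followee for the follower
        in_deg[followee] += 1    # one more Follower for the followee
    return dict(in_deg), dict(out_deg)

in_deg, out_deg = degrees(edges)
print(in_deg)   # {'B': 2, 'D': 2}
print(out_deg)  # {'A': 2, 'C': 1, 'B': 1}
```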
In other words, if a webpage is linked to by many significant webpages, its content has been widely recognized and trusted, has high authority, and should be ranked higher. The equation [10] for the PageRank value of a webpage is therefore

$$PR(X) = \sum_{Y \in M(X)} \frac{PR(Y)}{N(Y)} \qquad (1)$$

where PR(X) is the PageRank value of webpage X; N(Y) is the out-degree of webpage Y (the number of links from Y to other webpages); and M(X) is the collection of pages that point to webpage X. This is a recursive equation in which a webpage's PageRank value is distributed evenly over its forward links. The PageRank value indicates the importance of a webpage and is determined by the hyperlink structure of the network: the PageRank value of any webpage is computed from the values of the pages linking to it and their numbers of outgoing links. For each inbound link, the linking page's PageRank value is divided by that page's number of outgoing links, and the results are summed. In the calculation, Equation (1) is modified by adding a damping coefficient [11] p: after browsing a webpage, the user continues to a page it links to with probability p and jumps to a random page with probability 1 − p. Namely,

$$PR(X) = (1 - p) + p \sum_{Y \in M(X)} \frac{PR(Y)}{N(Y)} \qquad (2)$$

Based on empirical analysis, p is usually set to 0.85; any p < 1 makes the iteration converge.
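The damped PageRank recursion can be sketched as a simple power iteration. This is a minimal illustration, not the authors' code: it assumes the non-normalized form (1 − p) + p·Σ PR(Y)/N(Y) with every page starting at 1, and the three-page link graph is hypothetical.

```python
def pagerank(links, p=0.85, tol=1e-10, max_iter=1000):
    """Iterate PR(X) = (1 - p) + p * sum(PR(Y) / N(Y) for Y in M(X)).

    links maps each page to the list of pages it links to (its out-links),
    so N(Y) = len(links[Y]) and M(X) is every page whose out-links include X.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {x: 1.0 for x in pages}            # every page starts with value 1
    for _ in range(max_iter):
        new = {x: 1.0 - p for x in pages}   # random-jump term (1 - p)
        for y, targets in links.items():
            if targets:
                share = pr[y] / len(targets)    # PR(Y) / N(Y)
                for x in targets:
                    new[x] += p * share         # damped share passed along each link
        delta = max(abs(new[x] - pr[x]) for x in pages)
        pr = new
        if delta < tol:                     # stop once values no longer change
            break
    return pr

# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
pr = pagerank(links)
print(sorted(pr, key=pr.get, reverse=True))  # ['C', 'A', 'B']
```

With no dangling pages, the values at convergence sum to the number of pages (here 3), which is a quick sanity check for this non-normalized form.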
Drawing on the PageRank algorithm, we therefore propose two assumptions for the U-R model:
• If a microblog is forwarded and commented on many times, it is likely to be important. Even if a microblog is not forwarded and commented on frequently, it may still be important if it is forwarded and commented on by important microblog users. A user's influence is distributed among the users he or she follows in proportion to their behavior weights.
• A reader starts at a random user in the microblog user collection and then keeps browsing microblogs by following the current user's follow, forwarding and comment links; a user's U-R value is the probability of browsing to that user.

Model Establishment
In the microblog network, if user A follows user B, A is a Follower of B: A can see the microblogs posted by B, but B cannot see those posted by A. Information therefore flows through the microblog network along follow relationships. The structure of a microblog network is thus similar to the link model of webpages: A following B is equivalent to A voting for B, so microblog users' influence can be ranked on the basis of PageRank. The more Followers a user has, the greater his or her influence on information transmission in the microblog network; the microblogs he or she posts appear on the pages of tens of thousands of Followers, so the user carries more weight when his or her authority is calculated. Conversely, when such a user follows others, the vote also carries more weight, which increases the influence of the users voted for. For instance, if user A has 100,000 Followers and follows user B, A sees the content B posts; if A forwards one of B's microblogs, it is presented to A's 100,000 Followers. A thus acts as an amplifier of information forwarding, and B becomes highly authoritative in the microblog network. B's authority, however, depends on whether A actually forwards B's microblogs, a probability that plays the role of the damping coefficient [9] in PageRank. This paper therefore proposes the U-R model, which combines the idea of PageRank with user behavior to evaluate microblog users' influence in a new way. The model considers both user behavior and microblog network structure and identifies the most influential network nodes by iteratively calculating each node's U-R value.
Moreover, PageRank distributes a webpage's value evenly among the pages it links to, which overlooks the differing importance of those pages. When PageRank is applied to a microblog network, we instead use the ratio of user behavior weights as the standard for distributing the value: users with higher weights receive correspondingly higher values, so the transmission is nonuniform. Eventually, active users gain more authority in the network than inactive users. This overcomes the shortcoming of evaluating influence solely from follow relationships, and the model better reflects objective reality. Based on the above analysis and on user behavior analysis, we propose the U-R model for evaluating microblog users' influence.
$$UR(u) = (1 - p) + p \sum_{v \in M(u)} A(v,u)\, UR(v) \qquad (3)$$

where UR(u) and UR(v) are the influence evaluation values of microblog users u and v; p is the probability that v forwards u's microblog, set here to 0.5; M(u) is the collection of u's Followers; and A(v,u) is the share of user v's U-R value assigned to user u, determined by the proportion of u's behavior weight in the total behavior weight of all of v's Followees:

$$A(v,u) = \frac{W_u}{\sum_{k=1}^{N_v} W_k} \qquad (4)$$

In Equation (4), W_u is user u's microblog behavior weight, N_v is the number of Followees of v (node v's out-degree), and W_k is the behavior weight of v's k-th Followee.
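A minimal sketch of the U-R iteration follows, assuming the form UR(u) = (1 − p) + p·Σ A(v,u)·UR(v) with A(v,u) = W_u divided by the total weight of v's Followees, and every user starting at 1. The function name `ur_rank`, the dictionary layout and the example weights are hypothetical, chosen only for illustration.

```python
def ur_rank(followees, weights, p=0.5, tol=1e-10, max_iter=1000):
    """Iterate UR(u) = (1 - p) + p * sum_{v in M(u)} A(v, u) * UR(v).

    followees[v] lists the users v follows; weights[u] is W_u (must cover all users).
    A(v, u) = W_u / (sum of W over v's Followees), so v's value is split
    among its Followees in proportion to their behavior weights.
    """
    users = set(weights)
    ur = {u: 1.0 for u in users}            # each node's initial value is 1
    for _ in range(max_iter):
        new = {u: 1.0 - p for u in users}
        for v, targets in followees.items():
            total = sum(weights[k] for k in targets)
            if total == 0:
                continue                    # nothing to split if all Followees weigh 0
            for u in targets:
                new[u] += p * (weights[u] / total) * ur[v]   # p * A(v, u) * UR(v)
        delta = max(abs(new[u] - ur[u]) for u in users)
        ur = new
        if delta < tol:
            break
    return ur

# Hypothetical example: A follows B and C; B is three times as active as C.
result = ur_rank({"A": ["B", "C"]}, {"A": 1.0, "B": 3.0, "C": 1.0})
print(sorted(result.items()))  # [('A', 0.5), ('B', 0.6875), ('C', 0.5625)]
```

When all weights are equal, A(v,u) reduces to 1/N_v and the iteration coincides with the damped PageRank of Equation (2); unequal weights are what make the transmission nonuniform.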

Weight of User Behavior Influence
As the description of microblog user behavior shows, behavior is a major factor in a microblog user's influence; examples include the frequency of updating microblogs and interaction with other users. We therefore need to consider both a user's active degree and his or her enthusiasm for interaction: the higher a user's active degree and interactive positiveness, the greater the influence he or she generates. To facilitate the subsequent analysis, we define a model of the weight of user behavior influence that describes microblog users' active degree and interactive positiveness. The model is shown in Figure 2.
Let W denote the weight of a user's influence. Then

$$W_i = a X_i + b Y_i \qquad (5)$$

where X_i is user i's active degree; Y_i is user i's interactive positiveness; and a and b are weighting coefficients. A user's active degree is defined as the frequency of updating microblogs within a unified time scale, and interactive positiveness as the frequency of the user's mentioning, commenting and forwarding within the same time scale. That is,

$$X_i = \frac{Q_i}{T} \qquad (6)$$

$$Y_i = \frac{c A_i + d C_i + e R_i}{T} \qquad (7)$$

In these equations, T is the unified time scale; to characterize users' active degree and interactive positiveness objectively, the same T is used for all users. Q_i is the number of microblogs user i has posted, A_i is the number of "@" mentions, C_i is the number of comments, and R_i is the number of forwards; c, d and e are weighting coefficients. Because different behaviors affect a user's weight differently, the weighting coefficients can take different values. After calculating a user's active degree and interactive positiveness, we obtain the weight of the user's behavior.
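The weight model can be written as a one-line function. This sketch assumes W = a·X + b·Y with X = Q/T and Y = (c·A + d·C + e·R)/T; as defaults it uses the coefficient values the paper sets in its example (a = 0.4, b = 0.6, c = 0.5, d = 0.3, e = 0.2, T = 100). The function name and the sample counts are hypothetical.

```python
def behavior_weight(Q, A, C, R, T=100, a=0.4, b=0.6, c=0.5, d=0.3, e=0.2):
    """Behavior weight W = a*X + b*Y for one user over the time scale T.

    Q: microblogs posted, A: "@" mentions, C: comments, R: forwards.
    """
    X = Q / T                          # active degree: posting frequency
    Y = (c * A + d * C + e * R) / T    # interactive positiveness
    return a * X + b * Y

# Hypothetical user: 50 posts, 10 mentions, 20 comments, 30 forwards within T = 100.
print(round(behavior_weight(50, 10, 20, 30), 3))  # 0.302
```

Because every count is divided by the same T, users observed over the same window are compared on equal terms, which is the reason the paper fixes a unified time scale.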

Example of U-R Model
The microblog data set [12] used in this paper is the Tencent Weibo data set released for KDD Cup 2012, which we preprocessed. To obtain accurate analytical results, abnormal records, such as users with fewer than 10 tweets, were removed. The final effective sample size is 809,732. Table 1 shows the data structure of the sample.
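The filtering step can be sketched as follows. The two-column, tab-separated layout (user ID, tweet count) is a hypothetical simplification, not the documented KDD Cup 2012 file format, which should be checked before adapting this code; only the "fewer than 10 tweets" threshold comes from the text.

```python
import csv
import io

def load_active_users(lines, min_tweets=10):
    """Keep users whose tweet count is at least min_tweets.

    lines: iterable of tab-separated text lines "user_id<TAB>tweet_count"
    (hypothetical layout). Malformed rows are skipped.
    """
    kept = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue                    # skip malformed or empty rows
        user_id, tweet_count = row[0], int(row[1])
        if tweet_count >= min_tweets:   # drop users with fewer than 10 tweets
            kept[user_id] = tweet_count
    return kept

sample = io.StringIO("100\t3\n101\t25\n102\t10\n")
kept = load_active_users(sample)
print(kept)  # {'101': 25, '102': 10}
```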
Because the microblog network has a very large user base, this paper selects only 10 nodes from the data set as a sub-network of the entire microblog social network. We use these nodes to carry out the U-R model calculations and to explore information transmission and the nodes' influence. The follow relationships among the nodes are shown below as an adjacency matrix.
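The adjacency matrix of the 10-node sub-network can be built from follow lists as sketched below. The convention M[i][j] = 1 when node i+1 follows node j+1 is an assumption; only node 4's Followees (1, 2, 3, 5, 6 and 9, as given in the example later in this section) come from the text, and the remaining rows are left empty because the full matrix is not reproduced here.

```python
def adjacency_matrix(n, follows):
    """Build an n x n 0/1 matrix with M[i][j] = 1 if node i+1 follows node j+1."""
    M = [[0] * n for _ in range(n)]
    for follower, followees in follows.items():
        for followee in followees:
            M[follower - 1][followee - 1] = 1   # nodes are numbered from 1
    return M

follows = {4: [1, 2, 3, 5, 6, 9]}   # node 4's Followees; other rows omitted
M = adjacency_matrix(10, follows)
out_degree = [sum(row) for row in M]                               # row sums: Followees
in_degree = [sum(M[i][j] for i in range(10)) for j in range(10)]   # column sums: Followers
print(out_degree[3])  # 6
```

Under this convention, row sums give out-degrees (N_v in Equation (4)) and column sums give in-degrees (the sizes of the Follower sets M(u)).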
Based on the weight model above and an analysis of the sample's attributes, we set a = 0.4, b = 0.6, c = 0.5, d = 0.3, e = 0.2 and T = 100. The network attributes and user weights of the sample are presented in Table 2.
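Substituting these settings into the weight model gives a closed form for each user's weight, where Q_i, A_i, C_i and R_i are user i's counts of posted microblogs, "@" mentions, comments and forwards within T:

$$W_i = 0.4\,\frac{Q_i}{100} + 0.6\,\frac{0.5A_i + 0.3C_i + 0.2R_i}{100} = 0.004\,Q_i + 0.003\,A_i + 0.0018\,C_i + 0.0012\,R_i$$

Under these settings, one "@" mention contributes three quarters as much to W_i as one posted microblog, and a forward contributes less than a comment.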
Based on the relationships among the nodes and Equation (3), the iterative equation is

$$UR^{(k+1)}(u) = (1 - p) + p \sum_{v \in M(u)} A(v,u)\, UR^{(k)}(v)$$

Next, the weight proportion each node assigns to each of its Followees is calculated according to Equation (4). For instance, node 4 follows nodes 1, 2, 3, 5, 6 and 9, so the weight proportion node 4 assigns to node 1 is

$$A(4,1) = \frac{W_1}{W_1 + W_2 + W_3 + W_5 + W_6 + W_9}$$

Each node's initial value is 1. Next, each node's value …

The calculations for the 10 selected nodes show no positive correlation between a user's number of Followers and his or her influence. For example, although node 8 has more Followers (a higher in-degree) than node 2, node 8 posts fewer microblogs and interacts less, so its influence is smaller. In this way, the U-R model addresses some shortcomings of the model proposed by GABRIEL W [6] and of PageRank, which relies solely on network relationships. By combining the user weight model with PageRank, the model evaluates users' influence and better reflects objective reality.

Conclusion
The microblog is currently the most popular online social network. Because it has not only the characteristics of a social network but also clear characteristics of a medium, it is also called "social media". The U-R algorithm proposed in this paper is simple and clear, reflects the influence of microblog users realistically, and can help with applications such as marketing and public opinion control. However, the damping coefficient p in the U-R algorithm and the weighting coefficients a, b, c, d and e in the weight model are set by assumption and must be chosen according to the actual situation. In addition, the U-R model does not reflect the quality of microblog content, although in a microblog network high-quality content spreads easily in a viral way and tends to influence other users. These two problems remain to be studied further.

Acknowledgements
This paper is supported by the Fundamental Research Funds for the Central Universities under grant No. 72115096.