Facebook Dynamics: Modelling and Statistical Testing

In this work we study the virtual social network known as Facebook. It is used by millions of people worldwide and combines virtual elements with real-world components. We suggest a probabilistic model to describe the long-term behaviour of Facebook. This model includes the different ways in which a friendship connection between profiles can be established, directly or by suggestion. Due to the web's high level of interactivity, we simplify the model by assuming Markovian dynamics. Once the model is established, we propose the communication concept of Complete Transversality (CT). CT describes an interaction between people that reflects profile behaviour and leads to estimators that measure this interaction. We then introduce a weaker version of CT named Segmented Transversality (ST). Within this framework we develop estimators that allow hypothesis testing of CT and ST. Finally, in the ST context we propose performance measures to assess the quality of an a priori segmentation.


Introduction
Social networks have emerged as a communication tool of unexpected impact.
Frequent contact between people through these networks gives rise to virtual relationships that develop according to their interests. The approach of this study follows [1], which places emphasis not only on individual behaviour but also on social interaction within the network.
One of today's challenges is to develop accurate tools to identify influential users by sector and market, and to understand the dynamics of information flow. In turn, it is difficult to fully observe a social network; therefore statistical problems and probabilistic modelling are important issues (see [2], [3] and [4] for surveys on this subject). Besides this topic, social networks also provide examples of unidentifiable or censored data models, which makes them particularly interesting.
These concerns lead us to study the phenomenon of virtual social networks, and we restrict our analysis to Facebook. For this we develop a model which, although it does not describe the network in full detail, is useful as an approach to the study of its dynamics. Naturally, this model leads to graph theory, which relates it to the work in [5] and [6].
We use mathematical and statistical tools from the field of stochastic processes [7] [8] [9] to answer questions such as whether transversal communication exists, or whether a proposed network segmentation is optimal with respect to a certain communication behaviour between segments. This paper is structured as follows. In Section 2 we address the modelling of Facebook using Markov chains, introduce the concept of complete transversality in communication and, in this context, try to find the distribution of the random functions involved in the model. Section 3 proposes two statistical hypothesis tests, one to test CT in the network and the other to test CT between network segments, using U-statistics theory and the asymptotic convergence theorems presented in [10]. We end that section by defining a performance index that measures the quality of a segmentation. Section 4 is devoted to conclusions and acknowledgments.
Section 5 is an appendix with proofs of the results obtained in the preceding sections.
Further and related work on this subject can be found in the PhD thesis of one of the authors [11].

Model Description
In this section we propose a model to describe Facebook dynamics.
Consider $t \in \mathbb{N}$. Let $\mathcal{P}_t$ denote the set of all internet users at instant $t$ and $\mathcal{F}_t$ the set of Facebook profiles at time $t$. Of course every profile belongs to an internet user, and we also suppose that once a profile is created it cannot be eliminated. (Actually a profile can be eliminated, but this is tedious and difficult to do, so we disregard this behaviour.) Then $\mathcal{F}_t \subseteq \mathcal{F}_{t+1}$. We also denote by $\mathcal{P}_\infty$ and $\mathcal{F}_\infty$ the sets $\mathcal{P}_\infty = \bigcup_t \mathcal{P}_t$ and $\mathcal{F}_\infty = \bigcup_t \mathcal{F}_t$. For each instant $t$, we model the network with a random graph in which the nodes represent profiles and the edges represent friendship links.
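Definition 1. Let $f, g \in \mathcal{F}_t$ be two profiles. We define the random "friendship" function at time $t$ as the function $\alpha_t$ given by $\alpha_t(f, g) = 1$ if $f$ and $g$ are friends at time $t$, and $\alpha_t(f, g) = 0$ otherwise.
Remark. The "friendship" relationship is symmetric, so the function $\alpha_t$ is also symmetric. Then the random graph determined by $\alpha_t$ is bi-directional and we can define the adjacency matrix as follows.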
Since the adjacency matrix is symmetric, the state of the network is determined by the set of its lower subdiagonal entries. For the calculation of (1), we describe through events the profile actions that have an impact on the transition. These events are linked to certain indices that we construct to measure affinities between profiles.
We will use the following notation. Let $A$ be any set. We consider the set of pairs of distinct elements of $A$, $\{(a, a') \in A \times A : a \neq a'\}$, and the set of triples of pairwise distinct elements of $A$, $\{(a, a', a'') \in A \times A \times A : a \neq a',\ a \neq a'',\ a' \neq a''\}$.
Definition 3. Let $p, p' \in \mathcal{P}_\infty$ be users, with $p \neq p'$. We define the "image index" of $p$ over $p'$ as the function $X$.
Remark. We suppose that the network is at steady state. This implies that the image between profiles has also evolved to a steady state, so the function $X$ does not depend on time $t$. Besides, the function $X$ is not symmetric, not observable and monotonic.
Definition 4. Let $f, g \in \mathcal{F}_\infty$ be profiles, with $f \neq g$. We define the "image index" of $f$ over $g$ as a function on pairs of distinct profiles.
For ordered triples of users and ordered triples of profiles we define the following indices.
Definition 5. Let $p, p', p'' \in \mathcal{P}_\infty$ be users. For the ordered triple of users $(p, p', p'')$ we define an "image index" as the function $U$ that assigns to each triple a real number representing the acceptance level for the action "$p$ suggests to $p'$ to be a friend of $p''$".
Remark. As with $X$, $U$ does not depend on $t$, is not symmetric and is not observable.
Definition 6. Let $f, g, h \in \mathcal{F}_\infty$ be pairwise distinct profiles. For the ordered triple of profiles $(f, g, h)$ we define the "image index" as a function on triples of pairwise distinct profiles.
Then, given $f, g, h \in \mathcal{F}_\infty$, we consider the following events:
i) "the friendship between f and g breaks down" (the breakdown of a friendship may be due to f deciding to eliminate g, or vice versa).
ii) "f successfully requests friendship from g" (a successful friendship request arises when f asks g for friendship and g accepts f as a friend).
iii) "f successfully suggests h to g" (a successful friendship suggestion occurs when f suggests h to g, g requests h's friendship and h accepts).
Then, if we denote by $I_t(f)$ the interventions of $f$, regardless of their effects, in the transition from one instant to the next, these can be described as a disjoint union of events, and the one-step transition probabilities between states can be expressed in the following formula.
Proposition 1. The one-step transition probability between two states of the Markov chain is given by:

Complete Transversality
Complete Transversality (CT) in a social network is associated with a certain communication behaviour: the probability of a relationship between two profiles is always the same, no matter who the profiles are. CT arises from theories in the social sciences [12] which posit that the massification of social networks would bring horizontality in communication and transversality in the connections between people, overcoming social, economic, ethnographic and other differences. Virtual social networks would thus bring balance and democracy to the connections between all people.
In terms of the model, this could be reflected in the "image index" $X$ of one user $p$ over another user $p'$ and in the "image index" $U$ of the ordered triple of pairwise distinct users $(p, p', p'')$. Under CT, the averages of the regression functions in (2) and (3) take the same value for all users: say they are equal to $C_1 \in \mathbb{R}$ for all pairs of distinct users and to $C_2 \in \mathbb{R}$ for all ordered triples of pairwise distinct users. Under these conditions, the "image indices" given by (2) and (3) are reduced to a simpler form. When the network follows this behaviour, the following results can be established.
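Under CT, in the long term the friendship indicator of any pair of distinct profiles is a random variable with Bernoulli distribution with parameter $p$, $0 < p < 1$, identically distributed and independent of the friendship indicator of any other pair of distinct profiles.
From the results stated in the former Theorem we can conclude that, for a network state with lower subdiagonal entries $a_{ij} \in \{0, 1\}$, the ergodic distribution under CT is $\pi(a) = \prod_{i > j} p^{a_{ij}} (1 - p)^{1 - a_{ij}}$.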
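The long-term behaviour under CT can be illustrated with a small simulation. The sketch below assumes a fixed, finite number of profiles and treats every friendship indicator as an independent Bernoulli draw with parameter p; the function names `sample_ct_graph` and `ct_probability` and their parameters are hypothetical and are not part of the original model description.

```python
import random


def sample_ct_graph(n_profiles, p, seed=None):
    """Draw a symmetric 0/1 adjacency matrix with i.i.d. Bernoulli(p) links.

    Hypothetical helper: n_profiles, p and seed are illustrative parameters.
    """
    rng = random.Random(seed)
    adj = [[0] * n_profiles for _ in range(n_profiles)]
    for i in range(n_profiles):
        for j in range(i):  # lower subdiagonal entries only
            link = 1 if rng.random() < p else 0
            adj[i][j] = adj[j][i] = link  # friendship is symmetric
    return adj


def ct_probability(adj, p):
    """Probability of one given graph when all links are i.i.d. Bernoulli(p)."""
    n = len(adj)
    links = sum(adj[i][j] for i in range(n) for j in range(i))
    pairs = n * (n - 1) // 2
    return p ** links * (1 - p) ** (pairs - links)


graph = sample_ct_graph(5, 0.3, seed=0)
print(ct_probability(graph, 0.3))
```

Only the lower subdiagonal entries are drawn, because the symmetry of the friendship relation determines the rest of the matrix.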

Test and Estimation
We aim to discuss the validity of CT in the social network Facebook. For this, we propose two statistics based on samples of $N$ profiles and study their asymptotic distribution under CT as $N$ increases. We also present the corresponding CT tests. Further, in this section we introduce a weaker version of CT called Segmented Transversality (ST), related to a given segmentation, which allows us to construct a segmentation quality index.

Average Communication between Profiles
Let $E_N$ be the statistic that averages, over a sample of $N$ profiles, the proportion of friends that each profile has within the sample; it therefore measures the "average communication of the sample".
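As an illustration of a statistic of this kind, the following sketch computes an average within-sample proportion of friends from a 0/1 adjacency matrix. The normalisation by N - 1 (the number of other profiles in the sample) and the name `average_communication` are assumptions made for the example; the authors' exact expression may differ.

```python
def average_communication(adj):
    """Mean, over sampled profiles, of the share of the other sampled
    profiles that are friends (assumed normalisation: N - 1)."""
    n = len(adj)
    if n < 2:
        raise ValueError("at least two profiles are needed")
    proportions = []
    for i in range(n):
        friends_in_sample = sum(adj[i][j] for j in range(n) if j != i)
        proportions.append(friends_in_sample / (n - 1))
    return sum(proportions) / n


# Hypothetical sample of three profiles in which profile 1 is linked to 0 and 2.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
print(average_communication(path))
```

With this normalisation the statistic equals the fraction of linked pairs in the sample, which makes it a natural estimate of the parameter p under CT.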
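Focusing on long-term dynamics, we have seen that under CT the random friendship indicators of the sampled pairs are independent Bernoulli variables with parameter $p$. We want to find the asymptotic distribution of $E_N$, so we study the asymptotic behaviour of the centred statistic when the sample size $N$ grows towards infinity.
Indexed by the sample size $N$, the centred summands of this statistic form a triangular array.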
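Theorem 3. Suppose that the triangular array $\{Y_{N,i}\}$ satisfies: i) for each fixed $N$, the variables $Y_{N,i}$ are independent; ii) $E(Y_{N,i}) = 0$; iii) $\mathrm{Var}(Y_{N,i}) = \sigma_{N,i}^2 < \infty$, for all $N$ and for all $i$; and iv) the Lyapunov condition is met, that is, for some $\delta > 0$, $\lim_{N \to \infty} \frac{1}{s_N^{2+\delta}} \sum_i E|Y_{N,i}|^{2+\delta} = 0$, where $s_N^2 = \sum_i \sigma_{N,i}^2$. Then $\frac{1}{s_N} \sum_i Y_{N,i}$ converges in distribution to $N(0, 1)$.
Proof 1. Hypothesis i) is trivial because, for $i \neq i'$ and $N$ fixed, the corresponding variables are independent under CT. Hypotheses ii) and iii) are also met.
Let $\delta = 2$. We will see that iv) holds. For this, we calculate the moments involved, for all $N$ and for all $i$, and the Lyapunov condition given by (7) can be verified for $\delta = 2$. Consequently hypotheses i)-iv) hold and the conclusion of the theorem applies. Returning to the expression of the centred statistic and to the expression of $s_N$ given by (9), we obtain the asymptotic distribution of $E_N$ under CT.
As $E_N$ is an estimate of the communication between profiles, if we select different profile samples under CT we should not detect differences among the estimates of $p$.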
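To study this, we propose the following hypothesis test to compare communication averages: we take two independent samples of profiles and compare the values of $E_N$ computed on each of them.
Under the CT assumption, the resulting test statistic is asymptotically normal and, given a significance level α, we obtain the corresponding critical region.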
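A comparison of this kind can be sketched with a textbook pooled two-proportion z-test. This is an assumption made for illustration, not necessarily the statistic used by the authors: it treats each sample value as the fraction of linked pairs among the N(N - 1)/2 pairs of the sample, which under CT are independent Bernoulli variables with a common parameter. The function name `compare_communication` and the input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def compare_communication(e1, n1, e2, n2, alpha=0.05):
    """Pooled two-sample z-test for equal link probability.

    e1, e2: fraction of linked pairs in each sample (assumed definition).
    n1, n2: number of profiles in each sample.
    Returns the z value and whether it falls in the two-sided critical region.
    """
    pairs1 = n1 * (n1 - 1) / 2
    pairs2 = n2 * (n2 - 1) / 2
    pooled = (e1 * pairs1 + e2 * pairs2) / (pairs1 + pairs2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / pairs1 + 1 / pairs2))
    if std_error == 0:
        raise ValueError("degenerate samples: every pair is linked or none is")
    z = (e1 - e2) / std_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, abs(z) > critical


# Hypothetical values for two disjoint samples of 75 profiles each.
z_value, reject = compare_communication(0.30, 75, 0.22, 75)
print(z_value, reject)
```

The two-sided region reflects that, under CT, neither sample is expected to show more communication than the other.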

Real Data Testing
We performed such an experiment on the network of friends of a real profile whose owner gave the authors permission for sampling. For confidentiality reasons we cannot release any of the data used in the calculations. We can state that we took two independent and disjoint samples of size $N = 75$. The resulting value of the statistic leads us to reject the CT hypothesis. This result indicates that the social network Facebook is a platform in which communication between people or groups of people is not transversal.

Mean Square Deviation of the Communication between Profiles
Let $T_N$ be a statistic that estimates the mean square deviation of the communication between profiles with respect to its mean.
In order to find the asymptotic distribution of $T_N$ we use properties of U-statistics introduced in [13].
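Suppose that $X_1, \dots, X_N$ are independent and identically distributed random variables and that $h: \mathbb{R}^r \to \mathbb{R}$ is a function that is symmetric with respect to permutations of its arguments.
Definition 7. A U-statistic of order $r$ with kernel $h$ is defined as $U_N = \binom{N}{r}^{-1} \sum_{1 \le i_1 < \dots < i_r \le N} h(X_{i_1}, \dots, X_{i_r})$.
We state the following theorem, whose proof can be seen in [10].
Theorem 4. Let $U_N$ be a U-statistic of order $r$ with kernel $h$. Under the moment conditions stated in [10], $U_N$, suitably centred and scaled, converges in distribution when $N \to \infty$.
Then we have that $T_N$ is a U-statistic of order 2, to which the limit expression in (16) applies when $N \to \infty$.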
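The definition of a U-statistic can be made concrete with a short sketch. The kernel used below, h(x, y) = (x - y)^2 / 2, is the standard textbook example whose order-2 U-statistic is the unbiased sample variance; it is chosen here only for illustration and is not claimed to be the kernel of $T_N$. The helper names are hypothetical.

```python
from itertools import combinations
from statistics import variance


def u_statistic(sample, kernel, order=2):
    """Average of a symmetric kernel over all subsets of `order` observations."""
    if len(sample) < order:
        raise ValueError("sample is smaller than the order of the statistic")
    values = [kernel(*subset) for subset in combinations(sample, order)]
    return sum(values) / len(values)


def half_squared_difference(x, y):
    """Symmetric kernel whose U-statistic is the unbiased sample variance."""
    return (x - y) ** 2 / 2


data = [1, 2, 3, 4]
# Both lines print the same quantity, computed in two different ways.
print(u_statistic(data, half_squared_difference))
print(variance(data))
```

Averaging over unordered subsets is enough because the kernel is symmetric in its arguments.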
We can test CT by comparing the mean square deviations of two independent populations of profiles. For this we take two independent samples; then, by (17), we obtain the asymptotic distribution, under CT, of the statistic that compares the mean square deviations. As in the test comparing the average proportion of communication between profiles, we use the same real profile and, on its network of friends, we take two independent and disjoint samples and calculate the statistic and the critical region, concluding that the CT hypothesis is rejected again.

Segmented Transversality
A context of CT in a social network like Facebook is far from reality, as evidenced by the findings of the two tests that we made. It is reasonable to think that profiles tend to cluster in different segments according to social criteria such as political ideologies, economic interests, musical tastes, ages, etc., and that these segments are also related to each other.
We introduce the concept of Segmented Transversality (ST), that is, CT between segments. Then, making an a priori segmentation of the network, we introduce a statistic representing the communication between pairs of segments and we test CT between the profiles of the segments.
Let $S_1, \dots, S_k$ be a partition of the set of profiles $\mathcal{F}_\infty$ into segments. We make a stratified random sampling by segments as follows: we choose profiles randomly inside $S_1$, then randomly inside $S_2$, and so on until $S_k$.
Remark. Under CT between the segments $S_r$ and $S_t$, the friendship random functions $\alpha(f_i, f_j)$ have Bernoulli distribution with parameter $q_{rt}$, for all $f_i \in S_r$ and for all $f_j \in S_t$.
That is, CT between segments means a distinctive homogeneous behaviour in the communication between them.
So, consider the average proportion of friends that the sampled profiles of $S_r$ have in the segment $S_t$.
Then, under CT between $S_r$ and $S_t$, the expected value of this proportion is $q_{rt}$. In the same way as we obtained the asymptotic distribution of the centred estimator of expression (6), we can obtain the asymptotic distribution of this estimator. If we want to test CT between a pair of segments $S_r$ and $S_t$, we make a second stratified random sampling, independent of the previous one, in which profiles are again chosen randomly inside $S_1$, inside $S_2$, and so on until $S_k$, and we construct the statistic that compares the two estimates.
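The average proportion of friends between two segments can be sketched as follows. The example assumes that the sampled profiles are identified by row indices of a 0/1 adjacency matrix and that the two segments are disjoint; the function name `segment_communication` and the toy data are hypothetical.

```python
def segment_communication(adj, segment_r, segment_t):
    """Share of linked pairs (f_i, f_j) with f_i in segment_r and f_j in segment_t.

    segment_r and segment_t are disjoint lists of row indices of adj.
    """
    if not segment_r or not segment_t:
        raise ValueError("both segments must contain at least one profile")
    links = sum(adj[i][j] for i in segment_r for j in segment_t)
    return links / (len(segment_r) * len(segment_t))


# Hypothetical sample: profiles 0 and 1 form one segment, 2 and 3 another.
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
print(segment_communication(adj, [0, 1], [2, 3]))
```

Under CT between the two segments, this proportion is an average of Bernoulli variables with a common parameter, so it can be read as an estimate of that parameter.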

Quality on Segmentation
If we divide the network into $k$ disjoint segments, we can take all possible pairs of those $k$ segments, $C^k_2 = \binom{k}{2}$, and make a total of $\binom{k}{2}$ hypothesis tests of CT between segments, whose results we collect in a segment adjacency matrix.
A one in this matrix means that segment $S_i$ and segment $S_j$ behave distinctly in the sense of transversality of communication. Zeros in the upper triangle of the matrix carry no information. Each segment compared with itself is homogeneous (zero on the diagonal), except segment 6. This could mean that further segmentation within it might improve audience segmentation. Nevertheless, segment 6 is distinguished from the other segments anyway.
If we sum the ones under the diagonal and divide by the number of segment combinations (see (23)), we obtain the performance of this segmentation, which is 40%. The more zeros, the lower the performance index.
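The computation of the performance index can be sketched as follows. The 0/1 matrix below is hypothetical: it was invented for this illustration so that it agrees with the features described in the text (six segments, no difference between $S_1$ and $S_2$, a one on the diagonal for segment 6, segment 6 different from all the others, and a performance of 40%). The function names are also assumptions.

```python
def segmentation_performance(matrix):
    """Ones strictly below the diagonal divided by the number of segment pairs."""
    k = len(matrix)
    ones_below = sum(matrix[i][j] for i in range(k) for j in range(i))
    return ones_below / (k * (k - 1) // 2)


def inhomogeneous_segments(matrix):
    """Segments (numbered from 1) whose diagonal entry is one."""
    return [i + 1 for i in range(len(matrix)) if matrix[i][i] == 1]


def mergeable_pairs(matrix):
    """Pairs of segments (numbered from 1) with a zero below the diagonal."""
    k = len(matrix)
    return [(j + 1, i + 1) for i in range(k) for j in range(i) if matrix[i][j] == 0]


# Hypothetical lower-triangular result matrix for six segments.
results = [[0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0],
           [0, 0, 1, 0, 0, 0],
           [1, 1, 1, 1, 1, 1]]
print(segmentation_performance(results))
print(inhomogeneous_segments(results))
print(mergeable_pairs(results)[:1])
```

Pairs reported by `mergeable_pairs` are candidates for merging, while segments reported by `inhomogeneous_segments` are candidates for further sub-segmentation.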
Following this framework we can improve this segmentation by: 1) merging homogeneous segments; 2) exploring intra-segmentation in the cases where there is a one on the diagonal.
For actions in the first group, we see that in this toy example $S_1$ and $S_2$ do not differ in communication behaviour, so we could consider grouping them into one segment. We could summarise this by saying that gender does not segment young people (under 20 years). We then recalculate the matrix with this merged segment and recalculate the performance. Occam's razor leads us to select the least segmented partition when two or more partitions have the same performance.
Segment 6, that is, females over thirty years, turns out to be inhomogeneous with itself, so we could try a sub-segmentation by education level or by motherhood. We would then repeat the five tests between each new segment and the previous ones and calculate the performance of the new matrix, which has one or more additional rows.
Of course this iterative method implies significant work on estimation, data collection, amount of data, independent sampling, etc. Although this work might be extremely relevant, it is beyond the scope of pure mathematics and poses a great scientific challenge for interdisciplinary work.

Conclusions
In this work we analyze the Facebook social network, modelling it with a Markov chain and several random variables that represent friendships between profiles. We further propose a communication behaviour between all profiles, called complete transversality, which assumes no bias in the willingness of profiles to connect as friends. This CT behaviour leads to estimators that allow us to carry out hypothesis tests, by means of the communication average and of the mean square deviation, that reject the CT assumption. This might be an obvious conclusion (because people behave within Facebook as they behave in a real context), but it is supported by all the hypothesis testing machinery, which gives it rigour.
The next step in our work was to weaken CT and, for this, we introduced ST (Segmented Transversality). Here a network segmentation is given, and each profile of one segment connects to any profile of another segment with the same probability. (Of course this probability can change with different pairs of segments.) In this ST scenario we were able to compare two entire segments and to propose a performance index that measures the quality of an a priori segmentation.

J. Bavio et al. DOI: 10.4236/apm.2018.84021, Advances in Pure Mathematics.

Here $p$ is the friendship probability of any pair of profiles in the long term and, because of the large size of the network, $p$ is probably less than 0.5. Thus, we keep the segmentation and repeat the stratified random sampling by segments $m$ times, with $m$ sufficiently large, and calculate the segmentation quality index $m$ times. We can then observe the histogram representing the distribution of the quality of the segmentation. If most of the time this measure is, for example, greater than the mean of the observations, we continue segmenting according to the criteria that have been used; otherwise it is desirable to modify the segmentation criteria. Let us illustrate this with an example. Suppose we segment people by age (between 15 and 20, between 20 and 30, over 30) and gender (M, F), which gives six segments. Suppose we conduct the 15 hypothesis tests for segmented transversality and obtain a segment adjacency matrix.