Mathematical Model and Algorithm for Link Community Detection in Bipartite Networks

Over the past ten years, community detection in complex networks has attracted increasing attention from researchers. Communities often correspond to functional subunits of complex systems. In a complex network, a node community can be defined as a subgraph induced by a set of nodes, while a link community is a subgraph induced by a set of links. Most research has focused on identifying node communities in both unipartite and bipartite networks, and some researchers have investigated the link community detection problem in unipartite networks, but link community detection in bipartite networks has received little attention. In this paper, we investigate the link community detection problem in bipartite networks and formulate it as an integer programming model. We propose a genetic algorithm for partitioning a bipartite network into overlapping link communities. Simulations are carried out on both artificial and real-world networks. The results show that the genetic algorithm can efficiently partition a bipartite network into overlapping link communities.


Introduction
Many interesting systems can be represented as networks [1]-[4]. A network is composed of nodes and links: each node represents a unit and each link represents a relation between two nodes. Since some nodes or links may have the same function in a complex system, one of the most important topics in the area of networks is community detection, which is a universal problem in many disciplines such as sociology, computer science and biology [5]-[7].
Communities are dense subgraphs induced by a set of nodes or links. If a community is induced by a set of nodes, we call it a node community; if it is induced by a set of links, we call it a link community. When we partition a network into node communities, each node must belong to one or more communities, and some links might belong to no community. When a network is partitioned into link communities, each link must belong to one community, and each node might belong to one or more communities. By partitioning the network into link communities, we can find overlapping nodes or links.
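As a small illustration of how a link partition yields overlapping nodes, the sketch below collects, for each node, the communities of its incident links. The function names and the dictionary representation of a link partition are assumptions made for this example only.

```python
from collections import defaultdict


def node_memberships(link_community):
    """Map each node to the set of link communities that touch it.

    link_community: dict mapping a link (u, v) to its community label
    (a hypothetical representation chosen for this sketch).
    """
    membership = defaultdict(set)
    for (u, v), c in link_community.items():
        membership[u].add(c)
        membership[v].add(c)
    return dict(membership)


def overlapping_nodes(link_community):
    """Return the nodes whose incident links fall into more than one community."""
    memberships = node_memberships(link_community)
    return {node for node, labels in memberships.items() if len(labels) > 1}


# Toy link partition: link a-b is in community 0, links b-c and c-d in community 1.
partition = {("a", "b"): 0, ("b", "c"): 1, ("c", "d"): 1}
print(overlapping_nodes(partition))
```

Each link carries exactly one label, yet node b inherits two labels from its two incident links, which is the sense in which a link partition produces overlapping node communities.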
Although most research has focused on node community detection, some researchers have investigated link communities and cliques [8]-[12]. In some real-world networks, a link is more likely to have a unique identity while a node often has multiple functions, so link communities might be more intuitive and informative than node communities [13] [14].
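Given a unipartite network with $M$ links and $N$ nodes, let $P = \{P_1, \ldots, P_C\}$ be a partition of the links into $C$ subsets, let $m_c = |P_c|$ be the number of links in $P_c$, and let $n_c$ be the number of nodes in the subgraph induced by $P_c$. Ahn et al. [8] defined the partition density $D$ as follows:

$$D = \frac{2}{M} \sum_{c} m_c \, \frac{m_c - (n_c - 1)}{(n_c - 2)(n_c - 1)}$$

In [12], the authors proposed another partition density $H$. Given the number of communities, we can partition the unipartite network into link communities by maximizing $D$ or $H$.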
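To make the partition density of Ahn et al. [8] concrete, the following sketch evaluates D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)) for a given link partition. The function name and the list-of-link-lists input format are assumptions of this example; communities with only two nodes are given zero contribution, as is customary for this measure.

```python
def partition_density(communities):
    """Partition density D of a link partition of a unipartite network.

    communities: list of communities, each a list of links (u, v).
    """
    M = sum(len(links) for links in communities)  # total number of links
    total = 0.0
    for links in communities:
        m_c = len(links)
        n_c = len({node for link in links for node in link})
        if n_c > 2:  # a community with two nodes contributes zero
            total += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2.0 * total / M


triangle = [(1, 2), (2, 3), (1, 3)]
path = [(1, 2), (2, 3)]
print(partition_density([triangle]), partition_density([path]))
```

A triangle is a clique and reaches the maximum value, while a path is a tree and scores zero, which matches the intended range of the measure.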
Besides unipartite networks, there is another special category of network in which the nodes are partitioned into two disjoint subsets and there is no link within the same subset. This type of network is called a bipartite network. Some real-world relations are more suitably represented as bipartite networks [15], such as plant-animal networks, scientific publication networks, artistic collaboration networks, order-item networks, paper-author networks, event-attendee networks and so on.
Some research has addressed the node community detection problem in bipartite networks [15] [16]. In [15], the authors proposed a projection-based algorithm for node community detection in bipartite networks. In [17], the authors developed a modified adaptive genetic algorithm (MAGA) to detect node communities in bipartite networks. In [18], the authors proposed another bipartite modularity detection method, which can detect overlapping node communities. In [19], the authors proposed a hierarchical divisive heuristic for approximate modularity maximization in bipartite graphs. In [20], the authors proposed an algorithm, Bitector, to mine overlapping communities in large-scale sparse bipartite networks. In [21], the authors proposed an approach for detecting overlapping node communities in a bipartite network based on dual optimization of modularity. In [22] [23], the authors proposed a weighted binary matrix factorization framework to detect overlapping communities in bipartite networks. Although the algorithms above can find node communities in bipartite networks, current research has paid no attention to the link community detection problem in bipartite networks.
In this paper, we investigate link communities in bipartite networks, define the partition density of link communities in bipartite networks, and formulate the link community partition problem of a bipartite network as an integer programming model. We then design a genetic algorithm for detecting link communities in bipartite networks and validate it on some artificial and real-world bipartite networks. With the model and algorithm, communities that include nodes from both node sets of a bipartite network can be identified simultaneously.

Link Community Partition Density of Bipartite Networks
We can see that the maximum value of $H$ is 1 and the minimum value of $H$ is 0: $H = 1$ when each community is a complete bipartite network, and $H = 0$ when each community is an empty bipartite graph. Given the number of communities, we can find the optimal link community partition of a bipartite network by maximizing the value of $H$.
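As an illustration of a density with these boundary values, the sketch below assumes that each community contributes its bipartite link density m_c / (s_c t_c), weighted by its share of links m_c / M, where s_c and t_c are the numbers of nodes the community has in the two node sets. This particular form is an assumption chosen for illustration and may differ from the definition of H used in this paper; the function name is likewise hypothetical.

```python
def bipartite_partition_density(communities):
    """Illustrative link partition density for a bipartite network.

    communities: list of communities, each a list of distinct links (u, v)
    with u in node set U and v in node set V.
    ASSUMED form: (1/M) * sum_c m_c * m_c / (s_c * t_c).
    """
    M = sum(len(links) for links in communities)
    if M == 0:
        return 0.0
    total = 0.0
    for links in communities:
        m_c = len(links)
        if m_c == 0:
            continue  # an empty community contributes nothing
        s_c = len({u for u, _ in links})  # nodes of U in the community
        t_c = len({v for _, v in links})  # nodes of V in the community
        total += m_c * m_c / (s_c * t_c)
    return total / M


complete_2_3 = [(u, v) for u in "ab" for v in "xyz"]
print(bipartite_partition_density([complete_2_3]))
```

Under this assumed form, a partition into complete bipartite subnetworks scores 1, and the score decreases as communities become sparser.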

Integer Programming Model for Link Community Detection of Bipartite Network
Given a bipartite network $G = (U, V, E)$, we assume that the number of link communities is $K$ and find the optimal link community partition by maximizing the partition density $H$. This problem can be formulated as an integer programming model.
Let $A = (a_{ij})_{p \times q}$ be the adjacency matrix of the bipartite network, where $a_{ij} = 1$ if there is a link between nodes $u_i$ and $v_j$, and $a_{ij} = 0$ otherwise. We also define binary variables $x_{ijk}$, $y_{ik}$ and $z_{jk}$ to represent the membership of link $l_{ij}$, node $u_i$ and node $v_j$ in link community $k$: each variable equals 1 if the corresponding link or node belongs to community $k$, and 0 otherwise. The link community detection problem of the bipartite network can then be formulated as an integer programming model, Model 1.
The objective function (1) maximizes the link partition density $H$. Constraint (2) means that every link belongs to one community; if there is no link between nodes $u_i$ and $v_j$, then $x_{ijk} = 0$ for any community $k$. Constraints (3) and (4) indicate that if link $l_{ij}$ belongs to community $k$, then its end nodes $u_i$ and $v_j$ must belong to the same community $k$. Constraints (5) and (6) mean that if a node $u_i$ (or $v_j$) belongs to community $k$, then at least one link incident to $u_i$ (or $v_j$) belongs to community $k$. Constraints (7), (8) and (9) indicate that the variables are binary.
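Read together, the constraint descriptions above suggest the following form for constraints (2) to (9). This is a reconstruction from the prose, offered as an assumption rather than as the exact statement of Model 1. Here $a_{ij}$ denotes the adjacency-matrix entry for nodes $u_i$ and $v_j$ (1 if they are linked, 0 otherwise), the indices range over $i = 1, \ldots, p$, $j = 1, \ldots, q$ and $k = 1, \ldots, K$, and the objective (1) is simply written as $\max H$.

```latex
\begin{align}
\max\ & H \tag{1}\\
\text{s.t.}\quad
\sum_{k=1}^{K} x_{ijk} &= a_{ij}, && \forall\, i, j \tag{2}\\
x_{ijk} &\le y_{ik}, && \forall\, i, j, k \tag{3}\\
x_{ijk} &\le z_{jk}, && \forall\, i, j, k \tag{4}\\
y_{ik} &\le \sum_{j=1}^{q} x_{ijk}, && \forall\, i, k \tag{5}\\
z_{jk} &\le \sum_{i=1}^{p} x_{ijk}, && \forall\, j, k \tag{6}\\
x_{ijk},\ y_{ik},\ z_{jk} &\in \{0, 1\}, && \forall\, i, j, k \tag{7--9}
\end{align}
```

With constraint (2) written this way, a pair of nodes without a link ($a_{ij} = 0$) forces $x_{ijk} = 0$ for every $k$, which is the behaviour described in the text.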
Since Model 1 contains a great many variables, solving it directly may incur a large memory overhead. To decrease the number of variables, Model 1 can be re-expressed using relationship (incidence) matrices.
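Suppose that $U = \{u_1, \ldots, u_p\}$ and $V = \{v_1, \ldots, v_q\}$ are the two disjoint node sets, and $E = \{l_1, \ldots, l_M\}$ is the link set of the bipartite network. Define two incidence matrices $RS$ and $RT$, which record the incidence between the links and the nodes of the two node sets, and define binary variables for the membership of the links and nodes in each community. Based on the incidence matrices and these variables, the link community detection problem of the bipartite network can be reformulated as an integer nonlinear programming model, Model 2.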
Here $N$ is the number of nodes in the network, $N = p + q$. The objective function (10) maximizes the link partition density. Constraint (11) means that every link belongs to one community. Constraints (12) and (13) mean that if some link incident to node $u_i$ ($v_j$) belongs to community $k$, then node $u_i$ ($v_j$) must belong to the same community $k$. Constraints (14) and (15) mean that if node $u_i$ ($v_j$) belongs to community $k$, then at least one link incident to this node must belong to community $k$. Constraints (16), (17) and (18) indicate that the variables are binary.
In Model 1 and Model 2, since every link belongs to one and only one community, we might obtain a result in which a pair of nodes belongs to two communities but the link between them belongs to only one. To reduce this drawback, we can revise Model 2 into Model 3 by replacing constraint (11) with constraint (11'), which means that every link must belong to at least one community. Using Model 3, we can partition the network in Figure 1 into two communities, with link (3,10) belonging to both. Each community is a complete bipartite subnetwork, and the optimal objective function value is 1.

Figure 1. A bipartite network consisting of two overlapping communities. Each community is a complete bipartite network, and the two communities overlap at nodes 3 and 10 and link (3,10). This bipartite network can be partitioned into two communities by Model 3, and the objective function value is 1.

Genetic Algorithm for Link Community Detection of Bipartite Network
Although we can solve Model 2 or Model 3 to partition a small bipartite network into link communities, it is difficult to solve the integer programming model for large bipartite networks, since the problem might be NP-hard. In addition, most algorithms for community detection need some a priori knowledge about the community structure, such as the number of communities, which is usually unknown in real-life networks. In [12], the authors proposed a link community detection algorithm based on the ideas of the genetic algorithm and the self-organizing map (SOM) algorithm, which aims to find the best link community structure by maximizing the network partition density. The algorithm does not need any a priori knowledge about the number of communities, which makes it useful for real-life networks. It outputs the final link community structure and the corresponding overlapping nodes, and requires no further processing of the output. In the following, we design another genetic algorithm for link community detection in bipartite networks.
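First of all, we need to design a chromosome representation suitable for the link community detection problem. In our implementation, the chromosome is represented by a matrix $B = (b_{m,c})$ with one row per link and one column per community, whose element $b_{m,c}$ is the factor with which link $l_m$ belongs to community $c$. Each chromosome has an associated partition matrix $D = (d_{m,c})$: $d_{m,c} = 1$ if link $l_m$ is assigned to community $c$; otherwise $d_{m,c} = 0$ and link $l_m$ is not assigned to community $c$. Matrix $D$ is calculated from matrix $B$ according to Formula (20).
The bipartite network is represented by two incidence matrices $RS$ and $RT$, two weighted incidence matrices $ZS$ and $ZT$, the link adjacency matrix $A$ and the weighted link adjacency matrix $Q$.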
Here the degrees of nodes $u_i$ and $v_j$ are the numbers of links incident to $u_i$ and $v_j$, respectively. The link adjacency matrix $A$ and the weighted link adjacency matrix $Q$ are calculated from the incidence matrices; for $Q$,

$$Q = ZS^{T} ZS + ZT^{T} ZT$$

An element of the weighted link adjacency matrix $Q$ represents the probability that a random walker goes from one link to one of its adjacent links across their common node. It can be regarded as the possibility of two adjacent links belonging to the same community.
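The following minimal sketch shows one way to build the incidence matrices and an unweighted link adjacency matrix from a list of links. It assumes that RS and RT are stored node-by-link (one row per node, one column per link) and that two links are adjacent when they share an end node, with every link counted as adjacent to itself; the function names and the zero-based indexing are choices made for this example, and the degree weighting used for ZS, ZT and Q is not reproduced.

```python
def incidence_matrices(p, q, links):
    """Build node-by-link incidence matrices for a bipartite network.

    links: list of (i, j), where 0 <= i < p indexes node set U
    and 0 <= j < q indexes node set V.
    Returns (RS, RT): RS is p x M, RT is q x M (assumed orientation).
    """
    M = len(links)
    RS = [[0] * M for _ in range(p)]
    RT = [[0] * M for _ in range(q)]
    for m, (i, j) in enumerate(links):
        RS[i][m] = 1  # link m is incident to node u_i
        RT[j][m] = 1  # link m is incident to node v_j
    return RS, RT


def link_adjacency(links):
    """A[m][n] = 1 if links m and n share an end node (including m == n)."""
    M = len(links)
    A = [[0] * M for _ in range(M)]
    for m, (i1, j1) in enumerate(links):
        for n, (i2, j2) in enumerate(links):
            if i1 == i2 or j1 == j2:
                A[m][n] = 1
    return A


links = [(0, 0), (0, 1), (1, 1)]
RS, RT = incidence_matrices(2, 2, links)
A = link_adjacency(links)
```

With this convention, the sum of row j of A gives the number of links adjacent to link j, counting the link itself.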

The Genetic Algorithm Main Functions
• Input
Input the numbers of nodes $p$ and $q$ of node sets $U$ and $V$, the number of links $M$ of the link set $E$ of the bipartite network, the maximum number of communities $K$, and the parameters $\alpha$, $\beta$, $\theta$. Input the incidence matrices $RS$ and $RT$. Calculate the weighted incidence matrices $ZS$ and $ZT$, the link adjacency matrix $A$ and the weighted link adjacency matrix $Q$. Given the number of individuals $N$ and the maximum number of epochs, [...] cross over to produce two temporary individuals (matrices).
• Step 4. Population Mutation
Randomly select $mutation\_prob \times N$ temporary individuals (temporary matrices) and perform the mutation operation on each of them.

Partition Matrix and Fitness Evaluation
For every individual $B_i$, calculate the partition matrix $D_i$ according to Formula (20).
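A minimal sketch of this step is given below. It assumes that each link is assigned to the community with the largest entry in its row of the individual, which agrees with the later description of the community ID as the position of the maximum element in a row of the partition matrix; whether this matches Formula (20) exactly is an assumption. Ties are broken in favour of the first maximal column, and the function names are hypothetical.

```python
def normalize_rows(B):
    """Scale each row of B so that its entries sum to 1.0."""
    normalized = []
    for row in B:
        total = sum(row)
        if total == 0:
            # Degenerate row: fall back to a uniform assignment.
            normalized.append([1.0 / len(row)] * len(row))
        else:
            normalized.append([value / total for value in row])
    return normalized


def partition_matrix(B):
    """Binary matrix D with a single 1 per row, at the row maximum of B."""
    D = []
    for row in B:
        best = max(range(len(row)), key=row.__getitem__)  # first maximal column
        D.append([1 if c == best else 0 for c in range(len(row))])
    return D


B = normalize_rows([[1.0, 3.0], [3.0, 2.0], [1.0, 1.0]])
D = partition_matrix(B)
```

Because each row of D contains exactly one 1, this sketch assigns every link to exactly one community, as required when every link belongs to one and only one community.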
For each community $s$, $1 \le s \le K$, let $E(s)$ be a column vector whose elements are non-negative integers; a non-zero element of $E(s)$ indicates that the corresponding node of the node set $U$ belongs to community $s$. Similarly, let $F(s)$ be a column vector whose elements are non-negative integers; a non-zero element of $F(s)$ indicates that the corresponding node of the node set $V$ belongs to community $s$. That is, a non-zero $e_j(s)$ (or $f_j(s)$) means that node $u_j$ (or $v_j$) belongs to community $s$.
The fitness value of individual $B_i$ is defined as the link partition density of matrix $D_i$. In this paper, we revise the $s$-th column of an individual by adding a fraction of the $s$-th column of its partition matrix $D_i(t)$ [...].

Here $j_1$ and $j_2$ are row indices with $1 \le j_1, j_2 \le M$. Three mutation rules can be used in this genetic algorithm: exchange the $j_1$-th row and the $j_2$-th row of $W_i(t)$; replace the $j_1$-th row with the $j_2$-th row of $W_i(t)$; or replace the elements of the $j_1$-th row with a randomly selected number in [0.0, 1.0]. The three rules lead to no significant difference in this genetic algorithm. In the following simulations, we replace the $j_1$-th row with the $j_2$-th row of $W_i(t)$. The other elements of $W_i(t)$ remain unchanged.
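The three mutation rules can be sketched as follows. The function name, the rule labels, the uniform random choice of the two row indices, and the independent draw for each element in the third rule are assumptions made for this illustration; the individual is represented as a list of rows, one row per link.

```python
import random


def mutate(W, rule="replace", rng=random):
    """Return a mutated copy of W; the input matrix is left unchanged.

    rule: "exchange" swaps rows j1 and j2,
          "replace"  overwrites row j1 with row j2,
          "random"   overwrites row j1 with random values in [0.0, 1.0).
    rng:  the random module or a random.Random instance.
    """
    mutated = [row[:] for row in W]
    j1 = rng.randrange(len(mutated))
    j2 = rng.randrange(len(mutated))
    if rule == "exchange":
        mutated[j1], mutated[j2] = mutated[j2], mutated[j1]
    elif rule == "replace":
        mutated[j1] = mutated[j2][:]
    elif rule == "random":
        mutated[j1] = [rng.random() for _ in mutated[j1]]
    else:
        raise ValueError("unknown mutation rule: " + rule)
    return mutated


W = [[0.1, 0.9], [0.7, 0.3], [0.5, 0.5]]
child = mutate(W, rule="replace", rng=random.Random(0))
```

After the "random" rule the affected row generally no longer sums to 1, so a normalization step is needed afterwards; the "exchange" and "replace" rules preserve row normalization.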

Population Self Organizing Map
For every link, find the community it belongs to and calculate its community ID variance.If the community ID variance of a link is larger than a threshold, then increase the weights of this link to its community and the weights of its neighbor links to the same community.If the community ID variance of a link is smaller than the threshold value, then decrease the weights of the link to its community and the weights of its neighbor links to the same community.This process can improve the quality of the partition by eliminating wrongly placed links.
For $i = 1, \ldots, N$, perform the Self Organizing Map (SOM) operations on individual (chromosome) $W_i$ as follows:
• According to the temporary matrix $W_i$, calculate its partition matrix $D_i'$;
• For $j = 1, \ldots, M$, perform the following operations on link $l_j$.
• Find the community ID that link $l_j$ belongs to. The community ID corresponds to the maximum element in the $j$-th row of $D_i'$ (the maximum element must be 1). Suppose the maximum element in the $j$-th row of $D_i'$ is in the $s$-th column. This means that link $l_j$ belongs to community $P_s$.
• Calculate the total number

Numerical Experiments
In this section, we apply the genetic algorithm to both artificial bipartite networks and several well-studied real-world bipartite networks, and analyze the results in terms of classification accuracy and the ability to detect meaningful communities. The algorithm is implemented in MATLAB 7.1.

Chain of Complete Bipartite Network
We test our algorithm on a type of exemplar network, namely chains of complete bipartite networks. Such a network consists of many heterogeneous complete bipartite networks connected through single nodes (Figure 2). Each $(s, t)$-complete bipartite network is a bipartite network with $s$ nodes on one side and $t$ nodes on the other, in which there is a link between every pair of nodes from the two sides, so it has $L = s \times t$ links; the total number of links in the chain is the sum of these values over all its complete bipartite networks. The network has a clear link modular structure in which each community corresponds to a single complete bipartite network, so the optimal partition density is 1. Using the genetic algorithm above, we can easily detect the optimal partition and identify the overlapping nodes. In this paper, we use a network consisting of two (3,4)-complete bipartite networks, one (4,5)-complete bipartite network, one (4,6)-complete bipartite network and one (5,5)-complete bipartite network; the optimal partition is obtained and shown in Figure 2.
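A chain of this kind can be generated with the short sketch below. It assumes that consecutive complete bipartite networks share one node on the second side of the bipartition; the exact attachment used in Figure 2 may differ, and the function name and node labels are hypothetical.

```python
def chain_of_complete_bipartite(sizes):
    """Build a chain of complete bipartite blocks.

    sizes: list of (s, t) pairs, one per block.
    Returns a list of blocks, each a list of links (u, v), where
    u = ("u", id) and v = ("v", id). Consecutive blocks share one
    v-side node (an assumption of this sketch).
    """
    blocks = []
    next_u = 0
    next_v = 0
    shared_v = None
    for s, t in sizes:
        us = [("u", next_u + k) for k in range(s)]
        next_u += s
        vs = [] if shared_v is None else [shared_v]
        while len(vs) < t:
            vs.append(("v", next_v))
            next_v += 1
        blocks.append([(u, v) for u in us for v in vs])
        shared_v = vs[-1]  # the next block reuses this node
    return blocks


sizes = [(3, 4), (3, 4), (4, 5), (4, 6), (5, 5)]
blocks = chain_of_complete_bipartite(sizes)
total_links = sum(len(block) for block in blocks)  # 12 + 12 + 20 + 24 + 25
print(total_links)
```

For the five blocks used in this experiment the chain has 93 links, and since every block is complete, taking each block as one link community gives the optimal partition described above.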

Real-World Networks
In this subsection, we validate our algorithm on some real-world networks.
The Southern Women Network
During the 1930s, ethnographers Davis, Stubbs Davis, St. Clair Drake, Gardner, and Gardner collected data on social stratification in the town of Natchez, Mississippi. One of their tasks was collecting data on women's attendance at social events in the town [24]. They constructed the famous women-event bipartite network and analyzed it. Since then, the women-event bipartite network has become a de facto standard for discussing bipartite networks in the social sciences [12].

Here $b_{m,c}$ is the factor with which a link $l_m$ belongs to a community $c$. Note that $b_{m,c}$ ranges in the interval [0.0, 1.0]. Each link of the bipartite network is subject to the constraint $\sum_{c=1}^{K} b_{m,c} = 1$, which represents normalization to 1.0 of the link's factors of belonging to the communities. For each chromosome, we design a partition matrix $D$ in which $d_{m,c} = 1$ if the link $l_m$ is assigned to community $c$.
In the SOM operation, the total number of adjacent links of $l_j$ (including link $l_j$) is equal to the sum of the elements in the $j$-th row of matrix $A$.

Figure 2. The chain of heterogeneous complete bipartite networks. Each community is a complete bipartite network, and two adjacent communities overlap at one node.
• Step 5. Population Self Organizing Map
For each temporary individual, perform the self organizing map operation.
• Step 6. Population Normalization
For each temporary individual, perform normalization. Denote the normalized individuals by [...], where $\alpha$ and $\beta$ are adjustable parameters which can decrease with the step $t$.