Coarse Graining Method Based on Noded Similarity in Complex Network

Coarse graining of complex networks is an important method to study large-scale complex networks, and is also in the focus of network science today. This paper tries to develop a new coarse-graining method for complex networks, which is based on the node similarity index. From the information structure of the network node similarity, the coarse-grained network is extracted by defining the local similarity and the global similarity index of nodes. A large number of simulation experiments show that the proposed method can effectively reduce the size of the network, while maintaining some statistical properties of the original network to some extent. Moreover, the proposed method has low computational complexity and allows people to freely choose the size of the reduced networks.


Introduction
Many complex systems in reality can be abstracted into complex networks [1] for research. Complex networks have become one of the most active tools for the study of complex systems, with important applications in many fields such as biology, economics and finance, as well as society, electricity, transportation and so on.
Because real-world complex networks share similar topological properties, the study of the commonness of networks and of universal methods to deal with them is a hot topic [2]-[9]. However, many networks are known to exhibit rich structure at a very large scale. If the pages of the World-Wide-Web are thought of as the nodes and the hyperlinks between them as edges, then the World-Wide-Web is a huge complex network that continues growing. It is difficult to deal with this kind of large-scale network, and coarse graining of complex networks is a recent way to overcome such difficulty. Given a complex network with N nodes and E edges, which is considerably large and hard to deal with, the coarse-graining technique aims at mapping the large network into a mesoscale network, while preserving some topological or dynamic properties of the original network. This strategy is based on the idea of clustering nodes with similar or identical nature together.
In the past decade, some well-known coarse-graining methods have been proposed [10]-[21]. These methods can be classified into two categories. The first is based on the eigenvalue spectrum of the network; its main goal is to reduce the network size while keeping some dynamic properties of the network.
For example, D. Gfeller et al. proposed a spectral coarse-graining algorithm (SCG) [11] [12], Zhou and Jia proposed an improved spectral coarse-graining algorithm (ISCG), and Zeng and Lü proposed a path-based coarse graining [13] [14]. All these coarse-graining methods are developed to maintain the synchronization ability. The second category is based on topological statistics of the network, for instance, the k-core decomposition [10], the box-counting techniques [15] [16], and the geographical coarse-graining method introduced by B. J. Kim et al. in 2004. In [18], the network reduction is related to segmenting the central nodes by implementing the k-means clustering technique. These methods can well maintain some properties of the original networks. In 2018, our research team proposed a coarse-graining method based on the generalized degree (GDCG) [19]. Specifically, the GDCG approach provides an adjustable generalized degree, tuned by a parameter p, for preserving a variety of significant properties of the initial networks during the coarse-graining process.
In general, degree is the simplest and most important concept to describe the attributes of a single node. In undirected networks, the degree of node i is defined as the number of edges connected to i, i.e., the number of neighbor nodes of node i. Actually, the nature of a node is related not only to its own degree, but also to the degrees of its neighbor nodes. From the perspective of information transmission, the more common neighbors two nodes have, the more similar the information they receive and their ability to receive information. In this paper, based on the similarity index of nodes, we introduce a new coarse-graining technique. According to the number of common neighbors of nodes in the network, the algorithm describes the similarity between nodes and extracts the reduced network by merging similar nodes. The method is computationally simple and, more importantly, the size of the reduced network can be accurately controlled. Numerical simulations are carried out on three typical networks: the ER random networks, the WS small-world networks and the SF scale-free networks.

Definition of Node Similarity
Consider a complex network $G = (V, E)$ with $N$ nodes. Let $\Gamma(i)$ denote the set of neighbor nodes of node $i$; $\Gamma(i) \cap \Gamma(j)$ is the common neighbor node set of nodes $i$ and $j$, and $\Gamma(i) \cup \Gamma(j)$ is the union of the neighbor nodes of nodes $i$ and $j$. The similarity between nodes $i$ and $j$ is defined as

$$s_{ij} = \frac{|\Gamma(i) \cap \Gamma(j)|}{|\Gamma(i) \cup \Gamma(j)|}. \tag{1}$$

By Equation (1), $s_{ij} = s_{ji}$, and the similarity of node $i$ with itself is 1. If nodes $i$ and $j$ have no common neighbor nodes, then their similarity is zero, i.e., $s_{ij} = 0$. Thus $s_{ij}$ describes the degree of local structure similarity between nodes $i$ and $j$. We treat $s_{ij}$ as the local similarity index.
The similarity between node $i$ and the other nodes in the network can be expressed by an $N$-dimensional vector $(s_{i1}, s_{i2}, \dots, s_{iN})$. The larger the value $\sum_{j} s_{ij}$ is, the more nodes in the network are locally similar to node $i$. Therefore, extending Equation (1), the global similarity index of node $i$ in the network is defined as follows:

$$gs_i = \sum_{j=1}^{N} s_{ij}.$$

The larger $gs_i$ is, the more likely node $i$ is to be the cluster center of some similar nodes.
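The two indices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the local similarity is taken to be the ratio of common neighbors to the union of neighbors, the global similarity is taken to be the plain sum over all nodes (including the node itself), and two isolated nodes are given similarity 0. The function names and the small graph `adj` are hypothetical.

```python
def local_similarity(adj, i, j):
    """Shared neighbours of i and j divided by the union of their neighbours."""
    union = adj[i] | adj[j]
    if not union:  # both nodes isolated: treated as 0 here (assumption)
        return 0.0
    return len(adj[i] & adj[j]) / len(union)


def global_similarity(adj, i):
    """Sum of the local similarities of node i with every node (assumed form)."""
    return sum(local_similarity(adj, i, j) for j in adj)


# Hypothetical 4-node graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

s_12 = local_similarity(adj, 1, 2)   # one common neighbour out of three
s_34 = local_similarity(adj, 3, 4)   # no common neighbour
gs_1 = global_similarity(adj, 1)
```

Note that two adjacent nodes can have zero similarity (nodes 3 and 4 above), since the index counts common neighbors rather than direct links.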

Noded Similarity Coarse-Graining Scheme
It is noted that coarse-graining methods have to solve two main problems: the first is the merging of nodes, that is, to determine which nodes should be merged; the second is how to update the edges in the process of coarse graining. In the following, the noded similarity coarse-graining scheme is introduced from these two sides.

Nodes Condensation Based on Similarity Index
Suppose we are going to coarse grain a network containing $N$ nodes to a smaller one with $\tilde{N}$ ($\tilde{N} < N$) nodes. First, we need to select $\tilde{N}$ cluster centers, perform the clustering algorithm to get the corresponding $\tilde{N}$ clusters, and then merge the nodes in the same cluster.
In order to select $\tilde{N}$ suitable nodes as the cluster centers, it is necessary to ensure that the extracted cluster centers have as high a global similarity as possible (that is, they are similar to as many nodes in the network as possible). It is also required that the local similarity between any two cluster centers should not be too high (otherwise, they may belong to the same cluster, and only one of them could be the cluster center).
The detailed steps for selecting the $\tilde{N}$ cluster centers and forming the clusters are as follows.
Step 1: Get the local similarity and the global similarity of each node in the network. Sort the sequence $gs_1 \ge gs_2 \ge \dots \ge gs_N$ of the global similarity of the $N$ nodes in decreasing order.
Step 2: Let $V_S$ be the set of cluster centers. Firstly, put the node $v_1$, which corresponds to the maximum global similarity $gs_1$, into $V_S$. Secondly, pick the node $v_2$ corresponding to the second largest global similarity $gs_2$ and compare the local similarity $s_{v_2 v_1}$ with a threshold that depends on $N$, $\tilde{N}$ and β ($N$ and $\tilde{N}$ are the sizes of the original and coarse-grained networks respectively, and β is an adjustable parameter). If $s_{v_2 v_1}$ is below the threshold, the nodes $v_2$ and $v_1$ are not in the same cluster, so $v_2$ can be put into $V_S$ as a new cluster center. Otherwise, the local similarity between $v_2$ and $v_1$ is too high and these two nodes may belong to the same cluster; then $v_2$ cannot be put into $V_S$ as a new cluster center. Continue to select cluster centers in the order $gs_1 \ge gs_2 \ge \dots \ge gs_N$: a new cluster center $v_i$ has to satisfy the same condition with respect to every center already in $V_S$. Stop selecting new cluster centers once the number of elements of $V_S$ reaches $\tilde{N}$.
Step 3: Take the $\tilde{N}$ nodes of $V_S$ as the cluster centers; their corresponding clustering sets are denoted $M_1, M_2, \dots, M_{\tilde{N}}$. The remaining $N - \tilde{N}$ nodes are then clustered. To find the clustering set $M_j$ that a remaining node $v_i$ belongs to, our objective is to find the cluster center at the smallest distance from $v_i$, where the distance is computed from the similarity of the nodes with other nodes in the network.
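The selection and assignment steps can be sketched as follows. This is a minimal illustration, not the exact procedure: the threshold is passed in directly as a hypothetical `threshold` argument rather than computed from β, $N$ and $\tilde{N}$; the distance used for assignment is assumed to be the Euclidean distance between similarity vectors; and the local similarity is assumed to be the common-neighbor ratio. All function names are illustrative.

```python
import math


def similarity_table(adj):
    """sim[i][j] = |common neighbours| / |union of neighbours| (assumed form)."""
    nodes = sorted(adj)

    def s(i, j):
        union = adj[i] | adj[j]
        return len(adj[i] & adj[j]) / len(union) if union else 0.0

    return {i: {j: s(i, j) for j in nodes} for i in nodes}


def select_centers(sim, n_centers, threshold):
    """Scan nodes by decreasing global similarity; accept a node as a center
    only if its local similarity to every accepted center is below threshold."""
    ranked = sorted(sim, key=lambda i: (-sum(sim[i].values()), i))
    centers = []
    for v in ranked:
        if all(sim[v][c] < threshold for c in centers):
            centers.append(v)
        if len(centers) == n_centers:
            break
    return centers


def assign_clusters(sim, centers):
    """Attach every non-center node to the center with the closest
    similarity vector (Euclidean distance, an assumption)."""
    def distance(u, v):
        return math.sqrt(sum((sim[u][k] - sim[v][k]) ** 2 for k in sim))

    clusters = {c: [c] for c in centers}
    for v in sim:
        if v not in clusters:
            nearest = min(centers, key=lambda c: distance(v, c))
            clusters[nearest].append(v)
    return clusters


# Hypothetical bipartite graph: nodes 1, 2 on one side and 3, 4, 5 on the other.
adj = {1: {3, 4, 5}, 2: {3, 4, 5}, 3: {1, 2}, 4: {1, 2}, 5: {1, 2}}
sim = similarity_table(adj)
centers = select_centers(sim, n_centers=2, threshold=0.5)
clusters = assign_clusters(sim, centers)
```

In this example nodes 4 and 5 are rejected as centers because they are too similar to node 3, so node 1 becomes the second center. If the threshold is too strict, fewer than `n_centers` centers may be returned; the sketch does not handle that case.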

Updating Edges of the Reduced Networks
$\tilde{N}$ clustering sets have been obtained in Section 3.1; merging the nodes in each cluster gives $\tilde{N}$ coarse-grained nodes. To keep the connectivity of the reduced network, the following step is to update the edges, as detailed below.
Definition of weight. The set of nodes in the $i$th cluster is defined as $M_i$ ($M_i$ is also the $i$th node in the coarse-grained network). We re-encode the weight of the edge between the coarse-grained nodes $M_i$ and $M_j$ from the elements $a_{ij}$ of the adjacency matrix $A = (a_{ij})$ of the original network and from $|M_i|$ and $|M_j|$, the numbers of nodes in the $i$th and $j$th clusters, respectively. As presented above, the framework can preserve the edges between clusters (each cluster corresponds to a coarse-grained node) that are closely related to each other in the original network. Moreover, it can prevent the network from being reduced to a fully connected network. Removing the weights of the edges and only displaying the topological structure of the coarse-grained network is conducive to keeping some statistical properties of the original network. In particular, if the network becomes disconnected after deleting an edge, the edge is kept in order to ensure the connectivity of the network. After the two steps described above, we obtain an undirected unweighted network.
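The edge update can be illustrated with a short sketch. It rests on assumptions that are not taken from the text: the weight between two clusters is taken to be the number of original edges running between them, and an edge is kept when its weight reaches a hypothetical cutoff `min_weight`. The rule that restores an edge whose deletion would disconnect the network is not implemented here.

```python
def coarse_grain_edges(adj, clusters, min_weight=1):
    """Return (edges, weight) for the coarse-grained network.

    weight[(a, b)] counts the original edges between clusters a and b
    (assumed weight definition); edges keeps the pairs whose weight is at
    least min_weight (hypothetical cutoff) and drops the weights.
    """
    label = {v: c for c, members in clusters.items() for v in members}
    weight = {}
    for u, neighbours in adj.items():
        for v in neighbours:
            cu, cv = label[u], label[v]
            if u < v and cu != cv:  # count each undirected edge once
                pair = (min(cu, cv), max(cu, cv))
                weight[pair] = weight.get(pair, 0) + 1
    edges = {pair for pair, w in weight.items() if w >= min_weight}
    return edges, weight


# Hypothetical 5-node network and a clustering keyed by cluster center.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
clusters = {1: [1, 2], 3: [3], 4: [4, 5]}

edges, weight = coarse_grain_edges(adj, clusters)
strong_edges, _ = coarse_grain_edges(adj, clusters, min_weight=2)
```

Pairs of clusters with no original edge between them never appear in `weight`, which corresponds to a weight of 0.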

A Toy Example
To better illustrate the algorithm we proposed, this section will apply the noded similarity coarse-graining scheme on the small toy network, as shown in Figure 1.
A 9-node toy example is shown in Figure 1(a). Here, we use the noded similarity coarse-graining method to reduce the network to 7 nodes. First of all, calculate the global similarity $gs_1, gs_2, \dots, gs_9$ of the nodes and sort it in decreasing order. Following the steps above, seven clustering sets $M_1, M_2, \dots, M_7$ are found.

The method is examined on three statistical properties: the average path length, the average degree and the clustering coefficient. These are currently the three most studied topological properties in the research of complex networks. They describe more explicit information about various aspects of the networks; a clearer description is given in the following sections. For simplicity, we mainly consider three typical networks (the ER random networks, the WS small-world networks and the SF scale-free networks). To illustrate the effect of the proposed method on these topological properties, we investigate our method with different values of β. In addition, under the optimal β, we further investigate the effect of the structural parameters of the different networks. We consider the ER random networks with connecting probability $p = 0.01, 0.02, 0.03, 0.04, 0.05$. The WS small-world network model was proposed by Watts and Strogatz in 1998; it is obtained by randomly rewiring each edge of a nearest-neighbor coupled network. We adjust the rewiring probability $p = 0.1, 0.2, 0.3$ and the coordination number $K = 4, 6, 8$. In SF networks, the degree distribution follows a power law. When the power-law exponent γ increases from small to large, the networks change from highly heterogeneous to highly homogeneous.
Additionally, for each type of the artificial complex networks, we fix the size of the original networks at $N = 1000$, and we consider coarse-grained sizes $\tilde{N} = 1000, 900, 800, 700, 600, 500, 400, 300, 200, 100$. The results of all the illustrated experiments are averages over ten independent simulation runs.

Average Path Length
The average path length $L$ of a network is defined by:

$$L = \frac{2}{N(N-1)} \sum_{i < j} d_{ij},$$

where $N$ is the size of the network and $d_{ij}$ is the shortest path length from $i$ to $j$.
Under each $p$, the average path length can be well preserved, especially for $\tilde{N} \ge 700$. For the WS networks, when $600 \le \tilde{N} \le 800$, the larger β is, the better the average path length is maintained.
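For an unweighted, connected network the average path length can be computed with one breadth-first search per node. The sketch below assumes the usual normalization over the $N(N-1)/2$ node pairs; the function name and the example path graph are illustrative.

```python
from collections import deque


def average_path_length(adj):
    """Mean shortest-path length over all node pairs of a connected,
    unweighted, undirected network given as {node: set_of_neighbours}."""
    n = len(adj)
    total = 0  # sum of d_ij over ordered pairs (each pair counted twice)
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < n:
            raise ValueError("network is not connected")
        total += sum(dist.values())
    return total / (n * (n - 1))


# Hypothetical path graph 1 - 2 - 3 - 4.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
L = average_path_length(path)
```

Dividing the ordered-pair total by $N(N-1)$ is equivalent to dividing the sum over unordered pairs by $N(N-1)/2$.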

Average Degree
Average degree $\langle k \rangle$ is a critical index to measure the relative connectedness of the whole network. Given the adjacency matrix $A = (a_{ij})$ of a network, the corresponding $\langle k \rangle$ is given by:

$$\langle k \rangle = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij}.$$

The evolutions of the average degree $\langle k \rangle$ versus the parameter β under the node similarity coarse-graining process for the three types of networks are shown in Figure 4, and the phenomenon is similar to that in Figure 2.
As displayed in Figure 5, the average degree can be preserved in the three kinds of coarse-grained networks with the optimal β. In ER networks, the curve of the average degree is approximately linear for $p = 0.01$ and $\tilde{N} > 500$. With the increase of $p$, the volatility of the curve increases, but the average degree is still roughly kept. From Figure 5(b), the average degree of all WS networks can be well preserved regardless of the structural parameters. In SF networks, the node similarity algorithm keeps the average degree of the original networks better as γ increases. Moreover, the average degree of the coarse-grained networks stays within 1 of the average degree of the original networks even when the size of the original networks is reduced to half. The SF networks are thus superior to the other two networks in maintaining the average degree.
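As a quick check of the definition, the average degree of an undirected, unweighted network is the sum of all entries of its adjacency matrix divided by the number of nodes. The matrix below is a hypothetical 3-node path, not one of the networks studied here.

```python
def average_degree(A):
    """Sum of all adjacency-matrix entries divided by the number of nodes."""
    n = len(A)
    return sum(sum(row) for row in A) / n


# Hypothetical path graph on 3 nodes: degrees are 1, 2 and 1.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
k_mean = average_degree(A)
```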

Clustering Coefficient
The clustering coefficient measures the edge connection probability between the neighbor nodes in a network. The clustering coefficient $C_i$ of node $i$ with $k_i$ neighbors is

$$C_i = \frac{2 E_i}{k_i (k_i - 1)},$$

where $E_i$ is the actual number of edges among the $k_i$ neighbor nodes of node $i$.
The overall level of clustering in a network is measured as the average of the clustering coefficients of all nodes:

$$C = \frac{1}{N} \sum_{i=1}^{N} C_i.$$

The result shows that the optimal parameter β corresponding to the best performance on preserving different statistical properties is the same. The results under different structural parameters are shown in Figure 7.
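The clustering coefficient can be sketched directly from its definition as the fraction of neighbor pairs that are themselves connected. In this illustration nodes with fewer than two neighbors are given a coefficient of 0, which is a common convention but an assumption here; the function names and the example graph are hypothetical, and node labels are assumed to be comparable (e.g. integers).

```python
def clustering_coefficient(adj, i):
    """2 * E_i / (k_i * (k_i - 1)), with E_i the edges among neighbours of i."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention for degree 0 or 1 (assumption)
    links = sum(1 for u in neighbours for v in neighbours
                if u < v and v in adj[u])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the node clustering coefficients."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)


# Hypothetical graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
C_3 = clustering_coefficient(adj, 3)
C = average_clustering(adj)
```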

Conclusions and Discussions
The coarse-graining techniques are promising ways to study large-scale complex networks. In this paper, we have developed a new algorithm to reduce the sizes of complex networks. This method is based on the local similarity and the global similarity of nodes, which is well suited to the original intention of coarse graining. In particular, we introduce a tuning parameter β in the algorithm to obtain the best effect of keeping the statistical properties. The study found that the optimal parameter β differs between types of networks. Specifically, the ER networks are not sensitive to β; the WS networks require larger β; and in the SF networks, the smaller β is, the better the statistical properties are maintained. Results from extensive numerical experiments indicate that the average path length, the average degree and the clustering coefficient can be preserved during the coarse-graining process.

In the toy example, node 3 is the first cluster center, corresponding to the clustering set $M_1$. Each subsequent node that satisfies the selection condition is put into the set $V_S$ as a new cluster center, and so on, until the set of cluster centers is $V_S = \{3, 8, 5, 6, 1, 7, 2\}$. The remaining $N - \tilde{N}$ nodes in the network, here nodes 4 and 9, are then clustered. The distance between node 4 and the cluster center node 3 is the smallest one, so node 4 is merged with the cluster center node 3; they belong to the same clustering set $M_1$. Akin to node 4, node 9 and the cluster center node 8 belong to the same clustering set $M_2$. At this point, seven coarse-grained nodes have been obtained. The weights between the coarse-grained nodes take only three values: 0, 1 or 2. For nodes that are not connected in the original network, the edge weight is still set to 0 in the coarse-grained network. Then we can create the coarse-grained network with adjacency matrix $\tilde{A}$, as shown in Figure 1(b).

Figure 1. A toy example based on the node similarity coarse-graining method. (a) A 9-node simple network with adjacency matrix $A$; (b) The reduced network with adjacency matrix $\tilde{A}$.

Figure 2 and Figure 3 show the evolution of the average path length of the above-mentioned networks under the noded similarity coarse-graining method. From Figure 2(a), one can see that for the ER networks the average path lengths are almost the same as the adjustable parameter β varies from 0.1 to 0.6. This means that the value of β does not have a great impact on the ER networks, so β can be picked at random for them. The method can well preserve the average path length of the original networks.

Figure 2. Evolutions of the average path length under different β. (a) ER networks; (b) WS networks; (c) SF networks.

Figure 3. Evolutions of the average path length under different structural parameters. (a) ER networks; (b) WS networks; (c) SF networks.

Figure 5. Evolutions of the average degree under different structural parameters. (a) ER networks; (b) WS networks; (c) SF networks.

Figure 7. Evolutions of the clustering coefficient under different structural parameters. (a) ER networks; (b) WS networks; (c) SF networks.