1. Introduction
As an archetypal framework for privacy-preserving cooperative learning, federated learning (FL) has received great attention and in-depth study: it enables users to cooperatively learn models without exchanging data, thereby protecting users’ personal privacy (see [1] and [2]). Specifically, each user possesses private data, which is used to train a local model. After several local training rounds, users upload their model parameters to the server, which performs aggregation and distributes the global model parameters to all users for the next episode of learning (see Korba et al. [3], Bouzabia et al. [4] and Venkatesan et al. [5]).
The key issues in FL are how to deal with data heterogeneity and how to transfer knowledge between users with similar data. Each user is eager to absorb positive knowledge from other users to promote local model training, and transfer learning has thus become one of the most active branches of FL research. Selvakanmani et al. [6] proposed a transfer learning technique integrated into a federated learning framework for privacy-preserving breast cancer classification. Raja et al. [7] proposed a federated transfer learning framework that integrated a pre-trained ResNet-18 for transfer learning with a meticulously designed convolutional neural network. Singh and Bedi [8] proposed a streamlined approach that merged clustering analysis with federated and transfer learning. Qian et al. [9] presented a comprehensive review of federated transfer learning in machinery fault diagnosis. However, owing to the huge discrepancies between private datasets, users with very different data structures tend to provide negative knowledge, and absorbing excessive negative knowledge degrades the local model.
When considering the data structure in machine learning, we have to distinguish regular data from structured (irregular) data. Regular data can be thought of as a standard tabular dataset, such as an Excel spreadsheet. Structured or irregular data, by contrast, can be represented by a graph: since there are mutual correlations among samples, vertices are used to represent samples and edges represent their associated structure. To resolve the limitations stated above, federated graph learning (FGL) is introduced to deal with the structured data involved in FL. In the FGL setting, each user possesses a structured dataset represented by a (weighted) graph, and users train their local model parameters by means of graph neural networks (GNNs), graph convolutional neural networks (GCNNs), or other graph learning approaches (see [10], [11] and [12] for more details). The server aggregates the local model parameters submitted by users and disseminates the global model to each user (see Xie et al. [13], Peng et al. [14] and Tang et al. [15]). However, existing FGL has defects in calculating user similarity. Specifically, the similarity function is calculated from scalars or vectors used to represent the user’s graph-structured data, while the topological structure of the graph is ignored. Although a small number of contributions extract graph features, the highly concentrated graph feature information often distorts the graph’s topological features.
To alleviate the foregoing issue, we propose a novel FGL framework based on graph distance computing, which has the following two distinctive advantages:
Graph distance calculation depends entirely on the topological structure of the graph, and its results are highly reliable, so that the similarity between user data can be accurately determined.
By means of graph distance computing, an FGL algorithm with dynamic clustering can be designed, where each cluster contains users with similar data structures and the size of the clusters can be adjusted according to a threshold.
In this work, we propose a novel FGL algorithm based on graph distance computing, where users are partitioned into several clusters: the more clusters there are, the fewer users in each cluster, and vice versa. The number of clusters can be adjusted by a threshold. The server aggregates the model parameters of users in the same cluster, thereby achieving collaborative training of similar users.
The subsequent sections are organized as follows. First, a comprehensive survey on FGL is presented in the next section. Then, the settings and problem formalization are described. After that, we present the main algorithm and detailed analysis. Finally, we conclude the paper.
2. Related Works on Federated Graph Learning
The aim of this section is to present a comprehensive survey on FGL in recent years. Chen et al. [16] proposed FedGraph for FGL among multiple computing clients and also suggested an intelligent graph sampling algorithm based on deep reinforcement learning. Wu et al. [17] proposed a multi-level FGL and self-attention-based personalized indoor localization method to capture the intrinsic features of received signal strength. Zeng et al. [18] introduced a feature-contrastive graph federated learning approach to improve the robustness of federated learning systems on graph data. Chang et al. [19] developed a graph-based client selection framework for heterogeneity in federated learning. Liu et al. [20] studied FGL under a cross-silo setting to deal with network traffic throttling and flow scheduling problems. Zhang et al. [21] presented a specificity-aware FGL framework for rs-fMRI analysis and automated brain disorder identification. Zhang et al. [22] proposed an FGL framework based on ego-graphs, in which each client trains its local model while also contributing to the training of a global model. Li et al. [23] proposed federated graph topology-aware aggregation in terms of topology-aware local smoothing confidence and mixed neighbor features. Chen et al. [24] investigated the problem of stealing graph data in the FGL setting. Fu et al. [25] devised an FGL approach for privacy-preserving individual-level infection prediction. Kong et al. [26] considered a model-heterogeneous graph federated learning framework based on prototype learning and local data augmentation. Zhong et al. [27] designed a lightweight FGL approach for accelerating classification inference in unmanned aerial vehicle-assisted multi-access edge computing systems. Xie et al. [28] proposed a decentralized cross-institutional personalized FGL algorithm for IoT service recommendation. Abdel-Basset et al. [29] introduced a digital twin framework for creating a digital replica of the physical slicing-supported network by means of an FGL approach. Tian et al. [30] proposed an FGL approach for privacy-preserving cross-domain recommendation to capture users’ preferences. Xing et al. [31] proposed an FGL framework for threat detection of TAP rules while simultaneously protecting user data and privacy.
The existing FGL approaches fully reflect the fusion of data: they attempt to represent the topological structure of a graph by a single variable, so as to realize graph similarity calculation by computing the gap between such variables. These methods cannot reflect the real topological gap between graphs, even if the variable describes some topological index of the graph.
3. Preliminary and Settings
The purpose of this section is to introduce the setting of FGL and to formally formulate the corresponding problem.
3.1. Federated Graph Learning
Federated graph learning can be considered as a distributed training framework for graph learning models (e.g. GNNs, GCNNs), or as a framework for collaborative training over large-scale subgraphs. It helps to cope with computational and storage restrictions by handling large-scale subgraphs through local graph training policies.
Let $U=\{u_1,u_2,\ldots,u_N\}$ be the set of $N$ users, and let user $u_i$ possess a private local dataset, which is represented by a graph $G_i=(V_i,E_i)$. Actually, $V_i$ is the set of samples of user $u_i$; since there is an inherent correlation between samples in $V_i$, edges between samples represent this intrinsic connection. The edge set $E_i$ indicates the topological characteristics of the data structure of user $u_i$. Without loss of generality, we assume samples can be represented by $(x_v,y_v)$ for $v\in V_i$ and $i\in\{1,\ldots,N\}$, where $x_v\in\mathbb{R}^d$ and $y_v$ is the associated label. In particular, $y_v$ is vacant in an unsupervised learning setting.
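For concreteness, a user’s local dataset in this setting can be represented as follows; this is a minimal sketch, and the type and field names are our own illustrative assumptions rather than the paper’s notation:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple
import numpy as np

@dataclass
class LocalGraphDataset:
    """User u_i's private dataset G_i = (V_i, E_i) with vertex features."""
    features: np.ndarray                 # x_v for every v in V_i, shape (|V_i|, d)
    edges: List[Tuple[int, int]]         # E_i as pairs of vertex indices
    labels: Optional[np.ndarray] = None  # y_v; None in the unsupervised setting
```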
Roughly speaking, in the FGL setting, each user $u_i$ trains a local model parameter using the graph-structured dataset $G_i$ and a training framework (GNN, GCNN, etc.), and the server obtains the global model parameter by aggregating the local model parameters submitted by users. FGL currently has a wide range of applications in various fields, such as biomedicine, finance, and industry.
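The server-side step can be sketched as FedAvg-style averaging; this is a minimal illustration assuming an unweighted mean, and the function name is hypothetical (real systems may weight users, e.g. by local graph sizes):

```python
import numpy as np

def server_aggregate(local_params):
    """Average the parameter vectors submitted by users
    (a minimal FedAvg-style sketch, not a prescribed FGL rule)."""
    return np.mean(np.stack(local_params), axis=0)

# usage: three users submit local model parameters of the same shape
global_params = server_aggregate([np.random.randn(10) for _ in range(3)])
```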
Example 1: An ontology can be seen as a collection of concepts with semantic information. Based on the hierarchical relationship between concepts, an ontology can be represented as a directed acyclic graph, where concepts are represented by vertices and the inherent relationships between concepts are expressed by edges between vertices. Assume that the semantic information of each vertex in the ontology graph can be represented by a $d$-dimensional vector; then an ontology function $f:V\to\mathbb{R}$ (here $V$ is the vertex set of the ontology graph) is a dimensionality-reduction operator that aims to map each vertex to a real number, and the similarity between two concepts is measured by the difference between their mapped real numbers, i.e. the similarity between $u$ and $v$ (representing two concepts) is determined by $|f(u)-f(v)|$.
Suppose we have multiple ontology digraphs and aim to obtain an ontology function that maps each vertex (in any of the ontology graphs) to a real number. This is the standard problem of ontology mapping, and the similarity between two concepts (two vertices from the same ontology graph or from different ontologies) can be computed in a similar way. In this setting, all ontology digraphs are merged into one large ontology digraph, and each original ontology digraph becomes a connected component of the merged digraph (see Gao and Chen [32] and Gao et al. [33] for more details about ontology similarity measuring and ontology mapping).
Now, imagine that the ontology digraphs are distributed among multiple users, each user trains an ontology mapping function using its local ontology dataset, and no user expects to share its private ontology data. However, the server aims to obtain a global ontology mapping function, which facilitates mapping each vertex (possibly from any user’s ontology graph) to a real number. In this case, the ontology mapping problem becomes a standard FGL problem.
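As a toy illustration of Example 1, the snippet below scores concept similarity by the absolute difference of the mapped real numbers; the linear form of the ontology function $f$ and all names are hypothetical assumptions, not the method of [32] or [33]:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(8)  # hypothetical parameters of a linear ontology function

def f(x):
    """Map a vertex's 8-dimensional semantic vector to a real number."""
    return float(w @ x)

def concept_dissimilarity(x_u, x_v):
    """Smaller |f(u) - f(v)| means the two concepts are more similar."""
    return abs(f(x_u) - f(x_v))
```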
3.2. Distance between Graphs
In this subsection, we introduce the method for computing the distance between graphs, which was proposed by Chowdhury and Mémoli [34].
Suppose that $(V_1,\omega_1)$ and $(V_2,\omega_2)$ are two finite graphs such that $\omega_1:V_1\times V_1\to\mathbb{R}$ and $\omega_2:V_2\times V_2\to\mathbb{R}$ can be regarded as the weight functions on the two graphs, respectively. It should be noted that we ignore the edge set of a graph here, because for $u,v\in V_i$ with $uv\notin E_i$, we specify $\omega_i(u,v)=0$. A correspondence between $V_1$ and $V_2$ is a relation $R\subseteq V_1\times V_2$ whose projections onto $V_1$ and $V_2$ are both surjective. The collection of all correspondences between $V_1$ and $V_2$ is denoted by $\mathcal{R}(V_1,V_2)$. The distortion of $R$ is defined by:

$$\mathrm{dis}(R)=\max_{(v_1,v_2),(v_1',v_2')\in R}\big|\omega_1(v_1,v_1')-\omega_2(v_2,v_2')\big| \quad (1)$$

The distance between $(V_1,\omega_1)$ and $(V_2,\omega_2)$ is defined by:

$$d_{\mathcal{N}}\big((V_1,\omega_1),(V_2,\omega_2)\big)=\frac{1}{2}\min_{R\in\mathcal{R}(V_1,V_2)}\mathrm{dis}(R) \quad (2)$$

If we set $G_1=(V_1,\omega_1)$ and $G_2=(V_2,\omega_2)$, then $d_{\mathcal{N}}\big((V_1,\omega_1),(V_2,\omega_2)\big)$ can be abbreviated as $d_{\mathcal{N}}(G_1,G_2)$.
The above computing trick is also applicable to unweighted graphs. For any two vertices in the graph, the weight is set to 1 if they are adjacent, and the weight is set to zero if they are not adjacent.
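To make definitions (1) and (2) concrete, the following brute-force sketch enumerates all correspondences between two tiny graphs given as weight matrices; it is exponential in $|V_1|\cdot|V_2|$ and is meant only to illustrate the definition, not as a practical implementation:

```python
from itertools import product

def distortion(R, w1, w2):
    # dis(R): max over pairs (u, v), (u2, v2) in R of |w1(u, u2) - w2(v, v2)|
    return max(abs(w1[u][u2] - w2[v][v2]) for (u, v) in R for (u2, v2) in R)

def graph_distance(w1, w2):
    """Brute-force evaluation of definition (2); feasible only for tiny graphs."""
    n1, n2 = len(w1), len(w2)
    pairs = list(product(range(n1), range(n2)))
    best = float("inf")
    for mask in range(1, 1 << len(pairs)):  # every nonempty relation R in V1 x V2
        R = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
        # keep R only if both projections are surjective (R is a correspondence)
        if {u for u, _ in R} == set(range(n1)) and {v for _, v in R} == set(range(n2)):
            best = min(best, distortion(R, w1, w2))
    return best / 2

# path on 3 vertices vs. triangle, with 0/1 weights as in the unweighted case
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(graph_distance(path, triangle))  # 0.5
```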
3.3. GraphSAGE
GraphSAGE is a classic graph convolutional neural network algorithm whose name abbreviates "SAmple and aggreGatE". It views convolution as a combination of sampling and information aggregation, and thus its core idea is to split convolution into two steps: sampling and aggregation. The aggregation function is independent of the input order, i.e. the vertices in the neighborhood do not need to be sorted. The procedure can be divided into the following three steps:
1) Obtain the neighborhood vertices by sampling.
2) Aggregate the information of the neighboring vertices by means of an aggregation function to obtain the embedding of the target vertex.
3) Use the information aggregated at the vertex to predict the label of the vertex/graph.
GraphSAGE uses uniform sampling to obtain fixed-size neighborhoods: for a given vertex, uniform sampling is performed over its first-order neighbors to construct a neighborhood with a fixed number of vertices. Furthermore, GraphSAGE has several kinds of aggregation approaches, which enter the $k$-th layer update

$$h_{\mathcal{N}(v)}^{k}=\mathrm{AGGREGATE}_{k}\big(\{h_{u}^{k-1}:u\in\mathcal{N}(v)\}\big),\qquad h_{v}^{k}=\sigma\big(W^{k}\cdot\mathrm{CONCAT}(h_{v}^{k-1},\,h_{\mathcal{N}(v)}^{k})\big),$$

where $\mathrm{AGGREGATE}_{k}$ may be (i) the mean aggregator, which averages the neighbors’ features; (ii) the LSTM aggregator, which feeds the sampled neighbors’ features into an LSTM; or (iii) the pooling aggregator $\mathrm{AGGREGATE}_{k}^{\mathrm{pool}}=\max\big(\{\sigma(W_{\mathrm{pool}}h_{u}^{k}+b):u\in\mathcal{N}(v)\}\big)$.

In the above formulations, $K$ represents the depth, i.e. the number of neighborhood layers, with $k\in\{1,\ldots,K\}$; the weight matrix $W^{k}$ (also written as $W_{k}$) denotes the convolution kernel parameters of the $k$-th layer, which are learnable; $\sigma$ is the nonlinear activation function, usually taken to be the ReLU function and sometimes the sigmoid function; $\mathrm{AGGREGATE}_{k}$ represents the aggregation function selected for the $k$-th layer; $h_{v}^{k}$ represents the signal or feature of vertex $v$ at the $k$-th layer; and the information of $h_{v}^{k-1}$ and $h_{\mathcal{N}(v)}^{k}$ is combined using CONCAT (see the code sketch below).
Note: 1) The sampled neighborhood obtained by GraphSAGE is different in each iteration, which is the biggest difference from GNN. GNN obtains a fixed neighborhood for each vertex through random walks and does not change it, whereas the neighborhood of a vertex in GraphSAGE is obtained through sampling, which is performed once per iteration, so the neighborhood of each vertex changes. 2) The second type of aggregation, the LSTM aggregator, is related to the input order, but a special treatment trick (applying the LSTM to a random permutation of the neighbors) is applied here.
The differences between GraphSAGE and GNN can be summarized by the following three points:
1) In GNN (and traditional CNN), the order of neighborhood vertices needs to be determined. However, in GraphSAGE, the vertices in the graph convolution neighborhood do not need to be sorted (the aggregation function is independent of the vertex order).
2) In GNN, each vertex in the neighborhood has different convolution kernel parameters. In GraphSAGE, all vertices in the neighborhood share the same convolution kernel parameters.
3) In terms of neighborhood selection, GNN constructs the neighborhood according to the probabilities of random walks, while GraphSAGE constructs the neighborhood by uniform sampling.
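To make the above concrete, here is a minimal PyTorch sketch of one GraphSAGE layer with the mean aggregator and uniform neighbor sampling; it follows the update equations above but is our own simplified illustration, not a reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAGELayer(nn.Module):
    """One GraphSAGE layer with the mean aggregator (simplified sketch)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # shared, learnable convolution kernel W^k for the whole neighborhood
        self.W = nn.Linear(2 * in_dim, out_dim)

    def forward(self, h, neigh_idx):
        # h: (num_vertices, in_dim) features h^{k-1}
        # neigh_idx: (num_vertices, S) indices of S uniformly sampled neighbors
        h_neigh = h[neigh_idx].mean(dim=1)      # AGGREGATE_k (mean)
        h_cat = torch.cat([h, h_neigh], dim=1)  # CONCAT(h_v^{k-1}, h_{N(v)}^k)
        return F.relu(self.W(h_cat))            # sigma(W^k . CONCAT(...))

# usage: 5 vertices, 4-dim features, S = 2 sampled neighbors per vertex;
# neigh_idx would be resampled at every iteration, as noted above
h = torch.randn(5, 4)
neigh_idx = torch.randint(0, 5, (5, 2))
print(SAGELayer(4, 8)(h, neigh_idx).shape)  # torch.Size([5, 8])
```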
For more studies and applications of GraphSAGE, we refer to Zhang et al. [35], Mirthika et al. [36], Liu et al. [37], El Alaoui et al. [38] and Seon et al. [39].
4. Proposed Algorithm Description
The procedure of our main algorithm is presented in Algorithm 1.
It can be seen from Algorithm 1 that the graph distance is applied to user selection in view of threshold-based user clustering, and the threshold value is not given by input in advance, but determined by experts (decision makers) according to graph distance results. The threshold
can be artificially adjusted, and hence the cluster number
and combination of users among clusters may have clear distinctions among various time slots. Generally speaking, in the initial stage, the value of
can be selected to be larger, so that the number of clusters is smaller and each user can collaborate with more other users. As learning progresses, users tend to collaborate with more similar ones, so the value of
can be selected to be smaller in the later stage, which leads to a larger number of clusters and a smaller number of users in each cluster.
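As an illustration of the clustering step that Algorithm 1 relies on, the following sketch groups users whose pairwise graph distances fall below the threshold $\tau$; the single-linkage-style grouping and all names are our own assumptions, not the paper’s exact pseudocode:

```python
def cluster_users(D, tau):
    """Group users whose pairwise graph distance is at most tau
    (single-linkage style; a hypothetical sketch of the clustering step).
    D is a symmetric matrix of graph distances, e.g. from definition (2)."""
    n = len(D)
    clusters, unassigned = [], set(range(n))
    while unassigned:
        seed = unassigned.pop()
        cluster, frontier = {seed}, {seed}
        while frontier:
            u = frontier.pop()
            near = {v for v in unassigned if D[u][v] <= tau}
            unassigned -= near
            cluster |= near
            frontier |= near
        clusters.append(sorted(cluster))
    return clusters

# a larger tau merges users into fewer, bigger clusters; a smaller tau
# splits them into more, smaller clusters, matching the discussion above
D = [[0.0, 0.2, 0.9], [0.2, 0.0, 0.8], [0.9, 0.8, 0.0]]
print(cluster_users(D, 0.5))  # e.g. [[0, 1], [2]]
```

The server then aggregates model parameters within each returned cluster, so that only users with similar graph structures train collaboratively.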
It is worth mentioning that Algorithm 1 is also applicable to the scenario of mobile edge computing. Each mobile user has its own structured data, which is represented by a graph, and its location changes in real time. Each user needs to determine the server closest to it in each round and upload its model parameter to that server. In this case, the threshold computation can be replaced by distance computation between users, or by cluster computation centered on the servers; the number of clusters is then fixed, namely the number of servers.
5. Conclusion and Discussion
In this paper, we propose a new algorithm for FGL in which all users are divided into several clusters according to graph distance, and users in the same cluster aggregate their local model parameters, thereby achieving knowledge interaction among users with similar graph structures. In practical applications, we found that the proposed algorithm has some flaws. One salient issue is that the algorithm is sensitive only to graph structure and insensitive to labels and data: even two graphs with the same structure may have significant differences in their data and label categories. It is difficult to achieve mutual fusion for graph data with significant differences in data categories, even though their graph structures may be very similar.