Hierarchical Modeling by Recursive Unsupervised Spectral Clustering and Network Extended Importance Measures to Analyze the Reliability Characteristics of Complex Network Systems

The complexity of large-scale network systems made of a large number of nonlinearly interconnected components restricts their modeling and analysis. In this paper, we propose a framework of hierarchical modeling of a complex network system, based on a recursive unsupervised spectral clustering method. The hierarchical model serves the purpose of facilitating the management of complexity in the analysis of real-world critical infrastructures. We exemplify this by referring to the reliability analysis of the 380 kV Italian Power Transmission Network (IPTN). In this analysis, the classical component Importance Measures (IMs) of reliability theory have been extended to render them compatible and applicable to a complex distributed network system. By utilizing these extended IMs, the reliability properties of the IPTN system can be evaluated in the framework of the hierarchical system model, with the aim of providing risk managers with information on the risk/safety significance of system structures and components.


Introduction
Critical infrastructures are engineered distributed systems which provide the fundamental support to modern industry and society. Examples are computer and communication systems, power transmission and distribution systems, rail and road transportation systems, oil/gas systems and water distribution systems. Failures of such systems can have multiple, transnational impacts of significant size [1-3]. Hence, identifying and quantifying the reliability and vulnerability of such systems is crucial for designing adequate protection, mitigation and emergency actions against failures [2].
These systems are exposed to multiple hazards and threats, some of which are even unexpected and emergent, and consist of a large number of elements whose interactions are not easily modeled and quantified, so that a complete analysis by exhaustive treatment cannot be pursued. As a result, the performance and reliability assessment of such "complex" systems has proved to be a non-trivial task in practice.
Recent studies suggest that many real complex network systems exhibit a modularized organization [4,5]. In many cases, these modularized structures are found to correspond to functional units within networks (ecological niches in food webs, modules in biochemical networks) [6]. Broadly speaking, clusters (also called communities or modules) are found in the network, forming groups of elements that are densely interconnected with each other but only sparsely connected with the rest of the network. Furthermore, hierarchically modularized organization, which is a central idea for the life process in biology [5,7], is also found to characterize the internal structure of many technological networks [8]. This suggests utilizing the hierarchical, modularized structure as a basis to model these complex systems, for their analysis and understanding [9].
In the analysis of systems with respect to their failure behavior, Importance Measures (IMs) are used to identify the weak points and quantify the impact of component failures [10,11]. IMs provide numerical indicators to determine which components are most important for system reliability improvement or most critical for system failure. Many different IMs have been proposed in the literature [12,13], among which classical and relevant ones are the Birnbaum [14], Fussell-Vesely [15] and Criticality Importance [16,17] measures. However, none of these measures can be applied directly to complex network systems, because of the distributed character of the functionality and service that they provide.
The purpose of this paper is twofold: firstly, to propose a scheme of recursive clustering to obtain a hierarchical modeling framework associated with virtual networks of varied size and granularity; secondly, to introduce Extended Importance Measures (EIMs), which are compatible with the distributed characteristics of complex network systems, to evaluate component importance in the framework of the hierarchical system representation.
The remainder of this paper is organized as follows: Section 2 presents the methodology of hierarchical modeling, taking the structure of the 380 kV Italian Power Transmission Network (IPTN) as an example for illustration; in Section 3, the basic terminal-pair connection reliability problem is first introduced, based on which the traditional IMs are extended and then calculated for the IPTN system; conclusions are drawn in Section 4.
Copyright © 2013 SciRes. AJOR

Network Representation
Graph theory provides a framework for the mathematical representation of complex networks. A graph $G = (V, E)$ is an ordered pair comprising a set $V$ of vertices (nodes) together with a set $E$ of edges (also called arcs or links), which are two-element subsets of $V$. The network structure is usually defined by the adjacency matrix, which specifies which pairs of nodes are connected by assigning a 1 to the corresponding element of the matrix; the element is 0 if there is no connection between the two nodes. As described, this type of graph is unweighted and undirected. A graph is weighted if a value (weight) is assigned to each edge, representing properties of the connection like cost, reliability, capacity, etc. For example, the matrix of physical distances is often used in conjunction with the adjacency matrix to describe a network also with respect to its spatial dimension [18,19].
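As a minimal sketch of the representation just described, the adjacency matrix of an undirected graph can be filled from an edge list, with an optional per-edge weight (e.g. a physical distance) giving the weighted variant. The function name `adjacency_matrix` and the 4-node ring are hypothetical illustrations, not the IPTN data.

```python
import numpy as np

def adjacency_matrix(n_nodes, edges, weights=None):
    """Symmetric adjacency matrix of an undirected graph.

    edges:   iterable of (i, j) node-index pairs (0-based)
    weights: optional per-edge values (e.g. line lengths); None gives the 0/1 matrix
    """
    A = np.zeros((n_nodes, n_nodes))
    for k, (i, j) in enumerate(edges):
        w = 1.0 if weights is None else float(weights[k])
        A[i, j] = A[j, i] = w  # undirected graph: a_ij = a_ji
    return A

# Hypothetical 4-node ring, not the IPTN data
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
A = adjacency_matrix(4, ring)                           # unweighted 0/1 matrix
W = adjacency_matrix(4, ring, weights=[10, 20, 15, 5])  # e.g. distances
print(A.sum(axis=1))  # node degrees: [2. 2. 2. 2.]
```

Storing both matrices side by side mirrors the common practice, mentioned above, of pairing the adjacency matrix with a distance matrix.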
In this paper, we take as an example for the proposed analyses the 380 kV Italian power transmission network (IPTN) (Figure 1). This network is a branch of the high-voltage transmission network, which can be modeled as a graph of $N = 127$ nodes ($N_G = 30$ generators and 97 distributors) connected by $M = 171$ links [20,21], defined by its adjacency matrix $A$ whose entries $a_{ij}$ are 1 if there is an edge joining node i to node j and 0 otherwise. In Figure 1, the generators, i.e. hydro and thermal power plants, are represented by squares, whereas the distribution substations are represented by circles.

Construct Network Hierarchy by Successive Clustering
Modularity is ubiquitous in many networks of scientific and technological interest, ranging from the World Wide Web to biological networks [7,22]. As a result, it is often possible to identify groups of elements that are highly interconnected with each other, but have only a few links to components outside of the group to which they belong. These communities usually combine into each other in a hierarchical manner [7], in which nodes form groups, groups join into groups of groups, and so forth, from the lowest level of organization (individual nodes) up to the level of the entire system. This suggests the development of a hierarchical structure to describe a complex network system at different levels of resolution, with the aim of managing the complexity of the system more effectively.
A successive Unsupervised Spectral Clustering Algorithm (USCA) [23], which is invariant to cluster shapes and densities and simple to implement, has been adopted in this study to build the hierarchical structure of the IPTN system. Cluster analysis aims at recognizing natural groups within classes of entities [24]. The problem is to assign categories to unlabelled data, exploiting the implicit information on the network structure encoded in its graph [25]. Consequently, modularity patterns within a complex network system can be revealed without a priori knowledge of their existence. A detailed description of the different clustering methods is beyond the scope of this article; for a systematic and synthetic review, the reader is referred to [24-26].
The USCA makes use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before Fuzzy c-Means (FCM) clustering in fewer dimensions. Schematically, it is performed by the steps [23] in Table 1.
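In the first step, the normalized Laplacian matrix $L_{sym}$ is calculated from the similarity (affinity) matrix $S$, whose degree matrix $D$ is diagonal with entries
$$d_{ii} = \sum_{j=1}^{n} s_{ij}. \quad (1)$$
Then, the normalized graph Laplacian matrix can be obtained:
$$L_{sym} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} S D^{-1/2}, \quad (2)$$
where $L = D - S$ and $I$ is the identity matrix of size $n \times n$.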
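The Laplacian normalization and the eigengap rule used to choose the number of clusters can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation of [23]: the helper names `normalized_laplacian` and `eigengap_k` and the toy similarity matrix (two disconnected triangles) are assumptions made for the example, and every node is assumed to have a positive degree.

```python
import numpy as np

def normalized_laplacian(S):
    """L_sym = I - D^{-1/2} S D^{-1/2}, where d_ii = sum_j s_ij."""
    d = S.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("every node needs at least one positive similarity")
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

def eigengap_k(L_sym):
    """Eigengap heuristic: k such that lambda_1..lambda_k are small and lambda_{k+1} is large."""
    eigvals = np.linalg.eigvalsh(L_sym)  # real eigenvalues in ascending order
    gaps = np.diff(eigvals)              # gaps[j] = lambda_{j+2} - lambda_{j+1} (1-based)
    return int(np.argmax(gaps)) + 1, eigvals

# Hypothetical similarity matrix: two triangles with no link between them
tri = np.ones((3, 3)) - np.eye(3)
S = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
k, eigvals = eigengap_k(normalized_laplacian(S))
print(k)  # 2: two zero eigenvalues, then a jump to 1.5
```

For each triangle the normalized Laplacian has eigenvalues 0, 1.5, 1.5, so the largest gap follows the second eigenvalue and the heuristic returns two clusters.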
By recursively applying the USCA to the data of the IPTN presented in Section 2.1 above, a five-level hierarchical structure of the system is constructed, which contains the complete system at the top and individual elements at the bottom (the top panel of Figure 2 shows the structure of the hierarchy, detailed in the first 3 levels).
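A compact sketch of the recursive procedure follows: each cluster is re-clustered by one spectral-embedding plus fuzzy c-means pass until a depth limit is reached. Several choices here are assumptions of this sketch rather than details confirmed by the paper or by [23]: the rows of the eigenvector matrix are normalized to unit length, at least two clusters are forced at each split, fuzzy memberships are hardened by maximum membership, and recursion stops at single nodes, isolated nodes or `max_depth`. All function names are hypothetical.

```python
import numpy as np

def normalized_laplacian(S):
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Plain FCM: alternate weighted centroids and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = np.fmax(dist, 1e-12) ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
    return U

def usca_split(S):
    """One pass: eigengap -> c, spectral embedding -> FCM -> hard labels."""
    vals, vecs = np.linalg.eigh(normalized_laplacian(S))
    c = max(int(np.argmax(np.diff(vals))) + 1, 2)  # ASSUMPTION: always split into >= 2
    T = vecs[:, :c]
    Y = T / np.fmax(np.linalg.norm(T, axis=1, keepdims=True), 1e-12)  # ASSUMPTION: row-normalized
    return fuzzy_c_means(Y, c).argmax(axis=1)  # defuzzify by maximum membership

def usca_hierarchy(S, nodes=None, depth=1, max_depth=5):
    """Recursively split clusters; returns nested dicts {'nodes', 'children'}."""
    nodes = list(range(len(S))) if nodes is None else nodes
    tree = {"nodes": nodes, "children": []}
    sub = S[np.ix_(nodes, nodes)]
    if depth >= max_depth or len(nodes) < 2 or np.any(sub.sum(axis=1) == 0):
        return tree  # leaf: depth limit, single node or isolated node
    labels = usca_split(sub)
    for lab in np.unique(labels):
        child = [nodes[i] for i in np.flatnonzero(labels == lab)]
        tree["children"].append(usca_hierarchy(S, child, depth + 1, max_depth))
    return tree

# Hypothetical data: two triangles joined by a weak link (weight 0.05)
tri = np.ones((3, 3)) - np.eye(3)
S = np.block([[tri, np.zeros((3, 3))], [np.zeros((3, 3)), tri]])
S[2, 3] = S[3, 2] = 0.05
tree = usca_hierarchy(S, max_depth=2)
print([child["nodes"] for child in tree["children"]])
```

Each level of the returned tree corresponds to one level of the hierarchy, with the whole system at the root and, given enough depth, single nodes at the leaves.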

Hierarchical Modeling of the Network
Based on the hierarchy structure resulting from the successive application of USCA, artificial networks can be defined at each layer. The artificial network at level $l$ of the hierarchy ($l = 1$ at the top) is described as a graph $G^{(l)} = (V^{(l)}, E^{(l)})$. We use $V_i^{(l)}$ to represent the artificial node i at level $l$, which corresponds to a cluster of real network nodes. Artificial nodes are connected by artificial links $E_{ij}^{(l)}$, composed of those actual network links connecting (in parallel) the actual nodes in the clusters forming the artificial nodes. The connection pattern between artificial nodes at level $l$ is described by an adjacency matrix $A^{(l)}$ whose element $a_{ij}^{(l)} = 1$ if there is at least one actual link connecting an actual node in $V_i^{(l)}$ to an actual node in $V_j^{(l)}$, and 0 otherwise.
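Given the cluster assignment of the actual nodes at one level, the artificial adjacency matrix and the composition of each artificial edge follow directly from the definition above. In this sketch (the function `artificial_network` and the toy data are hypothetical), links between nodes of the same cluster are absorbed into the artificial node, and every inter-cluster link is recorded as one of the parallel actual links forming the artificial edge.

```python
import numpy as np
from collections import defaultdict

def artificial_network(A, labels):
    """Collapse an actual network into the artificial network of one hierarchy level.

    A:      (N x N) 0/1 adjacency matrix of the actual network
    labels: labels[v] = index of the artificial node (cluster) containing actual node v
    Returns the artificial adjacency matrix and a dict mapping each artificial
    edge (i, j), i < j, to the actual links (s, t) that compose it in parallel.
    """
    n_art = max(labels) + 1
    A_l = np.zeros((n_art, n_art), dtype=int)
    members = defaultdict(list)
    N = A.shape[0]
    for s in range(N):
        for t in range(s + 1, N):
            if A[s, t] and labels[s] != labels[t]:  # intra-cluster links are absorbed
                i, j = sorted((labels[s], labels[t]))
                A_l[i, j] = A_l[j, i] = 1
                members[(i, j)].append((s, t))
    return A_l, dict(members)

# Hypothetical 6-node network: two triangles joined by links 1-4 and 2-3
A = np.zeros((6, 6), dtype=int)
for s, t in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (1, 4), (2, 3)]:
    A[s, t] = A[t, s] = 1
A_l, members = artificial_network(A, [0, 0, 0, 1, 1, 1])
print(A_l.tolist(), members)  # [[0, 1], [1, 0]] {(0, 1): [(1, 4), (2, 3)]}
```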
Figure 2 presents the hierarchy structure of the IPTN system and the artificial networks associated with the first 3 levels of the hierarchy. At the top of the hierarchy (i.e. $l = 1$), the network is a single unit, i.e. one artificial node $V_1^{(1)}$, which consists of all actual nodes. At the second level, the network is represented by several artificial nodes; the integer indicated in the figure in proximity of the generic i-th artificial node gives the number of actual nodes which compose it, e.g. a node labeled 38 is representative of a group of 38 actual network nodes. Note that at the bottom of the hierarchy we find the original network, i.e. each artificial node is an actual node and each artificial edge corresponds to an actual link.

The hierarchical model offers different levels of resolution at the different levels of the hierarchy. The artificial networks at the top of the hierarchy contain little detailed information on the local connectivity patterns (in the limit, only one node represents the whole network at the first level of the hierarchy); as we move down the hierarchy, more local information enters the model, at the expense of an increase in the dimension of the network. These characteristics can be leveraged to manage the complexity of a complex network system, which otherwise limits theoretical analysis and the ability to compute different reliability parameters for large or even moderate network systems [11]. In this respect, hierarchical modeling sets up a framework based on which reliability and vulnerability characteristics of complex network systems can be computed efficiently, thanks to the multi-scaled information representation scheme.

Terminal-Pair Reliability Assessment Problem
The terminal-pair or node-pair reliability (TPR) problem amounts to determining the probability of successful communication between a specified source node and a terminal node in a network, given the probability of success of each link and node in the network. Let us introduce a binary vector to represent the state of the network, i.e. the state $x$ of each of its $M$ edges and the state $y$ of each of its $N$ nodes, where $x_i = 1$ if edge $e_i$ is operating and 0 otherwise (similarly $y_j$ for node $v_j$). For simplicity of illustration, we assume that nodes cannot fail, while edges can (thus $y$ is no longer considered hereafter). The state of the network is defined as non-failure if the specified terminal pair is connected by at least one path of operating edges; otherwise it is failure. We then define the TPR as:
$$R_{sd} = P\{\phi_{sd}(\mathbf{x}) = 1\}, \quad (3)$$
where $\phi_{sd}$ is a binary function which indicates the connection availability between node pair s and d (1 = connection; 0 = no connection). Let us assume that each edge $e_i$ has an associated probability $p_i$ of being operating and a probability $q_i = 1 - p_i$ of being failed; then, the TPR of the network can be calculated as:
$$R_{sd} = 1 - \sum_{\mathbf{x}_k \in F} \; \prod_{e_i \in X_k^f} q_i \prod_{e_i \notin X_k^f} p_i, \quad (4)$$
where $\mathbf{x}_k$ represents the state of the network edges and $X_k^f$ is the set of failed edges for a given state $\mathbf{x}_k \in F \subseteq S$. All possible failure states are included in the subset $F$ of the set $S$ containing all possible scenarios (failure and non-failure); an inclusive TPR analysis requires considering all elements in $F$. Note that the implicit assumption underpinning Equation (4) is that the network edges are independent.
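Equation (4) can be evaluated literally on very small networks by enumerating all $2^M$ edge states, summing the probabilities of the states in which s and d are disconnected, and subtracting from one. The sketch below (hypothetical names `connected` and `tpr_enumeration`) does exactly that under the independence assumption; it is exponential in M and is only meant to make the formula concrete, not to replace the Modified Dotson algorithm used in the paper.

```python
import itertools

def connected(n_nodes, up_edges, s, d):
    """Depth-first search over operating edges: is d reachable from s?"""
    adj = {v: [] for v in range(n_nodes)}
    for a, b in up_edges:
        adj[a].append(b)
        adj[b].append(a)
    stack, seen = [s], {s}
    while stack:
        v = stack.pop()
        if v == d:
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def tpr_enumeration(n_nodes, edges, p, s, d):
    """Eq. (4): R_sd = 1 - sum over failure states of P(state), independent edges."""
    unreliability = 0.0
    for x in itertools.product((0, 1), repeat=len(edges)):  # all 2^M edge states
        up = [e for e, xi in zip(edges, x) if xi]
        if not connected(n_nodes, up, s, d):                # state belongs to F
            prob = 1.0
            for pi, xi in zip(p, x):
                prob *= pi if xi else 1.0 - pi
            unreliability += prob
    return 1.0 - unreliability

# Hypothetical network: two disjoint 2-edge paths 0-1-3 and 0-2-3
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
R = tpr_enumeration(4, edges, [0.9] * 4, s=0, d=3)
print(round(R, 4))  # 0.9639, i.e. 1 - (1 - 0.9**2)**2
```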
When the computational cost of evaluating Equation (4) is high (it grows exponentially with the number of network components), the artificial network at a suitable level of the hierarchy can be leveraged to carry out the TPR assessment. At the generic level $l$ of the hierarchy, the artificial link $E_{ij}^{(l)}$ is composed of actual network links in parallel; then, the reliability of the artificial edge $E_{ij}^{(l)}$ at level $l$ can be calculated by:
$$R\big(E_{ij}^{(l)}\big) = 1 - \prod_{e_{st} \in E_{ij}^{(l)}} q(e_{st}), \quad (5)$$
where $q(e_{st})$ indicates the failure probability of the actual link $e_{st}$ that in the real network connects nodes $v_s$ and $v_t$. Various algorithms to solve the classic TPR problem have been reported in the literature, with various computational efficiencies [29-31]. A so-called Modified Dotson algorithm [30], which has been claimed and tested to outperform others in computational time, is used here for the TPR assessment based on the hierarchical modeling. The failure probability of the transmission lines in the IPTN system is computed based on outage statistics provided in [32], by assuming that the edge failure probability is proportional to its length, with an average failure rate of 1.380635 occ/100 mile-year and an average outage duration time of 64.81 hours/occ. In the right panel of Figure 3, the connection reliability between nodes 1 and 127 of the IPTN network system (shown in the left panel) is reported as resulting from evaluations at each of the five levels of the hierarchical model described in the previous section. The right panel of Figure 3 gives the probabilities of connectivity failure between nodes 1 and 127 from level 2 to level 5 (top) and the corresponding computational times (bottom), both expressed relative to the maximum values of connectivity failure probability and computational time, which occur at the bottom of the hierarchy (level 5), corresponding to the whole network. The result at the first level is not shown, since its value is simply 0, i.e., nodes 1 and 127 are in a single unit and cannot be disconnected. One can see that the difference between the actual and estimated failure probabilities decreases as the assessment moves down to the bottom of the hierarchy, balanced by the computation time, which instead increases significantly. The decision maker can obtain satisfactory estimations of the failure probability at a hierarchical level of lower complexity, e.g. level 3, thus saving significantly in computation time.
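The parallel-link rule of Equation (5) and the length-proportional line failure probability can be sketched as follows. The conversion from failure rate and outage duration to a failure probability (expected outages per year times mean outage duration, as a fraction of the 8760 hours in a year) is an assumption of this sketch, a common small-unavailability approximation; the paper does not spell out its exact formula. The function names and the 50/120-mile lines are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760.0

def edge_unavailability(length_miles, rate_per_100mi_yr=1.380635, outage_h=64.81):
    """ASSUMED model: outages/year (proportional to length) x mean outage duration,
    expressed as a fraction of the year; meaningful only when the result is << 1."""
    outages_per_year = rate_per_100mi_yr * length_miles / 100.0
    return outages_per_year * outage_h / HOURS_PER_YEAR

def artificial_edge_reliability(q_links):
    """Eq. (5): the artificial edge fails only if all its parallel actual links fail."""
    return 1.0 - math.prod(q_links)

# Hypothetical artificial edge made of two actual lines of 50 and 120 miles
q = [edge_unavailability(50.0), edge_unavailability(120.0)]
print(q, artificial_edge_reliability(q))
```

Like Equation (4), the product in Equation (5) relies on the actual links failing independently.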

Component Extended Importance Measures
Component importance measures are widely used in system engineering to identify components within the system that most significantly influence the system behavior with respect to reliability, risk and/or safety.The indications drawn are valuable for establishing direction and prioritization of actions, related to reliability improvement during system design and optimization of operation and maintenance.
A well-known IM is the so-called Birnbaum IM, defined as (with reference to the system reliability $R_s$ as the system performance indicator) [14]:
$$I_i^{B} = \frac{\partial R_s}{\partial R_i} = R_s(1_i) - R_s(0_i), \quad (6)$$
where $I_i^{B}$ is the Birnbaum Importance (BI) of component i; $R_s$ represents the reliability of the system; $R_i$ is the reliability of component i; $R_s(1_i)$ is the system reliability calculated assuming that component i is perfectly operating and $R_s(0_i)$ the system reliability in the opposite case of component i failed. The BI measures the significance of component i to system reliability by the rate at which system reliability improves with the reliability of component i. As shown in Equation (6), the BI of component i does not depend on $R_i$ itself, so that two components i and j may have a similar value of $I^{B}$ although they have different reliability values $R_i$ and $R_j$, respectively; this could be seen as a limitation of BI. The Criticality Importance (CI) measure overcomes the above limitation by considering the component unreliability [17]. It is defined as:
$$I_i^{C} = I_i^{B}\,\frac{F_i}{F_s}, \quad (7)$$
where $F_i$ is the unreliability of component i and $F_s$ is the system unreliability. Now, a less reliable component is more critical than another one with the same value of $I^{B}$.
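The BI of Equation (6) and the CI defined right after it can be computed for any system whose reliability is available as a function of the component reliabilities, by setting component i to perfectly working and then to failed. The three-component system below (component 0 in series with the parallel pair 1, 2) and the function names are hypothetical illustrations.

```python
def system_reliability(p):
    """Hypothetical 3-component system: component 0 in series with (1 parallel 2)."""
    return p[0] * (1.0 - (1.0 - p[1]) * (1.0 - p[2]))

def birnbaum(R_s, p, i):
    """Eq. (6): I_B = R_s(1_i) - R_s(0_i) for independent components."""
    up, down = list(p), list(p)
    up[i], down[i] = 1.0, 0.0
    return R_s(up) - R_s(down)

def criticality(R_s, p, i):
    """CI: I_C = I_B * F_i / F_s, with F_i = 1 - p_i and F_s = 1 - R_s(p)."""
    return birnbaum(R_s, p, i) * (1.0 - p[i]) / (1.0 - R_s(p))

p = [0.99, 0.9, 0.9]
for i in range(3):
    print(i, round(birnbaum(system_reliability, p, i), 4),
          round(criticality(system_reliability, p, i), 4))
```

Here the series component has a BI ten times larger than each parallel component (0.99 against 0.099), yet all three end up with the same CI (about 0.4975), because the series component is also ten times more reliable; this is the effect of weighting BI by component unreliability.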
Fussell & Vesely [15] proposed an alternative importance measure according to which the importance of a component in the system depends on the number and on the order of the cut sets in which it appears [17]. Most commonly used as a risk reduction indicator, the Fussell & Vesely Importance (FVI) quantifies the maximum decrement in system reliability caused by a particular component being failed. The previously proposed IMs (BI, CI and FVI) are functionally different. They evaluate subtly different properties of the system behavior and, therefore, are often used in a complementary fashion to infer different information. To apply the IMs for analyzing a network system such as the IPTN, it is necessary to extend the definition of the IMs to account for the multiple terminal or node pairs (e.g. generator-distributor pairs) where connectivity defines the network functionality.
Specializing such extension for the analysis of the importance of components of the IPTN system, we introduce the Extended Birnbaum Importance (EBI) measure as the average of all BI values obtained considering all possible generator-distributor pair reliabilities in the network system:
$$I_i^{EB} = \frac{1}{N_G N_D} \sum_{s \in G} \sum_{d \in D} \frac{\partial R_{sd}}{\partial R_i},$$
where $G$ and $D$ are the sets of generator and distributor nodes, and $N_G$ and $N_D$ are the numbers of generator and distributor nodes, respectively; $R_{sd}$ is the TPR between node s and node d. Analogously, $I_i^{EC}$ denotes the Extended Criticality Importance (ECI) measure of component i and $I_i^{EFV}$ the Extended Fussell & Vesely Importance (EFVI) measure.
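The EBI averages the Birnbaum importance of a component over all generator-distributor pairs, each pair contributing through its own TPR. The sketch below assumes independent edges, evaluates each TPR by brute-force enumeration (feasible only for toy networks; the paper uses the Modified Dotson algorithm) and computes each pair's BI as $R_{sd}(1_i) - R_{sd}(0_i)$. The function names and the 3-node path with one generator are hypothetical.

```python
import itertools

def _reachable(n, up_edges, s, d):
    """Depth-first search over operating edges."""
    adj = [[] for _ in range(n)]
    for a, b in up_edges:
        adj[a].append(b)
        adj[b].append(a)
    stack, seen = [s], {s}
    while stack:
        v = stack.pop()
        if v == d:
            return True
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False

def tpr(n, edges, p, s, d):
    """Terminal-pair reliability by enumerating all edge states (independent edges)."""
    fail = 0.0
    for x in itertools.product((0, 1), repeat=len(edges)):
        if not _reachable(n, [e for e, xi in zip(edges, x) if xi], s, d):
            prob = 1.0
            for pi, xi in zip(p, x):
                prob *= pi if xi else 1.0 - pi
            fail += prob
    return 1.0 - fail

def extended_birnbaum(n, edges, p, generators, distributors, i):
    """EBI of edge i: mean over generator-distributor pairs of R_sd(1_i) - R_sd(0_i)."""
    up, down = list(p), list(p)
    up[i], down[i] = 1.0, 0.0
    pairs = [(g, d) for g in generators for d in distributors]
    return sum(tpr(n, edges, up, g, d) - tpr(n, edges, down, g, d) for g, d in pairs) / len(pairs)

# Hypothetical path 0-1-2 with generator 0 and distributors 1 and 2
edges, p = [(0, 1), (1, 2)], [0.9, 0.8]
ebi = [extended_birnbaum(3, edges, p, [0], [1, 2], i) for i in range(2)]
print([round(v, 6) for v in ebi])  # [0.9, 0.45]
```

The edge next to the generator scores higher because every generator-distributor connection depends on it, while the second edge matters only for the pair (0, 2).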

Numerical Example: Results and Discussions
The EIMs introduced have been calculated for the IPTN system at different levels of the hierarchical model of the system. For the evaluation, an artificial node functions as a generator as long as there is at least one actual generator node within it; otherwise it is simply a distributor. Tables 2 and 3 report the results of the importance assessment (EBI and EFVI are given in Table 2 and ECI in Table 3) for the second level of the hierarchy. EBI and EFVI rank all components in the artificial network identically, with only slight differences between the EBI and EFVI values, and the artificial edge {2-4} is the most important in the artificial network (see the bottom panel of Figure 2). This is due to the fact that this artificial edge is the only possible link between a generator in artificial node $V_2^{(2)}$ and the distributors in other artificial nodes, so that its disconnection would cause a large-scale generator-distributor connectivity failure. The rank based on ECI is different from that of EBI and EFVI, and the most important artificial edge is {3-4}. The difference lies in the definitions, as discussed before: EBI depends only on the structure of the system and not on the reliability of the considered component, whereas ECI takes the unreliability of the component into consideration. In fact, the artificial edge {3-4} is made of only one actual edge with a relatively high probability of failure, which leads to the highest ECI value.
By combining the indications of EBI and ECI, it is possible to offer advice to the decision maker [10]. When EBI and EFVI are high and ECI is low, as in the case of artificial edge {2-4}, the system safety can be improved by protecting against the failure of the component, e.g., by adding alternative edges between artificial node $V_2^{(2)}$ and the other artificial nodes. For the case of low EBI and EFVI and high ECI (like artificial edge {3-4}), the decision maker should invest in improvements of the component itself, to decrease its failure probability.
Tables 4 and 5 report the evaluation results at level 3 of the hierarchy. An artificial edge with high EBI and EFVI ranks has a low ECI rank (…), indicating that the system reliability is highly sensitive to its failure, whereas the component itself is relatively reliable. On the contrary, the artificial edge {1-10}, composed of only one actual edge {64-78}, is highly unreliable itself, and its EBI and EFVI values are both ranked 8th among all 17 edges. It is important to pay attention to artificial edges with relatively high ranks in both EBI & EFVI and ECI, which means not only that their failures cause a significant deterioration of the system reliability but also that they are vulnerable themselves. In this respect, by combining Tables 4 and 5, we find that artificial edges {1-11} (whose actual network link is {71-83}), {6-10} (composed of actual link {76-79}) and {10-12} (composed of actual links {75-88, 80-95}) are the three artificial edges most critical for the system reliability.
The bold edges in Figure 4 represent the edges of the actual network system which have resulted most critical based on the extended importance measure evaluation carried out at level 3 of the hierarchical model. These edges should be paid special attention. For links {110-111, 112-114, 107-109}, improving the defense in depth against their failures is advisable to improve the reliability of the system, while for {64-78, 71-83, 76-79, 75-88}, the edge unreliability should also be mitigated.
Tables 6 and 7 report the results of the EIM evaluation at level 4 of the IPTN hierarchical model. It turns out that the artificial edge corresponding to link {119-122} has the highest EBI and EFVI, and that the artificial edge corresponding to actual link {64-78} has the highest ECI rank and relatively high EBI and EFVI ranks, indicating its criticality to system reliability.
Finally, the computation time required for the calculation of the EIMs at the different levels of the hierarchy has also been tabulated: as expected, the further down the hierarchy, the higher the computation time.

Conclusions
The modeling and analysis of complex network systems is a non-trivial task. Related decision making regarding reliability and vulnerability is limited by computational resources.
In this work, we have introduced a framework for hierarchical modeling of complex network systems, which leads to the definition of artificial networks of different granularity. The hierarchical model is obtained by a recursive unsupervised spectral clustering method. The model thereby obtained provides a multi-scaled representation of the original network system, with more detailed information but high complexity at the lower levels of the hierarchy, and a simplified structure with relatively low complexity at the higher levels. The availability of different scales of modeling resolution allows management of the analysis at the level of detail desired for its purposes.

Acknowledgements
The authors are thankful to Dr.

Figure 1. The 380 kV Italian power transmission network.

Figure 2. The hierarchy structure of the IPTN system and associated artificial networks of the first three levels.

Figure 3. Illustrative example of terminal-pair reliability assessment of the IPTN system.

Figure 4. Most critical edges at level 3 of the hierarchical model.

Table 1. Unsupervised spectral clustering algorithm.
1. Compute the normalized Laplacian matrix $L_{sym}$ from the similarity matrix $S$.
2. Compute the eigenvalues and eigenvectors $u_1, \dots, u_n$ of $L_{sym}$. The first k eigenvalues are such that they are very small, whereas $\lambda_{k+1}$ is relatively large.
3. The number of clusters c is set equal to k, according to the eigengap heuristic [24].
4. Let $T \in \mathbb{R}^{n \times k}$ be the matrix containing the vectors $u_1, \dots, u_k$ as columns. For $i = 1, \dots, n$, let $y_i \in \mathbb{R}^k$ be the vector corresponding to the i-th row of $T$.
5. Resort to the FCM algorithm [27,28] to partition the data points $y_i$ into c clusters.