This research uses random networks as benchmarks for inferential tests of network structures. Specifically, we develop formulas for expected values and confidence intervals for four frequently employed social network centrality indices. The first study begins with analyses of stylized networks, which are then perturbed with increasing levels of random noise. When the indices achieve their values for fully random networks, the indices reveal systematic relationships that generalize across network forms. The second study then delves into the relationships between numbers of actors in a network and the density of a network for each of the centrality indices. In doing so, expected values are easily calculated, which in turn enable chi-square tests of network structure. Furthermore, confidence intervals are developed to facilitate a network analyst’s understanding as to which patterns in the data are merely random, versus which are structurally significantly distinct.

Many social network analyses begin with identifying actors with high centrality. Depending on the nature of the actors and the ties that link them, highly central actors may be actors with power or prestige, expected to be influential in the network ( [

It is not unusual for social network scholars to obtain centrality scores to understand a network structure in a descriptive sense, and perhaps to use the centrality indices as input into subsequent analyses of associations with actor attributes or with behaviors of the actors or the network. Descriptive statistics on networks are certainly informative, yet more information may be obtained via inferential statistical methods that allow hypotheses about network structure to be tested. For example, social network analysts might ask whether a set of actors is significantly different from others, so that conclusions about patterns that appear different can be drawn with greater confidence. This study is intended to contribute to the literature on inferential analyses of social network data. We take a relatively novel and straightforward approach to obtain the statistics, and we believe it is at least as important that this approach be easily implemented by users for their own social networks.

In building toward these statistics, we begin in Study 1 by examining the descriptive statistics for four popular centrality indices across several stylized networks. (These particular centrality indices are the four that are most frequently implemented in network texts and software). The network prototypes are drawn from the social networks literature which has frequently used clean structures to illustrate various characteristics of networks, from centrality features to cliques and equivalence patterns. We then add error to the networks so the structures of ties might resemble real data more closely than their stylized forms, and we compare the descriptive statistics as more noise is introduced into the network. The results show that as the networks become more errorful, their statistical profiles become similar, regardless of the initial structure of the network.

Given the convergence in similar centrality indices, we conduct a complementary investigation in Study 2, in which we focus on fully random networks. Networks of random ties can serve as a benchmark against which real network data may be compared to assess the extent to which the observed network structures are real patterns and phenomena versus merely those that would be observed in networks of random collections of ties. We use this information to develop formulas to test hypotheses about centrality scores. Specifically, we derive expected values for centrality indices and compare those against observed network centralities in a chi-square test, with follow-up z-tests. Additionally, we create confidence intervals to form bands within which observed network ties may be said to be expected at the rate of a random network, and beyond which, in either direction, higher or lower real centralities may be said to reflect truer, non-random network structure in the social network data.

It is our hope that this research and these tools might be useful in assisting network researchers in determining whether their observed sets of network patterns are reliable and statistically significant network structures, above and beyond random collections of ties. We derive these expected values, the statistical tests, and the confidence intervals for each of four popular centrality indices: degree, closeness, betweenness, and eigenvector centralities.

The paper is organized as follows. We first briefly review the centrality indices that comprise the focus of this research. In Study 1, we examine the centrality indices on three stylized networks, and observe the centrality profiles as random errors are introduced into the networks. In Study 2, we consider results for wholly random networks with varying numbers of actors and network densities. We use the results to derive more general formulas, for any number of actors and density, to obtain expected values and confidence intervals for each of the four centralities.

This research focuses on four centrality indices: degree, closeness, betweenness, and eigenvector centrality. Degree reflects overall volumes of ties, closeness captures the extent to which the relational ties travel via few “degrees of separation”, and betweenness highlights those actors through whom much of the rest of the network is interconnected. Eigenvector centrality is a weighted function that incorporates information about an actor’s connections with other actors who may themselves be highly central. Scholars have proffered additional centrality measures, but these four are prevalent across texts on social network analysis (e.g., [

Degree. The notion of an actor degree is intuitively understood as capturing the volume of interconnections ( [

$$C_{D\text{-}in}(i) = \sum_{j=1,\, i \neq j}^{g} x_{ji} \quad \text{and} \quad C_{D\text{-}out}(i) = \sum_{j=1,\, i \neq j}^{g} x_{ij}.$$

Actor degrees are normed as

$$C'_D(i) = \frac{C_D(i)}{g-1},$$

for ( g − 1 ) being the maximum number of links for an actor to the others in the network.

Closeness. An actor’s closeness is “based upon the degree to which [an actor] is close to all other [actors] in the [network]” ( [

$$C_C(i) = \frac{1}{\sum_{j=1,\, i \neq j}^{g} d_{ij}}.$$

At most, one actor may be as far as $(g-1)$ steps from another, so closeness indices are normed: $C'_C(i) = (g-1)\,[C_C(i)]$.
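The degree and closeness definitions above can be illustrated with a minimal Python sketch. It assumes a binary, symmetric star network with g = 21 actors (the size is an assumption, chosen because it reproduces the star-network means for degree and closeness reported later in the paper); the helper names are hypothetical and are not the authors' software.

```python
from collections import deque

def star_network(g):
    """Adjacency sets for a star: actor 0 is tied to every other actor."""
    adj = {i: set() for i in range(g)}
    for j in range(1, g):
        adj[0].add(j)
        adj[j].add(0)
    return adj

def normed_degree(adj, i):
    """C'_D(i) = C_D(i) / (g - 1)."""
    return len(adj[i]) / (len(adj) - 1)

def geodesic_distances(adj, source):
    """Breadth-first search: shortest path length from source to each reachable actor."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def normed_closeness(adj, i):
    """C'_C(i) = (g - 1) / sum_j d_ij; returns 0.0 if some actor is unreachable
    (a convention assumed here, not taken from the paper)."""
    g = len(adj)
    dist = geodesic_distances(adj, i)
    if len(dist) < g:
        return 0.0
    return (g - 1) / sum(dist.values())

adj = star_network(21)  # g = 21 is an assumption
mean_degree = sum(normed_degree(adj, i) for i in adj) / len(adj)
mean_closeness = sum(normed_closeness(adj, i) for i in adj) / len(adj)
print(round(mean_degree, 3), round(mean_closeness, 3))  # prints 0.095 0.536
```

The hub has normed closeness 1.0 and each peripheral actor 20/39, which is why the mean sits well below 1 even though the network is connected.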

Betweenness. Betweenness is “based upon the frequency with which [an actor] falls between pairs of other [actors] on the shortest or geodesic paths connecting them” ( [

$$C_B(i) = \sum_{j<k}^{g} \frac{g_{ijk}}{g_{jk}}.$$

The maximum value is [ ( g − 1 ) ( g − 2 ) ] / 2 , so this centrality is normed:

$$C'_B(i) = \frac{2\,C_B(i)}{(g-1)(g-2)} = \frac{2\,C_B(i)}{g^2 - 3g + 2}.$$
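As an illustration of the betweenness definition and its norming, the following Python sketch computes raw betweenness with a Brandes-style accumulation over breadth-first searches. This is a standard algorithm swapped in for illustration and is not necessarily how the authors computed their indices. The star network of g = 21 actors is again an assumption, and all function names are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Raw C_B(i) for an undirected, unweighted network given as adjacency sets."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack = []                       # actors in order of non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors on geodesics from s
        sigma = {v: 0 for v in adj}      # number of geodesics from s to v
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}    # dependency of s on each actor
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Each unordered pair (j, k) was counted once from each endpoint.
    return {v: c / 2 for v, c in cb.items()}

def normed_betweenness(adj):
    """C'_B(i) = 2 C_B(i) / ((g - 1)(g - 2))."""
    g = len(adj)
    return {v: 2 * c / ((g - 1) * (g - 2)) for v, c in betweenness(adj).items()}

g = 21  # assumed network size
star = {i: set() for i in range(g)}
for j in range(1, g):
    star[0].add(j)
    star[j].add(0)

normed = normed_betweenness(star)
mean_betweenness = sum(normed.values()) / g
print(normed[0], round(mean_betweenness, 3))
```

In a star the hub lies on every geodesic between peripheral actors, so it attains the maximum $(g-1)(g-2)/2$ and a normed value of 1, while every other actor scores 0.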

Eigenvector. [

Various research articles have certainly considered alternative centrality indices, such as information centrality ( [

Next, in Study 1, we examine the empirical performance of these four centrality indices on small social networks that have exemplar, stylized network structures drawn from the literature. We will then perturb the clean, prototype structures by adding random error to the network ties and observe the effects on the four centrality indices.

To understand the nature of the differences among the centrality indices, it should be useful to begin with clean networks, simple and classic in appearance.

The intention in selecting these particular stylized networks was to represent some variability across structural properties that might be reflected better by one or more of the centrality indices. Indeed, given these structures, it would not seem unreasonable to anticipate that some centrality indices may be more sensitive than others to certain elements of network structure. For example, one might expect the core-periphery network to have high degrees and high closeness, whereas the hierarchy might yield greater betweenness indices. Nevertheless, before proceeding, it is important to note that even if one might have preferred a different selection of network structures, the particular starting structures will soon cease to matter once random noise is added, as will become clear shortly.

To examine whether these relationships hold, we analyzed each network to obtain all four sets of centralities. (For simplicity, we constructed the adjacency matrices to be binary and symmetric). The means of each centrality computed across the actors are presented in

These stylized networks are exemplar structures that should epitomize network patterns for which one type of centrality index would be optimal to use. Yet the networks are so simple and clean that they do not seem particularly representative of real network data. Thus, we build further, adding noise to these stylized networks.

Noisy or errorful networks have been used to study numerous network phenomena ( [

Centrality Index | Network Structure: Hierarchy | Network Structure: Star | Network Structure: Core-Periphery |
---|---|---|---|
Degree | 0.095 | 0.095 | 0.167 |
Closeness | 0.360 | 0.536 | 0.470 |
Betweenness | 0.100 | 0.048 | 0.065 |
Eigenvector | 0.173 | 0.184 | 0.162 |

Carley, and Krackhardt (2006) studied networks of varying size (g = 10 to 100) and densities (1% to 90%) to understand the stability of network indices as network data are sampled, simulated by the addition or deletion of nodes or edges. The perturbations were more disruptive to network recovery than sampling variations on the nodes. [

Real datasets can serve as an acceptable proxy for the truth, against which the effects of sampling and adding noise may be compared. However, in real data, the extent to which betweenness or closeness, say, should reflect elements of the true, underlying network structure is unknown. Hence, we will add noise to our previously analyzed stylized networks to gauge the sensitivities of the centrality indices, having begun from bases with pure, known network patterns.

We continue this investigation by perturbing the network structures by adding random error from a uniform distribution. Specifically, for each network structure, and each cell in the sociomatrix, a 0 (1) was changed to a 1 (0) with probabilities that ranged from 0.0 to 0.5. For example, for a probability of 0.2, on average, 80% of the ties in the network remained the same, with 20% reversals. Changes were made to the upper triangle of the matrix, and then copied to the lower triangle so as to maintain symmetry. Once the sociomatrix was revised, the centrality indices were calculated. This process was repeated 100 times for each combination of network structure and level of error.
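The perturbation step just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `perturb`, the small six-actor star used as input, and the seed are all hypothetical.

```python
import random

def perturb(x, p, rng):
    """Return a noisy copy of the symmetric binary sociomatrix x.

    Each cell in the upper triangle is reversed (0 -> 1 or 1 -> 0) when a
    uniform draw on [0, 1) falls below p; the result is copied to the lower
    triangle so the matrix stays symmetric. The diagonal is left at 0.
    """
    g = len(x)
    y = [row[:] for row in x]
    for i in range(g):
        for j in range(i + 1, g):
            if rng.random() < p:
                y[i][j] = 1 - y[i][j]
            y[j][i] = y[i][j]
    return y

# Hypothetical input: a star with 6 actors, actor 0 at the center.
g = 6
star = [[0] * g for _ in range(g)]
for j in range(1, g):
    star[0][j] = star[j][0] = 1

rng = random.Random(42)          # seed chosen arbitrarily for reproducibility
noisy = perturb(star, 0.2, rng)  # on average 20% of the ties are reversed
for row in noisy:
    print(row)
```

Repeating the call 100 times per combination of network structure and error level, and recomputing the centralities each time, would mirror the design described above.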

The results are presented in

In the results, note that the mean centrality indices for each network begin at slightly different values (per

As error is introduced, the average degree centrality increases for each network, as does the average closeness centrality. The average betweenness centrality decreases slightly (not having far to drop from low initial values). The eigenvector indices remain stable, increasing only a modest amount (e.g., from 0.162 to 0.214 for the core-periphery network).

As the level of noise added approaches 50%, the centrality indices converge. This result may be anticipated by some, and is certainly easy to understand in hindsight. Specifically, the indices should converge regardless of which of the three starting networks they are calculated on, because when error is added at the 50% level, the network is at its noisiest, essentially a random network, with none of the original inherent structure remaining. The hierarchy network, for example, began in its clean form depicted in

This finding makes sense because the networks at the right in each panel of

Therefore, next we shall show that in their random states, the four centralities may be derived as a function of the size of the network or the number of actors, g, and the network’s density, the proportion of extant ties. In the section that follows, we show this relationship to be precisely true for degree centrality, and approximately true for closeness, betweenness, and eigenvector centralities.

In this next investigation, we consider the purely random network form to understand what the centrality indices may be measuring in such contexts. We will derive the expected values for degree centralities analytically, and empirically calculate the expected values for closeness, betweenness, and eigenvector centralities. In addition, we will provide the formulas for confidence intervals for each of the centrality indices.

1) At the left are the stylized networks presented in

2) The scenario of “Stylized + Noise” were networks that were analyzed and mean centralities presented in the panels of

3) The fully random networks will be analyzed in

To proceed in the creation of the random networks, we varied networks in size, g = 50, 100, 150 actors, and densities = 0.1, 0.3, 0.5, 0.7. For each combination of parameters, e.g., g = 50 and density = 0.7, we created a network with those specifications and proceeded to calculate the four centrality indices, noting their descriptive statistics and correlations. This process was repeated 100 times for each combination.

Let us consider first the mean centralities, presented in

For the closeness centrality indices in a 10% density network, the values are 0.41, 0.45, and 0.48 for g = 50, 100, 150 respectively, whereas in a 70% density network, the values are 0.77, for all g. We will formulate the expected values for closeness shortly.

The results are different for the betweenness and eigenvector centralities. Both remain relatively constant regardless of the density of the network or the network size, g. The average betweenness scores range from 0.03 (for 10% density) to 0.01 (for 70% density) for g = 50, 0.01 to 0.003 for g = 100, and 0.01 to 0.002 for g = 150. The changes in the average eigenvector scores are similarly modest over the same range of densities, increasing from 0.13 to 0.14 for g = 50, from 0.095 to 0.099 for g = 100 actors, and from 0.079 to 0.082 for g = 150 actors. It is perhaps sensible that the betweenness and eigenvector centralities are rather insensitive to differences in densities, even if neither is explicitly normed to adjust for the prevalence of ties. Betweenness values reflect inter-connectivity, and eigenvectors the direct and indirect ties, both doing so regardless of the overall volume of ties. Indeed, both presumably stabilize with increased density because, by definition, the direct connections increase, leaving fewer indirect paths remaining, which affects both betweenness and eigenvector scores. Note that, by comparison, these results almost suggest that social network analysts should introduce a normative adjustment for degree and closeness centralities to account for density (not just network size), so as to tease out that confound from indices intended to reflect actors’ patterns of connections.

Across the panels in

These various results on fully random networks can be used to derive baselines for the purposes of comparing real network structures and determining the extent of the validity of the inherent patterns in the network ties. For example, for the mean centralities depicted in

$$\text{Density} = \frac{\#\,\text{ties}}{\#\,\text{possible}} = \frac{\sum_{i=1}^{g} \left( \sum_{j=1,\, j \neq i}^{g} x_{ij} \right)}{g(g-1)}$$

Given the formula for a degree centrality is:

$$C_D(i) = \sum_{j=1,\, j \neq i}^{g} x_{ij}$$

Then density may also be written as:

$$\text{Density} = \frac{\sum_{i=1}^{g} C_D(i)}{g(g-1)}.$$

Furthermore, note the mean degree centrality is:

$$\overline{C_D} = \frac{\sum_{i=1}^{g} C_D(i)}{g}$$

so:

$$\text{Density} = \frac{\overline{C_D}}{g-1}.$$

Thus, if we have the mean centrality, dividing it by $(g-1)$ yields density, or if we have a network’s density, we can multiply it by $(g-1)$ to obtain the mean centrality. Note also that given the normed degree centrality is:

$$C'_D(i) = \frac{C_D(i)}{g-1}$$

the mean normed centrality would be:

$$\overline{C'_D} = \frac{\sum_{i=1}^{g} C'_D(i)}{g} = \frac{1}{g} \sum_{i=1}^{g} \frac{C_D(i)}{g-1} = \frac{\overline{C_D}}{g-1}$$

So density may also be written as a function of the mean normed centrality. Specifically:

$$\text{Density} = \overline{C'_D}.$$

Thus, if we create a random network with density 0.7, say, then the mean of the normed degree centrality indices will be 0.7 also. In

As just previewed, now that this relationship has been established, its nature is rather intuitive―if the overall density is 0.7, then on average, one would expect a degree centrality to be 0.7 if the network was a manifestation of only random sets of ties. If the network were real data depicting real network structures, presumably the set of actor degrees would vary from 0.7, some being lower and some higher as ties cluster around some actors but not others.
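The identity between density and the mean normed degree can be checked numerically. The Python sketch below assumes that a random network is generated by including each possible tie independently with probability equal to the target density; the paper does not spell out its generation mechanism, so this is an assumption, as are the function names and the seed.

```python
import random

def random_symmetric_network(g, density, rng):
    """Binary symmetric sociomatrix; each tie is present with probability `density`."""
    x = [[0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            if rng.random() < density:
                x[i][j] = x[j][i] = 1
    return x

def network_density(x):
    """Number of ties present divided by the number possible, g(g - 1)."""
    g = len(x)
    ties = sum(x[i][j] for i in range(g) for j in range(g) if i != j)
    return ties / (g * (g - 1))

def mean_normed_degree(x):
    """Mean over actors of C'_D(i) = C_D(i) / (g - 1)."""
    g = len(x)
    normed = [sum(x[i][j] for j in range(g) if j != i) / (g - 1) for i in range(g)]
    return sum(normed) / g

rng = random.Random(1)  # arbitrary seed
x = random_symmetric_network(100, 0.7, rng)
density = network_density(x)
mean_degree = mean_normed_degree(x)
print(round(density, 4), round(mean_degree, 4))  # the two values coincide
```

The two quantities agree for any sociomatrix, random or not; randomness only determines how close both land to the nominal 0.7.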

Next we create a chi-square calculation to compare real network data to a random network, to highlight the structural elements of the real network that are not shared by a random network (cf., [

For example, let us begin with the case for degree centralities. Upon receipt of network data, a network modeler knows immediately the size of the network, g, and can easily obtain the density. From that, as just shown, the expected value of the normed degree centrality would be the density value (i.e., $\overline{C'_D} = \text{Density}$), and the expected value of the raw (non-normed) degree centrality would be $\overline{C_D} = \text{Density}\,(g-1)$. That is, if the network showed no particular structure varying across the actors, then each actor would have a degree centrality of approximately $\overline{C_D}$.

These expected values can be used in the familiar chi-square test:

$$X^2_{(g-1)} = \sum_{i=1}^{g} \frac{(o_i - e_i)^2}{e_i}$$

where $o_i = C_D(i)$, $e_i = \overline{C_D}$, and $X^2$ would be distributed as (tested for significance against) the $\chi^2$ on $(g-1)$ degrees of freedom. This statistical distribution is applicable given that the ties are binary, distributed Bernoulli individually, summed to binomial and approximated by the normal distribution (the sum of squares being distributed chi-square; [

As an example, consider the core-periphery network depicted in _{i}’s would be 0.17 ( g − 1 ) = 0.17 ( 20 ) = 3.40 . The o_{i}’s seen in

$$X^2 = 14\left[\frac{(1-3.40)^2}{3.40}\right] + 7\left[\frac{(8-3.40)^2}{3.40}\right] = 14(1.694) + 7(6.224) = 67.282$$

compared to $\chi^2$ on $(21-1)$ degrees of freedom, yielding $p < 0.0001$, indicating that the actors’ degree centralities indeed vary significantly from a uniformly distributed expected degree as if the ties were random.
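The core-periphery calculation can be reproduced with a few lines of Python. The sketch below uses the rounded expected value of 3.40 from the text. For the p-value it relies on the closed-form upper tail of the chi-square distribution that holds only when the degrees of freedom are even, a shortcut chosen here to avoid external libraries and not something taken from the paper; the function names are hypothetical.

```python
import math

def chi_square_statistic(observed, expected):
    """X^2 = sum_i (o_i - e_i)^2 / e_i with a common expected value e."""
    return sum((o - expected) ** 2 / expected for o in observed)

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x), valid only for even df:
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!"""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# 14 peripheral actors with degree 1 and 7 core actors with degree 8 (g = 21).
observed = [1] * 14 + [8] * 7
expected = 3.40  # 0.17 * (g - 1), as rounded in the text
x2 = chi_square_statistic(observed, expected)
p_value = chi2_upper_tail_even_df(x2, df=len(observed) - 1)
print(round(x2, 3), p_value < 0.0001)  # prints 67.282 True
```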

Given that chi-squares are most applicable to frequencies, the raw, non-normed degree centralities should be used, as in the example just shown. If a network researcher wished to work with the normed degrees, it is easy to show that the $X^2$ would merely need to be scaled up by multiplying the normed $X^2$ by $(g-1)$, to cancel the effect of the norming having previously divided by $(g-1)$. That is, $X^2 = (g-1)\,X^2_{\text{normed}}$.
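The scaling can be verified in one line. The short derivation below assumes that the normed observed and expected values are simply $o_i/(g-1)$ and $e_i/(g-1)$, which follows from the norming of degree given earlier:

```latex
\begin{aligned}
X^2_{\text{normed}}
  &= \sum_{i=1}^{g} \frac{\left( \frac{o_i}{g-1} - \frac{e_i}{g-1} \right)^2}{\frac{e_i}{g-1}}
   = \frac{1}{g-1} \sum_{i=1}^{g} \frac{(o_i - e_i)^2}{e_i}
   = \frac{X^2}{g-1}, \\
\text{so} \quad X^2 &= (g-1)\, X^2_{\text{normed}}.
\end{aligned}
```

The squared numerator contributes a factor of $1/(g-1)^2$ and the denominator returns one factor of $(g-1)$, leaving a single $1/(g-1)$.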

Given that the chi-square statistic is comprised of the sum of squared elements each of which is distributed as a z-statistic ( [

standardized residuals, $z = \frac{o_i - e_i}{\sqrt{e_i}}$ (each piece to be squared in the chi-square formula above) may be compared to a z-distribution, e.g., for 95% confidence level at ±1.96. To continue with the core-periphery example just analyzed, any of the actors with a degree centrality of “1” (namely those in the periphery) would be deemed not significantly different from what a random network would yield, because

$$z = \frac{1 - 3.4}{\sqrt{3.4}} = -1.30 > -1.96,$$

whereas the actors with degrees of “8” (that is, those in the core) have centralities that significantly exceed those which would result from a random network:

$$z = \frac{8 - 3.4}{\sqrt{3.4}} = 2.49,$$

which exceeds 1.96. Obviously real networks will have finer gradations of degree centralities, and each observed degree centrality value can be tested in this manner.
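A minimal Python sketch of this actor-by-actor follow-up is given below. It uses the same expected degree of 3.40 and the two-sided 95% cutoff of 1.96 from the text; the function names are hypothetical.

```python
import math

def standardized_residual(observed, expected):
    """z = (o_i - e_i) / sqrt(e_i), the signed square root of one chi-square term."""
    return (observed - expected) / math.sqrt(expected)

def differs_from_random(observed, expected, cutoff=1.96):
    """True when the residual falls outside the +/- cutoff band."""
    return abs(standardized_residual(observed, expected)) > cutoff

expected = 3.40
z_periphery = standardized_residual(1, expected)
z_core = standardized_residual(8, expected)
print(round(z_periphery, 2), differs_from_random(1, expected))  # prints -1.3 False
print(round(z_core, 2), differs_from_random(8, expected))       # prints 2.49 True
```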

Expected values are more challenging to derive analytically for closeness, betweenness, and eigenvector centralities; however, they are easily obtained empirically, through the generation of numerous random networks for fixed g’s and densities. To do so, we generated 100 random networks each for 15 levels of varying g (10, 20, 30, …, 150) and 9 levels of density (0.1, 0.2, …, 0.9). The means for the standardized degree, closeness, betweenness, and eigenvector centralities, as well as their standard deviations, were obtained.
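The empirical approach can be sketched for one cell of this design. The Python example below estimates the expected normed closeness for g = 50 and density 0.7 by averaging over simulated networks. It assumes ties are generated independently with probability equal to the density, uses 20 replications instead of 100 to keep the run short, and assigns a closeness of 0 to any actor who cannot reach everyone; these are all assumptions of the sketch, and the function names are hypothetical.

```python
import random
from collections import deque

def random_network(g, density, rng):
    """Adjacency sets for a symmetric network with independent random ties."""
    adj = [set() for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            if rng.random() < density:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def normed_closeness(adj, i):
    """(g - 1) / sum of geodesic distances from actor i (0.0 if disconnected)."""
    g = len(adj)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if len(dist) < g:
        return 0.0
    return (g - 1) / sum(dist.values())

def simulated_mean_closeness(g, density, replications, seed=0):
    """Average, over random networks, of the mean normed closeness across actors."""
    rng = random.Random(seed)
    means = []
    for _ in range(replications):
        adj = random_network(g, density, rng)
        means.append(sum(normed_closeness(adj, i) for i in range(g)) / g)
    return sum(means) / len(means)

estimate = simulated_mean_closeness(g=50, density=0.7, replications=20)
print(round(estimate, 2))
```

The estimate should land close to the mean closeness of 0.77 reported earlier for 70% density networks. Looping over the 15 levels of g and 9 levels of density, and storing standard deviations as well as means, would produce the kind of reference table that the regressions summarize.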

Rather than presenting 15 × 9 tables of reference values, those values were submitted to regressions to replicate the tabled findings and also to allow for estimates of expected centralities for g’s or densities not tabled, such as g = 27, or 134, etc. Thus we used the results as a database in which we regressed the average degree (or closeness, or betweenness, or eigenvector) centrality against the predictor factors of g and density. The resulting equations appear in

It is not surprising that the $R^2$ for degree centrality is nearly perfect, given the analytical solution shown previously (indeed this empirical approach was not necessary for degree centralities, given the explicit analytical solution just presented). The $R^2$ values for eigenvector and betweenness centralities are high enough to suggest that the equations can be useful.

The $R^2$ for the closeness centralities is very weak, so those forecasts should be considered very approximate. We sought better predictive models for closeness, and obtained increases in $R^2$ to levels of 0.3 and 0.4, but the models seemed convoluted, e.g., adding predictive terms such as $g^2$ (in addition to $\ln(g)$), or the interaction between the number of actors and ties, but these terms were not significant. There was one outlier observation: for a normed closeness of 1.0, the standardized residual was very large, 3.0, but we did not delete that observation, because statistically speaking these random samples were as likely as any other random samples, and to purify the results in this manner seemed arbitrary and potentially misleading. Next, given that closeness centralities reflect distances, in a manner not true of degree, eigenvectors, or even betweenness, we sought alternatively to model the raw, non-normed closeness centralities (on the same network data) and were more successful: the regression equation,

$$0.279 + 0.00084\,g - 0.0075\,\text{density} - 0.0776\,\ln(g),$$

resulted in an $R^2 = 0.562$. The coefficient for density was not significant, and the reduced model, $0.275 + 0.00084\,g - 0.0776\,\ln(g)$, fit nearly as well, $R^2 = 0.559$, certainly better than the $R^2$ for the normed closeness values. Still, it seems reasonable to conclude that predicting expected values for closeness needs further study.

Predictor | Degree* b | Degree* p | Eigenvector b | Eigenvector p | Closeness b | Closeness p | Betweenness b | Betweenness p |
---|---|---|---|---|---|---|---|---|
Intercept | −0.00063048 | | 0.50379 | | 1.74651 | | 0.14623 | |
g | | | 0.00059528 | <0.0001 | 0.00458 | 0.009 | 0.00030033 | <0.0001 |
density | 1.00082 | <0.0001 | 0.01969 | <0.0001 | 0.28678 | 0.004 | −0.02882 | <0.0001 |
ln(g) | | | −0.10348 | <0.0001 | −0.36897 | 0.0003 | −0.03423 | <0.0001 |
$R^2$ | 0.9999 | | 0.9635 | | 0.1789 | | 0.7116 | |

*Given the analytical derivation for expected degree, this prediction equation would be simply of the form: Expected Normed Degree Centrality = 1.0 (density). Expected centrality values may be calculated using the spreadsheet available from the authors or the SAS code in the Appendix.

As an example of using these equations, a network with g = 30 actors and 25% density would yield expected values of betweenness centralities of 0.03 (read the table above within a column):

$$\begin{aligned} \text{Expected Normed Betweenness Centrality} &= 0.14623 + 0.00030033\,(g) - 0.02882\,(\text{density}) - 0.03423\,(\ln(g)) \\ &= 0.14623 + 0.00030033\,(30) - 0.02882\,(0.25) - 0.03423\,(3.4012) \\ &= 0.031612 \end{aligned}$$

The network’s actual (normed) betweenness centralities may be compared to that expected base. Say there were 10 actors with betweenness centralities of 162.4 (normed betweenness of 0.4), 10 actors with betweenness centralities of 81.2 (normed at 0.2), and 10 actors with betweenness centralities of 0 (normed at 0.0). A $X^2$ may be calculated:

$$\begin{aligned} X^2 &= 10\left[\frac{(0.4-0.03)^2}{0.03}\right] + 10\left[\frac{(0.2-0.03)^2}{0.03}\right] + 10\left[\frac{(0.0-0.03)^2}{0.03}\right] \\ &= 10(4.563) + 10(0.963) + 10(0.030) = 45.63 + 9.63 + 0.30 = 55.56 \end{aligned}$$

On $(g-1) = 29$ degrees of freedom, the critical value of chi-square is $\chi^2 = 42.56$, which the observed $X^2$ value exceeds; alternatively, the observed chi-square yields a probability value of 0.002. That is, for this hypothetical scenario, the set of 30 actors’ betweenness centralities are significantly different from the values of betweenness that would be expected for a network with a random distribution of ties among 30 actors with 25% ties present.
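The worked betweenness example for g = 30 and 25% density can be checked with a short Python sketch. The coefficients are those reported for betweenness in the regression table, the expected value is rounded to 0.03 as in the text, and the critical value of 42.56 is taken from the text instead of being computed; the function name is hypothetical.

```python
import math

def expected_normed_betweenness(g, density):
    """Regression-based expected value, using the tabled betweenness coefficients."""
    return 0.14623 + 0.00030033 * g - 0.02882 * density - 0.03423 * math.log(g)

g, density = 30, 0.25
expected = expected_normed_betweenness(g, density)
expected_rounded = round(expected, 2)  # 0.03, the value carried into the chi-square

# Hypothetical observed normed betweenness scores from the example in the text.
observed = [0.4] * 10 + [0.2] * 10 + [0.0] * 10
x2 = sum((o - expected_rounded) ** 2 / expected_rounded for o in observed)

critical_value = 42.56  # chi-square, 29 degrees of freedom, alpha = 0.05 (from the text)
print(round(expected, 6), round(x2, 2), x2 > critical_value)
```

Summing the unrounded terms gives roughly 55.57; the 55.56 in the text comes from rounding each bracketed term to three decimals before multiplying by 10.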

The formulas in

Predictor | Degree b | Degree p | Eigenvector b | Eigenvector p | Closeness b | Closeness p | Betweenness b | Betweenness p |
---|---|---|---|---|---|---|---|---|
Intercept | 0.26271 | | 0.27074 | | 2.35759 | | 0.18596 | |
g | 0.00038748 | <0.0001 | 0.00054664 | <0.0001 | 0.00473 | 0.11 | 0.00050449 | <0.0001 |
density | | | −0.06384 | <0.0001 | −0.67384 | <0.0001 | −0.03206 | <0.0001 |
ln(g) | −0.05663 | <0.0001 | −0.06160 | <0.0001 | −0.53919 | 0.002 | −0.04801 | <0.0001 |
$R^2$ | 0.8511 | | 0.7338 | | 0.2472 | | 0.6403 | |

To continue with the example of a network with g = 30 actors and 25% density, and an expected value for betweenness centralities of 0.03 from

network would contain all of its centrality values within the span of the random-derived confidence interval. A real network will very likely contain many of its centralities therein, but the difference is that some centralities will fall outside the bounds of the confidence interval.

The different states of the actors’ centralities may be correlated with other information. Actors who have significantly lower or higher centralities may differ in political party affiliation, gender, ethnicity, or attitude from actors whose centralities fall in the span of random values. That is, these inferential tests allow network researchers not only to identify which actors are significantly different but also, in conjunction with other explanatory variables, to explain those distinctions and that variability.

This investigation considered the usefulness of random networks, whereby the lack of structure provides a beneficial baseline to determine a stochastic likelihood of substantial structure over random structure. Study 2 used wholly random networks of varying size and densities to derive expected values, with a combination of analytical and empirical derivations. Illustrations of the calculation of a chi-squared statistic on the expected values, and confidence intervals using the expected values and their standard deviations, were also provided.

For convenience, software for calculating expected values of centralities as in

The establishment of expected values, the chi-square tests, the z-score follow-up tests, and the confidence intervals are all important contributions to continue building on the inferential arm in the social network analysis literature. For both the network scholar and for the scholar’s intended audience, tests of hypotheses enable conclusions about what effects in the network are “real” in a manner offered with greater statistical confidence than the presentation of merely descriptive statistics.

This research considered four key centrality indices: degree, closeness, betweenness, and eigenvectors. Study 1 began with forms of stylized networks expected to exemplify conditions under which the particular sensitivities of the four centralities should be clearest. Depending on the nature and content of the relational ties, some centrality indices seem more applicable or meaningful than others. As noise was added to the network ties, any distinctiveness was erased to the point that the average centrality indices converged across the network structures. This observation was suggestive; thus Study 2 focused on networks comprised entirely of random ties.

Study 2 focused on fully random networks. The examination of random ties allowed for the development of several comparative benchmarks. First, expected values were derived analytically for degree centralities and empirically for closeness, betweenness, and eigenvector centralities. The accuracies of the betweenness and eigenvector centrality estimators (and of course, degree) were at reasonably acceptable levels, but further research will be required to clarify the closeness estimators. Future research can also envelop non-binary and directed ties.

The expected values then enabled further tools for inferential tests of network structures, including first the chi-square test and its follow-up analyses for the micro-level examination, actor by actor to establish whether a set of actor centralities exceeded random patterns. Next, standard deviations were derived, which allowed for the construction of confidence intervals, similarly for the purpose of testing and demonstrating whether a set of observed centralities fell within the realm of random parameters or were significantly different from random, thereby indicating their more systematic and substantial patterns and natures.

In any given real network, many centrality values will be near the expected values, but the network as a whole would not be considered random unless all values were near their expected values, with none being statistically different. Instead, when actors’ centralities are significantly different from values expected in random networks, researchers can be confident that there is indeed something structurally interesting about those actors. Furthermore, those differences may be investigated to learn whether the different classes of actors (significantly lower, significantly higher, or random levels of centrality indices) are correlated with or explained by independent variables of theoretical interest to the network scholar.

This research builds further on the literature for inferential methods for analyzing social network data. Many centrality indices were originally created as descriptive statistics, without accompanying statistical distributions to test the significance of their observed values. The statistics offered in this paper were derived using random networks that served conceptually as a comparison. Observed centrality indices may now be tested against those standards to test the hypothesis as to whether the apparent network structure is random, or the pattern of ties is connected in a more meaningful way. Descriptive statistics are certainly informative; however, an inferential approach goes a step further in allowing hypothesis testing about network structure, in turn enabling conclusions based less on subjective judgment and more on stronger grounds of statistical confidence.

We believe these techniques are easily implemented (see the Appendix). We hope they lend complementary insight to understanding actors in social network data.

The authors declare no conflicts of interest regarding the publication of this paper.

Iacobucci, D., McBride, R., Popovich, D.L. and Rouziou, M. (2018) Confidence Intervals for Assessing Sizes of Social Network Centralities. Social Networking, 7, 220-242. https://doi.org/10.4236/sn.2018.74017

SAS Code to Generate Expected Values and Confidence Intervals for Degree, Eigenvector, Closeness, and Betweenness Centralities

proc iml;

g = 30; *<-- enter number of actors, g, here;

density = 0.25; *<-- enter density (or approximation) here;

* expected values from the regression equations for the means;

degree = 1.0 * density;

print "expected degree =" degree;

eigenv = 0.50379 + (0.00059528 * g) + (0.01969 * density) - (0.10348 * (log(g)));

print "expected eigenvector =" eigenv;

closen = 1.74651 + (0.00458 * g) + (0.28678 * density) - (0.36897 * (log(g)));

print "expected closeness =" closen;

between = 0.14623 + (0.00030033 * g) - (0.02882 * density) - (0.03423 * (log(g)));

print "expected betweenness =" between;

* standard deviations from the regression equations for the standard deviations;

sdd = 0.26271 + (0.00038748 * g) - (0.05663 * (log(g)));

sde = 0.27074 + (0.00054664 * g) - (0.06384 * density) - (0.0616 * (log(g)));

sdc = 2.35759 + (0.00473 * g) - (0.67384 * density) - (0.53919 * (log(g)));

sdb = 0.18596 + (0.00050449 * g) - (0.03206 * density) - (0.04801 * (log(g)));

* lower and upper 95% confidence limits for each centrality;

lowcid = degree - (sdd * (1/(g-1)) * (tinv(0.975, (g-1))));

hicid = degree + (sdd * (1/(g-1)) * (tinv(0.975, (g-1))));

lowcie = eigenv - (sde * (1/(g-1)) * (tinv(0.975, (g-1))));

hicie = eigenv + (sde * (1/(g-1)) * (tinv(0.975, (g-1))));

lowcic = closen - (sdc * (1/(g-1)) * (tinv(0.975, (g-1))));

hicic = closen + (sdc * (1/(g-1)) * (tinv(0.975, (g-1))));

lowcib = between - (sdb * (1/(g-1)) * (tinv(0.975, (g-1))));

hicib = between + (sdb * (1/(g-1)) * (tinv(0.975, (g-1))));

print "95% Confidence Interval for Degrees:" lowcid "to" hicid;

print "95% Confidence Interval for Eigenvector:" lowcie "to" hicie;

print "95% Confidence Interval for Closeness:" lowcic "to" hicic;

print "95% Confidence Interval for Betweenness:" lowcib "to" hicib;

quit;