Who Is Collaborating with Whom in Science? Explanation of a Fundamental Principle

For many decades, power functions have been well known for counting single scientists or co-author pairs in social networks. In this paper, however, we present a procedure for visualizing a bivariate distribution of co-author pairs' frequencies, which produces three-dimensional graphs. This distribution is explained by a fundamental principle of social group formation and described by a mathematical model. The model is applied to 52 co-authorship networks. For 96% of them the squared multiple R is larger than 0.98, and for 77% of the 52 networks it is even larger than 0.99. The visualized social Gestalts, in the form of three-dimensional graphs, are largely identical to the corresponding empirical distributions. Question: Can we expect this mathematical model to be generally valid for co-authorship networks?


Introduction
What does "Fundamental Principles of Social Group Formations" mean? There are both visible and non-visible formations. Our paper presents the visualization of a special non-visible fundamental principle of social group formations in co-authorship networks. In this connection, a mathematical model for the intensity function of interpersonal attraction (Social Gestalt) is explained.

Fundamental Principles of Social Group Formations (Visible and Non-Visible)
Already in the prehistoric times of higher vertebrates (birds and mammals), several fundamental principles of well-ordered social group formations existed, and some of them are also identifiable in humans. For example, group formations of fish or birds are well known. These kinds of group formations can be observed with our own eyes (cf. Figures 1 and 2), and their shapes can change depending on the changing environment (cf. Figure 2). However, well-ordered social group formations in co-authorship networks are non-visible and have to be visualized by special methods. Their changing shapes are presented and explained by a mathematical model for the intensity function of interpersonal attraction (Social Gestalt); for an example, cf. Figure 3. In the present paper we are looking for a well-known fundamental principle of social group formations (social Gestalts).
We have created methods for visualizing the corresponding well-ordered social Gestalts. Like the shapes shown above, they are well ordered, and their shapes change depending on the changing environment and/or the scientists. Result: We have visualized social Gestalts in co-authorship networks concerning the question: Who is collaborating with whom in science?

Brief Literature Review: Who Is Collaborating with Whom in Science?
Cooperation in science has been increasing rapidly for many decades. This has led to an increase in the number of scientific studies of this topic internationally [1,2].
The literature contains a wide variety of studies on the question "Who is collaborating with whom". A few of them are mentioned here:
• On the level of cross-national collaboration, Glänzel & Schubert [3] have presented the cross-national preference structure of 36 selected countries. The study revealed that geopolitical location, cultural relations and language are determining factors in shaping preferences in co-authorship. Liang & Zhu [4] found for inter-regional co-operation in China that geographical proximity is an important factor. Nagpaul [5] studied international research collaboration and concluded that geographical proximity has a greater positive impact on the preference in collaboration than thematic proximity and socio-economic proximity. Geographical proximity has also been studied by Havemann, Heinz & Kretschmer [6] and by Katz [7].
• On the individual level, Braun, Schubert & Glänzel [8] have described the features of productivity and co-publication patterns of four types of authors. For this purpose, the authors were classified according to their anterior and posterior records, and the collaboration pattern between the four types was studied.
The above-mentioned studies are important in general and of special interest for science and technology policy. Additionally, other approaches in the field of collaboration studies come from the informetric point of view and are related to the mathematical modelling of collaboration networks.
Egghe's paper [9] gives a model for the size-frequency function of co-author pairs. Starting from the well-known Lotka's Law [10], Egghe has emphasized [9] that another way of looking at author production is to study co-author pairs and the number of their joint papers. In Egghe's opinion, this topic is becoming more and more important since collaboration increases over time.
Egghe has referred both to Morris and Goldstein's approach [11] regarding univariate distributions and to Kretschmer & Kretschmer's approach [12] regarding a bivariate distribution, which produces three-dimensional graphs.
In a univariate distribution, counting authors as in Lotka's distribution is replaced by counting co-author pairs; this still produces two-dimensional graphs similar to Lotka's distribution.
Whereas the univariate approaches by Egghe [9] and by Morris & Goldstein [11] do not address the aspect "Who is collaborating with whom", Kretschmer & Kretschmer's [12] bivariate distribution focuses on this question: Does any fundamental principle exist regarding the preference in collaboration between individual authors?

Hypotheses
Depending on the productivities of the collaborators, a special fundamental principle of social group formation is a determining factor in shaping preferences in co-authorship between individual scientists. The corresponding well-ordered distributions of co-author pairs' frequencies and the changing shapes of these distributions (for an example, cf. Figure 3) can be described by a mathematical model.

Theoretical Approach
At this point, a brief introduction of the methods used is necessary for further explanation of the theoretical approach. An extended version of the methods can be found in the Appendix.

Brief Introduction of the Used Methods
The number of publications i per author P is determined by the "normal count procedure": each time the name of an author appears, it is counted. The co-author pairs P, Q are counted under the condition of both the first authors P count and the sec- … A bivariate distribution of co-author pairs' frequencies $N_{ij}$ is studied, producing three-dimensional graphs. If the place of P or Q in the by-line is not considered, a symmetrical matrix, and consequently a symmetrical graph, results (for an example, cf. Table 1).
However, these distributions were restricted to $i \le 31$.
Stochastic noise usually increases with higher productivity because the number of authors decreases. In this paper we intend to overcome these problems with the help of the logarithmic binning procedure. Newman proposed in 2005 [16] to use the logarithmic binning procedure for the log-log scale plot of power functions. To get a good fit of a straight line (log-log scale plot of power functions, for example Lotka's distribution), we need to bin the data into exponentially wider bins. Each bin is a fixed multiple wider than the one before it. For example, choosing the multiplier 2, we obtain the intervals 1 to 2, 2 to 4, 4 to 8, 8 to 16, etc. The number of samples in a bin is divided by the width of the bin (Δi) to get a count per unit interval of i. In other words, the new value in a bin i is simply the arithmetic average of all the points in the bin. The sequence of the bins i is i = 1, 2, 4, 8, 16, 32, 64, 128, 256, …; the same holds for the sequence of the bins j.
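The univariate procedure just described can be sketched as follows. The function name and the example counts are hypothetical, and the bins are taken as half-open intervals [1, 2), [2, 4), [4, 8), …, which is one reading of "1 to 2, 2 to 4, …".

```python
def log_bin_1d(counts, multiplier=2):
    """Logarithmic binning of a univariate distribution (sketch).

    counts maps i (publications per author, i >= 1) to the number of authors
    with exactly i publications. Bins are half-open: [1, 2), [2, 4), [4, 8), ...
    Returns (bin_start, bin_width, count_per_unit_interval) triples.
    """
    if not counts:
        return []
    max_i = max(counts)
    result = []
    start, width = 1, 1
    while start <= max_i:
        end = start + width
        total = sum(n for i, n in counts.items() if start <= i < end)
        # dividing by the width gives the arithmetic average over the bin
        result.append((start, width, total / width))
        start, width = end, width * multiplier
    return result


# Hypothetical counts: 100 authors with 1 paper, 30 with 2, 14 with 3, ...
example = {1: 100, 2: 30, 3: 14, 4: 8, 5: 5, 6: 3, 7: 2, 8: 1}
print(log_bin_1d(example))
# [(1, 1, 100.0), (2, 2, 22.0), (4, 4, 4.5), (8, 8, 0.125)]
```

The widths double from bin to bin, so on a log scale the bin starts 1, 2, 4, 8 are evenly spaced, while the division by the width keeps the values comparable as counts per unit interval of i.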

We have extended this procedure for use in bivariate distributions (for details, cf. Appendix). The width of a bin ($\mathrm{cell}_{ij}$) in the matrix is the product of Δi and Δj: (Δi × Δj).
The number of co-author pairs in a bin ($\mathrm{cell}_{ij}$) has to be divided by the width of the bin (Δi × Δj). The result is the arithmetic average of all the points in this bin.
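The bivariate extension can be sketched in the same way. Here the input is a dictionary of unbinned counts keyed by (i', j'); each output cell is the sum of the counts falling into it divided by Δi·Δj. The function names are hypothetical and the bins are again assumed to be half-open.

```python
def bin_edges(max_value, multiplier=2):
    """Return (start, width) pairs (1, 1), (2, 2), (4, 4), ... covering max_value."""
    edges, start, width = [], 1, 1
    while start <= max_value:
        edges.append((start, width))
        start, width = start + width, width * multiplier
    return edges


def log_bin_2d(n, multiplier=2):
    """Bivariate logarithmic binning of co-author pair counts (sketch).

    n maps (i, j) -> number of co-author pairs between authors with i and
    authors with j publications. Returns (bin_i, bin_j) -> average per unit
    cell, keyed by the lower bin edges 1, 2, 4, 8, ...
    """
    if not n:
        return {}
    max_i = max(i for i, _ in n)
    max_j = max(j for _, j in n)
    binned = {}
    for si, wi in bin_edges(max_i, multiplier):
        for sj, wj in bin_edges(max_j, multiplier):
            total = sum(v for (i, j), v in n.items()
                        if si <= i < si + wi and sj <= j < sj + wj)
            # divide the cell sum by the cell width (delta_i * delta_j)
            binned[(si, sj)] = total / (wi * wj)
    return binned


# Hypothetical symmetric counts
print(log_bin_2d({(1, 1): 4, (1, 2): 3, (2, 1): 3, (2, 3): 2, (3, 2): 2, (3, 3): 6}))
```

For the example, the cell with bin_i = bin_j = 2 collects the counts for i', j' ∈ {2, 3}, i.e. (0 + 2 + 2 + 6) / (2 · 2) = 2.5.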
Using the log-log-log presentation after the logarithmic binning procedure, we start in this paper with the following kind of matrix (cf. Table 2) as the basis for three-dimensional graphs. The sequence of log i (rows) is log i = 0, 0.301, 0.602, 0.903, 1.204, 1.505, 1.806, 2.107, 2.408; the same holds for log j (columns). The main diagonal of a square matrix is the diagonal which runs from the top left corner to the bottom right corner, with

$|\log i - \log j| = c_m$

The secondary diagonal is the diagonal of a square matrix running from the lower left entry to the upper right entry, with

$\log i + \log j = c_s$

$c_m$ and $c_s$ are constants.
Visualization of the Social Gestalts: For visualization of the theoretical patterns (Social Gestalts), the Function Plot of SYSTAT is used; for the empirical patterns, the Scatterplot is used. (The detailed procedure for visualizing the empirical data of the distribution of co-author pairs' frequencies in comparison with the theoretical Gestalt can be found in the Appendix (Heading 5).) An example of the predicted places of the empirical values in the theoretical pattern (Social Gestalt) is shown in Figure 4, together with the overlay of the corresponding empirical data.
Regarding the three-dimensional graphs, we assume that the fundamental principle of social group formations is one of the determining factors in shaping preferences in co-authorship between individual scientists (cf. Section 1.3, Hypotheses). Therefore we have applied the corresponding mathematical model for describing co-author pairs' frequencies [12]. This model is presented in Section 2.2, "Model for the Intensity Function of Interpersonal Attraction (Social Gestalt)".
For example, $N_{i'j'} = N_{2,3} = 7$.
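The coordinates of the log-log-log matrix and the diagonal properties can be checked numerically. This sketch assumes base-10 logarithms (which reproduces the sequence 0, 0.301, …, 2.408 above) and a 9 × 9 matrix with bins 1 to 256; the helper names are hypothetical.

```python
import math

# log10 of the bin edges 1, 2, 4, ..., 256 (rows: log i, columns: log j)
LOGS = [math.log10(2 ** k) for k in range(9)]


def main_parallel(offset):
    """|log i - log j| along the parallel `offset` >= 0 cells away from the main diagonal."""
    return {round(abs(LOGS[r] - LOGS[r + offset]), 9)
            for r in range(len(LOGS) - offset)}


def secondary_parallel(offset):
    """log i + log j along the cells with row + column == 8 - offset.

    offset = 0 is the secondary diagonal; other offsets are its parallels.
    """
    s = len(LOGS) - 1 - offset
    return {round(LOGS[r] + LOGS[s - r], 9)
            for r in range(len(LOGS)) if 0 <= s - r < len(LOGS)}


print([round(x, 3) for x in LOGS])
# [0.0, 0.301, 0.602, 0.903, 1.204, 1.505, 1.806, 2.107, 2.408]
```

Each returned set holds a single value, i.e. one constant per diagonal or parallel (0 on the main diagonal itself), which is what makes the diagonals natural axes for the two Yin/Yang pairs.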
Complementarities are a crucial determinant of this model. In conjunction with these complementarities, various shapes of the distribution of observed co-author pairs' frequencies emerge. Two contrasting shapes (convex and concave) are selected for presentation in Figure 5.

Table 2. Square matrix of co-author pairs ($\log N_{ij}$). Note: The main diagonal is colored green and the secondary diagonal red.
The comparison between the curvilinear lines (red and green) in Figure 5 (convex and concave) is also shown in two-dimensional graphs (Figure 6). These two-dimensional graphs are used to explain the model in Section 2.2.

Model for the Intensity Function of Interpersonal Attraction: Social Gestalt (Partially Presented in [12])

General Remarks
Interpersonal attraction is a major area of study in social psychology. Whereas in physics attraction may refer to gravity or to the electromagnetic force, interpersonal attraction can be thought of as a force acting between two people that tends to draw them together. To achieve predictive accuracy when measuring interpersonal attraction, one must refer to the qualities of the attracted as well as to the qualities of the attractor. Qualities may be, for example, age or productivity. It has been suggested that, to determine attraction, personality and situation (or environment) must be taken into account. The notion "birds of a feather flock together" points out that similarity of qualities is a crucial determinant of interpersonal attraction. Do birds of a feather flock together, or do opposites attract? This leads to a model of complementarities.
The modern notion of complementarity introduced by Niels Bohr already existed in a clear-cut manner in ancient Chinese thought, in the Yin/Yang teaching. Yin and Yang have to be seen as polar forces of only one whole, as complementary tendencies interacting dynamically with each other, so that the entire system is kept flexible and open to change (expressed, for example, in the varying shapes of the social Gestalt).
Against this background, Kretschmer [17,18] has already created a model of social Gestalts valid for social networks in general. This model has also been applied to describe the distribution of co-author pairs' frequencies $N_{ij}$ [12]. The present paper shows a special extension using the logarithmic binning procedure.
The fundamental "formula" of Gestalt theory might be expressed in this way: there is a whole, the behaviour of which is not determined by that of its individual elements, but where the part-processes are themselves determined by the intrinsic nature of the whole [19]. Holistic organizational patterns that comprise man and the environment play a role here. These holistic entities are often designated as psychological fields. Their tendency towards a stable state of order is called the conciseness (or "Prägnanz") tendency; it is a "tendency towards a good Gestalt". The stable final state is, if possible, built up in a simple, well-ordered, harmonic and uniform manner in line with definite rules. Several authors take the view that these fields can be described mathematically.
With this in mind, the distribution of observed co-author pairs' frequencies can be considered to be a special reflection of a social Gestalt.
The model of social Gestalts describes mathematically, and explains verbally, well-ordered fields of mutual attraction between a large number of individual persons. In conformity with complementarities or Yin/Yang theory, these fields change their shapes depending on changing personalities and/or situations.
In brief:
• Holistic entities with balancing interactions of forces are considered both by Gestalt psychology and Yin/Yang teaching.
• The "tendency towards a good Gestalt" is a special feature of the Gestalt psychology.
• The complementarities are a special feature of the Yin/Yang theory.
In conclusion, the social Gestalt is a model that comprises the "tendency towards a good Gestalt", while its definite rules are obtained by considering the complementarities.
The model is based on two independent Yin/Yang pairs:
• The first Yin/Yang pair is connected with the secondary diagonal and its parallels (cf. Table 2), with the complementary poles:
  - similarities of attractors' qualities (called "Yang" as a metaphor);
  - dissimilarities of attractors' qualities (called "Yin").
• The second Yin/Yang pair is connected with the main diagonal and its parallels, with the complementary poles:
  - high average level of attractors' qualities (called "Yang");
  - low average level of attractors' qualities (called "Yin").
"Independent Yin/Yang pairs" means that the complementary tendencies of one Yin/Yang pair interact dynamically with each other independently of the complementary tendencies of the other Yin/Yang pair. A special function is defined for each of the Yin/Yang pairs: $Z_A$ for the first pair and $Z_B$ for the second.
Because of this independence, we assume that the intensity of mutual attraction $Z_{XY}$ is proportional to the product of the two functions $Z_A$ and $Z_B$:

$Z_{XY} \propto Z_A \cdot Z_B$

In detail: we assume that the intensity structure of mutual attraction $Z_{XY}$ can be described by a special combination of power functions, where X is the value of a special personality characteristic (or quality) of the attracted person and Y is the value of the same personality characteristic (or quality) of the attractor (and, in the case of mutual attraction, also vice versa).
Whereas the empirical shapes of power-law distributions are rather similar to each other, the empirical three-dimensional distributions show manifold shapes.

Function $Z_A$ for the Description of the First Yin/Yang Pair
The function $Z_A$ is a combination of two special power functions. The crucial determinant of interpersonal attraction (similarity or dissimilarity, respectively) suggests considering the distances A between the qualities of persons,

$A = |X - Y|$

as the independent variable of a power function:

First power function: $(A + 1)^{\alpha}$

The 1 is added because log A is not defined for A = 0.
A power function with only one parameter (unequal to zero) is either monotonically declining or monotonically rising; with reference to the two proverbs, it therefore expresses either only "Yin" or only "Yang" (Figure 7).
To fulfil the inherent requirement that both proverbs, with their extensions, can be included in the representation, a second step of approximation follows.
The similarity is highest at the minimum of A and lowest at the maximum; vice versa, the dissimilarity is highest at the maximum and lowest at the minimum.
Moreover, similarity and dissimilarity vary in a complementary way: with increasing dissimilarity between persons, the similarity decreases, and vice versa.
Thus, the model of complementarities leads to the conclusion that, in addition to A, the complements of these distances ($A_{\text{complement}}$) should be used as the independent variable of the second power function:

Second power function: $(A_{\text{complement}} + 1)^{\beta}$

A is a variable with the two opposite poles $A_{\min}$ and $A_{\max}$. The sum of $A_{\min}$ and $A_{\max}$ is a constant. Thus,

$A_{\text{complement}} = A_{\min} + A_{\max} - A$

That means the variable $A_{\text{complement}}$ increases by the same amount as the variable A decreases, and vice versa (cf. Table 3).

First Yin/Yang pair:

$Z_A = (A + 1)^{\alpha} \, (A_{\text{complement}} + 1)^{\beta}$

The first Yin/Yang pair is related to the secondary diagonal and its parallels. The relationships of the two parameters α and β to each other determine the expressions of the complementarities (similarities, dissimilarities) in each of the shapes.
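To see how the relationship between α and β shapes $Z_A$, the sketch below evaluates it along A. It assumes the power-function forms $(A + 1)^{\alpha}$ and $(A_{\text{complement}} + 1)^{\beta}$ with $A_{\text{complement}} = A_{\min} + A_{\max} - A$, and takes $A_{\min} = 0$ and $A_{\max} = 3$ (the range of |log i − log j| when at most 1000 publications per author are allowed); the function names and parameter values are hypothetical.

```python
def z_a(x, y, alpha, beta, a_min=0.0, a_max=3.0):
    """Sketch of Z_A = (A + 1)**alpha * (A_complement + 1)**beta (assumed form)."""
    a = abs(x - y)                      # distance between the two qualities
    a_complement = a_min + a_max - a    # complement: rises as A falls
    return (a + 1) ** alpha * (a_complement + 1) ** beta


def shape(alpha, beta, steps=30, a_max=3.0):
    """Classify Z_A as a function of A on the grid 0, a_max/steps, ..., a_max."""
    values = [z_a(a_max * k / steps, 0.0, alpha, beta, a_max=a_max)
              for k in range(steps + 1)]
    diffs = [later - earlier for earlier, later in zip(values, values[1:])]
    if all(d < 0 for d in diffs):
        return "decreasing"   # similar partners favoured (Yang-like)
    if all(d > 0 for d in diffs):
        return "increasing"   # dissimilar partners favoured (Yin-like)
    peak = max(range(len(values)), key=values.__getitem__)
    trough = min(range(len(values)), key=values.__getitem__)
    if 0 < peak < steps:
        return "hump"         # interior maximum
    if 0 < trough < steps:
        return "trough"       # interior minimum
    return "mixed"


for a, b in [(-1, 1), (1, -1), (1, 1), (-1, -1)]:
    print(a, b, shape(a, b))
```

With α < 0 < β both factors fall as A grows, so $Z_A$ decreases; with β < 0 < α it increases; with equal signs the two factors pull against each other and an interior maximum or minimum can appear, giving the intermediate patterns between the two poles.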
The middle of Figure 8 shows eight patterns created on the basis of varying α and β.
In every box of the figure, the difference |X − Y| is always the abscissa and $Z_A$ the ordinate axis. The middle of the abscissa is X − Y = 0. The relationships of the two parameters to each other determine the expressions of Yin and Yang in each of the patterns.
Figure 8 shows only eight patterns as typical examples. However, according to Chinese philosophy, Yin and Yang are the opposite poles of a single whole. There is no isolated, exclusive Yin, nor only a Yang. All transitions occur in a direct and uninterrupted sequence. The natural order is secured by the dynamic equilibrium between Yin and Yang. This means that, theoretically, an infinite number of patterns would be needed in Figure 8 to present all of the possible patterns.
For explanation, we characterise only 4 types of similar patterns:
• the Yang pattern (upper pattern);
• the Yin pattern (pattern below);
• the Yin-in-Yang pattern (left pattern); and
• the Yang-in-Yin pattern (right pattern).
While in the upper pattern Yang is more likely to be in the foreground (birds of a feather flock together), the pattern below reveals that Yin is more likely to be accentuated (opposites attract).
Starting from the left side of the upper pattern in the direction of the pattern below, Yang retracts itself from pattern to pattern in favour of Yin. Vice versa, starting from the right side of the pattern below in the direction of the upper pattern, Yin retracts itself from pattern to pattern in favour of Yang. The patterns in Figure 8 are arranged so that opposite patterns show the opposite meaning regarding Yin and Yang (for example: "Yang pattern"/"Yin pattern" or "Yin-in-Yang pattern"/"Yang-in-Yin pattern", etc.).

Table 3. Example:
Note: log-log presentations of the patterns in Figure 6 can have shapes similar to the 8 original patterns shown in Figure 8. The third pattern from the second row of Figure 6, attached at the upper right of Figure 8, shows that Yang is more likely to be in the foreground (birds of a feather flock together). The fourth pattern from the second row of Figure 6, attached at the lower left of Figure 8, shows that Yin is more likely to be in the foreground (opposites attract). A further pattern from the empirical results is shown at the upper left of Figure 8: a Yin-in-Yang pattern.
The 8 patterns in the middle of Figure 8 can be found on the secondary diagonal and its parallels.

Function $Z_B$ for the Description of the Second Yin/Yang Pair
For the purpose of completion, the sum B = X + Y, as the opposite of the difference A = |X − Y|, is taken as the independent variable of the third power function, and its complement ($B_{\text{complement}}$) is the independent variable of the fourth power function. In analogy to A and $A_{\text{complement}}$:

$B_{\text{complement}} = B_{\min} + B_{\max} - B$

Second Yin/Yang pair: $Z_B$ combines the third and fourth power functions (including Equation (11)).
The second Yin/Yang Pair is related to the main diagonal and its parallels.
This second dimension is orthogonal to the first dimension. Thus, these 8 patterns can be found on the main diagonal and its parallels. The upper patterns of Figure 6 (the curvilinear lines along the main diagonal, convex and concave) are attached at the top of Figure 9 for comparison.

Function $Z_{XY}$: Intensity Function of Interpersonal Attraction (Social Gestalt)
Because the shapes of the second Yin/Yang pair (Figure 9) can vary independently of the shapes of the first Yin/Yang pair (cf. Figure 8), we assume that the intensity of mutual attraction $Z_{XY}$ is proportional to the product of the two functions $Z_A$ and $Z_B$.
Intensity function of interpersonal attraction (Social Gestalt):

$Z_{XY} = c \cdot Z_A \cdot Z_B$

with c = constant and Equations (5), (8), (11) and (15). The measurement of the variables X, Y and $Z_{XY}$, including Equations (20) and (21), depends on the studied object:

$X_{\min} = Y_{\min}$ (20)

$X_{\max} = Y_{\max}$ (21)

We assume that the frequencies of social interactions rise proportionally with rising $Z_{XY}$. Examples of social interactions are collaboration, friendships, marriages, etc., while examples of characteristics or qualities of the individual persons (X or Y) are age, labor productivity, education, professional status, etc. This bivariate intensity function produces three-dimensional graphs. Under the influence of changes in people and environment, the three-dimensional social Gestalts can change their shape, resulting in a diversity of social Gestalts. These many Gestalts are classified into prototypes (cf. Figure 10) in the sense of well-ordered patterns or "good Gestalts". Several empirical social Gestalts matching the 5 prototypes were already selected and presented in Kretschmer 2002. The distribution of co-author pairs' frequencies $N_{ij}$ is one of these examples. Its non-logarithmic presentation is similar to the left prototype; however, in this paper we show only the corresponding log-log-log presentation.
There is a conjecture by de Solla Price [20], physicist and science historian, that the logarithm of the number of publications is of a higher degree of importance than the number of publications per se.

Model for the Intensity
Thus, using the logarithm of the number of publications (log i or log j, respectively) as the personal characteristic "productivity", we define:

$X = \log i, \quad Y = \log j$

Thus:

$X_{\min} = Y_{\min} = \log 1 = 0$

Let us lay down a specific value for the maximum possible number of publications i (or j, respectively) of an author as a standard for such studies, one that does not vary with the given sample. It is assumed that the maximum possible number of publications of an author is 1000, i.e.:

$X_{\max} = Y_{\max} = \log 1000 = 3$

The theoretical mathematical function for describing the social Gestalts of the distribution of co-author pairs' frequencies then takes the following logarithmic version:

$N_{ij} = c \cdot Z_A \cdot Z_B, \quad X = \log i, \; Y = \log j$

with c = constant and with Equations (22) and (23).
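Taking logarithms makes the structure of the model explicit. The block below is a sketch under stated assumptions: $Z_A = (A+1)^{\alpha}(A_{\text{complement}}+1)^{\beta}$ with $A = |X - Y|$; $Z_B$ is assumed to have the analogous form $(B+1)^{\gamma}(B_{\text{complement}}+1)^{\delta}$ with $B = X + Y$, where γ and δ are hypothetical names for its two exponents; logarithms are base 10 with $X, Y \in [0, 3]$, so that $A \in [0, 3]$ and $B \in [0, 6]$.

```latex
\begin{aligned}
N_{ij} &= c\,(A+1)^{\alpha}\,(A_{\text{complement}}+1)^{\beta}\,(B+1)^{\gamma}\,(B_{\text{complement}}+1)^{\delta},\\
X &= \log i,\quad Y = \log j,\quad A = |X - Y|,\quad B = X + Y,\\
A_{\text{complement}} &= 3 - A,\qquad B_{\text{complement}} = 6 - B,\\
\log N_{ij} &= \log c + \alpha\log(A+1) + \beta\log(4-A) + \gamma\log(B+1) + \delta\log(7-B).
\end{aligned}
```

Because $\log N_{ij}$ is linear in $\log c$, α, β, γ and δ under these assumptions, the parameters could be estimated by ordinary multiple linear regression over the non-empty bins, which directly yields a squared multiple R; this is one possible route, not necessarily the one used for the fits reported here.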
Although the field of mutual attraction (or social Gestalt) does not completely determine individual pairs of persons, in the sense of predicting these individual pairs, the force that emanates from this field generates a statistically balanced evenness among all the individual pairs in their totality (tendency towards a good social Gestalt). This statistically balanced evenness is enhanced as the co-author pairs' frequencies rise; we refer to Bernoulli's law of large numbers.
Large sample sizes can be obtained in studies of large bibliographies (journals or research areas). The application of the logarithmic binning procedure is another method for obtaining this statistically balanced evenness. In conjunction with the model of complementarities, there are different shapes of social Gestalts depending on changing personality and situation or environment.

Overview of the Studied Networks
The function of $N_{ij}$ is a special case of the social Gestalts in social networks described by the general function $Z_{XY}$.

Distribution of the 52 Networks over Scientific Disciplines
• 29 networks belong to medicine and life sciences;
• 13 to physics, chemistry or technology;
• 7 to humanities and social sciences; and
• 3 of them are of general orientation (PNAS, SCIENCE and NATURE).

Results

Introduction
Distributions of co-author pairs' frequencies of 52 co-authorship networks are studied in total. For 96% of the 52 distributions the squared multiple R is larger than 0.98, and for 77% it is even equal to or larger than 0.99 (cf. Figure 13). The social Gestalts of these 77%, in comparison with the empirical data, are presented in the Appendix.
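As an illustration of how a squared multiple R can be obtained, the sketch below fits $\log N_{ij}$ by ordinary least squares on a log-linearized model. Its assumptions are not stated in this section: $Z_A = (A+1)^{\alpha}(A_{\text{complement}}+1)^{\beta}$, a $Z_B$ of the analogous form in $B = X + Y$ with hypothetical exponents γ and δ, base-10 logarithms with X = log i and Y = log j, $A_{\text{complement}} = 3 - A$ and $B_{\text{complement}} = 6 - B$ (maximum of 1000 publications), and empty bins dropped because log 0 is undefined. The function names are hypothetical, and this is not necessarily the regression procedure used for the reported values.

```python
import math


def design_row(i, j, x_max=3.0):
    """Regressors 1, log(A+1), log(A_c+1), log(B+1), log(B_c+1) for bin (i, j)."""
    x, y = math.log10(i), math.log10(j)
    a, b = abs(x - y), x + y
    a_c, b_c = x_max - a, 2 * x_max - b          # minima of A and B taken as 0
    return [1.0, math.log10(a + 1), math.log10(a_c + 1),
            math.log10(b + 1), math.log10(b_c + 1)]


def solve(m, v):
    """Solve the square system m @ p = v (Gaussian elimination, partial pivoting)."""
    n = len(v)
    aug = [list(row) + [v[k]] for k, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    p = [0.0] * n
    for r in range(n - 1, -1, -1):
        p[r] = (aug[r][n] - sum(aug[r][k] * p[k] for k in range(r + 1, n))) / aug[r][r]
    return p


def fit_log_model(binned):
    """OLS fit of log10 N_ij on the regressors; returns (params, squared multiple R)."""
    cells = [(i, j, v) for (i, j), v in binned.items() if v > 0]  # log 0 undefined
    rows = [design_row(i, j) for i, j, _ in cells]
    ys = [math.log10(v) for _, _, v in cells]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y for r, y in zip(rows, ys)) for a in range(k)]
    params = solve(xtx, xty)                      # [log10 c, alpha, beta, gamma, delta]
    fitted = [sum(p * x for p, x in zip(params, r)) for r in rows]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return params, 1.0 - ss_res / ss_tot


# Usage with noise-free synthetic data (hypothetical parameter values)
true = [2.0, -1.5, 0.8, 1.2, 0.5]
edges = [2 ** k for k in range(9)]
data = {(i, j): 10 ** sum(p * x for p, x in zip(true, design_row(i, j)))
        for i in edges for j in edges}
params, r2 = fit_log_model(data)
```

On noise-free synthetic data generated from the same formula, the fit recovers the chosen parameters and R² equals 1 up to rounding; on empirical binned data it would return the parameters and the squared multiple R of this particular log-linear fit.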

Distribution of the 52 Networks over Sources
• 45 are co-authorship networks obtained from journals (38 of them are taken from the SCI);
• 5 co-authorship networks are the results of field studies;
• 2 are the results of studies of institutes.
In a first step, two of them are selected here for explanation [21]. Figure 11 shows the overlay of the distribution of co-author pairs' frequencies ($\log N_{ij}$) and the …

Another Overview of the 52 Networks
The total number of co-author pairs' frequencies $N_{ij}$ per …

Discussion and Conclusion
A mathematical model is presented for describing bivariate distributions of co-author pairs' frequencies depending on the productivities of the collaborators. This model is based on a special fundamental principle of social group formations in connection with complementarities.
• Holistic entities with balancing interactions of forces are considered both by Gestalt psychology and Yin/Yang teaching.
• The "tendency towards a good Gestalt" is a special feature of the Gestalt psychology;
• The complementarities are a special feature of the Yin/Yang theory.
In conclusion, the social Gestalt is a model that comprises the "tendency towards a good Gestalt", while its definite rules are obtained by considering the complementarities.
The logarithmic binning procedure is applied as a special method. Because of theoretical expectations regarding the complementarities, Gestalts with various shapes result. The correlations between empirical and theoretical values are very high in the 52 studied co-authorship networks (for 96% of these networks the squared multiple R is larger than 0.98).
First proposal for future studies: Discoveries of laws (or STRONG regularities, respectively) belong to the main goals of basic research. As stated above, the correlations between empirical and theoretical values are very high in the 52 studied co-authorship networks (for 96% of these networks the squared multiple R is larger than 0.98); this quality is comparable with Lotka's law, Bradford's law or Zipf's law.
Therefore, the main question arises: Does a new informetric law (or STRONG regularity, respectively) exist?
Along these lines, a hypothesis can be formulated regarding a possible general validity in co-authorship networks: A fundamental principle of social group formations generally exists regarding the preference in collaboration between individual authors. This social group formation can be described by a mathematical model.
To verify this hypothesis, extensive and systematic studies have to be carried out for several specifications, for example several scientific fields, several journals, special kinds of limitations or advantages, etc. As mentioned above, crucial determinants of the mathematical model are both the "tendency towards a good Gestalt" and complementary tendencies interacting dynamically with each other, so that the entire system is kept flexible and open to change.
Further, in practice we have already found concave patterns more frequently in small bibliographies with low maximum productivity of authors (maximum i or j per author between 7 and 30). This kind of pattern could also be found in gender-oriented journals. There are also differences in behavior between male and female authors. We assume there may be a relation to minorities (concave), but this question should be investigated further and in more detail in future.
Conversely, large bibliographies including high-productivity authors (maximum i or j higher than 100), and bibliographies from the natural sciences or medicine, show tendencies towards convex patterns.
Many bibliographies lie between clearly concave and clearly convex patterns, as the Yin/Yang teaching says: there is not only Yin or only Yang.
Some of the possible questions for future studies:
• Does the shape of the well-ordered distributions of co-author pairs' frequencies change over time, as suggested in Figure 3?
• Are there special conditions under which the shapes of the social Gestalt tend towards particular poles, as the empirical results mentioned above lead us to expect?
Self-similarity and power laws are successful in describing complex networks of interactions. Scale-free network models describe many natural and social phenomena, for example networks of interacting components of a living cell, or social networks.
Further, a successful example from former studies is already available (cf. Figure 16). Thus, the following question for further studies arises: Do self-similarities of Gestalts exist in co-author pairs' networks in general?

Appendix
First, the methods for counting the co-author pairs and the logarithmic binning procedure are described in detail, and the visualization of the three-dimensional graphs of the social Gestalts is explained.
Second, the 40 Gestalts with overlaid empirical data for which the corresponding squared multiple R is equal to or larger than 0.99 are presented.

Method for Counting $N_{i'j'}$
Given is an artificial bibliography of 8 papers (names of authors: …, G, H). The number of publications i (or j, respectively) per author P (or Q, respectively) is determined by the "normal count procedure": each time the name of an author appears, it is counted (e.g. A is counted three times: once in the first paper and once each in the 4th and 8th papers). … The authors are ordered according to i (or j, respectively) in both the rows and the columns.
If the place of the authors in the by-line is not taken into consideration, the symmetrical matrix results. For example, the pair G,A is marked two times: once under the condition G count … (cf. Table 4).
In the symmetrical matrix, one can determine for each author P the number of their collaborators $N_P$. $N_P$ is equal to the degree centrality in Social Network Analysis (SNA).
The matrix of $N_{i'j'}$ (derived from the symmetrical matrix, cf. Table 5) represents the number of pairs $N_{i'j'}$ of authors who have i' publications per author with authors who have j' publications per author in the bibliography. For example, the pairs E,D and F,D from Table 4 are counted as $N_{1'2'} = 2$ and $N_{2'1'} = 2$ in the matrix of $N_{i'j'}$ (Table 5).
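The counting steps can be sketched for a small hypothetical bibliography (not the 8-paper example above). Assumptions: names appearing in the same paper form co-author pairs; each distinct pair is counted once in each ordering, so that the row sums of $N_{i'j'}$ equal the summed numbers of collaborators $N_P$; repeated joint papers by the same pair add no extra counts. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_pair_matrix(papers):
    """Sketch of the counting steps.

    papers: list of author lists (one list per paper).
    Returns (pubs, degree, n):
      pubs[P]   = publications of P (normal count),
      degree[P] = number of distinct collaborators N_P,
      n[(i, j)] = N_i'j', one count per ordering of each distinct pair.
    """
    pubs = Counter(name for authors in papers for name in set(authors))
    pairs = set()
    for authors in papers:
        for p, q in combinations(sorted(set(authors)), 2):
            pairs.add((p, q))                   # distinct unordered pairs
    degree, n = Counter(), Counter()
    for p, q in pairs:
        degree[p] += 1
        degree[q] += 1
        # symmetrical matrix: the pair is entered as (P, Q) and as (Q, P)
        n[(pubs[p], pubs[q])] += 1
        n[(pubs[q], pubs[p])] += 1
    return pubs, degree, n


def marginals(n):
    """Row sums N_i' = sum over j' of N_i'j', and the total N_T."""
    rows = defaultdict(int)
    for (i, _), v in n.items():
        rows[i] += v
    return dict(rows), sum(n.values())


# Hypothetical bibliography: E and F (1 paper each) each wrote with D (2 papers)
papers = [["D", "E"], ["D", "F"], ["A", "G"], ["A", "G"], ["A"]]
pubs, degree, n = count_pair_matrix(papers)
print(n[(1, 2)], n[(2, 1)])   # 2 2
```

As in the text, the pairs E,D and F,D give $N_{1'2'} = 2$ and $N_{2'1'} = 2$; the row sums returned by `marginals` add up the collaborators of all authors with a given number of publications.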

Logarithmic Binning
In general, an important problem when dealing with experimental data is to reduce the amount of stochastic noise. For example, to get a good fit of a straight line (log-log scale plot of power functions, for example Lotka's distribution), the logarithmic binning procedure can be used [16]. As already mentioned in paragraph 2.1, to get a proper fit we need to bin the data i into exponentially wider bins. Each bin is a fixed multiple wider than the one before it. For example, choosing the multiplier 2, we obtain the intervals 1 to 2, 2 to 4, 4 to 8, 8 to 16, etc., i.e. the sizes or widths of the bins (Δi) are 1, 2, 4, 8, etc. The number of samples in a bin should be divided by the width of this bin to get a count per unit interval of i. In other words, the new value in a bin is simply the arithmetic average of all the points in the bin (for an example, cf. Tables 6 and 7).
After binning, the "new" distribution shows some modified parameters relative to the distribution without binning. In analogy to Lotka's distribution, Figure 17 shows the log-log scale plot of the power distribution of users among web sites [21] before logarithmic binning.
Table 4. Symmetrical matrix.
Table 5. Matrix of co-author pairs $N_{i'j'}$. Note: $N_{i'} = \sum_{j'} N_{i'j'}$ is the number of collaborators of all authors with i' publications per author; $N_{j'} = \sum_{i'} N_{i'j'}$ is the number of collaborators of all authors with j' publications per author; $N_T$ = total sum of pairs $N_{i'j'}$.

Table 6. Example before logarithmic binning (Y = Number of Authors with iʹ publications per author).
Figure 18 is the result after logarithmic binning. The exponentially wider bins appear evenly spaced on a log scale (compare Figure 18 with Figure 17, before binning). In this paper we propose a new version of logarithmic binning, i.e. a logarithmic binning procedure for bivariate distributions, producing three-dimensional graphs (log-log-log scale plot). The bivariate distribution of co-author pairs' frequencies $N_{i'j'}$ is considered and counted with the normal count procedure (counting how many times an author appears in the bibliography). The number of co-author pairs $N_{i'j'}$ between authors with i' publications per author and authors with j' publications per author is a function of i' and j': $N_{i'j'} = f(i', j')$. We used the same method for counting $N_{i'j'}$ as described in [12] and in paragraph 6.1.
If we start with the matrix of co-author pairs' frequencies $N_{i'j'}$ before logarithmic binning (example, cf. Table 8), we need to bin both i' and j' into exponentially wider bins. In this paper the logarithmic binning procedure is applied with the multiplier 2.

i j i j
However, because of the bivariate presentation, the width of a bin (cell ij) in the matrix is the product of ∆i and ∆j, i.e. ∆i·∆j (cf. Table 9).
The sum of $N_{i'j'}$ in a bin (cell ij, cf. Table 10) has to be divided by the width of the bin, ∆i·∆j. In other words, the new value in a bin is simply the arithmetic average of all the points in the bin (cf. Table 11). For example, the value in the bin (cell 24) with i = 2 and j = 4 is the sum of $N_{i'j'}$ in this cell divided by ∆i·∆j = 2·4 = 8.

The logarithmic binning procedure combined with the subsequent log-log-log scale presentation of the matrix (log i, log j, and log N_ij) has four advantages:
• It reduces the amount of stochastic noise.
• The exponentially wider bins appear evenly spaced on a log scale.
• On the secondary diagonal the sum of log i and log j is constant (cf. Equation (1)). The same holds on the parallels to it, each with a different constant.
• On the main diagonal the absolute difference of log i and log j is constant (cf. Equation (2)). The same holds on the parallels to it, each with a different constant.
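The bivariate version differs from the one-dimensional case only in dividing by the product of the two bin widths. The sketch below assumes the multiplier 2 and positive integer i' and j'; the counts and the helper name are hypothetical. Here the bin (cell 24) collects three unbinned cells and is divided by ∆i·∆j = 8.

```python
def log_bin_2d(n):
    """Bivariate logarithmic binning with multiplier 2 (hypothetical helper).

    n maps (i', j') -> N_{i'j'} with positive integer i', j'. Each cell (i, j)
    of the binned matrix collects all (i', j') whose power-of-2 bins have the
    lower edges i and j; its value is the cell sum divided by the bin width
    delta_i * delta_j = i * j."""
    sums = {}
    for (ip, jp), v in n.items():
        i = 1 << (ip.bit_length() - 1)  # lower edge of the bin containing i'
        j = 1 << (jp.bit_length() - 1)  # lower edge of the bin containing j'
        sums[(i, j)] = sums.get((i, j), 0) + v
    return {cell: s / (cell[0] * cell[1]) for cell, s in sorted(sums.items())}


# Hypothetical unbinned counts N_{i'j'}
n = {(1, 1): 4, (1, 2): 6, (1, 3): 2, (2, 4): 3, (3, 5): 5, (2, 7): 8}
print(log_bin_2d(n))  # {(1, 1): 4.0, (1, 2): 4.0, (2, 4): 2.0}
```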
Using the log-log-log presentation after this procedure, we start in this paper with the square matrix of co-author pairs (log N_ij), presented at the beginning of this paper in Table 2, as the basis for the studies. In Table 2 the sequence of log i (rows) is log i = 0, 0.301, 0.602, 0.903, 1.204, 1.505, 1.806, 2.107, 2.408; the same holds for log j (columns). The main diagonal of a square matrix is the diagonal running from the top left corner to the bottom right corner (colored green in Table 2). The secondary diagonal is the diagonal running from the lower left entry to the upper right entry (colored red in Table 2).
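The axis values and the diagonal properties listed above can be checked numerically. This sketch assumes nine bins labelled 1, 2, 4, …, 256 (multiplier 2), with rows and columns indexed from 0 in increasing order of i and j; the function names are hypothetical.

```python
import math


def log_axis(n_bins):
    """log10 of the bin labels 1, 2, 4, ..., 2**(n_bins - 1), rounded to 3 places."""
    return [round(math.log10(2 ** k), 3) for k in range(n_bins)]


def secondary_diagonal(n_bins, offset=0):
    """log i + log j for the cells (r, c) with r + c = n_bins - 1 + offset,
    i.e. the secondary diagonal (offset 0) or one of its parallels."""
    target = n_bins - 1 + offset
    return [math.log10(2 ** r) + math.log10(2 ** (target - r))
            for r in range(n_bins) if 0 <= target - r < n_bins]


def main_diagonal(n_bins, offset=0):
    """|log i - log j| for the cells (r, r + offset): main diagonal or a parallel."""
    return [abs(math.log10(2 ** r) - math.log10(2 ** (r + offset)))
            for r in range(n_bins) if 0 <= r + offset < n_bins]


print(log_axis(9))  # [0.0, 0.301, 0.602, 0.903, 1.204, 1.505, 1.806, 2.107, 2.408]
```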

Method of Gestalt Visualization
We use the mathematical model of social Gestalts [12] for describing co-author pairs' frequencies in the log-log-log presentation (Equation (34), together with Equations (22) and (23)). For visualization, the Function Plot of SYSTAT is used for the theoretical patterns (social Gestalts) and the Scatterplot for the empirical patterns. After regression analysis, the four estimated parameters α, β, γ, and δ plus the constant are entered into Equation (34).
Scale Range: The minimum and maximum values to appear on each axis are specified; any data values outside these limits do not appear on the display. The minimum for the X-axis is specified as 0 ($(\log i)_{\min} = \log 1 = 0$), and the maximum is equal to the maximum bin $(\log i)_{\max}$ of the empirical data. For a data set with 3 bins i (or j, respectively), as in Table 11, the number of cuts in the grid is specified as 3 − 1 = 2. The resulting number of lines of the theoretical pattern (Gestalt) is equal to twice the number of bins (in the example: 2 × 3 = 6). The number of points where two of the lines intersect is equal to the square of the number of bins i (in the example: 3² = 9). This square of the number of bins is equal to the number of cells in the corresponding square matrix of co-author pairs (log N_ij). The Scale Range of the empirical pattern has to be equal to that of the theoretical Gestalt or slightly smaller.
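The grid arithmetic of the Scale Range settings can be summarized in a short sketch. It does not call SYSTAT; it only computes the numbers described above for n bins, assuming the multiplier 2 and a smallest bin of 1 (so that $(\log i)_{\min} = 0$). The helper name is hypothetical. The counts are presumably consistent with n − 1 cuts producing n lines in each of the two directions.

```python
import math


def grid_spec(n_bins):
    """Grid settings for a square matrix with n_bins bins per axis (hypothetical helper)."""
    return {
        "cuts": n_bins - 1,              # number of cuts in the grid: bins minus 1
        "lines": 2 * n_bins,             # lines of the theoretical Gestalt
        "intersections": n_bins ** 2,    # equals the number of matrix cells
        "x_max": math.log10(2 ** (n_bins - 1)),  # (log i)_max for bins 1, 2, 4, ...
    }


spec = grid_spec(3)
print(spec["cuts"], spec["lines"], spec["intersections"])  # 2 6 9
```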
After the overlay of the empirical distribution and the theoretical pattern in a single frame, the goodness-of-fit is highest when the empirical values (dots), obtained from the cells of the matrix, lie directly on the points where two of the theoretical lines intersect. As the distance between the intersection points and the dots increases, the goodness-of-fit decreases.
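The goodness-of-fit is reported as the squared multiple R. For a least-squares fit with a constant term it equals $1 - SS_{res}/SS_{tot}$, where each residual is the vertical distance between a dot and the model value at the same cell. The sketch below assumes the model values have already been computed (Equation (34) is not reproduced here); the numbers and the helper name are hypothetical.

```python
def squared_multiple_r(observed, fitted):
    """R^2 = 1 - SS_res / SS_tot for observed values (e.g. log N_ij of the dots)
    and model values at the same cells (e.g. the grid intersection points)."""
    if len(observed) != len(fitted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean = sum(observed) / len(observed)
    ss_tot = sum((y - mean) ** 2 for y in observed)
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - ss_res / ss_tot


print(squared_multiple_r([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
print(squared_multiple_r([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # 0.5
```

If all observed values are equal, $SS_{tot} = 0$ and R² is undefined; the sketch then raises `ZeroDivisionError`.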
The 40 Gestalts with overlaid empirical data are presented in the Appendix below (Figures 19-58). The corresponding squared multiple R is equal to or larger than 0.99. The Gestalts are ordered according to the squared multiple R, from R² = 1.000 down to R² = 0.990. N is equal to the number of dots, and $N_T$ is equal to the total number of co-author pairs. The presentation of the Gestalts starts below.

Figure 1. Group formation of fishes (the picture is a copy from Wikipedia).

Figure 2. Changing shape of group formation (the picture is a copy from Wikipedia).

Figure 3. Example: Change of the shapes of the well-ordered distribution of co-author pairs' frequencies over time (Journal of Experimental Medicine. First and second rows: pattern based on data from one year, 1980. Third and fourth rows: pattern based on data from 19 years, 1980-1998). The patterns on the left (first and third rows) are each turned around three times.

Figure 4. Overlay of the empirical data and the theoretical model.

Figure 5. Two prototypes of distributions of co-author pairs' frequencies with opposite shapes (convex and concave). Patterns in the second row: the values on the third dimension (log N_ij) related to the main diagonals (cf. Table 2) are colored green, and the values related to the secondary diagonals (cf. Table 2) are colored red. The left prototype shows a convex shape and the right one a concave shape. Each prototype is turned around two times. The prototypes are theoretical models.

Figure 6. The curvilinear lines related to the diagonals in Figure 5 (red and green) are shown here in two-dimensional graphs. Straight lines are added to the patterns in the first and second columns for comparison. The upper two patterns show the curvilinear lines attached to the main diagonal (left: convex, right: concave), and the lower two patterns those attached to the secondary diagonal (left: convex, right: concave). The corresponding standardized versions of the latter can be found on the right side. The upper two patterns on the left side are nearly equal to their standardized versions. The two-dimensional graphs are used to explain the model in Section 2.2.

Figure 7. Power functions of similarity on the one hand and dissimilarity on the other. On the left, the parameter α is negative: "Birds of a feather flock together", i.e. interpersonal relations decrease with increasing dissimilarity. On the right, the parameter α is positive: "Opposites attract", i.e. interpersonal relations increase with increasing dissimilarity. This figure is a copy of a figure in [12].

Figure 8. Presentation of how two polar forces interact dynamically with each other as complementary tendencies on one dimension, according to the function $Z_A$ (cf. Equation (10)). Figure 8 is a copy of a figure in [12].

Figure 9. Presentation of how two polar forces interact dynamically with each other as complementary tendencies on one dimension, according to the function $Z_B$ (cf. Equation (17)).
Function of the Distribution of Co-Author Pairs' Frequencies $N_{ij}$
One example of how to measure the variables X and Y is given by the function of the distribution of co-author pairs' frequencies, $Z_{XY} = N_{ij}$.

Figure 10. Prototypes of social Gestalts (non-logarithmic presentation). Several empirical social Gestalts matching the five prototypes have already been presented in [18].

Figure 15. The F-ratios of the regression analyses increase with the total number of N_ij (left pattern), with the relative number of authors per article (middle pattern), and with the number of cells in the square matrix (compare Table 2) of co-author pairs (log N_ij) (right pattern).


j Pairs P, Q are marked in the cells of the matrix under the condition of both the first authors P count   i and the second authors Q count   j j

Table 7. Example before logarithmic binning (Y = Number of Authors with iʹ publications per author).

Figure 17. Log-log scale plot of the distribution of users among web sites (this figure is a copy of [22]).

4. Gestalts with Overlaid Empirical Data (Dots) and Corresponding Squared Multiple R Equal to or Larger than 0.99 (R² ≥ 0.99)
Independently, on the last page an example is presented of the change of the shape of the well-ordered distributions of co-author pairs' frequencies over time. The first row shows the pattern based on data from one year of the Journal of Experimental Medicine, and the second row shows the pattern based on data from 19 years (cf. Figure 59).

Figure 59. Journal of Experimental Medicine: change of the shape over time from 1980 up to the period 1980-1998 in total. First row: the two patterns from 1980 are taken from Figure 37 (bottom left and top right) with N = 15, R² = 0.995, F-ratio = 482.793, P = 0.000, $N_T$ = 2504, Authors = 724, Articles = 260. Second row: the two patterns obtained from the period 1980-1998 are taken from Figure 42 (bottom left and top right) with N = 36, R² = 0.994, F-ratio = 1384.706, P = 0.000, $N_T$ = 133,852, Authors = 16,493, Articles = 5990.

Table 11. After logarithmic binning: matrix of $N_{i'j'}$ (count) = average = (sum of $N_{i'j'}$ in a bin)/(i(bin) × j(bin)).
The minimum and maximum of the Z-axis are set to the minimum and maximum logarithm values of the whole Gestalt produced by the function. If there are empirical values greater or less than these two theoretical values, the minimum or maximum of the Z-axis has to be extended accordingly. The Surface and Line Style dialog box is used to customize the appearance of lines or surfaces. XY Cut Lines in two directions are used. The number of cuts in the grid has to be specified as the number of bins i (or j, respectively) in the data set minus 1, e.g. 2 cuts for a data set with 3 bins, as in Table 11.