The Bio-Geographical Regions Division of Global Terrestrial Animal by Multivariate Similarity Clustering Analysis Method

A novel multivariate similarity clustering analysis (MSCA) approach was used to derive a biogeographical division scheme for the global terrestrial fauna and was compared against other widely used clustering algorithms. The faunal dataset included almost all terrestrial and freshwater fauna: a total of 4631 families, 141,814 genera, and 1,334,834 species. Our findings demonstrated that suitable results were obtained only with the MSCA method, which produced distinct hierarchies and reasonable structuring and conformed to biogeographical criteria. A total of seven kingdoms and 20 subkingdoms were identified. The clustering results for the higher and lower animals did not differ significantly, leading us to consider the result convincing as the first zoogeographical division scheme covering all global terrestrial animals.


Introduction
Biodiversity is important to humankind, and the significance of its protection is well recognized by both scholars and governments [1]. Of all the measures for managing and protecting biodiversity, biogeography constitutes a basic, but very useful, tool [2] [3]. The discipline of biogeography originated in 1761 when it was introduced by the French naturalist Georges Buffon [4]. Early zoogeograph-minor revisions [10], the original map from this publication is still in use today [11].
The exploration of biogeography has continued into the 21st century. In addition to the lengthy debates regarding the rationality of the "Wallace line" [12] [13] [14] [15], the development of quantitative analytical methods for determining and refining zoogeographical regions has been central to biogeographic research [16]-[26]. There has been considerable focus on zoogeographic division schemes in the last few decades, and a variety of schemes based on different methods have been proposed, containing between 7 and 14 divisions [27]-[42]. Faced with such numerous and inconsistent results, J.J. Morrone wondered whether biogeographical regionalization is a spectre haunting biogeography [43]. Unfortunately, there are three significant issues with these proposals.
Firstly, while the necessity for quantitative methods in biogeographic division schemes has been recognized, systematic comparisons of different similarity coefficient formulas and clustering algorithms are lacking. Additionally, some similarity coefficient formulas are only accurate under defined conditions, and thus their use is restricted. Secondly, researchers have used the grid method to define basic geographic units, which are typically generated using latitude and longitude coordinates or geographical distance. Although this method is acceptable, species distribution records, which have been collected and accumulated over the long term by taxonomists, are not organized by grid cell. Variations in collection effort (such as frequency, timing, and depth) between grid cells could result in discrepancies, thereby influencing the model estimation. The grid cell strategy is thus best suited to medium-scale field investigations, but is not appropriate for global-scale clustering analyses. Thirdly, there has been greater focus on higher animals, such as vertebrates, despite the fact that they represent only a small percentage of the global fauna (4% of species, 5% of genera). Lower animals should also be included in biogeographical research. Considering the concerns outlined above, the aim of this study was to develop a division scheme for the global terrestrial fauna based on a comprehensive quantitative analytical framework. To achieve this, we implemented a novel approach based on multivariate similarity clustering analysis (MSCA) combined with the similarity gen-

Global Terrestrial Animal Species
The materials used in this study originated primarily from three sources: catalogs, checklists, or taxonomic monographs on global or regional fauna [44]

Division of Basic Geographical Units (BGU) and Building Databank
Based on ecological conditions and animal distributions [111], we divided the global terrestrial surface, excluding Antarctica, into 67 BGUs. The distribution of the major animal groups in each BGU is given in Table 2.

Clustering Methods
Although several dozen similarity formulas exist [112], they can only calculate the similarity coefficient between two regions. In this study, the similarity coefficient of multiple regions was calculated as the ratio of the average number of common species in the participating regions to the total number of species [113]. We defined the similarity general formula (SGF) as:
$$SI_n = \frac{\sum_{i=1}^{n} H_i}{n S_n} = \frac{\sum_{i=1}^{n} (S_i - T_i)}{n S_n}$$
where $SI_n$ is the similarity coefficient of $n$ geographical units; $S_i$, $H_i$, and $T_i$ represent the numbers of total species, common species, and unique species of BGU $i$, with $H_i = S_i - T_i$; and $S_n$ is the total number of species in the $n$ BGUs. All of these values can easily be obtained from the database, which is convenient and efficient for both manual and computational analysis. We used a combination of MSCA and SGF in this study. In MSCA, the similarity coefficient of any group of BGUs is calculated directly from the raw data of the participating BGUs [114]; it is not affected by previously calculated similarity coefficients and does not depend on the order in which the clustering proceeds. The general similarity coefficient (GSC) of all 67 BGUs can even be calculated first. The final dendrogram is then generated according to the size of these similarity coefficients. This method has been validated on several faunas [115]-[124], and has been successfully used for distribution pattern analysis at large geographic scales [125] [126] [127].
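The SGF calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each BGU is represented as a set of taxon names, and it reads "unique species" as taxa recorded in only one of the participating units. The function name `sgf_similarity` and the toy data are hypothetical.

```python
from collections import Counter
from typing import List, Set


def sgf_similarity(units: List[Set[str]]) -> float:
    """Similarity coefficient SI_n for a group of n geographical units.

    Each unit is the set of taxa (e.g. genus names) recorded in it.
    Assumption: a taxon is "unique" to a unit if no other unit in the
    group records it.
    """
    n = len(units)
    if n < 2:
        raise ValueError("need at least two units to compare")

    # S_n: number of distinct taxa recorded across all n units.
    s_n = len(set().union(*units))
    if s_n == 0:
        return 0.0

    # Number of units in which each taxon occurs.
    occurrences = Counter(taxon for unit in units for taxon in unit)

    total_common = 0
    for unit in units:
        s_i = len(unit)                                          # S_i
        t_i = sum(1 for taxon in unit if occurrences[taxon] == 1)  # T_i
        total_common += s_i - t_i                                # H_i = S_i - T_i

    # Average number of common taxa per unit, divided by S_n.
    return total_common / (n * s_n)


# Hypothetical toy data: three units, five taxa in total.
toy_units = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]
si_3 = sgf_similarity(toy_units)  # H = 2 + 2 + 1, so 5 / (3 * 5) = 1/3
```

Under this reading, the two-unit case reduces to the Jaccard coefficient, since both units share the same count of common taxa; this is a convenient sanity check on any implementation.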
The results of the above method were assessed by comparison with three common hierarchical clustering methods. The single linkage method (SLM) [128], also known as the minimum distance method, is the most basic clustering method. It was used with the similarity coefficient formula proposed by Jaccard (1901) [16], SI = C/(A + B − C), where A and B are the numbers of species in the two regions being compared and C is the number of species common to both.
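To make the single linkage idea concrete, the sketch below computes the Jaccard coefficient for pairs of units and then forms the groups that single linkage would give when the tree is cut at one similarity level. It relies on the fact that a single linkage cut is equivalent to taking the connected components of the graph that links every pair at or above the threshold. The function names, unit labels, and data are hypothetical, and this is not the software used in the study.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity SI = C / (A + B - C) between two units."""
    c = len(a & b)
    denominator = len(a) + len(b) - c
    return c / denominator if denominator else 0.0


def single_linkage_groups(units: Dict[str, Set[str]],
                          min_similarity: float) -> List[Set[str]]:
    """Groups obtained by cutting a single linkage tree at min_similarity."""
    parent = {name: name for name in units}

    def find(name: str) -> str:
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Merge the groups of any two units that are similar enough.
    for u, v in combinations(units, 2):
        if jaccard(units[u], units[v]) >= min_similarity:
            parent[find(u)] = find(v)

    groups: Dict[str, Set[str]] = {}
    for name in units:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical toy data: u1-u2 and u2-u3 have similarity 0.5, u1-u3 only 0.2.
toy = {
    "u1": {"a", "b", "c"},
    "u2": {"b", "c", "d"},
    "u3": {"c", "d", "e"},
    "u4": {"x", "y"},
}
groups = single_linkage_groups(toy, min_similarity=0.5)
```

In the toy data, u1 and u3 end up in the same group even though their direct similarity is only 0.2, because both are linked through u2. This chaining behaviour is a well-known property of single linkage.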
The sum of squares method (SSM), also known as Ward's method [21], usually provides better results than the methods above, but involves more complex calculations. In this method, we used the similarity coefficient formula proposed by Czekanowski (1913) [17], which is also called the Sørensen formula

Clustering Results of Terrestrial Animal
The results of the MSCA clustering analysis of 141,814 global terrestrial faunal genera are shown in Figure 4. The GSC value was 0.066, and at a similarity of 0.300, the 67 BGUs were grouped into 20 smaller unit crowds (SUCs), labeled from a to t. At a similarity of 0.200, the BGUs were further grouped into seven larger unit crowds (LUCs), labeled from A to G. Each unit within a crowd was adjacent to another unit of the same crowd, thereby satisfying geographical principles. The ecological conditions within each crowd were relatively consistent, which satisfied ecological principles.
In addition, intra-crowd similarity was greater than inter-crowd similarity, thereby satisfying statistical principles. The MSCA clustering results for the higher animals and lower animals are shown in Figure 2 and Figure 3, respectively, using the same letters for the crowds to facilitate direct comparison. Both groups of animals clustered into seven LUCs and a number of SUCs at specific similarity levels, with crowd compositions similar to those in Figure 4. The placement of a few BGUs differed between the two analyses, but these variations still conformed to geographical principles.
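The statistical criterion mentioned above (intra-crowd similarity exceeding inter-crowd similarity) can be checked mechanically. The sketch below is one possible reading, not the authors' procedure: it assumes that the inter-crowd similarity of two crowds is the multi-unit similarity coefficient (average number of common taxa per unit divided by the total number of taxa) of their merged units, and that every crowd contains at least two BGUs. The helper names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set


def sgf_similarity(units: List[Set[str]]) -> float:
    """Multi-unit similarity: sum of common taxa per unit / (n * total taxa)."""
    n = len(units)
    if n < 2:
        raise ValueError("need at least two units to compare")
    s_n = len(set().union(*units))
    if s_n == 0:
        return 0.0
    occurrences = Counter(taxon for unit in units for taxon in unit)
    # A taxon counts as common to a unit if at least one other unit records it.
    total_common = sum(
        sum(1 for taxon in unit if occurrences[taxon] > 1) for unit in units
    )
    return total_common / (n * s_n)


def crowds_are_distinct(crowds: List[List[Set[str]]]) -> bool:
    """True if every pair of crowds is less similar merged than either is alone.

    Assumption: inter-crowd similarity is measured on the merged pair.
    """
    for crowd_a, crowd_b in combinations(crowds, 2):
        inter = sgf_similarity(crowd_a + crowd_b)
        intra = min(sgf_similarity(crowd_a), sgf_similarity(crowd_b))
        if inter >= intra:
            return False
    return True


# Hypothetical toy crowds with no taxa shared between them.
crowd_a = [{"a", "b", "c"}, {"a", "b", "d"}]
crowd_b = [{"x", "y", "z"}, {"x", "y", "w"}]
well_separated = crowds_are_distinct([crowd_a, crowd_b])
```

Here each toy crowd has an internal similarity of 0.5, while the merged pair drops to 0.25, so the grouping passes the check.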
Our scheme supported many aspects of these proposals, including the subdivi- China has the most animal genera and a very large number of endemic genera (Table 3).
This clearly reflects the great contribution of Chinese zoologists.

Comparison with Traditional Clustering Methods
The SLM results for the same dataset were chaotic and the groups were difficult to categorize (Figure 5). Most of the BGUs were treated as noise, and when the distance between two clusters was set at 0.730, only two crowds (D and E) could be recognized. In contrast, AGL provided significantly improved results with appreciably less noise (Figure 6). When the distance value was set at 0.740, five crowds could be distinguished. Of these five crowds, four corresponded to the C, D, E, and G groups. The remaining crowd, consisting of 31 BGUs, was very complex and could only be split into three crowds when the distance value was set at 0.550. More definitive clustering results were obtained with SSM (Figure 7). When the distance value was set at 1.40, eight crowds were obtained, seven of which were comparable to the seven crowds in the MSCA; the remaining crowd had no geographical significance, and further categorization using SSM proved challenging. These findings indicate that none of these three clustering methods satisfies zoogeographic requirements.

A Biogeographical Division Scheme for the Global Terrestrial Fauna
Based on the clustering results, we propose a zoogeographical regionalization scheme that divides the terrestrial world into seven kingdoms and 20 subkingdoms (Figure 8). This is the first geographical regionalization scheme that represents the overall global terrestrial fauna. Our scheme showed a similar overall distribution pattern to Wallace's scheme [9], with some notable differences. For example, in our study 1) the Palaearctic Realm is further divided into eastern and western halves; 2) New Guinea Island and the Pacific Islands are regarded as part of the Oriental kingdom as opposed

Conclusions and Discussion
To the best of our knowledge, this constitutes the first quantitative attempt at a Based on the observation that the same distribution patterns exist for higher and lower animals worldwide despite their distinct evolutionary stages, degrees of evolution, and habitats, we deduce that the same distribution patterns may also be shared among animals, plants, and microbes. Therefore, we recommend the use of quantitative analyses, such as MSCA, to establish a biogeographic division scheme for all terrestrial living organisms, including plants and microbes.