Diversity Dimensions of Freshwater Fish Species around the World

Two main factors hinder our understanding of the spatial and temporal variation of biodiversity: the quality and coverage of the available taxonomic and geographical information, and the multi-faceted nature of diversity. In this study, we aim to quantify the global distribution of five diversity components used to assess freshwater fish diversity in river basins around the world. The multidimensional character of these diversity components was estimated and the resulting diversity dimensions were mapped. This was done by taking into account those well-surveyed basins discriminated by means of collector’s curves, and additionally by controlling for the effect of survey effort on all the diversity components considered. A total of 1,472,109 occurrence records were analysed, corresponding to 17,292 species of [...] increase with the number of fish species present in river basins and that a decrease in species richness may involve a loss of functionality. The levels of rarity and taxonomic diversity of many species-poor basins found in arid and cold regions suggest that the distinctiveness of their freshwater fishes is primarily a consequence of how isolated these basins are.


Introduction
One of the main goals of ecology and biogeography is to understand the spatial and temporal variations that underlie biodiversity [1] [2]. However, two factors complicate this task: 1) all the variables used to measure diversity (diversity components) depend fundamentally on the quality and coverage of the available taxonomic and geographical information [3] [4]; 2) diversity is a multi-faceted measure [5] [6] [7]. The evolutionary history, phenetic variability and ecological functions of different species, among other dimensions of diversity, interact and covary according to a syndromic pattern. The multidimensional nature of biodiversity means that many of the commonly recognized diversity variables are associated and that some variables are better than others at explaining ecosystem functions [8] [9] [10].
Large-scale distributional patterns in freshwater fishes have been assessed according to species richness [11], endemicity [12] [13] and, more recently, beta diversity [14] [15] and functional diversity [16]. These studies have primarily focused on estimating the probable causal processes behind fish diversity [17] [18]. Freshwater fish research has been influenced by the multidimensional approach towards understanding biodiversity [19] [20]. Functional diversity measurements have begun to be incorporated into basic and applied studies [21] [22]. Ecosystem functioning is linked with the functional diversity of the species within ecosystems [23]; therefore, selecting the right species traits and metrics to estimate functional diversity has become a priority. A recent review [24] showed that a plurality of functional diversity studies have dealt with plants (31%), while a far smaller proportion have dealt with fishes (8%). Most of the latter studies focus on marine species. There have been few functional studies of freshwater fish [25]-[33]. Most large-scale functional diversity studies of fishes have been based on ecomorphological traits (e.g. [34]) and focus on the relationships between taxonomic and functional diversity in coral fishes [35] [36].
In the present study, we use comprehensive global information about the taxonomy and distribution of freshwater fishes to examine the multidimensional character of several diversity components. We made a special effort to estimate the functional diversity by delineating different trait states for over 16,000 species. The uneven survey effort carried out across the world's river basins may seriously affect diversity measurements [37]. Therefore, diversity dimensions were estimated in well-surveyed basins that had been discriminated using collector's curves. Additionally, the effect of survey effort on all the diversity components considered was controlled. The aim of these analyses is to describe the geographic distribution of the main diversity dimensions observed in freshwater fishes globally and to determine the impact of each diversity component on this distribution.
C. Granado-Lorencio et al. Journal of Geographic Information System

Occurrence Records and River Basins
The data set of geographical records for freshwater fishes reported by Pelayo-Villamil et al. [38] was updated to reflect changes in taxonomy and to include the new species described as of January 2020. Data sources include GBIF, web pages, museum collections, and journal articles [38]. Records were downloaded and filtered using the ModestR software package [39] [40] [41] [42]. GBIF records were filtered as follows: 1) records with the same latitude and longitude were excluded, 2) records with zero latitude or longitude were excluded, and 3) occurrences in habitats other than those corresponding to terrestrial freshwater ecosystems were eliminated (see [41] for details). As of January 2020, 17,292 species of freshwater fishes were recognized by taxonomists as valid. Information about these species can be found on the website IPez1.4 (http://www.ipez.es, [43]). Of these, 17,148 (99.2% of the total) had associated geographical information. In total, 11,472,109 occurrence records were analysed.
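The three filtering rules listed above can be sketched as follows. This is only an illustration and not the ModestR implementation: the `Record` type, the `in_freshwater` flag and the sample records are hypothetical, and rule 1 is interpreted here as "latitude equal to longitude within a record", which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    species: str
    lat: float
    lon: float
    # Hypothetical flag: True when the point falls inside a terrestrial
    # freshwater habitat (the source does this check inside ModestR).
    in_freshwater: bool


def keep(rec: Record) -> bool:
    """Return True when a record passes the three filters described in the text."""
    if rec.lat == rec.lon:            # rule 1 (assumed meaning: lat equals lon)
        return False
    if rec.lat == 0 or rec.lon == 0:  # rule 2: zero latitude or longitude
        return False
    return rec.in_freshwater          # rule 3: freshwater habitats only


# Made-up records, for illustration only.
records = [
    Record("Salmo trutta", 43.2, -5.7, True),
    Record("Salmo trutta", 10.0, 10.0, True),
    Record("Barbus barbus", 0.0, 12.5, True),
    Record("Esox lucius", 52.1, 4.3, False),
]
filtered = [r for r in records if keep(r)]
print(len(filtered))
```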
The geospatial data for river basins were downloaded from the WaterBase project website (http://www.waterbase.org) and processed using ModestR. The WaterBase global river basin data were derived from the drainage basin data set distributed through HYDRO1k, a hydrological database developed by the EROS Data Center of the U.S. Geological Survey (USGS). This database included a collection of global geo-referenced layers that had a 1 km resolution. These layers had been derived from GTOPO30, a 30 arc-second digital elevation model (DEM) of the world. Using the World Geodetic System 1984 standard (WGS84), the drainage basin data were assigned latitude/longitude geographical coordinates. In order to generate the ESRI shapefiles available via the WaterBase website, vertices were smoothed out by applying a 500-meter threshold.
The river basin dataset was originally obtained by combining flow accumulation and flow direction layers. These layers were derived from the DEM, which had been hydrologically corrected according to the GTOPO30 dataset. The basins were organized using the procedure of Pfafstetter [44], which had been adapted for use with the HYDRO1k dataset [45]. River basins were divided into six levels. Each sub-basin was assigned a unique Pfafstetter code (i.e., a six-digit code with information regarding the interconnectedness of the basins). The second level [46] was used as the spatial unit for estimating diversity measurements (n = 440). This is because the second level was the geographical extent that best illustrated the effects of environmental parameters on the distribution of freshwater fish species [47].
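As an illustration of how a second-level spatial unit can be obtained from the six-digit codes, the sketch below assumes that each digit of the code corresponds to one hierarchical level, so that the level-2 parent basin is identified by the first two digits. The function name and the example codes are hypothetical and are not taken from HYDRO1k.

```python
from collections import defaultdict


def basin_at_level(code: str, level: int) -> str:
    """Return the parent basin of a six-digit code at the requested level.

    Assumption: one digit per hierarchical level, read from left to right.
    """
    if len(code) != 6 or not code.isdigit():
        raise ValueError("expected a six-digit code")
    if not 1 <= level <= 6:
        raise ValueError("level must be between 1 and 6")
    return code[:level]


# Made-up sub-basin codes grouped by their level-2 parent basin.
sub_basins = ["811230", "811900", "824000", "824510"]
by_level2 = defaultdict(list)
for code in sub_basins:
    by_level2[basin_at_level(code, 2)].append(code)

print(dict(by_level2))
```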

Biodiversity Metrics and Biological Traits
Using the DER function of the EcoIndR package [48] [49] for the R statistical environment [50], five diversity components representing different biodiversity metrics were estimated for each river basin: species richness (SR), geographic rarity (GR), rarity index (LR), taxonomic diversity (TD), and functional richness (FRic). GR reflected the average rarity of all species present in each river basin and was calculated as the inverse of the relative frequency of occupied basins [51]. LR weights the species according to their rarity (see [52] [53]). TD was used to determine the taxonomic hierarchical Linnaean level of the species observed in each river basin [54]. FRic was defined as the volume of the functional space occupied by the species [55].
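To make the definitions of SR and GR concrete, the following minimal sketch computes both from a presence table. It is one plausible reading of the description above (rarity of a species taken as the total number of basins divided by the number of basins it occupies, then averaged over the species present in a basin) and is not the EcoIndR code; the function name and the toy data are hypothetical.

```python
from collections import Counter


def richness_and_rarity(presence: dict[str, set[str]]) -> dict[str, tuple[int, float]]:
    """Return (SR, GR) per basin from a basin -> species-set mapping."""
    n_basins = len(presence)
    # Number of basins occupied by each species.
    occupied = Counter(sp for species in presence.values() for sp in species)
    result = {}
    for basin, species in presence.items():
        if not species:
            result[basin] = (0, float("nan"))
            continue
        # Inverse of the relative frequency of occupied basins, per species.
        rarity = [n_basins / occupied[sp] for sp in species]
        result[basin] = (len(species), sum(rarity) / len(rarity))
    return result


# Toy data: four basins and three species.
presence = {
    "A": {"sp1", "sp2"},
    "B": {"sp1"},
    "C": {"sp1", "sp3"},
    "D": {"sp1", "sp2"},
}
metrics = richness_and_rarity(presence)
print(metrics["C"])  # basin C holds the only occurrence of sp3
```

In this toy example, basin C has the same richness as basins A and D but a higher GR, because it hosts a species found nowhere else.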
We modified the classification system described by Buisson et al. [56] in order to apply it to the functional description of freshwater fish species. Six traits were analysed. These traits represented three basic biological functions: food acquisition, life habitat, and locomotion. Food acquisition traits were: 1) feeding habitat (pelagic, benthopelagic and benthivorous); and 2) trophic guild (primary consumer, secondary consumer, top-predator, omnivorous and detritivorous). Life habitat was defined as either pelagic, benthopelagic, or demersal. Locomotion traits were: 1) body length (in cm: small < 15, medium 15-50, large 50-150 and extra-large > 150), 2) rheophily (rheophilic, limnophilic and eurytopic) and 3) migration type (potamodromous, anadromous, catadromous, amphidromous, oceanodromous and no migration). When not available through FishBase, this information was collected either from https://www.fishbase.org/ or from source articles. It was not possible to include reproduction traits such as life span, parental care or reproduction habitat, because it was too difficult to assign these functional traits to over 16,000 species.
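The trait states listed above can be encoded as a small lookup table, with body length converted into its four classes. The sketch below is illustrative only: the names are hypothetical, and the assignment of the boundary value 50 cm to the medium class is an assumption, since the text gives overlapping limits (15-50 and 50-150).

```python
# Allowed states for the five categorical traits, as listed in the text.
TRAIT_STATES = {
    "feeding_habitat": {"pelagic", "benthopelagic", "benthivorous"},
    "trophic_guild": {"primary consumer", "secondary consumer", "top-predator",
                      "omnivorous", "detritivorous"},
    "life_habitat": {"pelagic", "benthopelagic", "demersal"},
    "rheophily": {"rheophilic", "limnophilic", "eurytopic"},
    "migration": {"potamodromous", "anadromous", "catadromous",
                  "amphidromous", "oceanodromous", "no migration"},
}


def body_length_class(length_cm: float) -> str:
    """Map a body length in cm to one of the four size classes."""
    if length_cm <= 0:
        raise ValueError("length must be positive")
    if length_cm < 15:
        return "small"
    if length_cm <= 50:    # assumption: 50 cm counts as medium
        return "medium"
    if length_cm <= 150:   # extra-large is strictly > 150 cm
        return "large"
    return "extra-large"


def is_valid(trait: str, state: str) -> bool:
    """Check that a state belongs to the list given for a categorical trait."""
    return state in TRAIT_STATES.get(trait, set())


print(body_length_class(32.0), is_valid("rheophily", "eurytopic"))
```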

Well-Surveyed River Basins and Data Treatment
According to Pelayo-Villamil [37], 71% of the world's countries had inventories of freshwater fish species that were of poor quality. Differences in the quality of inventories could be observed between countries. Furthermore, even those countries with relatively accurate and reliable national inventories had provincial and regional inventories that varied highly in completeness [37]. Therefore, whenever the available raw occurrences of the species had been used in the past, species richness in some river basins had doubtlessly been underestimated. In order to prevent this bias, potentially well-surveyed river basins (WSBs) were discriminated using the RWizard [57] application KnowBR [58] (www.ipez.es/RWizard). KnowBR is also available as an R package on CRAN [59]. KnowBR was used to build species accumulation curves from database records. As a surrogate for the survey effort carried out in each river basin, these curves described the relationship between the accumulated number of species and the increasing number of database records taken. WSBs were therefore defined as those basins which simultaneously had a final accumulation curve slope of ≤0.02 (two new species added every 100 records), a completeness value of ≥90% (the percentage of species predicted by the accumulation function that were also observed), and a ratio of number of records to number of observed species of ≥15. Fifty-two world river basins fulfilled these requirements (12.6% of the total). Most basins were located in the Nearctic region (n = 42). A low completeness value of 5% was assigned to all the river basins in which completeness values could not be computed (n = 70; 17% of the total) due to the low number of database records and/or the lack of asymptotic tendencies.
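The three simultaneous criteria used to define a WSB can be written as a single predicate. This is a sketch of the decision rule only; it does not reproduce how KnowBR fits the accumulation curves, and the function and argument names are hypothetical.

```python
def completeness(observed_species: int, predicted_species: float) -> float:
    """Percentage of the species predicted by the accumulation function that were observed."""
    return 100.0 * observed_species / predicted_species


def is_well_surveyed(final_slope: float, completeness_pct: float,
                     n_records: int, n_species: int) -> bool:
    """Apply the three thresholds given in the text simultaneously."""
    if n_species <= 0:
        return False
    return (
        final_slope <= 0.02                 # at most two new species per 100 records
        and completeness_pct >= 90.0        # at least 90% of predicted species observed
        and n_records / n_species >= 15.0   # at least 15 records per observed species
    )


# Made-up basin: 120 observed species, 126 predicted, 3000 records, final slope 0.01.
pct = completeness(120, 126.0)
print(round(pct, 1), is_well_surveyed(0.01, pct, 3000, 120))
```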
All the diversity components were influenced by bias and by the unequal knowledge about different world basins. The diversity components were significantly correlated, to a greater or lesser degree, with the completeness values derived from the accumulation functions (Pearson product-moment correlations ranging from r = -0.18 for LR to r = 0.77 for FRic; p < 0.001 in all cases).
Thus, in addition to identifying WSBs, the dependence between diversity components and the survey effort carried out in each river basin was addressed by regressing the values of each diversity component on the completeness values obtained for the river basins. All these diversity metrics were first standardized to a mean of zero and a standard deviation of one to eliminate the effect of measurement scales. The regressions were fitted with linear and quadratic functions in order to explore possible curvilinear relationships. A quadratic function was considered statistically significant when both the linear and the quadratic terms had a significance level of ≤1%. The residuals of these regressions are thus uncorrelated with the completeness values used as a surrogate for survey effort (r values are zero in all cases). Subsequently, the relationships between the different diversity components were examined using a simplified version of the procedure proposed by Stevens & Tello [5] [6]. This procedure consisted of a principal component analysis (PCA) computed for the five diversity components (with a varimax normalized rotation), using the resulting orthogonal variables with eigenvalues higher than one as the main diversity dimensions. Of course, the values of diversity components and dimensions can be related to different types of explanatory variables (area, climate, history, etc.). The objective of this study is not to examine the comparative relevance of different environmental variables for diversity differences, but to estimate the relationships among diversity components and the global distribution of the diversity dimensions.
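The correction for survey effort described above (standardization followed by a regression on completeness and extraction of the residuals) can be sketched in plain Python as follows. The data are made up, the helper names are hypothetical, and completeness is also standardized here purely for numerical stability, which is an assumption; the PCA with varimax rotation is not covered by this sketch.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize to a mean of zero and a standard deviation of one."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def polynomial_residuals(x, y, degree):
    """Least-squares polynomial fit of y on x; returns the residuals."""
    powers = range(degree + 1)
    xtx = [[sum(xi ** (p + q) for xi in x) for q in powers] for p in powers]
    xty = [sum(yi * xi ** p for xi, yi in zip(x, y)) for p in powers]
    coef = solve(xtx, xty)
    return [yi - sum(c * xi ** p for p, c in enumerate(coef)) for xi, yi in zip(x, y)]


def pearson(u, v):
    """Pearson product-moment correlation coefficient."""
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den


# Made-up completeness values (%) and species richness for eight basins.
completeness = [5.0, 20.0, 40.0, 55.0, 70.0, 85.0, 90.0, 98.0]
richness = [3.0, 10.0, 25.0, 30.0, 60.0, 80.0, 75.0, 120.0]
x, y = zscore(completeness), zscore(richness)
corrected = polynomial_residuals(x, y, degree=2)
print(abs(pearson(x, corrected)) < 1e-9)
```

Because the fitted function includes an intercept and a linear term, the residuals are uncorrelated with completeness by construction, which is the property stated in the text.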

Results
The raw data showed that functional richness was generally higher in the tropical regions of South America, Central America, Africa and Asia, but also, to a [...] Two diversity dimensions appear when the information coming from the 52 WSBs is considered. The PCA indicated that these two dimensions accounted for 48.1% and 23.4% of the total variability in diversity components, respectively. The first dimension was positively related with SR, LR and FRic.
These three components accounted for 88%, 77%, and 62% of the variance of this factor, respectively (square of factor loadings). The second dimension was positively related with TD (67% of variance) and GR (37%). The results obtained with the WSBs did not match those observed when the complete data set was used in the analysis (Figure 2(a)). In this case, the first two PCA dimensions explained 40.8% and 27.3% of the total variability. The first dimension was positively related with SR, TD and FRic (60%, 51% and 88% of variance, respectively), while the second dimension was positively related with the two rarity metrics GR (47%) and LR (59%). Geographical patterns derived from the raw data were influenced by unequal knowledge, since all diversity components are positively correlated with completeness values and, therefore, their variation could be explained by differences in the survey efforts carried out in each river basin (Table 1). The residuals of these regressions were rescaled to values between 0 and 1 (corrected diversity components). A PCA on these rescaled values again selected two diversity dimensions that accounted for 34.3% and 30.4% of the total variability in the diversity components, respectively (Figure 2(b)). As in the case of the WSB-based analysis, the first dimension was positively correlated with SR and FRic (63% and 80% of total variance) but was uncorrelated with the two rarity components. The second dimension was negatively correlated with GR and LR (40% and 58% of variance), and positively correlated with TD (44% of total variance) (see Table 1). The geographical distribution of the resulting dimensions [...] Meanwhile, some of the basins located in North Africa and in the north of the Palearctic region that were poor in species had higher taxonomic diversity (positive values in dimension 2). Rarity seemed to be high (negative values in the second dimension) in South America, South Eastern Asia, Central Africa and Europe (Figure 3).
The frequency distributions of the corrected diversity components were different (Figure 4). Consequently, values equal to or higher than the upper quartile of each component were selected and the corresponding basins qualified as the "most diverse". The geographical distributions of these "most diverse basins" allowed us to better assess the distribution of the above-mentioned diversity dimensions and the relevance of the different diversity components (Figure 5).
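The selection of the "most diverse" basins can be illustrated with a short sketch that keeps the basins whose corrected value is equal to or higher than the upper quartile. The quartile estimation method is not stated in the text, so the inclusive method used here is an assumption; the function name and the values are hypothetical.

```python
from statistics import quantiles


def most_diverse(values: dict[str, float]) -> set[str]:
    """Return the basins with values at or above the upper quartile."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    q3 = quantiles(list(values.values()), n=4, method="inclusive")[2]
    return {basin for basin, v in values.items() if v >= q3}


# Made-up corrected values (rescaled to 0-1) of one diversity component.
corrected_sr = {"A": 0.10, "B": 0.35, "C": 0.50, "D": 0.62, "E": 0.90}
print(sorted(most_diverse(corrected_sr)))
```

Applying the same function to each of the five corrected components, and counting how many times a basin is selected, would give the number of components for which that basin ranks among the most diverse.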
Both SR and FRic were found to be higher in eastern North America, tropical South America and Africa, Australasia, Eastern Asia and Europe. However, TD was higher in the south of South America, the western part of the Nearctic region, North Africa, and across the Palearctic region. GR was higher in the northern part of North America, the southern part of South America, Europe and across the Palearctic region. LR was higher in the eastern part of Central Africa, Australasia, Asia, Europe and South America. Only in some river basins of western South America (n = 4), Europe and the Mediterranean region (n = 3), and South Eastern Asia (n = 5) was it possible to find high values of four diversity components (Figure 6).

Discussion
A primary goal of ecology and biogeography is to determine biodiversity patterns [60]. Biodiversity is generally described in terms of taxonomic entities.
However, this approach has crucial limitations because it ignores the multidimensional character of diversity [5] [6] [7]. In this study, we aimed to surpass [...] diversity patterns under these circumstances? Our approach has been to minimize the effects of these limitations by estimating the residuals of the relationship between each of the diversity components and the completeness values derived from the accumulation functions. These corrected diversity components provide a less biased picture of the global distribution of diversity in freshwater fishes. The confidence in these corrected diversity dimensions is based on the fact that they are fundamentally similar to those obtained from analysing well-surveyed river basins.
Kuczynski et al. [30] found a weak congruence between different diversity components in Europe. However, at a global scale our analyses show that the different diversity components representing the distribution of freshwater fishes in world river basins can adequately be summarized using only two main dimensions. The first diversity dimension reflects a gradient in species richness and functional diversity: these diversity components are higher in regions with a tropical or subtropical climate and lower in regions with arid, cold or cold-temperate conditions (see [61]). This gradient can be explained by taking into account the dominant effects of energy availability and habitat heterogeneity [17]. The second diversity dimension summarizes rarity and taxonomic diversity values that sometimes occur in species-rich basins, such as those in Eastern South America, Europe or Eastern Asia. However, the basins with high values on this dimension are primarily located in species-poor areas under arid, cold and cold-temperate conditions. In the northern areas of the Holarctic region and in southern South America, the ice sheets that covered the basins during the Last Glacial Maximum could have favoured the isolation and distinctiveness of the freshwater fishes found in these regions [62]. Similarly, the singularity of the Sino-Oriental or the arid African basins can be explained as a consequence of the isolation generated by either the contraction of ancient watercourses [63] or the uplift of the Tibetan plateau [64].
Studies on functional diversity have always shown a heterogeneous pattern with respect to taxonomic groups and ecosystems. However, fish were usually not [...] This relationship can vary depending on the chosen metrics [68] and the number of analysed traits [69]. However, the fact that this relationship between functional diversity and species richness can be observed in natural systems would indicate that a river basin with a higher number of species generally also has a higher number of occupied niches. Functional diversity influences ecosystem dynamics and stability [70]. Therefore, the association observed between these two diversity components among world freshwater fish would suggest that ecosystem functions increase with the number of fish species found in river basins, but also that a decrease in species richness might involve a loss of functionality.

Conclusion
In this study, we aimed to overcome the biases and gaps in the distributional information on world freshwater fishes, taking into account the multidimensional character of diversity. The provisional character of these faunistic data is evident because just one world basin in seven would have a reliable inventory, most of them located in the Nearctic region. Based on completeness calculations, we propose here to estimate corrected diversity components able to provide a less biased picture. This approach seems to generate reliable patterns when they are compared with those coming from the analysis of well-surveyed basins. In the light of this work, two diversity dimensions seem to be enough to offer a consistent picture of the geographical patterns of world freshwater fishes. One dimension is related to the tropical-temperate latitudinal gradient in species richness and functional diversity, and the other is associated with the higher rarity and taxonomic diversity found in some species-poor areas located under arid, cold and cold-temperate conditions.

Data Availability Statement
All the primary data used in this study are freely available on the IPez1.4 website (http://www.ipez.es). A detailed description of the data sources, the species considered, the states of each biological trait assigned to each species, and the values of the diversity components, slopes, completeness, and ratios of the number of species to records for each level 2 river basin is available as supplementary material.