Prediction of Potential Sorghum Suitability Distribution in China Based on the MaxEnt Model

Studying the effects of climate change on species habitats is increasingly relevant. Using a maximum entropy (MaxEnt) model, 22 environmental factors with significant effects on sorghum habitat distribution in China were selected to predict the potential habitat distribution of sorghum in China. The potential distribution of sorghum was simulated under baseline climate conditions and under future climate conditions (2050s and 2070s) for two climate change scenarios, RCP4.5 and RCP8.5. The accuracy of the model was evaluated using the area under the receiver operating characteristic curve (AUC). The results showed that the maximum entropy model predicted the potential sorghum habitat distribution with high accuracy, with Bio2 (monthly mean diurnal temperature difference), Bio6 (minimum temperature in the coldest month), and Bio13 (rainfall in the wettest month) as the main climatic factors affecting sorghum distribution among the 22 environmental factors. Under the baseline climate conditions, potential sorghum habitats are mainly distributed in southwest, central, and east China. Over time, the potential sorghum habitat expands into northern and southern China, with significant additions and negligible losses of potential habitat in the study area and a significant increase in total area; the RCP8.5 scenario adds much more area than the RCP4.5 scenario.


Overview of the Study Area
Sorghum, one of the five grains used to make Wuliangye liquor, is a traditional cereal crop. It is an annual herb of the genus Sorghum in the grass family, with multiple edible and medicinal uses. The main production areas are concentrated in the northeastern region, eastern Inner Mongolia, and the hilly mountains of the southwestern region. Historical surveys of sorghum distribution show that sorghum is grown in China across five climatic zones: cold temperate, temperate, warm temperate, subtropical, and tropical. China is a vast country with complex topography and varied climates, spanning the subtropical and northern temperate zones.

Spatial Distribution of Sorghum Research Data
In this paper, literature and specimen data were reviewed to obtain the distribution locations of sorghum. Sorghum sample point data were selected from records in the Chinese Virtual Herbarium (CVH, http://www.cvh.ac.cn) and the National Specimen Platform (NSII, http://www.nsii.org.cn/). Specimens that were too old were removed, and sample points with clear records were preferred. Some records gave only approximate locations without specific latitude and longitude; these were treated as areal data, and the central latitude and longitude of each area were obtained using ArcGIS combined with Baidu Maps. In total, 128 sample points were obtained.
As shown in Figure 1, the sorghum suitability distribution map generated under the current climate was compared with the 128 sample points. In ArcGIS 10.8, the suitability distribution map was overlaid with a 1:1,000,000 digital plant layer to remove distribution records that fell outside the sorghum suitability distribution area. In addition, the distribution points were subjected to buffer analysis, and the 128 sample points were proofread and screened to finally identify 108 sorghum distribution points [11].
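The paper does not give the details of its buffer analysis, so the following is only a minimal sketch of one common way to screen occurrence points: drop any point that lies within a chosen distance of a point already kept. The function names, the 1 km distance, and the example coordinates are all hypothetical and are not taken from the study.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def thin_points(points, min_km):
    """Keep a point only if it is at least min_km away from every point kept so far."""
    kept = []
    for lat, lon in points:
        if all(haversine_km(lat, lon, k_lat, k_lon) >= min_km for k_lat, k_lon in kept):
            kept.append((lat, lon))
    return kept

# Hypothetical coordinates: the first two points are roughly 0.15 km apart.
example_points = [(30.0, 104.0), (30.001, 104.001), (35.0, 110.0)]
screened_points = thin_points(example_points, min_km=1.0)
```

The result depends on the order of the input points, so in practice the records with the clearest locality information would be listed first.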

Predictive Environmental Factors
Climate factors are widely used as important environmental variables and modeling references in species distribution prediction [12]. In this study, a total of 22 predictive environmental factors related to sorghum distribution were selected. Of these, 19 are climatic factors that mainly represent temperature, precipitation, and their seasonal variation [13], and the other three are topographic factors: elevation, slope, and slope direction. The WorldClim climate dataset (version 1.4) is among the highest-resolution climate datasets publicly available. The topographic factors of elevation, slope, and slope direction were extracted using the 3D Analyst tool in ArcGIS 10.8.1.
Using ArcGIS 10.8.1, the raster data for the 22 environmental factors were each converted to a common format and unified to the same coordinate system, the same extent, and a resolution of 1 km × 1 km.
There are certain correlations among environmental factors [14]. Correlation analysis measures the degree of association between two or more variables, and it is meaningful only when some connection between the variables exists. The correlation coefficient, generally denoted r, describes the strength and direction of the relationship. In general, an absolute value of r greater than 0.80 indicates a high correlation, and an absolute value of r greater than 0.95 indicates a significant correlation. Including highly correlated environmental factors is likely to cause over-fitting, which inflates the AUC value of the prediction, so correlation analysis and screening of the environmental variables should be performed.
As shown in Table 1
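The correlation screening can be illustrated with a short sketch. It assumes the environmental values have already been sampled at the occurrence points into a table with one column per variable, and that the variables are listed in order of priority, so that the first of any highly correlated pair is kept. The function name, the variable names, and the data are hypothetical; the 0.80 cutoff follows the text above.

```python
import numpy as np

def screen_correlated(values, names, threshold=0.80):
    """Return the variable names that survive pairwise correlation screening.

    values: array of shape (n_samples, n_variables)
    names:  variable names in priority order (earlier names are preferred)
    """
    corr = np.corrcoef(values, rowvar=False)  # Pearson r between columns
    kept = []
    for j in range(len(names)):
        # Keep variable j only if it is not highly correlated with any kept variable.
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

# Hypothetical data: var_b is almost a multiple of var_a, var_c is unrelated.
var_a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
var_b = np.array([2.0, 4.0, 6.0, 8.0, 10.1])
var_c = np.array([1.0, -1.0, 1.0, -1.0, 1.0])
table = np.column_stack([var_a, var_b, var_c])
selected = screen_correlated(table, ["var_a", "var_b", "var_c"])
```

Which member of a correlated pair to keep is a modeling decision; ordering the names by ecological relevance or by preliminary contribution is one way to make that choice explicit.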

Future Climate Scenario Data
In this paper, the distribution of sorghum under future climate scenarios is modeled using two greenhouse gas (GHG) emission scenarios: the medium GHG emission scenario (RCP4.5) and the highest GHG emission scenario (RCP8.5).

Model Simulation and Evaluation
In this paper, the MaxEnt model was selected to predict the sorghum suitability distribution under different climate scenarios. The model has the advantages of simple modeling, accurate prediction, and high stability, and is widely used in several research areas.
Research on species distribution models has developed rapidly in recent years, and several distribution prediction models are now widely used. The first is the bioclimatic (Bioclim) model [16], the earliest species distribution model, which is closely related to early applications of the MaxEnt model [17]. Its disadvantage is that it is suitable only for some species and is limited for some biological categories; its advantage is that its simulation results are more accurate when the ecological amplitude and environmental characteristics are specific [17] [18]. The second is the regional environmental (Domain) model [19], which requires a high level of specialized knowledge and subjectively judged thresholds, so it offers little objectivity and the accuracy of its simulation results is unstable. The third is the genetic rule set (GARP) model, whose disadvantages are its high sample size requirements and poor simulation results [20]. CLIMEX is a climate-specific tool that assesses the region-specific adaptation of target species under climate change and predicts potential distribution, climate similarity, and seasonal phenology [21]. The MaxEnt model can incorporate environmental variables such as land cover, distance, and geographical factors, and can evaluate the contribution of each variable [22]. Traditional species distribution models project the ecological requirements of species using particular algorithms and combine them with different climate scenario models to predict suitable distribution areas. Compared with them, the MaxEnt model offers highly objective and accurate predictions, places no restriction on species categories, and can predict from relatively few sample points [13].
In this study, the MaxEnt 3.4.1 maximum entropy modeling software was used to model the data, and ArcGIS software was used to analyze the data. The sorghum sample point distribution data in CSV format and the processed environmental factor data were loaded into MaxEnt, and the proportion of distribution points in the test set was set to 25% (testing data), with the remaining 75% used as training data [11]. In this study, the probability value P = 0.32 was used as a threshold [24] to classify sorghum habitats as highly suitable (P ≥ 0.5), suitable (0.32 < P < 0.5), and non-suitable (P ≤ 0.32) [11].
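As a rough illustration of the two settings described above, the sketch below splits a list of occurrence points into training and test sets with 25% held out, and maps a predicted probability to a suitability class. MaxEnt performs the split internally, so this code only illustrates the proportions. The function names and the random seed are hypothetical, and treating P = 0.32 exactly as non-suitable is an assumption chosen to match the P > 0.32 criterion used later for suitable area.

```python
import random

def split_points(points, test_fraction=0.25, seed=0):
    """Randomly split occurrence points into (training, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def classify_suitability(p, threshold=0.32, high=0.5):
    """Map a predicted probability to one of the three habitat classes."""
    if p >= high:
        return "highly suitable"
    if p > threshold:
        return "suitable"
    return "non-suitable"

# 108 placeholder point identifiers stand in for the screened occurrence records.
training, testing = split_points(range(108))
```

With 108 points, this gives 81 training points and 27 test points.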

Current Potential Habitat Distribution of Sorghum
As shown in Figure 2

Analysis of Important Factors Affecting Potential Sorghum Habitat Distribution
As shown in Figure 3, the results of the jackknife test showed that Bio2 (monthly mean diurnal temperature difference), Bio6 (minimum temperature in the coldest month), Bio13 (rainfall in the wettest month), and Bio14 (rainfall in the driest month) produced prominent gains among the tested variables.
As shown in Figure 4, the monthly mean diurnal temperature difference is the sum of the daily diurnal temperature differences in a given month divided by the number of days. The response curves of the monthly mean diurnal temperature difference and the probability of existence are as follows. The results show that the probability of existence remains at a certain level when the monthly […] between precipitation and temperature: when precipitation is not higher than 180 mm and the minimum temperature is greater than −10 °C, the probability of existence of sorghum is greater than 0.32, meeting the minimum suitability conditions. When the rainfall in the driest month is between 10 mm and 52 mm, the probability of existence is greater than 0.5, and the probability of suitability is high.
As shown in Table 2, the 13 environmental variables were ranked in descending order according to their contribution and importance in the output results. The top four were rainfall in the wettest month, rainfall in the driest month, the ratio of diurnal temperature difference to annual temperature difference, and minimum temperature in the coldest month. Together these four environmental variables contributed 86.2% to the model, accounting for 29.5%, 27.5%, 15.0%, and 14.2%, respectively. The contribution of the precipitation variables to the model was higher than that of the temperature-related variables. The environmental factors that contributed less than 1% were altitude (0.8%), Bio5 (maximum temperature in the hottest month, 0.8%), slope (0.8%), Bio15 (rainfall variance, 0.5%), and slope direction (0.3%). This shows that temperature and moisture have a much greater impact on the distribution of suitable areas for sorghum than topographic factors.

Changes in Spatial Distribution Patterns of Sorghum under Climate Change
The suitability results under the four climate models were reclassified using ArcGIS
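One way to picture the comparison between the current and a future suitability map is a cell-by-cell change classification. The sketch below is not the ArcGIS workflow used in the study; it assumes two probability grids of identical shape and extent, and the function name, category labels, and example values are hypothetical. Cells count as suitable when P > 0.32.

```python
import numpy as np

def habitat_change(current_p, future_p, threshold=0.32):
    """Label each cell as 'gain', 'loss', 'stable', or 'unsuitable'."""
    cur = np.asarray(current_p, dtype=float) > threshold
    fut = np.asarray(future_p, dtype=float) > threshold
    change = np.full(cur.shape, "unsuitable", dtype=object)
    change[cur & fut] = "stable"   # suitable now and in the future
    change[~cur & fut] = "gain"    # newly suitable in the future
    change[cur & ~fut] = "loss"    # suitable now, no longer suitable in the future
    return change

# Hypothetical 2 x 2 probability grids.
current_grid = [[0.1, 0.4], [0.6, 0.2]]
future_grid = [[0.5, 0.5], [0.1, 0.1]]
change_map = habitat_change(current_grid, future_grid)
```

Counting the cells in each category then gives the added, lost, and retained habitat for a given scenario and period.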

Changes in Sorghum Range Area under Climate Change
Using ArcGIS 10.8, future suitable sorghum habitats were classified according to the P > 0.32 criterion, the changes in the area of suitable habitat and their percentages were calculated, and Table 3
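The area statistics can be sketched as a simple cell count. The example assumes a probability grid in which every cell covers 1 km² (consistent with the 1 km × 1 km resolution, provided the grid is in an equal-area projection) and in which cells outside the study area are stored as NaN. The function name and the example grid are hypothetical.

```python
import numpy as np

def suitable_area(prob_grid, cell_area_km2=1.0, threshold=0.32):
    """Return (suitable area in km^2, percentage of valid cells that are suitable)."""
    grid = np.asarray(prob_grid, dtype=float)
    valid = ~np.isnan(grid)                       # cells inside the study area
    suitable = np.zeros(grid.shape, dtype=bool)
    suitable[valid] = grid[valid] > threshold     # apply the P > 0.32 criterion
    n_suitable = int(suitable.sum())
    area_km2 = n_suitable * cell_area_km2
    percent = 100.0 * n_suitable / int(valid.sum())
    return area_km2, percent

# Hypothetical grid: three valid cells, two of them suitable.
example_grid = [[0.1, 0.4], [0.6, float("nan")]]
area_km2, percent = suitable_area(example_grid)
```

Applying the same function to the baseline grid and to each future grid, then taking the difference, gives the change in suitable area for each scenario.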

Sorghum Distribution in Relation to Environmental Factors
With future global warming, the sorghum suitability pattern changes significantly and the suitable area increases significantly. Environmental data are an important factor influencing the sorghum suitability distribution in this study: temperature, humidity, and topographic data all affect the geographic distribution of sorghum. The ranking of contribution and importance showed that rainfall in the wettest and driest months was more important, while the results of the jackknife test showed that the temperature factors were more important. Precipitation between 130 mm and 410 mm in the wettest month is suitable for sorghum growth. When the precipitation is not higher than 180 mm and the minimum temperature is greater than −10 °C, the probability of existence of sorghum is greater than 0.32, meeting the minimum suitability conditions. When the rainfall in the driest month is between 10 mm and 52 mm, the probability of existence is greater than 0.5 and the probability of suitability is high. The probability of existence is high when the monthly mean diurnal temperature difference is less than 69, and it is highest when the minimum temperature in the coldest month is up to 17 °C.
Sorghum is tolerant of drought and flooding, yet it is sensitive to both temperature and moisture. The results of this study showed that the temperature and precipitation variables contributed 86.2% to the model, with rainfall in the wettest and driest months accounting for as much as 57% of the influence on sorghum habitat distribution. This indicates that sorghum is heat tolerant but not cold tolerant, and that sites with low temperatures and high humidity should be avoided when selecting habitat.

Accuracy Evaluation of Simulation Results
This study used specimen records from the Chinese Virtual Herbarium (CVH) and the National Specimen Platform (NSII), combined with MaxEnt ecological niche modeling, to produce a predictive map of the distribution of sorghum suitability zones across the country. A comprehensive analysis of the ecological characteristics affecting sorghum was conducted, and the distribution of sorghum suitability areas was visualized. The maximum entropy model was validated by ROC curve analysis, and the area under the ROC curve was relatively close to 1: the AUC value for the training dataset was 0.881 and the AUC value for the test dataset was 0.841, indicating good prediction results.
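To make the AUC values concrete, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen presence point receives a higher predicted score than a randomly chosen background point, with ties counted as one half. This is a generic illustration rather than the MaxEnt software's own routine, and the function name and scores are hypothetical.

```python
def auc(presence_scores, background_scores):
    """Area under the ROC curve from pairwise comparisons of scores."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0   # presence point ranked above background point
            elif p == b:
                wins += 0.5   # ties count as half
    return wins / (len(presence_scores) * len(background_scores))

# Hypothetical predicted probabilities.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

A value of 0.5 corresponds to a model that ranks presence and background points no better than chance, and a value of 1 corresponds to perfect separation, which is why values such as 0.881 and 0.841 are read as good discrimination.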
In this study, nineteen climatic factors and three topographic factors were used to model the main effects of climate change on the distribution of sorghum in China. However, the conditions governing species presence are quite complex, and some environmental factors affecting presence are probably still unknown. This study has not yet considered the effects of soil, water quality, community environment, or stochastic factors on sorghum growth. The modeling results indicate that the habitat predicted as suitable for sorghum is similar to the habitat of known sorghum occurrences, but this determination is not absolute and does not necessarily mean that sorghum exists in these areas. Environmental factors and climatic conditions are not static, and the survival dynamics of any species can change. In addition, the prediction results may vary depending on the climate scenario model selected [27]. In summary, multiple realistic factors need to be fully considered in future studies to make the prediction results more accurate.