An Enhanced U-Net Model for Large-Scale Landslide Prediction Using Multi-Source Remote Sensing Data and Physical Risk Assessment
1. Introduction
Landslides, particularly those triggered by rainfall, represent a significant hazard to human safety, infrastructure, and ecosystems, especially in mountainous and hilly regions [1]. These events are typically precipitated by intense or prolonged rainfall, which decreases the shear strength of the soil, leading to slope instability and failure. As rainfall-induced landslides account for a substantial portion of natural disasters worldwide, understanding the complex mechanisms driving these events and developing accurate predictive models is critical for mitigating their impact and enhancing disaster response strategies [2]. Despite advances in the field of landslide prediction [3] [4], several challenges persist, particularly in the accurate forecasting of landslide occurrences, timing, and magnitude across large spatial and temporal scales.
Traditional landslide prediction models primarily rely on physical models [5] or data-driven methods [6], particularly machine learning [7] [8]. Such methods simulate the underlying geological processes of slope destabilization and provide a more comprehensive understanding of landslide mechanisms [9]. Recent studies have integrated static environmental factors (such as slope steepness and soil type) with dynamic variables [10] [11], such as precipitation and soil moisture [12], to improve landslide prediction. Notably, soil moisture plays a critical role in the initiation of landslides, yet its real-time monitoring is challenging at a fine granularity over a wide range of scales [13].
In recent years, advancements in remote sensing technologies and machine learning have opened new avenues for improving landslide prediction [14]. Remote sensing tools like InSAR, LiDAR, and optical imagery enable the collection of high-resolution spatial data [15]. For example, InSAR is particularly effective for detecting landslide movement and surface displacement [16], but its resolution limitations and susceptibility to atmospheric noise can hinder its effectiveness in capturing fine-grained changes in slope behavior. To overcome these challenges, multi-sensor data fusion techniques, which combine InSAR with optical imagery or pixel offset tracking (POT) [17], have shown promise in enhancing predictive accuracy. Likewise, combining remote sensing data with machine learning models, such as Random Forest (RF) or Generalized Additive Models (GAM) [18], has improved landslide prediction accuracy. However, the need for large, high-quality datasets and the computational complexity of these models remain barriers to their practical application.
In parallel, deep learning models, especially convolutional neural networks (CNNs) [19] [20], have demonstrated exceptional capabilities in handling large-scale, complex datasets. These models are particularly adept at identifying intricate spatial and temporal patterns in data [21], such as rainfall, terrain features, and soil moisture. By leveraging deep learning techniques, particularly in combination with multi-source remote sensing data [22], researchers have achieved improvements in landslide prediction accuracy over traditional methods. The integration of deep learning with remote sensing datasets such as InSAR, LiDAR, and optical imagery [23] has allowed for the joint modeling of landslide frequency, location, and magnitude. However, a major limitation remains: most existing approaches still rely on post-event landslide data for prediction, rather than proactively identifying regions at Risk before landslides occur.
This gap in predictive capabilities is particularly evident in large-scale, real-time monitoring of rainfall-induced landslides, where the need for accurate, high-resolution, and timely predictions is critical. To address this, we propose a novel approach that integrates multi-source remote sensing data, physical modeling, and deep learning techniques to predict and automatically extract potential rainfall-induced landslide areas prior to their occurrence. This approach aims to bridge the gap between existing models, which often rely on retrospective analysis, and proactive, large-scale landslide Risk management.
Specifically, this study introduces a comprehensive dataset that incorporates six key channels representing critical factors for landslide occurrence. This dataset is paired with a physical landslide Risk assessment model that calculates the landslide Risk index based on remote sensing data. The Risk index, alongside the dataset, is then used to train an enhanced U-Net model for landslide prediction and automatic extraction of landslide-prone areas.
1.1. Technological Innovations
1. Multi-source Remote Sensing Dataset: A novel dataset comprising six channels, including soil moisture, building index, water body index, vegetation index, slope, and elevation, has been developed to better capture the dynamic environmental factors influencing rainfall-induced landslides. This multi-faceted dataset provides a more comprehensive view of the conditions that lead to landslide Risk.
2. Landslide Risk Assessment Model: The study introduces an integrated approach combining remote sensing indices with physical modeling to compute a landslide Risk index, which serves as a robust predictor of landslide Risk. It provides a dynamic and real-time assessment of landslide Risk, enabling more timely and accurate prediction.
3. Enhanced U-Net Model: An enhanced U-Net convolutional neural network is designed to capture more detailed spatial features from the multi-source dataset. The model’s architecture incorporates a hybrid convolution module and a Multi-Channel Spatial Attention (MCSA) mechanism, enhancing its ability to handle complex, heterogeneous data and enabling the automatic extraction of potential landslide areas from Risk maps and images.
1.2. Application Innovations
1. Automated Extraction of Landslide-Prone Regions from Risk Maps: Conventional methods for landslide Risk mapping require manual delineation of high-Risk zones by researchers or decision-makers, a process that is both time-consuming and subjective. Our study overcomes this limitation by incorporating an automated extraction framework that identifies potential landslide regions directly from the Risk distribution maps generated by our remote sensing-based Risk assessment model. This automation eliminates human intervention in the delineation process, ensuring consistent and reproducible results. Furthermore, it allows for the application of our methodology at large scales, facilitating rapid and large-area monitoring of landslide Risk with minimal labor.
2. Adaptability to Diverse Geographic and Rainfall Conditions: The flexibility and adaptability of the proposed model are vital application innovations that allow it to be used across diverse geographical regions and varying rainfall conditions. The enhanced U-Net model, which is trained on a multi-source remote sensing dataset, can be fine-tuned to different landscapes, including mountainous, coastal, and urban environments. This adaptability ensures that the landslide prediction system can be applied to a wide range of regions with different topographies, precipitation patterns, and soil types, making it a versatile tool for global landslide Risk monitoring and management.
In doing so, we aim to provide a more robust tool for early warning systems and offer a more effective means of mitigating the risks associated with these natural disasters.
2. Methods
2.1. Multi-Source Remote Sensing Data Collection and Construction
The dataset employed in this study is composed of multi-source remote sensing data collected from a variety of satellite platforms, ground-based sensors, and UAVs, with the aim of capturing a comprehensive set of environmental factors that influence landslide occurrence. The dataset spans nearly a decade, incorporating data from the ALOS PALSAR, Sentinel-1, Sentinel-2, Gaofen-2 satellites, as well as drone-based imaging and ground-based radar systems. This diverse array of data sources ensures that the dataset provides a robust and comprehensive representation of the environmental conditions surrounding landslides, allowing for more accurate and generalizable landslide prediction models.
To ensure data diversity and robustness, we compiled nearly 14,000 representative images through manual review and screening of real satellite imagery from the past 10 years. These images include those captured under pre-landslide conditions, specifically between 1 and 5 days before the occurrence of a landslide, alongside a set of 2,000 images that represent periods devoid of landslides. The images are uniformly sized at 128 × 128 pixels with a 10-meter spatial resolution. Of the labeled targets, 60% correspond to small landslides, with the remaining 40% divided between medium and large landslides (Figure 1). This distribution of landslide sizes ensures that the dataset covers a range of landslide events, improving the model’s ability to generalize across different magnitudes of landslides.
Figure 1. Landslide target data statistics.
These images were processed to derive key remote sensing indices that reflect critical environmental factors influencing landslide occurrence. Specifically, the dataset includes indices for soil moisture, building, water body, and vegetation, as well as slope and Digital Elevation Model (DEM) elevation data. This results in a comprehensive six-channel dataset, each channel representing a distinct environmental factor, forming the basis for further analysis.
These channels are as follows:
Soil Moisture Index (SMI): Soil moisture plays a crucial role in determining soil stability, as it directly affects the shear strength of the soil and contributes to slope failure under saturated conditions. The Soil Moisture Index (SMI) in this dataset is derived from the SMOS (Soil Moisture and Ocean Salinity) mission and the Inversion Modeling LPRM Algorithm in AMSR-2, with supplementary data obtained through interpolation where necessary. The inversion process incorporates rainfall information to generate reliable estimates of soil moisture, following methodologies based on rainfall threshold partitioning, as outlined in existing literature [24]. Figure 2(a) visualizes the SMI values, highlighting areas with varying soil moisture content that are essential for landslide prediction.
Normalized Difference Built-Up Index (NDBI): The NDBI captures urban structures that may influence slope stability by altering the natural terrain or contributing to drainage issues (Figure 2(b)). The index is computed from satellite imagery as shown in equation (1).
$$\mathrm{NDBI} = \frac{\mathrm{SWIR} - \mathrm{NIR}}{\mathrm{SWIR} + \mathrm{NIR}} \tag{1}$$
Where:
1) SWIR is the short-wave infrared band;
2) NIR is the near-infrared band.
Normalized Difference Water Index (NDWI): The index identifies the presence of lakes, rivers, or other water bodies in the region, which can contribute to landslide hazards by influencing groundwater levels or eroding slopes (Figure 2(c)). It is computed as shown in equation (2).
$$\mathrm{NDWI} = \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}} \tag{2}$$
Where: Green is the green band.
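Normalized Difference Vegetation Index (NDVI): The vegetation index reflects the presence of vegetation that can stabilize the soil and reduce landslide Risk (Figure 2(d)). It is computed as shown in equation (3).
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}} \tag{3}$$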
Where: Red is the red band.
Slope Information: Slope steepness is a critical factor in landslide Risk, with steeper slopes being more prone to failure. Slope information is derived from the Digital Elevation Model (DEM), which provides precise topographic data of the study area. The DEM is processed to calculate the gradient at each pixel. Figure 2(e) provides a visualization of slope values, emphasizing areas with particularly steep terrain that are more likely to experience landslides.
Digital Elevation Model (DEM): DEM data is used to create detailed topographic maps that help to identify regions of varying elevation, which contribute to slope instability under certain conditions. Figure 2(f) displays the topographic relief information from the DEM, which serves as an essential layer for landslide Risk modeling.
Figure 2 shows the visualization information of three of the channels.
Figure 2. Channels visualization.
Each of these six data channels is aligned and pre-processed to ensure that all images are of the same spatial resolution (10 meters) and are temporally synchronized for accurate landslide prediction.
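As a minimal sketch of how the three normalized-difference channels could be computed per pixel, the following Python snippet applies the standard definitions of NDBI, NDWI, and NDVI to band reflectances. The function names, the small `eps` guard against a zero denominator, and the sample reflectance values are illustrative assumptions and are not taken from the processing chain used in this study.

```python
def normalized_difference(a, b, eps=1e-12):
    """Return (a - b) / (a + b); eps avoids division by zero on dark pixels."""
    return (a - b) / (a + b + eps)


def ndbi(swir, nir):
    """Built-up index: high where SWIR reflectance exceeds NIR."""
    return normalized_difference(swir, nir)


def ndwi(green, nir):
    """Water index: high where green reflectance exceeds NIR."""
    return normalized_difference(green, nir)


def ndvi(nir, red):
    """Vegetation index: high where NIR reflectance exceeds red."""
    return normalized_difference(nir, red)


# Hypothetical reflectances for a single vegetated pixel.
pixel = {"green": 0.08, "red": 0.05, "nir": 0.45, "swir": 0.20}
indices = {
    "NDBI": ndbi(pixel["swir"], pixel["nir"]),
    "NDWI": ndwi(pixel["green"], pixel["nir"]),
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
}
print(indices)
```

For a vegetated pixel such as this one, NDVI is positive while NDBI and NDWI are negative, which is the pattern the three channels are meant to separate.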
2.2. Landslide Risk Index (LSI) Assessment Model
The Landslide Risk Index (LSI) is a critical component of our model, designed to quantitatively assess and map the potential landslide Risk across a given area based on multi-source remote sensing data. By integrating rainfall and various surface feature factors—such as DEM, slope, NDVI, NDBI, NDWI, and SMI—the LSI model provides a comprehensive and dynamic Risk assessment that accounts for the intricate relationships between these environmental variables.
The LSI assessment draws on methodologies presented in previous literature [25], where weights are assigned to landslide-inducing factors based on their relative importance in contributing to slope instability. This process involves determining the contribution of each factor to the overall landslide Risk through a weighted sum, with the specific weight of each factor adjusted according to its influence on landslide occurrence. These weights are a crucial aspect of the LSI formula, allowing for more accurate Risk prediction. The formula used to compute the LSI is as follows:
(4)
Where:
1) LSI is the landslide Risk value.
2) The Sigmoid function maps the linear combination to the [0, 1] interval.
3) The rainfall multiplication factor is given by formula (5).
4) The weights set the contribution of each feature.
The rainfall multiplication factor is combined with the LSI model as a weighting factor to dynamically adjust the landslide Risk assessment. It can be calculated using the following formula (5):
(5)
Where:
1) P is the near real-time rainfall intensity (unit: mm/h).
2) D is the duration of rainfall (unit: days).
3) The two reference terms are the reference rainfall intensity and the reference rainfall duration, respectively.
4) α and β are regulatory factors used to adjust the relative impact of rainfall intensity and duration on landslide Risk. These factors are obtained by fitting historical data.
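The sketch below is one plausible reading of the description above, not the authors' exact formulation. It assumes (i) a power-law rainfall factor in which intensity and duration are each divided by their reference value and raised to α and β, and (ii) a sigmoid applied to the rainfall-scaled weighted sum of normalized features. The weights and exponents are the values listed in Table 1; the reference rainfall values and the feature values are hypothetical.

```python
import math

# Feature weights and rainfall exponents as listed in Table 1.
WEIGHTS = {"slope": 0.25, "dem": 0.15, "ndvi": -0.2,
           "ndwi": 0.2, "smi": 0.1, "ndbi": 0.1}
ALPHA, BETA = 1.2, 0.8


def rainfall_factor(p, d, p_ref, d_ref, alpha=ALPHA, beta=BETA):
    """Assumed form: (P / P_ref)^alpha * (D / D_ref)^beta."""
    return (p / p_ref) ** alpha * (d / d_ref) ** beta


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lsi(features, rain=1.0):
    """Assumed form: sigmoid of the rainfall-scaled weighted feature sum."""
    linear = sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return sigmoid(rain * linear)


# Hypothetical features, already normalized to [0, 1].
features = {"slope": 0.8, "dem": 0.6, "ndvi": 0.3,
            "ndwi": 0.4, "smi": 0.7, "ndbi": 0.2}

# Hypothetical storm: 20 mm/h for 3 days against references of 10 mm/h and 2 days.
rain = rainfall_factor(p=20.0, d=3.0, p_ref=10.0, d_ref=2.0)
print(lsi(features), lsi(features, rain))
```

Under these assumptions a rainfall factor of 1 leaves the static risk unchanged, and heavier or longer rainfall raises the index whenever the weighted feature sum is positive.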
2.3. Architectural Overview
The overall architecture can be summarized as follows:
1) Data acquisition: Obtain relevant surface feature data.
2) Feature normalization: Use a min-max scaler to normalize the features so that each contributes consistently to the landslide Risk calculation. Analyze the correlation between features, using Principal Component Analysis (PCA) for dimensionality reduction.
3) Landslide Risk calculation: Calculate the landslide Risk value through weighted summation based on the set weights.
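The normalization step in this workflow can be illustrated with a small min-max scaling helper. This is a generic sketch rather than the study's code: the function name, the handling of constant inputs, and the slope values are assumptions.

```python
def min_max_scale(values):
    """Rescale a sequence linearly so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no contrast; map it to 0 (assumed convention).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical slope angles in degrees for five pixels.
slope_deg = [5.0, 12.0, 30.0, 45.0, 18.0]
print(min_max_scale(slope_deg))
```

Scaling every channel to the same range keeps a feature with large raw units, such as elevation in meters, from dominating the weighted sum.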
2.4. Proposed Model Architecture
In this study, we propose an enhanced U-Net model for landslide prediction, aimed at automatically identifying and extracting landslide-prone regions from multi-source remote sensing data. The model combines high-resolution spatiotemporal datasets, the LSI, remote sensing indices, and deep learning techniques to predict rainfall-induced landslides in real time, enabling proactive landslide Risk management.
The U-net architecture [26] is particularly well-suited for semantic segmentation tasks [27]-[29], where the goal is to classify each pixel in an image according to whether it corresponds to a landslide or not. This section outlines the methodology for developing and implementing the enhanced U-Net model, as shown in Figure 3.
Figure 3. Overall framework diagram of the model.
The improvements to the standard U-Net architecture in this study include: A) a hybrid convolutional approach combining standard convolution and dilated convolution, and B) a Multi-Channel Spatial Attention (MCSA) mechanism designed to optimize the feature learning process in the multi-dimensional feature space.
2.4.1. Hybrid Convolutional U-Net Backbone
One of the primary challenges in this work is effectively capturing multi-scale contextual information while preserving high-resolution spatial details. To address this, the hybrid convolutional architecture combines two types of convolution operations (Figure 3(A)): standard convolution and dilated convolution [30] [31], which operate with different receptive fields.
Standard Convolution (Conv_s): The standard convolution operation can be mathematically described as (6) and (7):
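$$H_{out} = \left\lfloor \frac{H + 2P - K}{S} \right\rfloor + 1 \tag{6}$$
$$W_{out} = \left\lfloor \frac{W + 2P - K}{S} \right\rfloor + 1 \tag{7}$$
Where:
1) H and W are the height and width of the input feature matrix.
2) $H_{out}$ and $W_{out}$ are the height and width of the output matrix after convolution.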
3) K is the size of the convolution kernel, usually taking values of 3, 5, or 7. The range of the input feature matrix covered by K at one time is referred to as the receptive field of the convolution.
4) S is the convolution stride, which specifies the step size or interval at which the convolution kernel moves over the input feature matrix.
5) P is the padding unit, used to transform the areas of the input feature matrix that do not satisfy the size of the convolution kernel into multiples of the convolution kernel size.
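The relationship between input size, kernel size, stride, and padding can be checked with a few lines of Python. This sketch uses the standard output-size formula for a convolution; the function name and the example settings are illustrative assumptions.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length along one axis: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# A 128 x 128 tile, as used in the dataset, with a 3 x 3 kernel.
print(conv_output_size(128, kernel=3, stride=1, padding=1))  # size preserved
print(conv_output_size(128, kernel=3, stride=2, padding=1))  # size halved
```

With K = 3, S = 1, and P = 1 the spatial size is preserved, which is the usual choice inside U-Net style encoder blocks.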
Dilated Convolution (Conv_d): Dilated convolution increases the receptive field by introducing gaps (dilations) between the kernel elements. The dilated convolution operation can be formulated as (8):
$$K_d = K + (K - 1)(r - 1) \tag{8}$$
Where:
1) $K_d$ is the size of the dilated convolution kernel.
2) r is the dilation factor; r expands the kernel by inserting $r - 1$ zeros between the kernel elements.
The dilation operation increases the receptive field by covering a larger area without changing the kernel size, which allows the model to capture contextual information at multiple scales.
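To make the effect of dilation on the receptive field concrete, the sketch below computes the effective kernel size for a given dilation factor and the padding needed to keep the spatial size unchanged at stride 1. It uses the standard relation for dilated kernels; the helper names are assumptions, and the dilation factor of 2 matches the value listed in Table 1.

```python
def effective_kernel_size(kernel, dilation):
    """Span covered by a kernel once (dilation - 1) gaps sit between its taps."""
    return kernel + (kernel - 1) * (dilation - 1)


def same_padding(kernel, dilation):
    """Padding that keeps the spatial size at stride 1 (odd kernels only)."""
    return (effective_kernel_size(kernel, dilation) - 1) // 2


def output_size(size, kernel, dilation=1, stride=1, padding=0):
    """Output length along one axis for a dilated convolution."""
    k_eff = effective_kernel_size(kernel, dilation)
    return (size + 2 * padding - k_eff) // stride + 1


k_eff = effective_kernel_size(kernel=3, dilation=2)
pad = same_padding(kernel=3, dilation=2)
print(k_eff, pad, output_size(128, kernel=3, dilation=2, padding=pad))
```

A 3 × 3 kernel with r = 2 therefore covers a 5 × 5 window while still using only nine weights.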
Hybrid Convolution (Conv_h): As shown in Figure 3(A), the hybrid convolution combines the outputs of both the standard and dilated convolutions. This can be mathematically expressed as (9):
. (9)
Where the two combined terms are:
1) the output from the standard convolution;
2) the output from the dilated convolution.
This hybrid convolution approach allows the model to effectively capture both local and global contextual information, crucial for accurate landslide prediction in remote sensing imagery.
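A one-dimensional toy example can illustrate how the two branches complement each other. The sketch assumes that the hybrid output is a weighted sum of the two branch outputs (the weight allocation studied in Table 4 suggests this, but the exact combination rule is an assumption here), and it shares one kernel between the branches purely for brevity. All names are hypothetical.

```python
def conv1d_same(signal, kernel, dilation=1):
    """1-D cross-correlation with zero padding; output length equals input length.
    Assumes an odd-length kernel so the window is centred on each sample."""
    half = dilation * (len(kernel) - 1) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i - half + j * dilation
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def hybrid_conv1d(signal, kernel, dilation=2, w_std=0.5, w_dil=0.5):
    """Weighted sum of a standard branch and a dilated branch (assumed rule)."""
    std = conv1d_same(signal, kernel, dilation=1)
    dil = conv1d_same(signal, kernel, dilation=dilation)
    return [w_std * s + w_dil * d for s, d in zip(std, dil)]


impulse = [0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]
print(conv1d_same(impulse, kernel, dilation=1))
print(conv1d_same(impulse, kernel, dilation=2))
print(hybrid_conv1d(impulse, kernel))
```

The standard branch spreads the impulse over its three nearest neighbours, the dilated branch reaches two samples away, and the hybrid response covers both ranges.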
2.4.2. Multi-Channel Spatial Attention (MCSA)
The Multi-Channel Spatial Attention (MCSA) mechanism aims to focus on the most relevant features in both the channel and spatial domains, optimizing the model’s learning process (Figure 3(B)). Its design draws on [32] [33].
Channel Attention [34]: The channel attention mechanism calculates a weight map for each channel to highlight important channels and suppress irrelevant ones [35] [36]. Given the input feature map (with C channels, height H, and width W), the channel attention mechanism computes the channel-wise attention map, which is given by (10):
(10)
Where:
1) Pool is a pooling operation, which includes both max pooling and average pooling.
2) FC denotes the fully connected layer. The first FC layer reduces the dimensionality, while the second FC layer restores it.
3) MLP is shared multilayer perceptron.
4) Sigmoid normalizes the attention weights to the range [0,1].
The output feature map after applying channel attention is (11):
. (11)
Where the operator in (11) denotes element-wise multiplication.
The specific process of Channel Attention is shown in Figure 4.
Figure 4. Channel attention structure diagram.
Spatial Attention [37]: The spatial attention mechanism generates a spatial attention map, which highlights important spatial locations in the feature map [38] [39]. It can be computed as (12):
(12)
Where:
1) Two convolution operations are applied to the feature map to generate a spatial attention map.
2) The Sigmoid function normalizes the spatial attention map to the range [0,1].
The output feature map after applying spatial attention is (13):
. (13)
The specific process of Spatial Attention is shown in Figure 5.
Figure 5. Spatial attention structure diagram.
Multi-Channel Spatial Attention (MCSA): The final step in the MCSA mechanism combines the channel attention and spatial attention mechanisms. As shown in equation (14), the combined attention map is given by the element-wise addition of the channel attention map and the spatial attention map:
. (14)
The final output feature map after applying the MCSA mechanism is (15):
. (15)
This MCSA mechanism allows the model to focus both on the most relevant channels and spatial locations, improving its ability to capture the most important features related to slope instability in multi-dimensional remote sensing data.
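The combination step can be illustrated with a simplified, parameter-free sketch. It keeps the structure described above (a per-channel weight, a per-pixel weight, element-wise addition of the two maps, then element-wise multiplication with the input) but replaces the learned shared MLP and the learned convolutions with plain pooling, so it is an assumption-laden stand-in rather than the trained module. The feature map is a hypothetical nested list of shape C × H × W.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(fmap):
    """One weight per channel from its average- and max-pooled values.
    Stand-in for the learned MLP: the two pooled values are simply summed."""
    weights = []
    for channel in fmap:
        flat = [v for row in channel for v in row]
        weights.append(sigmoid(sum(flat) / len(flat) + max(flat)))
    return weights


def spatial_attention(fmap):
    """One weight per pixel from the mean across channels.
    Stand-in for the learned convolutions."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]


def mcsa(fmap):
    """Add the channel and spatial maps (with broadcasting), then rescale the input."""
    m_c = channel_attention(fmap)
    m_s = spatial_attention(fmap)
    return [[[(m_c[k] + m_s[i][j]) * value
              for j, value in enumerate(row)]
             for i, row in enumerate(channel)]
            for k, channel in enumerate(fmap)]


# Hypothetical 2-channel, 2 x 2 feature map.
fmap = [[[1.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 0.0]]]
out = mcsa(fmap)
print(out)
```

Because the two maps are added before the multiplication, a location is emphasized if either its channel or its position is judged important, whereas a sequential product would require both.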
2.5. Architectural Overview
The overall architecture can be summarized as follows:
1) Input Layer: The input feature map consists of seven channels (SMI, NDBI, NDVI, NDWI, Slope, DEM, and LSI).
2) Encoder: The encoder is composed of convolutional layers, which include Hybrid Convolution to extract multi-scale features.
3) Decoder: The decoder reconstructs the output feature map from the encoded features, utilizing upsampling and deconvolution layers.
4) Output Layer: The final output is the predicted slope instability map, where each pixel represents the likelihood of slope instability at that location.
Although the LSI is statistically derived from the other six factors, it serves as a high-level prior that summarizes regional landslide risks. Incorporating the images, the raw features, and the LSI enables the model to simultaneously learn fine-grained patterns from raw data and leverage holistic risk information, enhancing its ability to detect subtle landslide-prone subregions within broader high-risk zones. This design also aligns with prior-guided learning frameworks and does not result in harmful data redundancy, as the model can learn to balance and weigh each input feature accordingly.
3. Experimentation and Analysis
Our experimental environment consists of a 12th Gen Intel® Core™ i5-12400F 2.5 GHz processor, 32 GB of DDR4 3200 MHz memory, and an NVIDIA GeForce RTX 3060 Ti (12 GB) graphics card, running 64-bit Windows 11 with CUDA 10.2, Python 3.8, and PyTorch 1.7.
3.1. Model Parameters and Evaluation Metrics
The description of all parameters used in the experiment is shown in Table 1. The weights used in the LSI calculation were determined through a combined method of Spearman’s rank correlation analysis and domain expert consultation [40]-[42]. Initially, the correlation between each environmental variable and known landslide occurrences was quantified using Spearman’s coefficient to capture monotonic trends. The resulting values informed a preliminary ranking of factor importance. These were then adjusted through expert consultation to better reflect regional geological characteristics. For example, slope and DEM were emphasized for their direct physical relevance, while NDVI was negatively weighted to account for vegetation’s stabilizing effects. This integrative approach ensures that the final weights capture both statistical associations and practical domain insights.
Table 1. Parameter description.
| Parameter name | Explanation | Value |
| --- | --- | --- |
| epochs | Number of epochs | 50 |
| batch-size | Batch size | 16 |
| lr | Learning rate | 1e-5 |
| classes | Number of classes | 2 |
| r | Dilation factor | 2 |
|  | Weight of slope features | 0.25 |
|  | Weight of DEM features | 0.15 |
|  | Weight of NDVI features | −0.2 |
|  | Weight of NDWI features | 0.2 |
|  | Weight of SMI features | 0.1 |
|  | Weight of NDBI features | 0.1 |
|  | Rainfall intensity adjustment factor | 1.2 |
|  | Duration adjustment factor for rainfall | 0.8 |
| Train: Val: Test | The proportion of the dataset used for training, validation, and testing. | 6:2:2 |
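The correlation step used to obtain the preliminary ranking of factor importance can be sketched as follows. The snippet computes Spearman's rank correlation as the Pearson correlation of average ranks, which is the standard definition; the function names and the sample data are hypothetical, and the expert adjustment of the weights is not modeled.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-region mean slope and landslide counts.
slope = [8.0, 15.0, 22.0, 30.0, 41.0]
landslides = [0, 1, 1, 4, 9]
print(spearman(slope, landslides))
```

A coefficient near +1 would support a large positive weight, while a negative coefficient, as expected for NDVI, would support a negative weight.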
In this study, to comprehensively evaluate the model performance, we adopted a variety of common evaluation metrics [43] [44], including the Receiver Operating Characteristic curve (ROC), Area Under the Curve (AUC), Mean Squared Error (MSE), Dice coefficient, and Frames Per Second (FPS). The following is a detailed explanation and formula for each metric.
3.1.1. Receiver Operating Characteristic Curve (ROC)
The ROC curve assesses the performance of a binary classification model by plotting the relationship between the true positive rate (TPR) and the false positive rate (FPR) at various threshold settings. TPR and FPR are defined as follows (16) and (17):
$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{16}$$
$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{17}$$
Where:
1) TP denotes true positives,
2) TN denotes true negatives,
3) FP denotes false positives,
4) FN denotes false negatives.
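As a small worked example of (16) and (17), the following sketch thresholds predicted scores and derives TPR and FPR from the resulting confusion counts. The helper names, the labels, the scores, and the threshold of 0.5 are illustrative assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels encoded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def tpr_fpr(y_true, scores, threshold):
    """One point on the ROC curve for the given decision threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical pixel labels (1 = landslide) and predicted scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]
print(tpr_fpr(y_true, scores, threshold=0.5))
```

Sweeping the threshold from 1 down to 0 and plotting the resulting (FPR, TPR) pairs traces the ROC curve.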
3.1.2. Area Under the Curve (AUC)
AUC is the area under the ROC curve, indicating the model’s ability to distinguish between positive and negative classes. The AUC value ranges from 0 to 1, where a higher AUC value signifies a stronger ability of the model to differentiate. When the AUC value is 1, the model can perfectly distinguish between positive and negative samples; when the AUC value is 0.5, the model’s performance is equivalent to random guessing.
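AUC can also be computed without drawing the curve, using its equivalent interpretation as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses this pairwise form; the function name and data are hypothetical, and the quadratic loop is meant for illustration rather than for large datasets.

```python
def auc(y_true, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = landslide) and predicted scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.2, 0.3, 0.7]
print(auc(y_true, scores))
```

In this toy case six of the nine positive-negative pairs are ordered correctly, so the AUC is 6/9.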
3.1.3. Mean Squared Error (MSE)
MSE is used to measure the difference between predicted values and actual values and is a commonly used loss function in regression tasks. The formula is as follows (18):
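$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \tag{18}$$
Where:
1) $y_i$ is the true value of the sample,
2) $\hat{y}_i$ is the predicted value of the model,
3) N is the number of samples.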
The smaller the MSE value, the smaller the prediction error of the model, indicating better performance.
3.1.4. Dice Coefficient
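The Dice coefficient (equivalent to the F1 score for binary masks) is used to evaluate the performance of binary segmentation models, especially in image segmentation tasks. With TP, FP, and FN as defined in Section 3.1.1, its formula is as follows (19):
$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN} \tag{19}$$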
The value of Dice is between 0 and 1, with a higher value indicating that the segmentation result is more similar to the true label.
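The two measures in (18) and (19) can be illustrated on flattened toy masks. This is a minimal sketch with hypothetical data; returning 1.0 when both masks are empty is an assumed convention, not something specified in this study.

```python
def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def dice(mask_true, mask_pred):
    """Dice coefficient for binary masks given as flat 0/1 sequences."""
    tp = sum(1 for t, p in zip(mask_true, mask_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(mask_true, mask_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(mask_true, mask_pred) if t == 1 and p == 0)
    if tp + fp + fn == 0:
        return 1.0  # both masks empty: treated as a perfect match (assumption)
    return 2 * tp / (2 * tp + fp + fn)


# Hypothetical targets and predicted probabilities.
print(mse([1, 0, 1, 1], [0.9, 0.2, 0.8, 0.6]))

# Hypothetical ground-truth and predicted landslide masks.
print(dice([1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1]))
```

Dice ignores true negatives, which makes it better suited than pixel accuracy when landslide pixels are a small fraction of each tile.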
3.1.5. Frames Per Second (FPS)
FPS measures the model’s detection speed, indicating the number of image frames the model can process per second. The higher the FPS, the better the model’s real-time performance. The calculation formula is (20):
$$\mathrm{FPS} = \frac{\text{number of images processed}}{\text{total processing time (s)}} \tag{20}$$
These evaluation metrics provide a comprehensive performance assessment, covering aspects such as the accuracy, precision, and speed of the model, offering a multi-faceted evaluation of the model’s performance in this study.
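For the speed metric, a throughput measurement along the following lines is one common way to obtain FPS. The sketch times an arbitrary per-image function with a monotonic clock; `measure_fps` and the dummy workload are hypothetical, and a real GPU benchmark would additionally need warm-up runs and device synchronization, which are omitted here.

```python
import time


def measure_fps(process, images):
    """Images processed per second of wall-clock time."""
    start = time.perf_counter()
    for image in images:
        process(image)
    elapsed = time.perf_counter() - start
    return len(images) / elapsed if elapsed > 0 else float("inf")


# Dummy workload standing in for model inference on 128 x 128 tiles.
tiles = [[0.5] * (128 * 128) for _ in range(20)]
fps = measure_fps(sum, tiles)
print(f"{fps:.1f} images/s")
```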
3.2. Experimental Research Area
The study area for this experiment is located in Hongya County, Meishan City, Sichuan Province, China, a region characterized by complex terrain and frequent landslide occurrences [45] [46]. This area, situated within the longitude and latitude bounds of [102.8˚E, 29.5˚N] and [103.5˚E, 30.0˚N] (as shown in Figure 6(a)), is highly vulnerable to natural disasters, particularly landslides, due to its steep slopes and heavy rainfall (as shown in Figure 6(d); data period: June 2024 to September 2024). The selected study site, delineated as a specific patch from remote sensing imagery, encompasses diverse land cover types such as forests and farmlands, urbanized areas, and water areas (as shown in Figure 6(c)). The availability of high-resolution satellite images and geographic data, including Digital Elevation Models (DEM) (as shown in Figure 6(b)) and land use information, offers a valuable opportunity to apply advanced remote sensing techniques for landslide prediction. Understanding the spatial distribution of landslide-prone areas in this region is essential for disaster Risk management and land-use planning, as well as for developing strategies to mitigate the impacts of landslides on local communities.
Figure 6. Analysis map of the study area.
3.3. LSI Assessment Model Validation
In this section, we present the validation of the LSI assessment model using a comprehensive set of real-world data. The approach follows the Landslide Condition Factor (LCF) correlation matrix implementation in [47]. The model was assessed quantitatively through key performance metrics and then applied to the study area in Hongya County for visual analysis.
First, we analyzed the correlations among all input features to determine whether any are redundant, that is, whether two factors are highly correlated. The result is displayed as a heatmap in Figure 7.
From this, it can be concluded that the correlation between the feature factors is low (max = 0.32), so no feature reduction is needed. One striking result is the low correlation between DEM and slope: across the large amount of data collected, we found that areas with high DEM values do not necessarily have steep slopes. In addition, following the idea in [48], validation experiments were conducted from several dimensions.
Figure 7. Feature correlation matrix.
The summary of the evaluation results based on the test data is shown in Table 2. In this comparative experiment, we compared two forms of the LSI model: 1) the LSI model without the rainfall multiplication factor (LSI without Rainfall); 2) the LSI model with the dynamic rainfall multiplication factor (LSI). To further verify the advantages of the LSI model, we also compared it against several common landslide Risk assessment models (SVM, RF, XGBoost, and GAMM).
The LSI model with the rainfall factor achieved the best AUC (0.968), indicating the strongest discriminative ability in the classification task. It also obtained the lowest MSE (0.163), while its accuracy (0.960) was comparable to, though slightly below, that of XGBoost (0.963). This further supports the effectiveness of the model in predicting landslide Risk.
Table 2. Evaluation results.
| Model | Accuracy | AUC | MSE |
| --- | --- | --- | --- |
| SVM [7] | 0.870 | 0.902 | 0.222 |
| Random Forest [7] | 0.904 | 0.926 | 0.180 |
| XGBoost [9] | 0.963 | 0.965 | 0.166 |
| GAMM [6] [10] | 0.955 | 0.958 | 0.169 |
| LSI without Rainfall | 0.935 | 0.956 | 0.171 |
| LSI | 0.960 | 0.968 | 0.163 |
Next, the LSI assessment model was applied to the study area. The model produced a landslide Risk map, visualizing the spatial distribution of landslide Risk across the region. High-Risk areas were primarily located on steep slopes and in regions with heavy rainfall. Figure 8 shows the visualized Risk map for the study area, with Risk levels classified as low, medium, and high.
Figure 8. LSI assessment visualization in the study area.
The Risk assessment model can only roughly indicate the landslide-prone areas, lacking a more precise expression [48]. Therefore, this study further integrates it into a landslide prediction segmentation model based on deep learning and remote sensing images.
3.4. Improved U-Net Model Ablation Study
In this section, we present the experimental analysis using a landslide area dataset based on remote sensing indices to evaluate the efficiency of the improved U-net.
3.4.1. Hybrid Convolution Analysis
We combined two types of convolution operations: standard convolution and dilated convolution. This allows for effectively capturing multi-scale contextual information while maintaining high-resolution spatial details. Given that different convolution operations can have varying impacts on the network, we designed two sets of experiments to analyze the effectiveness of the model.
As shown in Table 3, this section compares various convolution operations. The combination of standard convolution and dilated convolution performs best, with a Dice score 2.3 percentage points higher than standard convolution alone (64.6% vs. 62.3%).
Table 3. Hybrid convolutional ablation study results.
| Convolutions | Dice (%) | Loss | FPS (imgs/s) |
| --- | --- | --- | --- |
| Standard Convolution [49] | 62.3 | 0.034 | 22.98 |
| Dilated Convolution [50] | 63.3 | 0.028 | 22.74 |
| Grouped Convolution [51] | 62.0 | 0.052 | 23.06 |
| Standard + Dilated | 64.6 | 0.022 | 22.83 |
| Standard + Grouped | 62.9 | 0.040 | 22.90 |
| Dilated + Grouped | 63.8 | 0.031 | 22.88 |
The combination of two types of convolutions will inevitably involve the allocation of weight sizes. Table 4 conducts comparative experiments with different weight allocations, and the experimental results show that when both weights are set to 0.5, the model achieves the highest Dice score.
Table 4. Weight distribution results.
Weight 1 | Weight 2 | Dice (%) | Loss
0.3 | 0.7 | 63.8 | 0.031
0.4 | 0.6 | 64.2 | 0.028
0.5 | 0.5 | 64.6 | 0.022
0.6 | 0.4 | 63.5 | 0.034
0.7 | 0.3 | 63.0 | 0.039
In summary, this hybrid convolution approach allows the model to effectively capture both local and global contextual information, crucial for accurate landslide prediction in remote sensing imagery.
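To make the weighted combination concrete, the sketch below blends a standard and a dilated convolution on a tiny image in plain Python. It is an illustration under stated assumptions, not the paper's layer: the function names, the 3x3 kernels of ones, the dilation rate of 2, and the assignment of each weight to a branch are all hypothetical, and a real layer would learn its kernels.

```python
def conv2d(image, kernel, dilation=1):
    """Zero-padded ('same') 2D cross-correlation with a square, odd-sized kernel."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Dilation spreads the kernel taps apart, widening the
                    # receptive field without adding parameters.
                    yy = y + (i - half) * dilation
                    xx = x + (j - half) * dilation
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def hybrid_conv(image, k_std, k_dil, w_std=0.5, w_dil=0.5, dilation=2):
    """Weighted sum of a standard and a dilated convolution (assumed form)."""
    std = conv2d(image, k_std, dilation=1)
    dil = conv2d(image, k_dil, dilation=dilation)
    return [
        [w_std * s + w_dil * d for s, d in zip(row_s, row_d)]
        for row_s, row_d in zip(std, dil)
    ]


# Hypothetical input: a single bright pixel at the centre of a 5x5 image.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0
ones = [[1.0] * 3 for _ in range(3)]

out = hybrid_conv(image, ones, ones)
for row in out:
    print(row)
```

The standard branch responds only in the 3x3 neighbourhood of the centre pixel, while the dilated branch also reaches the image corners, which is the multi-scale behaviour described above.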
3.4.2. MCSA Analysis
We combined two types of attention mechanisms: channel attention and spatial attention. This allows the model to simultaneously focus on the most relevant channels and spatial locations. Two sets of experiments were also designed to analyze the effectiveness of MCSA.
As shown in Table 5, we ran ablation experiments for each type of attention. MCSA improves Dice by 2.1 percentage points over the model without attention (66.7% vs. 64.6%), while detection speed drops only from 22.83 to 21.66 FPS, indicating a good balance between accuracy and detection speed.
Table 5. MCSA ablation study results.
Attentions | Dice (%) | Loss | FPS (imgs/s)
None (U-Net + Hybrid Convolution) | 64.6 | 0.022 | 22.83
Channel Attention [52] | 64.9 | 0.019 | 22.02
Spatial Attention [53] | 65.1 | 0.018 | 21.77
Channel + Spatial | 66.7 | 0.016 | 21.66
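The idea of chaining channel and spatial attention can be sketched in a few lines of plain Python. This is a simplified, parameter-free illustration and not the MCSA module itself: published attention blocks compute their gates with learned layers, whereas here each gate is just a sigmoid of an average, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(features):
    """Scale every channel by a gate derived from its global average."""
    out = []
    for channel in features:
        n = sum(len(row) for row in channel)
        mean = sum(sum(row) for row in channel) / n
        gate = sigmoid(mean)
        out.append([[gate * v for v in row] for row in channel])
    return out


def spatial_attention(features):
    """Scale every pixel by a gate derived from its mean across channels."""
    c = len(features)
    h, w = len(features[0]), len(features[0][0])
    gates = [
        [sigmoid(sum(features[k][y][x] for k in range(c)) / c) for x in range(w)]
        for y in range(h)
    ]
    return [
        [[gates[y][x] * features[k][y][x] for x in range(w)] for y in range(h)]
        for k in range(c)
    ]


def channel_spatial_attention(features):
    """Apply channel attention first, then spatial attention (assumed order)."""
    return spatial_attention(channel_attention(features))


# Hypothetical feature map with 2 channels of size 2x2, indexed [C][H][W].
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0]],
]
attended = channel_spatial_attention(features)
print(attended)
```

Both gates lie strictly between 0 and 1, so the block reweights existing activations and leaves the shape of the feature map unchanged.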
To more intuitively demonstrate the effectiveness of the model, we visualized the prediction results of the images, as shown in Figure 9.
Figure 9. LSI assessment visualization in the study area.
We selected five typical landslide-prone scenarios: (a) a hillside stream flowing from top to bottom; (b) a mountain road winding up the hillside; (c) a riverbank; (d) cliff-like terrain; and (e) terraced fields at varying elevations on the hillside.
The results show that the improved U-Net (Ours) produces more continuous predictions than the baseline U-Net and locates landslide-prone areas more accurately in complex environmental contexts.
3.4.3. Comparative Experiment
In this section, we present a comprehensive comparison between the proposed Enhanced U-Net (Ours) model and several state-of-the-art deep learning-based landslide prediction models, including U-Net, MobileNetV2, DeepLabV3+, SegNet, and DRs-Unet. The comparison is conducted on a large-scale multi-source remote sensing dataset. The detailed comparison results are presented in Table 6.
Table 6. Performance comparison of different landslide prediction models.
Models | Dice (%) | Loss | FPS | Accuracy | Precision | Recall | Parameters (millions)
U-net | 62.3 | 0.034 | 22.98 | 0.982 | 0.80 | 0.71 | 14
MobileNetV2 [37] | 59.7 | 0.035 | 25.23 | 0.968 | 0.76 | 0.70 | 13
DeepLabV3+ | 65.9 | 0.021 | 20.15 | 0.980 | 0.88 | 0.74 | 58
SegNet | 63.6 | 0.036 | 21.62 | 0.979 | 0.86 | 0.72 | 24
DRs-Unet [27] | 65.3 | 0.026 | 20.92 | 0.983 | 0.85 | 0.73 | 47
Model [54] | 66.0 | 0.019 | 22.45 | 0.980 | 0.85 | 0.77 | 23
Ours | 66.7 | 0.016 | 21.66 | 0.988 | 0.86 | 0.79 | 15
The comparative results show that the proposed Enhanced U-Net (Ours) achieves the highest Dice (66.7%), accuracy (0.988), and recall (0.79) and the lowest loss (0.016) among the compared models. Its precision (0.86) matches that of SegNet and is exceeded only by DeepLabV3+ (0.88). At 21.66 FPS it is faster than DeepLabV3+ (20.15 FPS) and DRs-Unet (20.92 FPS), though slower than U-Net (22.98 FPS), MobileNetV2 (25.23 FPS), and the model of [54] (22.45 FPS). With 15 million parameters, it is far smaller than DeepLabV3+ (58 million) and DRs-Unet (47 million). This balance between segmentation accuracy and computational cost makes the model well suited to large-scale, near-real-time landslide prediction in practical remote sensing applications and to proactive landslide risk management using multi-source remote sensing data.
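For reference, the accuracy, precision, recall, and Dice columns of Table 6 can all be derived from pixel-level confusion counts. The sketch below shows one plausible way to compute them. The paper does not state how its metrics were aggregated (per image or over the whole dataset), and the counts used here are hypothetical.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Pixel-level metrics from true/false positive/negative counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Dice counts true positives twice and ignores true negatives.
        "dice": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0,
    }


# Hypothetical counts for a 100x100 tile in which landslide pixels are rare.
metrics = segmentation_metrics(tp=50, fp=25, fn=50, tn=9875)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because background pixels dominate such tiles, accuracy stays high even when half of the landslide pixels are missed, which is why overlap-based scores such as Dice are generally more informative for comparing segmentation models.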
3.5. Application of Landslide Prediction and Risk Assessment in the Study Area
In the study area, we compare the performance of four different models for landslide prediction, each employing unique approaches and methodologies to address the dynamic and complex nature of landslide initiation. The models are evaluated based on their ability to predict landslide occurrence, with a focus on spatial accuracy, landslide sensitivity, and interpretability.
3.5.1. Dynamic Spatial Landslide Initiation Model [55]
This model introduces spatiotemporal information into data-driven prediction by dynamically adjusting spatial thresholds in response to evolving temporal patterns. As illustrated in Figure 10(a), the model predominantly highlights regions with high threshold values, indicating areas where landslides are most likely to occur. Although the model provides a clear prediction, its emphasis on high-threshold areas results in fewer predicted events, which may limit its sensitivity to smaller, less frequent landslides. Nonetheless, its strong predictive capability for large-scale, high-risk zones is evident.
3.5.2. GAMM-Based Landslide Prediction Model [6]
This approach integrates Generalized Additive Mixed Models (GAMM), using separate binomial models for spatial and meteorological factors, which are then combined to generate the landslide prediction. As shown in Figure 10(b), this model is particularly effective at identifying high-frequency landslide zones, especially in regions prone to smaller events. The ability to model both spatial and meteorological factors allows the model to capture more nuanced variations, making it suitable for detecting localized, less severe landslides. However, the model’s performance in predicting larger, more complex events remains less robust compared to others.
3.5.3. Deep Learning-Based Rain-Induced Landslide Prediction Model [54]
This model utilizes deep learning techniques and supervised binary classification to distinguish between areas that can or cannot trigger rain-induced shallow landslides. The prediction probabilities visualized in Figure 10(c) demonstrate the model’s superior performance in capturing both the intensity and spatial distribution of landslides, outperforming both the dynamic spatial model (a) and the GAMM-based model (b). The deep learning approach excels in its ability to adapt to complex, non-linear patterns in the data, making it particularly effective for predicting landslides triggered by rainfall events.
3.5.4. Our Model
Our model builds upon the improved U-Net architecture and adapts it to landslide prediction by incorporating both remote sensing data and terrain characteristics to generate landslide risk assessment maps. As shown in Figure 10(d), our model not only offers better continuity in prediction than model (c) but also identifies landslide-prone regions at higher spatial resolution. The key advantage of our model is its ability to learn both the sensitivity of remote sensing values to rainfall-induced landslides and the impact of different terrain types on landslide magnitude. Unlike model (c), which focuses primarily on rainfall-triggered events, our model captures the broader, more complex interactions between environmental variables and the subsequent scale of landslide disasters. This results in a more comprehensive and detailed risk map, as evidenced by the larger and more complex areas identified in the prediction outputs.
In conclusion, while each model demonstrates strengths in specific areas, our model offers a more robust and holistic solution for landslide prediction. By integrating terrain sensitivity and environmental variability with advanced deep learning techniques, it provides more accurate, continuous, and interpretable results, making it a valuable tool for both prediction and risk assessment in real-world applications.
Figure 10. Comprehensive landslide risk assessment visualization in the study area.
3.6. Application of the Model in Junlian Research Area
To further evaluate the robustness and generalizability of our proposed landslide prediction model, we conducted an additional set of experiments in Junlian County, Yibin City, Sichuan Province, China. This region, characterized by steep mountainous terrain and frequent rainfall, has recently experienced severe rainfall-induced landslides, making it an ideal location for further validation of our model’s capabilities.
The study area is bounded by the geographic coordinates [104.60°E, 28.02°N] and [104.64°E, 27.98°N] (Figure 11(a)), encompassing diverse landforms such as agricultural fields, forests, and rural settlements, all of which are susceptible to landslides during heavy rainfall events (Figure 11(b)). The region’s complex topography and varying land cover types offer a challenging environment for landslide prediction, providing a valuable test for the effectiveness of the multi-source remote sensing data and the enhanced U-Net model developed in this study.
Figure 11. Overview of the study area.
The enhanced U-Net model was applied to the new study area to predict landslide-prone zones based on the multi-source dataset. The model produced a pixel-wise landslide prediction map (Figure 12(a)), from which regions were classified into high-, medium-, and low-risk categories (Figure 12(b)). The prediction results for Junlian County clearly delineate landslide-prone areas, with high-risk zones located primarily on steep slopes and in areas with significant vegetation loss, consistent with known historical landslide occurrences.
Figure 12. Predicted results chart.
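The step from a pixel-wise prediction map to three risk categories can be sketched as a simple thresholding operation. The paper does not report the class boundaries, so the thresholds 0.33 and 0.66 below are purely illustrative assumptions, as are the function name and the sample probabilities.

```python
def classify_risk(prob_map, low_max=0.33, high_min=0.66):
    """Map per-pixel landslide probabilities to 'low', 'medium', or 'high'.

    low_max and high_min are hypothetical thresholds, not values from the paper.
    """
    labels = []
    for row in prob_map:
        label_row = []
        for p in row:
            if p >= high_min:
                label_row.append("high")
            elif p < low_max:
                label_row.append("low")
            else:
                label_row.append("medium")
        labels.append(label_row)
    return labels


# Hypothetical 2x3 probability map produced by a segmentation model.
prob_map = [
    [0.10, 0.50, 0.90],
    [0.70, 0.20, 0.40],
]
risk_labels = classify_risk(prob_map)
for row in risk_labels:
    print(row)
```

In practice, thresholds of this kind would need to be calibrated against a landslide inventory for the region in question.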
Interestingly, the model not only identified the regions where landslides had occurred but also predicted additional high-risk areas where no landslides had been observed historically (marked by the solid yellow line in Figure 12). These areas, though free from landslides in the present dataset, exhibited high risk due to factors such as steep terrain, soil moisture accumulation, and changes in vegetation cover. Such results are important because they highlight regions that, while not affected by landslides in the current period, are at heightened risk in future events, particularly during periods of intense or prolonged rainfall.
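One way to extract such candidate zones is to compare the predicted risk classes with a historical landslide inventory mask and keep the high-risk pixels that have no recorded event. The sketch below assumes both inputs are aligned grids of the same size. The function name and the sample data are hypothetical.

```python
def unrecorded_high_risk(risk_labels, inventory_mask):
    """Return (row, col) positions labeled 'high' with no recorded landslide."""
    cells = []
    for r, (label_row, inv_row) in enumerate(zip(risk_labels, inventory_mask)):
        for c, (label, recorded) in enumerate(zip(label_row, inv_row)):
            if label == "high" and not recorded:
                cells.append((r, c))
    return cells


# Hypothetical predicted classes and inventory (1 = landslide on record).
predicted = [
    ["low", "high", "high"],
    ["medium", "high", "low"],
]
inventory = [
    [0, 1, 0],
    [0, 0, 0],
]
candidates = unrecorded_high_risk(predicted, inventory)
print(candidates)  # [(0, 2), (1, 1)]
```

Pixels returned by such a comparison are candidates for field inspection rather than confirmed hazards, since an absence of recorded events may also reflect gaps in the inventory.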
4. Conclusions
This study presents a comprehensive framework for large-scale landslide prediction by integrating multi-source remote sensing data, physics-driven risk modeling, and an enhanced deep learning architecture. The proposed Landslide Susceptibility Index (LSI) model, which dynamically incorporates rainfall intensity and duration, achieved an AUC of 0.968 and an MSE of 0.163, outperforming traditional methods (e.g., XGBoost) in both discriminative capability and error reduction. The enhanced U-Net model further advances landslide segmentation accuracy through two key innovations: (1) a hybrid convolutional layer combining standard and dilated convolutions to capture multi-scale spatial features, and (2) a Multi-Channel Spatial Attention (MCSA) mechanism that prioritizes critical channel-spatial relationships. Experimental results demonstrate state-of-the-art performance, with a Dice coefficient of 66.7% and real-time processing at 21.66 FPS, surpassing benchmarks such as DeepLabV3+ (65.9% Dice) while maintaining computational efficiency (15M parameters).
Validations in Hongya and Junlian Counties, Sichuan Province, highlight the model’s practical value. Notably, the framework identified previously unrecognized high-risk zones in Junlian (e.g., steep vegetated slopes with no historical landslides), underscoring its ability to support proactive risk management.
Despite promising results in the selected study areas, the current validation is geographically constrained to regions within Sichuan Province, which are characterized by mountainous terrain and humid subtropical climate. As such, the model’s applicability to other regions with differing geological and environmental characteristics remains to be explored. In future work, we plan to conduct cross-regional validations and investigate adaptation strategies—such as transfer learning or dynamic recalibration of LSI weights—to enhance the model’s robustness and generalizability across diverse terrains.
5. Future Directions
Future research should focus on three key areas to enhance the model’s robustness and practicality:
(1) Integration of Real-Time Data Streams: Leveraging emerging satellite missions (e.g., Sentinel-1’s radar data) and IoT sensors could enable dynamic updates of soil moisture and rainfall patterns, improving real-time prediction during critical rainfall events. Coupling these data with hydrological models (e.g., TRIGRS) may further refine physical risk assessments.
(2) Hybrid Physics-AI Frameworks: Combining the proposed deep learning architecture with physics-based landslide simulations (e.g., finite element slope stability models) could address data scarcity in unmonitored regions while enhancing interpretability. Techniques like physics-informed neural networks (PINNs) offer promising avenues for such integration.
(3) Deployment and Generalizability Testing: Operational deployment in disaster management platforms requires addressing computational bottlenecks (e.g., model compression for edge devices) and validating generalizability across diverse geographies (e.g., tropical vs. alpine regions). Transfer learning and federated learning approaches could adapt the model to low-data environments.
6. Limitations
While the proposed enhanced U-Net model demonstrates strong performance in the selected regions, several limitations should be acknowledged:
(1) Data Dependency: The model heavily relies on the availability, resolution, and quality of remote sensing inputs (e.g., NDVI, SMI, DEM). In areas where such data are sparse, outdated, or affected by noise (e.g., cloud cover), the prediction accuracy may be significantly reduced. Moreover, the consistency of sensor types and preprocessing methods across datasets is critical to ensure model robustness.
(2) Transferability of LSI Weights: The Landslide Susceptibility Index (LSI) incorporates empirically derived weights calibrated for specific regional geological and environmental characteristics. Applying these weights directly to new regions without recalibration may lead to inaccurate risk estimation due to variations in terrain morphology, vegetation types, hydrology, and anthropogenic factors. Future work could explore adaptive or learning-based LSI weighting strategies to improve cross-region generalizability.
These limitations provide valuable insights for practical deployment and highlight directions for further research, such as data augmentation strategies and dynamic LSI adaptation frameworks.
Funding
This work was supported by the Meishan Science and Technology Program (2024-KJZD137) and the Sichuan University Jinjiang College Youth Fund (QNJJ-2025-A07).