Spatio-Statistical Analysis of Flood Susceptibility Assessment Using Bivariate Model in the Floodplain of River Swat, District Charsadda, Pakistan

Floods are among the most common disasters worldwide and occur frequently in the northern part of Pakistan. This study assessed the effects of different divisions of the flood inventory and different combinations of conditioning factors on the final susceptibility map. The flood inventory map for Charsadda was prepared by visual interpretation of a Landsat-7 image together with a field survey, and a total of 161 flood locations were mapped. The flood inventory was then divided into training and validation datasets in two ways: 129 (80%) and 112 (70%) locations for training the models, and 32 (20%) and 49 (30%) locations for validating them. Nine conditioning factors were used to develop the flood susceptibility maps: elevation, slope, aspect, curvature, plan curvature, profile curvature, proximity to river, proximity to roads, and land use/land cover. All the conditioning factors were correlated with the flood inventory map using the information value method. The final susceptibility maps were validated using prediction rate and success rate curves. The areas under the curve (AUC) of the prediction rate curves are 99.47% for Model A, 95.04% for Model B, and 94.06% for Model C. The AUC values of the success rate curves are 95.03% for Model A, 86.91% for Model B, and 89.67% for Model C. The susceptibility maps were then classified into five susceptibility zones. The success rate and prediction rate curves indicated that Model A is more accurate than Model B and Model C; nevertheless, the results obtained from both curves indicated that all the models are reliable and that there is no significant difference between the susceptibility maps. The results of this study are therefore useful for researchers, disaster managers, and decision-makers in managing the flood-prone parts of the study area and mitigating flood damage.
How to cite this paper: Ul Moazzam, M. F., Lee, B. G., Ur Rahman, A., Farid, N., & Rahman, G. (2020). Spatio-Statistical Analysis of Flood Susceptibility Assessment Using Bivariate Model in the Floodplain of River Swat, District Charsadda, Pakistan. Journal of Geoscience and Environment Protection, 8, 159-175. https://doi.org/10.4236/gep.2020.85010
Received: April 21, 2020. Accepted: May 18, 2020. Published: May 21, 2020. Copyright © 2020 by author(s) and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY 4.0). http://creativecommons.org/licenses/by/4.0/


Introduction
Natural hazards are increasing day by day and have gained attention at the global and national levels. Floods are the most common and destructive of all natural disasters (Uddin et al., 2013). A flood is a severe hazard in which a river cannot accommodate more water than its capacity and overspills its banks, causing economic, social, and human losses (Jonkman, 2005). According to the international disaster database, more than a million people die every year due to floods in low-income countries [EM-DAT: The OFDA/CERD, 2010, cited in (Al Saud, 2012)].
Flooding is the most destructive natural hazard in Pakistan. Since independence, the country has faced seventeen severe floods, which have caused an economic loss of 12 billion USD (WAPDA, 2013). Since 2010, moreover, the country has faced floods almost every year during the monsoon season (July-September). The 2010 flood was the worst in Pakistan's history and was more destructive than the 2004 Indian Ocean tsunami, the 2005 Kashmir earthquake, the 2008 Cyclone Nargis, and the 2010 Haiti earthquake (Alam et al., 2015). It killed more than 1900 people and affected 17 million of agricultural land, 1.5 million houses, and 20 million people (WAPDA, 2013). Low-lying areas are mostly affected by riverine flooding during the monsoon season, whereas flash floods occur in hilly and semi-hilly regions. The Indus River is the primary source of flooding in Pakistan and affects the river basins in Khyber Pakhtunkhwa, Sindh, and Punjab. Since 1973, the densely populated districts of Khyber Pakhtunkhwa (Charsadda, Nowshera, and Peshawar) have been repeatedly affected by floods (Atta-ur-Rahman & Khan, 2013). In the 2010 flood, Mardan, Charsadda, Nowshera, and Peshawar were severely affected because of their exposure to the three main rivers of the province, namely the Swat, Kabul, and Indus rivers (Ahmad et al., 2011; Khan & Iqbal, 2013). The Swat and Kabul rivers experienced a record discharge of 400,000 cusecs in the 2010 flood, breaking the previous record of 250,000 cusecs set in 1929. This exceptionally high discharge overwhelmed Charsadda, Peshawar, and adjoining areas (WAPDA, 2013). Figure 1 shows the number of people affected and killed by disastrous floods in the history of Pakistan. The 2010 flood was the deadliest: it killed almost 1900 people and affected 20 million. It was followed by the floods of 1992 and 1995, in which 9.8 and 1.8 million people were affected, with death tolls of 1446 and 1063, respectively (EM-DAT, 2018).
Flood inventory mapping is the first step towards susceptibility mapping (Rahmati et al., 2016). Various techniques are available for generating flood inventory maps, for example support vector machines (Ali et al., 2017), visual interpretation, and object-based image classification (Owen et al., 2008), but inventories are mainly developed through visual interpretation of satellite images. In the literature, inventory data are most often divided in a 70:30 ratio (Tehrany et al., 2015); however, a few researchers have used an 80:20 ratio as well (Bacha et al., 2018). Flood susceptibility can be assessed by qualitative and quantitative techniques that divide the area into various susceptibility zones. Many statistical methods have been used for flood susceptibility assessment, including frequency ratio, weight of evidence (Rahmati et al., 2016), and the logistic regression model.
The objective of this paper is to explain the effects of two different divisions of the flood inventory (70:30 and 80:20 ratios), and of a partial combination of conditioning factors, on the final flood susceptibility map produced with the bivariate statistical information value method. Three models were therefore developed for this study. Model A used 70% of the inventory data for training and 30% for validation, and Model B used 80% for training and 20% for validation. Model C differs from Models A and B in that it uses fewer conditioning factors: to evaluate the predictive power of a selected combination of parameters, only seven of the nine parameters were used. This yielded three different flood susceptibility maps (Figure 5 and Table 2). Flood susceptibility mapping is needed because it can help mitigate the impacts of floods. Flood inventory and flood susceptibility maps are of significant importance for developing and implementing flood mitigation measures and provide a base for proper flood management strategies.
In this study, the flood inventory was developed using remote sensing data. The spatial distribution of the flood inventory was then evaluated to develop the flood susceptibility map using the information value technique.

Study Area
According to the 2017 population census, District Charsadda is the seventh-largest district in the Province of Khyber Pakhtunkhwa. It is located between 34°2'42" and 34°27'24" North latitude and 71°29'10" and 71°56'7" East longitude. The significant crops in the area are sugarcane, wheat, rice, and tobacco. Intensive monsoonal rainfall and snowmelt from the mountainous region are the primary sources of river flow. The main sources of irrigation in the district are the River Swat and the River Kabul, along with the Upper and Lower Swat Canals, the Michni Dalazak Canal, and the Dooaba feeder canal (Farish et al., 2017). The location and flow of the rivers in the district make it susceptible to floods. Monsoonal rainfall starts in July and ends in September. Figure 4 shows the flooded area. A record 274 mm of rainfall fell on 29 July 2010, resulting in flooding that caused around 1156 deaths and affected 3.8 million people in Khyber Pakhtunkhwa (Ushiyama et al., 2014). The district was severely affected in the 2010 flooding because of the intensive rainfall, and the rivers could not accommodate the volume of water. In 2010, the discharge of the Swat River at Munda headworks was recorded at 300,000 cusecs (Moazzam et al., 2018).
This study is based on the devastating flood event that occurred in July 2010. Figure 2 shows that the highest annual rainfall was recorded in 2003 with 904 mm, followed by 710 mm, 667 mm, 642 mm, and 595 mm in 1983, 1996, 1994, and 2010, respectively. However, the maximum monsoon-season (July-September) rainfall reached 400 mm in 2010, followed by 2003, 1984, and 1983 with 381 mm, 359 mm, and 298 mm, respectively.

Materials and Methods
In this study, the information value technique was used for flood susceptibility mapping. The methodology adopted for this study is shown in Figure 3.

Flood Inventory
Flood locations were identified using a Landsat 7 ETM+ image captured after the flood event. The product consists of six multispectral bands with 30 m spatial resolution, one thermal band with 60 m spatial resolution, and one panchromatic band with 15 m resolution. All raw images were atmospherically corrected. Flood-affected areas were then demarcated by visual interpretation of the Landsat 7 ETM+ image (Table 1), and a total of 161 flood locations were mapped over the study area (Figure 4). To develop the model, the flood inventory data must be divided into two groups, i.e., training and validation (Caniani et al., 2008). The training dataset was used to allocate weight values to each conditioning factor, whereas the validation datasets were used to estimate the efficacy of the models. No guidelines are available for the division of inventory data (Pradhan, 2010). Thus, in this study, the flood inventory was divided in both 70:30 and 80:20 ratios to assess the effects of the division of inventory data on the final susceptibility map and its efficacy.
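The division of the inventory described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how locations were assigned to each subset, so a seeded random shuffle is assumed here, the integer location IDs are hypothetical, and the function name `split_inventory` is invented for this sketch. The training counts (112 and 129) are taken directly from the study rather than derived from the percentages.

```python
import random


def split_inventory(location_ids, n_train, seed=0):
    """Randomly split flood locations into training and validation subsets.

    Assumption: a simple random split; the study does not describe its procedure.
    """
    if not 0 < n_train < len(location_ids):
        raise ValueError("n_train must be between 1 and len(location_ids) - 1")
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(location_ids)
    rng.shuffle(shuffled)
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


# 161 mapped flood locations, identified here by hypothetical integer IDs.
locations = list(range(1, 162))

# Training counts reported in the study: 112 (70:30) and 129 (80:20).
train_70, valid_30 = split_inventory(locations, n_train=112)
train_80, valid_20 = split_inventory(locations, n_train=129)

print(len(train_70), len(valid_30))
print(len(train_80), len(valid_20))
```

Passing the training count explicitly avoids committing to a rounding rule for the 70% and 80% shares of 161 locations.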

Flood Influencing Factors
The main influencing factors must be identified to generate a flood susceptibility map (Lee et al., 2012; Tehrany et al., 2015; Rahmati et al., 2016). Several influencing factors contribute to flooding, i.e., elevation, slope, aspect, curvature, plan and profile curvature, land use/land cover, proximity to roads, and proximity to rivers. The selection of influencing factors for flood susceptibility mapping varies from area to area. The correlation of each influencing factor with flooding should be assessed before performing flood susceptibility mapping (Elkhrachy, 2015).
All the conditioning factors were classified using the natural break classification method (Margarint et al., 2013). A high-quality topographic representation is the basis of a good flood susceptibility model. Low-lying and flat areas are more prone to flooding; for that reason, the ASTER DEM with 30 m spatial resolution (Table 1) was used in this study to extract the elevation, slope, aspect, and curvature. The attributes of a DEM can play a significant role in identifying flood-prone areas (Pradhan, 2009).
Land use/land cover (LULC) is an important influencing factor for flood susceptibility assessment because each LULC class has a different effect on increasing or decreasing the flow of water. Shrubs and bushes can control and reduce flooding, whereas barren/open land intensifies it (Tehrany et al., 2015). Common land use types in the study area are settlement, agricultural land, shrubs and bushes, rangeland, and water bodies.
Proximity to river is a significant factor because of its impact on flood spread and magnitude (Glenn et al., 2012). The Euclidean distance tool was used to generate the proximity to river map.
Slope angle influences surface runoff, infiltration, and the velocity of water flow. The slope angle is inversely proportional to rise in the lower catchment (Tehrany et al., 2015; Khosravi et al., 2016).
Curvature is a significant geomorphological index extracted from the DEM, which defines the rate of change of slope in a particular direction. Plan and profile curvature allow a better assessment of flow and slope morphology.
For proximity to roads, the impact of roads on water depth is less significant than their impact on flood velocity, which can damage roads and flow-through structures (Nam, 2011).
Slope aspect relates to the orientation of the land surface and to patterns of soil moisture, so it is considered an effective hydrological factor.
Elevation is considered the most significant factor: higher elevation decreases the occurrence of floods and increases flood resilience, because floods do not normally occur in highly elevated areas.
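The Euclidean distance step used for the proximity to river map can be illustrated with a small brute-force sketch. It is not the GIS tool's implementation. The raster layout and the function name `euclidean_distance_grid` are hypothetical, and the 30 m cell size is assumed only because it matches the DEM resolution quoted above. Each cell simply receives the straight-line distance to the nearest river cell.

```python
import math


def euclidean_distance_grid(river_mask, cell_size):
    """Distance from every cell centre to the nearest river cell, in map units."""
    river_cells = [
        (r, c)
        for r, row in enumerate(river_mask)
        for c, is_river in enumerate(row)
        if is_river
    ]
    if not river_cells:
        raise ValueError("river_mask contains no river cells")

    distances = []
    for r, row in enumerate(river_mask):
        out_row = []
        for c, _ in enumerate(row):
            # Brute force: check every river cell and keep the closest one.
            nearest = min(math.hypot(r - rr, c - cc) for rr, cc in river_cells)
            out_row.append(nearest * cell_size)
        distances.append(out_row)
    return distances


# Toy 3 x 4 raster: 1 marks a river cell (hypothetical layout).
river_mask = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
dist = euclidean_distance_grid(river_mask, cell_size=30.0)

for row in dist:
    print([round(d, 1) for d in row])
```

The resulting distance raster would then be grouped into distance classes before the information values are computed. The brute-force search is quadratic in the number of cells, so a real raster would call for a dedicated distance-transform routine.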

Information Value Method
Information value (IFV) is a bivariate statistical method proposed by Yin & Yan (1988) and later modified by Van Westen (1993). In the statistical analysis, all parameters are compared with the flood inventory map. The weighted values are used to categorize the classes of each parameter on the basis of event density (landslide density in the original application of the method; Yalcin, 2008). IFV was used to predict the event based on the correlation between the flood inventory map and its conditioning factors. This method can determine the impact of flood conditioning parameters on floods in the area (Zêzere, 2002). The information value I_i of each class i of a factor can be calculated using the formula given below (Yin & Yan, 1988):
$$I_i = \ln\left(\frac{S_i / N_i}{S / N}\right) \qquad (1)$$
where S_i is the number of flood pixels in class i, N_i is the total number of pixels in the class, S is the total number of flood pixels in the study area, and N is the total number of pixels in the study area.
The natural logarithm accounts for variations in the values: if the density of floods in a class is lower than normal, a negative weight is assigned, and if the density of floods is higher than normal, a positive weight is assigned (Saha et al., 2005). The information values of the classes of each parameter were then summed using Equation (2):
$$I = \sum_{j=1}^{n} I_j \qquad (2)$$
where n is the number of conditioning factors and I_j is the information value of the class of factor j present at a given location.
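As a minimal sketch of how the class weights might be computed, the snippet below evaluates the information value as the natural logarithm of the ratio between the flood density in a class (S_i / N_i) and the flood density of the whole study area (S / N), following the symbol definitions above. The pixel counts and class labels are hypothetical, S and N are assumed to be the sums over all classes of the factor, and returning `None` for a class with no flood pixels is a choice made for this sketch because the logarithm of zero is undefined.

```python
import math


def information_values(flood_pixels_by_class, total_pixels_by_class):
    """Return {class: ln((S_i / N_i) / (S / N))} for one conditioning factor."""
    S = sum(flood_pixels_by_class.values())  # flood pixels in the study area
    N = sum(total_pixels_by_class.values())  # all pixels in the study area
    overall_density = S / N

    values = {}
    for cls, N_i in total_pixels_by_class.items():
        S_i = flood_pixels_by_class.get(cls, 0)
        if S_i == 0:
            values[cls] = None  # ln(0) is undefined; handling is an assumption
        else:
            values[cls] = math.log((S_i / N_i) / overall_density)
    return values


# Hypothetical pixel counts for three distance-to-river classes.
total = {"0-200 m": 1000, "200-400 m": 1000, ">400 m": 2000}
flood = {"0-200 m": 60, "200-400 m": 30, ">400 m": 10}

iv = information_values(flood, total)
for cls, value in iv.items():
    print(cls, round(value, 3))
```

In this toy case the class nearest the river has a flood density above the study-area average and receives a positive weight, while the farthest class receives a negative weight.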

Results and Discussion
The flood susceptibility maps were produced from the bivariate statistical method alongside its sub-models using a GIS-based approach.

Information Value Method and Flood Susceptibility Assessment
The information value of each influencing factor was calculated using Equation (1), and the relationship of each factor with the occurrence of floods is shown in Table A1. After the IFV values were calculated for all influencing parameters, those values were assigned to the corresponding class of each parameter and summed using Equation (2) to obtain the final flood susceptibility map. The final susceptibility maps were divided into zones ranging from very low to very high using the natural break classification method (Figure 5).
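The assignment and summation step can be sketched as a lookup followed by a sum at each location. The factor names, class labels, and weights below are hypothetical placeholders rather than values from Table A1, and the function name `susceptibility_index` is invented for this illustration.

```python
def susceptibility_index(pixel_classes, weights):
    """Sum the information value of the class each factor takes at one pixel."""
    return sum(weights[factor][cls] for factor, cls in pixel_classes.items())


# Hypothetical class weights for two conditioning factors.
weights = {
    "elevation": {"low": 0.9, "high": -1.2},
    "river_proximity": {"near": 1.1, "far": -0.7},
}

# Each pixel records which class of every factor it falls in.
pixels = [
    {"elevation": "low", "river_proximity": "near"},
    {"elevation": "low", "river_proximity": "far"},
    {"elevation": "high", "river_proximity": "far"},
]

indices = [susceptibility_index(p, weights) for p in pixels]
print([round(v, 2) for v in indices])
```

Higher sums indicate locations where several factors coincide in classes with above-average flood density. The resulting index raster is what would be grouped into the five susceptibility zones.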

Validation of the Model
The results obtained from the models were evaluated with success rate and prediction rate curves, and the area under the curve (AUC) was calculated using Excel. The AUC indicates how reliably the model predicts flood occurrence (Chung & Fabbri, 2003). The model is considered good if the AUC value lies in the range 50%-100%, whereas a value below 50% is considered a failure of the model. For the success rate curve, the training flood inventories (112 and 129 locations) were compared with the flood susceptibility maps (Figure 6), and the AUC of the three models was calculated. The results obtained from the success rate curves of all the models are shown in Figure 5. However, the results indicated by the success rate curve are not appropriate for predicting floods (Brenning, 2005). For that reason, the prediction rate curve is used to predict the occurrence of floods; it describes how well the flood models and their conditioning factors predict flooding (Chung & Fabbri, 2003; Brenning, 2005). The prediction rate curve results were obtained by comparing the validation flood inventory maps (49 and 32 locations) with the flood susceptibility maps (Figure 5). Figure 7 shows the results obtained from the prediction rate curves; the AUC values (Table 2) for the three models are 99.47%, 95.04%, and 94.06%. The success and prediction rate curves indicated that Model A is more accurate in assessing floods than Model B and Model C, but they also show that all the models (flood inventory with 70% training data, with 80% training data, and with a reduced set of parameters) have good overall accuracy for assessing floods in the study area.
The comparison of the flood susceptibility zones with the flood inventory map also showed that the calculated and classified susceptibility zones are in good agreement with the inventory, because the very high, high, and moderate susceptibility zones covered most of the flooded area (Figure 8).
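The success rate and prediction rate curves described above can be sketched as follows. Cells are ranked from most to least susceptible, the cumulative share of flood cells is plotted against the cumulative share of area, and the area under that curve is obtained with the trapezoidal rule. The study computed its AUC values in Excel, so this Python version is only an illustrative stand-in; the scores and flood flags are hypothetical, and using one cell per curve step is an assumption of the sketch.

```python
def rate_curve(scores, is_flood):
    """Cumulative share of flood cells captured as area is added,
    starting from the most susceptible cell."""
    total_flood = sum(is_flood)
    if total_flood == 0:
        raise ValueError("no flood cells in the inventory")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    xs, ys = [0.0], [0.0]
    captured = 0
    for rank, i in enumerate(order, start=1):
        captured += is_flood[i]
        xs.append(rank / len(scores))     # cumulative share of area
        ys.append(captured / total_flood)  # cumulative share of flood cells
    return xs, ys


def area_under_curve(xs, ys):
    """Trapezoidal rule over the curve points."""
    return sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs))
    )


# Hypothetical susceptibility scores and flood flags for four cells.
scores = [0.9, 0.8, 0.3, 0.1]
is_flood = [1, 1, 0, 0]

xs, ys = rate_curve(scores, is_flood)
auc = area_under_curve(xs, ys)
print(f"AUC = {auc:.2%}")
```

Using the training locations as `is_flood` gives a success rate curve, and using the validation locations gives a prediction rate curve. A map that ranks cells no better than chance would produce a curve near the diagonal and an AUC near 50%, which is why values below that level are treated as a model failure.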

Discussion
Bivariate statistical methods are usually applied in hazard studies with a single pre-defined ratio of the inventory dataset. Many researchers have used a 70:30 ratio for the inventory, and a few have used an 80:20 ratio (Bacha et al., 2018). In this study, both 70% and 80% training datasets, together with different combinations of conditioning factors, were used to produce the susceptibility maps in order to test the effect of the inventory division on the final susceptibility map. This distinguishes the present work from previous studies.
The correlation between flooding and its nine conditioning factors was investigated using the information value method (IFV), and the results are shown in Table A1. The study revealed that, of the nine factors, proximity to roads, proximity to rivers, land use/land cover, and low elevation are the significant factors for the occurrence of flooding. In particular, proximity to river is the most important factor for flooding in the study area, which has also been reported in previous work. The majority of floods occurred within 200 m of rivers. According to Tehrany et al. (2014), elevation and curvature are generally the essential factors for the occurrence of floods. The results of this study revealed that floods occur in low-lying and flat areas (Table A1). The elevation class below 317 m and the slope angle class of 3.30 to 6.58 were the most susceptible to flooding. Moreover, floods do not normally occur in highly elevated areas, and when the angle of slope increases, the number of flood cases increases in the catchment area (Tehrany et al., 2015; Khosravi et al., 2016). Among the concave and convex classes of curvature, plan and profile curvature were found to be more susceptible to flooding, which was also observed by Khosravi et al. (2016) but contradicts various other studies (Tehrany et al., 2015; Rahmati et al., 2016). Flat, north-, southeast-, and southwest-facing slopes are more susceptible to flooding. Rahmati et al. (2016) also concluded that flat and southwest-facing slopes are susceptible to flooding. The rangeland and water bodies classes of the land cover factor have the highest IFV values, 0.444 and 0.580, respectively. The results also show a close relationship between flood susceptibility and distance from roads: Table A1 makes clear that flood susceptibility increases as the distance from roads increases.
Based on the results, the very high and high susceptibility zones account for 23% of the area in Model A, 20% in Model B, and 16% in Model C. The very high and high susceptibility zones are mostly located in the northern, eastern, and southeastern parts of Charsadda, and their proportion increases markedly in the southeastern part. According to the flood susceptibility map of Charsadda, most of the district lies in the very low and low susceptibility zones.
In this study, the effects of different divisions of the flood inventory dataset and of different combinations of conditioning factors were also assessed (Table 2 and Figure 5). The AUC values of the prediction rate curves for Model A, Model B, and Model C were 99.47%, 95.04%, and 94.06%, respectively. The success rate curves also showed that Model A is more accurate than Model B and Model C, with AUC values of 95.03%, 86.91%, and 89.67%, respectively. A one-way ANOVA and Student's t-test were performed to test for significant differences between the maps obtained from the models, and the tests showed no significant difference between them. Thus, the different divisions of the flood inventory and the different combinations of influencing factors produce no significant variation in the final flood susceptibility map of the study area.
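For readers unfamiliar with the test mentioned above, the F statistic of a one-way ANOVA can be computed as sketched below. The paper does not state which samples were compared, so the three groups here are hypothetical susceptibility index values drawn from the three model maps, and the function name `one_way_anova_f` is invented for this illustration. Converting the statistic into a p-value requires the F distribution, which is left to a statistics package.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")

    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical index values sampled from the maps of Models A, B, and C.
model_a = [1.0, 2.0, 3.0]
model_b = [2.0, 3.0, 4.0]
model_c = [3.0, 4.0, 5.0]

f_stat, df_b, df_w = one_way_anova_f(model_a, model_b, model_c)
print(f_stat, df_b, df_w)
```

A small F value relative to the critical value for the given degrees of freedom would be read as no significant difference between the maps, which is the outcome the study reports.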

Conclusion
Floods are the most dominant and destructive phenomena in Pakistan. Flood susceptibility mapping has been used in watershed management to support sustainable development. Such mapping requires an accurate and reliable method for identifying flood-prone areas, and researchers need to understand the capabilities of the available approaches. In this study, flood-susceptible zones were identified using the information value (IFV) method. Initially, a flood inventory map with 161 flood locations was prepared by visual interpretation of a Landsat 7 ETM+ satellite image. Nine conditioning factors (elevation, slope degree, slope aspect, slope curvature, plan curvature, profile curvature, LULC, proximity to roads, and proximity to rivers) were used to produce the flood susceptibility maps with the IFV method. AUC values were then calculated using the validation datasets to test the efficacy of the models. The validation results show that IFV gives broadly similar results with the different flood inventory divisions and the different combinations of conditioning factors.
The information values calculated for the nine factors helped to determine their significance for the occurrence of flooding in the study area. Among these factors, proximity to roads, proximity to rivers, land use/land cover, and elevation were found to be the significant contributing factors for flooding. The flood inventory map and flood susceptibility maps generated in this study could be used for flood hazard and risk management. They can also assist decision-makers, urban planners, and engineers in taking proper actions to avoid and lessen flooding in the future.