An Effective Prediction Method for Supporting Decision Making in Real Estate Area Selection
1. Introduction
With the rapid development of the economy, real estate investment and development have entered a new era [1] [2]. Selecting the area for a real estate project is the first step in real estate investment [3], and the success or failure of a project depends heavily on where it is located. Effectively evaluating the development value of an area is therefore an important and difficult part of scientific decision-making for real estate developers.
Many factors affect the area selection of real estate projects [4]-[8]. Because area selection requires weighing all of these factors together, it often depends on the experience, cognition, and subjective judgment of decision-makers. Previous studies on this issue have focused mainly on market demand and economic prospects and have largely ignored natural disasters. We observe, however, that natural disasters matter for real estate area selection because they can cause considerable losses to real estate enterprises: severe weather and natural disasters delay construction and destroy existing buildings, which directly affects property values. Extreme weather events, exacerbated by climate change, are increasingly disrupting lives and threatening property security [9]. Property owners rely more and more on insurance for protection, yet insurers, facing claims in 2022 of more than double the 30-year average, are raising premiums and scaling back coverage. This has widened the global insurance protection gap to 57%, underscoring a growing crisis for both insurers and property owners [10]. Real estate development companies are encouraged to expand into new markets and diversify their projects across areas while keeping premium increases to a minimum. To do so without incurring substantial losses, particularly from natural disasters, they need practical risk assessment tools. There is therefore an urgent need for a more sophisticated and rational decision-making model for evaluating and managing the risks associated with natural catastrophes [9] [10].
In this paper, we study how real estate development companies can select the most valuable area when planning a new real estate development project. The main contributions of the paper are as follows:
We propose a new indicator, the Average Loss Ratio, to measure the likelihood and extent of natural-disaster losses in an area. We experimentally demonstrate that the new indicator can provide a foundation for decision-making in real estate area selection.
We present a decision model based on the ARIMA model to predict the Average Loss Ratio of an area. The evaluation results on real datasets, including EM-DAT, NAIC, and KPMG, show that the proposed Average Loss Ratio can predict the possible loss of an area effectively.
We improve the TOPSIS model by integrating the Average Loss Ratio into it and use the enhanced TOPSIS model to solve the real estate area selection problem. The results on real datasets, including EM-DAT, BEA, and USA-POP, show that our approach is effective in selecting real estate areas.
The remainder of the paper is structured as follows. Section 2 reviews related literature. Section 3 introduces assumptions and notations. Section 4 details our models and results. Finally, Section 5 concludes the whole paper and discusses future work.
2. Literature Review
Real estate area selection refers to the process of choosing the optimal location for a real estate project, such as residential housing, commercial buildings, or other property developments [3]. It involves evaluating various factors to determine the best area for the project’s success, profitability, and marketability.
The primary objective is to identify a location that maximizes the project's potential for success. Area selection requires weighing many factors [4]-[8], such as market demand [4], geographic conditions [7], transportation accessibility [8], land prices [5], and other relevant considerations [6]. As a result, it often depends on the experience, cognition, and subjective judgment of decision-makers.
When conducting real estate area selection, developers or investors typically conduct extensive market research to understand the needs and preferences of potential buyers or tenants. They also consider the economic prospects of the area, population growth trends, and future infrastructure plans [11]-[13]. Other important considerations may include land costs, local government planning and regulations, the proximity and influence of existing competitors, availability of essential services such as healthcare, education, and shopping centers, as well as natural environmental factors like landscapes and conservation requirements.
Existing work on real estate area selection encompasses a range of research and practical approaches [3] [6]-[8]. For example, previous studies have considered location factors and examined the significance of these factors and their impact on area selection decision-making [3]. Risk evaluation is essential in area selection to mitigate potential challenges and uncertainties. Researchers have explored methods for assessing and managing risks associated with factors like environmental hazards, legal constraints, market fluctuations, and financial feasibility [13]. With increasing emphasis on sustainability, researchers have focused on integrating environmental considerations into area selection processes [14] [15]. This includes evaluating areas based on energy efficiency, green building practices, environmental impact assessments, and the promotion of sustainable communities. Advancements in technology, such as big data analytics, machine learning, and artificial intelligence, have facilitated data-driven area selection approaches [16] [17]. Researchers have explored the use of these technologies to analyze large datasets, predict market trends, optimize area selection, and automate decision-making processes.
Overall, research on real estate area selection takes a multidisciplinary approach, combining elements of economics, geography, urban planning, and decision sciences. The aim is to develop robust methodologies and tools that help developers, investors, and policymakers make informed area selection decisions aligned with market demands, financial objectives, and sustainable development principles. However, few studies have examined the impact of natural disasters on real estate area selection. To the best of our knowledge, this study is the first to propose a dedicated indicator, the Average Loss Ratio, for studying the impact of natural disasters on real estate area selection.
3. Assumptions and Notations
3.1. Assumptions
Before presenting the solution to the problem, we first make the following assumptions.
1) We assume that the natural disaster damage data obtained are authentic and credible, with no invalid values.
Since the losses caused by natural disasters are generally tallied and reported by national agencies, we consider such data to have a high level of authority and reliability. Additionally, natural disaster losses are highly volatile, and hastily removing outliers could potentially lead to the loss of critical features, affecting the prediction precision. Therefore, we assume that all acquired data are authentic and credible, and we do not discard any data.
2) We assume that the disaster conditions, population count, and population growth rate are independent variables that do not influence each other.
Currently, there is no clear evidence to suggest a significant correlation between the aforementioned three variables. Therefore, for decision analysis, we can assume that these three criteria are mutually independent, and a change in one criterion will not affect the utility values of the others.
3) We assume that no other policy or economic factors, or similar influences, affect real estate developers in their development activities.
This study focuses solely on the impact of natural disasters and population demand. We treat all other factors related to housing development as outside the scope of the model and assume that they do not influence developers' decisions. This simplification keeps the model focused on our main objectives.
3.2. Notations and Datasets
The main symbols used in this paper and their explanations are listed in Table 1. Less frequently used symbols are introduced where they first appear.
Table 1. Main symbols used in this study.
Symbol | Description
Pft | Predicted annual profits based on natural conditions.
P | Predicted annual profits based on market conditions.
Prem | Annual premium of insurance covering natural disasters.
Clms | Annual claim of insurance covering natural disasters.
Loss | Annual disaster loss amount.
LR | Loss Ratio.
$\overline{LR}$ | Average Loss Ratio.
$PV_t$ | The total asset value of a year.
GDP | Gross Domestic Product.
PIS | Positive Ideal Solution.
NIS | Negative Ideal Solution.
CSI | Composite Score Index.
The datasets used in this study are summarized in Table 2. In total, we use six datasets, and each task uses one or more of them. All datasets are publicly available on the Web.
Table 2. Datasets used in this study.
4. Models and Verification
In this section, we detail the models and verification for the problem of real estate area selection. We first describe the model and then present the model’s verification results.
4.1. Average Loss Ratio
In this paper, we define a new indicator named Average Loss Ratio to reflect the impact of natural disasters on real estate area selection. Such an indicator is used to predict the possible losses of an area according to historical data.
First, to obtain loss information for a specific area, we refer to historical data on insurance that covers natural disasters. Insurance companies usually release two main types of information: the annual premium and the annual claim. We therefore map the annual claim caused by natural disasters to the disaster loss amount, yielding Equation (1), where Loss represents the annual disaster loss amount in the area.
$$Clms = Loss \qquad (1)$$
However, predicting the exact value of Loss is highly challenging because the total property value of an area, as well as the scale and frequency of disasters, varies from year to year. As a result, Loss fluctuates widely, which makes reliable prediction difficult. To verify this, we use the EM-DAT database (see Table 2) to compare the annual disaster loss caused by extreme weather, geological, hydrological, and storm disasters in the state of Texas, USA, from 2016 to 2023. The results are shown in Figure 1: the annual disaster loss fluctuates significantly and does not follow a clear trend, which illustrates how difficult it is to predict the disaster loss amount directly.
Figure 1. Annual disaster loss in Texas.
To remove the influence of total property value on the prediction, we introduce a new parameter, the Loss Ratio, defined by Equation (2).
$$LR = \frac{Loss}{PV_t} \qquad (2)$$
Here, Loss is the annual disaster loss amount, $PV_t$ is the total asset value for the year, and LR is the Loss Ratio. Using this ratio removes the effect of changes in total property value.
The intensity and frequency of natural disasters in an area are closely related to its geographical location and climate. For instance, areas in tropical monsoon regions receive heavy, concentrated rainfall almost every year, which leads to floods, whereas inland areas in the central parts of continents are almost free from such risks. Although the global climate is changing, the climatic conditions of a place are unlikely to change significantly over short periods (within 10 years). Taking these factors into account, we assume that LR is statistically stable: for a given area, LR fluctuates around a constant theoretical value.
Based on data from the public EM-DAT database, we take Texas as an example and calculate the Loss Ratio (LR) from 2016 to 2023; the results are shown in Figure 2. The volatility is significantly reduced, with most data points concentrated within a very small range. Accordingly, we use the Average Loss Ratio, denoted $\overline{LR}$, to assess the theoretical disaster severity of the area. The Average Loss Ratio represents the likelihood and extent of disaster losses in an area due to its geographical and climatic conditions, and it is calculated using Equation (3), where $LR_t$ is the Loss Ratio in year $t$ and $n$ is the total number of years.
$$\overline{LR} = \frac{1}{n}\sum_{t=1}^{n} LR_t \qquad (3)$$
Figure 2. The Loss ratio (LR) of Texas.
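A minimal Python sketch of Equations (2) and (3), assuming the yearly Loss and total asset value series are already expressed in the same currency unit. The numbers are hypothetical placeholders, not the EM-DAT figures for Texas. The coefficient of variation (standard deviation divided by the mean) is our own illustrative way to quantify the observation that LR is far less volatile than Loss; it is not a statistic reported in this paper.

```python
from statistics import mean, pstdev


def loss_ratios(losses, asset_values):
    """Equation (2): LR_t = Loss_t / PV_t for each year t."""
    if len(losses) != len(asset_values):
        raise ValueError("losses and asset_values must have the same length")
    return [loss / pv for loss, pv in zip(losses, asset_values)]


def average_loss_ratio(ratios):
    """Equation (3): arithmetic mean of the yearly loss ratios."""
    return mean(ratios)


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (relative dispersion)."""
    return pstdev(values) / mean(values)


# Hypothetical yearly figures in one currency unit (illustration only).
losses = [30.0, 90.0, 45.0, 120.0]
assets = [1500.0, 4500.0, 3000.0, 6000.0]

lr = loss_ratios(losses, assets)
alr = average_loss_ratio(lr)
print(f"ALR = {alr:.5f}")
print(f"CV(Loss) = {coefficient_of_variation(losses):.3f}, CV(LR) = {coefficient_of_variation(lr):.3f}")
```

In this toy series the losses swing by a factor of four while the ratios stay between 0.015 and 0.02, which mirrors the contrast between Figures 1 and 2.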
Having obtained $\overline{LR}$, we can calculate the expected loss amount for a given year from $PV_t$ for that year, which, by Equation (1), also gives the expected claim payout for that year. The calculation is shown in Equation (4).
$$Clms = Loss = \overline{LR} \times PV_t \qquad (4)$$
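Equation (4) is simple arithmetic; as a sanity check, the Python sketch below applies it to the Texas 2024 inputs reported in Table 3 (total asset value 6034.070083 billion USD, Average Loss Ratio 0.011608) and recovers the reported Loss and Clms of about 70043.49 million USD. The unit-conversion constant and function name are ours.

```python
BILLION_TO_MILLION = 1_000  # Table 3 reports PV_t in billions and Loss in millions


def expected_loss(avg_loss_ratio, asset_value):
    """Equation (4): expected annual loss = average loss ratio x total asset value."""
    return avg_loss_ratio * asset_value


# Texas 2024 inputs as reported in Table 3.
pv_texas_2024_billion = 6034.070083
alr_texas = 0.011608

loss_million = expected_loss(alr_texas, pv_texas_2024_billion) * BILLION_TO_MILLION
clms_million = loss_million  # Equation (1): expected claims equal expected loss
print(f"Loss = Clms = {loss_million:.5f} million USD")
```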
Next, we obtain the annual Prem, $PV_t$, and P for a given area from 2016 to 2023 and predict their future values using the ARIMA model [18] [19]. ARIMA (AutoRegressive Integrated Moving Average) is a popular statistical method for time-series prediction. It is particularly suitable for short- to medium-term prediction of non-stationary data: differencing (the "integrated" component) removes trends, while the autoregressive and moving-average terms capture the dynamics of past values and the randomness in the data. This makes it a good fit for the present task. ARIMA has been widely used in economics to forecast sales, stock market trends, industrial production, and macroeconomic indicators, and its adaptability to different types of input data and forecasting requirements makes it a versatile tool for time-series analysis. In this study, we use IBM SPSS 26.0 to fit ARIMA models and predict Prem, $PV_t$, and P for 2024, from which we then predict the expected loss of an area.
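The paper fits ARIMA in IBM SPSS 26.0, whose automatic order selection is not reproduced here. As a hedged, dependency-free illustration of what an ARIMA fit does, the sketch below fixes the order at ARIMA(1,1,0) with a drift term (an assumption, not necessarily the order SPSS chose), estimates it by conditional least squares on first differences, and makes a one-step forecast. The eight-point series is synthetic, built so that its differences follow $\Delta y_t = 1 + 0.5\,\Delta y_{t-1}$ exactly.

```python
def fit_arima_110(series):
    """Fit ARIMA(1,1,0) with drift: dy_t = c + phi * dy_{t-1} + e_t.

    Parameters are estimated by conditional least squares (ordinary least
    squares of each first difference on the previous one).
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    lagged, current = diffs[:-1], diffs[1:]
    if len(lagged) < 2:
        raise ValueError("need at least four observations")
    x_bar = sum(lagged) / len(lagged)
    y_bar = sum(current) / len(current)
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    if sxx == 0:
        raise ValueError("lagged differences are constant; phi is not identifiable")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lagged, current))
    phi = sxy / sxx
    c = y_bar - phi * x_bar
    return c, phi


def forecast_next(series, c, phi):
    """One-step forecast: y_{T+1} = y_T + c + phi * (y_T - y_{T-1})."""
    return series[-1] + c + phi * (series[-1] - series[-2])


# Synthetic annual series standing in for 2016-2023 (eight observations).
series = [100.0, 100.0, 101.0, 102.5, 104.25, 106.125, 108.0625, 110.03125]
c, phi = fit_arima_110(series)
print(f"c = {c:.4f}, phi = {phi:.4f}, next-year forecast = {forecast_next(series, c, phi):.4f}")
```

With only eight annual observations, as in this study, low-order models such as this one are about as much as the data can support; a library implementation would additionally estimate moving-average terms and report diagnostics.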
We use the EM-DAT database (see Table 2) to verify the model. Specifically, we chose two areas: Texas in the United States and Australia, located in North America and Oceania, respectively.
For Texas, we used the EM-DAT database to compile the annual Loss from 2016 to 2023, adjusted to 2023 currency values. We also retrieved the corresponding annual $PV_t$ from the Texas Department of Treasury website. From these two datasets, we calculated the annual LR and determined $\overline{LR}$. In addition, we obtained the property insurance industry's annual Prem and P for the state over the same years from the NAIC. We then fitted ARIMA models to $PV_t$, Prem, and P in IBM SPSS 26.0. Due to space constraints, we present only the analysis reports for $PV_t$ and Prem; the procedures and results for the other variables are similar.
Figure 3. Predicted PVt result for Texas.
Figure 3 shows the ARIMA analysis results for $PV_t$. According to the analysis report provided by IBM SPSS 26.0, the model has an R² of 0.944, indicating a good fit. The predicted value of $PV_t$ for 2024 is 6034.070083 billion USD.
Figure 4. Predicted Prem result for Texas.
The ARIMA result for Prem in Texas is shown in Figure 4. The model has an R² of 0.917, indicating a good fit, and the predicted value of Prem for 2024 is 87288.91835 million USD.
Continuing with the calculations for the remaining variables, we obtain the final results for Texas, shown in Table 3.
Table 3. The predicted results for Texas in 2024.
Variable | Value | Unit
$PV_t$ | 6034.070083 | billion USD
$\overline{LR}$ | 0.011608 | N/A (dimensionless)
Loss | 70043.48552 | million USD
Clms | 70043.48552 | million USD
Prem | 87288.91835 | million USD
Pft | 17245.43283 | million USD
P | 23131.56336 | million USD
For Australia, we used the EM-DAT database to compile the annual Loss from 2016 to 2023, adjusted to 2023 currency values. We also retrieved the corresponding annual $PV_t$ from the KPMG Database (see Table 2). From these two datasets, we calculated the annual LR and then determined $\overline{LR}$. In addition, we obtained the property insurance industry's annual Prem and P for the country over the same years from the KPMG Database.
We fitted ARIMA models to $PV_t$, Prem, and P using IBM SPSS 26.0. Due to space limits, we present only the Prem result; the procedures and results for the other variables are similar.
Figure 5. Predicted Prem result for Australia.
Figure 5 shows the ARIMA result for Prem. According to the analysis report provided by IBM SPSS 26.0, the model has an R² of 0.956, indicating a good fit. The predicted value of Prem for 2024 is 44820.5 million USD.
Continuing with the calculations for the remaining variables, we ultimately obtained the final results of the model, as shown in Table 4.
Table 4. The predicted results for Australia in 2024.
Variable | Value | Unit
$PV_t$ | 5630 | billion USD
$\overline{LR}$ | 0.001012 | N/A (dimensionless)
Loss | 5697.56 | million USD
Clms | 5697.56 | million USD
Prem | 44821 | million USD
Pft | 39123.44 | million USD
P | 18600.712 | million USD
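The rows of Tables 3 and 4 are mutually consistent: in both tables, Pft equals Prem minus Clms. This relationship is inferred from the reported figures rather than stated as an equation in the text, so the short Python check below should be read as a consistency test under that assumption.

```python
# Values copied from Tables 3 and 4 (million USD).
REPORTED = {
    "Texas": {"Prem": 87288.91835, "Clms": 70043.48552, "Pft": 17245.43283},
    "Australia": {"Prem": 44821.0, "Clms": 5697.56, "Pft": 39123.44},
}


def premium_minus_claims(prem, clms):
    """Assumed relation behind the Pft row: premium income less expected disaster claims."""
    return prem - clms


for area, row in REPORTED.items():
    margin = premium_minus_claims(row["Prem"], row["Clms"])
    print(f"{area}: Prem - Clms = {margin:.5f}, reported Pft = {row['Pft']:.5f}")
```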
4.2. Real Estate Area Selection
In this subsection, we present the model design and verification that offer decision support to real estate development companies selecting appropriate areas. The new contribution of this part is an enhanced TOPSIS model that integrates the newly proposed indicator, the Average Loss Ratio.
4.2.1. Enhancing the TOPSIS Model for Real Estate Area Selection
We need a model that helps real estate companies assess which area is most worth developing. Following the analysis above, the model must account for both disaster risk and the demand of the regional population. We therefore consider four factors:
$\overline{LR}$ (Average Loss Ratio), population, population growth rate, and GDP.
We propose to use the TOPSIS model to support decisions on real estate area selection. TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) [20] [21] is a widely used MCDM (Multi-Criteria Decision-Making) method suited to the multi-criteria decision problem considered here. It is based on the idea that the chosen alternative should have the shortest geometric distance from the Positive Ideal Solution (PIS) and the longest geometric distance from the Negative Ideal Solution (NIS). TOPSIS is straightforward to implement and easy to understand, which helps in explaining the decision-making process to managers.
The novelty of our model is that it incorporates the newly proposed indicator,
$\overline{LR}$ (Average Loss Ratio), into the original TOPSIS model to improve its effectiveness.
We use the states of Alabama, California, Florida, Georgia, and Washington as examples for the TOPSIS analysis and provide recommendations. Before conducting the analysis, we first need data on the population, population growth rate, GDP, and
$\overline{LR}$ of these areas for 2024.
After collecting the data, we apply the Grey Prediction Model [22] [23] to forecast each series. The Grey Prediction Model is a forecasting technique from Grey System Theory. It is particularly valuable where traditional modelling techniques struggle because data are scarce or highly uncertain: it makes use of whatever information is available, however incomplete, to produce forecasts that can guide decisions in complex and uncertain environments. Because it requires only a small amount of data and can handle series with a degree of randomness, such as population growth rates and GDP, it suits the scenario of this study.
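The paper runs grey prediction in IBM SPSS 26.0 without naming the model variant. Assuming it is the standard GM(1,1) model, the Python sketch below shows the usual steps: accumulate the series (the accumulated generating operation, AGO), regress each observation on the mean of consecutive accumulated values to estimate the development coefficient a and grey input b, then forecast with the exponential time-response function and difference back. The average relative error, the fit statistic quoted for Georgia in Section 4.2.2, is computed over the observed periods. The geometric test series is synthetic.

```python
import math


def gm11_fit(x0):
    """Estimate GM(1,1) parameters (a, b) by least squares on background values."""
    if len(x0) < 4:
        raise ValueError("GM(1,1) needs at least four observations")
    x1, total = [], 0.0
    for value in x0:  # accumulated generating operation (AGO)
        total += value
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]  # background values
    y = x0[1:]
    # Grey equation x0(k) + a * z(k) = b, i.e. a straight line y = -a * z + b.
    z_bar, y_bar = sum(z) / len(z), sum(y) / len(y)
    slope = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / sum(
        (zi - z_bar) ** 2 for zi in z
    )
    a, b = -slope, y_bar - slope * z_bar
    if a == 0:
        raise ValueError("development coefficient is zero; GM(1,1) is degenerate")
    return a, b


def gm11_predict(x0, a, b, horizon=1):
    """Fitted values for the observed periods followed by `horizon` forecasts."""

    def x1_hat(k):  # time-response function, k = 0 at the first observation
        return (x0[0] - b / a) * math.exp(-a * k) + b / a

    return [x0[0]] + [x1_hat(k) - x1_hat(k - 1) for k in range(1, len(x0) + horizon)]


def average_relative_error(actual, fitted):
    """Mean of |actual - fitted| / |actual| over the observed periods, in percent."""
    return 100 * sum(abs(obs - est) / abs(obs) for obs, est in zip(actual, fitted)) / len(actual)


x0 = [100.0, 110.0, 121.0, 133.1, 146.41]  # synthetic series growing 10% per period
a, b = gm11_fit(x0)
fitted = gm11_predict(x0, a, b, horizon=1)
print(f"a = {a:.6f}, b = {b:.4f}, next value = {fitted[-1]:.3f}")
print(f"average relative error = {average_relative_error(x0, fitted):.3f}%")
```

Because `zip` stops at the shorter input, the error function ignores the appended forecast and scores only the in-sample fit.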
Integrating the enhanced TOPSIS and the Grey Prediction Model, we will ultimately obtain a ranking of the recommendation levels for the above five states, assisting real estate companies in their decision-making process.
4.2.2. Model Verification
The datasets (see Table 2) used in this task include the EM-DAT database, the USA-POP dataset, and the BEA dataset. The USA-POP dataset contains population data for each US state from 2016 to 2023, and the BEA dataset contains GDP data for each US state.
First, we predict the relevant data for 2024. After organizing the data, we apply the Grey Prediction Model in IBM SPSS 26.0 to obtain the predicted values for 2024. The results are similar across data types, so, due to space limitations, we use the population growth rate of Georgia as an example to analyze the results of the Grey Prediction Model.
Figure 6. Predicted population growth rate.
We input the raw data into IBM SPSS 26.0, select Grey Prediction, and obtain the results shown in Figure 6. According to the analysis report from IBM SPSS 26.0, the model has an average relative error of 7.866%, indicating a good fit, and the predicted population growth rate for Georgia in 2024 is 0.805%. The other predicted values for this task are obtained in the same manner.
For
$\overline{LR}$, we use the same methods and data types as in Section 4.1. In addition to the EM-DAT database, we use the CEIC database (see Table 2) to calculate the Average Loss Ratio for the other states.
After completing all calculations and predictions, we obtain Table 5, which lists all the input data required for the TOPSIS analysis.
Table 5. TOPSIS input data for the year 2024.
Area | Population | GDP (trillion USD) | $\overline{LR}$ | Growth Rate (%)
California | 41,372,000 | 3.898 | 0.00058 | 3.757
Alabama | 5,097,641 | 0.302 | 0.00559 | 0.748
Washington | 7,951,150 | 0.808 | 0.00037 | 1.052
Florida | 22,610,726 | 1.595 | 0.00386 | 1.833
Georgia | 11,000,000 | 0.811 | 0.00066 | 0.805
We then follow the TOPSIS processing flow, first normalizing the data according to Equation (5). Population, GDP, and growth rate are benefit criteria (larger is better), whereas $\overline{LR}$ is a cost criterion (smaller is better).
$$z_{ij} = \begin{cases} \dfrac{x_{ij} - \min_i x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}, & \text{benefit criterion } j \\[2ex] \dfrac{\max_i x_{ij} - x_{ij}}{\max_i x_{ij} - \min_i x_{ij}}, & \text{cost criterion } j \end{cases} \qquad (5)$$
Here, $x_{ij}$ is the original value of criterion $j$ for area $i$ in the decision matrix, and $z_{ij}$ is the value after normalization. We then use SPSS PRO to calculate the weight of each indicator with the Entropy Weight Method [24]. The results are shown in Table 6.
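A dependency-free Python sketch of min-max normalization followed by the textbook entropy weight method, applied to the Table 5 inputs. Treating the Average Loss Ratio as a cost criterion is our reading of the results (Alabama, which has the highest Average Loss Ratio, sits exactly at the negative ideal in Table 8). SPSS PRO's implementation may differ in details such as how zero entries are handled, so the weights this sketch prints need not match Table 6 exactly.

```python
import math

# Table 5 inputs; columns: population, GDP (trillion USD), average loss ratio, growth rate (%).
AREAS = ["California", "Alabama", "Washington", "Florida", "Georgia"]
DATA = [
    [41_372_000, 3.898, 0.00058, 3.757],
    [5_097_641, 0.302, 0.00559, 0.748],
    [7_951_150, 0.808, 0.00037, 1.052],
    [22_610_726, 1.595, 0.00386, 1.833],
    [11_000_000, 0.811, 0.00066, 0.805],
]
IS_BENEFIT = [True, True, False, True]  # a lower average loss ratio is better


def min_max_normalize(matrix, is_benefit):
    """Rescale each column to [0, 1], flipping cost criteria so larger is always better."""
    columns = []
    for col, benefit in zip(zip(*matrix), is_benefit):
        lo, hi = min(col), max(col)
        if hi == lo:
            raise ValueError("a constant column cannot be min-max normalized")
        columns.append([(v - lo) / (hi - lo) if benefit else (hi - v) / (hi - lo) for v in col])
    return [list(row) for row in zip(*columns)]


def entropy_weights(norm):
    """Entropy weight method: criteria whose values are more dispersed get more weight."""
    k = 1 / math.log(len(norm))
    utilities = []
    for col in zip(*norm):
        total = sum(col)
        shares = [v / total for v in col]
        entropy = -k * sum(p * math.log(p) for p in shares if p > 0)  # 0 * log(0) taken as 0
        utilities.append(1 - entropy)  # information utility d = 1 - e
    return [d / sum(utilities) for d in utilities]


norm = min_max_normalize(DATA, IS_BENEFIT)
weights = entropy_weights(norm)
for name, w in zip(["Population", "GDP", "Average loss ratio", "Growth rate"], weights):
    print(f"{name}: {100 * w:.3f}%")
```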
Table 6. The calculated indicator weights.
Indicator | Information Entropy (e) | Information Utility (d) | Weight (%)
GDP | 0.657 | 0.343 | 25.354
Population | 0.644 | 0.356 | 26.293
Growth Rate | 0.527 | 0.473 | 34.912
$\overline{LR}$ | 0.818 | 0.182 | 13.441
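After obtaining the weights for each indicator, we use Equations (6) and (7) to calculate the positive and negative ideal solution vectors, where $z_{ij}$ denotes the normalized value of criterion $j$ for area $i$; the results are shown in Table 7.
$$z_j^{+} = \max_i z_{ij} \qquad (6)$$
$$z_j^{-} = \min_i z_{ij} \qquad (7)$$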
Table 7. Calculation of PIS and NIS values.
Indicator | Positive Ideal Solution | Negative Ideal Solution
GDP | 0.99996451 | 0.00003549
Population | 0.99996336 | 0.00003664
Growth Rate | 0.99996253 | 0.00003747
$\overline{LR}$ | 0.99995927 | 0.00004073
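Next, we calculate the distance between each option and the Positive Ideal Solution (PIS) and the Negative Ideal Solution (NIS) using Equation (8), where $w_j$ is the entropy weight of criterion $j$, $z_{ij}$ is the normalized value of criterion $j$ for area $i$, $z_j^{+}$ and $z_j^{-}$ are the PIS and NIS values of criterion $j$, and $m$ is the number of criteria.
$$D_i^{+} = \sqrt{\sum_{j=1}^{m} w_j \left(z_j^{+} - z_{ij}\right)^2}, \qquad D_i^{-} = \sqrt{\sum_{j=1}^{m} w_j \left(z_j^{-} - z_{ij}\right)^2} \qquad (8)$$
Finally, we score each option by its relative closeness to the ideal solution, computed from $D_i^{+}$ and $D_i^{-}$ as shown in Equation (9). This completes the TOPSIS analysis, whose results are presented in Table 8.
$$CSI_i = \frac{D_i^{-}}{D_i^{+} + D_i^{-}} \qquad (9)$$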
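The ideal solutions, weighted distances, and relative closeness translate into a few lines of Python. In this sketch, the entropy weight of each criterion multiplies its squared difference inside the square root; with the Table 6 weights and the Table 7 ideal solutions, this weighting reproduces Alabama's reported PIS distance of 0.99992562 in Table 8. The toy matrix is hypothetical: one alternative is best on every criterion, one is worst, and one sits halfway.

```python
import math


def topsis(norm, weights):
    """Weighted TOPSIS on a normalized matrix whose columns are oriented so larger is better.

    Returns per-row distances to the ideal (D+) and negative ideal (D-) and the
    relative closeness CSI = D- / (D+ + D-).
    """
    columns = list(zip(*norm))
    pis = [max(col) for col in columns]  # positive ideal solution
    nis = [min(col) for col in columns]  # negative ideal solution
    d_plus, d_minus, csi = [], [], []
    for row in norm:
        dp = math.sqrt(sum(w * (p - z) ** 2 for w, p, z in zip(weights, pis, row)))
        dm = math.sqrt(sum(w * (q - z) ** 2 for w, q, z in zip(weights, nis, row)))
        d_plus.append(dp)
        d_minus.append(dm)
        csi.append(dm / (dp + dm) if dp + dm > 0 else 0.0)
    return d_plus, d_minus, csi


# Table 6 weights, in the order GDP, population, growth rate, average loss ratio.
WEIGHTS = [0.25354, 0.26293, 0.34912, 0.13441]
toy = [
    [1.0, 1.0, 1.0, 1.0],  # best on every criterion
    [0.0, 0.0, 0.0, 0.0],  # worst on every criterion
    [0.5, 0.5, 0.5, 0.5],  # halfway
]
d_plus, d_minus, csi = topsis(toy, WEIGHTS)
print([round(s, 6) for s in csi])
```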
Table 8. The TOPSIS result for the year 2024.
Area | PIS Distance ($D^{+}$) | NIS Distance ($D^{-}$) | CSI | Rank
California | 0.01423101 | 0.99479635 | 0.98589631 | 1
Alabama | 0.99992562 | 0 | 0 | 5
Washington | 0.83212112 | 0.38026516 | 0.31365017 | 3
Florida | 0.61400106 | 0.39274119 | 0.39011096 | 2
Georgia | 0.84112637 | 0.36339509 | 0.3016925 | 4
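As a consistency check on Table 8, the short Python snippet below recomputes the CSI from the two reported distance columns and sorts the states; it reproduces the reported CSI values and the ranking California, Florida, Washington, Georgia, Alabama.

```python
# PIS and NIS distances copied from Table 8.
DISTANCES = {
    "California": (0.01423101, 0.99479635),
    "Alabama": (0.99992562, 0.0),
    "Washington": (0.83212112, 0.38026516),
    "Florida": (0.61400106, 0.39274119),
    "Georgia": (0.84112637, 0.36339509),
}


def relative_closeness(d_plus, d_minus):
    """Composite Score Index: the NIS distance as a share of the summed distances."""
    return d_minus / (d_plus + d_minus)


scores = {area: relative_closeness(*dist) for area, dist in DISTANCES.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for rank, area in enumerate(ranking, start=1):
    print(rank, area, f"{scores[area]:.8f}")
```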
Table 8 shows that California has the highest Composite Score Index (CSI) and ranks first, while Alabama has the lowest CSI and ranks last. We therefore recommend that real estate developers prioritize development in California, where they can meet local community and population needs while facing lower natural-disaster risk, and thus achieve higher profits.
5. Conclusions and Future Work
In this paper, we presented an effective approach for predicting the most valuable area for real estate development. We proposed a new indicator, the Average Loss Ratio, to measure the likelihood and extent of natural-disaster losses in an area, and on this basis presented a decision model based on ARIMA to predict the Average Loss Ratio of an area. In addition, we proposed an enhanced TOPSIS model for real estate area selection and showed that, combined with the Grey Prediction Model, it can effectively help real estate companies select areas. Verification on several real datasets supports the effectiveness of the proposed models.
This study has several limitations. First, it builds on only a few existing models, such as ARIMA and TOPSIS, although many other models could serve as the base models. In the future, we will consider other models or hybrid models to improve prediction performance. Second, the datasets used in this study cover only the USA and Australia, so the conclusions drawn may apply only to these countries. In the future, we will try to obtain datasets from a wider range of countries, such as Asian and European countries, to improve the scalability and adaptability of the proposed approach. Third, the proposed approaches were evaluated on static datasets. In scenarios where datasets are frequently updated, the models will need to be refined to handle dynamic data.