Path Loss Modeling: A Machine Learning Based Approach Using Support Vector Regression and Radial Basis Function Models

Path loss prediction models are vital for accurate characterization of signal propagation in wireless channels. Empirical and deterministic models used in path loss prediction have not produced optimal results. In this paper, we introduce machine learning algorithms to path loss prediction because they offer a flexible network architecture and can exploit extensive data. We introduce support vector regression (SVR) and radial basis function (RBF) models to path loss prediction in the investigated environments. The SVR model was able to process several input parameters without introducing complexity to the network architecture. The RBF model, for its part, provides a good function approximation. Hyperparameter tuning of the machine learning models was carried out in order to achieve optimal results. The performances of the SVR and RBF models were compared and the results validated using the root-mean-square error (RMSE). The two machine learning algorithms were also compared with the Cost-231 Hata, SUI, Egli, free-space and Cost-231 Walfisch-Ikegami (W-I) models. The analytical models overpredicted path loss. Overall, the machine learning models predicted path loss with greater accuracy than the empirical models. The SVR model performed best across all the indices, with RMSE values of 1.378 dB, 1.4523 dB and 2.1568 dB in rural, suburban and urban settings respectively, and should therefore be adopted for signal propagation modeling in the investigated environments and beyond.


Introduction
In a post-pandemic world, high-speed internet access is not a luxury but a necessity because many human activities have gone virtual. High quality of service for wireless subscribers is therefore the lifeline for digital inclusion in a world ravaged by the overall negative effect of the pandemic. To ensure an adequate and quality signal level for users in wireless networks, signal propagation models are essential. The rapid expansion in wireless communication services, driven by the increase in subscribers' demand, has brought path loss prediction to the front burner [1] [2]. Network planning and optimization occupy a central place in wireless communication because of their pivotal role in signal characterization. Path loss is the gradual attenuation in signal strength as signals propagate from the transmitter to the receiver in wireless channels. Path loss is inevitable because the mechanisms of electromagnetic propagation, such as refraction, scattering and diffraction, are diverse, but it can be accurately represented and characterized [3] [4].
Signal propagation models are important for interference analysis, frequency assignment estimation and network planning. The accurate characterization of signal path loss in a wireless channel helps network providers make in-depth estimations before deployment. Propagation models that represent path loss accurately are therefore vital for network sustainability and adaptivity [5]. Several propagation models have been developed and used for signal propagation in different frequency bands across different countries and territories. A model developed for a particular propagation environment typically performs optimally in that environment but fails when deployed in another. This has been a major setback for propagation models that were developed from empirical measurements. The reliability of a propagation model fades when it is deployed outside the region in which it was developed [6] [7] [8]. This is because the model parameters are a function of the specific environment. Antenna height, width of buildings, building to building distance and heights of the transmitter and receiver antennas are functions of the environment and therefore become the dominant factors in wireless signal prediction. When a developed model is put to use in other locations, correction factors are usually added to the model parameters so as to make them give an optimal solution [9] [10]. This practice has not provided an accurate solution, as the modified model still ends up overpredicting or underpredicting the path loss.
Propagation models are broadly classified into empirical and deterministic models. Empirical models are based on experimental measurements within a particular environment. These models are developed from measurements of base station heights, frequency, distance between transmitter and receiver, building to building distance and a host of other parameters [11] [12] [13]. They are simple and easy to develop but do not guarantee a high level of accuracy because, in a bid to simplify them, many components are left out. The need to reduce complexity in empirical models has in many instances led to the use of just a few environmental parameters in modeling. Empirical models are thus a trade-off between accuracy and simplicity.
Deterministic models are built on the laws governing electromagnetic wave propagation. They are based on several radio wave propagation characteristics in a particular propagation environment [14] [15] [16] [17]. They produce a higher level of accuracy than the empirical models but introduce considerable complexity into the developed model. The complexity of deterministic models and the lack of accuracy of empirical models make both unsuitable for signal propagation modeling in complex multipath propagation environments. There is a need for a model that gives a high level of accuracy and also provides easy representation. The shortcomings of empirical and deterministic models have necessitated path loss models with greater functionality and adaptability [18] [19] [20]. This motivates the introduction of machine learning algorithms for path loss prediction in wireless channels.
The introduction of machine learning algorithms to signal propagation modeling provides a robust network architecture, robust adaptability and extensive data usability. With the adoption of machine learning in signal propagation, many components from the transmitter to the receiver can be modelled effectively. Machine learning algorithms can capture the environmental parameters without necessarily introducing complexity into the model. Thousands of data samples can be introduced in order to ensure accurate representation and adaptability. The path loss prediction problem is a supervised regression problem in which the main objective is to train the model effectively on the data sets, such that cross validation yields a set of optimal weights that can guarantee an accurate model [21] [22].
An extensive dataset is needed for this, because the data are split into training and test sets so that the model can be validated with the test dataset after training in order to ensure accuracy. A path loss prediction model developed with machine learning can be adopted for use in environments outside of the initial place of development. This is because the model is generic and can be adapted to a completely different set of data with higher accuracy. Function approximation is therefore straightforward with ML algorithms, and their accuracies are far superior to those of the empirical and deterministic models on any of the key performance indices. Several ML path loss prediction models have been adopted in signal propagation modeling in wireless environments.
In [23], a neural network path loss prediction model was developed for the ultra-wide-band frequency band. The model was trained and the optimal values of the weights were found using backpropagation. Compared with other empirical models, the model gave the lowest value of RMSE. Also in [24], a multi-layer perceptron neural network path loss prediction model was introduced. Its key contribution is that it examined the effect of using several hidden layers on the overall network performance. It was discovered that increasing the number of hidden layers in the network increases the predictive accuracy but also introduces some complexity.
Differential evolution was used with an artificial neural network in [25]. The model was trained with experimental data in urban environments. The model also employed the gradient descent algorithm in conjunction with the backpropagation neural network. Compared with other experimental studies in the same location, the model gave a more accurate path loss characterization than the other models. A radial basis function path loss model was introduced in [26]. The model was compared with the multi-layer perceptron neural network and five other existing propagation models. The radial basis function predicted path loss with the highest accuracy and gave the lowest value of error in the considered environments. In [27], an artificial neural network for several frequency bands was developed with experimental data. The performance of the ANN was examined across the different frequency bands. The ANN gave the most accurate prediction. The empirical models overpredicted path loss and did not give an accurate signal characterization. Some other applications of machine learning algorithms and methods are given in Table 1.
Several other ML algorithms have been implemented in the literature. They range from supervised to unsupervised learning algorithms with the sole objective of giving accuracy to path loss modeling. Some machine learning algorithms have not been explored and, even where they have been explored, there is a need to compare the performances of the different machine learning algorithms for the purpose of generalization.

Table 1. Some related works.

| Reference | Year | Algorithms and methods | Contributions |
|---|---|---|---|
| [28] | 2021 | Radial basis function and multilayer perceptron | Provides accurate path loss prediction using radial basis function and multilayer perceptron algorithms |
| [29] | 2022 | Ensemble machine learning | Combined several machine learning algorithms to achieve accurate prediction |
| [30] | 2018 | Multiplicative calculus | Multiplicative calculus gave predictions with the lowest value of RMSE as compared to the other empirical models |
| [31] | 2022 | Atmospheric propagation modeling at microwave frequency | Modelling of path loss with greater accuracy even at microwave frequency |
| [32] | 2022 | Prognostic modeling of specific radio attenuation | Thorough prognosis and modeling of signal attenuation |
| [33] | 2021 | Systems dynamic approach with some machine learning methods | Machine learning methods enhanced with blockchain methods |
| [34] | 2021 | Automation with machine learning algorithms | Machine learning algorithms for automation of processes |
| [35] | 2022 | Machine learning algorithms for EEG signals | MLP algorithms, decision trees and random forest used in signal modeling |
| | | Multilayer perceptron neural network for path loss prediction | Accurate signal prediction and characterization |

The contributions of this paper are specified as follows.
• In order to achieve accurate path loss prediction in wireless networks, support vector regression (SVR) and radial basis function (RBF) models were developed with extensive data sets. These machine learning based models were compared to the predictions given by empirical models.
• Several hyperparameters of the RBF model were tuned in order to optimize the system design of the SVR and RBF machine learning algorithms.
• The performances of the SVR and RBF machine learning models were compared with five other empirical models, and the results were validated using RMSE.

The remaining part of this work is organized as follows. Section 2 presents the measurement campaign scenario and data preprocessing. The methodology and model developments are given in Section 3. The results are presented in Section 4 and the conclusion is contained in Section 5.

Measurement Campaign Scenario and Data Preprocessing
Extensive field measurements were collected by drive test across base stations in Lefkosia and Kyrenia, both located in the northern part of Cyprus. The environmental features used as input to the machine learning models are elevation, clutter heights, distance between the transmitting and receiving antennas, altitude, building to building distance and the street orientation angle. The received signal strength indicator (RSSI), reference signal received power (RSRP) and reference signal received quality (RSRQ) were all measured during the drive test, which was carried out at 40 km/hr to minimize the Doppler effect. For the drive test, the measurement equipment, namely the TEMS mobile system dongle software, a GPS and a 3G mobile phone, was connected to a laptop and housed inside a moving vehicle. The height of the base transceiver station for the urban centers where field measurements were collected is 25 m while that of the suburban centers is 35 m. The mobile antenna height used is 1.5 m. The reference distance is 100 m and data were measured over a transmitter-receiver distance of 3 km. As the mobile station moved away from the transmitting station, the RSRP was measured at each distance. At each distance, 10 readings were recorded and the average was taken in order to ensure data accuracy. The RSRP was then subtracted from the EIRP at each distance to determine the path loss at that point. Figure 1 provides the field measurement interface during the drive test.
For validation, the dataset was divided into two: 75% of the dataset was used for training and 25% was used as the test dataset. This was done so that the model can be tested with data outside of the training sample. During data preprocessing, MinMax scaling normalization was carried out on the measured data as given in Equation (1)

$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$

where $x$ is a measured value of a feature and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of that feature. Path loss is computed as given in Equation (2).

$$\text{Path loss (dB)} = \text{Effective Isotropic Radiated Power} - \text{Received power} \qquad (2)$$

The effective isotropic radiated power (EIRP) is determined from $\rho_1$, the power of the transmitter in dBm, $\rho_2$, the gain of the transmitting antenna, and $\rho_3$, the gain of the receiving antenna, together with $n_1$, $n_2$ and $n_3$, which represent the feeder cable loss, antenna loss and antenna filter loss, respectively.
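The preprocessing steps described here can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names and the numerical values are hypothetical, and the EIRP link budget (transmitter power plus antenna gains minus the three losses) is an assumption based on the quantities listed in the text.

```python
def min_max_scale(values):
    """Scale one feature column to [0, 1], following Equation (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: return zeros to avoid division by zero
        # (a choice made for this sketch only).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def eirp_dbm(tx_power_dbm, tx_gain_db, rx_gain_db,
             feeder_loss_db, antenna_loss_db, filter_loss_db):
    """Assumed link budget: power plus gains minus losses, in dB units."""
    gains = tx_power_dbm + tx_gain_db + rx_gain_db
    losses = feeder_loss_db + antenna_loss_db + filter_loss_db
    return gains - losses


def path_loss_db(eirp, rsrp_readings_dbm):
    """Equation (2): EIRP minus the average of the RSRP readings."""
    mean_rsrp = sum(rsrp_readings_dbm) / len(rsrp_readings_dbm)
    return eirp - mean_rsrp


# Hypothetical values for illustration only (not measured data).
distances_m = [100.0, 1550.0, 3000.0]
scaled_distances = min_max_scale(distances_m)

eirp = eirp_dbm(43.0, 17.0, 0.0, 2.0, 1.0, 1.0)
readings = [-79.0, -81.0] * 5          # ten RSRP readings at one distance
loss = path_loss_db(eirp, readings)
```

In this sketch the ten readings at a given distance are averaged before the subtraction, mirroring the averaging step of the measurement campaign.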
The descriptive statistics of the input parameters to the machine learning models are presented graphically in Figure 2, which also shows the correlation of each input parameter with the output path loss.
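The correlation between an input parameter and the path loss can be illustrated with the Pearson coefficient. The sketch below assumes that a Pearson correlation is meant (the text does not name the coefficient) and uses hypothetical sample values rather than the measured data.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples for illustration only (not measured data).
distance_km = [0.1, 0.5, 1.0, 2.0, 3.0]
path_loss_samples_db = [95.0, 110.0, 118.0, 126.0, 131.0]
r_distance = pearson_correlation(distance_km, path_loss_samples_db)
```

A coefficient close to +1 or -1 indicates a strong linear relationship between that input and the path loss, while a value near 0 indicates a weak one.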

Methodology and Model Developments
The methodology and model developments are introduced in this section. Two machine learning based models, namely support vector regression and radial basis function, are introduced. Both models are adapted to path loss prediction as a supervised learning regression problem. Both models are trained with the same datasets so that their performances can be compared fairly.
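One simple way to guarantee that both models see the same data is to fix the random seed of the 75%/25% split. The following sketch is illustrative only: the helper name, the seed and the toy dataset are assumptions, not details taken from the study.

```python
import random


def train_test_split(samples, train_fraction=0.75, seed=0):
    """Shuffle with a fixed seed and split into training and test lists."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    cut = int(round(train_fraction * len(samples)))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


# Hypothetical (distance index, path loss) pairs for illustration only.
dataset = [(d, 100.0 + d) for d in range(20)]
train_set, test_set = train_test_split(dataset)
```

Calling the helper again with the same seed returns the same partition, so the SVR and RBF models can be trained and tested on identical samples.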

Support Vector Regression Path Loss Prediction Model
One of the most significant tools for solving engineering problems is the ability to approximate one function with another. Support vector regression is a generalization of the support vector machine that can provide accurate characterization and solution of path loss prediction problems, which are typically non-linear regression problems. SVR has the ability to transform data that are not linearly separable into a linearly separable form by mapping the data from the original input space into a higher-dimensional space. After this transformation to the higher-dimensional space, they become linearly separable. In the high-dimensional space, the function searches for a hyperplane such that the data points fall as closely as possible on it. The predictive accuracy of the proposed SVR model is due to its capacity to optimize the margin, the absence of local minima and the sparseness of the solution. Thus it is very suitable for function approximation problems. In the SVR regression function, $x$ is the input vector and represents all five input parameters, namely elevation, clutter heights, distance between the transmitter and the receiving antenna, altitude and building to building distance; $\phi(x)$ represents the non-linear mapping function; $b$ stands for the bias and $w$ is the normal vector that specifies the direction of the hyperplane in the multi-dimensional space. The predicted path loss from the SVR model is given in Equation (5). For the SVR path loss regression problem, we measure the difference between the actual and the predicted value by calculating the error of approximation because we are dealing with a regression and not a classification problem.
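As a point of reference, the symbols defined above fit the standard SVR regression function. The form below is a sketch that assumes the usual linear model in the mapped feature space; the numbered equations of this study may be written differently.

```latex
\[
  f(x) = w^{\top}\phi(x) + b
\]
```

Here $f(x)$ denotes the predicted path loss for the input vector $x$, with $w$, $\phi(x)$ and $b$ as defined in the text.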
We adopt the Vapnik-Chervonenkis principle to estimate the loss function with the ε-insensitivity zone expressed in (7). The error function $e_i$ is the difference between the measured path loss and the predicted path loss from the SVR model and is given in (8).
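The loss and error functions can be illustrated numerically. The sketch below assumes the standard ε-insensitive loss, which is zero when the absolute error lies inside the zone and grows linearly outside it; the function names, the value of ε and the path loss values are hypothetical.

```python
def approximation_error(measured_db, predicted_db):
    """Error e_i: measured path loss minus predicted path loss, per sample."""
    return [m - p for m, p in zip(measured_db, predicted_db)]


def epsilon_insensitive_loss(error, epsilon):
    """Standard epsilon-insensitive loss: zero inside the zone, linear outside."""
    return max(0.0, abs(error) - epsilon)


# Hypothetical path loss values in dB, for illustration only.
measured = [100.0, 110.0, 120.0]
predicted = [99.5, 112.0, 120.2]
errors = approximation_error(measured, predicted)
losses = [epsilon_insensitive_loss(e, epsilon=0.5) for e in errors]
```

Only the second sample, whose error exceeds the zone width, contributes to the loss; this is the source of the sparseness of the SVR solution.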

The Radial Basis Function Model Design
The radial basis function network is well suited to function approximation and to solving path loss regression problems. It is a three-layer feedforward network consisting of an input layer, a hidden layer and an output layer. The hidden layer performs a non-linear transformation of the input parameters while the output layer is a linear combiner of the outputs of the hidden layer. The radial basis function network is efficient because it has a single hidden layer, which makes training faster and reduces complexity. The function of each node can also be easily interpreted in the RBF network.
For the RBF network, the five input parameters are elevation, clutter heights, distance between the transmitter and the receiving antenna, altitude, and building to building distance.
The objective of the radial basis function network is to obtain a function that gives a path loss close to the measured path loss.
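The three-layer structure can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: Gaussian hidden units with fixed, evenly spaced centers, a shared width, and output weights obtained by linear least squares. The training procedure of the study is not specified at this level of detail, and the one-dimensional synthetic data below stand in for the five measured inputs.

```python
import numpy as np


def hidden_layer(X, centers, width):
    """Gaussian hidden layer plus a constant column for the output bias."""
    diff = X[:, None, :] - centers[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    H = np.exp(-sq_dist / (2.0 * width ** 2))
    return np.hstack([H, np.ones((H.shape[0], 1))])


def fit_rbf(X, y, centers, width):
    """Solve for the linear output weights by least squares."""
    H = hidden_layer(X, centers, width)
    weights, *_ = np.linalg.lstsq(H, y, rcond=None)
    return weights


def predict_rbf(X, centers, width, weights):
    """Linear combination of the hidden-layer outputs."""
    return hidden_layer(X, centers, width) @ weights


# Synthetic one-dimensional example (normalized distance -> path loss in dB).
X = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y = 100.0 + 30.0 * np.log10(1.0 + 9.0 * X[:, 0])
centers = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
weights = fit_rbf(X, y, centers, width=0.2)
fit_rmse = float(np.sqrt(np.mean((predict_rbf(X, centers, 0.2, weights) - y) ** 2)))
```

Because the output layer is linear, the weights follow from a single least-squares solve once the centers and width are fixed, which is one reason training an RBF network is fast.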

Hyperparameter Tuning of the RBF Model
The parameters of the radial basis function were tuned in order to obtain optimal parameters for modelling. The Gaussian, multiquadric and inverse multiquadric functions were examined to determine which one is best for the model. The number of centroids for each kernel function was varied to determine the level of convergence, and the number of centroids was plotted against the mean square error (MSE). The MSE value decreases as the number of centroids increases for all the kernel functions that were explored. The Gaussian kernel function gave the best performance. The hyperparameter tuning is presented in Figure 3.
The developed machine learning models were validated using the key performance metrics of root mean squared error (RMSE) and squared R. This was done to make a proper validation of the ML models against the existing analytical models.
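For illustration, commonly used forms of the three kernel functions are sketched below as functions of the distance r between an input and a centroid. The exact parameterization used in the study is not stated, so the shape parameter `width` and these particular forms are assumptions.

```python
import math


def gaussian(r, width):
    """Gaussian kernel: largest at r = 0 and decaying with distance."""
    return math.exp(-(r ** 2) / (2.0 * width ** 2))


def multiquadric(r, width):
    """Multiquadric kernel: grows with distance."""
    return math.sqrt(r ** 2 + width ** 2)


def inverse_multiquadric(r, width):
    """Inverse multiquadric kernel: decays with distance."""
    return 1.0 / math.sqrt(r ** 2 + width ** 2)


KERNELS = {
    "gaussian": gaussian,
    "multiquadric": multiquadric,
    "inverse_multiquadric": inverse_multiquadric,
}

# Evaluate each kernel at the same distance for a side-by-side comparison.
values_at_r3 = {name: kernel(3.0, 4.0) for name, kernel in KERNELS.items()}
```

A tuning loop would fit the network once per kernel and per number of centroids and record the resulting MSE, which is the quantity plotted against the number of centroids.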

Results and Discussion
The performances of the two machine learning models, SVR and RBF, were compared with the experimental data. The performances of the SUI, Cost-231 Hata, Egli, free-space and Cost-231 Walfisch-Ikegami models were also examined in comparison with the measured data. The results were examined in rural, suburban and urban areas. Figure 4 gives the comparison of the measured path loss with the predicted path loss for the proposed SVR model in rural settings. It is evident from the plots that the SVR model aligned well with the measured path loss and fits the dataset accurately. Figure 5 gives the plots of the comparison of the experimental data, the SVR model and the RBF model. The plot also shows that the SVR model performed better than the RBF model, aligning more accurately with the measured data. Figure 6 presents the plots of the SVR and RBF models with the other five existing empirical models. The introduced machine learning path loss models predicted path loss more accurately than the empirical models. The five other models overpredicted path loss. The support vector regression model gave path loss closest to the experimental data. It was closely followed by the RBF model.

[Figure: Path Loss vs Distance (Measured, SVR, RBF)]
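Two of the reference models can be sketched with their widely published closed forms. The free-space expression takes distance in km and frequency in MHz; the Cost-231 Hata expression uses the small/medium-city mobile antenna correction. The carrier frequency of 1800 MHz in the example is a hypothetical value (the operating frequency is not stated in the text), while the 25 m and 1.5 m antenna heights follow the measurement setup.

```python
import math


def free_space_path_loss_db(distance_km, frequency_mhz):
    """Free-space path loss in dB (distance in km, frequency in MHz)."""
    return 32.44 + 20.0 * math.log10(distance_km) + 20.0 * math.log10(frequency_mhz)


def cost231_hata_path_loss_db(distance_km, frequency_mhz, base_height_m,
                              mobile_height_m, metropolitan=False):
    """Cost-231 Hata path loss in dB (nominally for 1500 to 2000 MHz)."""
    log_f = math.log10(frequency_mhz)
    log_hb = math.log10(base_height_m)
    # Mobile antenna correction for small and medium-sized cities.
    a_hm = (1.1 * log_f - 0.7) * mobile_height_m - (1.56 * log_f - 0.8)
    clutter_db = 3.0 if metropolitan else 0.0
    return (46.3 + 33.9 * log_f - 13.82 * log_hb - a_hm
            + (44.9 - 6.55 * log_hb) * math.log10(distance_km) + clutter_db)


# Hypothetical frequency; antenna heights as in the measurement campaign.
fspl_1km = free_space_path_loss_db(1.0, 1800.0)
hata_1km = cost231_hata_path_loss_db(1.0, 1800.0, 25.0, 1.5)
```

Evaluating such closed-form models at the measured distances produces the empirical curves against which the SVR and RBF predictions are compared.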
The performances of the seven models are presented in Table 2 using RMSE and squared R as the performance metrics. The model with the lowest value of RMSE is the one with the greatest accuracy. The SVR gives the lowest value of RMSE, 1.3678 dB, in the rural areas. A value of squared R closest to 1 represents high accuracy. The SVR model gave a squared R of 0.9786, better than the other models and followed closely by the RBF model. The RMSE and squared R values show that the empirical models predicted path loss outside of the given range and are not suitable for signal propagation modeling in the investigated environments.

Figure 6. Plots of measured data, SVR, RBF and five empirical models in rural settings.
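Both metrics are straightforward to compute from paired measured and predicted values. The sketch below uses the usual definitions (RMSE in dB, squared R as one minus the ratio of residual to total sum of squares) and hypothetical values, not the results of the study.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error, in the same unit as the inputs (dB here)."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination (dimensionless)."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Hypothetical path loss values in dB, for illustration only.
measured = [100.0, 110.0, 120.0, 130.0]
predicted = [101.0, 109.0, 121.0, 129.0]
example_rmse = rmse(measured, predicted)
example_r2 = r_squared(measured, predicted)
```

In this example every prediction is off by 1 dB, so the RMSE is 1 dB, and the squared R stays close to 1 because the errors are small relative to the spread of the measured values.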

Results for Areas Classified as Suburban
The results for the areas classified as suburban are presented in this section. The results obtained are similar to the results in the rural settings. Figure 7 gives the comparison of the measured path loss with the predicted path loss for the proposed SVR model in suburban areas. It is evident from the plots that the SVR model aligned well with the measured path loss and fits the dataset accurately. Figure 8 gives the plots of the comparison of the experimental data, the SVR model and the RBF model in suburban areas. The plot also shows that the SVR model performed better than the RBF model, aligning more accurately with the measured data. The results are also consistent with those obtained in the rural areas. Figure 9 presents the plots of the SVR and RBF models with the other five existing empirical models in suburban areas. The introduced machine learning path loss models predicted path loss more accurately than the empirical models. The five other models overpredicted path loss. The support vector regression model gave path loss closest to the experimental data. It was closely followed by the RBF model.
The performances of the seven models are presented in Table 3 using RMSE and squared R as the performance metrics. The model with the lowest value of RMSE is the one with the greatest accuracy. The SVR gives the lowest value of RMSE, 1.4523 dB, in the suburban areas. A value of squared R closest to one represents high accuracy. The SVR model gave a squared R of 0.9286, which is better than the other models and followed closely by the RBF model. The RMSE and squared R values show that the empirical models predicted path loss outside of the given range. The results obtained for suburban areas were consistent with the ones obtained for rural areas. The results for rural areas were slightly better than those for the suburban areas because of the several NLoS situations encountered in the suburban areas. There are many LoS possibilities in the rural areas and so the path loss is considerably lower.

Results for Areas Classified as Urban
The results for the areas classified as urban are presented in this section. The results obtained are similar to the results in the rural and suburban areas. Figure 10 gives the comparison of the measured path loss with the predicted path loss for the proposed SVR model in urban settings. It is evident from the plots that the SVR model aligned well with the measured path loss and fits the dataset accurately for this area. Figure 11 gives the plots of the comparison of the experimental data, the SVR model and the RBF model in urban areas. The plot also shows that the SVR model performed better than the RBF model, aligning more accurately with the measured data. The results are also consistent with those obtained in the rural and suburban areas. Figure 12 presents the plots of the SVR and RBF models with the other five existing empirical models in urban areas. The introduced machine learning path loss models predicted path loss more accurately than the empirical models. The five other models overpredicted path loss. The support vector regression model gave path loss closest to the experimental data. It was closely followed by the RBF model. The path loss is slightly higher in the urban areas because of the several multipath components. The signal propagates essentially under NLoS conditions in urban areas.
The performances of the seven models are presented in Table 4 using RMSE and squared R as the performance metrics. The model with the lowest value of RMSE is the one with the greatest accuracy. The SVR gives the lowest value of RMSE, 2.1568 dB, in the urban areas. A value of squared R closest to one represents high accuracy. The SVR model gave a squared R of 0.8528, which is better than the other models and followed closely by the RBF model. The RMSE and squared R values show that the other analytical models predicted path loss outside of the given range.
The results obtained for urban areas were consistent with the ones obtained for rural and suburban areas. The results for rural areas were slightly better than those for the urban areas because of the several NLoS situations encountered in the urban areas. There are many LoS possibilities in the rural areas and so the path loss is considerably lower.

Conclusion
The study introduced machine learning algorithms to path loss prediction with experimental data collected in Cyprus. Support vector regression (SVR) and radial basis function (RBF) are the two models that were trained with extensive field measurements. The predictive accuracies of five other empirical/analytical models were then analysed and compared with the performances of the machine learning models. The hyperparameters of the SVR and the RBF models were tuned before modeling and the Gaussian kernel function gave the best performing results; it was subsequently adopted and used in the ML models. The validation was then carried out using the root-mean-square error (RMSE) and squared R, and it showed that the empirical models are not suitable for signal propagation modeling in these environments. The introduced ML algorithms (SVR and RBF) aligned accurately with the measured data and gave a high predictive accuracy. This is the advantage of machine learning algorithms over analytical models.
The SVR gave the highest predictive accuracy because it trains the model using a symmetrical loss function and its computational complexity does not depend on the dimensions of the input parameters. In conclusion, we have been able to demonstrate that machine learning models are more efficient for path loss prediction than the analytical models. Consequently, signal propagation modeling, interference analysis, network planning and cell parameter evaluation should be carried out with these machine learning models (SVR and RBF) because they produced higher predictive accuracies, as demonstrated in this study.

Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.