Performance Analysis of Support Vector Machine (SVM) on Challenging Datasets for Forest Fire Detection

This article delves into the analysis of performance and utilization of Support Vector Machines (SVMs) for the critical task of forest fire detection using image datasets. With the increasing threat of forest fires to ecosystems and human settlements, the need for rapid and accurate detection systems is of utmost importance. SVMs, renowned for their strong classification capabilities, exhibit proficiency in recognizing patterns associated with fire within images. By training on labeled data, SVMs acquire the ability to identify distinctive attributes associated with fire, such as flames, smoke, or alterations in the visual characteristics of the forest area. The document thoroughly examines the use of SVMs, covering crucial elements like data preprocessing, feature extraction, and model training. It rigorously evaluates parameters such as accuracy, efficiency, and practical applicability. The knowledge gained from this study aids in the development of efficient forest fire detection systems, enabling prompt responses and improving disaster management. Moreover, the correlation between SVM accuracy and the difficulties presented by high-dimensional datasets is carefully investigated, demonstrated through a revealing case study. The relationship between accuracy scores and the different resolutions used for resizing the training datasets has also been discussed in this article. These comprehensive studies result in a definitive overview of the difficulties faced and the potential sectors requiring further improvement and focus.


Introduction
Support Vector Machines (SVMs) represent a powerful class of supervised machine learning algorithms renowned for their versatility and effectiveness in solving a wide range of classification and regression tasks. Introduced by Vladimir Vapnik and his colleagues in the 1960s, SVMs have since become a cornerstone of modern machine learning.
At their core, SVMs find optimal decision boundaries that separate data points belonging to different classes while maximizing the margin, that is, the distance between the boundary and the nearest data points of each class. This characteristic allows SVMs to perform well in scenarios where data may be complex, high-dimensional, or not linearly separable. Moreover, SVMs are known for their ability to generalize from training data to new, unseen examples, which makes them valuable tools for both classification and regression problems. Similar work can be found in [1] and [2]. A comparative analysis of forest fire detection methods is given in [3].
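To make the notion of margin concrete, the following minimal sketch computes the geometric margin of a linear boundary w·x + b = 0, assuming labels of +1 and -1. The function name, the toy points, and the two candidate boundaries are illustrative assumptions and are not taken from the study.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest value of y_i * (w . x_i + b) / ||w|| over all points.

    A positive result means every point lies on the correct side of the
    boundary; a larger result means a wider margin.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical, linearly separable toy data with labels +1 / -1.
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two candidate boundaries that both separate the data.
margin_a = geometric_margin((1.0, 1.0), -2.0, points, labels)
margin_b = geometric_margin((1.0, 0.0), -1.0, points, labels)
```

Both boundaries classify the toy points correctly, but the first leaves a wider margin, which is the criterion an SVM uses to prefer one separating boundary over another.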
High-dimensional datasets refer to datasets where each data point has a large number of features or dimensions. A dataset can be represented as a matrix, where each row corresponds to a data point, and each column corresponds to a feature. High-dimensional datasets have a large number of columns or dimensions, making them challenging to visualize and analyze. Collecting, storing, and processing this extensive information may not contribute additional advantages to optimal decision-making; instead, it could potentially complicate matters and incur excessive costs. In [4], Ghaddar et al. address the problem of feature selection within SVM classification, which deals with finding an accurate binary classifier that uses a minimal number of the features available in high-dimensional datasets.
SVMs have found applications in diverse fields, including image classification, text categorization, biological sciences, finance, and more. Their adaptability, robustness, and capacity to handle large datasets make them a preferred choice for many researchers and practitioners in the domain of machine learning.

Optimization in Machine Learning
Machine learning tasks often follow a common structure: we start with a dataset containing pairs of input and output, $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$. The goal is to find a function, denoted as $f_\theta$, where $\theta$ represents a set of parameters from a predefined set $P$. We want this function to minimize a loss function $L$, which measures the difference between the predicted output $f_\theta(x)$ and the actual output $y$.
So, in a formal sense, machine learning tasks can be boiled down to two optimization problems:
1. Optimization in the space of possible functions $M$, where we seek to minimize the loss: $\min_{f \in M} \sum_{i=1}^{n} L(f(x_i), y_i)$.
2. Optimization in the space of parameters $P$, where we aim to minimize the loss by adjusting $\theta$: $\min_{\theta \in P} \sum_{i=1}^{n} L(f_\theta(x_i), y_i)$.
To prevent overfitting and enhance the model's ability to generalize to new data, we may include regularization terms in the loss function. This leads to a modified optimization problem: $\min_{\theta \in P} \sum_{i=1}^{n} L(f_\theta(x_i), y_i) + \lambda R(\theta)$. Here, $R(\theta)$ represents the regularization term, and $\lambda$ is a hyperparameter controlling the strength of regularization. The details can be found in [5] and [6].
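As an illustration of how the regularization strength affects the fitted parameters, the sketch below minimizes a regularized squared loss for a one-parameter model by gradient descent. The choice of model f_θ(x) = θx, the squared loss, the penalty R(θ) = θ², the toy data, and the learning rate are all assumptions made for this example only.

```python
def fit_ridge_1d(xs, ys, lam, lr=0.01, steps=5000):
    """Minimize sum_i (theta * x_i - y_i)^2 + lam * theta^2 by gradient descent."""
    theta = 0.0
    for _ in range(steps):
        # Gradient of the data term plus gradient of the penalty term.
        grad = sum(2.0 * (theta * x - y) * x for x, y in zip(xs, ys))
        grad += 2.0 * lam * theta
        theta -= lr * grad
    return theta

# Hypothetical data lying roughly on the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

theta_plain = fit_ridge_1d(xs, ys, lam=0.0)   # no regularization
theta_reg = fit_ridge_1d(xs, ys, lam=10.0)    # strong regularization
```

For this particular loss the minimizer is also available in closed form, θ = Σ x_i y_i / (Σ x_i² + λ), so the iterative result can be checked directly; increasing λ pulls θ toward zero.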

Kernel Used For Optimization
The kernel method represents a prominent machine learning technique, specifically designed to address nonlinear classification and regression problems. In numerous practical scenarios, the association between input variables and the target variable does not follow linear patterns. In such instances, conventional linear models like linear regression or logistic regression may exhibit suboptimal performance. (In [7], Schölkopf et al. provide an introduction to SVMs and related kernel methods, including the latest research in this area.) The kernel method provides a remedy by facilitating the transformation of input variables into a higher-dimensional feature space. Within this feature space, it becomes possible to establish linear relationships between the input variables and the target variable. This transformation is facilitated through the utilization of a kernel function, which is a mathematical function designed to quantify the similarity between pairs of input data points.
By projecting the input data into this higher-dimensional feature space, the kernel method is capable of capturing intricate and nonlinear relationships between the input variables and the target variable. It also offers the versatility to complement various machine learning algorithms, including support vector machines (SVMs), ridge regression, and principal component analysis (PCA).
From a mathematical perspective, the kernel method involves the mapping of input data points, represented as vectors $x$ within a $d$-dimensional input space, into a higher-dimensional feature space denoted as $F$, achieved through the application of a kernel function labeled as $K$.
This kernel function operates by taking a pair of input vectors, namely $x$ and $x'$, and computing the dot product of their corresponding feature representations within $F$. Formally, the kernel function can be written as $K(x, x') = \langle \phi(x), \phi(x') \rangle$, where $\phi(x)$ and $\phi(x')$ respectively denote the feature representations of $x$ and $x'$ within the feature space $F$.
The selection of an appropriate kernel function is a pivotal decision in machine learning, contingent on the specific problem at hand and the unique characteristics of the input data. Several common kernel functions serve distinct purposes:
• Linear Kernel: This kernel, denoted as $K(x, x') = x^T x'$, assumes a linear relationship between data points. It is suitable when the underlying data patterns follow linear trends.
• Polynomial Kernel: The polynomial kernel, defined as $K(x, x') = (x^T x' + c)^d$, introduces nonlinearity by raising the shifted dot product of data points to a certain power $d$ (here the degree of the polynomial, not the input dimension), while $c$ represents a constant. This kernel is versatile and can effectively capture more intricate relationships within the data.
• Gaussian (RBF) Kernel: The Gaussian or Radial Basis Function (RBF) kernel, expressed as $K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$, relies on the Euclidean distance between data points. The parameter $\sigma$ governs the width of the Gaussian function, enabling adaptation to data with varying scales. It excels at capturing intricate patterns in the data.
• Sigmoid Kernel: The sigmoid kernel in machine learning is a similarity measure used for classification tasks. It is defined as $K(x, x') = \tanh(\alpha x^T x' + \beta)$, where $\alpha$ and $\beta$ are hyperparameters controlling the kernel's shape. It captures nonlinear relationships between data points and is particularly useful when data has complex, sigmoid-shaped decision boundaries.
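The four kernels listed above can be written as plain functions on feature vectors. This is a minimal sketch: the Gaussian kernel is given in its common form exp(-‖x - x'‖² / (2σ²)), which is an assumption about the exact parameterization, and the default hyperparameter values are placeholders rather than the values tuned in this study.

```python
import math

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, c=1.0, d=2):
    # c is the additive constant, d the polynomial degree.
    return (dot(x, y) + c) ** d

def rbf_kernel(x, y, sigma=1.0):
    # Depends only on the squared Euclidean distance between x and y.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(x, y, alpha=0.1, beta=0.0):
    return math.tanh(alpha * dot(x, y) + beta)
```

For example, with x = [1, 2] and x' = [3, 4] the dot product is 11, so the linear kernel returns 11 and the polynomial kernel with c = 1 and d = 2 returns 144; the Gaussian kernel of any point with itself is 1.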
Once the input data has been transformed into the higher-dimensional feature space $F$ using a selected kernel function, linear algorithms such as Support Vector Machines (SVMs) or ridge regression can be applied for classification or regression tasks. Within this transformed feature space, the decision boundary is represented as a hyperplane, often corresponding to a nonlinear decision boundary when projected back into the original input space. This approach makes it possible to handle complex, nonlinear relationships present in the data.
In practical machine learning applications, the feature representations $\phi(x)$ within the higher-dimensional feature space $F$ are frequently not explicitly calculated. This is made possible by the kernel function, which enables the computation of dot products between data points without the need to construct and store the feature vectors themselves. This technique, often referred to as the "kernel trick", gives the kernel method the computational efficiency necessary to operate effectively within high-dimensional feature spaces. As a result, it simplifies the handling of intricate data transformations while preserving computational feasibility. The details can be found in [5].
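A small worked example may help to see why the feature map never needs to be built. For two-dimensional inputs and a degree-2 polynomial kernel, the sketch below compares the kernel value with the dot product of an explicit six-dimensional feature map. The function names and the test vectors are illustrative assumptions.

```python
import math

def poly2_kernel(x, y, c=1.0):
    """Degree-2 polynomial kernel (x . y + c)^2 for 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1] + c) ** 2

def phi(x, c=1.0):
    """Explicit feature map whose dot product reproduces poly2_kernel."""
    return [
        x[0] ** 2,
        x[1] ** 2,
        math.sqrt(2.0) * x[0] * x[1],
        math.sqrt(2.0 * c) * x[0],
        math.sqrt(2.0 * c) * x[1],
        c,
    ]

x = [1.0, 2.0]
y = [3.0, -1.0]

via_kernel = poly2_kernel(x, y)
via_features = sum(a * b for a, b in zip(phi(x), phi(y)))
```

Both routes give the same number, but the kernel needs only one two-dimensional dot product, whereas the explicit route builds six-dimensional vectors first. The gap grows quickly with the input dimension and the polynomial degree.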

Data Availability
One of the datasets employed for this study is available as "Dataset for Forest Fire Detection" on Mendeley Data. It is provided as a compressed file, "Dataset.rar". This archive contains two essential components: the training dataset and the test dataset. These datasets consist of images, each having a resolution of 250×250 pixels.
The other dataset, which we used for checking the efficiency of our model, is accessible from images.cv. In this dataset, all images are 256×256 pixels and all belong to the category of forest fire. They have been curated to focus specifically on imagery related to forest fires.

Aim
The primary focus of this article is to thoroughly analyze the performance of Support Vector Machines (SVMs) in the context of forest fire detection, with a particular emphasis on challenging datasets characterized by high dimensionality and limited samples. A unified theory for a general class of nonconvex penalized SVMs in the high-dimensional setting can be found in [8]. The overarching objective is to contribute to the development of a model for enhancing the efficiency of forest fire detection using image data.
Specifically, our aim is to examine the performance of SVMs under challenging conditions, understanding how the model responds to high-dimensional datasets with sparse samples. The goal is to gain insights into the variations in performance and to identify the factors that influence the model's accuracy in detection. See [9] for a brief overview of machine learning in forest fire detection. An empirical study on the viability of SVMs in the context of feature selection from moderately and highly unbalanced datasets can be found in [10].
Through this analysis, we strive to acquire a thorough understanding of the model's behavior under various circumstances. This understanding is crucial in pinpointing opportunities for improvement that positively influence the model, ultimately resulting in improved precision in forest fire detection.

Implementation
Our dataset consists of labeled images categorized as "fire" and "no-fire." The primary objective of this study was to employ predictive methods to determine whether unseen test data belonged to either the "fire" or "no-fire" category. Given the binary nature of this classification task, several techniques were at our disposal. Ultimately, we opted to train a Support Vector Machine (SVM) model for this purpose. This SVM model was carefully trained to classify images and predict the occurrence of forest fires based on the visual content of the images. This choice was made after careful consideration of its suitability for binary classification tasks involving image data. While there were options to employ diverse dimension reduction methods for handling high-dimensional datasets, our focus was to assess the performance of the SVM model under these conditions. Consequently, we refrained from engaging in dimension reduction techniques. However, dimensionality reduction techniques such as Principal Component Analysis, Linear Discriminant Analysis, Independent Component Analysis, Canonical Correlation Analysis, Fisher's Linear Discriminant, Topic Models and Latent Dirichlet Allocation, etc. could have been used to increase the efficiency of the SVM model.

Procedure
Our dataset presents a challenge in which the number of data samples is notably low in comparison to the number of attributes considered. Specifically, we have used the pixel values of the images as these attributes. To address this issue, we undertook a series of preprocessing steps to enhance the dataset's suitability for analysis. It is important to note that the resizing and data augmentation procedures were exclusively applied to the training datasets. The two test datasets remained unaltered and were treated as unseen data during our model evaluation, thereby ensuring a robust assessment of model generalization and performance. Our model was deliberately kept simple, as we are only interested in its performance. Some prediction-related works can be found in [11] and [12].
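To give a sense of scale for the attribute count, the sketch below flattens an image into a single feature vector and counts the resulting attributes. It assumes three colour channels (red, green, blue) per pixel and images stored as rows of (r, g, b) tuples; the helper names and the tiny 2×2 image are hypothetical.

```python
def flatten_rgb(image):
    """Flatten an image given as rows of (r, g, b) tuples into one feature vector."""
    return [channel for row in image for pixel in row for channel in pixel]

def feature_count(side):
    """Number of attributes for a square RGB image of side x side pixels."""
    return side * side * 3

# Hypothetical 2x2 image used only to show the layout of the vector.
tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (10, 20, 30)]]
vector = flatten_rgb(tiny)

# Attribute count for an image at the original 250x250 resolution.
full_size_features = feature_count(250)
```

Under these assumptions a single 250×250 image yields 187,500 attributes, which is far more than the number of images typically available in a small dataset.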

Results & Observations
In the assessment of our models carried out on both balanced and unbalanced datasets, it is apparent that specific classifiers display different levels of performance, providing valuable insight into their effectiveness for the given classification task. This examination aims to clarify the relative advantages and disadvantages of these classifiers, thereby giving a comprehensive picture of their performance.
First and foremost, the Sigmoid Kernel Support Vector Machine (SVM) emerged as the weakest contender among the classifiers examined. Its performance was suboptimal, failing to meet the standards set by the other classifiers. Significantly, when compared to the Logistic Regression model, the Sigmoid Kernel SVM performed noticeably worse, highlighting its unsuitability for the specific classification task being examined. This particular SVM variant struggled to recognize and categorize patterns within the data effectively, leading to a comparatively higher rate of misclassification.
In stark contrast, the Polynomial Kernel SVM showed more promising performance. In direct comparison to the Logistic Regression model, the Polynomial Kernel SVM managed to outperform the latter. This indicates that the former possesses a certain degree of resilience and robustness in handling the complexities of the dataset. It is worth noting that the Polynomial Kernel SVM, with its capacity to model complex, nonlinear relationships, demonstrated an inherent advantage over the logistic regression model, which tends to assume linearity in its decision boundaries.
Further refinement in the classification outcomes was observed with the Gaussian Kernel SVM. This specific variant of the Support Vector Machine displayed superior performance when compared to all the classifiers discussed in this evaluation. Its enhanced effectiveness in identifying and categorizing instances can be credited to the Gaussian kernel's capacity to capture complex patterns and nonlinearity, which might be present in the dataset. The Gaussian Kernel SVM, therefore, presents itself as a formidable choice when intricate and nonlinear relationships are inherent in the data.
In short, this analysis has shown varying degrees of success among the classifiers employed, providing critical insights into their relative performance. The Sigmoid Kernel SVM, owing to its subpar performance, is evidently ill-suited for the classification task at hand. Conversely, the Polynomial Kernel SVM has proven to be a more adept choice when compared to logistic regression, thanks to its ability to model complex relationships. Finally, the Gaussian Kernel SVM has emerged as the most proficient classifier, particularly when dealing with datasets replete with intricate and nonlinear patterns, establishing its superiority among the classifiers considered in this assessment. These findings provide valuable guidance for making informed decisions about the choice of classifier in future endeavors. We can increase the efficiency of the SVM model by using methods such as those in [13], but our main focus here is to analyze the performance.
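The data presented in Tables 1 to 8 comprises key performance metrics, including Resolution Size, Accuracy Score, and the Confusion Matrix, which records how well a model distinguishes between positive and negative instances within the dataset. With TP, FP, TN and FN denoting the counts of true positives, false positives, true negatives and false negatives, the accuracy score for a given confusion matrix is defined as: $\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$. We shall now proceed to conduct a detailed examination of performance, specifically in relation to its connection with the resolution size. This investigation will enable us to gain a comprehensive understanding of how various performance metrics are influenced by changes in the resolution size.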
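As a minimal sketch, the accuracy score can be computed from a confusion matrix as the share of correctly classified instances, (TP + TN) divided by the total count. The code assumes the layout [[TP, FP], [FN, TN]] used for the tables in this article; the example counts are hypothetical and are not taken from the tables.

```python
def accuracy_score(confusion):
    """Accuracy from a 2x2 confusion matrix laid out as [[TP, FP], [FN, TN]]."""
    (tp, fp), (fn, tn) = confusion
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix has no samples")
    return (tp + tn) / total

# Hypothetical counts for illustration only.
example = [[40, 5], [10, 45]]
acc = accuracy_score(example)
```

Here 85 of the 100 hypothetical instances lie on the diagonal of the matrix, so the accuracy is 0.85.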
In Figure 3, we have presented a series of plots illustrating the interplay between Accuracy scores and Resolution Size. The data indicates a consistent trend of increasing accuracy as the resolution size expands, especially for the Logistic Regression model and SVMs using both Polynomial and Gaussian Kernels. Importantly, this gradual increase in accuracy aligns with larger resolution settings, implying that more detailed data representation leads to improved model performance.
However, an interesting deviation is observed when looking at the SVM with the Sigmoid Kernel. At a resolution size of 200, there is a notable and somewhat perplexing drop in accuracy. This dip in accuracy is, however, rectified when the resolution is set to 250. Such fluctuations in accuracy can be attributed to several factors, often dependent on the specifics of the training and test datasets used. It is crucial to note that the dataset in question, being relatively small, is prone to these changes. While this occurrence is not rare, it poses a considerable challenge in the quest to develop robust classification models. Therefore, addressing and reducing such accuracy fluctuations, especially at critical resolution settings, is crucial for the creation of dependable classification models.
Another significant observation relates to how our models perform when used on unbalanced datasets. In this case, it is clear that the accuracy scores generally maintain a higher average than on the balanced datasets. The reason for this occurrence is rooted in the inherent differences between balanced and unbalanced datasets, such as changes in statistical measures like variance, skewness, and other pertinent factors. These differences can significantly affect the performance results of classification models. The ability of the models to handle the inherent imbalances in the context of the unbalanced dataset is crucial in determining their accuracy scores.
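One well-known way in which class imbalance alone can raise accuracy is that a trivial classifier which always predicts the most frequent class already scores the share of that class. The sketch below illustrates this general effect; it is not offered as the explanation of the results reported here, and the class counts are hypothetical.

```python
def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / len(labels)

# Hypothetical label sets with the same size but different class balance.
balanced = ["fire"] * 50 + ["no-fire"] * 50
unbalanced = ["fire"] * 90 + ["no-fire"] * 10

baseline_balanced = majority_baseline_accuracy(balanced)
baseline_unbalanced = majority_baseline_accuracy(unbalanced)
```

With these counts the baseline is 0.5 on the balanced set and 0.9 on the unbalanced one, which is why accuracy on unbalanced data is usually read alongside measures such as the True Positive Rate and False Positive Rate.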
We also conducted an analysis of the True Positive Rate (TPR) and False Positive Rate (FPR) values for both balanced and unbalanced datasets. These values can be obtained from Tables 9 to 12 using the relations $FPR = \frac{FP}{FP + TN}$ and $TPR = \frac{TP}{TP + FN}$. The results for TPR and FPR values are given in Tables 9 to 12.
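The two rates can be computed from a confusion matrix as in the following sketch, which uses the standard definitions TPR = TP / (TP + FN) and FPR = FP / (FP + TN) and assumes the [[TP, FP], [FN, TN]] layout used in this article. The example counts are hypothetical, and returning 0.0 when a class is absent is a convention chosen for this sketch.

```python
def tpr_fpr(confusion):
    """Return (TPR, FPR) from a confusion matrix laid out as [[TP, FP], [FN, TN]]."""
    (tp, fp), (fn, tn) = confusion
    # Guard against division by zero when a class has no instances.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr

# Hypothetical counts for illustration only.
example = [[40, 5], [10, 45]]
tpr, fpr = tpr_fpr(example)
```

In this example 40 of the 50 actual positives are detected (TPR = 0.8) and 5 of the 50 actual negatives are wrongly flagged (FPR = 0.1).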
We can see the plot of the ROC curve for the balanced dataset in Figure 4. We can observe that the curves consistently exhibit a high True Positive Rate (TPR) as the False Positive Rate (FPR) varies. The Area Under the Curve (AUC) further supports these findings, with a value close to 1, affirming the model's superior discriminatory power. The details of ROC curves and their significance can be found in [15]. These results collectively highlight the effectiveness of the classification model in striking a robust balance between sensitivity and specificity, thus validating its appropriateness for forest fire detection.
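For readers who wish to reproduce this kind of curve, the sketch below builds ROC points by sweeping a decision threshold over classifier scores and estimates the area under the curve with the trapezoidal rule. It assumes labels of 1 (fire) and 0 (no fire), at least one instance of each class, and scores where larger means more likely fire; the function names and the toy scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) obtained by sweeping a threshold over the scores."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample sharing this score so that ties are handled together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Hypothetical scores: one negative is ranked above one positive.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
curve = roc_points(labels, scores)
area = auc(curve)
```

In the toy example three of the four positive-negative pairs are ranked correctly, so the area is 0.75; a classifier that ranks every positive above every negative reaches 1.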
In summary, our findings highlight the critical importance of considering dataset balance and related statistical attributes when developing and evaluating classification models.The close connection between model performance and dataset attributes emphasizes the necessity for a detailed and customized strategy in tackling the complexities of real-world classification tasks.
Future work could focus on refining model efficiency under challenging conditions, particularly with datasets characterized by high dimensionality and limited samples. Further exploration could involve optimizing feature extraction methods, investigating advanced SVM kernel functions, and incorporating ensemble techniques for enhanced predictive accuracy. Additionally, attention should be given to exploring the scalability of the model to larger datasets and evaluating its robustness in diverse environmental contexts.

Figure 1: Images of the upper and the lower row belong to the training and testing datasets respectively of "Dataset.rar" with: (a),(b),(c) Elements of the training dataset with fire (d),(e),(f) Elements of the training dataset with no fire (g),(h),(i) Elements of the test dataset with fire (j),(k),(l) Elements of the test dataset with no fire.

Figure 2: Images of the upper and the lower row belong to the training and testing datasets of images.cv respectively.

First, we resized the images to various resolutions: 10 × 10, 20 × 20, 30 × 30, 40 × 40, 50 × 50, 60 × 60, 70 × 70, 80 × 80, 90 × 90, 100 × 100, 150 × 150, 200 × 200 and 250 × 250. This resizing was performed exclusively on the training datasets to better understand the relationship between the quantity of samples and the number of attributes. Furthermore, to increase the number of data samples, we applied data augmentation techniques such as flipping and median blur, which increased the number of samples fourfold: the original samples, flipped versions of the original samples, and median-blurred versions of both the original and the flipped samples. These augmentations were essential in the development of a more robust dataset for subsequent analysis. Subsequently, we explored classification methodologies, using Support Vector Machines (SVMs) with polynomial, sigmoid, and Gaussian kernels. Additionally, we employed 4-fold cross-validation and the grid search algorithm on the various resized image datasets to assess their classification performance. Furthermore, we conducted a comparative study of the SVM models under these challenging conditions by applying Logistic Regression to the same image datasets. Each sample was fed to the SVM as a vector of the red, green and blue values of all the pixels of the image. The SVM model was then run over various values of the kernel parameters, and the best model among them was chosen for our observations.
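The fourfold augmentation described above can be sketched as follows. For brevity the example works on a single-channel image stored as a list of rows; the flip direction (horizontal), the 3×3 median window, and the border handling are assumptions, since these details are not specified here, and the function names are hypothetical.

```python
from statistics import median

def flip_horizontal(image):
    """Mirror each row of a 2-D grid of pixel values."""
    return [list(reversed(row)) for row in image]

def median_blur(image, radius=1):
    """Median filter over a (2*radius+1)-square window, clipped at the borders."""
    height, width = len(image), len(image[0])
    blurred = []
    for r in range(height):
        new_row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            new_row.append(median(window))
        blurred.append(new_row)
    return blurred

def augment(image):
    """Return the four variants: original, flipped, and a blurred copy of each."""
    flipped = flip_horizontal(image)
    return [image, flipped, median_blur(image), median_blur(flipped)]

# Hypothetical 3x3 image with one outlier pixel in the centre.
gray = [[1, 2, 3],
        [4, 100, 6],
        [7, 8, 9]]
variants = augment(gray)
```

Each training image thus contributes four samples, and the median filter replaces the outlier centre pixel with the median of its neighbourhood. For colour images the same filter would be applied to each channel separately.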
The Confusion Matrix is delineated by four fundamental parameters: TP (True Positive), FP (False Positive), TN (True Negative), and FN (False Negative). In the tables provided, the confusion matrix is represented as a 2×2 matrix with the format [[TP, FP], [FN, TN]]. In this representation, 'TP' signifies the count of True Positives, 'FP' indicates the count of False Positives, 'TN' denotes the count of True Negatives, and 'FN' gives the count of False Negatives. These values are essential in gauging the accuracy and efficacy of the classification models under consideration, providing valuable insights into their performance in distinguishing between positive and negative instances within the dataset.

Figure 3: Plots Representing Accuracy Score (vertical axis) and Resolution Size (horizontal axis) for (a) Logistic Regression, (b) SVM with Sigmoid Kernel, (c) SVM with Polynomial Kernel, (d) SVM with Gaussian Kernel

Figure 4: Plots representing ROC curve on Balanced Dataset for (a) Logistic Regression, (b) SVM with Sigmoid Kernel, (c) SVM with Polynomial Kernel, (d) SVM with Gaussian Kernel

Table 1: Logistic Regression on Balanced Dataset

Table 2: Logistic Regression on Unbalanced Dataset

Table 3: Sigmoid Kernel SVM on Balanced Dataset

Table 4: Sigmoid Kernel SVM on Unbalanced Dataset

Tables 1 to 8 list the Resolution Size, Accuracy Score, and the Confusion Matrix, a pivotal component of this assessment.

Table 5: Polynomial Kernel SVM for Balanced Dataset

Table 6: Polynomial Kernel SVM for Unbalanced Dataset

Table 7: Gaussian Kernel SVM for Balanced Dataset

Table 8: Gaussian Kernel SVM for Unbalanced Dataset

Table 9: Logistic Regression on Balanced Dataset

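Table 10: Sigmoid Kernel SVM on Balanced Dataset

True Positive Rate (TPR): TPR measures the proportion of actual positive instances correctly identified by a classification model. It is calculated as the ratio of true positives to the sum of true positives and false negatives.
False Positive Rate (FPR): FPR measures the proportion of actual negative instances incorrectly classified as positive by a classification model. It is calculated as the ratio of false positives to the sum of false positives and true negatives.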

Table 11: Polynomial Kernel SVM for Balanced Dataset

Table 12: Gaussian Kernel SVM for Balanced Dataset

Receiver Operating Characteristic (ROC): ROC is a graphical representation of a model's performance across various discrimination thresholds. It plots the True Positive Rate against the False Positive Rate, providing insights into the trade-off between sensitivity and specificity.
Area Under the ROC Curve (AUC): AUC quantifies the overall performance of a classification model by calculating the area under the ROC curve. It ranges from 0 to 1, with higher values indicating better discriminatory ability. AUC is a common metric for evaluating the effectiveness of binary classification models.