TITLE:
Comparative Analysis of Machine Learning Algorithms for Optimal Land Use and Land Cover Classification: Guiding Method Selection for Resource-Limited Settings in Tiaty, Baringo County, Kenya
AUTHORS:
John Kapoi Kipterer, Mark K. Boitt, Charles N. Mundia
KEYWORDS:
Support Vector Machine, Random Forest, Classification and Regression Trees, Gradient Boosting Trees, Naïve Bayes, Semiarid, Weighted F-1 Score, Land Use and Land Cover
JOURNAL NAME:
Journal of Geoscience and Environment Protection, Vol.13 No.4, April 23, 2025
ABSTRACT: Arid and semiarid regions face challenges such as bushland encroachment and agricultural expansion, especially in Tiaty, Baringo, Kenya. These issues create mixed opportunities for pastoral and agro-pastoral livelihoods. Machine learning methods for land use and land cover (LULC) classification are vital for monitoring environmental changes. Advances in remote sensing increase the potential for classifying land cover, which in turn requires assessing the accuracy and efficiency of algorithms in fragile environments. This research identifies the best-performing algorithms for LULC monitoring and supports the development of adaptive methods for sensitive ecosystems. Landsat-9 imagery from January to April 2023 was used to identify land use classes. Preprocessing in Google Earth Engine included computing spectral indices such as NDVI, NDWI, BSI, and NDBI. Supervised classification used random forest (RF), support vector machine (SVM), classification and regression trees (CART), gradient boosting trees (GBT), and naïve Bayes. An accuracy assessment was used to determine the optimal classifiers for future land use analyses. The evaluation revealed that the RF model achieved 84.4% accuracy with a weighted F1 score of 0.85, indicating its effectiveness for complex LULC data. In contrast, the GBT and CART methods yielded moderate F1 scores (0.77 and 0.68, respectively), suggesting overclassification and class imbalance issues. The SVM and naïve Bayes methods were less accurate, rendering them unsuitable for LULC tasks. RF is therefore the optimal choice for monitoring and planning land use in dynamic arid areas. Future research should explore hybrid methods and diversify training sites to improve performance.
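The two metrics reported in the abstract, overall accuracy and the weighted F1 score, can be illustrated with a minimal Python sketch. It assumes the usual definition of weighted F1 as the mean of per-class F1 scores weighted by each class's share of the reference samples; the abstract does not state the exact formula the authors used. The function names, class labels, and toy data below are hypothetical and are not taken from the study.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the reference class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's reference support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = n - tp
        # F1 = 2TP / (2TP + FP + FN); the denominator is >= n > 0 here.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1
    return score


# Hypothetical toy labels (not data from the study).
y_true = ["bush", "bush", "bush", "crop", "crop", "water"]
y_pred = ["bush", "bush", "crop", "crop", "crop", "water"]

acc = overall_accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
print(f"accuracy={acc:.3f} weighted_f1={wf1:.3f}")
```

Because the weights follow class support, a classifier that does well only on the dominant class can still score a high weighted F1, which is one reason to read it alongside per-class results when classes are imbalanced.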