Journal of Computer and Communications

Volume 11, Issue 2 (February 2023)

ISSN Print: 2327-5219   ISSN Online: 2327-5227

Google-based Impact Factor: 1.12

Supervised Learning Algorithm on Unstructured Documents for the Classification of Job Offers: Case of Cameroun

PP. 75-88
DOI: 10.4236/jcc.2023.112006

ABSTRACT

Supervised learning algorithms are now frequently used in data science for text classification. However, African textual data have rarely been studied with these methods. This article examines the particularities of such data and measures the precision of the predictions of naive Bayes, decision tree, and SVM (Support Vector Machine) classifiers on a corpus of IT job offers collected from the internet. The particularity in question relates to the data imbalance problem in machine learning. That problem, however, is usually framed in terms of the number of documents in each class or subclass. Here, we extend it to the distribution of word counts across a set of documents. The results are compared with those obtained on a set of French IT job offers. Classification precision ranges between 88% and 90% for the French offers, against 67% at most for the Cameroonian offers. The contribution of this study is twofold. First, it clearly shows that, within a similar job category, job offers published on the internet in Cameroon are more unstructured than those available in France, for example. Second, it supports a strong hypothesis: sets of texts whose word counts are symmetrically distributed obtain better results with supervised learning algorithms.
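The abstract does not say how the symmetry of a word count distribution is measured. As a minimal sketch, assuming the Fisher-Pearson moment coefficient of skewness is an acceptable proxy, the idea could be illustrated as follows. The function names and the two toy corpora are hypothetical and are not taken from the study.

```python
from statistics import mean


def word_counts(documents):
    """Return the number of whitespace-separated tokens in each document."""
    return [len(doc.split()) for doc in documents]


def skewness(values):
    """Fisher-Pearson moment coefficient of skewness (population form).

    Values near 0 suggest a roughly symmetric distribution; positive
    values suggest a long right tail (a few very long documents).
    This measure is an assumption, not the one used in the article.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all documents have the same length
    return m3 / m2 ** 1.5


# Hypothetical toy corpora, invented for illustration only.
balanced = ["a b c", "a b c d", "a b c d e", "a b c d e f", "a b c d e f g"]
skewed = ["a b", "a b", "a b c", "a b c", "a " * 40]

if __name__ == "__main__":
    for name, corpus in (("balanced", balanced), ("skewed", skewed)):
        print(name, round(skewness(word_counts(corpus)), 3))
```

Under this assumption, a corpus whose skewness is close to zero would be the kind of set the hypothesis expects to classify well, while a strongly positive value would flag a corpus dominated by a few unusually long offers.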

Share and Cite:

Makembe, F., Etoundi, R. and Tapamo, H. (2023) Supervised Learning Algorithm on Unstructured Documents for the Classification of Job Offers: Case of Cameroun. Journal of Computer and Communications, 11, 75-88. doi: 10.4236/jcc.2023.112006.

Copyright © 2024 by authors and Scientific Research Publishing Inc.

This work and the related PDF file are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.