TITLE:
Using Decision Tree Classification and Principal Component Analysis to Predict Ethnicity Based on Individual Characteristics: A Case Study of Assam and Bhutan Ethnicities
AUTHORS:
Tianhui Zhang, Xinyu Zhang, Xianchen Liu, Zhen Guo, Yuanhao Tian
KEYWORDS:
Decision Tree Classification, Principal Component Analysis, Anthropometric Features, Dimensionality Reduction, Machine Learning in Anthropology
JOURNAL NAME:
Journal of Software Engineering and Applications, Vol.17 No.12, December 12, 2024
ABSTRACT: This study investigates the use of a decision tree classification model, combined with Principal Component Analysis (PCA), to distinguish between the Assam and Bhutan ethnic groups based on specific anthropometric features, including age, height, tail length, hair length, bang length, reach, and earlobe type. PCA was used to reduce the dimensionality of the dataset and identified height, reach, and age as the features contributing most to the variance. However, although PCA reduced dimensionality effectively, its projection did not clearly separate the two ethnic groups, a limitation also noted in previous research. The decision tree model, by contrast, performed considerably better, establishing clear decision boundaries and achieving high classification accuracy. The decision tree consistently selected height and reach as the most important features, a finding supported by existing studies of ethnic differences in Northeast India. These results highlight the strengths of combining PCA for dimensionality reduction with decision tree models for classification. Although PCA alone was insufficient for optimal class separation, integrating it with a decision tree improved both the model's accuracy and its interpretability. Future research could explore other machine learning models to improve classification and examine a broader set of anthropometric features for a more comprehensive classification of ethnic groups.
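The abstract describes a two-stage workflow: PCA to examine variance and reduce dimensionality, followed by a decision tree whose feature importances identify the most discriminative measurements. The sketch below illustrates that workflow under explicit assumptions and is not the authors' implementation. It uses NumPy only. It generates hypothetical synthetic data rather than the study's dataset, and it encodes earlobe type and group membership as 0/1. PCA is computed via SVD of the standardized features, and the tree is a small CART-style classifier that uses Gini impurity and impurity-based importances. The function names, tree depth, random seeds, and the 150/50 hold-out split are illustrative choices.

```python
import numpy as np

# Column names follow the abstract; order and numeric encoding are assumptions.
FEATURES = ["age", "height", "tail_length", "hair_length",
            "bang_length", "reach", "earlobe_type"]


def make_synthetic_data(n=200, seed=0):
    """Hypothetical stand-in data, NOT the study's dataset.

    Class 1 is shifted toward greater height and reach so the sketch has signal.
    """
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)              # 0 = Assam, 1 = Bhutan (assumed)
    X = rng.normal(size=(n, len(FEATURES)))
    X[:, 1] += 2.0 * y                          # height depends on class
    X[:, 5] += 1.5 * y + 0.5 * X[:, 1]          # reach depends on class and height
    X[:, 6] = rng.integers(0, 2, size=n)        # earlobe type as a 0/1 category
    return X, y


def pca(X, k=2):
    """PCA via SVD of standardized data; returns scores, variance ratios, loadings."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    ratio = s**2 / np.sum(s**2)                 # explained-variance ratio per component
    return Z @ Vt[:k].T, ratio, Vt[:k]


def gini(y):
    """Gini impurity of a 0/1 label vector."""
    p = np.bincount(y, minlength=2) / len(y)
    return 1.0 - np.sum(p**2)


def best_split(X, y):
    """Exhaustive CART-style search for the split with the largest Gini decrease."""
    best_j, best_t, best_gain = None, None, 0.0
    parent = gini(y)
    for j in range(X.shape[1]):
        v = np.unique(X[:, j])
        for t in (v[:-1] + v[1:]) / 2:          # midpoints between sorted unique values
            left = X[:, j] <= t
            n_l = left.sum()
            child = (n_l * gini(y[left]) + (len(y) - n_l) * gini(y[~left])) / len(y)
            if parent - child > best_gain:
                best_j, best_t, best_gain = j, t, parent - child
    return best_j, best_t, best_gain


def fit_tree(X, y, depth, importances, n_total):
    """Grow a depth-limited tree; accumulate weighted impurity decrease per feature."""
    if depth > 0:
        j, t, gain = best_split(X, y)
        if j is not None:
            importances[j] += gain * len(y) / n_total
            left = X[:, j] <= t
            return {"feature": j, "threshold": t,
                    "left": fit_tree(X[left], y[left], depth - 1, importances, n_total),
                    "right": fit_tree(X[~left], y[~left], depth - 1, importances, n_total)}
    return int(np.bincount(y, minlength=2).argmax())   # leaf: majority class


def predict(node, x):
    """Follow threshold tests from the root down to a leaf label."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


X, y = make_synthetic_data()
scores, ratio, loadings = pca(X)
top_pc1 = FEATURES[int(np.abs(loadings[0]).argmax())]
print("explained variance ratio, PC1 and PC2:", np.round(ratio[:2], 3))
print("largest |loading| on PC1:", top_pc1)

idx = np.random.default_rng(1).permutation(len(y))
tr, te = idx[:150], idx[150:]                   # simple hold-out split
importances = np.zeros(len(FEATURES))
tree = fit_tree(X[tr], y[tr], depth=3, importances=importances, n_total=len(tr))
importances /= importances.sum()
accuracy = np.mean([predict(tree, x) == t for x, t in zip(X[te], y[te])])
ranking = [FEATURES[i] for i in np.argsort(-importances)]
print("held-out accuracy:", accuracy)
print("most important features:", ranking[:2])
```

The tree here is trained on the original, unrotated features, so its importances map directly back to named measurements. Training it on the PCA scores instead is an equally valid variant. The synthetic data were constructed so that height and reach carry the class signal, which means those features should rank first here. On real data, the ranking is an empirical question. For non-illustrative work, scikit-learn's `PCA` and `DecisionTreeClassifier` (with its `feature_importances_` attribute) would be the more conventional choice.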