TITLE:
Multimodal AI Application for Vietnamese Digital Learning Material Classification
AUTHORS:
Giang Ma, Quoc Nguyen, Hai Tran
KEYWORDS:
Multimodal Artificial Intelligence, Digital Learning Material Classification, Deep Learning, Transformer, Digital Transformation of Education
JOURNAL NAME:
World Journal of Engineering and Technology, Vol.14 No.1, December 31, 2025
ABSTRACT: This study proposes a multimodal AI model for classifying Vietnamese digital learning materials by integrating three key information sources: text content, image and graphic features, and document layout structures. The model uses a dual-branch architecture in which transformer language models (BERT and the Vietnamese-specific PhoBERT) process textual information, while convolutional neural networks extract visual features from document images. A hybrid fusion mechanism combines the multimodal representations at both the intermediate (feature) level and the prediction level to make classification more robust. Based on theoretical foundations and evidence from international multimodal research, the model is expected to outperform single-modal approaches, particularly on visually complex learning materials such as slides, diagrams, and documents with diverse layouts. The proposed framework contributes conceptually to multimodal learning material classification tailored to Vietnamese characteristics and offers potential practical value for automating classification, improving search and recommendation functions, and supporting digital transformation in Vietnam’s higher education context.
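The hybrid fusion idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature vectors stand in for the outputs of a text encoder (such as PhoBERT) and a CNN image encoder, the classifier weights are random rather than trained, and the names `hybrid_fusion`, `heads`, and the mixing weight `alpha` are hypothetical. Document layout features are left out for brevity. Intermediate fusion is modelled here as concatenating the two feature vectors before a joint classifier, and prediction-level fusion as averaging the per-branch class probabilities.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Apply a dense layer: one output score per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Untrained placeholder weights (assumption: stands in for a trained head)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def hybrid_fusion(text_feat, image_feat, heads, alpha=0.5):
    """Blend intermediate-level and prediction-level fusion.

    alpha is a hypothetical mixing weight: 1.0 uses only the joint
    (concatenated-feature) classifier, 0.0 only the averaged branch outputs.
    """
    # Intermediate fusion: concatenate branch features, classify jointly.
    p_mid = softmax(linear(*heads["fused"], text_feat + image_feat))
    # Prediction-level fusion: classify each branch, then average.
    p_text = softmax(linear(*heads["text"], text_feat))
    p_image = softmax(linear(*heads["image"], image_feat))
    p_late = [(a + b) / 2 for a, b in zip(p_text, p_image)]
    # Hybrid: convex combination of the two fused predictions.
    return [alpha * m + (1 - alpha) * l for m, l in zip(p_mid, p_late)]


rng = random.Random(0)
n_classes, text_dim, image_dim = 4, 8, 6  # illustrative sizes only
text_feat = [rng.gauss(0, 1) for _ in range(text_dim)]    # stand-in for text embedding
image_feat = [rng.gauss(0, 1) for _ in range(image_dim)]  # stand-in for CNN features
heads = {
    "text": random_layer(rng, n_classes, text_dim),
    "image": random_layer(rng, n_classes, image_dim),
    "fused": random_layer(rng, n_classes, text_dim + image_dim),
}
probs = hybrid_fusion(text_feat, image_feat, heads)
predicted_class = probs.index(max(probs))
```

Because the result is a convex combination of probability distributions, `probs` is itself a valid distribution over the classes. In a trained system the mixing weight could be tuned on validation data or learned; the fixed value here is purely illustrative.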