TITLE:
Classification with Convolutional Neural Networks in MapReduce
AUTHORS:
Min Chen
KEYWORDS:
Distributed System, Image Classification, CNNs, MapReduce, Overfitting
JOURNAL NAME:
Journal of Computer and Communications, Vol.12 No.8, August 29, 2024
ABSTRACT: Deep learning (DL) techniques, and Convolutional Neural Networks (CNNs) in particular, have become increasingly popular in data science and have achieved great success in a wide range of applications, including computer vision, speech, and natural language processing. However, training CNNs is computationally expensive, especially when the dataset is large. To overcome this obstacle, this paper takes advantage of distributed frameworks and cloud computing to develop a parallel CNN algorithm. MapReduce is a scalable and fault-tolerant data processing tool developed to improve the performance of large-scale, data-intensive applications on clusters. A MapReduce-based CNN (MCNN) is developed in this work to tackle the task of image classification. In addition, the proposed MCNN adds dropout layers to the network to mitigate overfitting. The implementation of MCNN and the way the proposed algorithm accelerates learning are examined closely and demonstrated through experiments. Results reveal high classification accuracy and significant improvements in speedup, scaleup, and sizeup compared to the standard algorithms.
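The abstract names speedup, scaleup, and sizeup without defining them. The sketch below uses the definitions commonly found in the parallel data mining literature; it is an assumption that the paper uses the same ones. The function names and the timing values are hypothetical placeholders, not results from the paper.

```python
# Minimal sketch of three common scalability metrics.
# Assumption: these are the conventional definitions, not necessarily
# the exact ones used in the paper. All timings are in seconds.

def speedup(t_one_node: float, t_m_nodes: float) -> float:
    """Same dataset on 1 node vs. m nodes; the ideal value is m."""
    return t_one_node / t_m_nodes


def scaleup(t_one_node_base: float, t_m_nodes_scaled: float) -> float:
    """Dataset and cluster both grow by a factor m; the ideal value is 1.0."""
    return t_one_node_base / t_m_nodes_scaled


def sizeup(t_scaled_data: float, t_base_data: float) -> float:
    """Fixed cluster, dataset grows by a factor k; values near k are good."""
    return t_scaled_data / t_base_data


if __name__ == "__main__":
    # Hypothetical timings, chosen only to illustrate the arithmetic.
    print(speedup(t_one_node=100.0, t_m_nodes=25.0))
    print(scaleup(t_one_node_base=100.0, t_m_nodes_scaled=125.0))
    print(sizeup(t_scaled_data=300.0, t_base_data=100.0))
```

With these definitions, a speedup close to the number of nodes and a scaleup close to 1.0 indicate that the parallel overhead is small.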