TITLE:
Vanishing Gradients in an Artificial Deep Neural Network: The Impact of Activation Functions and Optimization Techniques
AUTHORS:
Baron Ntambwe
KEYWORDS:
Vanishing Gradient, Sigmoid, Optimization, ADNN, ReLU, SGD, ADAM
JOURNAL NAME:
Journal of Computer and Communications, Vol.13 No.11, November 25, 2025
ABSTRACT: Artificial deep neural networks (ADNNs) have become a cornerstone of modern machine learning, but they remain difficult to train at depth. One of the most significant obstacles is the vanishing gradient problem, in which gradients shrink as they are backpropagated toward the early layers, hindering the network's ability to learn and generalize. Activation functions and optimization techniques can either mitigate or aggravate this issue. This study compares how two activation functions (Sigmoid, ReLU) and two optimizers (SGD, Adam) affect the vanishing gradient problem in a 12-layer neural network trained on MNIST. Gradient-norm ratios, mean gradients, loss, and accuracy are tracked over 10 epochs for each of the four activation-optimizer combinations. Results indicate that ReLU with Adam maintains the most stable gradient flow and achieves the highest accuracy, whereas the combinations involving Sigmoid or SGD exhibit severe vanishing or exploding gradients. The author concludes that activation-optimizer synergy is critical and recommends ReLU with Adam for deep networks.
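
To make the gradient-norm comparison concrete, the following is a minimal sketch and not the author's code. It backpropagates through a randomly initialised 12-layer fully connected network once with Sigmoid and once with ReLU, and reports the ratio of the first-layer to the last-layer weight-gradient norm. Everything beyond the depth of 12 is an assumption made for illustration: the width of 64, the He-style Gaussian initialisation, the random input standing in for an MNIST image, the toy loss (sum of the last layer's activations), the absence of biases and of an optimizer step, and this particular definition of "gradient-norm ratio", which the abstract does not spell out. The helper names (`grad_norm_ratio`, `ACTIVATIONS`, `l2_norm`) are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Each entry maps a name to (activation, derivative w.r.t. the pre-activation z).
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda z: sigmoid(z) * (1.0 - sigmoid(z))),
    "relu": (lambda z: max(0.0, z), lambda z: 1.0 if z > 0.0 else 0.0),
}


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def grad_norm_ratio(name, depth=12, width=64, seed=0):
    """Return ||dL/dW_1||_F / ||dL/dW_depth||_F for one random input.

    Assumed definition of the ratio; values far below 1 indicate that the
    gradient has vanished by the time it reaches the first layer.
    """
    act, act_prime = ACTIVATIONS[name]
    rng = random.Random(seed)
    std = math.sqrt(2.0 / width)  # assumption: He-style initialisation
    weights = [
        [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        for _ in range(depth)
    ]
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]  # stand-in for real data

    # Forward pass: keep each layer's input and pre-activation for backprop.
    inputs, pre_acts = [], []
    a = x
    for W in weights:
        z = [sum(w * ai for w, ai in zip(row, a)) for row in W]
        inputs.append(a)
        pre_acts.append(z)
        a = [act(zi) for zi in z]

    # Backward pass for the toy loss L = sum(a_depth), so dL/da_depth = 1.
    grad_a = [1.0] * width
    weight_grad_norms = [0.0] * depth
    for layer in reversed(range(depth)):
        delta = [g * act_prime(z) for g, z in zip(grad_a, pre_acts[layer])]
        # dL/dW = outer(delta, layer input), so its Frobenius norm factorises.
        weight_grad_norms[layer] = l2_norm(delta) * l2_norm(inputs[layer])
        # Propagate to the previous layer: grad_a = W^T delta.
        W = weights[layer]
        grad_a = [
            sum(W[i][j] * delta[i] for i in range(width)) for j in range(width)
        ]

    return weight_grad_norms[0] / weight_grad_norms[-1]


ratio_sigmoid = grad_norm_ratio("sigmoid")
ratio_relu = grad_norm_ratio("relu")
print(f"sigmoid: {ratio_sigmoid:.3e}  relu: {ratio_relu:.3e}")
```

Because the Sigmoid derivative never exceeds 0.25, each backward step through a Sigmoid layer scales the gradient down, and over 12 layers the first-layer gradient should come out orders of magnitude smaller than the last-layer one. Under this initialisation the ReLU ratio should stay much closer to 1. The sketch isolates the effect of the activation function only; it says nothing about how SGD or Adam rescales the resulting updates.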