Pedamonti, D. (2018) Comparison of Non-Linear Activation Functions for Deep Neural Networks on MNIST Classification Task. arXiv.

https://arxiv.org/abs/1804.02763

has been cited by the following article:

TITLE: Vanishing Gradients in an Artificial Deep Neural Network: The Impact of Activation Functions and Optimization Techniques

AUTHORS: Baron Ntambwe

KEYWORDS: Vanishing Gradient, Sigmoid, Optimization, ADNN, ReLU, SGD, ADAM

JOURNAL NAME: Journal of Computer and Communications, Vol.13 No.11, November 25, 2025

ABSTRACT: Artificial deep neural networks (ADNNs) have become a cornerstone of modern machine learning, but they are not immune to challenges. One of the most significant problems plaguing ADNNs is the vanishing gradient problem, which hinders their ability to learn and generalize. Activation functions and optimization techniques play a crucial role in mitigating or aggravating this issue. This study compares how two activation functions (Sigmoid, ReLU) and two optimizers (SGD, Adam) affect the vanishing-gradient phenomenon in a 12-layer neural network trained on MNIST. Gradient-norm ratios, mean gradients, loss, and accuracy are tracked across 10 epochs for each of four activation-optimizer combinations. Results indicate that ReLU with Adam maintains the most stable gradient flow and achieves the highest accuracy, whereas both Sigmoid and SGD combinations exhibit severe vanishing or exploding gradients. The author concludes that activation-optimizer synergy is critical and recommends ReLU with Adam for deep networks.
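The vanishing-gradient effect the abstract describes can be illustrated with a minimal sketch. This is not the author's experiment: it assumes a scalar chain of 12 layers with a fixed weight of 1.0 instead of a network trained on MNIST, and the helper name `gradient_ratio` is hypothetical. It relies only on the fact that the sigmoid derivative never exceeds 0.25, while the ReLU derivative is 1 for positive inputs.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s), which is at most 0.25.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def gradient_ratio(act, act_grad, depth=12, x=1.0, w=1.0):
    """Return |d(output)/d(input)| for a scalar chain h = act(w * h).

    Hypothetical helper for illustration only; depth=12 mirrors the
    12-layer setting mentioned in the abstract, but x and w are assumed.
    """
    h = x
    grad = 1.0
    for _ in range(depth):
        z = w * h
        grad *= w * act_grad(z)  # chain rule: multiply the local derivatives
        h = act(z)
    return abs(grad)


sigmoid_ratio = gradient_ratio(sigmoid, sigmoid_grad)
relu_ratio = gradient_ratio(relu, relu_grad)

print("sigmoid gradient ratio:", sigmoid_ratio)
print("relu gradient ratio:", relu_ratio)
```

In this toy setting the ReLU chain passes the gradient through unchanged, while the sigmoid chain shrinks it by a factor below 0.25 at every layer. A real network adds weight matrices, initialization, and optimizer dynamics, which this sketch deliberately leaves out.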
