Fine-tuning adaptive stochastic optimizers: determining the optimal hyperparameter ϵ via gradient magnitude histogram analysis

Silva, G.; Rodríguez, P.

doi:https://doi.org/10.1007/s00521-024-10302-2

dc.contributor.affiliation	Pontificia Universidad Católica del Perú. Department of Electrical Engineering
dc.contributor.author	Silva, G.
dc.contributor.author	Rodríguez, P.
dc.date.accessioned	2026-03-13T16:59:04Z
dc.date.issued	2024
dc.description.abstract	Stochastic optimizers play a crucial role in the successful training of deep neural network models. To achieve optimal model performance, designers must carefully select both model and optimizer hyperparameters. However, this process is frequently demanding in terms of computational resources and processing time. While it is a well-established practice to tune the entire set of optimizer hyperparameters for peak performance, there is still a lack of clarity regarding the individual influence of hyperparameters mislabeled as “low priority”, including the safeguard factor ϵ and decay rate β, in leading adaptive stochastic optimizers such as Adam. In this manuscript, we introduce a new framework based on the empirical probability density function of the loss’ gradient magnitude, termed the “gradient magnitude histogram”, for a thorough analysis of adaptive stochastic optimizers and the safeguard hyperparameter ϵ. This framework reveals and justifies valuable relationships and dependencies among hyperparameters in connection with optimal performance across diverse tasks, such as classification, language modeling and machine translation. Furthermore, we propose a novel algorithm using gradient magnitude histograms to automatically estimate a refined and accurate search space for the optimal safeguard hyperparameter ϵ, surpassing the conventional trial-and-error methodology by establishing a worst-case search space that is two times narrower.
dc.description.sponsorship	Funding: This manuscript is supported by Consejo Nacional de Ciencia, Tecnología e Innovación Tecnológica (CONCYTEC), and Fondo Nacional de Desarrollo Científico, Tecnológico y de Innovación Tecnológica (FONDECYT), under contract No. 174-2020-FONDECYT "Doctoral Programs in Peruvian Universities", and by the Army Research Office (ARO) under Grant W911NF-22-1-0296.
dc.identifier.doi	https://doi.org/10.1007/s00521-024-10302-2
dc.identifier.uri	http://hdl.handle.net/20.500.14657/206150
dc.language.iso	eng
dc.publisher	Springer Science and Business Media Deutschland GmbH
dc.relation.ispartof	urn:issn:0941-0643
dc.rights	info:eu-repo/semantics/closedAccess
dc.source	Neural Computing and Applications; Vol. 36, No. 35 (2024)
dc.subject	Hyperparameter
dc.subject	Fine-tuning
dc.subject	Stochastic optimizers
dc.subject	Deep neural network
dc.subject.ocde	https://purl.org/pe-repo/ocde/ford#2.02.01
dc.title	Fine-tuning adaptive stochastic optimizers: determining the optimal hyperparameter ϵ via gradient magnitude histogram analysis
dc.type	http://purl.org/coar/resource_type/c_6501
dc.type.other	Article
dc.type.version	https://vocabularies.coar-repositories.org/version_types/c_970fb48d4fbd8a85/
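The abstract describes the gradient magnitude histogram only at a high level. The Python sketch below, assuming NumPy and a list of per-parameter gradient arrays, shows one way to build such a histogram on a log10 scale and why its position relative to ϵ matters in a standard Adam step. At the first step, the bias-corrected update reduces to lr·g/(|g| + ϵ), which is roughly lr·sign(g) when |g| ≫ ϵ and roughly (lr/ϵ)·g when |g| ≪ ϵ. The helper names (gradient_magnitude_histogram, fraction_below_eps, adam_step) are hypothetical, and this is an illustration, not the authors' search-space estimation algorithm.

```python
import numpy as np


def gradient_magnitude_histogram(grads, num_bins=50):
    """Empirical density of log10|g| over all nonzero gradient entries.

    Hypothetical helper. `grads` is a list of NumPy arrays (one per
    parameter tensor). Returns (density, bin_edges) from np.histogram.
    """
    mags = np.abs(np.concatenate([np.ravel(g) for g in grads]))
    mags = mags[mags > 0]  # log10 is undefined at zero, so drop exact zeros
    return np.histogram(np.log10(mags), bins=num_bins, density=True)


def fraction_below_eps(grads, eps):
    """Fraction of nonzero gradient entries whose magnitude is below eps.

    Hypothetical helper: these are the coordinates where eps dominates
    the Adam denominator at the first step.
    """
    mags = np.abs(np.concatenate([np.ravel(g) for g in grads]))
    mags = mags[mags > 0]
    return float(np.mean(mags < eps))


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update, with eps added to sqrt(v_hat)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate (decay rate beta1)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate (decay rate beta2)
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic gradients spanning several decades (illustrative only).
    grads = [rng.normal(scale=10.0 ** -k, size=1000) for k in range(2, 7)]
    density, edges = gradient_magnitude_histogram(grads, num_bins=30)
    print("log10|g| range:", edges[0], "to", edges[-1])
    for eps in (1e-8, 1e-6, 1e-4, 1e-2):
        frac = fraction_below_eps(grads, eps)
        print(f"eps={eps:g}: fraction of entries below eps = {frac:.3f}")
```

Sweeping ϵ across the decades where the histogram has mass shows how many coordinates shift from the sign-like regime to the SGD-like regime. How the paper turns the histogram into a narrowed ϵ search space is not described in this record.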

Collections

Articles (DFI)