Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs

Severin Reiz; Tobias Neckel; Hans-Joachim Bungartz

Title:: Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs
Document type:: Journal article
Author(s):: Severin Reiz; Tobias Neckel; Hans-Joachim Bungartz
Abstract:: Training deep neural networks consumes an increasing share of computational resources in many compute centers. Often, a brute-force approach is used to obtain hyperparameter values. Our goal is (1) to improve on this by enabling second-order optimization methods with fewer hyperparameters for large-scale neural networks and (2) to compare optimizers on specific tasks in order to recommend the best one for a given problem. We introduce a novel second-order optimization method that requires only the effect of the Hessian on a vector and avoids the huge cost of explicitly setting up the Hessian for large-scale networks. We compare the proposed second-order method with two state-of-the-art optimizers on five representative neural network problems, including regression and very deep networks from computer vision or variational autoencoders. For the largest setup, we efficiently parallelized the optimizers with Horovod and applied them to an 8-GPU NVIDIA A100 (DGX-1) machine with 80% parallel efficiency.
Keywords:: Numerical methods; Machine learning; Deep learning; Second-order optimization; Data-parallelism
Conference title:: 14th International Conference on Parallel Processing and Applied Mathematics
Journal title:: In Proceedings of the 14th International Conference on Parallel Processing and Applied Mathematics
Year:: 2022
Year / Month:: 2022-09
Quarter:: 3rd quarter
Month:: Sep
Pages (contribution):: 13
Reviewed:: yes
Language:: en
WWW:: Springer Link
Submitted (to journal):: 10.05.2022
Accepted (by journal):: 09.09.2022
Publication date:: 28.04.2023
Semester:: Winter semester 22-23
TUM institution:: Department of Informatics
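
The abstract's central idea, a second-order method that needs only the effect of the Hessian on a vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the test loss, the finite-difference Hessian-vector product, and all function names (`grad`, `hvp`, `cg_solve`, `newton_cg`) are assumptions chosen for illustration, and a practical optimizer for neural networks would likely use automatic differentiation, damping, and a step-size rule instead.

```python
import math

def grad(w):
    # Gradient of a hypothetical convex test loss (an assumption, not from the paper):
    # f(w) = 0.5 * w^T A w - b^T w + 0.25 * sum(w_i^4)
    A = [[3.0, 1.0], [1.0, 2.0]]
    b = [1.0, 1.0]
    return [sum(A[i][j] * w[j] for j in range(2)) - b[i] + w[i] ** 3
            for i in range(2)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def axpy(alpha, x, y):
    # Returns alpha * x + y, element-wise.
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def hvp(grad_fn, w, v, eps=1e-5):
    # Approximates H(w) v by a central difference of the gradient,
    # so the Hessian matrix itself is never formed.
    norm_v = math.sqrt(dot(v, v))
    if norm_v == 0.0:
        return [0.0] * len(v)
    u = [vi / norm_v for vi in v]  # unit direction keeps the step size well scaled
    g_plus = grad_fn(axpy(eps, u, w))
    g_minus = grad_fn(axpy(-eps, u, w))
    return [norm_v * (p - m) / (2.0 * eps) for p, m in zip(g_plus, g_minus)]

def cg_solve(matvec, rhs, tol=1e-6, max_iter=50):
    # Conjugate gradient for matvec(x) = rhs, using only matrix-vector products.
    x = [0.0] * len(rhs)
    r = list(rhs)
    p = list(r)
    rs = dot(r, r)
    rs0 = rs
    for _ in range(max_iter):
        if rs <= tol * tol * rs0:  # relative residual small enough (also covers rhs = 0)
            break
        Hp = matvec(p)
        alpha = rs / dot(p, Hp)
        x = axpy(alpha, p, x)
        r = axpy(-alpha, Hp, r)
        rs_new = dot(r, r)
        p = axpy(rs_new / rs, p, r)
        rs = rs_new
    return x

def newton_cg(grad_fn, w, steps=10):
    # Plain Newton iteration: solve H d = -g with CG, then take the full step.
    for _ in range(steps):
        g = grad_fn(w)
        d = cg_solve(lambda v: hvp(grad_fn, w, v), [-gi for gi in g])
        w = axpy(1.0, d, w)
    return w

w_star = newton_cg(grad, [0.0, 0.0])
```

In this sketch each CG iteration costs two gradient evaluations, which is why the approach can scale to models whose full Hessian would be too large to store.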