Deep Convolutional Neural Networks (CNNs) are a prominent class of powerful
and flexible machine learning models. Training such networks requires vast compute
resources, both because of the large amount of training data and because of the many
training iterations required. To speed up learning, many specialized algorithms have been developed.
First-order methods (using just the gradient) are the most popular, but second-order
algorithms (using Hessian information) are gaining importance. In this thesis we give
an overview of the most common first-order optimizers and how they are used to
train networks. We then build upon a representative second-order algorithm, which we call
EHNewton (Efficient Hessian Newton). We have integrated it into the TensorFlow
platform so that the new method can act as a drop-in replacement for standard optimizers.
We make use of this by training one CNN model from each of the (1) Inception,
(2) ResNet, and (3) MobileNet architectures. Due to technical limitations, we restrict this
study to last-layer training on a single NVIDIA Titan GPU.
EHNewton shows speed-up and accuracy benefits compared to first-order training
algorithms on all three CNNs using the ImageNet database.
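The abstract does not specify EHNewton's programming interface, so the following is only a hypothetical sketch of what "drop-in replacement" means in a TensorFlow training script: a custom optimizer is swapped in at compile time, here shown for last-layer training of a MobileNet model. The EHNewton class name, its module, and its constructor arguments are assumptions, not the thesis's actual API.

import tensorflow as tf
# from ehnewton import EHNewton  # hypothetical import; module and class name are assumptions

# Build a MobileNet classifier for ImageNet's 1000 classes, trained from scratch.
model = tf.keras.applications.MobileNet(weights=None, classes=1000)

# Match the last-layer training setup: freeze all layers except the final one.
for layer in model.layers[:-1]:
    layer.trainable = False

model.compile(
    # optimizer=EHNewton(learning_rate=1e-3),  # hypothetical drop-in second-order optimizer
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),  # standard first-order baseline
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

Because only the optimizer argument changes, the rest of the training pipeline (model definition, data input, loss, and metrics) stays identical, which is what makes the comparison against first-order baselines straightforward.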