Deep Convolutional Neural Networks (CNNs) are a prominent class of powerful
and flexible machine learning models. Training such networks requires vast compute
resources, both because of the large amount of training data and because of the many
training iterations required. To speed up learning, many specialized algorithms have been developed.
First-order methods (using just the gradient) are the most popular, but second-order
algorithms (using Hessian information) are gaining importance. In this thesis we give
an overview of the most common first-order optimizers and how they are used to
train networks. We then build upon a representative second-order algorithm, which we call
EHNewton (Efficient Hessian Newton). We have integrated it into the TensorFlow
platform so that the new method can act as a drop-in replacement for standard optimizers.
We make use of this by training one CNN model from each of the (1) Inception,
(2) ResNet, and (3) MobileNet architectures. Due to technical limitations, we restrict this
study to last-layer training on a single NVIDIA Titan GPU.
EHNewton shows speed-up and accuracy benefits compared to first-order training
algorithms on all three CNNs using the ImageNet database.
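The abstract does not specify EHNewton's programming interface, so the following is only a hypothetical sketch of what "drop-in replacement" means in a TensorFlow training script: a custom optimizer is swapped in at compile time, here shown for last-layer training of a MobileNet model. The EHNewton class name, its module, and its constructor arguments are assumptions, not the thesis's actual API.

import tensorflow as tf
# from ehnewton import EHNewton  # hypothetical import; module and class name are assumptions

# Build a MobileNet classifier for ImageNet's 1000 classes, trained from scratch.
model = tf.keras.applications.MobileNet(weights=None, classes=1000)

# Match the last-layer training setup: freeze all layers except the final one.
for layer in model.layers[:-1]:
    layer.trainable = False

model.compile(
    # optimizer=EHNewton(learning_rate=1e-3),  # hypothetical drop-in second-order optimizer
    optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3),  # standard first-order baseline
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

Because only the optimizer argument changes, the rest of the training pipeline (model definition, data input, loss, and metrics) stays identical, which is what makes the comparison against first-order baselines straightforward.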