Deep neural networks have become some of the most prominent models in machine learning due to their flexibility and, consequently, their broad applicability. Training large-scale deep neural networks requires vast computational resources. Stochastic gradient descent methods still enjoy great popularity, but Hessian-based optimisation techniques are on the rise. While computing the second derivative of the loss function remains computationally expensive, a potentially much faster convergence rate justifies the consideration of such methods. Moreover, gradient descent is inherently sequential and cannot take full advantage of highly parallelised computing architectures, which further motivates the exploration of second-order optimisation methods in the context of high-performance computing. This thesis aims to provide an overview of the numerical challenges, their solutions, and the stochastic aspects involved in applying Hessian-based optimisation to the training of large-scale deep neural networks. It places emphasis on a strong theoretical foundation, which is crucial for the less heuristic second-order methods. The potential of a quasi-Newton method is showcased by outperforming gradient descent in the optimisation of the loss function of a ResNet.