Deep neural networks have become some of the most prominent models in machine learning due to their flexibility and, consequently, their broad applicability. Training large-scale deep neural networks requires vast computational resources. Stochastic gradient descent methods still enjoy great popularity, but Hessian-based optimisation techniques are on the rise. While computing the second derivative of the loss function remains computationally expensive, a potentially much faster convergence rate justifies the consideration of such methods. Moreover, gradient descent is inherently sequential and cannot take full advantage of highly parallelised computing architectures, which further motivates the exploration of second-order optimisation methods in the context of high-performance computing. This thesis aims to provide an overview of the numerical challenges, their solutions, and the stochastic aspects involved in applying Hessian-based optimisation to the training of large-scale deep neural networks. It places emphasis on a strong theoretical foundation, which is crucial for the less heuristic second-order methods. The potential of a quasi-Newton method is showcased by outperforming gradient descent in the optimisation of the loss function of a ResNet.