We introduce a fast algorithm for entry-wise evaluation of the Gauss-Newton
Hessian (GNH) matrix for the multilayer perceptron. The algorithm has a
precomputation step and a sampling step. While it generally requires $O(Nn)$
work to compute an entry (and the entire column) of the GNH matrix for a neural
network with $N$ parameters and $n$ data points, our fast sampling algorithm
reduces the cost to $O(n+d/\epsilon^2)$ work, where $d$ is the output dimension
of the network and $\epsilon$ is a prescribed accuracy (independent of $N$).
One application of our algorithm is constructing the hierarchical-matrix
(\hmatrix{}) approximation of the GNH matrix for solving linear systems and
eigenvalue problems. While it generally requires $O(N^2)$ memory and $O(N^3)$
work to store and factorize the GNH matrix, respectively. The \hmatrix{}
approximation requires only $\bigO(N r_o)$ memory footprint and $\bigO(N
r_o^2)$ work to be factorized, where $r_o \ll N$ is the maximum rank of
off-diagonal blocks in the GNH matrix. We demonstrate the performance of our
fast algorithm and the \hmatrix{} approximation on classification and
autoencoder neural networks.
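To make the $O(n+d/\epsilon^2)$ claim concrete, the following is a minimal illustrative derivation of a sampling estimator of this type; the notation ($J_i$, $H_i$, $a_i$, $b_i$, and the importance weights) is our own sketch and is not necessarily the exact estimator used by the algorithm above. Writing the GNH as $G = \frac{1}{n}\sum_{i=1}^{n} J_i^{\top} H_i J_i$, where $J_i \in \mathbb{R}^{d \times N}$ is the Jacobian of the network output with respect to the parameters at data point $i$ and $H_i \in \mathbb{R}^{d \times d}$ is the (positive semidefinite) Hessian of the loss with respect to the output, an entry can be written as
\begin{equation*}
G_{jk} \;=\; \frac{1}{n}\sum_{i=1}^{n} a_i^{\top} b_i,
\qquad
a_i = H_i^{1/2} J_i e_j, \quad b_i = H_i^{1/2} J_i e_k \;\in\; \mathbb{R}^d .
\end{equation*}
If indices $i_1, \dots, i_m$ are drawn i.i.d.\ with probabilities $p_i \propto \|a_i\|\,\|b_i\|$ (an $O(n)$ setup once the $d$-dimensional vectors are available from precomputation), then
\begin{equation*}
\widehat{G}_{jk} \;=\; \frac{1}{m}\sum_{s=1}^{m} \frac{a_{i_s}^{\top} b_{i_s}}{n\, p_{i_s}}
\end{equation*}
is an unbiased estimate of $G_{jk}$, each sampled term costs $O(d)$ work, and by the Cauchy--Schwarz inequality $m = O(1/\epsilon^2)$ samples suffice for a standard deviation of $\epsilon \cdot \frac{1}{n}\sum_i \|a_i\|\,\|b_i\|$, consistent with the stated $O(n + d/\epsilon^2)$ cost.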