We introduce a fast algorithm for entry-wise evaluation of the Gauss-Newton
Hessian (GNH) matrix for the multilayer perceptron. The algorithm has a
precomputation step and a sampling step. While it generally requires $O(Nn)$
work to compute an entry (and the entire column) of the GNH matrix for a neural
network with $N$ parameters and $n$ data points, our fast sampling algorithm
reduces the cost to $O(n+d/\epsilon^2)$ work, where $d$ is the output dimension
of the network and $\epsilon$ is a prescribed accuracy (independent of $N$).
One application of our algorithm is constructing the hierarchical-matrix
(\hmatrix{}) approximation of the GNH matrix for solving linear systems and
eigenvalue problems. While it generally requires $O(N^2)$ memory and $O(N^3)$
work to store and factorize the GNH matrix, respectively. The \hmatrix{}
approximation requires only $\bigO(N r_o)$ memory footprint and $\bigO(N
r_o^2)$ work to be factorized, where $r_o \ll N$ is the maximum rank of
off-diagonal blocks in the GNH matrix. We demonstrate the performance of our
fast algorithm and the \hmatrix{} approximation on classification and
autoencoder neural networks.
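To make the $O(n+d/\epsilon^2)$ claim concrete, the following is a minimal illustrative derivation of a sampling estimator of this type; the notation ($J_i$, $H_i$, $a_i$, $b_i$, and the importance weights) is our own sketch and is not necessarily the exact estimator used by the algorithm above. Writing the GNH as $G = \frac{1}{n}\sum_{i=1}^{n} J_i^{\top} H_i J_i$, where $J_i \in \mathbb{R}^{d \times N}$ is the Jacobian of the network output with respect to the parameters at data point $i$ and $H_i \in \mathbb{R}^{d \times d}$ is the (positive semidefinite) Hessian of the loss with respect to the output, an entry can be written as
\begin{equation*}
G_{jk} \;=\; \frac{1}{n}\sum_{i=1}^{n} a_i^{\top} b_i,
\qquad
a_i = H_i^{1/2} J_i e_j, \quad b_i = H_i^{1/2} J_i e_k \;\in\; \mathbb{R}^d .
\end{equation*}
If indices $i_1, \dots, i_m$ are drawn i.i.d.\ with probabilities $p_i \propto \|a_i\|\,\|b_i\|$ (an $O(n)$ setup once the $d$-dimensional vectors are available from precomputation), then
\begin{equation*}
\widehat{G}_{jk} \;=\; \frac{1}{m}\sum_{s=1}^{m} \frac{a_{i_s}^{\top} b_{i_s}}{n\, p_{i_s}}
\end{equation*}
is an unbiased estimate of $G_{jk}$, each sampled term costs $O(d)$ work, and by the Cauchy--Schwarz inequality $m = O(1/\epsilon^2)$ samples suffice for a standard deviation of $\epsilon \cdot \frac{1}{n}\sum_i \|a_i\|\,\|b_i\|$, consistent with the stated $O(n + d/\epsilon^2)$ cost.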