Nowadays, deep neural network models are at the peak of their popularity and find
applications in a variety of fields, e.g. in translation engines built on Natural Language
Processing. Training such networks requires enormous computing resources, can take up to
2 weeks, and most often relies on rather naive first-order optimization algorithms. Given
that modern deep neural networks have many millions of parameters, second-order methods
have long been considered infeasible because their complexity is quadratic in the network size.
Previous studies have shown that combining Newton's method with the conjugate gradients
method and fast exact Hessian-vector multiplication (short: Newton-CG) yields speed and accuracy
benefits in areas such as image classification and neural machine translation. In our work, we
investigated whether different learning-rate schedulers can amplify these benefits while
eliminating some of the drawbacks of standard Newton-CG.
Newton-CG with a learning-rate scheduler allows for larger initial learning rates, while
remaining stable close to a minimum, and thus enables faster training.
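The key idea behind Newton-CG can be illustrated in a few lines: the Newton step is obtained by solving the linear system H p = -g with conjugate gradients, which needs only Hessian-vector products, never the full Hessian. The sketch below, a simplified illustration rather than the paper's implementation, uses an analytic Hessian-vector product for a toy least-squares loss; in deep learning the product would come from Pearlmutter's R-operator via automatic differentiation. The fixed `lr` stands in for the value a learning-rate scheduler would supply at each step.

```python
import numpy as np

def loss_grad_hvp(w, A, b):
    # toy least-squares loss 0.5 * ||A w - b||^2
    r = A @ w - b
    grad = A.T @ r
    hvp = lambda v: A.T @ (A @ v)  # exact Hessian-vector product, no full Hessian
    return 0.5 * r @ r, grad, hvp

def cg(hvp, g, iters=50, tol=1e-10):
    # approximately solve H p = -g using only Hessian-vector products
    p = np.zeros_like(g)
    r = -g.copy()          # residual of H p = -g at p = 0
    d = r.copy()
    rs = r @ r
    for _ in range(iters):
        Hd = hvp(d)
        alpha = rs / (d @ Hd)
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        if rs_new < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
w = np.zeros(5)
lr = 1.0  # a scheduler would shrink this as training approaches a minimum
for step in range(5):
    f, g, hvp = loss_grad_hvp(w, A, b)
    w += lr * cg(hvp, g)
```

On this quadratic toy problem a single full Newton-CG step already reaches the minimizer; for non-convex network losses the learning rate and CG iteration budget become the tuning knobs the abstract refers to.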