Bachelor's Thesis presentation. Danylo is advised by Severin Reiz.
Danylo Movchan: Implementing a learning-rate scheduler in a Newton-CG Optimizer for Deep Learning
SCCS Colloquium
Deep neural network models are currently at the peak of their popularity and find applications in a variety of fields, e.g. in translation engines, which rely on natural language processing. Training such networks requires enormous computing resources and can take up to two weeks, and it most often relies on rather naive first-order optimization algorithms. Because modern deep neural networks have several million parameters, second-order methods have long been considered infeasible: their cost grows quadratically with network size.
Previous studies have shown that combining Newton's method with the conjugate gradient method and Fast Exact Multiplication (short: Newton-CG) yields speed-up and accuracy benefits in areas such as image classification and neural machine translation. In our work, we tried to determine whether different learning-rate schedulers can extend these benefits while eliminating some of the drawbacks of standard Newton-CG.
Newton-CG with a learning-rate scheduler allows for larger initial learning rates while remaining stable close to the minimum, and thus enables faster training.
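To illustrate the mechanics described in this abstract, here is a minimal sketch of a Newton-CG loop with a decaying learning rate. It is not the thesis implementation. The toy loss, the exponential-decay schedule, and all names (`hvp`, `cg_solve`, `exponential_decay`, `newton_cg`) are assumptions chosen for illustration. The Hessian is never formed: the conjugate gradient solver only needs Hessian-vector products, which is how we read the role of Fast Exact Multiplication here. For this toy loss, the product is written out analytically.

```python
import math

# Toy loss (an assumption for illustration):
#   f(w) = 0.5 * w^T A w - b^T w + 0.25 * sum(w_i^4)
A = [[3.0, 1.0], [1.0, 2.0]]  # symmetric positive definite
b = [1.0, -1.0]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def loss(w):
    return 0.5 * dot(w, matvec(A, w)) - dot(b, w) + 0.25 * sum(x ** 4 for x in w)

def grad(w):
    Aw = matvec(A, w)
    return [Aw[i] - b[i] + w[i] ** 3 for i in range(len(w))]

def hvp(w, v):
    """Hessian-vector product H(w) v, with H(w) = A + diag(3 * w_i^2)."""
    Av = matvec(A, v)
    return [Av[i] + 3.0 * w[i] ** 2 * v[i] for i in range(len(w))]

def cg_solve(apply_H, g, max_iter=10, tol=1e-8):
    """Approximately solve H d = -g using only products H v."""
    d = [0.0] * len(g)
    r = [-x for x in g]          # residual of H d = -g at d = 0
    p = list(r)
    rs = dot(r, r)
    rs0 = rs
    for _ in range(max_iter):
        if rs <= tol ** 2 * rs0:  # relative stopping criterion
            break
        Hp = apply_H(p)
        alpha = rs / dot(p, Hp)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

def exponential_decay(lr0, gamma, step):
    """Hypothetical schedule: lr_k = lr0 * gamma^k."""
    return lr0 * gamma ** step

def newton_cg(w, steps=20, lr0=1.0, gamma=0.95):
    for k in range(steps):
        g = grad(w)
        d = cg_solve(lambda v: hvp(w, v), g)   # Newton direction via CG
        lr = exponential_decay(lr0, gamma, k)  # scheduled step size
        w = [wi + lr * di for wi, di in zip(w, d)]
    return w

if __name__ == "__main__":
    w_final = newton_cg([2.0, -2.0])
    print("w =", w_final, "loss =", loss(w_final))
```

On this small deterministic problem the schedule only damps later steps. The benefit claimed in the abstract concerns noisy mini-batch training of deep networks, which this sketch does not reproduce.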