Minimax Theory and its Applications 07 (2022), No. 1, 079--108
Copyright Heldermann Verlag 2022

Backtracking Gradient Descent Method and some Applications in Large Scale Optimisation. Part 1: Theory

Tuyen Trung Truong
Matematikk Institut, Universitetet i Oslo, Blindern - Oslo, Norway
tuyentt@math.uio.no

Hang-Tuan Nguyen
Axon AI Research
hnguyen@axon.com

Deep Neural Networks (DNN) are essential in many realistic applications, including Data Science. At the core of DNN is numerical optimisation, in particular gradient descent methods (GD). The purpose of this paper is twofold. First, we prove some new results on the backtracking variant of GD under very general situations. Second, we present a comprehensive comparison of our new results to those previously known in the literature, discussing the pros and cons of each method. To illustrate the efficiency of backtracking line search, we present experimental results (on validation accuracy, training time, and so on) on CIFAR10, based on implementations developed in another paper by the authors. Source codes for the experiments are available on GitHub.

Keywords: Backtracking, deep learning, global convergence, gradient descent, line search method, optimisation, random dynamical systems.

MSC: 65Kxx, 68Txx, 49Mxx, 68Uxx.