From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes
Abstract
We focus on the classification problem with a separable dataset, one of the most important and classical problems from machine learning. The standard approach to this task is logistic regression with gradient descent (LR+GD). Recent studies have observed that LR+GD can find a solution with arbitrarily large step sizes, defying conventional optimization theory. Our work investigates this phenomenon and makes three interconnected key observations about LR+GD with large step sizes. First, we find a remarkably simple explanation of why LR+GD with large step sizes solves the classification problem: LR+GD reduces to a batch version of the celebrated perceptron algorithm when the step size γ → ∞. Second, we observe that larger step sizes lead LR+GD to higher logistic losses as it tends to the perceptron algorithm, but larger step sizes also lead to faster convergence to a solution for the classification problem, meaning that logistic loss is an unreliable metric of proximity to a solution. Surprisingly, high loss values can actually indicate faster convergence. Third, since the convergence rate of LR+GD in terms of loss function values is unreliable, we examine the iteration complexity required by LR+GD with large step sizes to solve the classification problem and prove that this complexity is suboptimal. To address this, we propose a new method, Normalized LR+GD—based on the connection between LR+GD and the perceptron algorithm—with much better theoretical guarantees.
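The sketch below (not the authors' code) illustrates the connection described in the abstract: in the logistic-loss gradient, each example is weighted by 1/(1 + exp(y_i x_i^T w)), and after a step with a very large step size γ these weights approach 0/1 indicators of misclassification, so the gradient step approaches a batch perceptron update. The toy dataset, step sizes, and single-step setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                      # separable toy data, labels in {-1, +1}
y = np.where(X @ np.array([1.5, -2.0]) > 0, 1.0, -1.0)

def logistic_grad(w):
    """Gradient of the average logistic loss (1/n) * sum_i log(1 + exp(-y_i x_i^T w))."""
    s = 1.0 / (1.0 + np.exp(np.clip(y * (X @ w), -50.0, 50.0)))   # per-example weight in (0, 1)
    return -(X * (y * s)[:, None]).mean(axis=0)

def perceptron_dir(w):
    """Batch perceptron direction: average of y_i x_i over misclassified examples."""
    mis = y * (X @ w) <= 0
    return (X[mis] * y[mis, None]).mean(axis=0) if mis.any() else None

for gamma in [1.0, 1e2, 1e4]:
    w = -gamma * logistic_grad(np.zeros(2))        # one LR+GD step from w = 0
    g, p = -logistic_grad(w), perceptron_dir(w)    # next LR+GD direction vs. perceptron direction
    if p is None:
        print(f"gamma={gamma:>8g}: all points already classified correctly")
    else:
        cos = g @ p / (np.linalg.norm(g) * np.linalg.norm(p))
        print(f"gamma={gamma:>8g}: cos(LR+GD direction, perceptron direction) = {cos:.3f}")
```

As γ grows, the cosine similarity between the LR+GD descent direction and the batch perceptron direction tends toward 1, which is the reduction the abstract refers to.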