Source
ICML
Publication date
13.07.2025
Authors
Alexander Tyurin

Toward a Unified Theory of Gradient Descent under Generalized Smoothness

Abstract

We study the classical optimization problem $\min_{x \in \mathbb{R}^d} f(x)$ and analyze the gradient descent (GD) method in both nonconvex and convex settings. It is well known that, under the $L$-smoothness assumption ($\|\nabla^2 f(x)\| \le L$), the optimal point minimizing the quadratic upper bound $f(x_k) + \langle \nabla f(x_k), x_{k+1} - x_k \rangle + \frac{L}{2}\|x_{k+1} - x_k\|^2$ is $x_{k+1} = x_k - \gamma_k \nabla f(x_k)$ with step size $\gamma_k = \frac{1}{L}$. Surprisingly, a similar result can be derived under the $\ell$-generalized smoothness assumption ($\|\nabla^2 f(x)\| \le \ell(\|\nabla f(x)\|)$). In this case, we derive the step size
$$\gamma_k = \int_0^1 \frac{dv}{\ell(\|\nabla f(x_k)\| + \|\nabla f(x_k)\| v)}.$$
Using this step size rule, we improve upon existing theoretical convergence rates and obtain new results in several previously unexplored setups.
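For a smoothness function $\ell$ whose integral has no convenient closed form, the step size can be evaluated numerically. Below is a minimal Python sketch (not the authors' code) assuming the common $(L_0, L_1)$-smoothness example $\ell(s) = L_0 + L_1 s$; the constants and the names `ell`, `step_size`, and `gd_step` are illustrative.

```python
import numpy as np
from scipy.integrate import quad

# Illustrative generalized-smoothness function: the (L0, L1)-smoothness
# example ell(s) = L0 + L1 * s, with hypothetical constants L0, L1.
L0, L1 = 1.0, 0.5

def ell(s):
    return L0 + L1 * s

def step_size(grad_norm):
    # gamma_k = integral_0^1 dv / ell(||grad|| + ||grad|| * v),
    # evaluated here by adaptive quadrature.
    gamma, _ = quad(lambda v: 1.0 / ell(grad_norm + grad_norm * v), 0.0, 1.0)
    return gamma

def gd_step(x, grad):
    # One GD step x_{k+1} = x_k - gamma_k * grad f(x_k) with the
    # generalized-smoothness step size.
    return x - step_size(np.linalg.norm(grad)) * grad

# Usage on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([2.0, -1.0])
x_next = gd_step(x, grad=x)
print(x_next)
```

As a sanity check, for constant $\ell(s) = L$ the integral equals $\frac{1}{L}$, recovering the classical step size above; for the linear example $\ell(s) = L_0 + L_1 s$ it evaluates in closed form to $\gamma_k = \frac{1}{L_1 \|\nabla f(x_k)\|} \ln \frac{L_0 + 2 L_1 \|\nabla f(x_k)\|}{L_0 + L_1 \|\nabla f(x_k)\|}$.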
