Newton's method is a well-known algorithm for finding roots of equations in one or more dimensions.
It can be used to find local maxima and local minima of functions by noticing that if a
real number x * is a stationary point of a
function f(x), then x *
is a root of the derivative f'(x), and therefore one can apply Newton's method to f'(x).
Thus, provided that f(x) is a twice differentiable function and the initial guess x0 is chosen close enough to x * , the sequence (xn) defined by
will converge towards x * .
This iterative scheme can be generalized to several dimensions by replacing the derivative
with the gradient,
, and the reciprocal of the second derivative with the inverse
of the Hessian matrix,
. One obtains the iterative scheme
Usually Newton's method is modified to include a small step size γ > 0 instead of
γ = 1
The geometric interpretation of Newton's method is that at each iteration one approximates
by a quadratic function around
, and then
takes a step towards the maximum/minimum of this quadratic function.
Newton's method converges much faster towards a local maximum or minimum than gradient descent.
However, to use Newton's method one needs to know the Hessian of
which sometimes can be difficult to compute. There exist various quasi-Newton methods,
where an approximation for the Hessian is used instead.
Another drawback of Newton's method is that finding the inverse of the Hessian
could be an expensive operation.
See also
Gradient descent, Gauss-Newton algorithm, Optimization.
References
- Nocedal, Jorge & Wright, Stephen J. (1999). Numerical Optimization. Springer-Verlag. ISBN 0387987932.