Convergence
From Glossary
Specifically, convergence of an algorithm. The algorithm is
represented as the point-to-set map,
, where there is some
selection function to choose
if
has more than one member. Convergence means that the
sequence,
, has a limit point, say
,
such that
satisfies certain conditions. Ideally, the
conditions are that
is an optimal solution, at
least locally, but this is often not the definition used in a
convergence theorem. (For example, in unconstrained optimization,
most algorithms converge to a stationary point,
which need not be even a local optimum.)
Let , where
is a scalar series
pertaining to the sequence
. Typically,
, which is sometimes called policy convergence (to a solution,
). We could also
have
, where
is the objective function, in which case the concern is
with value convergence to an optimal value,
.
Some related terms and concepts:
- Dual convergence. Dual values, notably Lagrange multipliers, converge to a dual solution.
- Geometric convergence. Same as linear convergence, but
usually used when the sequence is exactly a geometric progression:
-
is
to the power of
- Global convergence. Usually means the same as globally convergent, but some mean that the algorithm convergences to a global solution.
- Globally convergent. Convergence to a solution from any starting point.
- Linear convergence. Order is 1 and rate is less than 1 (see below).
- Local convergence. Some mean locally convergent, but some mean that the limit point is a local optimum (or just satisfies necessary conditions, see Myths and Counter Examples in Mathematical Programming to avoid misconception).
- Locally convergent. There exists an open neighborhood of 0 such that {s^k}-->0 from any s^0 in the neighborhood.
- Order of convergence. The order is
. The order being 1 is linear convergence and the order being 2 is quadratic convergence. Most (non-finite) algorithms to solve mathematical programs are between linear and quadratic.
Let
and suppose
(for notational convenience). The term order derives from the approximate equation,
, where
is the rate. If this equation were exact, we would have
if
, and
if
, for all
. If
and
, we gain one decimal place each time:
,
,
, etc. If
, the number of accurate decimal places approximately doubles each iteration:
,
,
, etc. Unfortunately, the underlying equation is not exact – see Myths and Counter Examples in Mathematical Programming to avoid misconception. Some authors call this q-order (or Q-order) convergence to distinguish it from variations of the definition of order. Each definition is designed to capture the notion of how many digits of accuracy are added as the sequence approaches its limit.
- Rate of convergence. This is generally used to mean the limiting ratio:
, given the order is
.
- Sublinear convergence. Order is 1 and rate is 1 (slower than all linear convergence),
e.g.,
.
- Superlinear convergence. Order is 1 and rate is 0 (faster than all linear convergence),
e.g.,
.
- Stochastic convergence. This applies when the successor point is a random variable, as in simulated annealing. Here are the most common types of convergence for a sequence of random variables,
.
- with probability
or almost everywhere (abbr., a.e.).
.
- in probability.
for all
.
- in distribution. The sequence of cumulative distribution functions (cdf), converges point-wise:
for all
at which
is continuous, where
is the cdf of
and
is the cdf of
.
- in p-th mean.
. For
, this is called convergence in quadratic mean or in mean square.
- with probability