 # Convergence

Specifically, convergence of an algorithm. The algorithm is represented as a point-to-set map, $x' \in A(x)$, together with a selection function that chooses $x'$ when $A(x)$ has more than one member. Convergence means that the sequence $x^k$ has a limit point, say $x$, such that $x$ satisfies certain conditions. Ideally, the condition is that $x$ is an optimal solution, at least locally, but this is often not the definition used in a convergence theorem. (For example, in unconstrained optimization, most algorithms converge to a stationary point, which need not be even a local optimum.)
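A minimal sketch of the parenthetical caveat, using gradient descent on $f(x) = x^3$ (a hypothetical choice of objective and step size, not from the source): the iterates converge to the stationary point $x = 0$, which is not a local optimum of $f$.

```python
# Gradient descent viewed as the map x' = A(x) = x - t*f'(x), for f(x) = x^3.
# Starting from x = 0.5, the iterates converge to the stationary point x = 0,
# even though x = 0 is not a local optimum of f (f < 0 just to the left of it).

def A(x, t=0.1):
    return x - t * 3 * x**2   # f'(x) = 3x^2

x = 0.5
for _ in range(10000):
    x = A(x)

print(x)  # a small positive number near the stationary point 0
```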

Let $\textstyle s^k \rightarrow 0$, where $s^k$ is a scalar sequence pertaining to the sequence $x^k$. Typically, $s^k = \|x^k - x\|$, which is sometimes called policy convergence (to a solution, $x$). We could also have $s^k = f(x^k) - f^*$, where $f$ is the objective function, in which case the concern is with value convergence to an optimal value, $f^*$.
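The two notions can be compared on a toy problem (an assumed example, not from the source): gradient descent on $f(x) = x^2$, where both the iterate gap $\|x^k - x\|$ and the value gap $f(x^k) - f^*$ go to zero, though at different rates.

```python
# Gradient descent on f(x) = x^2 (minimizer x* = 0, optimal value f* = 0),
# tracking both scalar sequences from the text:
#   policy: s^k = |x^k - x*|      value: s^k = f(x^k) - f*

def f(x):
    return x * x

t = 0.25                           # step size; x' = x - t*f'(x) = (1 - 2t)*x
x = 1.0
policy, value = [], []
for k in range(10):
    policy.append(abs(x - 0.0))    # |x^k - x*|
    value.append(f(x) - 0.0)       # f(x^k) - f*
    x = x - t * 2 * x              # f'(x) = 2x

# Both gaps shrink geometrically: ratio 0.5 for iterates, 0.25 for values.
print(policy[1] / policy[0], value[1] / value[0])
```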

Some related terms and concepts:

• Dual convergence. Dual values, notably Lagrange multipliers, converge to a dual solution.
• Geometric convergence. Same as linear convergence, but usually used when the sequence is exactly a geometric progression: $s^k = r^k s^0$, where $r^k$ denotes $r$ raised to the power $k$.
• Global convergence. Usually means the same as globally convergent, but some mean that the algorithm converges to a global solution.
• Globally convergent. Convergence to a solution from any starting point.
• Linear convergence. Order is 1 and rate is less than 1 (see below).
• Local convergence. Some mean locally convergent, but some mean that the limit point is a local optimum (or merely satisfies necessary conditions; see Myths and Counter Examples in Mathematical Programming to avoid misconception).
• Locally convergent. There exists an open neighborhood of $0$ such that $s^k \rightarrow 0$ from any $s^0$ in the neighborhood.
• Order of convergence. The order is $\sup\{ p : \lim_{k \rightarrow \infty} \|s^{k+1}\| / \|s^k\|^p < \infty \}$. An order of 1 is linear convergence; an order of 2 is quadratic convergence. Most (non-finite) algorithms to solve mathematical programs are between linear and quadratic.

Let $v^k = \|s^k\|$ and suppose $v^0 = 1$ (for notational convenience). The term order derives from the approximate equation $v^{k+1} := r(v^k)^p$, where $r$ is the rate. If this equation were exact, we would have $v^k = r^k$ if $p=1$, and $v^k = r^{(p^k-1)/(p-1)}$ if $p > 1$, for all $k$. If $r=0.1$ and $p=1$, we gain one decimal place each iteration: $v^1 = 0.1$, $v^2 = 0.01$, $v^3 = 0.001$, etc. If $p=2$, the number of accurate decimal places approximately doubles each iteration: $v^1 = 0.1$, $v^2 = 0.001$, $v^3 = 0.0000001$, etc. Unfortunately, the underlying equation is not exact; see Myths and Counter Examples in Mathematical Programming to avoid misconception. Some authors call this q-order (or Q-order) convergence to distinguish it from variations of the definition of order. Each definition is designed to capture the notion of how many digits of accuracy are added as the sequence approaches its limit.
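The order can be estimated numerically from successive gaps via $p \approx \log(v^{k+1}/v^k) / \log(v^k/v^{k-1})$, which is exact whenever $v^{k+1} = r(v^k)^p$ holds exactly. A sketch (the choice of Newton's method for $\sqrt{2}$ as the quadratic example is an assumption for illustration):

```python
import math

# Estimate the order p from the last three gaps v_k = |x_k - x|:
# p ~ log(v_{k+1}/v_k) / log(v_k/v_{k-1}), exact if v_{k+1} = r*(v_k)^p.

def order_estimate(v):
    return math.log(v[-1] / v[-2]) / math.log(v[-2] / v[-3])

# Newton's method for sqrt(2): quadratically convergent (p = 2).
x, root, newton_gaps = 2.0, math.sqrt(2), []
for _ in range(5):
    newton_gaps.append(abs(x - root))
    x = (x + 2 / x) / 2

# A plain geometric sequence: linearly convergent (p = 1, rate 0.5).
linear_gaps = [0.5**k for k in range(5)]

print(order_estimate(newton_gaps))   # close to 2
print(order_estimate(linear_gaps))   # exactly 1
```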

• Rate of convergence. This is generally used to mean the limiting ratio $\lim_{k \rightarrow \infty} \|s^{k+1}\| / \|s^k\|^p$, given the order is $p$.
• Sublinear convergence. Order is 1 and rate is 1 (slower than all linear convergence), e.g., $s^k = 1/k$.
• Superlinear convergence. Order is 1 and rate is 0 (faster than all linear convergence), e.g., $s^k = (1/k)^k$.
• Stochastic convergence. This applies when the successor point is a random variable, as in simulated annealing. Here are the most common types of convergence for a sequence of random variables, $X_n \rightarrow X$:
  • with probability $1$ or almost everywhere (abbr., a.e.). $P\{\lim_{n \rightarrow \infty} X_n = X\} = 1$.
  • in probability. $P\{\|X_n - X\| > \varepsilon\} \rightarrow 0$ for all $\varepsilon > 0$.
  • in distribution. The sequence of cumulative distribution functions (cdf) converges point-wise: $F_n(x) \rightarrow F(x)$ for all $x$ at which $F$ is continuous, where $F_n$ is the cdf of $X_n$ and $F$ is the cdf of $X$.
  • in p-th mean. $E\{\|X_n - X\|^p\} \rightarrow 0$. For $p=2$, this is called convergence in quadratic mean or in mean square.
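Convergence in probability can be sketched empirically (an assumed example, not from the source): take $X_n$ to be the mean of $n$ Uniform(0,1) draws, with limit $X = 1/2$, and estimate the exceedance probability by simulation.

```python
import random

# Convergence in probability: X_n = sample mean of n Uniform(0,1) draws -> 1/2.
# Estimate P{|X_n - 1/2| > eps} over many replications; it shrinks as n grows.

random.seed(0)
eps, reps = 0.05, 2000

def exceed_prob(n):
    """Fraction of replications whose sample mean deviates from 1/2 by > eps."""
    count = 0
    for _ in range(reps):
        mean = sum(random.random() for _ in range(n)) / n
        if abs(mean - 0.5) > eps:
            count += 1
    return count / reps

probs = [exceed_prob(n) for n in (10, 100, 1000)]
print(probs)  # decreasing toward 0
```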