All Pages

From Glossary


100% Rule

This pertains to sensitivity analysis in linear programming. In its original form, it uses the convexity of the set of admissible changes in the rim data to test whether a particular change is admissible: any combination of changes can occur as long as the total percentage deviation from the coordinate extremes does not exceed 100%. (Note: this applies to right-hand sides (b) and costs (c) separately.)

More generally, suppose the optimal value remains constant if the cost coefficient vector LaTeX: c in a linear program is replaced with any of LaTeX: c + d^1, c+d^2, \ldots , \mbox{ or } c + d^k (we could have LaTeX:  k = n and let LaTeX: d^j be the j-th coordinate extreme for LaTeX: c_j, but that is not necessary). Then, the optimal objective value is the same for LaTeX: \textstyle c + \sum_{i=1}^k \mu_i d^i, provided LaTeX: \mu_i \ge 0 and LaTeX: \textstyle \sum_{i=1}^k \mu_i \le 1.

The same applies to convex combinations of changes in the right-hand side LaTeX: b (possibly including the origin, which represents no change). If the objective value remains optimal when LaTeX: b is replaced with any of LaTeX: b + f^1, b + f^2, \ldots, b + f^q, then it is also optimal for the right-hand side LaTeX: \textstyle b + \sum_{i=1}^q \beta_i f^i, so long as LaTeX: \beta_i \ge 0 and LaTeX: \textstyle \sum_{i=1}^q \beta_i \le 1.
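As an illustration, the rule reduces to an arithmetic check: total the fractions of the allowable ranges that a proposed change consumes. The following sketch is ours (the function and its inputs are illustrative, not from any solver's API):

```python
def passes_100_percent_rule(changes, allowable):
    """changes[j]: proposed change in c_j (or b_i); allowable[j]: the
    allowable change in that same direction from ordinary ranging.
    The rule holds if the fractions of the ranges used sum to <= 1."""
    used = 0.0
    for delta, limit in zip(changes, allowable):
        if delta != 0.0:
            used += abs(delta) / abs(limit)
    return used <= 1.0
```

For example, using half of one range and a quarter of another gives a total of 75%, so the combined change is admissible.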

ABS algorithm

This is a family of algorithms to solve a linear system of LaTeX: m equations, LaTeX: Ax = b, with LaTeX: n variables such that LaTeX:  n \ge m . Its distinguishing property is that the k-th iterate, LaTeX: x^k, is a solution to the first LaTeX: k equations. While algorithms in this class date back to 1934, the family was recognized and formally developed by J. Abaffy, C.G. Broyden and E. Spedicato, so the algorithm family name bears their initials. The ABS algorithm is given by the following.

Notation: LaTeX: A_{(i,:)} is the i-th row of LaTeX: A.

  1. Initialize: choose LaTeX: x^1 \in \mathbb{R}^n and a nonsingular LaTeX: H^1 \in \mathbb{R}^{n \times n}; set LaTeX: i = 1.
  2. Set search direction LaTeX: d^i = H^i z for any LaTeX: z that does not satisfy LaTeX: z^T H^i A_{(i,:)}^T = 0. Compute residuals: LaTeX: r = Ax^i - b.
  3. Iterate: LaTeX: x^{i+1} = x^i - s d^i, where LaTeX: s is the step size: LaTeX: s = r_i / A_{(i,:)} d^i.
  4. Test for convergence (update residuals, if used in test): Is the solution within tolerance? If so, stop; else, continue.
  5. Update: LaTeX: H^{i+1} = H^i - D^i, where LaTeX: D^i is a rank 1 update of the form LaTeX: H^i A_{(i,:)}^T w^T H^i for LaTeX: w such that LaTeX: w^T H^i A_{(i,:)}^T = 1.
  6. Increment: let LaTeX: i = i + 1 and go to step 2.

The ABS algorithm extends to nonlinear equations and to linear diophantine equations.
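The steps above can be sketched in Python; this is a hedged illustration of one member of the family, taking LaTeX: H^1 = I and Huang's choice LaTeX: z = w = A_{(i,:)}^T (the function name is ours):

```python
import numpy as np

def abs_solve(A, b):
    """One member of the ABS family: H^1 = I and z = w = A[i] (Huang's
    choice), so the k-th iterate solves the first k equations."""
    m, n = A.shape
    x = np.zeros(n)
    H = np.eye(n)
    for i in range(m):
        a = A[i]                      # i-th row of A
        Ha = H @ a                    # H^i A_(i,:)^T
        denom = a @ Ha                # w^T H^i A_(i,:)^T before normalizing
        if abs(denom) < 1e-12:        # row depends on earlier rows
            continue
        d = Ha                        # search direction (H stays symmetric)
        r = a @ x - b[i]              # residual of equation i
        x = x - (r / (a @ d)) * d     # step so equation i becomes exact
        H = H - np.outer(Ha, Ha) / denom   # rank-1 update of H
    return x
```

With these choices, the LaTeX: i-th step makes equation LaTeX: i hold exactly while preserving the earlier ones.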


AIMMS

AIMMS is Advanced Integrated Multidimensional Modeling Software.


AMPL

AMPL is A Mathematical Programming Language.


ANALYZE

A computer-assisted analysis system to analyze mathematical programs, including an artificially intelligent level of support.

Abstract program

This is a mathematical program defined on an abstract vector space (generally a Banach space). It unifies the ordinary mathematical program with control theory.

Active constraint

An inequality constraint that holds with equality (at a point). Same as tight, but not to be confused with binding. The set of all active constraints generally includes equality constraints as well.

Active set method

A technique that partitions inequality constraints into active constraints (or sufficiently close to be deemed active) and inactive constraints and then ignores the inactive constraints for the iteration. The active set for this iteration is sometimes called the working set. The new point is selected by moving on the surface defined by the working set (called the working surface).

Activity analysis

An approach to micro-economics for which a system is composed of elementary functions, called activities, which influenced the early growth of linear programming. An insightful way to view an activity in a standard form linear program is that negative coefficients represent input requirements and positive coefficients represent outputs:

\begin{bmatrix} c_j \\ \vdots \\ - \\ \vdots \\ + \end{bmatrix}
\begin{array}{ll}
\leftarrow & \mbox{cost or revenue} \\ \\
\leftarrow & \mbox{input to activity } j \\ \\
\leftarrow & \mbox{output from activity } j
\end{array}

In general, the reduced cost represents the net worth of the activity for the prices LaTeX: p,

d_j = c_j - p^T A_{(:,j)} = \mbox{ input cost } - \mbox{ output revenue}.

This leads to an economic interpretation of not only linear programming but also of the simplex method: agents (activity owners) respond instantaneously to changes in prices, and the activity with the greatest net revenue wins a bid to become active (basic), thus changing the prices for the next time (iteration).

In this context, activities can be regarded as transformations, from inputs to outputs. Three prevalent transformations are: form, place and time.

Here are examples (form, place, and time, respectively):

\begin{bmatrix} c \\ \vdots \\ -0.4 \\ -0.6 \\ \vdots \\ 1 \end{bmatrix}
\begin{array}{l} \\ \\ \mbox{apples} \\ \mbox{cranberries} \\ \\ \mbox{cranapple} \end{array}
\qquad
\begin{bmatrix} c \\ \vdots \\ -1 \\ \vdots \\ 0.6 \\ 0.4 \end{bmatrix}
\begin{array}{l} \\ \\ \mbox{Chicago} \\ \\ \mbox{Denver} \\ \mbox{Seattle} \end{array}
\qquad
\begin{bmatrix} c \\ \vdots \\ -1 \\ \vdots \\ 1 \end{bmatrix}
\begin{array}{l} \\ \\ t \\ \\ t+1 \end{array}


Acyclic

Means without cycles. Arises in graphs and networks, as well as in the context of algorithms (e.g., not cycling back to some previous solution).

Adjacency matrix

The LaTeX: n \times n binary matrix, say LaTeX: A, that represents node adjacency in a simple graph: LaTeX: A_{i,j} = 1 if node LaTeX: i is adjacent to node LaTeX: j. In the undirected case, LaTeX: A is symmetric. In the directed case, the adjacency means there is an arc from LaTeX: i to LaTeX: j. In the case of a multigraph, LaTeX: A_{i,j} is the number of edges whose endpoints are nodes LaTeX: i and LaTeX: j.

Adjacent basis

Two bases are adjacent if they differ by exactly one column (so one is reachable from the other by a single pivot).


Adjoint

The classical adjoint of a matrix LaTeX: A is the transpose of the matrix of cofactors

\mbox{adj }A = \left[ A^{i,j} \right]_{i,j}^T,

where LaTeX: A^{i,j} is the cofactor: LaTeX: (-1)^{i+j} times the determinant of the submatrix of LaTeX: A formed by removing the i-th row and the j-th column. The Hermitian adjoint (conjugate transpose or Hermitian transpose), LaTeX: A^*, is the transpose of the conjugate. We have that LaTeX:  A^* = A^T if LaTeX: A is real-valued.


Admissible

Generically, something that satisfies required conditions. One example is the admissibility of a direction of change of some parameter, such as a right-hand side of a linear program (LP) from LaTeX: b to LaTeX: b + t h, where LaTeX: t is some scalar and LaTeX: h is the direction of change. A positive value of LaTeX: t, no matter how small, might cause the LP to become infeasible; in that case it is common to call the direction vector inadmissible. For a perturbation to be admissible in this context, it is required that the mathematical program have a solution in some neighborhood of that direction. For the example, LaTeX: h is admissible if the (primal) LP is feasible for all LaTeX: t \in [0, t^*) for some LaTeX: t^* > 0.

Admissibility need not be for directions of change in the context of sensitivity analysis. The term is used elsewhere and defined for the context in which it is used. For example, a primal ascent algorithm could define a direction, say LaTeX: d, to be admissible if LaTeX: x + td is feasible and LaTeX: f(x+td) \ge f(x) for all LaTeX: t \in [0, t^*) for some LaTeX: t^* > 0.

Advanced basis

Initiating the simplex method with a known feasible basis instead of with logical variables as in Phase I. The advanced basis could be optimal for some parameter values, but those values have changed. It is usually better to start with an advanced basis because fewer iterations are generally required to reach the (new) optimal basis.

Affine combination

The affine combination of LaTeX: \{x^1, x^2, \ldots, x^K\} is a linear combination LaTeX: \textstyle \sum_k a_k x^k such that LaTeX: \textstyle \sum_k a_k = 1.

Affine function

Let LaTeX: f : X \rightarrow \mathbb{R}^n. Then, LaTeX: f is affine if LaTeX: X is a convex set and

f(\theta x + (1-\theta) y) = \theta f(x) + (1-\theta)f(y),

for all LaTeX: x, y \in X and LaTeX: \theta \in [0, 1].

Equivalently, LaTeX: f is affine if it is both convex and concave. Moreover, if LaTeX: X = \mathbb{R}^n, LaTeX: f is the translation of a linear function: LaTeX: a^T x + b.

Affine hull

The affine hull of a set LaTeX: S is the intersection of all affine sets containing LaTeX: S. Equivalently, the set of all affine combinations of points in the set.

Affine independence

The set LaTeX:  \{x^0, x^1, \ldots, x^k \} is affinely independent if its affine hull is k-dimensional. Equivalently, LaTeX: \{ x^1-x^0, \ldots, x^k-x^0 \} is linearly independent. Any two distinct vectors are affinely independent (k=1). Three vectors must form a triangle; otherwise, they are collinear and hence affinely dependent.
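The rank test in the second characterization is easy to carry out numerically; a small sketch (the function name is ours):

```python
import numpy as np

def affinely_independent(points):
    """x^0, ..., x^k are affinely independent iff the differences
    x^1 - x^0, ..., x^k - x^0 are linearly independent."""
    pts = np.asarray(points, dtype=float)
    diffs = pts[1:] - pts[0]
    return np.linalg.matrix_rank(diffs) == len(pts) - 1
```

Three points forming a triangle pass the test; three collinear points fail it.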

Affine scaling

A variant of Karmarkar's algorithm for the linear program in standard form: LaTeX:  \min \{c^T x : x \ge 0, \; Ax = b \}, where LaTeX: A has full row rank.

Here are the basic steps (of the primal method).

  1. Given LaTeX: x > 0, let LaTeX:  X = \mbox{diag}(x).
  2. Estimate dual: LaTeX:  y = [AX^2A^T]^{-1} AX^2c and LaTeX: d = c - A^T y.
  3. Move: LaTeX: x = x - aX^2d / \|Xd\|_{\infty}, where LaTeX: a is a parameter in LaTeX: (0, 1).

Multiplication of LaTeX: A and LaTeX: c by LaTeX: X is a scaling operation, using current levels as scale values -i.e. LaTeX: AX multiplies column LaTeX: j of LaTeX: A by LaTeX: x_j, and LaTeX: Xc multiplies LaTeX: c_j by LaTeX: x_j. Note that the move direction LaTeX: X^2 d is in the null space of LaTeX: A -i.e. LaTeX: AX^2d = 0, so it is a feasible direction. This affine scaling direction is used in its own right in interior methods. It minimizes the duality gap subject to LaTeX: Ax = b and LaTeX: x in an ellipsoid (replacing LaTeX: x \ge 0).
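The three steps can be sketched with numpy as one iteration (a hedged illustration; the function name, the fixed step parameter, and the absence of a stopping test are our simplifications):

```python
import numpy as np

def affine_scaling_step(A, c, x, a=0.5):
    """One primal affine-scaling move from an interior point x > 0
    with Ax = b, where a in (0, 1) is the step parameter."""
    X2 = np.diag(x ** 2)                           # X^2 with X = diag(x)
    y = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c)  # dual estimate
    d = c - A.T @ y                                # reduced cost
    Xd = x * d                                     # X d
    # A @ (x**2 * d) = 0, so the move keeps Ax = b
    return x - a * (x ** 2 * d) / np.max(np.abs(Xd))
```

Each step preserves LaTeX: Ax = b and strict positivity while decreasing the objective.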

Affine set

(or affine manifold, affine variety, linear variety, flat.)

A set LaTeX: S that contains the line through any two of its points --i.e. LaTeX: x, y \in S implies LaTeX: ax + (1-a)y \in S for all LaTeX: a \in \mathbb{R}. The dimension of an affine set is that of the (unique) subspace parallel to it. Dimensions of 0, 1 and 2 are called points, lines and planes, respectively.


Aggregation

Aggregation is combining units to reduce problem dimensions. One form of aggregation is to combine basic entities at the data level, such as regions, time periods, and materials. The dimensions of the mathematical program, which include numbers of variables and constraints, are generally reduced by aggregation at the entity level. Another form of aggregation is the use of surrogates, including the strong form: integer equivalent aggregation.


Algorithm

An iterative method that generates a sequence of the form LaTeX: x^{k+1} = S_k (A_k (x^k)), where LaTeX: x^0 is given (called the initial point). LaTeX: A_k is an algorithm map that yields a set of policies, given the current point, LaTeX: x^k, and LaTeX: S_k is a selection function (in case LaTeX: A_k yields more than one policy). Note that the algorithm map and selection function can depend on the iteration number LaTeX: k. A concrete example is of the form LaTeX: x^{k+1} = x^k + s_k d^k, where LaTeX: s_k is a scalar parameter, called the step size, and LaTeX: d^k is a vector, called the direction of change. The step size may require a selection rule if a line search has many optimal solutions; one rule is to select the least value (perhaps above some threshold). In practice, algorithms have a variety of tolerances to control their computer representation, progression, and termination.

A stochastic algorithm is one whose algorithm map is a random variable. Unlike the meaning in computer science, most people in mathematical programming mean an algorithm whereby an iterate is selected by some random mechanism. One example is simulated annealing. When LaTeX: x^k is a random variable, there is a meaning to convergence that differs from ordinary (deterministic) sequences.

An online algorithm is one that obtains a solution for an online problem. Typically, this is a heuristic that obtains a "good" solution because there is not enough time to guarantee optimality.

Alldifferent constraint

is a global constraint that is imposed on LaTeX: n variables, LaTeX: x_1, \dots, x_n, stating that every variable is assigned a different value.

In other words, it represents the set of disequalities LaTeX: x_1 \neq x_2, x_1 \neq x_3, \dots, x_{n-1} \neq x_n; however, algorithms for alldifferent typically propagate far more efficiently than the equivalent set of disequality constraints.

Almost complementary

A pair of vectors, LaTeX: w, z \in \mathbb{R}^n_+, such that each coordinate, except one, satisfies complementary slackness -i.e. LaTeX: w_i z_i = 0 for all but one LaTeX: i.

Alternative systems

Two systems of inequalities and/or equations such that exactly one of them has a solution: a solution exists to one system or to the other, but never to both. Several alternative systems are widely used in mathematical programming, especially in the study of duality.

  • Fourier (1826)
    LaTeX: Ax \le a, \; Bx < b (not vacuous)
    LaTeX: A^Ty + B^Tv = 0, \; y \ge 0, \; v \ge 0, \; a^Ty + b^T v < 0
  • Gordan (1873)
    LaTeX: Ax > 0
    LaTeX: A^Ty = 0, \; y \ge 0, y \neq 0
  • Farkas (1902)
    LaTeX: Ax = b, \; x \ge 0
    LaTeX: A^Ty \ge 0, \; b^Ty < 0

    A variant is

    LaTeX: Ax \le b
    LaTeX: A^Ty = 0, \; b^Ty < 0, \; y \ge 0
  • Fredholm (1903)
    LaTeX: Ax = b
    LaTeX: A^Ty = 0, \; b^Ty > 0
  • Stiemke (1915)
    LaTeX: Ax = 0, \; x > 0
    LaTeX: A^Ty \ge 0, \; A^T y \ne 0
  • Motzkin (1936)
    LaTeX: Ax \le a, \; Bx < b, \; Cx = c (LaTeX: B not vacuous)
    LaTeX: A^Ty + B^Tv + C^Tw = 0, \; y \ge 0, \; v \ge 0, \; v \neq 0, \; a^Ty + b^Tv +c^Tw \le 0
  • Ville (1938)
    LaTeX: Ax > 0, \; x \ge 0
    LaTeX: A^Ty \le 0, \; y \ge 0, \; y \neq 0
  • Tucker (1956)
    LaTeX: Ax \ge 0, \; Ax \neq 0, \; Bx \ge 0, \; Cx = 0
    LaTeX: A^Ty + B^Tv + C^Tw = 0, \; y > 0, \; v \ge 0
  • Gale (1960)
    LaTeX:  Ax \le b
    LaTeX:  A^Ty = 0, \; b^Ty = -1, \; y \ge 0
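As a numerical illustration of Farkas' alternative (not a proof): if LaTeX: Ax = b, x \ge 0 is feasible, then LaTeX: b^Ty = x^T(A^Ty) \ge 0 whenever LaTeX: A^Ty \ge 0, so random sampling should never exhibit a solution to the second system. The data and the function name below are ours:

```python
import numpy as np

def farkas_check(A, x, trials=1000, seed=0):
    """Illustration: with x >= 0 and b = Ax, system I of Farkas' lemma
    is feasible, so no sampled y should satisfy system II
    (A^T y >= 0 and b^T y < 0)."""
    rng = np.random.default_rng(seed)
    b = A @ x
    for _ in range(trials):
        y = rng.normal(size=A.shape[0])
        if np.all(A.T @ y >= 0) and b @ y < -1e-10:
            return False          # would contradict the lemma
    return True
```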

Integer Systems

  • Papadimitriou and Steiglitz (1982) Let LaTeX: A be an LaTeX: m \times n integer matrix, LaTeX: a^* = \max_{(i,j)} |A_{i,j}|, LaTeX: M = (ma^*)^{m+1}, and LaTeX: b an integer LaTeX: m-vector.
    LaTeX:  Ax = 0, x \geq 0, \; x \neq 0, \; x \text{ integer-valued}
    LaTeX:  A^Ty \geq 1, \; y \in \{ 0, \pm 1, \pm 2, \ldots, \pm M \}^m .

Convex Systems

For the following, LaTeX: f is convex on LaTeX: X.

  • Fan, Glicksberg and Hoffman (1957)
    LaTeX:  f(x) < 0
    LaTeX: y^T f(x) \geq 0 for all LaTeX: x \in X, \; y \geq 0, \; y \neq 0.
    (See NLP Myth 12 to avoid misconception.)

Analytic center

Given the non-empty, compact set LaTeX: \{x \in X : g(x) \ge 0 \}, where LaTeX: g is concave on LaTeX: X and the strict interior LaTeX: \{x \in X : g(x) > 0\} is non-empty, its analytic center is the (unique) solution to the maximum entropy problem:

\max \left\{ \sum_i \ln(g_i(x)) : x \in X, g(x) > 0 \right\}.

Note that the analytic center depends on how the set is defined -- i.e., the nature of LaTeX: g, rather than just the set, itself. For example, consider the analytic center of the box, LaTeX: [0,1]^n. One form is to have LaTeX: 2n functions as: LaTeX:  \{ x : x \ge 0,  1-x \ge 0\} . In this case, the analytic center is LaTeX: x^*_j = y^* for all LaTeX: j, where LaTeX: y^* is the solution to:

\max \left\{ \ln(y) + \ln(1 - y) : 0 < y < 1 \right\}.

Since LaTeX: y^* = 1/2, the analytical center is what we usually think of as the center of the box. However, the upper bounds could be defined by LaTeX: 1 - (x_j)^p \ge 0 for all LaTeX: j, where LaTeX: p > 1 (so LaTeX: 1 - (x_j)^p is concave). This changes the functional definition, but not the set -- it's still the unit box. The analytic center is skewed towards the corner because the defining mathematical program is:

\max \left\{ \ln(y) + \ln(1 - y^p) : 0 < y < 1 \right\}.

The solution is LaTeX: y^* = (1/(1+p))^{(1/p)}, so the analytic center approaches LaTeX: (1,1,...,1) as LaTeX: p \rightarrow \infty.
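The closed-form solution can be checked against a crude grid search (a sketch; the helper names are ours):

```python
import math

def analytic_center_coord(p):
    """Closed-form maximizer of ln(y) + ln(1 - y^p) on (0, 1)."""
    return (1.0 / (1.0 + p)) ** (1.0 / p)

def grid_maximizer(p, n=100000):
    """Crude grid search over (0, 1) for the same maximizer."""
    return max((i / n for i in range(1, n)),
               key=lambda y: math.log(y) + math.log(1.0 - y ** p))
```

For LaTeX: p = 1 both give 1/2, the usual center; for larger LaTeX: p the maximizer moves toward 1.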

Ant colony optimization

A heuristic search approach to solving a combinatorial program based on the behavior of ant colonies, particularly their ability to collectively determine shortest paths through the cumulative effect of pheromones.

Anticycling rule

A rule that prevents cycling, such as Bland's rule for linear programming.

Approximation algorithm

An algorithm that reaches a feasible solution to an NP-hard problem in polynomial time with a guaranteed bound on the quality of the solution. For minimization, with LaTeX: Z^* being a positive optimal value, the bound generally has the form:

Z \le rZ^* \mbox{ for some } r  > 1,
where LaTeX: Z is the value of the feasible solution.

Arborescent sets

Sets in a family such that any two of them are either disjoint or nested (one is a proper subset of the other).

Arc consistency

A consistency property in constraint programming used as a basis for inference. A variable LaTeX: x is arc-consistent with another variable LaTeX: y if for every value in the domain of LaTeX: x there exists a value in the domain of LaTeX: y such that the binary constraint LaTeX: C(x,y) is satisfied. A binary constraint LaTeX: C(x,y) is arc-consistent if both of its variables are arc-consistent with respect to each other.



Argmax

The optimality region for a maximization problem:

\arg\max \{ f(x) : x \in X \} = \{x^* \in X : f(x^*) \ge f(x) \mbox{ for all } x \in X \}.

It is the "argument of the maximum."


Argmin

The optimality region for a minimization problem:

\arg\min \{ f(x) : x \in X \} = \{x^* \in X : f(x^*) \le f(x) \mbox{ for all } x \in X \}.

It is the "argument of the minimum."

Artificial variable

A variable, say LaTeX: v, added to an equation, LaTeX: h(x) = 0. The resulting system, LaTeX: h(x) + v = 0, is feasible upon letting LaTeX: v = -h(x) for a chosen LaTeX: x. The objective function is modified to penalize nonzero values of LaTeX: v. Often, LaTeX: v \ge 0 is required, multiplying LaTeX: h by -1 if necessary. This originated in linear programming, where a Phase I objective is used to find a solution with LaTeX: v=0 (or ascertain that the original system has no feasible solution) by minimizing LaTeX: \textstyle \sum_i v_i (ignoring the original objective, LaTeX: c^T x).

Assembly line balancing problem

A combinatorial program. An assembly line consists of LaTeX: m work stations, connected by a conveyor belt. Each station performs a set of tasks that takes a cycle time, LaTeX: c. Each task must be assigned to exactly one station such that precedence constraints are satisfied. One model minimizes the total cycle time, given the number of stations. Another minimizes the number of stations, given the maximum allowable cycle time. There are variants of this simple form.

Assignment polytope

The assignment polytope is

\left\{ x \in \mathbb{R}^{n^2} : x \ge 0, \; \sum_i x_{i,j} = 1 \mbox{ for } j = 1, 2, \ldots, n, \;
\sum_j x_{i,j} = 1 \mbox{ for } i =  1, 2, \ldots, n \right\}.

The name originates from an interpretation of its extreme points in the assignment problem. If LaTeX: x_{i,j} = 1, assign person LaTeX: i to task LaTeX: j. Each extreme point has the property that each element of LaTeX: x is 0 or 1. The sums describing the polytope require that each row (LaTeX: i) and each column (LaTeX: j) has exactly one 1 in it. This means every person is assigned to some task, and every task is assigned to be done by one person. The polytope is also the set of doubly stochastic matrices, whose extreme points are the permutation matrices.

Assignment problem

Choose an assignment of people to jobs so as to minimize total cost. The ordinary model is that of a matching problem. Although the usual assignment problem is solvable in polynomial time (as a linear program), important extensions are the following NP-complete problems:

  • Quadratic assignment (QAP). The objective is a quadratic form, which can have just cross terms, so it need not be convex (in fact, it can be quasiconcave, in which case a solution occurs at an extreme point). The QAP includes the traveling salesman problem as a special case (this is what is used with a neural network model of the TSP).
  • Multi-dimensional assignment. More than two indices; the variables are of the form LaTeX: x_{i,j,k,\ldots}. The constraints sum over all but one index -- for example, the 3-index assignment problem is

\begin{array}{llll}
\min & \sum_{i,j,k} c_{i,j,k} x_{i,j,k} & & \\
\mbox{s.t.} & \sum_{j,k} x_{i,j,k} & = & 1 \mbox{ for all } i, \\
& \sum_{i,k} x_{i,j,k} & = & 1 \mbox{ for all } j, \\
& \sum_{i,j} x_{i,j,k} & = & 1 \mbox{ for all } k.
\end{array}

Asymptotic LP

A linear program in which the coefficients are functions of a single parameter, usually denoting time (some authors require the functions to be rational -i.e. of the form LaTeX: p(t)/q(t), where LaTeX: p and LaTeX: q are polynomials). The problem is to find a steady state solution -i.e., one that is optimal (or nearly optimal) for all sufficiently large values of the time parameter.

Asymptotic stability

The system LaTeX: x^{k+1} = A x^k is asymptotically stable in the large if LaTeX: \textstyle \lim_{k \rightarrow \infty} x^k = 0 for any initial LaTeX: x^0. Originally developed for differential (and difference) equations, this applies to a sequence generated by an algorithm.

Atleast constraint

is a global constraint that states the minimum number of occurrences of a particular value among an array/set of variables. Typically it is written as atleast(LaTeX: n, LaTeX: \langle x_1, \dots, x_m \rangle, LaTeX: v), which is true if value LaTeX: v is assigned at least LaTeX: n times among the variables LaTeX: \langle x_1, \dots, x_m \rangle.

See also atmost constraint.

Atmost1 constraint

is a global constraint stating that LaTeX: n sets LaTeX: S_1, \dots, S_n of known cardinality should intersect pairwise in at most one element. If LaTeX: (s_1, \dots, s_n) \in \mbox{atmost1}(S_1, \dots, S_n, c_1, \dots, c_n), then LaTeX: s_1 \subseteq S_1, \dots, s_n \subseteq S_n, LaTeX: |s_1|=c_1, \dots, |s_n|=c_n and LaTeX: |s_i \cap s_j| \leq 1 for all LaTeX: 1 \leq i < j \leq n.

See also atmost constraint.

Atmost constraint

is a global constraint that states the maximum number of occurrences of a particular value among an array/set of variables. Typically it is written as atmost(LaTeX: n, LaTeX: \langle x_1, \dots, x_m \rangle, LaTeX: v), which is true if value LaTeX: v is assigned at most LaTeX: n times among the variables LaTeX: \langle x_1, \dots, x_m \rangle.

See also the atmost1 constraint and the atleast constraint.

Auction algorithm

See bidding algorithm.

Augmented Lagrangian

The Lagrangian augmented by a term that retains the stationary properties of a solution to the original mathematical program but alters the Hessian in the subspace defined by the active set of constraints. The added term is sometimes called a penalty term, as it decreases the value of the augmented Lagrangian for LaTeX: x off the surface defined by the active constraints. For example, the penalty term could be proportional to the squared norm of the active constraints:

L_a (x, y) = L(x, y) + p G(x)^T G(x),

where LaTeX: p is a parameter (negative for maximization), LaTeX: G(x) is the vector of active constraints (at LaTeX: x), and LaTeX: L is the usual Lagrangian. In this case, suppose LaTeX: x^* is a Kuhn-Tucker point with multiplier LaTeX: y^*. Then, LaTeX: L_a(x^*, y^*) = L(x^*, y^*) (since LaTeX: G(x^*) = 0). Further, the gradient of the penalty term with respect to LaTeX: x is LaTeX: 0 at LaTeX: x^*, and the Hessian of the penalty term is simply LaTeX: p A(x^*)^T A(x^*), where LaTeX: A(x^*) is the matrix whose rows are the gradients of the active constraints. Since LaTeX: A(x^*)^TA(x^*) is positive semi-definite, the penalty term has the effect of increasing the eigenvalues (decreasing them if maximizing).

Augmenting path

In network flows, an augmenting path is a path from a source to a sink such that every forward arc has flow less than its capacity and every backward arc has positive flow.

Automatic differentiation

A process for evaluating derivatives of a function that depends only on an algorithmic specification of the function, such as a computer program. (The term "automatic" stems from having the derivatives produced when executing the program to evaluate the function.)

There are two modes:

  1. Forward - direct parse and evaluation, introducing intermediate variables in the usual way, as rules of differentiation are applied.
  2. Reverse - evaluates entries in the code list first, and lastly differentiates the intermediate and independent variables in reverse order.

While the forward mode is straightforward and easier to implement, the reverse mode is computationally superior for scalar functions, as its complexity is independent of the number of independent variables. More generally, if LaTeX:  f : \mathbb{R}^n \rightarrow \mathbb{R}^m, the forward mode has complexity of O(n), and the reverse mode has complexity of O(m).
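The forward mode can be sketched with dual numbers: each value carries its derivative, and the rules of differentiation are applied as the function is evaluated (a minimal illustration supporting only + and *; the class and function names are ours):

```python
class Dual:
    """Dual number: a value together with its derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def _wrap(self, other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = self._wrap(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = self._wrap(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f'(x) by seeding the derivative of x with 1."""
    return f(Dual(x, 1.0)).dot
```

For LaTeX: f(x) = x^2 + 3x, this yields LaTeX: f'(2) = 7 without any symbolic manipulation.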

BFGS method

This is a method to solve an unconstrained nonlinear program that proceeds as follows.

  1. Start with any symmetric, negative definite matrix, say LaTeX: B^0, and any initial point LaTeX: x^0. Set LaTeX: k = 0.
  2. Direction: Solve LaTeX:  B^k d^k = -\nabla f(x^k).
  3. Step size: LaTeX: s^k \in \arg\max \{ f(x^k + td^k) : t \ge 0 \}.
  4. Change in position: LaTeX:  p^k = s^k d^k.
  5. New point: LaTeX:  x^{k+1} = x^k + p^k.
  6. Change in gradient: LaTeX: q^k = \nabla f(x^{k+1}) - \nabla f(x^k).
  7. Update LaTeX: B^k to form LaTeX: B^{k+1} with the BFGS update.
  8. Let LaTeX: k = k+1 to complete the iteration.

Compare this with the DFP method, and note a key difference: DFP estimates the inverse Hessian, whereas BFGS estimates the Hessian (and solves the linear system). There is a correspondence between the two, and empirical evidence suggests that the BFGS method has less of a problem with error.

BFGS update

A way to update an approximation of the Hessian, used in unconstrained optimization. The update is

B' = B + \frac{qq^T}{q^T p} - \frac{Bp (Bp)^T}{p^T B p},

where LaTeX: B' is the update from LaTeX: B and LaTeX: p and LaTeX: q are defined in the BFGS method for the LaTeX: k-th iteration. Taking the inverse to compare with the DFP update,

H' = H + \left( 1 + \frac{q^T H q}{q^T p} \right) \frac{pp^T}{q^T p} - 
\frac{1}{p^T q} \left( pq^T H + H qp^T \right),

where LaTeX: H' is the update from LaTeX: H.
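As a sketch, the direct update can be checked against the secant condition LaTeX: B'p = q, which any quasi-Newton update must satisfy (the function name is ours):

```python
import numpy as np

def bfgs_update(B, p, q):
    """Direct BFGS update of a Hessian approximation B from the step
    p = x' - x and gradient change q = grad f(x') - grad f(x)."""
    Bp = B @ p
    return (B + np.outer(q, q) / (q @ p)
              - np.outer(Bp, Bp) / (p @ Bp))
```

Multiplying the returned matrix by LaTeX: p recovers LaTeX: q exactly, as the algebra requires.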


BFS

See basic feasible solution.


Backbone

Originally, this was a set of propositions that must be true in every solution of a satisfiability problem. If there is no solution, one seeks a set of truth values that maximizes the number of clauses that are true. Then, the backbone is the set of propositions that must be true in every optimal solution. This has been extended to various combinatorial programs, for which the backbone is

\{j : x_j = 1 \mbox{ for all } x \in X^* \},

where LaTeX: x is a vector of binary variables in the IP formulation and LaTeX: X^* is usually the optimality region, although it could allow some ball around the optimal objective value. Some results suggest that the difficulty of an NP-hard problem can be further analyzed by the size of its backbone.


Backjumping

A backtracking algorithm that, in contrast to classic backtracking algorithms, records partial justifications for the inferences made during the tree search. These justifications allow the search to soundly jump back several levels in the search tree when an inconsistency is found.


Backtracking

An algorithm applied in tree search that builds up candidate solutions by making one decision at a time (i.e., branching; see branch and bound) and checks for feasibility at each decision. In case an infeasible state is reached, the algorithm "backtracks", returning to a previous decision and changing it.

In constraint programming a narrower definition is often used: backtracking is a recursive algorithm that traverses the search tree depth-first, such that at each step a variable is assigned a value (all possible values are assigned in turn) and the partial assignment is then checked for consistency, i.e., the algorithm checks whether all constraints whose variables are fully assigned are satisfied. If there is an inconsistency, the algorithm prunes the sub-tree and tracks back one step.

Backward Transformation

Backward Transformation (abbr. BTRAN). This applies to the factored system, LaTeX: u E_1 E_2 \ldots E_n = c, where each LaTeX: E_i is an elementary matrix. The recursion is:

LaTeX: u_n E_n = c
LaTeX: u_{n-1} E_{n-1} = u_n
LaTeX: \vdots
LaTeX: u_1 E_1 = u_2,
for which LaTeX: u_1 is a solution to the original system.

Backward substitution

The recursion to solve a nonsingular upper triangular system, LaTeX: Ux = b. It starts with LaTeX: x_n = b_n / U_{n,n}, then

x_j = \frac{1}{U_{j, j}} \left( b_j - \sum_{i > j} U_{j,i} x_i \right)  \mbox{ for } j = n-1, \ldots ,1.
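The recursion translates directly into code (a sketch with illustrative names):

```python
import numpy as np

def backward_substitution(U, b):
    """Solve Ux = b for nonsingular upper triangular U by the recursion
    x_j = (b_j - sum_{i>j} U[j,i] x_i) / U[j,j], from the last row up."""
    n = len(b)
    x = np.zeros(n)
    for j in range(n - 1, -1, -1):
        x[j] = (b[j] - U[j, j + 1:] @ x[j + 1:]) / U[j, j]
    return x
```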

Backward triangularization

An algorithm to rearrange a matrix by recursively assigning a singleton column to the rear (with its only row as its pivot). The recursion applies to the submatrix defined by omitting the assigned column and row (and reducing remaining column counts accordingly). This results in a lower-triangular rearrangement if and only if such a rearrangement exists. Otherwise, the algorithm reaches a submatrix with all columns having more than one element, of which one is selected to be a spike that is assigned to the rear of the matrix. The process then continues on the submatrix.

Example 1: Let the nonzeros be denoted by *:

A = \begin{bmatrix}
* &   &   &   \\
* & * & * & * \\
* &   &   & * \\
* &   & * & *
\end{bmatrix}

The column counts are LaTeX: (4,1,2,3). We thus put column LaTeX: 2 at the rear, assigned to pivot on row LaTeX: 2, and consider the remaining LaTeX: 3\times 3 submatrix:

A^1 = \begin{bmatrix}
* &   &   & \cdot &   \\
* &   & * & \cdot &   \\
* & * & * & \cdot &   \\
\cdot & \cdot & \cdot & \cdot & \cdot \\
* & * & * & \cdot & *
\end{bmatrix}

The remaining column counts (for LaTeX: A^1) are LaTeX: (3,1,2), so column LaTeX: 2 is placed at the (new) rear, pivoting on row LaTeX: 3:

A^2 = \left[\begin{array}{cc|cc}
* &   &   &   \\
* & * &   &   \\ \hline
* & * & * &   \\
* & * & * & *
\end{array}\right]

The remaining column counts are LaTeX: (2,1), so we reach the final rearrangement, which is triangular:

A^3 = \left[\begin{array}{cccc}
* &   &   &   \\
* & * &   &   \\
* & * & * &   \\
* & * & * & *
\end{array}\right]

Example 2: The column counts are 4, 2, 2, and 3, respectively:

A = \left[\begin{array}{cccc}
* &   &   &   \\
* & * & * & * \\
* & * &   & * \\
* &   & * & *
\end{array}\right]

All column counts are greater than one, so we select a column to be a spike, and we assign it to the rear with a pivot row (some algorithms delay the pivot row assignment). For this example, we choose a column with minimum count and pick the pivot row with maximum count.

A^1 = \left[\begin{array}{ccc|c}
* &   &   &   \\
* & * &   &   \\
* &   & * & * \\ \hline
* & * & * & *
\end{array}\right]
\qquad
A^2 = \left[\begin{array}{cc|cc}
* &   &   &   \\
* & * &   & * \\ \hline
* &   & * &   \\
* & * & * & *
\end{array}\right] = A^3.

The final rearrangement (LaTeX: A^3) has one spike.

Balance equation

A constraint of the form LaTeX: I(x) - O(x) = 0, where LaTeX: I(x) is the input and LaTeX: O(x) is the output, each as a function of the decision variables, LaTeX: x. Typically this is linear, conserving flows, measured in units of volume or mass. In an ordinary network in which there are no gains or losses, this appears as:

I(x) - O(x) = \sum_{i} x_{i,j} - \sum_{k} x_{j,k} = 0.
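
To make the balance equation concrete, here is a minimal Python sketch (an illustration, not part of the original entry) that checks LaTeX: I(x) - O(x) = 0 at a transshipment node of a small network with no gains or losses:

```python
# Arcs are (tail, head) -> flow. A pure transshipment node must have
# inflow equal to outflow, i.e., I(x) - O(x) = 0.

def node_imbalance(flows, node):
    """Return I(x) - O(x) at `node`: total inflow minus total outflow."""
    inflow = sum(f for (i, j), f in flows.items() if j == node)
    outflow = sum(f for (i, j), f in flows.items() if i == node)
    return inflow - outflow

# Flows on a path 1 -> 2 -> 3 plus a direct arc 1 -> 3.
flows = {(1, 2): 4, (2, 3): 4, (1, 3): 6}

# Node 2 is a pure transshipment node, so its imbalance must be 0.
print(node_imbalance(flows, 2))  # 0
```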

Barrier function

LaTeX: P is a barrier function on the strict interior, say LaTeX: I, of the set LaTeX: S if LaTeX: P is continuous and LaTeX: P(x) \rightarrow \infty as LaTeX: x approaches any point in LaTeX: S\backslash I. Barrier functions arise in penalty function methods for certain problem classes, including convex programs of the form:

\min \{ f(x) : g(x) \ge 0 \},

where LaTeX: I = \{ x : g(x) > 0 \}. Two popular choices are

P(x) = \sum_i 1 / g_i(x),

and

P(x) = -\sum_i \ln (g_i(x)).
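
As a small illustration (an assumption for this sketch: the single constraint is LaTeX: g(x) = x \ge 0, so the strict interior is LaTeX: x > 0), both popular barriers blow up as the boundary is approached:

```python
import math

def inverse_barrier(g_vals):
    """P(x) = sum_i 1/g_i(x), evaluated on the constraint values g_i(x) > 0."""
    return sum(1.0 / g for g in g_vals)

def log_barrier(g_vals):
    """P(x) = -sum_i ln(g_i(x)), evaluated on the constraint values g_i(x) > 0."""
    return -sum(math.log(g) for g in g_vals)

# With g(x) = x, approach the boundary point 0 from inside the interior x > 0:
for x in (1.0, 0.1, 0.01, 0.001):
    print(x, inverse_barrier([x]), log_barrier([x]))
# Both barrier values grow without bound as x -> 0+.
```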


Barycenter

This is the center of mass: the point at which a mass (body) can be considered to be concentrated without altering the effect of the earth's attraction upon it. It is the average position of particles located at positions LaTeX: x^i, weighted by their masses LaTeX: m_i,

\frac{\sum_i m_i x^i}{\sum_i m_i}.

In mathematical programming, this is usually used to mean the simple average of the extreme points of a polytope,

\frac{1}{m} \sum_i x^i,

where LaTeX:  \{x^1, \ldots, x^m\} is the set of extreme points. This could be a weighted average,

\sum_i w_i x^i,

where LaTeX: w \in S_m, the simplex (LaTeX: w \ge 0 and LaTeX: \textstyle\sum_i w_i = 1).


Basis

Associated with a submatrix of LaTeX: A, say LaTeX: B, whose columns comprise a basis for LaTeX: \mathbb{R}^m (i.e., LaTeX: B consists of LaTeX: m linearly independent columns of LaTeX: A).

Here are some related terms that arise in linear programming.

  • Adjacent basis. One that differs in exactly one column from a given basis.
  • Basic column. A column of the basis matrix.
  • Basic variable. The variable, say LaTeX: x_j, associated with the LaTeX: j-th column of the basis matrix.
  • Basic level. The value of a basic variable.
  • Basic solution. The solution, LaTeX: x, obtained by setting nonbasic values to some bound value, like 0, resulting in a unique solution for the basic variables. That is, LaTeX: Ax=b is equivalent to LaTeX: Bu + Nv = b, where LaTeX: A=[B \; N] and LaTeX: x=[u^T \; v^T]^T. Upon fixing the value of LaTeX: v, the nonsingularity of LaTeX: B gives the basic solution with LaTeX: u = B^{-1}(b - Nv). In a standard linear program, LaTeX: v=0, and hence LaTeX: u = B^{-1} b.
  • Basic feasible solution. A basic solution that is feasible --i.e., the basic values satisfy their bounds. (In a standard LP, this means LaTeX: B^{-1}b \ge 0.)
  • Basis kernel. After performing forward triangularization, if the basis does not triangularize completely, backward triangularization is applied. The result is a (rearranged) blocking of the basis into three segments:
                    | \ <--- Forward triangle
                    |__\ ______
                    |   |      |
                    |   |      | <--- Kernel
                    |   |______|
                    |          |\
                    |          | \ <--- Backward triangle
    Each row and column in the kernel has at least 2 nonzeroes.

Basic solution

See terms associated with being basic.

Benders decomposition

This decomposition separates complicating variables. The (original) semi-linear model is:

\max \{ c^Tx + F(y) : x \in X, \; y \in Y, \; Ax + G(y) \le b \}.

Generalized Benders decomposition considers the following mathematical program:

\max \{ f(x, y) : g(x, y) \le 0, \; h(x, y) = 0, \; x \in X, \; y \in Y \}.

Suppose this is not a convex program, but that with each fixed LaTeX:  y \in Y, the remaining mathematical program (with LaTeX: x being the only variable) is convex. The idea is to consider

v(y) = \sup \{ f(x, y) : g(x, y) \le 0, \; h(x, y) = 0, \; x \in X \},

and solve LaTeX: \max \{v(y) : y \in Y\}, with evaluations of LaTeX: v requiring the solution to the convex program.

Biconcave function

A function whose negative is biconvex.

Biconvex function

A function LaTeX: f(x, y) on LaTeX: X \times Y that is convex on LaTeX: X for each LaTeX: y \in Y and convex on LaTeX: Y for each LaTeX: x \in X. An example is LaTeX: f(x,y) = xy on LaTeX:  \mathbb{R}^2, which is not convex in both LaTeX: x and LaTeX: y together.
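
A quick numeric check of that example (a sketch, not part of the original entry): LaTeX: f(x,y) = xy is linear, hence convex, in each variable alone, yet joint convexity fails at the midpoint of LaTeX: (1,-1) and LaTeX: (-1,1):

```python
# f(x, y) = x*y is convex in x for fixed y and in y for fixed x (it is linear
# in each), but joint convexity would require f(midpoint) <= average of f.

f = lambda x, y: x * y

p, q = (1.0, -1.0), (-1.0, 1.0)
mid = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)   # the midpoint (0, 0)

f_mid = f(*mid)                # f(0, 0) = 0.0
avg_f = (f(*p) + f(*q)) / 2    # (-1 + -1) / 2 = -1.0

# 0 > -1, so f is not convex in (x, y) jointly.
print(f_mid, avg_f)  # 0.0 -1.0
```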

Biconvex program

A nonlinear program with each function biconvex or biconcave, such that if LaTeX: x or LaTeX: y is fixed, the resulting mathematical program in LaTeX: y or LaTeX: x, resp., is a convex program. A special case is the bilinear program.

Bidding algorithm

Also called an auction algorithm. This proceeds by generating a sequence of partial solutions (e.g., a partial assignment), terminating when the solution is complete. There is a pricing vector and a bidding increment whereby the solution is revised and extended by a myopic optimization scheme (which depends on the particular problem).

Big-M method

As an alternative to using Phase I, a large number LaTeX: M is used as a linear penalty in the composite objective:

\min \{ c^T x + M \; e^T v: Ax + v = b, \; x, v \ge 0 \},

where LaTeX: v is an artificial variable and LaTeX: e = (1, 1, ..., 1)^T.

Bilevel program

A multilevel program with two levels (sets of variables).

Bilinear function

A special quadratic function, LaTeX: x^T Q y + c^Tx + b^Ty, that is linear in LaTeX: y for each LaTeX: x, and linear in LaTeX: x for each LaTeX: y.

Bilinear program

A nonlinear program whose objective and/or constraint functions are bilinear. An example is the pooling problem.

Bin packing problem

Partition a set of integers, LaTeX: \{c_1, \ldots, c_n\}, into bins such that the sum of all integers in each bin does not exceed the bin capacity, say LaTeX: b, so as to minimize the number of bins required. The problem is NP-hard.
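
Since the problem is NP-hard, heuristics are common. The following Python sketch of first-fit decreasing (an illustration, not part of the original entry) sorts items in decreasing order and packs each into the first bin with room:

```python
def first_fit_decreasing(items, capacity):
    """Pack `items` into bins of size `capacity`; return the list of bins used.
    A heuristic: it need not use the minimum possible number of bins."""
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= capacity:   # first existing bin with room
                b.append(item)
                break
        else:
            bins.append([item])             # no bin fits: open a new one
    return bins

bins = first_fit_decreasing([4, 8, 1, 4, 2, 1], capacity=10)
print(len(bins), bins)  # 2 [[8, 2], [4, 4, 1, 1]]
```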

Binary CSP

A constraint satisfaction problem is binary if all constraints are either unary or binary relations (wrt the decision variables). That is, the cardinality of the scope of the constraint is either 1 or 2. For example, LaTeX: x > 2 (unary) or LaTeX: x \neq y (binary).

Binary constraint

A constraint is binary if there are exactly two variables in its scope.

Binary relation

A subset of LaTeX: S \times T, say LaTeX: R. When LaTeX: T=S, we refer to LaTeX: R simply as a relation on the set LaTeX: S.

Here are 5 properties that distinguish relations on a set (LaTeX: x, LaTeX: y, LaTeX: z are in LaTeX: S):

  1. antisymmetric - LaTeX: (x, y) \in R and LaTeX: x \ne y imply LaTeX: (y, x) \not\in R.
  2. irreflexive - LaTeX: (x, x) \not\in R.
  3. reflexive - LaTeX: (x, x) \in R.
  4. symmetric - LaTeX: (x, y) \in R implies LaTeX: (y, x) \in R.
  5. transitive - LaTeX: (x, y) and LaTeX: (y, z) both in LaTeX: R imply LaTeX: (x, z) \in R.
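
These five properties can be checked directly on a finite relation; a Python sketch (illustrative only):

```python
# R is a set of ordered pairs over the finite set S.

def is_reflexive(S, R):   return all((x, x) in R for x in S)
def is_irreflexive(S, R): return all((x, x) not in R for x in S)
def is_symmetric(S, R):   return all((y, x) in R for (x, y) in R)
def is_antisymmetric(S, R):
    return all(not (x != y and (y, x) in R) for (x, y) in R)
def is_transitive(S, R):
    return all((x, z) in R
               for (x, y) in R for (w, z) in R if y == w)

S = {1, 2, 3}
R = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 3), (1, 3)}  # the relation <=
print(is_reflexive(S, R), is_antisymmetric(S, R), is_transitive(S, R))
# True True True -- reflexive, antisymmetric, and transitive: a partial order.
```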

Binary variable

A decision variable with two feasible values, usually 0 or 1.

Binding constraint

A constraint whose removal changes the optimality region. (Some authors require that the removal results in a strict improvement in the objective value.)


Bisection

Taking the midpoint of endpoints of an interval known to contain a solution. If the solution is in the interval LaTeX: (a, b), bisection chooses LaTeX: x = (a+b)/2 as the next point to evaluate. The result reduces the interval to LaTeX: (a, x) or LaTeX: (x, b), so the length is cut in half with each evaluation.
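
The interval-halving described above can be sketched as follows (assuming LaTeX: f is continuous with a sign change on LaTeX: (a, b)):

```python
def bisect(f, a, b, tol=1e-10):
    """Return an approximate root of f in (a, b), halving the interval each step."""
    assert f(a) * f(b) < 0, "need a sign change on (a, b)"
    while b - a > tol:
        x = (a + b) / 2
        if f(a) * f(x) <= 0:
            b = x            # the root lies in (a, x)
        else:
            a = x            # the root lies in (x, b)
    return (a + b) / 2

root = bisect(lambda x: x * x - 2, 0.0, 2.0)
print(root)  # ~1.41421356 (sqrt(2))
```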

Bland rule

This is for pivot selection in the simplex method to avoid cycling:

  • If more than one (nonbasic) column has a negative (for minimization) reduced cost, choose the one with lowest index.
  • If more than one (basic) column has the same determining value to leave the basis, select the one with the lowest index.

Beale's example of cycling shows that Bland's Rule must apply to all candidates for entering the basis, not just those with the most negative reduced cost; see the supplement on cycling. The fifth tableau is the first time Bland's Rule is needed to break the cycle: variable LaTeX: x_1 is chosen to enter the basis rather than LaTeX: x_5.

Blending problem

A blend is some combination of materials to make another material. Given raw materials, their blends make intermediate materials, called stock, and/or final materials, called products (there can be other blending activities that make products from stocks). The raw materials have purchase costs and the blending activities have costs of operation and maintenance. The products have either fixed demands or selling prices, and the problem is to find blends that minimize total net cost (or maximize profit). This arises in refinery operations, in the food industry, and in other process industries. The problem can sometimes be solved with linear programming, but there can be complicating relations that require nonlinear programming, such as the pooling problem.

Block pivot

Given a detached coefficient form, a pivot is performed on a nonsingular submatrix of nonbasic variables (rather than just one element). Suppose the detached coefficient form is partitioned into 2 row blocks and 5 column blocks (with the last column corresponding to the right-hand side):

A = \left[ \begin{array}{ccccc}
I & 0 & P & R & e \\
0 & I & Q & S & f
\end{array} \right].
(Note: each LaTeX: I is an identity matrix of appropriate dimension.) If LaTeX: P is nonsingular, the basis can change by entering all (nonbasic) variables associated with the columns of LaTeX: P for all basic variables associated with the rows of LaTeX: P.

Blocking polyhedron

The blocking polyhedron of a polyhedron LaTeX: P is LaTeX:  \{ y \in \mathbb{R}^n_+ : y^T x \ge 1 \mbox{ for all } x \in P\}.

Bolzano-Weierstrass theorem

A fundamental existence theorem guaranteeing a maximum and a minimum.

A continuous function on a non-empty compact set achieves a minimum and a maximum over the set.

A useful corollary to the Bolzano-Weierstrass theorem is:

Suppose LaTeX: f is continuous on its non-empty feasible region, LaTeX: F, and that LaTeX: F is a closed set. Then, LaTeX: f achieves a maximum on LaTeX: F if there exists LaTeX: x^0 such that LaTeX:  \{ x \in F : f(x) \ge f(x^0) \} is bounded.

In particular, if we have a quadratic objective of the form LaTeX: f(x) = x^T Q x + c^T x, it is sufficient for LaTeX: Q to be negative definite (the upper level sets are then bounded). Although often credited to Weierstrass alone, the theorem was proven by Bolzano in 1817 and was also known to Cauchy.

Bordered Hessian

The complete Hessian of the Lagrangian:

\left[ \begin{array}{cc}
H_x[L(x,u,v)] & G^T \\
G &  0
\end{array} \right],
G = \left[ \begin{array}{cc}
\nabla g(x) & 0 \\ 0 & \nabla h(x)
\end{array} \right].
Here we let LaTeX: H_x[L(x,u,v)] be the Hessian of the Lagrangian (L) with respect to LaTeX: x only. The southwest and northeast blocks are the constraint gradients because they give the cross derivative between LaTeX: x and the Lagrange multipliers LaTeX: (u,v).

Bottleneck TSP problem

The problem asks whether there is a traveling salesman problem tour whose value is less than some given value.

Bottleneck assignment problem

This is the assignment problem, where the "cost", LaTeX: c_{i, j}, is the rate at which person LaTeX: i can perform job LaTeX: j. The objective is to maximize the rate of production of the entire assembly line, which is the minimum rate over the person-job assignments.

Bottleneck transportation problem

This is the transportation problem with a traversal time instead of arc cost, and the objective is to minimize the time it takes to satisfy the demands:

\min \left\{ \max \{ t_{ij} : x_{ij} > 0 \} : x \ge 0, \sum_j x_{ij} \le s_i, \sum_i x_{ij} \ge d_j \right\},

where LaTeX: x_{ij} is the flow from source LaTeX: i to destination LaTeX: j, and LaTeX: t_{ij} is the time it takes to ship across arc LaTeX: (i, j) (independent of the quantity shipped).

Bounds consistency

A consistency property for variables over integer domains where consistency is only defined on the minimum and maximum values of the integer domain.

Box constraint

Simple bounds of the form LaTeX: L \le x \le U.

Braess paradox

The phenomenon that, in a traffic network in which travel times depend on link (arc) flows, the addition of a link (arc) can increase travel time for every user if all users minimize their own travel time. Hence, the paradox is that building a new road can make traffic congestion worse. See Myths and Counter Examples in Mathematical Programming for an example.

Branch and bound

Born in integer programming (IP), branch and bound (B&B) is more generally used in global optimization, including continuous functions. A prototype is as follows.

  1. Initialize set list with feasible region (LaTeX: X plus all constraints), say LaTeX: S, with associated bounds LaTeX: -\infty, LaTeX: \infty (finite bounds on the objective LaTeX: f can be computed, if computationally preferable, and the initial set could be a superset of the feasible region). Initialize LaTeX: f^* equal to the lower bound (LaTeX: -\infty, unless an initial feasible solution is known).
  2. Branch by selecting the LaTeX: k-th set from list, say LaTeX: S_k, and splitting it into a partition, say LaTeX: S_k = A \cup B, such that the interior of LaTeX: A does not intersect the interior of LaTeX: B. (One could partition LaTeX: S_k into more than 2 subsets.)
  3. Bound LaTeX: f in LaTeX: A and/or in LaTeX: B, say LaTeX: L(S) \le f(x) \le U(S) for all LaTeX: x \in S, for LaTeX: S = A and/or LaTeX: B. Define LaTeX: U(S) = -\infty if it is determined that LaTeX: S is empty (in which case LaTeX: S will be discarded in the next step).
  4. Test if LaTeX: U(S) \le f^* + t, for LaTeX: S = A and/or LaTeX: B. If not, put LaTeX: S on list. (If the test passes, LaTeX: S is discarded by not being put on the list. Note: LaTeX: f^* is the best solution value to date, and LaTeX: t is a tolerance that says we want a solution whose objective value is within LaTeX: t of the optimal value.) Further, if LaTeX: L(S) > f^* and the associated point, LaTeX: x(S), is feasible (where LaTeX: f(x(S)) = L(S)), replace LaTeX: f^* with LaTeX: L(S) and LaTeX: x^* with LaTeX: x(S). If the list is not empty, go to branch.

B&B terminates when the list becomes empty, and the best feasible solution obtained is LaTeX: x^* with value LaTeX: f^*, which is within LaTeX: t of optimality. (If LaTeX: f^* = -\infty, then there is no feasible solution.) Notes:

  1. A particular bounding rule for IP is to solve an LP relaxation, whose value is an upper bound on the integer-valued maximum. The lower bounds do not change unless an integer solution is obtained from the LP, in which case the best solution to date can change.
  2. Branching rules can range from breadth-first, i.e. expand all possible branches from a node in the search tree before going deeper into the tree, to depth-first, i.e. expand the deepest node first (and backtrack, as necessary). For example, suppose we branch on the value of LaTeX: x_j, which could be LaTeX: 0 or LaTeX: 1. The breadth-first rule would solve the LPs for each of the two cases. The depth-first rule would solve for LaTeX: x_j=0 and postpone the complementary case, LaTeX: x_j=1, to consider another variable dichotomy, going deeper into the search tree. To illustrate, the following are search progressions in LaTeX: \{0, 1\}^2.


             ____.                 ____.____          ____.____
            |                     |         |        |         |
          (x=0)                 (x=0)     (x=1)    (x=0)     (x=1)
                 1                     2                  3
             ____.____             ____.____          ____.____
            |         |           |         |        |         |
          (x=0)     (x=1)       (x=0)     (x=1)    (x=0)     (x=1)
           /\                    /\         /       /\         /\
       (y=0)(y=1)            (y=0)(y=1) (y=0)   (y=0)(y=1) (y=0)(y=1)
                 4                     5                  6


              ____.                ____.              ____.
             |                    |                  |
           (x=0)                (x=0)              (x=0)
                                 /                  /\
                             (y=0)              (y=0)(y=1)
                 1                     2                  3
              ____.____            ____.____          ____.____
             |         |          |         |        |         |
           (x=0)     (x=1)      (x=0)     (x=1)    (x=0)     (x=1)
            /\                   /\         /       /\         /\
        (y=0)(y=1)           (y=0)(y=1) (y=0)   (y=0)(y=1) (y=0)(y=1)
                 4                     5                  6
  3. A best-first search is to select the node (i.e., set on list) that seems most promising, such as one with the greatest upper bound. Typically, there are multiple criteria for branching, and these change depending on whether a feasible solution has been obtained.
  4. Backtracking need not go in exactly the reverse order (i.e., need not have a LIFO rule), but care must be taken when reordering the path.
  5. When the problem is not finite (not IP), branching rules need to consider the hyper-volume of each set in a partition to ensure convergence.
  6. The B&B methods we consider in OR are special cases of heuristic search in Artificial Intelligence (AI).
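
The prototype above can be made concrete. The following Python illustration (a sketch, not part of the original entry) applies depth-first B&B to a small 0-1 knapsack problem, bounding each node with the LP (fractional) relaxation and discarding nodes that cannot beat the incumbent:

```python
def knapsack_bb(values, weights, capacity):
    """Maximize total value of items packed within `capacity` via depth-first B&B."""
    n = len(values)
    # Sort items by value/weight ratio so the LP bound is easy to compute.
    order = sorted(range(n), key=lambda j: values[j] / weights[j], reverse=True)

    def upper_bound(k, cap):
        # LP relaxation over the remaining items order[k:]: take fractions greedily.
        ub, c = 0.0, cap
        for j in order[k:]:
            take = min(1.0, c / weights[j])
            ub += take * values[j]
            c -= take * weights[j]
            if c <= 0:
                break
        return ub

    best = [0.0]  # incumbent objective value f*

    def branch(k, cap, value):
        if k == n or cap == 0:
            best[0] = max(best[0], value)
            return
        if value + upper_bound(k, cap) <= best[0]:
            return                      # bound test: discard this node
        j = order[k]
        if weights[j] <= cap:           # branch x_j = 1 first (depth-first)
            branch(k + 1, cap - weights[j], value + values[j])
        branch(k + 1, cap, value)       # then the complementary case x_j = 0

    branch(0, capacity, 0.0)
    return best[0]

print(knapsack_bb([60, 100, 120], [10, 20, 30], 50))  # 220.0
```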

Branch and cut

A hybrid of Branch and bound and cutting plane methods.

Branch and price

A hybrid of branch and bound and column generation; the price operation is the same as column generation.

Branching heuristic

See branching rule.

Branching rule

A branching rule (or branching heuristic) specifies how a search tree is constructed during search. More specifically, it is a procedure or set of conditions for choosing the next variable to be branched on. It is part of the search strategy.

In constraint programming a branching rule is commonly referred to as a variable ordering heuristic.


Breakpoint

A point where some property of a function changes, such as its slope. For example, a piecewise-linear, univariate function with LaTeX: k breakpoints has the form:

f(x) = a_i x + b_i \mbox{ for } c_i \le x < c_{i+1}
where LaTeX: i=1, ...,k and LaTeX: c_{k+1} is defined to be infinity and LaTeX: c_1 is the least value of LaTeX: x. This function is continuous if it is equal at breakpoints, but the slope changes if LaTeX: a_i \neq a_{i+1} for LaTeX: i=1, ..., k-1.
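
Evaluating this piecewise-linear form can be sketched as follows (an illustration; the segment data `a`, `b`, `c` below are hypothetical):

```python
def piecewise_linear(x, a, b, c):
    """Evaluate f(x) = a_i*x + b_i on [c_i, c_{i+1}), with c_{k+1} = infinity.
    a, b: per-segment slopes and intercepts; c: breakpoints c_1 < ... < c_k."""
    if x < c[0]:
        raise ValueError("x is below the least value c_1")
    i = max(i for i in range(len(c)) if c[i] <= x)   # active segment
    return a[i] * x + b[i]

# f(x) = x on [0, 1) and f(x) = 2x - 1 on [1, inf): both pieces equal 1 at the
# breakpoint x = 1, so f is continuous there, but the slope changes from 1 to 2.
a, b, c = [1, 2], [0, -1], [0, 1]
print(piecewise_linear(0.5, a, b, c), piecewise_linear(2.0, a, b, c))  # 0.5 3.0
```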

Broyden-Fletcher-Goldfarb-Shanno method

See BFGS method

Broyden-Fletcher-Goldfarb-Shanno update

See BFGS update.

Broyden family

This is a family of algorithms to solve an unconstrained nonlinear program where the objective function is in LaTeX: C^2. The algorithms in this family differ by how they update the approximation of the Hessian inverse (assumed to exist). These are of the form:

H = aH^{\mbox{BFGS}} + (1-a)H^{\mbox{DFP}},

with the two matrices being the Broyden-Fletcher-Goldfarb-Shanno update (BFGS) and the Davidon-Fletcher-Powell update (DFP), respectively. If LaTeX: a is in LaTeX: [0, 1], this is called the Broyden convex family.

Bundle method

A class of nonsmooth-optimization methods for the numerical minimization of a convex function, LaTeX: \theta : \mathbb{R}^n \rightarrow \mathbb{R}. At the LaTeX: k-th iteration the technique uses the available information contained in a bundle LaTeX: (\theta(v^1), g_1, \ldots, \theta(v^k), g_k) that is obtained from an oracle; here each LaTeX: g_i is a subgradient of LaTeX: \theta at LaTeX: v^i.

Important applications are Lagrangian relaxation and column generation applied to a primal problem

\max \{ p(x) : x \in X, \; h(x) = 0 \}.

In this context, for a given LaTeX: v \in \mathbb{R}^n,

\theta(v) = \max \{ p(x) - v^T h(x) : x \in X \}

results from the maximization of the Lagrange function; each LaTeX: g_i is LaTeX: -h(x^i), with LaTeX: x^i maximizing the Lagrangian at LaTeX: v^i. Minimizing LaTeX: \theta is the dual problem.

If minimizing LaTeX: \theta(v), a basic method to construct the next iterate LaTeX: v^{k+1} uses the cutting-plane paradigm: one minimizes the piecewise-linear function

\theta_k(v) = \max \{ \theta (v^i) + g_i^T (v - v^i) : i = 1, ... ,k \},

which can be viewed as a model of LaTeX: \theta (in fact LaTeX: \theta_k \le \theta). Minimizing LaTeX: \theta_k is a linear program, which is called the restricted master program in the context of column generation. One of the deficiencies of this method is instability, i.e., the sequence LaTeX: v^k usually oscillates uncontrollably. Several possibilities exist to stabilize the process; one, adopted in the bundle approach, replaces the minimization of LaTeX: \theta_k by the quadratic program

\min_v \{ \theta_k(v) + (1/(2t)) |v - C|^2 \}.

Here the positive LaTeX: t is a stability parameter that is usually adjusted at each iteration, and the stability center LaTeX: C is some LaTeX: v^i (LaTeX: i \le k) taken from the bundle, typically one that gave the best LaTeX: \theta-value.


A collection of mathematical programming software solvers.


See constraint satisfaction problem.

Calculus of variations

See variational calculus.

Capacity expansion

This is an augmentation to any mathematical program that has constraints representing a capacity limit. Suppose LaTeX: g(x) \le b represents the limitation of using more than LaTeX: b units of capacity, where LaTeX: g(x) is the actual amount used for policy LaTeX: x. Then, this becomes a capacity expansion model by replacing the constraint with

g(x) - vC \le b \;\mbox{ and }\; 0 \le v \le 1.
Here LaTeX: C is the maximum amount of new capacity, with LaTeX: vC the portion brought online by choice of LaTeX: v. This is also reflected in the objective function with an added cost term, say LaTeX: F(v), such that LaTeX: F(0)=0 and LaTeX: F increasing on LaTeX: [0, 1]. If continuous use of the capacity is not possible (i.e., one builds a new plant or not), the model further requires LaTeX: v to be binary-valued (0 or 1), and LaTeX: F(1) is called the fixed charge.

Capital budgeting problem

In its elementary form there is a fixed amount of capital, say LaTeX: C, that can be allocated to any of LaTeX: n investments. Each investment has a minimum level, say LaTeX: L_j, and a maximum level, say LaTeX: U_j. The expected return on investment is a function, LaTeX: v_j(x_j), where LaTeX: x_j is the level of the LaTeX: j-th investment opportunity (LaTeX: L_j \le x_j \le U_j). Risk is measured by a standard deviation from the expected return, say LaTeX: s_j(x_j). The problem is to maximize total expected return, subject to a budget constraint: LaTeX: \sum_j x_j \le C, and a risk constraint: LaTeX: \sum_j (v_j(x_j) + a_j s_j(x_j)) \le b, where LaTeX: a_j and LaTeX: b are parameters. The returns on the investments could be correlated. Then, if LaTeX: Q is the variance-covariance matrix, the risk constraint is quadratic: LaTeX: v^Tx + x^T Q x \le b. (Also see the portfolio selection problem.)

Caratheodory conditions

For the classical Lagrange form, LaTeX: \min \{f(x): x \in \mathbb{R}^n, \; h(x) = 0 \}, where LaTeX: f and LaTeX: h are smooth, the following conditions are necessary for a feasible LaTeX: x to be optimal: there exists LaTeX: (y_0, y) \in \mathbb{R}^{m+1}\backslash\{0\}, called multipliers, such that

y_0 \nabla f(x) - y^T \nabla h(x) = 0.

This reduces to the Lagrange Multiplier Rule when LaTeX: y_0 is not zero (divide by LaTeX: y_0), which must be the case if LaTeX: \nabla h(x) has full row rank.

Caterer problem

A caterer has booked his services for the next LaTeX: T days. He requires LaTeX: r_t fresh napkins on the t-th day, LaTeX: t=1,\ldots,T. He sends his soiled napkins to the laundry, which has 3 speeds of service: LaTeX: s=1, 2, or LaTeX: 3 days. The faster the service, the higher the cost, LaTeX: c_s, of laundering a napkin. He can also purchase new napkins at a cost, LaTeX: c_0. With an initial stock of LaTeX: N napkins, the caterer wishes to minimize his total cost. (This can be formulated as a transportation problem.)

Cauchy-Schwarz inequality

This is the inequality

|x^T y| \le \|x\| \|y\|,

where LaTeX: \|\cdot\| is the Euclidean norm. (Note: for LaTeX: \|x\| = \|y\| = 1, LaTeX: x^T y is the cosine of the angle between the vectors LaTeX: x and LaTeX: y.)
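
A numeric spot-check of the inequality (illustrative only), including the equality case when the vectors are linearly dependent:

```python
import math, random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

# |x^T y| <= ||x|| ||y|| holds for random vectors...
random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12

# ...and holds with equality when y is a scalar multiple of x:
x = [1.0, 2.0, 3.0]
y = [2 * v for v in x]
print(abs(dot(x, y)) - norm(x) * norm(y))  # ~0.0 (up to rounding)
```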


Ceiling

The integer round-up of a value, LaTeX: x:

\mbox{Ceiling}(x) = \min \{ z : z \in \mathbb{Z}, \; z \ge x \}.

Examples: LaTeX: \mbox{Ceiling}(5)=5, LaTeX: \mbox{Ceiling}(5.001)=6, LaTeX: \mbox{Ceiling}(-1.2) = -1. Also, see floor.

Central path

A differentiable curve where each point is the analytic center of a polyhedron associated with a linear program. Specifically, suppose we have the primal-dual pair:

P: \; \max \left\{c^Tx : x \ge 0, \; Ax \le b \right\}.
\; D: \; \min \left\{ y^Tb :  y \ge 0, \; A^Ty \ge c \right\}.

Let LaTeX: s = b - Ax be the primal slack. Then, the primal central path is:

x^*(\mu) = \arg\max \left\{c^Tx + \mu \left(\sum_j \ln(x_j) + \sum_i \ln(s_i)\right) : 
x > 0, \; Ax + s = b, \; s > 0 \right\},
for LaTeX: \mu > 0, where LaTeX: \mu is the penalty parameter. Let LaTeX: d = A^Ty - c be the dual surplus. Then, the dual central path is:

y^*(\mu) = \arg\min \left\{ y^Tb - \mu \left(\sum_j \ln(d_j) + \sum_i \ln(y_i)\right): 
y > 0, \; A^Ty - d = c, \; d > 0 \right\},
for LaTeX: \mu > 0. A key fact is that if the LP optimality region is nonempty and bounded, then the central path converges to its analytic center as LaTeX: \mu \downarrow 0.

Certainty equivalent

See stochastic program.


Certificate

A set of sufficient conditions for some property to hold.

A certificate of optimality is a set of conditions that certify some feasible solution, LaTeX: x, is optimal. One example comes from duality: let LaTeX: y be feasible in any dual with objective value LaTeX: F(y). Then, LaTeX: F(y)=f(x) is a certificate of optimality. In particular, if we have a linear program:

\max \{ c^T x : Ax = b, \; x \ge 0 \},

a certificate that a feasible LaTeX: x is optimal is the set of dual conditions:

LaTeX: A^Ty \ge c \;\;\mbox{ and }\;\;  c^Tx = y^T b.

(Note the general use of duality does not require any convexity assumption, and the dual can be weak.)

A certificate of unboundedness is a collection of conditions that certify a mathematical program is unbounded. An example from duality is that a primal feasible solution is found and the dual is determined to be infeasible. In particular, if we have a linear program:

LaTeX: \max \{c^T x : Ax = b, \; x \ge 0 \},

a certificate that this is unbounded is the existence of a feasible LaTeX: x and the determination that LaTeX: A^Ty \ge c implies a contradiction.

A certificate of infeasibility is a set of conditions that certify a mathematical program is infeasible. One class comes from duality: a dual sequence is found whose objective diverges. In particular, if we have a linear program:

\max \{ c^T x : Ax = b, \; x \ge 0\},

then a certificate that this is infeasible is the existence of a sequence LaTeX: y^k such that LaTeX: A^T y^k \ge c for all LaTeX: k and LaTeX: (y^k)^T b \rightarrow -\infty as LaTeX: k \rightarrow \infty.

Chance constraint

See stochastic program.

Character of solution

The idea is to define character by solution status. In linear [integer] programming, columns might be partitioned into

  • basic vs. nonbasic at Lower bound vs. nonbasic at Upper bound
  • positive in some optimal solution vs. zero in every optimal solution

Rows might be partitioned into

  • inactive vs. active
  • non-binding vs. binding
  • positive dual price in some optimal solution vs. zero dual price in every optimal solution

Only some rows and columns might be partitioned, and others are free to change. The character is the property we want to fix by the partition. For example, if the LP has activities for operating a plant, we might want to specify the plant operation as a character of the solution, never mind specific levels (or maybe some minimal throughput is specified).

Then, one conducts sensitivity analysis by asking for preservation of the character. Classical LP sensitivity analysis does this by preserving the optimality of the basis; interior point analysis seeks to preserve the optimality of the partition. Both are special cases, and one might want to focus on only a portion of the LP. Here are some properties of interest:

  1. If the character holds at [c', b'] and at [c", b"], it holds for all [c, b] in their line segment.
  2. The stability region is a convex set.

This notion of character extends naturally to combinatorial optimization, such as preserving a key portion of a TSP tour or job-order in a schedule.

Characteristic cone

The characteristic cone of a polyhedron LaTeX: P is

\{y: x + y \in P \;\mbox{ for all }\; x \in P\}.

If LaTeX: P = \{x : Ax \le b\}, then its characteristic cone is LaTeX: \{y: Ay \le 0\}, which is the same as the recession cone.

Chemical equilibrium problem

The problem is

\min \left\{ c^Tx + \sum_j \left(x_j \ln(x_j) \right) : x > 0, \; \sum\limits_j x_j = M, \; Ax=b \right\},
where the objective is the Gibbs free energy function with LaTeX: x_j being the number of moles of species LaTeX: j, LaTeX: b_i being the atomic weight of atom of type LaTeX: i, and LaTeX: A_{i,j} being the number of atoms of type LaTeX: i in one mole of species LaTeX: j. The equation, LaTeX: Ax=b, is mass balance. (Its significance is that it was an early application to show how an LP approach can apply to a nonlinear mathematical program.)

Chinese postman problem

A mailman starts from the post office, delivers the mail on his route, covering each street at least once, then returns to the post office. The problem is to find a traversal such that the total distance travelled is minimized. (Some streets are one-way, and the postman is driving.) Abstractly, the problem is to find a least-cost cycle in a graph that uses each edge and arc (forward direction only) at least once. This differs from the Traveling Salesman Problem in that the postman can traverse an edge more than once. A related problem is to determine whether the postman can cover his route within a prescribed distance (while starting and ending at his post office). If so, the cycle is produced. This is computationally equivalent to the minimization problem (in a complexity sense), and is sometimes taken as the problem definition.

Chinese remainder theorem

Let LaTeX: m_1, m_2, \ldots, m_k be relatively prime positive integers, and let LaTeX: b_1, b_2, \ldots, b_k be any integers. Then, there exists LaTeX: x such that LaTeX: x \equiv b_i \pmod{m_i} for each LaTeX: i. The value of LaTeX: x is unique modulo LaTeX: m_1 m_2 \cdots m_k.
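
The theorem is constructive; a Python sketch (not part of the original entry) combines the congruences one at a time using the extended Euclidean algorithm:

```python
def ext_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = ext_gcd(b, a % b)
    return g, t, s - (a // b) * t

def crt(b, m):
    """Return x with x = b_i (mod m_i) for all i; unique modulo prod(m)."""
    x, M = 0, 1
    for bi, mi in zip(b, m):
        g, s, _ = ext_gcd(M, mi)        # s is the inverse of M modulo mi
        assert g == 1, "moduli must be pairwise relatively prime"
        x = (x + M * s * (bi - x)) % (M * mi)
        M *= mi
    return x

# x = 2 (mod 3), x = 3 (mod 5), x = 2 (mod 7)  ->  x = 23 (mod 105)
print(crt([2, 3, 2], [3, 5, 7]))  # 23
```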

Cholesky factorization

Given an LaTeX: m \times m symmetric matrix LaTeX: A, a lower triangular matrix, LaTeX: L, is obtained, called the Cholesky factor, such that LaTeX: A = LL^T. This is particularly useful for solving linear systems, LaTeX: Ax=b, by using forward substitution, LaTeX: Ly=b, then backward substitution, LaTeX: L^T x = y. The original algorithm assumes LaTeX: A is positive definite, but it can apply more generally. This is also called Cholesky decomposition.
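
A plain-Python sketch (illustrative; it assumes LaTeX: A is symmetric positive definite) of the factorization and the forward/backward substitution solve described above:

```python
import math

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve(A, b):
    """Solve A x = b via L y = b (forward), then L^T x = y (backward)."""
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):                   # forward substitution
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):         # backward substitution
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

A = [[4.0, 2.0], [2.0, 3.0]]
b = [10.0, 8.0]
print(solve(A, b))  # approximately [1.75, 1.5]
```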

Chvatal cut

A cutting plane of the form,

LaTeX: \sum_j F(A_j) x_j \le F(b),

where LaTeX: F is a Chvatal function.

Chvatal function

This class is defined recursively:

  • LaTeX: F is a Chvatal function (of rank 0) if it is a linear function.
  • If LaTeX: F and LaTeX: G are Chvatal functions, then LaTeX: aF+bG is a Chvatal function for any nonnegative LaTeX: a and LaTeX: b, and its rank is LaTeX:  \max \{\mbox{rank}(F), \mbox{rank}(G)\}. (Some require LaTeX: a and LaTeX: b to be rational.)
  • If LaTeX: F is a Chvatal function of rank LaTeX: r, then LaTeX: \lceil F \rceil is a Chvatal function, and its rank is LaTeX: r+1 if LaTeX: \lceil F \rceil is not equal to LaTeX: F.

This arises in integer linear programming in several ways, see Gomory function.

Closed form solution

This is where one could express a solution (e.g., an optimal one) by an equation of the form


x^* = F(p),

where LaTeX: F is some function of the parameters, LaTeX: p. This is typically meant to suggest that no algorithm is needed, but one must be careful since LaTeX: F is not limited in any way, such as being "easy" to evaluate. For example, under mild assumptions, the coordinates of a global maximum of a function, LaTeX: f, on the domain LaTeX: X, are given by the following integral:

x^*_j = \lim_{w \rightarrow \infty}
\frac{\int_{X} x_j e^{wf(x)} dx} {\int_{X} e^{w f(x)} dx}.

(The idea is to weight "mass" at the global maximum, and it is valid if the set of global maxima is convex.) This is not too useful, and the integral would typically have to be evaluated with a numerical method - i.e., an algorithm.

Closed function

In convex analysis, a function is closed if its epigraph is a closed set. (If LaTeX: f is concave, it is its hypograph that must be closed.)

Closed map

Let LaTeX: A: X \rightarrow 2^X be a point-to-set map and suppose LaTeX: x^k \rightarrow x and LaTeX: y^k \rightarrow y are such that LaTeX: y^k \in A(x^k) for all LaTeX: k. Then, LaTeX: A is closed at LaTeX: x if LaTeX: y \in A(x). The map LaTeX: A is closed on a subset of LaTeX: X, say LaTeX: S, if LaTeX: A is closed at each LaTeX: x in LaTeX: S.

Closed set

A set is closed if it contains all its limit points, i.e., if LaTeX: x^k \rightarrow x and each LaTeX: x^k is in LaTeX: X, then LaTeX: x is in LaTeX: X. By the logic of the definition, an empty set is closed (it is also open). Any intersection of closed sets is a closed set, so the feasibility region of a mathematical program in standard form is closed if each constraint function is continuous. (LaTeX: g need only be lower semi-continuous for the constraint, LaTeX: g(x) \le 0.) An example of a set that is not closed is: LaTeX: \{x: x < 0\}. The point 0 can be approached by a sequence, say LaTeX: x^k = -1/k, but 0 is not in the set. The closure of a set is the union of the set and all of its limit points.

Closure condition

The condition that the closure of a nonempty strict interior be equal to the level set: LaTeX: \mbox{cl}(\{x \in X: g(x) < 0\}) = \{x \in X: g(x) \le 0\}. Here is an example where the closure condition fails:

g(x) = \left\{ \begin{array}{cl}
-1-x, & x < -1 \\
0, & -1 \le x < 0 \\
-1 + (x-1)^2, & 0 \le x < 2 \\
x-2, & 2 \le x.
\end{array} \right.

Note that LaTeX: g is continuous and quasiconvex. Its level set is LaTeX: \{ x: g(x) \le 0 \} = [-1, 2], but the strict interior is LaTeX: (0, 2), so its closure is only LaTeX: [0, 2]. We lose the flat portion, LaTeX: [-1, 0).
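A quick numeric check of this example (our own illustration): sampling LaTeX: g on a fine grid recovers the level set LaTeX: [-1, 2] and the strict interior LaTeX: (0, 2):

```python
def g(x):
    # the piecewise example above
    if x < -1:
        return -1 - x
    if x < 0:
        return 0.0
    if x < 2:
        return -1 + (x - 1) ** 2
    return x - 2

# sample on a fine grid and compare {g < 0} with {g <= 0}
grid = [i / 1000.0 for i in range(-2000, 3001)]
strict = [x for x in grid if g(x) < 0]
level = [x for x in grid if g(x) <= 0]
print(min(level), max(level))     # ~ -1.0 and 2.0 : level set [-1, 2]
print(min(strict), max(strict))   # ~ 0.001 and 1.999 : strict interior (0, 2)
```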

This is important in the use of interior point methods and in stability. Here are some functions that satisfy the closure condition:

When equality constraints are present, there are two forms of extension of the closure condition: to consider the relative strict interior, and to consider "feasible sequences". The first generally assumes LaTeX: h is affine, so the closure condition becomes:

LaTeX: \mbox{cl}( \{ x: Ax=b, \; g(x) < 0 \}) = \{ x: Ax=b, g(x) \le 0\}.
The second defines a family of maps:

I_i^- = \{ x \in X: g(x) < 0, \; h_k(x) = 0 \; \mbox{ for } k \neq i, \; h_i(x) < 0 \}


I_i^+ = \{ x \in X : g(x) < 0, \; h_k(x) = 0 \; \mbox{ for } k \neq i, \; h_i(x) > 0 \}.

Then, the closure condition is that LaTeX: I_i^- and LaTeX: I_i^+ are not empty, and

\mbox{cl}(I_i^-) = \{ x \in X : g(x) \le 0, \; h_k(x) = 0 \; \mbox{ for } k \neq i, \; h_i(x) \le 0 \}


\mbox{cl}(I_i^+) = \{ x \in X : g(x) \le 0, \; h_k(x) = 0 \; \mbox{ for } k \neq i, \; h_i(x) \ge 0 \}.

Coercive function

The function LaTeX: f:\mathbb{R}^n \rightarrow \mathbb{R}^n is coercive with respect to LaTeX: X \subseteq \mathbb{R}^n if there exists a vector LaTeX: x^0 \in X such that for any sequence LaTeX: x^k \in X having the property that LaTeX: \|x^k \| \rightarrow \infty, we also have

\lim_{k \rightarrow \infty} | (f(x^k) - f(x^0))^T (x^k - x^0) | = \infty,
where any norm can be used. This arises in variational inequalities (and complementarity problems).

Some people use a different definition, where LaTeX: f : \mathbb{R}^n \rightarrow \mathbb{R} (i.e., a scalar, real-valued function):

|f(x)| / \| x \| \rightarrow \infty,
\;\mbox{ as }\; \|x\| \rightarrow \infty
Note that the two definitions differ, even for LaTeX: n=1. For example, LaTeX: f(x)=|x| is coercive under the first definition and is not under the second. For the bilinear function, LaTeX: f(x,y)=x^TAy, for some square matrix LaTeX: A, LaTeX: f is coercive if there exists LaTeX: a > 0 such that
y^TAy \ge a\|y\|^2.

Column generation

This pertains to solving a linear program whose columns are generated during pricing. Typically, the number of columns is astronomically large, possibly infinite. An example is when solving the randomized program, as with the Generalized Lagrange Multiplier method. In that case, column generation consists of maximizing the Lagrangian. A similar view applies to Dantzig-Wolfe decomposition. From the dual view, this is a cutting plane method since generating a column in the primal corresponds to generating a constraint in the dual.

The concept applies to any mathematical program, and the randomized program model highlights the duality and how a completely general mathematical program can be considered with (generalized) linear programming.

Combinatorial program

Let LaTeX: N = \{1, \ldots, n\} and consider a finite collection of subsets, say LaTeX: \{S_1, S_2, \ldots, S_m\}. For each subset there is an objective function value, LaTeX: f(S_k), and the problem is to optimize LaTeX: f(S_k) over the collection. Typically, the feasible subsets are represented by inclusion or exclusion of members such that they satisfy certain conditions. This then becomes a special class of integer programs whose decision variables are binary valued: LaTeX: x_{i,k}=1 if the i-th element is in LaTeX: S_k; otherwise, LaTeX: x_{i,k}=0. (IP formulations are not always easy, and often there is more than one formulation, some better than others.)

Also see integer program.

Compact formulation

In integer programming, this refers to having a polynomial number of constraints. For example, look at the travelling salesman formulations. The linear form has an exponential number of "subtour elimination constraints," so it is not compact. The quadratic assignment formulation is compact.

Compact set

This has a general meaning in mathematical analysis. For mathematical programming, where we work in finite-dimensional Euclidean vector spaces, it means the set is closed and bounded. This is often an assumption about the feasibility region in order to ensure the existence of an optimum value for a continuous objective function – see Weierstrass' theorem. Any finite set is compact. An open interval LaTeX: (a,b) is not compact because its endpoints, LaTeX: a and LaTeX: b are limit points but are excluded from the set.

Compatibility theory

The idea that a solution's character does not change for a particular perturbation. In linear programming the character could be an optimal basis, and the theory is concerned with whether a particular basis remains optimal when the data is changed in a prescribed direction. A Fundamental Theorem of Basis Compatibility is the following:

LaTeX: h is an admissible direction for perturbing LaTeX: (b, c) if, and only if, it is compatible with some equilibrium basis.

The range of compatibility of a basis, LaTeX: B, for a direction, LaTeX: h, is the greatest step for which LaTeX: B remains optimal:

\sup \{t : B \;\mbox{ is optimal for the LP defined by }\; r + th\}.

The basis spectrum is the greatest range:

\sup\{ \mbox{range}(B; h) : B \;\mbox{ is optimal for the LP defined by }\; r\}.

Complementarity condition

Satisfying complementary slackness.

Complementarity problem

Let LaTeX: F: \mathbb{R}^n \rightarrow \mathbb{R}^n. The complementarity problem (CP) is to find LaTeX: z \ge 0 such that LaTeX: F(z) \ge 0 and LaTeX: z^T F(z) = 0. It is complementary because every solution has the property that either LaTeX: z_j=0 or LaTeX: F_j(z)=0 (or both) for each LaTeX: j. The linear complementarity problem (LCP) is when LaTeX: F(z)=Mz+q.

The problem generalizes to allow bounds so that LaTeX: L \le z \le U. Then, LaTeX: F(z) is required to satisfy:

LaTeX: F_j(z) \ge 0 if LaTeX: z_j = L_j

LaTeX: F_j(z) \le 0 if LaTeX: z_j = U_j

LaTeX: F_j(z) = 0 if LaTeX: L_j < z_j < U_j.
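For a small LCP with LaTeX: F(z) = Mz + q, the complementarity structure can be illustrated by enumerating which member of each pair LaTeX: (z_j, w_j), where LaTeX: w = Mz + q, is zero. This toy sketch (function name ours, 2x2 instances only) is exponential in general and is not a practical method such as Lemke's algorithm:

```python
def solve_lcp_2x2(M, q):
    """Find z >= 0 with w = M z + q >= 0 and z_j * w_j = 0 for each j,
    by trying every complementary pattern of a 2x2 LCP (toy enumeration)."""
    candidates = [(0.0, 0.0)]                     # pattern z = 0, w = q
    if M[0][0] != 0:                              # pattern w_1 = 0, z_2 = 0
        candidates.append((-q[0] / M[0][0], 0.0))
    if M[1][1] != 0:                              # pattern z_1 = 0, w_2 = 0
        candidates.append((0.0, -q[1] / M[1][1]))
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det != 0:                                  # pattern w = 0: solve Mz = -q
        candidates.append((
            (-q[0] * M[1][1] + q[1] * M[0][1]) / det,
            (-q[1] * M[0][0] + q[0] * M[1][0]) / det,
        ))
    for z in candidates:
        w = [M[i][0] * z[0] + M[i][1] * z[1] + q[i] for i in range(2)]
        if (all(v >= -1e-9 for v in z) and all(v >= -1e-9 for v in w)
                and abs(z[0] * w[0]) < 1e-9 and abs(z[1] * w[1]) < 1e-9):
            return z
    return None

M = [[2.0, 1.0], [1.0, 2.0]]   # positive definite, so a solution exists
q = [-5.0, -6.0]
print(solve_lcp_2x2(M, q))     # prints (1.333..., 2.333...)
```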

Complementary slackness

The condition that two non-negative vectors are orthogonal. It arises in the Kuhn-Tucker conditions, where the Lagrange multiplier, say LaTeX: y, must be orthogonal to the (inequality) constraint functional value: LaTeX: y^T g(x)=0. This means either LaTeX: y_i=0 or LaTeX: g_i(x)=0 for each LaTeX: i -- that is, if a constraint is not active, its Lagrange multiplier must be zero.

Complementary variables

If LaTeX: u and LaTeX: v are the two vectors that satisfy complementary slackness, each pair, LaTeX: u_i and LaTeX: v_i are called complementary variables. In linear programming, primal levels and dual prices are complementary variables.

Complexity
A measure of computer time or space to solve a problem by an algorithm as a function of the problem's dimensions. Suppose LaTeX: T(n) is the time it takes to solve an instance of the problem with dimension LaTeX: n. Then, the algorithm has (worst-case) time complexity LaTeX: K(n), if the greatest time it could take to solve an instance of the problem is LaTeX: O(K(n)). When LaTeX: K(n) is a polynomial, we say the algorithm has polynomial time complexity. The Klee-Minty polytope shows that the elementary simplex method does not have polynomial time complexity.

The average time complexity is the average (rather than worst) time of an algorithm (or class of algorithms) over some class of problems, for which some probability distribution is assumed. Absent a modifier, the complexity of an algorithm is taken to mean its worst-case time complexity.

Whereas complexity refers to the performance of an algorithm, see the notion of NP-completeness for the related meaning of problem complexity. The standard notation to describe complexity in terms of problem dimensions, say LaTeX: n, is LaTeX: O(K(n)). This ``big-O notation means the following: a function, LaTeX: f:Z^+ \rightarrow R, is LaTeX: O(K(n)) if there exists a constant, LaTeX: c, and LaTeX: N in LaTeX: Z^+, such that LaTeX: f(n) \le cK(n) for all LaTeX: n \ge N. For example, if an algorithm requires LaTeX: 5n^3 + 2n + 10 fundamental operations on a problem of size LaTeX: n, its time complexity is LaTeX: O(n^3). See the supplement on complexity for more information.

Complicating variables

Those variables which, when fixed at particular values, render the remaining mathematical program relatively easy to solve. One example is the use of binary variables for fixed charges in an otherwise linear program. More generally, see Benders' decomposition.


In a graph (or network), this is a maximal connected subgraph. (The number of components is 1 if the graph is connected.)

Composite concave program

A global optimization problem whose objective is a composite concave function.

Composite function

A function of a function, of the form LaTeX: g(f(x)).

Concave function

The real-valued function LaTeX: f is concave if its domain, say LaTeX: X, is a convex set, and for LaTeX: x, y \in X and LaTeX: 0 \le a \le 1:

f(ax + (1-a)y) \ge af(x) + (1-a)f(y).
Equivalently, LaTeX: f is concave if its hypograph is a convex set or if its negative is a convex function.

Condition number

This is LaTeX: \|A\| \|A^{-1}\| when LaTeX: A is nonsingular and LaTeX: \|\cdot\| is some matrix norm. This arises in convergence analysis, where LaTeX: A is the hessian. Whenever LaTeX: A is symmetric and positive definite (as the hessian of a strictly convex function), and the matrix norm is that induced by the Euclidean norm (i.e., LaTeX: \|A\| := \max \{\|Ay\|: \|y\|=1\}), the condition number is the ratio of the extreme values of the eigenvalues. It is often used to measure the convergence rate of an ascent algorithm, due to Kantorovich's inequality. (Unfortunately, it is erroneously used to measure the goodness of an algorithm – see Myths and Counterexamples in Mathematical Programming.)
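A quick numeric check (our own illustration, using NumPy) that for a symmetric positive definite matrix the condition number under the induced Euclidean norm equals the ratio of extreme eigenvalues:

```python
import numpy as np

# A symmetric positive definite matrix
A = np.array([[4.0, 1.0], [1.0, 3.0]])

# Definition: ||A|| * ||A^{-1}|| with the induced 2-norm
kappa_norm = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)

# Ratio of the extreme eigenvalues
eigs = np.linalg.eigvalsh(A)
kappa_eig = eigs.max() / eigs.min()

print(kappa_norm, kappa_eig)   # the two values agree
```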

Cone
A set, LaTeX: C, with the property that if LaTeX: x \in C, then LaTeX: a x \in C, for all positive real LaTeX: a. A convex cone is a cone that is also a convex set. Equivalently, LaTeX: C is a convex cone if LaTeX: C = C + C. (An example of a cone that is not convex is the union of the axes.) A polyhedral cone is a cone that is also a polyhedron; equivalently, LaTeX: C is a polyhedral cone if there exists a matrix LaTeX: A such that LaTeX: C = \{ x : Ax \le 0\}. An example of a cone that is not polyhedral is LaTeX: \{(x,y,z): x^2+y^2-z^2=0\}.

A quadratic cone is of the form LaTeX: \{x: x^T Q x \le 0\}, where LaTeX: Q is any (real) matrix. If LaTeX: Q is negative semi-definite, the cone is all of LaTeX: \mathbb{R}^n. If LaTeX: Q is positive definite, the cone is just the origin, LaTeX: \{0\}. So, quadratic cones usually arise when LaTeX: Q is indefinite. Example: LaTeX: \{x : x \ge 0, \; 2x_1x_2 \ge \|(x_3,\ldots,x_n)\|^2\}. See each of the following special cones:

Cone of optimality

Given an LP in standard form, let LaTeX: B be an optimal basis (for which the dual is feasible). Then, the cone of optimality of LaTeX: B is LaTeX: \{b : B^{-1}b \ge 0\}. It is so called because LaTeX: B remains an optimal basis for any LaTeX: b in this cone.

Conic program

The standard form is

LaTeX: \min \{ c^T x : Ax=b, \; x \in K\},

where LaTeX: K is a cone (not necessarily convex). This is more general than it looks. Suppose LaTeX: S is any non-empty set, and we have the mathematical program:

LaTeX: \min \{ c^T x : x \in S\}.

Then, defining LaTeX: K = \{t(1,x): x \in S, \; t \ge 0\}, we have that the above problem is equivalent to the conic program:

LaTeX: \min \{ c^Tx : (x_0, x) \in K, \;  x_0 = 1 \}.

This problem class also includes the semi-definite program, as LaTeX: x could be a matrix.

Conjugate directions

Direction vectors chosen in an algorithm that are conjugate with previous directions, with respect to the hessian of the objective. For the unconstrained quadratic, LaTeX: x^T Qx + c^Tx (where LaTeX: Q is symmetric), direction vector LaTeX: d is conjugate if LaTeX: d^T Q d^k = 0 for all previous direction vectors, LaTeX: d^k.

Conjugate duality

See the supplement on particular duals.

Conjugate function

The convex conjugate of LaTeX: f on LaTeX: X, denoted LaTeX: f^* on LaTeX: X^*, is the greatest convex approximation from below:

LaTeX: f^*(x^*) = \sup \{ x^T x^* - f(x): x \in X\},

and LaTeX: X^* = \{x^*: f^*(x^*) < \infty\} (i.e., LaTeX: X^* is the effective domain of LaTeX: f^*). The concave conjugate of LaTeX: f on LaTeX: X, denoted LaTeX: f^{\wedge} on LaTeX: X^{\wedge}, is the least concave approximation from above:

LaTeX: f^{\wedge}(x^*) = \inf\{ x^T x^* - f(x): x \in X\},

and LaTeX: X^{\wedge} = \{x^*: f^{\wedge}(x^*) > -\infty\}. This is a foundation for Lagrangian duality, viewed in response space.
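A crude numeric sketch (our own illustration): approximating the sup over a finite grid shows that LaTeX: f(x) = x^2/2 is self-conjugate, LaTeX: f^*(x^*) = (x^*)^2/2:

```python
def conjugate(f, xs):
    """Approximate the convex conjugate f*(y) = sup_x { x*y - f(x) }
    by maximizing over a finite grid xs (a crude numeric sketch)."""
    return lambda y: max(x * y - f(x) for x in xs)

f = lambda x: 0.5 * x * x
xs = [i / 100.0 for i in range(-500, 501)]   # grid on [-5, 5]
f_star = conjugate(f, xs)

# f is self-conjugate: f*(y) = y^2 / 2 (exact when the maximizer x = y
# lies on the grid)
for y in [0.0, 1.0, 2.0, -1.5]:
    print(y, f_star(y))
```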

Conjugate gradient method

An ascent method for unconstrained optimization such that successive directions are conjugate with respect to the hessian for a quadratic objective function.
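A minimal sketch of the method for a strictly convex quadratic, i.e., solving LaTeX: Ax = b with LaTeX: A symmetric positive definite (the standard textbook recurrences, using NumPy; not tied to any particular source):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Minimize (1/2) x^T A x - b^T x (i.e., solve Ax = b) for symmetric
    positive definite A; successive directions are A-conjugate."""
    x = np.zeros_like(b)
    r = b - A @ x          # residual = negative gradient
    d = r.copy()           # first direction: steepest descent
    while np.linalg.norm(r) > tol:
        alpha = (r @ r) / (d @ A @ d)   # exact line search for quadratics
        x = x + alpha * d
        r_new = r - alpha * (A @ d)
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d            # new direction, A-conjugate to d
        r = r_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))   # agrees with np.linalg.solve(A, b)
```

For an n-dimensional quadratic the method terminates (in exact arithmetic) in at most n steps, since the directions are mutually conjugate.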

Conjugate vectors

With respect to a symmetric matrix, LaTeX: A, LaTeX: u and LaTeX: v are conjugate if LaTeX: u^TAv = 0. (In particular, if LaTeX: A=I, LaTeX: u and LaTeX: v are conjugate if they are orthogonal.)

Conjunctive normal form

A logical expression that is written as a conjunction,

LaTeX: C_1 \wedge C_2 \wedge \ldots \wedge C_m,

where LaTeX: \wedge is the conjunction (logical and). Each LaTeX: C_i is called a clause and has the form

LaTeX: L_{i_1} \vee L_{i_2} \vee \ldots \vee L_{i_n},

where LaTeX: \vee is the disjunction (logical inclusive or) and each LaTeX: L_i is a literal -- a proposition or its negation.

Connected network

In the undirected case, a graph is connected if, for any pair of nodes, there is a path that contains both of them. In a directed graph, or network, the path may be required to be directed (i.e., follow the orientations of the arcs), in which case the network is strongly connected; or, the path may be allowed to ignore the arc orientations, in which case the network is weakly connected.

Consistency
In general, a logical theory is consistent, if it contains no contradictions.

In constraint programming, a (partial) variable assignment is called consistent with respect to a set of constraints if none of the constraints is violated.

More generally, in CP, different definitions of consistency are used as a basis for inference. See:

Consistent
Pertains to a system of equations and inequalities, for which there exists a feasible solution.

Constraint
Usually this is a relation of the form of an inequality, LaTeX: g(x) \le 0, or equation, LaTeX: h(x) = 0. More generally, it can be any restriction the decision variables must satisfy. For example, some regard "x must be integer-valued" as a constraint, while others would say that this is simply the domain of the decision variables in the mathematical program. There are other relations, such as the logical form of a precedence constraint: IF LaTeX: x=0 THEN LaTeX: y=0.

In constraint programming, a constraint is an arbitrary relation imposed on a set of variables LaTeX: X = \{x_1, \dots, x_n\} where each variable must be assigned one value from its domain. A constraint holds if all variables in LaTeX: X are assigned values from their domain and the relation is satisfied.

Constraint graph

Constraint graphs are used to conceptualize the set of constraints and their scopes in a CSP.

A primal constraint graph represents variables by nodes and associates an arc with any two nodes whose variables appear in the scope of the same constraint.

A dual constraint graph represents each constraint with a node and associates a labeled arc with any two nodes whose scopes share variables. The arcs are labeled by the shared variables.

Constraint optimization problem

The general aim for a constraint optimization problem is to find an optimal solution to a set of constraints subject to some objective function LaTeX: obj. For example, consider a CSP LaTeX:  P = \{ C, x_1 \in D_1, \dots, x_n \in D_n\} together with a function LaTeX:  obj : Sol \to \mathbb{R} . LaTeX: obj is a mapping from solutions to LaTeX: P (i.e., all elements of the set LaTeX: Sol) to the set of real numbers, LaTeX: \mathbb{R}.

A solution, LaTeX: d, to a constraint optimization problem is a complete assignment of variables that satisfies all constraints LaTeX: C \in P and for which the value LaTeX: obj(d) is optimal (either minimal or maximal, depending on the sense of the optimization).

See also constraint satisfaction problem.

Constraint programming

Constraint Programming (CP) is a paradigm to solve satisfaction or optimization problems by a two-step approach: propagation and tree search. Given a CSP the decision variables’ domains are tightened by first propagating the constraints. If after this step all domains contain only one value then a solution has been found. If at least one domain is empty, the CSP has no solutions. Otherwise, the second step, tree search, is initiated: by specifying a variable ordering heuristic and a value ordering heuristic, a search tree is constructed which is then traversed according to one of various search strategies. If the problem is a satisfaction problem, the first solution found is returned or it is proved that no solution exists. For an optimization problem the whole tree is traversed in a branch-and-bound search and the best solution returned.

Constraint propagation

Constraint propagation is a general term for inference techniques used in constraint programming. The basic technique consists in explicitly forbidding values or combinations of values for some variables of a problem because a given subset of its constraints cannot be satisfied otherwise. Constraint propagation appears under different names depending on both historical period and author including constraint relaxation, constraint inference, local consistency enforcing, and rule iteration.

Typically, the propagation process is repeated until a "fixed point" is reached where no further values can be removed from the domain of any variable.

See constraint propagator.

Constraint propagator

A constraint propagator is an algorithm (often associated with a global constraint) which implements a constraint propagation algorithm. Typically, the algorithm prunes the domains of the variables involved in the constraint, removing values for which it can be shown that they participate in no solutions given the current search node.

A constraint propagator is also known as a filtering algorithm.
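A toy propagator for the single constraint x + y = c (names and domains our own invention) prunes each domain to the values that participate in some solution:

```python
def propagate_sum(dom_x, dom_y, c):
    """Toy propagator for the constraint x + y == c: remove from each
    domain every value that participates in no solution (arc consistency)."""
    new_x = {v for v in dom_x if any(v + w == c for w in dom_y)}
    new_y = {w for w in dom_y if any(v + w == c for v in new_x)}
    return new_x, new_y

dx, dy = propagate_sum({1, 2, 3, 4}, {1, 2}, 5)
print(sorted(dx), sorted(dy))   # prints [3, 4] [1, 2]
```

Values 1 and 2 are removed from the first domain because no partner in the second domain sums to 5 with them; a real solver would rerun such propagators to a fixed point, as described above.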

Constraint qualification

Conditions on the constraint functions (LaTeX: g and LaTeX: h) that are sufficient to make the Lagrange Multiplier Rule valid. Here is an example to illustrate what can go wrong:

LaTeX: \max x : x^2 \le 0.

Since LaTeX: x=0 is the only feasible solution, it is optimal. The Lagrange Multiplier Rule requires that there exist LaTeX: u for which LaTeX: f' - u^T g' = 0, but LaTeX: f' = 1 and LaTeX: g'= 0, so no such LaTeX: u exists. Slater used this example in illustrating his interiority condition. The classical qualification, given by Lagrange's multiplier theorem without inequality constraints, is that LaTeX: \nabla h(x) have full row rank, which stems from the Implicit Function Theorem. Another constraint qualification is that all constraint functions be affine (even with redundant constraints). Each constraint qualification gives a sufficient condition for the Lagrange Multiplier Theorem to be valid. A constraint qualification is necessary if it must hold in order to guarantee that the Lagrange Multiplier Rule is valid for all smooth LaTeX: f having optimal value at LaTeX: x.

Constraint satisfaction problem

A classical Constraint Satisfaction Problem (CSP) is defined by a finite set of variables LaTeX: X = \{x_1, \dots, x_n\} where each variable LaTeX: x_i is associated with a finite domain of integers LaTeX: D_i= \{1, \dots, N_i\}. Furthermore, a finite set of constraints LaTeX:  \mathcal{C} = \{C_1, \dots, C_m \} is imposed on the variables, where a constraint LaTeX: C_i is an arbitrary relation on a subset of variables LaTeX: X. A solution to the CSP is a value assignment to each variable such that the value lies within the respective domain and all constraints are satisfied.

There exist different variants and sub-classes of constraint satisfaction problem:

Constraint solver

A constraint solver typically consists of two parts:

A solution is found as soon as every variable is assigned a value from its domain and all constraints are satisfied.

A constraint solver can typically search for one solution (returning the first solution it finds, if one exists) or for all solutions.

Continuous program

A mathematical program with continuous variables. Moreover, the term is commonly used to further imply that the objective and constraint functions are continuous, an assumption that allows certain analytical results. For example, Weierstrass first proved that continuous functions attain both a maximum and a minimum over a compact set. If the problem contains both continuous and integer variables, then the problem is said to be a mixed integer program.

Contour
Also called an isoquant of a function, LaTeX: f. The projection of its graph into its domain: LaTeX: \{x \in X : f(x) = c\}, where LaTeX: c is the contour value.

Contraction map

Also called a contractor. A function that brings points closer together, forming the foundation for convergence by successive approximation. Mathematically, we have LaTeX: f: X \rightarrow X, where LaTeX: X is in a normed space, with norm of LaTeX: x denoted by LaTeX: \|x\|. LaTeX: f is a contractor if there exists a constant LaTeX: K in LaTeX: [0,1) such that LaTeX: \|f(x)-f(y)\| \le K \|x-y\| for all LaTeX: x and LaTeX: y in LaTeX: X. See the supplement on fixed points.
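A sketch of successive approximation under these assumptions (our own illustration; cos is a contraction on LaTeX: [0, 1] since LaTeX: |\sin x| \le \sin 1 < 1 there):

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Successive approximation x_{k+1} = f(x_k); converges when f is a
    contraction (Banach fixed-point theorem)."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x = fixed_point(math.cos, 0.5)
print(x)                          # the Dottie number, about 0.739085
assert abs(math.cos(x) - x) < 1e-10   # x is (numerically) a fixed point
```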

Convergence
Specifically, convergence of an algorithm. The algorithm is represented as the point-to-set map, LaTeX: x' \in A(x), where there is some selection function to choose LaTeX: x' if LaTeX: A(x) has more than one member. Convergence means that the sequence, LaTeX: x^k, has a limit point, say LaTeX: x, such that LaTeX: x satisfies certain conditions. Ideally, the conditions are that LaTeX: x is an optimal solution, at least locally, but this is often not the definition used in a convergence theorem. (For example, in unconstrained optimization, most algorithms converge to a stationary point, which need not be even a local optimum.)

Let LaTeX: \textstyle s^k \rightarrow 0, where LaTeX: s^k is a scalar series pertaining to the sequence LaTeX: x^k. Typically, LaTeX: s^k = \|x^k - x\|, which is sometimes called policy convergence (to a solution, LaTeX: x). We could also have LaTeX: s^k = f(x^k) - f^*, where LaTeX: f is the objective function, in which case the concern is with value convergence to an optimal value, LaTeX: f^*.

Some related terms and concepts:

  • Dual convergence. Dual values, notably Lagrange multipliers, converge to a dual solution.
  • Geometric convergence. Same as linear convergence, but usually used when the sequence is exactly a geometric progression: LaTeX: s^k = r^k s^0, where LaTeX: r^k is LaTeX: r raised to the power LaTeX: k.
  • Global convergence. Usually means the same as globally convergent, but some mean that the algorithm converges to a global solution.
  • Globally convergent. Convergence to a solution from any starting point.
  • Linear convergence. Order is 1 and rate is less than 1 (see below).
  • Local convergence. Some mean locally convergent, but some mean that the limit point is a local optimum (or just satisfies necessary conditions, see Myths and Counter Examples in Mathematical Programming to avoid misconception).
  • Locally convergent. There exists an open neighborhood of 0 such that LaTeX: s^k \rightarrow 0 from any LaTeX: s^0 in the neighborhood.
  • Order of convergence. The order is LaTeX: \sup\{ p : \lim_{k \rightarrow \infty} \|s^{k+1}\| / \|s^k\|^p < \infty \}. The order being 1 is linear convergence and the order being 2 is quadratic convergence. Most (non-finite) algorithms to solve mathematical programs are between linear and quadratic.

    Let LaTeX: v^k = \|s^k\| and suppose LaTeX: v^0 = 1 (for notational convenience). The term order derives from the approximate equation, LaTeX: v^{k+1} := r(v^k)^p, where LaTeX: r is the rate. If this equation were exact, we would have LaTeX: v^k = r^k if LaTeX: p=1, and LaTeX: v^k = r^{(p^k-1)/(p-1)} if LaTeX: p > 1, for all LaTeX: k. If LaTeX: r=0.1 and LaTeX: p=1, we gain one decimal place each time: LaTeX: v^1 = 0.1, LaTeX: v^2 =0.01, LaTeX: v^3 = 0.001, etc. If LaTeX: p=2, the number of accurate decimal places approximately doubles each iteration: LaTeX: v^1 = 0.1, LaTeX: v^2 = 0.001, LaTeX: v^3 =0.0000001, etc. Unfortunately, the underlying equation is not exact – see Myths and Counter Examples in Mathematical Programming to avoid misconception. Some authors call this q-order (or Q-order) convergence to distinguish it from variations of the definition of order. Each definition is designed to capture the notion of how many digits of accuracy are added as the sequence approaches its limit.

  • Rate of convergence. This is generally used to mean the limiting ratio: LaTeX: \lim_{k \rightarrow \infty} \|s^{k+1}\| / \|s^k\|^p, given the order is LaTeX: p.
  • Sublinear convergence. Order is 1 and rate is 1 (slower than all linear convergence), e.g., LaTeX: s^k = 1/k.
  • Superlinear convergence. Order is 1 and rate is 0 (faster than all linear convergence), e.g., LaTeX: s^k = (1/k)^k.
  • Stochastic convergence. This applies when the successor point is a random variable, as in simulated annealing. Here are the most common types of convergence for a sequence of random variables, LaTeX: X_n \rightarrow X.
    • with probability LaTeX: 1 or almost everywhere (abbr., a.e.). LaTeX: P\{\lim_{n \rightarrow \infty} X_n = X\} = 1.
    • in probability. LaTeX: P\{\|X_n - X\| > e\} \rightarrow 0 for all LaTeX: e > 0.
    • in distribution. The sequence of cumulative distribution functions (cdf), converges point-wise: LaTeX: F_n(x) \rightarrow F(x) for all LaTeX: x at which LaTeX: F is continuous, where LaTeX: F_n is the cdf of LaTeX: X_n and LaTeX: F is the cdf of LaTeX: X.
    • in p-th mean. LaTeX: E\{\|X_n - X\|^p\} \rightarrow 0. For LaTeX: p=2, this is called convergence in quadratic mean or in mean square.
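The distinction between linear and quadratic convergence can be illustrated numerically (our own example: a geometric sequence versus Newton's method for the square root of 2):

```python
import math

# Linear convergence: s^k = r^k with rate r = 0.5 (order p = 1);
# the ratio s^{k+1}/s^k approaches the rate.
s = [0.5 ** k for k in range(1, 11)]
ratios = [s[k + 1] / s[k] for k in range(len(s) - 1)]
print(ratios[-1])   # 0.5

# Quadratic convergence (order p = 2): Newton's method for sqrt(2);
# the error roughly squares each iteration, so s^{k+1}/(s^k)^2 stays bounded.
x, errors = 1.0, []
for _ in range(4):
    x = 0.5 * (x + 2.0 / x)              # Newton step for x^2 - 2 = 0
    errors.append(abs(x - math.sqrt(2)))
quad_ratios = [errors[k + 1] / errors[k] ** 2 for k in range(3)]
print(quad_ratios)  # bounded, near the asymptotic constant 1/(2*sqrt(2))
```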

Convex combination

An affine combination of vectors whose coefficients are non-negative, i.e., LaTeX: \textstyle\sum_k a_k x^k, where LaTeX: a_k \ge 0 and LaTeX: \textstyle\sum_k a_k = 1.

Convex cost flow problem

The min-cost network flow problem with a nonlinear convex cost function.

Convex function

Let LaTeX: f: X \rightarrow \mathbb{R}. Then LaTeX: f is convex if LaTeX: X is a convex set, and for LaTeX: x, y \in X and LaTeX: 0 \le a \le 1:

LaTeX: f(ax + (1-a)y) \le af(x) + (1-a)f(y).

Equivalently, its epigraph is convex.

Convex hull

(of a set). The intersection of all convex supersets (which can be limited to halfspaces). Equivalently, the set of all convex combinations of points in the set (which can be limited to convex combinations of at most n+1 points, in n dimensions, which is known as Carathéodory's Theorem).
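In two dimensions the convex hull of a finite set can be computed directly; this sketch uses Andrew's monotone chain construction (a standard algorithm, not from the entry):

```python
def convex_hull(points):
    """Convex hull of 2-D points (Andrew's monotone chain), returned in
    counterclockwise order starting from the lexicographically least point."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):   # z-component of (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:                       # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):             # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

# the interior point (1, 1) is a convex combination of the others, so it drops out
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# -> [(0, 0), (2, 0), (2, 2), (0, 2)]
```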

Convex program

The mathematical program LaTeX: \min \{f(x) : x \in X \} is convex if the objective function LaTeX: f is convex, and the feasible region LaTeX: X is a convex set. In particular, the mathematical program LaTeX: \min \{f(x) : g(x) \le 0, \; h(x) = 0 \} is convex if LaTeX: g is convex, and LaTeX: h is affine. See the supplement, Convex Cones, Sets, and Functions.

Here are some key properties of a convex program:

Convex set

If any two points are in the set, so is their line segment, i.e.

LaTeX: x, y \in X \Rightarrow (1 - a) x + a y \in X,

for any LaTeX: 0 \le a \le 1. See Myths and Counter Examples in Mathematical Programming.

Convex simplex method

This extends the general strategy of the simplex method in linear programming to maximize a concave objective over linear constraints of the form LaTeX: Ax=b and LaTeX: x \ge 0. (A form of local convergence applies when the objective is not concave, but is smooth.) A tableau is maintained, but nonbasic variables need not be at zero level. The partition is used to compute a generalized reduced cost of the form LaTeX: d(x) = \nabla f(x) - y^T A. The LaTeX: y vector is determined by the basic portion: LaTeX: 0 = \nabla_B f(x) - y^TB (so LaTeX: y = (\nabla_B f(x))B^{-1}).

As in the simplex method, pricing is used to determine whether there is an improving nonbasic vector. If not, the algorithm terminates (satisfying the Kuhn-Tucker conditions); otherwise, a line search is used to determine the new point, changing only the basic variables to compensate for the change in the one nonbasic level. If this causes a basic variable to reach zero, the basis is changed with the pivot operation.

Convexity cut

A class of cutting planes derived by considering a convex set in a polyhedron, P. The form of the cut is

LaTeX: \sum_i \frac{y_i}{t_i} \ge 1,

and it is derived as follows.

Let LaTeX: x_0 be an extreme point of a given polyhedron, which contains a given set, LaTeX: S. Suppose LaTeX: C is a convex set in LaTeX: \mathbb{R}^n whose interior contains LaTeX: x_0 but does not contain any point of LaTeX: S. Let LaTeX: v_1, v_2, \ldots, v_n be linearly independent vectors in LaTeX: \mathbb{R}^n, and let LaTeX: t_i be such that LaTeX: t_i > 0 and LaTeX: [x_0, x_0 + t_i v_i] \subseteq C for all LaTeX: i=1, 2, \ldots, n (e.g., along the edges emanating from LaTeX: x_0). Define LaTeX: V = [v_1,..., v_n] and LaTeX: M = [V \, \mbox{diag}(t_i)]^{-1} (i.e., LaTeX: M(i,.) = V^{-1}(i,.) / t_i). Then, the cutting plane LaTeX: \{x: M(x-x_0) \ge 1\} excludes LaTeX: x_0 without excluding any other LaTeX: x in LaTeX: S for which LaTeX: V^{-1}(x-x_0) \ge 0. The cut, LaTeX: M(x-x_0)\ge 1, is equivalent to the above form, LaTeX: y \, \mbox{diag}(1/t_i) \ge 1, where LaTeX: y = x_0 - Vw for some LaTeX: w \ge 0.

One special case is the intersection cut for a 0-1 integer program:

  • LaTeX: S = \{(x, s): x \in \{0, 1\}^n, \; s \in \mathbb{Z}^m, \; Ax + s = b \};
  • LaTeX: P = \{(x, s): x \in [0, 1]^n, \; s \in \mathbb{R}^m, \; Ax + s = b\} \cap \{x: Cx \le c\}, where LaTeX: \{Cx \le c\} are the cuts already added;
  • LaTeX: x_0 is an extreme point of LaTeX: P, obtained by solving the LP relaxation with a simplex method, so LaTeX: x_0 = (u, w) is a basic optimum.
  • LaTeX: V = B^{-1}N, where LaTeX: B is the basis that transforms the original equations to LaTeX: u + B^{-1}Nw = x_0 LaTeX: ( =B^{-1}b );
  • LaTeX: C is a sphere, LaTeX: \{y: \|y\|^2 \le 1/4\}.

Corner polyhedron problem

Gomory's relaxed IP that drops the non-negativity constraints of basic variables. Suppose LaTeX: x = (u,0) is a basic solution to the LP, LaTeX: \mbox{opt}\{c^Tx : Ax=b, \; x \ge 0\}, where LaTeX: u is the vector of basic variables. Then, the corner polyhedron problem is

LaTeX: \mbox{opt} \{d^Tv: u \in \mathbb{Z}^m, \; v \in \mathbb{Z}^{n-m}_+, \; Bu + Nv =b\},

(dropping LaTeX: u \ge 0), where LaTeX: d is the reduced cost of LaTeX: v in the LP solution. (Note: LaTeX: c^Tx = d^Tv for all LaTeX: x=(u,v) such that LaTeX: Ax=b.)

Correlation matrix

See elliptope.

Cost of capital

Used in computing an objective function, it is the interest rate. The units are currency per time period (e.g., dollars per year). One must be careful in computing this. For example, the marginal cost of capital can be greater than the marginal interest rate because the latter applies to the entire borrowing, not just the last dollar borrowed. (This is the "classical" definition, which does not consider advances in option theory.)

Covering problem

The idea is to select enough members in each of a specified collection of sets; that defines covering the sets. Subject to this, there is a cost for the elements selected, and the objective is to minimize total cost. The IP form of the usual set covering problem is

LaTeX: \min \{c^Tx: Ax \ge 1, \; x \in \{0,1\}^n\},

where LaTeX: x_j=1 if element LaTeX: j is selected; else, LaTeX: x_j=0. The matrix LaTeX: A has 0's and 1's, where the LaTeX: i-th row corresponds to the LaTeX: i-th set to be covered: LaTeX: A_{i, j}=1 means element LaTeX: j is in set LaTeX: i; else, LaTeX: A_{i, j}=0. The constraint LaTeX: Ax \ge 1 means that at least one element of each set must be selected. This could be extended to require LaTeX: b_i elements of set LaTeX: i be selected by writing LaTeX: Ax \ge b. Also see vertex cover.

Cramer rule

To solve LaTeX: Ax=b, where LaTeX: A is nonsingular, the j-th coordinate is given by:

LaTeX: x_j = \frac{ \mbox{det}(A^j)}{\mbox{det}(A)},

where LaTeX: det(\cdot) is the determinant, and LaTeX: A^j is the matrix obtained if column LaTeX: j of LaTeX: A is replaced by LaTeX: b:

LaTeX: A^j = [A(., 1) A(., 2) \ldots A(., j-1) b A(., j+1), \ldots A(., n)].
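The rule can be sketched directly (the example data are assumed; in practice, LU factorization via a linear solver is preferred, since determinant ratios are slower and less stable numerically):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b for nonsingular A by Cramer's rule:
    x_j = det(A^j) / det(A), where A^j has column j replaced by b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    det_A = np.linalg.det(A)
    x = np.empty(len(b))
    for j in range(len(b)):
        Aj = A.copy()
        Aj[:, j] = b                  # replace column j by the right-hand side
        x[j] = np.linalg.det(Aj) / det_A
    return x

print(cramer_solve([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0]))   # [1. 3.]
```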


Crash

This is a heuristic to generate an initial point for an algorithm. The term comes from early linear programming computer systems, which used a variety of heuristics to generate an initial basic solution.

Criss-cross method

A method in linear programming that chooses a pivot by possibly crossing from primal to dual, and vice versa. The least-index criss-cross method is a finite pivot method that chooses a pivot based on the least index of a column or row for which there is improvement in either the primal or dual infeasibility. It terminates at a basis when one of the following termination conditions is reached:

  1. Both the associated primal and dual solutions are feasible (implies optimality because complementary slackness holds by construction).
  2. There is evidence of primal and/or dual infeasibility (like the tests in the simplex method).
(Note: any basis can be used to start, and there is only one phase.)

Critical path

A longest path in a network, where length units are time durations. The nodes represent tasks, and the arcs represent precedence constraints. The path is critical because the associated tasks determine the total completion time of the project. Moreover, at least one of their duration times must decrease in order to decrease the total completion time.
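A minimal sketch of computing a critical path by recursion over an acyclic task network (the task names, durations, and precedences below are assumed example data):

```python
import functools

# Hypothetical task network: each task has a duration, and an arc a -> b
# means task a must finish before task b starts.
duration = {"a": 3, "b": 2, "c": 4, "d": 1}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

def critical_path(duration, succ):
    """Longest (duration-weighted) path through an acyclic task network."""

    @functools.lru_cache(maxsize=None)
    def longest_from(t):
        # total time of the longest chain starting at task t
        tail = max((longest_from(s) for s in succ[t]), default=0)
        return duration[t] + tail

    start = max(duration, key=longest_from)
    # recover the path greedily along the longest remaining chains
    path, t = [start], start
    while succ[t]:
        t = max(succ[t], key=longest_from)
        path.append(t)
    return path, longest_from(start)

print(critical_path(duration, succ))   # (['a', 'c', 'd'], 8)
```

Shortening any task off the path ('b' here) leaves the completion time of 8 unchanged, which is what makes the path critical.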

Critical point

A critical point of a smooth function is an element of the domain for which the first partial derivatives are zero or infinite.

Crossover operation

See genetic algorithm.

Cumulative constraint

is a global constraint that models scheduling problems as follows. Given is a collection of tasks LaTeX: T=t_1 \dots t_n, where each task LaTeX: t_i is associated with four variables:

  • release time, LaTeX: r_i, is the earliest time at which it can begin executing,
  • deadline (or due date), LaTeX: d_i, is the time by which it must complete
  • processing time, LaTeX: p_i, is the length of time it takes to complete
  • capacity requirement, LaTeX: c_i, is the capacity of the resource that it uses while it executes.

In addition, we are given the capacity variable LaTeX: C of the resource.

The LaTeX: cumulative(\{s_1 \dots s_n\},\{p_1 \dots p_n\}, \{c_1 \dots c_n\}, C) constraint assigns a starting time LaTeX: s_i for each task LaTeX: t_i such that:

  • LaTeX: r_i\leq s_i \leq d_i - p_i: the task starts at or after its release date and ends at or before its deadline, and
  • LaTeX: \forall u \sum_{i|s_i\leq u\leq s_i+p_i} c_i\leq C: at any time unit LaTeX: u, the capacity of the resource is not exceeded.
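A brute-force feasibility check of the constraint can be sketched as follows (as an assumption, it scans integer time units and treats each task as occupying the half-open interval from LaTeX: s_i to LaTeX: s_i + p_i):

```python
def cumulative_ok(starts, durations, demands, capacity):
    """True iff, at every integer time unit, the total demand of the
    tasks executing then (interval [s_i, s_i + p_i)) is within capacity."""
    horizon = range(min(starts), max(s + p for s, p in zip(starts, durations)))
    return all(
        sum(c for s, p, c in zip(starts, durations, demands) if s <= u < s + p)
        <= capacity
        for u in horizon
    )

# two unit-demand tasks overlap at time 1; a heavier task runs after them
print(cumulative_ok([0, 1, 3], [2, 2, 1], [1, 1, 2], capacity=2))   # True
print(cumulative_ok([0, 0, 0], [2, 2, 2], [1, 1, 1], capacity=2))   # False
```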

Cut search

An algorithm strategy for global optimization (notably for integer programming) that consists of two alternating phases: search and cut. The search phase finds linearly independent vectors emanating from a root point, LaTeX: x_0, to set up a probe for the cut phase. Usually, LaTeX: x_0 is an extreme point, so the search is an edge probe phase that extends the edges of the cone rooted at LaTeX: x_0 until they intersect the candidate solution points. Then, a cutting plane is added (usually a convexity cut) to exclude the root point. (A new root point is obtained by solving the new approximating problem on the smaller polyhedron.)


Cutset

In a weakly connected network, this is a set of arcs whose removal decomposes the network into exactly two components. When the cutset separates node LaTeX: s from node LaTeX: t, its value is the sum of the capacities of its arcs from the component containing LaTeX: s to the one containing LaTeX: t.

Cutting plane

A hyperplane whose halfspace cuts off a particular point, such as a non-integer extreme point solution of a linear program relaxation of an integer program. The cut is such that it does not eliminate any solution to the original problem. For example, suppose we want to solve

LaTeX: \max \{x: (x,y) \in \{0,1\}^2, \; 2x + 2y \le 1 \}.

The only feasible point is LaTeX: (0, 0). The LP relaxation is

LaTeX: \max \{x: (x,y) \in [0,1]^2, \; 2x + 2y \le 1\},

and an optimal solution is at LaTeX: (1/2, 0). One cutting plane is LaTeX: \{(x,y): x + y \le 1/3\}. A deeper one is LaTeX: \{(x,y): x + y \le 0\}. (When adding a cutting plane to the LP, a new relaxation is obtained, and the solution is closer to the IP solution.)
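The small example can be checked directly in a few lines:

```python
# Feasible integer points of the example and the proposed cut x + y <= 1/3.
ip_points = [(x, y) for x in (0, 1) for y in (0, 1) if 2 * x + 2 * y <= 1]
cut = lambda x, y: x + y <= 1 / 3

print(ip_points)                              # [(0, 0)]: the only feasible point
print(all(cut(x, y) for x, y in ip_points))   # the cut keeps it: True
print(cut(0.5, 0.0))                          # ...but cuts off the LP optimum: False
```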

Cutting stock problem

Determine a way to cut standard sheets of material into various shapes (like clothes parts) to minimize waste. This is a (linear) integer programming model: patterns are specified, and LaTeX: A_{i, j, k} is the amount of LaTeX: i-th stock (e.g., sheet or roll of material) used to generate the LaTeX: j-th output by the LaTeX: k-th pattern. Then, let LaTeX: x_k be the level of the LaTeX: k-th pattern used and LaTeX: y = Ax. Thus, LaTeX: \textstyle s_i = \sum_j y_{i,j} is the amount of the LaTeX: i-th stock used, which is limited by its availability: LaTeX: s \le S; and LaTeX: \textstyle v_j = \sum_i y_{i, j} is the amount of LaTeX: j-th output generated, which is required to be in some range, say LaTeX: L \le v \le U (allowing some demand overruns or underruns). Some models seek to minimize the total waste: LaTeX: \textstyle \sum_i S_i - s_i. Other models consider cost too. The most common problems are 2-dimensional (odd shapes from sheets of material); the 1-dimensional case is called the trim problem. In the latter case, the stock index LaTeX: i is not needed. For example, consider a stock of rolls of paper with a given width, which must be slit into rolls of various widths. Then, LaTeX: A_{j, k} is how much of a stock roll is used by the LaTeX: k-th pattern to generate a roll of the LaTeX: j-th width. Moreover, LaTeX: \textstyle y_j = \sum_k A_{j, k} x_k is the total amount of stock used by all patterns to generate rolls of the LaTeX: j-th width.

Cyclic descent

An algorithm that seeks to optimize a multivariate function by optimizing each coordinate with a univariate optimization technique (keeping the other coordinates fixed). This is repeated until a fixed point is reached. In general, it is possible for such a fixed point not to be an optimum (even locally) because a simultaneous change in variables could result in an improvement. An example is given by:

LaTeX: f(x, y) = (y - x^2)(y - 2x^2) \; \mbox{ at } \; (x, y) = (0,0).

LaTeX: f(0, y) has a minimum at LaTeX: y=0, and LaTeX: f(x, 0) has a minimum at LaTeX: x=0. However, LaTeX: f decreases with simultaneous change, LaTeX: y=(3/2)x^2. Along this parabola, LaTeX: f(x,y) = -(x^4)/4, which is negative for LaTeX: x nonzero. Thus, LaTeX: 0 is not a local minimum of LaTeX: f. For further discussion about this example, see Myths and Counter Examples.
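The example is easy to verify numerically:

```python
# f has a minimum at 0 along each coordinate axis, yet is negative
# along the parabola y = (3/2) x^2, so (0, 0) is not a local minimum.
def f(x, y):
    return (y - x**2) * (y - 2 * x**2)

# coordinate-wise, (0, 0) looks like a minimizer ...
assert all(f(0.0, y) >= f(0.0, 0.0) for y in (-0.1, -0.01, 0.01, 0.1))
assert all(f(x, 0.0) >= f(0.0, 0.0) for x in (-0.1, -0.01, 0.01, 0.1))

# ... but along the parabola, f(x, (3/2) x^2) = -(x^4)/4 < 0
x = 0.01
print(f(x, 1.5 * x**2))     # negative, approximately -(0.01**4)/4
```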


Cycling

In linear programming this means revisiting a basis, mainly referring to a simplex method, so that the algorithm would cycle through the same sequence of bases, not converging to an optimal one. See the cycling supplement.

DFP method

This is a method to solve an unconstrained nonlinear program that proceeds as follows.

  1. Start with any symmetric, negative definite matrix, say LaTeX: H (e.g., LaTeX: -I), and any point, say LaTeX: x. Compute LaTeX: g=\nabla f(x), and set each of the following:
  2. direction: LaTeX: d = -Hg.
  3. step size: LaTeX: s \in \arg\!\max \{ f(x + td): t \ge 0\}.
  4. change in position: LaTeX: p = sd.
  5. new point and gradient: LaTeX: x' = x + p and LaTeX: g' = \nabla f(x').
  6. change in gradient: LaTeX: q = g' - g.
  7. Replace LaTeX: x with LaTeX: x' and update LaTeX: H by the DFP update to complete the iteration.

DFP update

This is a way to update an approximation of an inverse Hessian, used for unconstrained optimization. Using the notation in the DFP method, the update is:

LaTeX: H' = H + \frac{pp^T}{p^Tq} - \frac{H q q^T H}{q^T H q}.

Note: LaTeX: \textstyle pp^T is a rank 1 matrix (the outer product of the vector LaTeX: p with itself).
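A minimization sketch of the method and update on a convex quadratic (note this starts from a positive definite LaTeX: H, the usual form when minimizing, whereas the maximization form above starts negative definite; the exact line-search step formula used here is specific to quadratics, and the problem data are assumed):

```python
import numpy as np

def dfp_minimize(grad, x0, Q, iters=10):
    """DFP quasi-Newton sketch for a convex quadratic f(x) = x'Qx/2 - b'x.

    H (an inverse-Hessian estimate) starts positive definite, the search
    direction is d = -Hg, and t = -g'd / d'Qd is the exact step on a quadratic.
    """
    H = np.eye(len(x0))              # initial inverse-Hessian estimate
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-12:
            break
        d = -H @ g                   # quasi-Newton direction
        t = -(g @ d) / (d @ Q @ d)   # exact line-search step
        p = t * d                    # change in position
        x_new = x + p
        g_new = grad(x_new)
        q = g_new - g                # change in gradient
        # DFP update: H' = H + pp'/p'q - Hqq'H/q'Hq
        H = H + np.outer(p, p) / (p @ q) - (H @ np.outer(q, q) @ H) / (q @ H @ q)
        x, g = x_new, g_new
    return x

Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = dfp_minimize(lambda x: Q @ x - b, np.zeros(2), Q)
print(x_star)                        # matches np.linalg.solve(Q, b)
```

With exact line search on a quadratic, DFP terminates in at most LaTeX: n iterations (two here), at which point LaTeX: H equals the true inverse Hessian.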

Damped Newton method

See Modified Newton method.

Dantzig-Wolfe decomposition

The principle applied here is that the total problem consists of:

  1. subprograms corresponding to its almost independent parts;
  2. master program that ties together the subprograms.
In linear programming this applies to the following block-diagonal structure, linked by master constraints.

           | Master constraints |
           | B1                 |
           |      B2            |
           |           ...      |
           |                Bk  |

The procedure operates on the master LP with blocks used to generate extreme points as part of the pricing.

Dead end elimination

Same as fathoming. Used by scientists who have combinatorial optimization problems, this is more descriptive than the older term of "fathoming" a node. The idea is that a condition has been met that ensures that following some branch (i.e., setting values of variables) cannot lead to an optimal solution. This is often just a dominance property.

Decision variable

A decision variable represents a problem entity for which a choice must be made. For instance, a decision variable might represent the position of a queen on a chessboard, for which there are 100 different possibilities (choices) on a 10x10 chessboard or the start time of an activity in a scheduling problem. Each possible choice is represented by a value, hence the set of possible choices constitutes the domain that is associated with a variable.

Decomposition principle

The idea of decomposing a mathematical program into two or more sets of variables and associated constraints. The purpose is to separate some portion with special structure from the rest of the mathematical program.

Particular decompositions, such as Dantzig-Wolfe decomposition, appear elsewhere in this glossary.

Decoupling principle

Arises in sensitivity analysis, as a corollary to the Compatibility Theorem:

An equilibrium basis is compatible with the change LaTeX: (db, dc) if, and only if, it is compatible with both LaTeX: (db, 0) and LaTeX: (0, dc).


Degeneracy

The solution LaTeX: (x, y, s) to the primal-dual pair of linear programs:

LaTeX: \min \{c^Tx : x \ge 0, \; Ax = b\} and LaTeX: \max \{y^Tb: s \ge 0, \; yA + s = c\}

is degenerate if it is not strictly complementary ---i.e. LaTeX: x_i = s_i = 0 for some LaTeX: i. The pair LaTeX: (x_i, s_i) is primal degenerate if there is an optimal solution LaTeX: x' such that LaTeX: x'_i > 0. Similarly, the pair is dual degenerate if there is a dual optimal solution LaTeX: (y', s') such that LaTeX: s'_i > 0.

With regard to a basic optimal solution, such a solution is (primal) degenerate when some basic variable is at one of its bound values (canonically zero). A basic optimum is dual degenerate if one of its nonbasic variables has a zero reduced cost. Geometrically, this corresponds to a degenerate polyhedron. Suppose we have LaTeX: \{x: Ax \le b\} (where LaTeX: A is LaTeX: m by LaTeX: n). This polyhedron is degenerate if there exists an extreme point that is an element of the intersection of more than LaTeX: n hyperplanes. For example, a pyramid in LaTeX: \mathbb{R}^3 is degenerate because four planes meet at its apex.

Algorithmically, degeneracy can cause cycling in the simplex method.

Degeneracy graph

An undirected graph, where the nodes represent bases and edges their adjacency. The degeneracy graph is a way to probe deeply into a degenerate extreme point (represented by more than one basic feasible solution). Among other things, it reveals good pivot steps through the point en route to an optimum, and it provides a clear, efficient approach to sensitivity analysis.

Here are some terms for a degeneracy graph associated with a (basic) solution, LaTeX: x:

  • degeneracy power of LaTeX: x: the number of bases corresponding to LaTeX: x.
  • internal node: a node whose neighbors are all in this degeneracy graph.
  • transition column: a tableau column such that a pivot exists for which that column enters the basis, and the new basis corresponds to a transition node.
  • transition node: a node with at least one neighbor outside this degeneracy graph (so a pivot moves to a different solution).
  • Transition Node Pivoting (TNP) rule: a lexicographic rule using a special column to avoid internal nodes (but cycling is possible).

Degenerate polyhedron

See Degeneracy

Degree-2 inequality

Arises in mixed integer linear programming models for combinatorial programs on graphs, where each row of the LaTeX: A matrix has 2 non-zeroes (usually 1's), as in the node packing problem.

Degree of difficulty

See geometric program.


Density

The proportion of nonzeroes; see also sparsity.

Detached coefficient form

Given the system LaTeX: Ax=b, we augment the objective equation: LaTeX: c^{T}x - z = 0. Then, detaching the variables, the form is:

LaTeX: \left[ \begin{array}{rrr}
A & 0 & b \\ c & {-1} & 0
\end{array} \right].

Upon multiplying LaTeX: Ax = b by the inverse of a basis, say LaTeX: B, where LaTeX: A=[B \; N], the detached coefficient form becomes

LaTeX: \left[ \begin{array}{rrrr}
I & B^{-1}N & 0 & B^{-1}b \\
c_B & c_N & -1 & 0 
\end{array} \right],

where LaTeX: c=[c_{B} \; c_{N}] (conformal with the partition of the columns of LaTeX: A). Now if we subtract LaTeX: c_B times the first LaTeX: m rows from the last row, this becomes

LaTeX: \left[ \begin{array}{rrrr}
I & B^{-1}N & 0 & B^{-1}b \\
0 & c_N - c_B B^{-1}N & -1 & -c_B B^{-1} b
\end{array} \right].

Recognizing that LaTeX: B^{-1}b is the vector of basic levels, and the associated objective value LaTeX: (z) is LaTeX: c_B B^{-1}b, we can drop the first LaTeX: m columns plus column LaTeX: n+1 (corresponding to LaTeX: z) and form the tableau associated with this basis (adding labels to identify the basic variables in the rows and nonbasic variables in the columns).
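The reduction can be sketched numerically; the LP data below are an assumed toy example, with the first two columns taken as the basis:

```python
import numpy as np

# min c'x  s.t.  Ax = b, x >= 0, with basis = columns [0, 1]
A = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
b = np.array([4.0, 3.0])
c = np.array([1.0, 2.0, 1.0])

basic, nonbasic = [0, 1], [2]
B, N = A[:, basic], A[:, nonbasic]
cB, cN = c[basic], c[nonbasic]

Binv = np.linalg.inv(B)
levels = Binv @ b                    # B^{-1} b : the basic variable levels
reduced = cN - cB @ Binv @ N         # c_N - c_B B^{-1} N : reduced costs
z = cB @ levels                      # objective value c_B B^{-1} b

print(levels, reduced, z)
```

These three quantities are exactly the entries retained when the first LaTeX: m columns and the LaTeX: z column are dropped to form the tableau.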

Dichotomous search

This finds the maximum of a unimodal function on an interval, LaTeX: (a, b), by evaluating points placed near the center, approximating the bisection method. With LaTeX: \varepsilon being a small positive value, let LaTeX: x = (a+b)/2 - \varepsilon and LaTeX: y = (a+b)/2 + \varepsilon. Then, if LaTeX: f(x) < f(y), the new interval of uncertainty is LaTeX: (x, b); if LaTeX: f(x) > f(y), it is LaTeX: (a, y); if LaTeX: f(x) = f(y), it is LaTeX: (x, y). With LaTeX: N=2M function evaluations, the interval of uncertainty can be reduced to within LaTeX: (b-a)r, where

LaTeX: r = 2^{-M} + [1 - 2^{-M}]\varepsilon .

The same reduction occurs with LaTeX: N=2M+1, as an even number of evaluations is required. For example, with LaTeX: N=20, LaTeX: 2^{-10}=0.0009766, so the interval reduction is LaTeX: 0.0009766 + 0.9990234\varepsilon. See Fibonacci search for a more efficient method.
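A sketch of the search, maximizing an assumed unimodal test function:

```python
def dichotomous_search(f, a, b, evals=20, eps=1e-5):
    """Locate the maximum of a unimodal f on (a, b) by paired probes
    just left and right of the midpoint; uses evals = 2M evaluations."""
    for _ in range(evals // 2):
        mid = (a + b) / 2
        x, y = mid - eps, mid + eps
        if f(x) < f(y):
            a = x                    # maximum lies in (x, b)
        elif f(x) > f(y):
            b = y                    # maximum lies in (a, y)
        else:
            a, b = x, y              # maximum lies in (x, y)
    return a, b

# unimodal test function with its maximum at t = 2
lo, hi = dichotomous_search(lambda t: -(t - 2.0) ** 2, 0.0, 10.0)
print(lo, hi)                        # a small interval bracketing 2.0
```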

Diet problem

Select levels of food consumption so as to minimize total cost subject to nutrient requirements. As a linear program,

LaTeX: \min \{c^Tx : Ax \ge b, \; x \ge 0\},

where LaTeX: x_j is the level of food LaTeX: j at unit cost LaTeX: c_j. The LaTeX: i-th nutrient requirement (e.g., amount of vitamin C) is LaTeX: b_i, and LaTeX: A_{i, j} is the amount of nutrient LaTeX: i in each unit of food LaTeX: j.
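A toy instance (all data assumed) can be solved by enumerating candidate vertices, since an LP optimum occurs where constraint lines meet; real instances would use an LP solver:

```python
import itertools
import numpy as np

# minimize c'x  s.t.  Ax >= b, x >= 0 : two foods, two nutrients
c = np.array([2.0, 3.0])             # unit costs of the foods
A = np.array([[1.0, 2.0],            # nutrient 1 per unit of each food
              [2.0, 1.0]])           # nutrient 2 per unit of each food
b = np.array([4.0, 4.0])             # nutrient requirements

# candidate vertices: intersections of {nutrient lines, the two axes}
lines = [(A[0], b[0]), (A[1], b[1]),
         (np.array([1.0, 0.0]), 0.0), (np.array([0.0, 1.0]), 0.0)]
best = None
for (a1, b1), (a2, b2) in itertools.combinations(lines, 2):
    M = np.array([a1, a2])
    if abs(np.linalg.det(M)) < 1e-12:
        continue                     # parallel lines: no vertex
    x = np.linalg.solve(M, [b1, b2])
    if (x >= -1e-9).all() and (A @ x >= b - 1e-9).all():
        if best is None or c @ x < c @ best:
            best = x

print(best, c @ best)                # the cheapest feasible diet
```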


Digraph

Directed graph.

Dijkstra algorithm

See labeling algorithm.


Dimension

  • Of a set: the dimension of the smallest affine space that contains the set. The dimension of an affine space is LaTeX: k if the maximum number of affinely independent points in the set is LaTeX: k+1. Of particular interest is the column space of a matrix LaTeX: A,
    LaTeX: \mbox{col}(A) = \{x: x=Ay \mbox{ for some } y \in \mathbb{R}^n\}.
    The dimension of this is the rank of LaTeX: A. Another is the row space of LaTeX: A,
    LaTeX: \mbox{row}(A) = \{x: x=A^Tu \mbox{ for some  } u \in \mathbb{R}^m\}.
    The dimension of this is also the rank of LaTeX: A.
  • Of an expression: the units of measurement (e.g., tons of apples, meters of rope, hours of labor). In a relation, such as LaTeX: y = ax + b, the units of LaTeX: y, LaTeX: ax and LaTeX: b must all be the same. Dimensional analysis is concerned with determining such consistency and inferring whatever units are missing.

Diophantine equations

A system of equations whose variables must be integer-valued. A famous one is in Fermat's Last Theorem:

LaTeX: x^n + y^n = z^n.

In general, the linear Diophantine equations, LaTeX: Ax = b for LaTeX: x \in \mathbb{Z}^n, can be solved in polynomial time. However, with bounds, LaTeX: 0 \le x \le u, the problem is NP-complete.

Directed tree search

A class of algorithms designed to systematically search a decision space, where nodes correspond to partial solutions (sometimes called "states"). Examples are branch and bound and implicit enumeration. Also see heuristic search.

Directional arc consistency

Directional arc consistency is a form of arc consistency.

An ordering is associated with the variables in the constraint network, and constraints are made arc consistent only in the direction of the ordering. For instance, if LaTeX: x_i \le_{order} x_j and constraint LaTeX: c_{ij} exists, directional arc consistency enforces that each value in the domain of LaTeX: x_i is consistent with at least one value in the domain of LaTeX: x_j but not necessarily the other way around.

Directional derivative

The limit (if it exists) of the rate of change along a specified direction, say LaTeX: h,

LaTeX: \lim_{t \rightarrow 0^+} \frac{f(x+th) - f(x)}{t}.

In particular, the LaTeX: i-th partial derivative is with LaTeX: h=e_i. (Note that the directional derivative, as defined above, depends on the scale: the derivative for LaTeX: kh is LaTeX: k times the derivative for LaTeX: h. To avoid scale dependence, some authors require LaTeX: \|h\| = 1.) Recently, some people have called this a B-derivative (or Bouligand derivative), and functions that have directional derivatives in all feasible directions are B-differentiable. Some require the convergence to be uniform.
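A one-sided finite-difference estimate illustrates the definition and its scale dependence (the test function is assumed):

```python
def directional_derivative(f, x, h, t=1e-6):
    """One-sided difference approximation of
    lim_{t -> 0+} (f(x + t h) - f(x)) / t."""
    return (f([xi + t * hi for xi, hi in zip(x, h)]) - f(x)) / t

f = lambda v: v[0] ** 2 + 3 * v[1]      # gradient at (1, 1) is (2, 3)
d1 = directional_derivative(f, [1.0, 1.0], [1.0, 0.0])
d2 = directional_derivative(f, [1.0, 1.0], [2.0, 0.0])
print(d1, d2)    # ~2 and ~4: doubling h doubles the derivative
```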

Discount rate

Also a discount factor. This accounts for the time value of money and arises naturally in financial models, such as a portfolio selection problem. A discount rate of 7% means $1 earned a year from now has a present value of approximately $0.93 (1/1.07). If $1 is earned n years from now, and the discount rate is LaTeX: r, the present value is $LaTeX: 1/(1+r)^n. In continuous-time models, there are variations, such as defining the present value of LaTeX: K dollars at time LaTeX: t to be LaTeX: K e^{-rt}. In infinite horizon dynamic programming, the discount factor serves to make value iteration a contraction map. In that case, the fixed point of the stationary equation,

LaTeX: F(s) = \mbox{opt} \{r(x, s) + a F(T(s, x)): x \in X(s)\},

is obtained by successive approximation - i.e., value iteration for an infinite horizon DP. This converges to a unique fixed point, LaTeX: F, if LaTeX: 0 < a < 1. Here, the DP notation is used, where LaTeX: a is the discount factor.
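A minimal value-iteration sketch of this fixed-point computation (the two-state, two-action problem data are assumed for illustration):

```python
def value_iteration(reward, transition, states, actions, a=0.9, tol=1e-10):
    """Successive approximation of F(s) = max_x { r(x, s) + a F(T(s, x)) }."""
    F = {s: 0.0 for s in states}
    while True:
        F_new = {s: max(reward[s][x] + a * F[transition[s][x]] for x in actions)
                 for s in states}
        if max(abs(F_new[s] - F[s]) for s in states) < tol:
            return F_new
        F = F_new

states, actions = ["s0", "s1"], [0, 1]
reward = {"s0": {0: 1.0, 1: 0.0}, "s1": {0: 0.0, 1: 2.0}}
transition = {"s0": {0: "s0", 1: "s1"}, "s1": {0: "s0", 1: "s1"}}
F = value_iteration(reward, transition, states, actions)
print(F)   # staying in s1 earns 2 per period, worth 2/(1 - 0.9) = 20
```

Because LaTeX: a = 0.9 < 1, each sweep shrinks the error by a factor of LaTeX: a, so the iteration converges to the unique fixed point.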

Discrete program

A mathematical program in which the variables are restricted to a discrete set, commonly the integers. Subcases include integer programs and combinatorial programs.

Disjunctive constraint

This is a global constraint enforcing the fact that two activities LaTeX: a_i and LaTeX: a_j requiring the same unary resource cannot overlap in time. Therefore, either LaTeX: a_i precedes LaTeX: a_j or LaTeX: a_j precedes LaTeX: a_i. In general, the disjunctive constraint achieves arc consistency on the formula: LaTeX: end(a_i)\leq start(a_j)  \vee end(a_j)\leq start(a_i).

For discrete (cumulative) resource capacity, the disjunctive constraint holds when LaTeX: c_i +c_j > C where LaTeX: c_i is the capacity requirement of task LaTeX: a_i, LaTeX: c_j is the capacity requirement of task LaTeX: a_j and LaTeX: C the capacity of the resource.

See also the cumulative constraint.

Disjunctive normal form

A logical expression that is written as a disjunction,

LaTeX: D_1 \vee D_2 \vee \ldots \vee D_m,

where LaTeX: \vee is the disjunction (logical inclusive or). Each LaTeX: D_i has the form

LaTeX: L_{i_1} \wedge L_{i_2} \wedge \ldots \wedge L_{i_n},

where LaTeX: \wedge is the conjunction (logical and) and each LaTeX: L_i is a literal – a proposition or its negation.
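For instance, LaTeX: (p \wedge q) \vee (\neg p \wedge \neg q) (logical equivalence of LaTeX: p and LaTeX: q) is in disjunctive normal form; a quick truth-table check:

```python
from itertools import product

def dnf(p, q):
    # (p AND q) OR ((NOT p) AND (NOT q)): two conjunctions of literals
    return (p and q) or ((not p) and (not q))

table = [dnf(p, q) for p, q in product([False, True], repeat=2)]
print(table)   # true exactly when p == q
```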

Disjunctive program

In its most general form, this seeks a solution to one of a finite list of mathematical programs:

LaTeX: x^* \in \mbox{argopt} \{f_k(x): x \in X_k \},

for some LaTeX: k=1,...,K. Usually, they have the same objective functions, LaTeX: f_k=f for all LaTeX: k, so the mathematical program with disjunctive constraints has the following form:

LaTeX: \mbox{opt} \left\{ f(x): 
x \in X, \left(g_1(x) \le 0 \mbox{ or } g_2(x) \le 0 \mbox{ or } 
\ldots \mbox{ or }  g_k(x) \le 0 \right) \right\}.

This can be reformulated as a mixed-integer program, using binary variables to represent the disjunction.
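As a sketch of that reformulation (assuming each LaTeX: g_i is bounded above on LaTeX: X by some constant LaTeX: M_i), binary variables LaTeX: z_i select which disjunct must hold:

LaTeX: \mbox{opt} \left\{ f(x): x \in X, \; g_i(x) \le M_i (1 - z_i) \mbox{ for } i=1, \ldots, k, \; \textstyle\sum_{i=1}^k z_i \ge 1, \; z \in \{0,1\}^k \right\}.

Setting LaTeX: z_i = 1 forces LaTeX: g_i(x) \le 0, while LaTeX: z_i = 0 relaxes that disjunct; the sum constraint forces at least one disjunct to hold.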

Divide and conquer

An algorithm strategy, such as dynamic programming, that divides a problem into parts. Suppose LaTeX: T(n) is the time complexity for a problem of size LaTeX: n, and that dividing the problem into LaTeX: s subproblems, each of size LaTeX: n/b, takes LaTeX: S(n) additional time. Then, LaTeX: T(n) = sT(n/b) + S(n) is called the divide-and-conquer recurrence relation. If LaTeX: S(n)=c > 0 (a constant) and LaTeX: b > 1, LaTeX: T(n) = O(n^{\log_b(s)}) for LaTeX: n=b, b^2, b^3, \ldots, and LaTeX: T(n) = O(\log n) if LaTeX: s=1. For example, if a bisection method is used (as in searching a sorted list), the divide-and-conquer recurrence is LaTeX: T(n) = T(n/2) + 2 for LaTeX: n=2, 4, 8, 16, \ldots. This implies LaTeX: T(n) = O(\log n) since LaTeX: s=1 and LaTeX: b=2.
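The bisection example can be sketched with a comparison counter (the list data are assumed):

```python
def bisection_count(sorted_list, target):
    """Binary search illustrating the recurrence T(n) = T(n/2) + c:
    constant work per level, so O(log n) comparisons in total."""
    lo, hi, steps = 0, len(sorted_list), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_list[mid] < target:
            lo = mid + 1             # target is in the upper half
        else:
            hi = mid                 # target is in the lower half
    found = lo < len(sorted_list) and sorted_list[lo] == target
    return found, steps

found, steps = bisection_count(list(range(1024)), 700)
print(found, steps)   # found in roughly log2(1024) = 10 comparisons
```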


Domain

The domain of a variable is a set of all possible values that can be assigned to the variable. When the domain contains numbers only, the variables are called numerical variables. The domain of a numerical variable may be further restricted to integers, rational numbers or real numbers. When the domain contains boolean values only, the variables are called boolean variables. When the domain contains an enumerated type of objects, the variables are called symbolic variables (e.g. a variable that represents a day of the week).

A common propagation technique in constraint programming is to dynamically remove values from the domain of a variable when it can be inferred that the value does not belong to any solution, given the set of variable assignments that have already been made.


Dominance

This is used in many contexts, but the general meaning is that something is uniformly better than something else. For example, consider two activities in a linear program, say LaTeX: j and LaTeX: k, such that:

  • LaTeX: j has greater cost: LaTeX: c_j \ge c_k
  • LaTeX: j produces less of each requirement: LaTeX: A_{i, j} \le A_{i, k} for LaTeX: i such that we require LaTeX: A_{i,*}x \ge b_i
  • LaTeX: j consumes more of each resource: LaTeX: A_{i, j} \ge A_{i, k} for LaTeX: i such that we require LaTeX: A_{i,*}x \le b_i
  • LaTeX: j produces or consumes at the same rate of goods to be conserved: LaTeX: A_{i, j} = A_{i, k} for LaTeX: i such that we require LaTeX: A_{i,*}x = b_i

Then, activity LaTeX: k dominates  activity LaTeX: j.

Doubly stochastic matrix

A non-negative matrix such that each row and each column sums to 1.


Dual

Another mathematical program with the property that its objective is always a bound on the original mathematical program, called the primal. Suppose the dual is LaTeX: \min \{ F(y): y \in Y\}. Then, LaTeX: F(y) \ge f(x) for all feasible LaTeX: x in the primal and all LaTeX: y in Y. This immediately implies that if the primal is feasible, the dual cannot be unbounded, and vice versa: if the dual is feasible, the primal cannot be unbounded. It also implies that if the dual is unbounded, the primal must be infeasible (and vice versa). A dual provides a sufficiency test for optimality, for if feasible LaTeX: x and LaTeX: y can be found such that LaTeX: f(x)=F(y), it follows that LaTeX: x is optimal in the primal and LaTeX: y is optimal in the dual. Weak duality is when this sufficient condition is all that can be guaranteed. Strong duality is when a finite optimal value to one problem ensures the existence of an optimal solution to the other and that their optimal objective values are equal. There are particular duals of significance.

Dual degeneracy

See Degeneracy.

Dual method

An algorithm for which each iterate is feasible in a dual mathematical program. An example is that the class of cutting plane methods, associated with the Lagrangian dual, is dual to the column generation method of Dantzig-Wolfe decomposition.

Dual norm

Given the norm LaTeX: L_p, where LaTeX: p \ge 1, its dual norm is LaTeX: L_q, where LaTeX: (1/p) + (1/q) = 1. Note, if LaTeX: p=1 then LaTeX: q=\infty, and if LaTeX: q=1 then LaTeX: p=\infty. This is based on Hölder's inequality.

Dual price

A dual variable associated with a primal constraint.


Duality

The study and use of dual mathematical programs.

Duality gap

Numerically, it is the difference between primal and dual objective values. The phrase by itself means the presence (or existence) of a gap -- i.e., that the value is not zero.

Duality theorems

These establish the fundamental duality relations (weak or strong). Several significant ones are given here.

Dynamic CSP

Dynamic CSPs deal with the problems that are subject to exogenous change over time. A changing problem is viewed as a sequence of (static) CSPs in which one problem in the sequence is transformed to another through restrictions (i.e. additions of constraints) and/or relaxations (removal of constraints). Various solution definitions exist, most dealing with some notion of robustness (i.e., a solution to one static CSP is likely to remain a solution after a dynamic change in the problem) or ease of modification (i.e., a solution to one static CSP can be easily modified to become a solution for the next one).

Dynamic program

Based on the Principle of Optimality, this was originally concerned with optimal decisions over time. For continuous time, it addresses problems in variational calculus. For discrete time, each period is sometimes called a stage, and the dynamic program (DP) is called a multi-stage decision process. Here is the Fundamental Recurrence Equation for an additive process:

LaTeX: F(t, s) = \mbox{opt} \{r(t, s, x) + aF(t', s'): x \in X(t, s) \mbox { and } s'=T(t, s, x)\},

where LaTeX: F(t, s) is the optimal total return upon reaching time LaTeX: t in state LaTeX: s, and LaTeX: x is decision variable(s); LaTeX: s, LaTeX: s' are state variables; LaTeX: t, LaTeX: t' are time; LaTeX: X(t, s) is the decision space (usually depends only on state); LaTeX: r(t, s, x) is the immediate return; LaTeX: T(t, s, x) is the state transform.

In words, the optimal return upon reaching time LaTeX: t in state LaTeX: s equals the optimal return from an immediate decision plus the optimal return for the remainder of the process upon the transition to state LaTeX: s'. This new state is a function of the current time, state, and decision. For discrete time, LaTeX: t'=t+1 for the forward equation, and LaTeX: t'=t-1 for the backward equation. The multiplier, LaTeX: a is generally a positive scalar, such as a discount factor to yield the present value of money. (For some problems LaTeX: a=1, in which case the infinite horizon model might be ill posed.) More generally, the process need not be additive, so that '+' could be another composition operation.

A decision rule is a function, LaTeX: x^*(t, s), that specifies an optimal value of LaTeX: x upon reaching time LaTeX: t in state LaTeX: s. For a completely deterministic process, backtracking is used to obtain the usual form of an optimal solution from a known initial or final state. The process could be stochastic, in which case LaTeX: r and LaTeX: F are expected values, and the state transform is random with probability distribution, LaTeX: P[T(t,s,x) = s' | s,x]. Thus, LaTeX: F(t',s') is replaced by LaTeX: \sum_{s'} F(t',s')P[T(t,s,x)=s' | s,x].

DP is a technique that applies to static problems too. For example, consider the following recurrence equation for the knapsack problem:

LaTeX: F(n, s) = \max \{c(n) x + F(n-1, s-a(n)x): x \in \{0, 1, \ldots, \lfloor s/a(n) \rfloor \} \}.

In words, this says that the max total return with LaTeX: n objects having total knapsack space LaTeX: s equals the max choice, LaTeX: x of how much of the LaTeX: n-th object we put into the knapsack, with immediate return LaTeX: c(n), plus the max total return from the remaining LaTeX: n-1 objects with the remaining space, LaTeX: s - a(n)x. The value of LaTeX: x (the stage decision variable) is any non-negative integer up to the space allowed: LaTeX: \lfloor s/a(n) \rfloor is the max number of objects of this type that could fit in the knapsack. There could also be explicit bounds on LaTeX: x, such as the 0-1 knapsack problem, in which case the policy set, LaTeX: X(n,s) simply has only those values.
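A sketch of the recurrence for the unbounded knapsack, iterating the stage index LaTeX: n over the items (the values, sizes, and capacity are assumed example data):

```python
def knapsack(c, a, S):
    """DP for the recurrence
    F(n, s) = max { c[n]*x + F(n-1, s - a[n]*x) : 0 <= x <= floor(s/a[n]) },
    with items indexed 0..len(c)-1 and total knapsack space S."""
    F = [0] * (S + 1)                     # F(0, s): no items yet
    for cn, an in zip(c, a):
        F_next = []
        for s in range(S + 1):
            # choose how many copies x of this item to pack
            F_next.append(max(cn * x + F[s - an * x]
                              for x in range(s // an + 1)))
        F = F_next
    return F[S]

# values c, sizes a, capacity S
print(knapsack(c=[60, 100, 120], a=[10, 20, 30], S=50))   # 300
```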

Economic order quantity

Abbreviated EOQ. This is the level of inventory to order that minimizes the sum of holding and ordering costs. The inventory function, LaTeX: I(t), is the following periodic sawtooth function, where LaTeX: T is the time between orders, and LaTeX: Q is the ordering quantity:

LaTeX: I(t) = Q - dt \;\mbox{ for }\; 0 \le t \le T,

where LaTeX: d is the rate of demand (inventory units per time units), and LaTeX: I(t) = I(t-T) for LaTeX: t > T. The inventory becomes zero at LaTeX: T = Q/d, which requires a new order of LaTeX: Q units. The model is thus:

LaTeX: \min \; (1/2) hdT + K/T,

where LaTeX: h is the holding cost (currency per time units), so LaTeX: (1/2) hdT is the average holding cost, and LaTeX: K is the fixed cost of ordering, so LaTeX: K/T is the average ordering cost. The solution is LaTeX: T^* = (2K/hd)^{1/2}, which yields the Economic Order Quantity (EOQ): LaTeX: Q^* = (2Kd/h)^{1/2}. See the more general production scheduling problem.
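The closed form is easy to check numerically (the cost data below are assumed):

```python
import math

def eoq(K, h, d):
    """Closed-form EOQ: T* = sqrt(2K/(h d)) and Q* = d T* = sqrt(2Kd/h)."""
    T = math.sqrt(2 * K / (h * d))
    return T, d * T

def avg_cost(T, K, h, d):
    """Average cost per unit time: holding (1/2) h d T plus ordering K/T."""
    return 0.5 * h * d * T + K / T

# K = $100 per order, h = $2 per unit per year, d = 400 units per year
T_star, Q_star = eoq(K=100, h=2, d=400)
print(T_star, Q_star)                       # 0.5 years, 200 units

# sanity check: nearby cycle lengths cost more
assert avg_cost(T_star, 100, 2, 400) <= avg_cost(1.1 * T_star, 100, 2, 400)
assert avg_cost(T_star, 100, 2, 400) <= avg_cost(0.9 * T_star, 100, 2, 400)
```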

Edge-finding rule

The edge-finding rule is a constraint propagation technique that consists of deducing whether an activity LaTeX: a_i can, cannot, or must be executed before or after a set of activities LaTeX: \Omega, a_i \notin \Omega that all (including LaTeX: a_i) require the same resource. Two types of conclusions can then be drawn: new ordering relations (“edges” in the graph representing the possible orderings of activities) and new time-bounds (i.e., tightening of the earliest and latest start and end times of activities).

An edge-finding algorithm is a procedure that performs all such deductions.

Effective domain

The set of points at which the value of a function is finite.

Efficient frontier

The collection of efficient points; see multiple objectives.

Eigenvalue

If a (scalar) value, LaTeX: \lambda, satisfies LaTeX: Ax = \lambda x for some vector, LaTeX: x \neq 0, it is an eigenvalue of the matrix LaTeX: A, and LaTeX: x is an eigenvector. In mathematical programming this arises in the context of convergence analysis, where LaTeX: A is the hessian of some merit function, such as the objective or Lagrangian.

Elastic program

Same as Phase I; the artificial variables are called elastic, with the idea that the constraints can "stretch".

Element constraint

A global constraint that is true iff a variable LaTeX: z is the LaTeX: y-th element of an array LaTeX: X. More specifically, it involves a 1-dimensional variable array LaTeX: X = \langle x_1, \dots, x_n \rangle, an integer variable LaTeX: y and a variable LaTeX: z, where LaTeX: X[y] = z, typically written as "element(X,y,z)".

Elementary matrix

A matrix formed by performing a single row operation on the identity matrix. These arise in linear programming, particularly for the product form of the basis: LaTeX: B = E_1 E_2 ... E_m, where each LaTeX: E_i is an elementary matrix.

Elementary simplex method

See simplex method.

Elementary vector

Let LaTeX: V be a subspace of LaTeX: \mathbb{R}^n. For LaTeX: v \in V, let LaTeX: S(v) denote its support set: LaTeX: \{j: v_j \neq 0\}. Then, a nonzero LaTeX: v is an elementary vector of LaTeX: V if there does not exist LaTeX: v' \in V such that LaTeX: v' \neq 0 and LaTeX: S(v') is a proper subset of LaTeX: S(v). This extends to where LaTeX: V is not a subspace, such as the collection of non-negative vectors in LaTeX: \mathbb{R}^n, in which case LaTeX: S(v) is the set of coordinates with positive value. This is of particular importance in defining the optimal partition of an LP solution.

Ellipsoid

An ellipsoid centered at LaTeX: c is the set

LaTeX: E(c,Q) = \{x: (x-c)^T Q (x-c) \le 1\},

where LaTeX: Q is a symmetric, positive definite matrix.

Ellipsoid method

This algorithm seeks to solve LaTeX: Ax \le b by iterations that begin with LaTeX: x=0 and LaTeX: Q=\mbox{diag}(2^{-L}), where LaTeX: L is a measure of the smallest datum that can be represented -- i.e., LaTeX: x is a solution if, and only if, it satisfies LaTeX: Ax \le b + 2^{-L}. If the current iterate, LaTeX: x, does not satisfy this, a new pair LaTeX: (x', Q') is defined by choosing one of the violated inequalities, say LaTeX: A_{i,*}x > b_i. Then, let LaTeX: v = A_{i,*}^T, and apply the following rules:

  1. LaTeX:  x' = x - \frac{Qv}{(n+1)\sqrt{v^T Q v}}
  2. LaTeX:  Q' = \frac{n^2}{n^2 - 1} \left( Q - \frac{2}{n+1} \, \frac{Q v v^T Q}{v^T Q v} \right)
Now LaTeX: E(x',Q') defines an ellipsoid centered at LaTeX: x', and it contains all the feasible points in LaTeX: E(x,Q). After at most LaTeX: O(n^2 L) iterations, the algorithm terminates with a solution, or with the fact that no solution exists.

By treating the objective as a constraint, say LaTeX: c^T x \le c^Tx^* - e and/or LaTeX: c^T x \ge c^Tx^* + e, the ellipsoid method can be used to solve a linear program. Its significance is that it was the first algorithm for linear programming shown to have polynomial time complexity.
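A small NumPy sketch of these updates follows; it assumes a feasible point lies in a ball of radius R about the origin, and the stopping tolerance plays the role of the LaTeX: 2^{-L} perturbation (the test instance is hypothetical):

```python
import numpy as np

def ellipsoid_feasibility(A, b, R=10.0, tol=1e-6, max_iter=10000):
    """Sketch of the ellipsoid method for finding x with A x <= b (n >= 2).

    Starts from the ball of radius R about the origin, assumed to contain a
    feasible point.  Each step cuts the current ellipsoid with a violated
    inequality and forms the smallest ellipsoid containing the feasible half.
    Returns x with A x <= b + tol, or None if the iteration budget runs out.
    """
    m, n = A.shape
    x = np.zeros(n)
    Q = (R ** 2) * np.eye(n)          # shape matrix of the current ellipsoid
    for _ in range(max_iter):
        viol = A @ x - b
        i = int(np.argmax(viol))
        if viol[i] <= tol:
            return x                  # all inequalities satisfied to tolerance
        v = A[i]                      # normal of a most-violated inequality
        Qv = Q @ v
        vQv = v @ Qv
        x = x - Qv / ((n + 1) * np.sqrt(vQv))
        Q = (n ** 2 / (n ** 2 - 1.0)) * (Q - (2.0 / (n + 1)) * np.outer(Qv, Qv) / vQv)
    return None
```

Note the factor LaTeX: n^2 - 1 in the update, so this sketch assumes LaTeX: n \ge 2.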

Elliptope

This arises in semi-definite programming. It is the set of all real, square symmetric matrices whose diagonal entries are all equal to one and whose eigenvalues are all non-negative. The dimension of the set is LaTeX: n(n-1)/2 (where the matrix is LaTeX: n \times n). For example, consider LaTeX: n=2:

LaTeX: M = \left[ \begin{array}{cc} 1 & a \\ a & 1 \end{array} \right].

Then, LaTeX: M is in the elliptope for LaTeX: n=2 if, and only if, LaTeX: -1 \le a \le 1, i.e. it is diagonally dominant (note the dimension is 1). An elliptope is also called a set of correlation matrices.

Epigraph

This is the region on or above the graph of a function LaTeX: f:

LaTeX: \mbox{epi}(f,X) = \{(x, z): x \in X, \; z \ge f(x)\}.

Equilibrium basis

In linear programming this is a basis that is optimal in both the primal and dual. It arises in compatibility theory and degeneracy graphs.

Equivalent CSPs

Consider two constraint satisfaction problems LaTeX: P_1 and LaTeX: P_2 and a sequence LaTeX: X of their common variables. We say that LaTeX: P_1 and LaTeX: P_2 are equivalent w.r.t. LaTeX: X if

  • for every solution LaTeX: d to LaTeX: P_1 a solution to LaTeX: P_2 exists that coincides with LaTeX: d on the variables in X, and
  • for every solution LaTeX: e to LaTeX: P_2 a solution to LaTeX: P_1 exists that coincides with LaTeX: e on the variables in X.

Euclidean norm

See norm.

Euler-Lagrange equation

This is a necessary condition for LaTeX: y to solve the classical problem in variational calculus:

LaTeX:  \mbox{div}(F_{\zeta} (x, y(x), \nabla y(x))) = F_{\lambda}(x, y(x), \nabla y(x)),

where LaTeX: F(x, \lambda, \zeta): \Omega \times \mathbb{R} \times \mathbb{R}^n \rightarrow \mathbb{R}.

Evolutionary algorithm

This is a term that applies to any metaheuristic based on some biological metaphor that is a dynamical system whose evolution rules have a stochastic component. In particular, it includes genetic algorithms, scatter search, Particle Swarm Optimization, and some neural networks. It emerged as a unification of related computational methods, including cellular automata, which was conceived by Ulam and von Neumann in the 1940s. This unification is not universal. Some researchers cling to the GA metaphor but say that evolutionary algorithms have more operators; some say that the GA population is composed of bit strings, whereas evolutionary algorithms can have a broader range of population structures. It is the concept of evolution dynamics, which is population-based and stochastic, that distinguishes this class of algorithms from other methods of global optimization. Evolutionary computation uses the evolutionary algorithm as its foundation; typically, it is massively parallel.

Exact penalty function

A penalty function is exact if there exists a finite parameter value such that its maximum is a solution to the original mathematical program.

Existence of solution

This pertains to whether a mathematical program has a solution. One source of non-existence is that the feasible region is empty. Another is that it is unbounded, and it is possible to have a sequence of feasible points whose objective value diverges. In linear programming, these are the only sources of non-existence. In nonlinear programming, however, we can have asymptotes such that the objective is bounded but cannot achieve its infimum or supremum. See the Bolzano-Weierstrass theorem, which is fundamental. For linear programming, see theorems of the alternative.

Expanding subspace theorem

Let LaTeX: S_k be the subspace spanned by the Q-conjugate vectors LaTeX: d^0, d^1, \ldots, d^{k-1}. Let LaTeX: x^0 be any initial point in LaTeX: \mathbb{R}^n, and let LaTeX: x^k be generated by these conjugate directions with optimal line search:

LaTeX: \displaystyle x^{k+1} = x^k + s_k d^k \;\; \mbox{ and } \;\; s_k = -\frac{(Qx^k + c)^T d^k}{(d^k)^T Q d^k}.

Then, LaTeX: x^k minimizes LaTeX: \textstyle \frac{1}{2} x^{T} Q x + c^{T} x on the affine set, LaTeX: x^0 + S_k.
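The theorem can be observed numerically. The Python sketch below generates Q-conjugate directions with a Fletcher-Reeves style update (one common way to realize conjugate directions; the quadratic in the check is illustrative):

```python
import numpy as np

def conjugate_direction_min(Q, c, x0, num_steps):
    """Minimize (1/2) x^T Q x + c^T x by conjugate directions.

    Uses the optimal step s_k = -(Q x^k + c)^T d^k / ((d^k)^T Q d^k) and a
    Fletcher-Reeves update to generate Q-conjugate directions, so by the
    expanding subspace theorem x^k minimizes over x^0 + span{d^0,...,d^{k-1}}.
    """
    x = np.asarray(x0, dtype=float)
    g = Q @ x + c                     # gradient of the quadratic at x
    d = -g
    directions = []
    for _ in range(num_steps):
        s = -(g @ d) / (d @ Q @ d)    # optimal line search along d
        x = x + s * d
        directions.append(d)
        g_new = Q @ x + c
        d = -g_new + ((g_new @ g_new) / (g @ g)) * d
        g = g_new
    return x, directions
```

After LaTeX: n steps with an LaTeX: n \times n positive definite LaTeX: Q, the gradient vanishes, and successive directions satisfy LaTeX: (d^i)^T Q d^j = 0.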

Explicitly quasiconcave function

The negative of an explicitly quasiconvex function.

Explicitly quasiconvex function

A quasiconvex function that is also strictly quasiconvex. Its main property is that it rules out flat portions that cause the closure condition to fail. (Every convex function is explicitly quasiconvex.)

Exponent matrix

Powers of variables in posynomials in a geometric program.

Extended reals

Typically denoted by LaTeX: \mathbb{R}^*, this is the set of real numbers augmented with positive and negative infinity, i.e. LaTeX: \mathbb{R}^{*} = \mathbb{R} \cup \{\infty, -\infty\}.

Extensional constraint

Constraints on a set of variables LaTeX: X =\{x_1, \dots, x_n\} can be stated in an extensional fashion by specifying a table that summarizes all allowed (or in an alternative formulation, all disallowed) combinations of assignments to the variables in LaTeX: X.

For instance, the extensional constraint LaTeX: table(\{x,y\},\{[1,2],[1,3],[2,3]\}) expresses the intensional constraint LaTeX: x < y where LaTeX:  x and LaTeX: y can take values LaTeX: \{1,2,3\}.

Also called a table constraint.

Exterior penalty function

A penalty function in which the associated algorithm generates infeasible points, approaching feasibility in the limit towards a solution. An example is to maximize LaTeX: f(x) - u h(x)^2 over LaTeX: x \in X and let LaTeX: u \rightarrow \infty in order to approach the solution to

LaTeX: \max \{f(x): x \in X, \; h(x)=0\}.

If, during the penalty iterations, LaTeX: h(x^*)=0 for some finite LaTeX: u, then LaTeX: x^* solves the original mathematical program. Otherwise, the idea is that LaTeX: x^*(u) approaches the feasibility condition, LaTeX: h(x^*)=0, as LaTeX: u gets large (though this need not happen without assumptions on LaTeX: f and LaTeX: h).
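As a numerical illustration of the limit LaTeX: u \rightarrow \infty, here is a Python sketch for a scalar LaTeX: x; the subproblems are solved by ternary search, so each penalized objective is assumed unimodal on the bracket (the problem in the usage note is hypothetical):

```python
def exterior_penalty_max(f, h, lo, hi, us=(1.0, 10.0, 100.0, 1000.0)):
    """Exterior penalty sketch for  max { f(x) : h(x) = 0 }  with scalar x.

    For an increasing sequence of penalty parameters u, maximizes the
    penalized objective f(x) - u*h(x)**2 by ternary search on [lo, hi].
    Iterates are generally infeasible, with h(x) -> 0 as u grows.
    """
    x = None
    for u in us:
        def p(z, u=u):
            return f(z) - u * h(z) ** 2
        a, b = lo, hi
        for _ in range(200):          # ternary search for the maximum of p
            m1 = a + (b - a) / 3.0
            m2 = b - (b - a) / 3.0
            if p(m1) < p(m2):
                a = m1
            else:
                b = m2
        x = (a + b) / 2.0
    return x
```

For the illustrative problem max { -(x-2)^2 : x - 1 = 0 }, the penalized maximizer is (2 + u)/(1 + u), which approaches the feasible solution x = 1 from above as u grows.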

Extrapolation

The idea of estimating a value by extending information at hand outside its immediate range. In linear programming, an extrapolation estimate of the optimal objective value uses dual price LaTeX: (y) as a slope: LaTeX: z(b + h) = z(b) + yh. For a sequence LaTeX: x^k, an extrapolation is an estimate of a limit point.

Extreme point

A point in the closure of a set, say LaTeX: S, that is not the midpoint of any open line segment with end points in the closure of LaTeX: S. Equivalently, LaTeX: x is an extreme point of a closed set, LaTeX: S, if there do not exist LaTeX: y, z \in S \setminus \{x\} for which LaTeX: \textstyle x = (y + z)/2. When LaTeX: S is a polyhedron of the standard form, LaTeX: S=\{x: Ax=b, \; x \ge 0\}, with LaTeX: A of full row rank, we have one of the fundamental theorems of linear programming that underlies the simplex method:

LaTeX: x is an extreme point of the feasible region if, and only if, LaTeX: x is a basic feasible solution.

Extreme ray

An extreme ray of a closed set LaTeX: S is a ray in LaTeX: S that cannot be expressed as a (simple) sum of other rays in LaTeX: S. For example, the axes, LaTeX: \{t e_j: t \ge 0\}, are extreme rays of the non-negative vectors in LaTeX: \mathbb{R}^n. The ray LaTeX: \{t e: t \ge 0\}, however, is the sum of the axes since LaTeX: (t,...,t) = t e_1 + \ldots + t e_n. The recession direction of an extreme ray is sometimes called an extreme direction.

Extreme value

An extremum (pl. extrema). A minimum or maximum.


FTRAN

This applies to the factored system,

LaTeX: E_1 E_2 ... E_n x = b,

where each LaTeX: E_i is an elementary matrix. The recursion is:

LaTeX: \begin{array}{rcl} E_1 x_1 & = & b \\ E_2 x_2 & = & x_1 \\ & \vdots & \\ E_n x_n & = & x_{n-1} \end{array}

In the end, LaTeX: x = x_n is the solution. Each elementary system is solved as follows. For notational convenience, suppose the system is LaTeX: Ex = y, and LaTeX: v is the distinguished LaTeX: p-th column of LaTeX: E. Then,

LaTeX: x(p) = y(p)/v(p), and LaTeX: x(i) = y(i) - v(i)x(p) for LaTeX: i \ne p.

This is what is done in the (revised) simplex method, and each elementary solution is a pivot operation.
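In Python, the elementary solves can be sketched as follows, representing each LaTeX: E_i by its distinguished column LaTeX: v and position LaTeX: p (the two-factor instance in the test is illustrative):

```python
def solve_elementary(v, p, y):
    """Solve E x = y, where E is the identity with column p replaced by v."""
    x = list(y)
    x[p] = y[p] / v[p]
    for i in range(len(y)):
        if i != p:
            x[i] = y[i] - v[i] * x[p]
    return x

def ftran(factors, b):
    """Solve E_1 E_2 ... E_n x = b, with factors = [(v_1, p_1), ..., (v_n, p_n)]."""
    x = list(b)
    for v, p in factors:              # E_1 x_1 = b, then E_2 x_2 = x_1, ...
        x = solve_elementary(v, p, x)
    return x
```

For example, with LaTeX: E_1 having column 0 equal to LaTeX: (2, 1) and LaTeX: E_2 having column 1 equal to LaTeX: (3, 4), the product is LaTeX: \left[\begin{array}{cc} 2 & 6 \\ 1 & 7 \end{array}\right], and ftran recovers LaTeX: x = (1, 1) from LaTeX: b = (8, 8).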

Face

A convex subset, say LaTeX: S, of a convex set, say LaTeX: C, such that for any LaTeX: x and LaTeX: y in LaTeX: C

LaTeX: \{(1 - \alpha) x + \alpha y : 0 \le \alpha \le 1 \} \cap \mbox{ri}(S) \neq \emptyset \;\Rightarrow\; x, \; y \in S.

The set LaTeX: C is itself a face of LaTeX: C, and most authors consider the empty set a face. The faces of zero dimension are the extreme points of LaTeX: C. When LaTeX: C is a polyhedron, i.e. LaTeX: \{x : Ax \le b\}, the faces are the subsystems with some inequalities holding with equality: LaTeX: \{x: Bx = c, \; Dx \le d\}, where LaTeX: A = [B \; D] and LaTeX: b = [c^T \; d^T]^T.

Facet

A face of a convex set, not equal to the set, of maximal dimension. If the set is polyhedral, say LaTeX: P = \{x: Ax \le b\}, where the defining inequalities are irredundant, then the facets are in one-to-one correspondence with LaTeX: \{x \in P: A_{i,:} x = b_i\} for LaTeX: i such that the equality is not forced – i.e., there exists LaTeX: x in LaTeX: P for which LaTeX: A_{i,:} x < b_i. Here LaTeX: A_{i,:} is the i-th row of LaTeX: A.

Facet-defining inequality

In integer programming, a valid inequality, LaTeX: a^Tx \ge b, such that LaTeX: \{x \in P: a^Tx = b\} is a facet of the integer polyhedron, LaTeX: P.

Facility location problem

Find a location (say x-y coordinates) that minimizes a weighted sum of distances to each of several locations. An example is to locate a hospital or school to serve a community defined by population centers. Originally considered by Fermat in the 17th century, there are now many variations of this problem. One variation is the location of an obnoxious facility, such as a garbage dump. In that case the objective is to maximize total (weighted) distance from the given points (there are constraints, such as cost, that require the location to be within some distance of the points).

Factorable function

This has a recursive definition, relative to a collection of elementary operations (e.g., sum, product) on other factorable functions, starting with some set of primitive functions. For example, starting with the power functions, LaTeX: ax^n, for any LaTeX: a and LaTeX: n, and allowing only the sum operation, we can obtain all polynomials. The polynomial LaTeX: x + 2x^2 + 9x^3 factors into 3 parts.

Factorable program

All functions (objective and constraints) are factorable. An example is the separable program, provided that each of the univariate functions is factorable. Another is the geometric program, where each posynomial is factorable.

Factored form of basis

Originally for linear programming, this pertains to writing a basis LaTeX: B in the form, LaTeX: B = F_1 F_2 ... F_k , where LaTeX: F_i are factors. Two forms of interest are: elementary factors, where each LaTeX: F_i is an elementary matrix, and LU decomposition: LaTeX: B=LU, where LaTeX: L is lower triangular and LaTeX: U is upper triangular. (LaTeX: L and LaTeX: U might be factored, notably into elementary matrices, which are lower and upper triangular, resp.) These have inexpensive updates when LaTeX: B changes to an adjacent basis.

Fail first principle

The fail first principle is an idea of what a variable ordering heuristic should attempt to do in tree search. It suggests that the variable to be assigned next should be the one which is most likely to lead to a dead-end. The heuristic aims at proving that the search is in a sub-tree with no feasible solutions as soon as possible.

One way of operationalizing this principle is to choose the next variable to be assigned as the one that is most constrained. The extent to which a variable is constrained can be measured in different ways. One simple measure is the current size of the domain: under this measure, the variable with the smallest domain should be assigned next. Another common measure is to select the variable that appears in the largest number of constraints. More sophisticated estimates exist.

Faithfully convex

A convex function is faithfully convex if it is constant along some segment only if it is constant along the whole line containing this segment.

Fallacy of averages

Imagine standing with one foot on a keg of ice and the other in a fire. Your average body temperature may be fine, but the extremes will kill you! The fallacy of averages is the fallacious results you may get when replacing data with their expected values. Formally, the fallacy is stated as LaTeX: E(XY) \ne E(X)E(Y) – viz., the covariance is not zero. Another form of this fallacy is that LaTeX: E(f(X)) \ne f(E(X)) (unless LaTeX: f is linear). In particular, suppose we have

LaTeX: \max f(x; p): g(x; p) \le 0,

where LaTeX: p is a vector of parameters with some uncertainty. The fallacy of averages suggests that it is a mistake to replace LaTeX: p with its expected value for at least 2 reasons: (1) members of LaTeX: p may be correlated, and (2) the average values of LaTeX: f and LaTeX: g need not equal the functional values at the mean.

Farkas lemma

This result gives an alternative system, of which there are many. Farkas' lemma states that exactly one of the following is consistent:

LaTeX: Ax = b, \; x \ge 0   (exclusive or)   LaTeX: A^Ty \ge 0, \; b^Ty < 0.

An often used variant is

LaTeX: Ax \ge b   (exclusive or)   LaTeX: A^Ty = 0, \; y \ge 0, \; b^Ty > 0.
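Checking a Farkas certificate is a simple computation. The NumPy sketch below verifies the first alternative pair (the instance in the usage note is illustrative):

```python
import numpy as np

def farkas_certificate_check(A, b, y, tol=1e-9):
    """Check that y certifies infeasibility of {x : A x = b, x >= 0}.

    By Farkas' lemma, A^T y >= 0 together with b^T y < 0 proves that the
    primal system has no solution.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    y = np.asarray(y, dtype=float)
    return bool(np.all(A.T @ y >= -tol) and b @ y < -tol)
```

For LaTeX: A = [1 \; 1] and LaTeX: b = -1, the system LaTeX: x_1 + x_2 = -1, \; x \ge 0 is clearly infeasible, and LaTeX: y = 1 is a certificate: LaTeX: A^Ty = (1,1) \ge 0 while LaTeX: b^Ty = -1 < 0.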

Fathom

Arises in directed tree search. A node is fathomed if it is determined that no completion from this point could produce a better solution than one already obtained. This could happen by bounding the objective, or it could happen by determining there is no feasible solution with the partial specifications corresponding to the node.

Feasibility map

The point-to-set map that takes the right-hand sides to the feasible region:

LaTeX: (b, c) \mapsto \{x \in X: g(x) \le b, \; h(x) = c\}.

Feasible

A point is feasible if it satisfies all constraints. The feasible region (or feasibility region) is the set of all feasible points. A mathematical program is feasible if its feasible region is not empty.

The term feasible does not imply other properties, such as convexity or connectedness. For example, a feasible region of a nonlinear program could be LaTeX: \{ x : x^2 \ge 1 \}, which is the disjoint union LaTeX:  \{x : x \ge 1 \} \cup \{x : x \le -1\}.

Feasible direction

Given a feasible point LaTeX: x, a direction vector, say LaTeX: d, is feasible if there exists LaTeX: s > 0 such that LaTeX: x + sd is feasible. The method of feasible directions is designed to choose a feasible direction at each iteration, along which (as LaTeX: s becomes positive) the objective value improves. Such directions exist for continuous mathematical programs (where the line segment LaTeX: [x, x + sd] is feasible for some LaTeX: s > 0) unless LaTeX: x is a local optimum. (Note: with nonlinear constraints, there is typically no feasible direction according to this (classical) definition. See the generalized reduced gradient method.)

Fermat-Weber problem

See facility location problem.

Fibonacci search

This finds the maximum of a unimodal function on an interval, LaTeX: [a, b], by evaluating points placed according to the fibonacci sequence, LaTeX: F_N. If there are LaTeX: F_N points in the interval, only LaTeX: N evaluations are needed. In the continuous case, we begin with some interval of uncertainty, LaTeX: [a,b], and we reduce its length to LaTeX: (b-a)/F_N. The ratio, LaTeX: g_n = F_{n-1}/F_n, is the key to the placements.

Here is the method for the continuous case:

  1. Initialization. Let LaTeX: x = a + (1 - g_N)(b-a) and LaTeX: y = a + g_N(b-a). Evaluate LaTeX: f(x) and LaTeX: f(y) and set LaTeX: n=N.
  2. Iteration. If LaTeX: f(x) < f(y), reduce the interval to LaTeX: (x, b] (i.e., set LaTeX: a=x), decrement LaTeX: n to LaTeX: n-1, and set LaTeX: x = y and LaTeX: y = a + g_n(b-a). If LaTeX: f(x) \ge f(y), reduce the interval to LaTeX: [a, y) (i.e., set LaTeX: b=y), decrement LaTeX: n to LaTeX: n-1, and set LaTeX: y = x and LaTeX: x = a + (1-g_n)(b-a).

The fibonacci search method minimizes the maximum number of evaluations needed to reduce the interval of uncertainty to within the prescribed length. For example, it reduces the length of a unit interval LaTeX: [0,1] to 1/10946 = 0.00009136 with only 20 evaluations. In the case of a finite set, fibonacci search finds the maximum value of a unimodal function on 10,946 points with only 20 evaluations, but this can be improved – see lattice search.

For very large LaTeX: N, the placement ratio LaTeX: (g_N) approaches the golden mean, and the method approaches the golden section search. Here is a comparison of interval reduction lengths for fibonacci, golden section and dichotomous search methods. In each case LaTeX: N is the number of evaluations needed to reduce length of the interval of uncertainty to LaTeX: 1/F_N. For example, with 20 evaluations dichotomous search reduces the interval of uncertainty to LaTeX: 0.0009765 of its original length (with separation value near LaTeX: 0). The most reduction comes from fibonacci search, which is more than an order of magnitude better, at LaTeX: 0.0000914. Golden section is close (and gets closer as N gets larger).

    Evaluations  Fibonacci      Dichotomous     Golden section
        N          1/F_N         1/2^(N/2)      1/(1.618)^(N-1) 
        5         0.125            0.25            0.146
       10         0.0112           0.0312          0.0131
       15         0.00101          0.00781         0.00118
       20         0.0000914        0.000976        0.000107 
       25         0.00000824       0.0002441       0.0000096 
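The two iteration rules above translate into a short routine. This Python sketch maximizes a unimodal function with LaTeX: N evaluations (the test function is hypothetical):

```python
def fib(N):
    """Fibonacci numbers F_0 = F_1 = 1, F_{n+2} = F_{n+1} + F_n."""
    F = [1, 1]
    while len(F) <= N:
        F.append(F[-1] + F[-2])
    return F

def fibonacci_search_max(f, a, b, N):
    """Fibonacci search sketch: maximize a unimodal f on [a, b].

    Uses placement ratios g_n = F_{n-1}/F_n; after N evaluations the
    interval of uncertainty has length about (b - a)/F_N.
    """
    F = fib(N)
    g = lambda n: F[n - 1] / F[n]
    x = a + (1 - g(N)) * (b - a)
    y = a + g(N) * (b - a)
    fx, fy = f(x), f(y)
    for n in range(N, 2, -1):
        if fx < fy:                    # maximum lies in (x, b]
            a, x, fx = x, y, fy
            y = a + g(n - 1) * (b - a)
            fy = f(y)
        else:                          # maximum lies in [a, y)
            b, y, fy = y, x, fx
            x = a + (1 - g(n - 1)) * (b - a)
            fx = f(x)
    return (a + b) / 2
```

Only one new evaluation is needed per iteration because the surviving interior point lands exactly on the next placement, which is the point of the fibonacci ratios.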

Fibonacci sequence

The numbers satisfying: LaTeX: F_{n+2} = F_{n+1} + F_n, with initial conditions LaTeX: F_0 = F_1 = 1. As shown in the following table, the fibonacci sequence grows rapidly after LaTeX: N=10, and LaTeX: F_N becomes astronomical for LaTeX: N > 50,

         N    F_N
         0      1
         1      1
         2      2
         3      3
         4      5
         5      8
         6     13
         7     21
         8     34
         9     55
        10     89
        20  10946
        50    2.0E10
       100    5.7E20

Named after the 13th century mathematician who discovered it, this sequence has many interesting properties, notably for an optimal univariate optimization strategy, called fibonacci search.


Fill-in

Arises in the context of sparse matrices subjected to certain operations. In particular, a basis may be factored into a product of elementary matrices to represent Gaussian elimination. The nonzeroes that appear in positions that were not in the original matrix are called fill-in coefficients. An example of a matrix that has no fill-in for this factorization is one that is lower triangular. In that case the factors appear as:

LaTeX: \left[ \begin{array}{ccc} * & 0 & 0 \\ * & * & 0 \\ * & * & * \end{array} \right]
=
\left[ \begin{array}{ccc} * & 0 & 0 \\ * & 1 & 0 \\ * & 0 & 1 \end{array} \right]
\left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & * & 0 \\ 0 & * & 1 \end{array} \right]
\left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & * \end{array} \right]

On the other hand, a lower triangular matrix could cause fill-in for some other factorization, such as the Cholesky factorization. A completely dense matrix also has no fill-in since there were no zeroes to begin with. Here is an example of fill-in, taking the original order of rows and columns in the product form:

LaTeX: \left[ \begin{array}{ccc} * & 0 & * \\ * & * & 0 \\ * & * & * \end{array} \right]
=
\left[ \begin{array}{ccc} * & 0 & 0 \\ * & 1 & 0 \\ * & 0 & 1 \end{array} \right]
\left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & * & 0 \\ 0 & * & 1 \end{array} \right]
\left[ \begin{array}{ccc} 1 & 0 & * \\ 0 & 1 & \# \\ 0 & 0 & * \end{array} \right],

where LaTeX: \# is fill-in.

First-order conditions

This descends from classical optimization, using Taylor's Theorem. For unconstrained optimization, this is simply that LaTeX: \nabla f(x^*) = 0. (In variational calculus, it is the Euler-Lagrange equation.) For constrained optimization, the first-order conditions of a regular mathematical program are given by the Lagrange Multiplier Rule.

Fixed charge

A cost that is some value, say LaTeX: C, regardless of the level as long as the level is positive; otherwise the fixed charge is zero. This is represented by LaTeX: Cv, where LaTeX: v is a binary variable. If LaTeX: v=0, the fixed charge is 0; if LaTeX: v=1, the fixed charge is LaTeX: C. An example is whether to open a plant LaTeX: (v=1) or not LaTeX: (v=0). To apply this fixed charge to the non-negative variable LaTeX: x, the constraint LaTeX: x \le Mv is added to the mathematical program, where LaTeX: M is a very large value, known to exceed any feasible value of LaTeX: x. Then, if LaTeX: v=0 (e.g., not opening the plant that is needed for LaTeX: x > 0), LaTeX: x=0 is forced by the upper bound constraint. If LaTeX: v=1 (e.g., plant is open), LaTeX: x \le Mv is a redundant upper bound. Fixed charge problems are mathematical programs with fixed charges. (See Myths and Counterexamples in Mathematical Programming to avoid a misconception.)

Fixed point

The fixed point of a function, LaTeX: f:X \rightarrow X, is an element LaTeX: x \in X such that LaTeX: f(x)=x. For a point-to-set map, LaTeX: F:X \rightarrow 2^X, we instead have LaTeX: x \in F(x). The study of fixed points has been at the foundation of algorithms.

Fixed variable

A variable whose value is fixed, either explicitly or implicitly. Explicit fixing arises for convenience of scenario design, particularly in linear programming, and during an algorithm's progression, particularly in integer programming. Implicit fixing occurs from a forcing substructure.

Fleet mix problem

To determine how many of each type of aircraft should be in a fleet to meet demands and minimize total cost. Here is an integer linear program model:

LaTeX: \min \; c^T x : a \le Ax \le b, \; x_j \in \{0,1,...,N_j\},

where LaTeX: N_j is the number of aircraft of type LaTeX: j available; LaTeX: A_{ij} is the capacity of aircraft type LaTeX: j for mission LaTeX: i; LaTeX: a_i is the least number of missions of type LaTeX: i that must be flown; LaTeX: b_i is the greatest number of missions of type LaTeX: i that can be flown. The variables, LaTeX: x_j, are the number of aircraft of type LaTeX: j in the fleet, and LaTeX: c_j is its maintenance cost. If the aircraft must be purchased, binary variables are introduced, as LaTeX: x_j - N_j y_j \le 0, with a fixed charge, LaTeX: fy, in the objective LaTeX: (f > 0). There could be additional constraints, such as a budget on total purchases LaTeX: (fy \le f_0) or on total maintenance LaTeX: (gx \le g_0).


Floor

This is the integer round-down of a value, LaTeX: x:

LaTeX: \mbox{Floor}(x) = \max \{z: z \in \mathbb{Z}, \; z \le x\}.

Examples: LaTeX: \mbox{Floor}(5)=5, LaTeX: \mbox{Floor}(4.999)=4, LaTeX: \mbox{Floor}(-1.2) = -2. Also, see ceiling.
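Note that the floor rounds toward negative infinity, not toward zero, which matters for negative arguments. In Python, for instance:

```python
import math

# Floor(x) = max {z integer : z <= x}: rounding is toward negative
# infinity, so Floor(-1.2) = -2, not -1.
values = [5, 4.999, -1.2]
floors = [math.floor(v) for v in values]
print(floors)  # [5, 4, -2]
```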

Flow augmenting path

This arises in the Ford-Fulkerson labeling algorithm to find a maximum flow in a network with arc flow bounds, LaTeX: L and LaTeX: U. Given a flow, LaTeX: f, from a source to a sink, a flow augmenting path, relative to LaTeX: f, is a path from that source to the sink such that

LaTeX: f(x, y) < U(x, y)

along all forward arcs, and

LaTeX: f(x, y) > L(x, y)

along all backward arcs.

The flow, LaTeX: f, is a maximum flow from the source to the sink if and only if there is no flow augmenting path relative to LaTeX: f. If there is a flow augmenting path, the flow can be changed along the path such that total flow increases (by at least 1 if all data are integer).
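A sketch of the labeling idea with zero lower bounds (LaTeX: L = 0) and BFS-chosen augmenting paths (the Edmonds-Karp variant) follows; the network in the test is hypothetical:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Ford-Fulkerson sketch with BFS augmenting paths and L = 0.

    capacity: dict-of-dicts, capacity[u][v] = U(u, v).  Repeatedly finds a
    flow augmenting path in the residual network and pushes flow along it
    until no augmenting path remains, at which point the flow is maximal.
    """
    nodes = set(capacity) | {w for u in capacity for w in capacity[u]}
    r = {u: {} for u in nodes}        # residual capacities
    for u in capacity:
        for v, cap in capacity[u].items():
            r[u][v] = r[u].get(v, 0) + cap
            r[v].setdefault(u, 0)
    total = 0
    while True:
        # BFS for an augmenting path: an arc is usable while residual > 0,
        # i.e., f < U on forward arcs or f > 0 on backward arcs
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, res in r[u].items():
                if res > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total              # no augmenting path: flow is maximum
        path, v = [], sink            # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(r[u][v] for u, v in path)
        for u, v in path:
            r[u][v] -= push
            r[v][u] += push
        total += push
```

For the small network s -> {a, b} -> t with capacities s-a: 3, s-b: 2, a-b: 1, a-t: 2, b-t: 3, the maximum flow is 5, matching the cut around the source.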

Forced equality

This is an inequality constraint that must hold with equality in every feasible solution.

Forcing substructure

A subsystem of the constraints and the variables in them that forces some value. For example, the following system is a forcing substructure because each variable, which is required to be non-negative, is forced to be zero.

LaTeX: \begin{array}{rcl} -x - y & = & 0 \\ y - z & = & 0 \\ x, \; y, \; z & \ge & 0. \end{array}

Forward checking

Forward checking is a propagation procedure that guarantees that at each step of the search, all the constraints between already assigned variables and not yet assigned variables are arc consistent.

Formally, let LaTeX:  N = (X, D, C) be a binary constraint network and LaTeX:  Y \subseteq X such that LaTeX:  |D(x_i)| = 1 for all LaTeX:  x_i \in Y. LaTeX: N is forward checking consistent according to the instantiation LaTeX: I on LaTeX: Y iff LaTeX: I is locally consistent and for all LaTeX: x_i \in Y, for all LaTeX: x_j \in X \setminus Y, for all LaTeX: c(x_i, x_j) \in C, LaTeX: c(x_i, x_j) is arc consistent.

Forward substitution

The recursion to solve a nonsingular lower triangular system, LaTeX: Lx=b. It starts with LaTeX: x_1 = b_1 / L_{1,1}, then

LaTeX: x_j = \left( b_j - \sum_{i < j} L_{j, i} x_i \right) / L_{j,j},

for LaTeX: j=2,...,n.
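The recursion is a few lines of code; a minimal Python sketch (0-indexed, with an illustrative system in the test):

```python
def forward_substitution(L, b):
    """Solve the nonsingular lower triangular system L x = b.

    Implements x_j = (b_j - sum_{i<j} L[j][i] * x[i]) / L[j][j] for
    j = 1, ..., n (0-indexed below).
    """
    n = len(b)
    x = [0.0] * n
    for j in range(n):
        s = sum(L[j][i] * x[i] for i in range(j))
        x[j] = (b[j] - s) / L[j][j]
    return x
```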

Forward transformation

See FTRAN.

Forward triangularization

An algorithm to rearrange a matrix by recursively assigning a singleton row to the front (with its only column, as its pivot). The recursion applies to the submatrix defined by omitting the assigned row and column (and reducing other row counts accordingly). This results in a lower triangular rearrangement if and only if such a rearrangement exists.

Fourier-Motzkin elimination

A procedure by which a variable is eliminated in a system of linear inequalities. The remaining inequalities, which include some new ones, have a solution if and only if the original system does. Eventually, this reduces the system to one variable, or an inconsistency is determined during the elimination. The one variable can be assigned any value in its range, then a backtracking procedure can be executed to obtain values of the other variables, which were eliminated. Originally presented by Fourier in 1826, the method was analyzed by Motzkin in 1933 in light of new developments in the theory of linear inequalities to solve linear programs. Its computational complexity is exponential, and it gave way to the simplex method.
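One elimination step combines each upper bound on the chosen variable with each lower bound. A minimal Python sketch (the system in the usage note is illustrative):

```python
def fm_eliminate(ineqs, k):
    """One step of Fourier-Motzkin elimination on a system a.x <= c.

    ineqs is a list of (a, c) pairs with a a coefficient list; returns an
    equivalent system in which every coefficient of variable k is zero.
    """
    pos, neg, zero = [], [], []
    for a, c in ineqs:
        (pos if a[k] > 0 else neg if a[k] < 0 else zero).append((a, c))
    out = list(zero)
    for ap, cp in pos:                # gives x_k <= (cp - ...) / ap[k]
        for an, cn in neg:            # gives x_k >= (cn - ...) / an[k]
            a = [ap[i] / ap[k] - an[i] / an[k] for i in range(len(ap))]
            out.append((a, cp / ap[k] - cn / an[k]))
    return out
```

For example, eliminating LaTeX: x from LaTeX: \{x - y \le 0, \; -x \le -1, \; y \le 5\} yields LaTeX: \{y \le 5, \; -y \le -1\}, i.e. LaTeX: 1 \le y \le 5, from which a value of LaTeX: x can be recovered by backtracking.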

Fractional program

Objective and/or constraint functions are sums of ratios of the form LaTeX: V(x)/D(x), where typically LaTeX: D(x) > 0 over some domain LaTeX: X. The case of one term is special, and the linear fractional program has affine numerator and denominator. The linear 1-term case, LaTeX: (ax+b)/(cx+d) (where LaTeX: cx+d > 0 over the feasible region), is equivalent to solving a parametric linear program:

LaTeX: \mbox{opt } (ax+b) - u(cx+d) : x \in X,

where LaTeX: u is the parameter.

Frank-Wolfe Theorem

If a quadratic function is bounded from below on a nonempty polyhedron, it attains a minimum there. In mathematical notation, we are given that LaTeX: X is a nonempty polyhedron. Then, if there exists LaTeX: L such that LaTeX: f(x) \ge L for all LaTeX: x \in X, there exists LaTeX: x^* \in X such that LaTeX: f(x^*) \le f(x) for all LaTeX: x \in X.

Free variable

A variable with no (explicit) sign restriction. (The term is generally used in linear programming because the standard form includes the constraint LaTeX: x \ge 0.)

Fritz John conditions

These extend the Lagrange multiplier conditions to deal with inequality constraints:

Suppose LaTeX: x^* is in LaTeX: \arg\!\max\{f(x): x \in \mathbb{R}^n, \; g(x) \le 0\}, where LaTeX: f and LaTeX: g are smooth. Then, there exist multipliers LaTeX: (y_0, y) \ge 0, not all zero, such that

LaTeX: y_0 \nabla f(x^*) - y^T \nabla g(x^*) = 0 \mbox{ and } y^Tg(x^*) = 0.

Fuzzy CSP

In a fuzzy constraint satisfaction problem each constraint LaTeX: C and instantiation LaTeX: I is assigned a degree of satisfaction LaTeX: sat(C,I), which is a value in the [0,1] real interval indicating the degree that LaTeX: C is satisfied by LaTeX: I. If this value is 1, LaTeX: C is satisfied and if this value is 0, LaTeX: C is violated. In the most common interpretation of fuzzy CSPs, the task is to find an instantiation LaTeX: I for which the minimum of LaTeX: sat(C,I) with LaTeX: C ranging over all the constraints (i.e. the smallest degree of satisfaction for the instantiation LaTeX: I) is maximal.

Fuzzy math program

Some (or all) constraints are fuzzified by replacing the level set, LaTeX: G_i = \{x \in X : g_i(x) \le 0\}, with the requirement that LaTeX: u_{G_i}(x) \ge a_i, where LaTeX: u_{G_i} is a membership function that is given (or derived from primitive membership functions on the level sets of each LaTeX: g_i), and LaTeX: a_i is a parameter in LaTeX: [0,1] to be set for each instance of the model. Typically (but not necessarily), LaTeX: a_i = 1 means that the constraint must hold, and the lower the value of LaTeX: a_i, the more LaTeX: x's are admitted. (This appears similar to the chance-constraint model of stochastic programming, but it is more closely related to goal programming.) While fuzzy sets are used to represent uncertainty, they can also be used to represent preferences, e.g. a requirement, LaTeX: u_S(x) \ge u_T(x) could mean that it is preferred that LaTeX: x be in LaTeX: S at least as much as it is preferred that LaTeX: x be in LaTeX: T.

Fuzzy set

Given a universe set, LaTeX: X, and a membership function, LaTeX: u : X \rightarrow [0,1], a fuzzy set is a collection of pairs: LaTeX: \{(x, u(x)): x \in X\}. Often, the membership function is subscripted by the set name, say LaTeX: u_S. Generally, for all LaTeX: x \in X, LaTeX: u_{X}(x)=1, and LaTeX: u_{\emptyset}(x)=0. In the context of uncertainty, the value LaTeX: u_{S}(x) is used to model the statement of how possible it is for LaTeX: x to be in LaTeX: S. For this reason, LaTeX: u_S is sometimes called the possibility function of LaTeX: S. What we consider the usual set (without a membership function) is called a crisp set in fuzzy mathematics.

Fuzzy set operations, say on fuzzy sets LaTeX: S and LaTeX: T, with membership functions LaTeX: u_S and LaTeX: u_T, resp., are defined by the following:

  • Union: LaTeX: u_{S\vee T}(x) = \max \{u_S(x), u_T(x)\}.
  • Intersection: LaTeX: u_{S\wedge T}(x) = \min \{ u_S(x), u_T(x)\}.
  • Complement: LaTeX: u_{\sim S}(x) = 1 - u_S(x).

One must be careful when using fuzzy sets to represent uncertainty (which is not the only type of interpretation – see fuzzy mathematical program). In particular, if LaTeX: u_S(x) = 1/2, its complement is also LaTeX: 1/2. Thus, LaTeX: u_{S\vee \sim S}(x) = 1/2, despite the fact that LaTeX: S\vee \sim S = X (in ordinary set theory). Similarly, LaTeX: u_{S\wedge \sim S}(x) = 1/2, despite the fact that LaTeX: S\wedge \sim S = \emptyset. This illustrates the fact that LaTeX: u_S need not equal LaTeX: u_T even if LaTeX: S=T as crisp sets.
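The paradox above is easy to reproduce with the membership-function operations; the sketch below assumes a point where the membership value is 1/2:

```python
# Fuzzy-set operations applied at a point where u_S(x) = 1/2 (assumed value).
def u_S(x):
    return 0.5

u_not_S = lambda x: 1.0 - u_S(x)                 # complement
u_union = lambda x: max(u_S(x), u_not_S(x))      # S v ~S
u_inter = lambda x: min(u_S(x), u_not_S(x))      # S ^ ~S

# In crisp set theory these would be 1 (whole space) and 0 (empty set).
print(u_union(0), u_inter(0))   # 0.5 0.5
```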

While the fuzzy set is fundamental for fuzzy mathematical programming, other concepts in fuzzy mathematics also apply, such as fuzzy arithmetic, fuzzy graphs and fuzzy logic.


GAC

See generalized arc consistency.


GAMS

Generalized Algebraic Modeling System.


GRASP

GRASP stands for Greedy Randomized Adaptive Search Procedures. It is a metaheuristic that uses restarts to search many portions of the solution space. After each restart, some heuristic is employed to reach a local optimum. The choice for the local search can be problem-specific, such as using an n-opt neighborhood for the TSP, or it can use another metaheuristic, like simulated annealing or tabu search.

Game theory

In general, a (mathematical) game can be played by one player, such as a puzzle, but its main connection with mathematical programming is if there are at least two players, and they are in conflict. Each player chooses a strategy that maximizes his payoff. If there are exactly two players and one player's loss is the other's gain, the game is called zero sum. In this case, a payoff matrix, LaTeX: A, is given where LaTeX: A_{ij} is the payoff to player 1, and the loss to player 2, if player 1 uses strategy i and player 2 uses strategy j. In this representation each row of LaTeX: A corresponds to a strategy of player 1, and each column corresponds to a strategy of player 2. If LaTeX: A is LaTeX: m \times n, this means player 1 has m strategies, and player 2 has n strategies. Here is an example of a LaTeX: 2 \times 3 payoff matrix:

A = \left[ \begin{array}{rrr}
10 & -5 & 20 \\ -15 & 5 & -15
\end{array} \right]

A primary connection with mathematical programming is duality. Suppose LaTeX: (x_1, x_2, \ldots, x_m) and LaTeX: (y_1, y_2, \ldots, y_n) are strategy probability vectors chosen by the two players, i.e. in a particular play, LaTeX: x_i is the probability player 1 chooses strategy i and LaTeX: y_j is the probability player 2 chooses strategy j. Then, the value, LaTeX: v of the game is the expected payoff:

v = x A y^T = \sum_{ij} x_i A_{ij} y_j

where player 1 seeks to maximize LaTeX: v and player 2 seeks to minimize LaTeX: v. A pure strategy is a unit vector. For example, if LaTeX: x = (1, 0, 0, \ldots, 0), then player 1 is using his first strategy 100% of the time; if LaTeX: y = (0, 1, 0, 0, \ldots, 0), player 2 is using his second strategy 100% of the time. The expected payoff is simply the cell entry, LaTeX: A_{12}, which is -5 in the example (player 2 pays 5 units to player 1). The strategies that are not pure are called mixed. For example, LaTeX: x = (1/2, 1/2) means player 1 plays each of the two strategies 50% of the time (flipping a coin on each play); LaTeX: y =(1/4, 3/4, 0) means player 2 plays his first strategy 25% of the time and his second the remaining 75%. The expected payoff is -5/8.
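The expected payoff for the mixed strategies above can be checked directly, using the example matrix from the text:

```python
# Expected payoff v = x A y^T for the 2x3 example payoff matrix.
A = [[10, -5, 20],
     [-15, 5, -15]]
x = [0.5, 0.5]          # player 1's mixed strategy
y = [0.25, 0.75, 0.0]   # player 2's mixed strategy

v = sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(3))
print(v)   # -0.625, i.e. -5/8 as stated in the text
```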

The fundamental duality theorem of linear programming (see the supplement on mathematical programming duals) provides the equivalent primal-dual formulation:

Player 1's problem (primal): LaTeX: \max v : x \ge 0, \; \sum_i x_i = 1, \; v \le \sum_i x_i A_{ij} \; \forall j.

Player 2's problem (dual): LaTeX: \min v : y \ge 0, \; \sum_j y_j = 1, \; v \ge \sum_j A_{ij} y_j \; \forall i.

This is the simplest 2-person game; more generally, there can be n persons, and subsets of the players can form cooperative agreements. If this is not allowed, the game is called non-cooperative.

A combinatorial game is a cooperative game with restrictions on the formation of coalitions that induce a combinatorial structure, such as a matroid.

Gauge function

A nonnegative, convex function, from LaTeX: \mathbb{R}^n to the extended reals that is positively homogeneous and LaTeX: f(0)=0. A norm is a gauge function. Here is an example that is not a norm:

f(x,y,z) = \left\{ \begin{array}{rl}
\| (x-z,\; x-y) \|_2, & 2x - y - z = 0 \\
\infty, & \mbox{otherwise.}
\end{array} \right.

Gauss-Jordan elimination

A method to solve LaTeX: Ax=b that performs elementary row operations on LaTeX: A to annihilate successive elements of LaTeX: A in order to reduce LaTeX: A to an identity matrix. On paper, the same operations are applied to LaTeX: b, then the solution can be read from the resulting identity system. In a computer, the matrices effecting the elementary operations are saved as elementary matrices, say LaTeX: E_i for the i-th operation. Since LaTeX: E_k \cdots E_2 E_1 A = I, the solution is LaTeX: x = E_k \cdots E_2 E_1 b, obtained by applying the elementary matrices to LaTeX: b in turn (the forward transformation). This is what is done in the (revised) simplex method, and each iteration is a pivot operation.

Gauss-Seidel method

An iterative method to solve LaTeX: Ax=b using the latest values of the components of LaTeX: x during the updates. Specifically, LaTeX: A is split into LaTeX: L-T, where LaTeX: L is lower triangular, and the update solves LaTeX: Lx' = Tx + b. (Surprisingly, history indicates Gauss did not know about this method, and Seidel did not recommend it.) Often people use the term loosely to mean any update that uses the latest values, rather than this specific splitting. The distinction arises, for example, in parallel computation, where the latest value of LaTeX: x is not used during the computation of LaTeX: x' because doing so would forfeit the parallelism.
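A minimal sketch of the iteration in Python, using the latest component values within each sweep; the test system is an invented diagonally dominant example, which guarantees convergence:

```python
def gauss_seidel(A, b, iters=100):
    """Solve Ax = b iteratively (splitting A = L - T), using the latest
    component values: x[j] for j < i is already updated within a sweep."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Diagonally dominant 2x2 system (invented), so the iteration converges.
A = [[4.0, 1.0], [2.0, 3.0]]
b = [9.0, 13.0]
print(gauss_seidel(A, b))   # close to [1.4, 3.4]
```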

Gaussian elimination

A method to solve LaTeX: Ax=b that performs elementary row operations on LaTeX: A to annihilate successive elements of LaTeX: A in order to reduce LaTeX: A to an upper triangular matrix, LaTeX: U. On paper, the same operations are applied to LaTeX: b, then the solution is obtained by solving the resulting upper triangular system. In a computer, the product of the matrices effecting the elementary row operations is a lower triangular matrix, LaTeX: L, with unit diagonal. Once this phase is completed, the system LaTeX: Ax=b becomes LaTeX: LUx=b. This is then solved in two steps: forward substitution solves LaTeX: Ly=b; then backward substitution solves LaTeX: Ux=y. (Of course, computer implementations vary.)
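The two phases can be sketched in Python. No pivoting is done, so this assumes nonzero pivots; a production code would pivot for numerical stability:

```python
def solve_gauss(A, b):
    """Gaussian elimination: reduce to an upper triangular system Ux = y,
    then back-substitute. Assumes nonzero pivots (no pivoting)."""
    n = len(b)
    A = [row[:] for row in A]   # work on copies
    y = b[:]
    # Forward elimination; the multipliers m form the (implicit) unit lower
    # triangular L, and y = L^{-1} b is the forward substitution on b.
    for k in range(n):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            y[i] -= m * y[k]
    # Backward substitution on the upper triangular system Ux = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / A[i][i]
    return x

print(solve_gauss([[2.0, 1.0], [4.0, 3.0]], [5.0, 11.0]))   # [2.0, 1.0]
```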

Generalized Lagrange multiplier method

The Generalized Lagrange Multiplier method (GLM) solves a sequence of Lagrangian optimization (relaxation) problems, searching the multiplier space by some method to seek a minimum of the maximum Lagrangian function.

  1. Given LaTeX: (u, v) with LaTeX: u \ge 0, find LaTeX: x \in \arg\!\max \{f(x) - u^T g(x) - v^T h(x) : x \in X\} = L^*(u,v).
  2. Test whether 0 is in the convex hull of LaTeX: S(x, u, v) := \{(b, c): g(x) \le b, \; h(x) = c, \; u^T g(x) = u^T b\}. If LaTeX: 0 \in S(x, u, v), stop: LaTeX: x solves the original mathematical program. Otherwise, search alternative optima; if no maximum generates 0 as a member of LaTeX: S, the right-hand side (0) is in a duality gap.
  3. Given LaTeX: (x, u, v), choose LaTeX: (u', v') to reduce the value of LaTeX: L^*(u, v).

This can be viewed from several vantage points.

Generalized arc consistency

An extension of arc consistency to constraints with more than two variables in their scope.

A variable LaTeX: x is generalized arc consistent (GAC) with a constraint if every value of the variable can be extended to all the other variables of the constraint in such a way the constraint is satisfied.

Generalized arc consistency is one of the most commonly enforced forms of consistency in constraint programming.

GAC has also been called: hyper-arc consistency and domain consistency.

Generalized equation

This has the form: 0 is in LaTeX: \{f(x)\} + T(x), where LaTeX: f is a function from LaTeX: \mathbb{R}^n into LaTeX: \mathbb{R}^n and LaTeX: T is a point-to-set map that maps LaTeX: x into subsets of LaTeX: \mathbb{R}^n. If LaTeX: T is absent, the condition reduces to an ordinary equation, LaTeX: f(x)=0. Usually, LaTeX: T is assumed to satisfy the monotonicity condition:

(x-x')^T(y-y') \ge 0 \mbox{ for } y \in T(x) \mbox{ and } y' \in T(x').

This includes a variational inequality (and hence the complementarity problem) as a special case.

Generalized inverse

Suppose LaTeX: A is any LaTeX: m \times n matrix. LaTeX: A^+ is a generalized inverse of LaTeX: A if LaTeX: A^+ is LaTeX: n \times m and LaTeX: AA^+A = A. Then, a fundamental theorem for linear equations is:

The equation, LaTeX: Ax = b, has a solution if and only if LaTeX: AA^+b = b for any generalized inverse, LaTeX: A^+, in which case the solutions are of the form:
LaTeX: x = A^+b + (I - A^+A)y   for any   LaTeX: y \in \mathbb{R}^n.

Here is an Example.

The Moore-Penrose class additionally requires that LaTeX: AA^+ and LaTeX: A^+A be symmetric (hermitian, if LaTeX: A is complex). In particular, if LaTeX: \mbox{rank}(A) = m, LaTeX: A^+ = A^T (AA^T)^{-1} is a Moore-Penrose inverse. Note that LaTeX: AA^+ = I, and LaTeX: A^+A is a projection matrix for the subspace LaTeX: \{x: x = A^T y \mbox{ for some } y \in \mathbb{R}^m\}, since LaTeX: x = A^T y implies LaTeX: A^+Ax = A^T (AA^T)^{-1} AA^T y = A^T y = x. Further, LaTeX: I-A^+A projects into its orthogonal complement, which is the null space of LaTeX: A, i.e. LaTeX: A(I-A^+A)x = (A-A)x = 0 for any LaTeX: x \in \mathbb{R}^n.
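For a full-row-rank LaTeX: A, the formula LaTeX: A^+ = A^T(AA^T)^{-1} above can be checked numerically. This sketch uses an invented 2×3 example and verifies the defining property LaTeX: AA^+A = A:

```python
def matmul(X, Y):
    """Plain matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

# Full-row-rank example (invented): A+ = A^T (A A^T)^{-1} is Moore-Penrose.
A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
AAT = matmul(A, transpose(A))               # [[2, 1], [1, 2]]
det = AAT[0][0] * AAT[1][1] - AAT[0][1] * AAT[1][0]
inv = [[AAT[1][1] / det, -AAT[0][1] / det],
       [-AAT[1][0] / det, AAT[0][0] / det]]
Ap = matmul(transpose(A), inv)

# Defining property of a generalized inverse: A A+ A = A.
AApA = matmul(matmul(A, Ap), A)
print(all(abs(AApA[i][j] - A[i][j]) < 1e-12
          for i in range(2) for j in range(3)))   # True
```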

Generalized network

A network in which the flow that reaches the destination could be different from the flow that left the source. In the incidence matrix, instead of

\left[ \begin{array}{crc}
\ldots & \vdots & \ldots \\
\ldots & -1 & \ldots \\
\ldots & \vdots & \ldots \\
\ldots & +1 & \ldots \\
\ldots & \vdots & \ldots
\end{array} \right]
\begin{array}{l} \\ \leftarrow \mbox{ source row } (i) \\ \\ \leftarrow \mbox{ destination row } (j) \\ \\ \end{array}

we have

\left[ \begin{array}{crc}
\ldots & \vdots & \ldots \\
\ldots & -1 & \ldots \\
\ldots & \vdots & \ldots \\
\ldots & g & \ldots \\
\ldots & \vdots & \ldots
\end{array} \right]
\begin{array}{l} \\ \leftarrow \mbox{ source row } (i) \\ \\ \leftarrow \mbox{ destination row } (j) \\ \\ \end{array}

where LaTeX: g > 0. If LaTeX: g < 1, there is a loss; if LaTeX: g > 1, there is a gain.

Generalized reduced gradient method

The Generalized reduced gradient method (GRG) is a generalization of the reduced gradient method by allowing nonlinear constraints and arbitrary bounds on the variables. The form is:

\max f(x) : h(x) = 0, \; L \le x \le U,

where LaTeX: h has dimension LaTeX: m. The method supposes we can partition LaTeX: x = (v, w) such that:

  • LaTeX: v has dimension LaTeX: m (and LaTeX: w has dimension LaTeX: n-m);
  • the values of LaTeX: v are strictly within their bounds: LaTeX: L_v < v < U_v (this is a nondegeneracy assumption);
  • LaTeX: \nabla_v h(x) is nonsingular at LaTeX: x = (v,w).

As in the linear case, for any LaTeX: w there is a unique value, LaTeX: v(w), such that LaTeX: h(v(w),w)=0 (c.f., Implicit Function Theorem), which implies that LaTeX: dv/dw = -(\nabla_v h(x))^{-1} \nabla_w h(x). The idea is to choose the direction of the independent variables to be the reduced gradient: LaTeX: \nabla_w (f(x) - y^T h(x)), where LaTeX: y^T = \nabla_v f(x) (\nabla_v h(x))^{-1}. Then, the step size is chosen and a correction procedure applied to return to the surface, LaTeX: h(x)=0.

The main steps (except the correction procedure) are the same as the reduced gradient method, changing the working set as appropriate.

Generalized upper bound

A Generalized upper bound (GUB) is an upper bound constraint on a sum of variables: LaTeX: \textstyle\sum_{j \in J} x_j \le U. A collection of these that do not overlap (i.e., index sets LaTeX: J are disjoint) comprise the foundation for the generalized upper bounding technique in linear programming. This is where the summation equation is marked in the data structure, rather than represented explicitly as a row in the LP, and the basis is handled in its reduced dimension to gain computational efficiency in both time and space.

Genetic algorithm

Genetic algorithms (GAs) are a class of algorithms inspired by the mechanisms of genetics, which have been applied to global optimization (especially for combinatorial programs). They require the specification of three operations (each typically probabilistic) on objects, called "strings" (these could be real-valued vectors):

  1. Reproduction - combining strings in the population to create a new string (offspring);

    Example: Taking 1st character from 1st parent + rest of string from 2nd parent:

[001001] + [111111] \rightarrow [011111]
  2. Mutation - spontaneous alteration of characters in a string;

    Example: Flipping just the left-most character:

[001001] \rightarrow [101001]

  3. Crossover - combining strings to exchange values, creating new strings in their place.

    Example: With crossover location at 2:

[001001] \;\&\; [111111] \rightarrow [001111], \; [111001]
These can combine to form hybrid operators, and the reproduction and crossover operations can include competition within populations. Here is a generic GA (strategy):

0. Initialize population.

1. Select parents for reproduction and GA operators (viz., mutation and crossover).
2. Perform operations to generate intermediate population and evaluate their fitness values.
3. Select members of population to remain with new generation.

Repeat 1-3 until some stopping rule is reached.

The original GA (1973, by John Holland) used crossover and total population replacement. This means a population with 2N objects (called chromosomes) forms N pairings of parents that produce 2N offspring. The offspring comprise the new generation, and they become the total population, replacing their parents. More generally, a population of size N produces an intermediate population of size N+M, from which Ñ members are kept to form the new population. One way to choose which Ñ survive is to take those with the greatest fitness values – survival of the fittest.
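The string operations above can be sketched in Python; the crossover call reproduces the example in the text, while the mutation rate and seed are arbitrary:

```python
import random

def crossover(p1, p2, loc):
    """Exchange the tails of two strings after position loc."""
    return p1[:loc] + p2[loc:], p2[:loc] + p1[loc:]

def mutate(s, rate, rng):
    """Flip each character of a bit string independently with probability rate."""
    return "".join(c if rng.random() > rate else ("1" if c == "0" else "0")
                   for c in s)

# Crossover at location 2, as in the text's example.
print(crossover("001001", "111111", 2))   # ('001111', '111001')

# Mutation with an arbitrary rate and seed.
print(mutate("001001", 0.2, random.Random(1)))
```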

Geometric convergence

See convergence.

Geometric mean

Given a nonnegative LaTeX:  x \in \mathbb{R}^n and a weight vector, LaTeX: a in the simplex, LaTeX: S_n, the geometric mean is the value:

x(1)^{a(1)} \, x(2)^{a(2)} \cdots x(n)^{a(n)}.

For example, for LaTeX: a=(1/2, 1/2), the geometric mean of LaTeX: x = (1, 9) is LaTeX: \sqrt{1 \cdot 9} = 3. A fundamental inequality that provides a foundation for geometric programming is that the geometric mean is never greater than the arithmetic mean:

\prod_j x(j)^{a(j)} \le \textstyle\sum_j a(j) x(j),

for all nonnegative LaTeX: x \in \mathbb{R}^n and LaTeX: a \in S_n. Further, equality holds only if LaTeX: x(j) is constant for all LaTeX: j.

Geometric program

A mathematical program of the form

\min P_0(x) : P_k(x) \le 1, \;  x > 0,

where each LaTeX: P_k is a posynomial (k=0,...,m). Conventionally, the monomials are indexed sequentially, so that the posynomials appear as:

LaTeX: P_{0} = \sum_{i=1}^{N_0} c_{i} \prod_{j} x_{j}^{a_{i,j}}
LaTeX: P_{1} = \sum_{i=N_0 + 1}^{N_1} c_{i} \prod_{j} x_j^{a_{i,j}}
LaTeX: \vdots
LaTeX: P_m = \sum_{i=N_{m-1} + 1}^{N_m} c_i \prod_j x_j^{a_{i,j}}

For example, consider:

\min 1/y^3 + 5y/z^2:  x, y, z > 0, \; 1/(xyz) \le 1, \; x/y + z/x \le 1.

Then, LaTeX: n=3 (variables), LaTeX: m=2 (constraints), LaTeX: N_0=2 (monomials in objective), LaTeX: N_1=3 (LaTeX: N_0 + 1 monomial in constraint 1), and LaTeX: N_2 = 5 (LaTeX: N_1 + 2 monomials in constraint 2).

The exponent matrix is the LaTeX: N_m \times n matrix whose i-th row contains the exponents of the variables in the i-th monomial. The example has 5 rows and 3 columns:

A = \left[ \begin{array}{rrr}
0 & -3 & 0 \\
0 & 1 & -2 \\
-1 & -1 & -1 \\
1 & -1 & 0 \\
-1 & 0 & 1
\end{array} \right]

The degree of difficulty is the number of terms in all posynomials (LaTeX: N_m) minus the number of independent variables (one less than the rank of exponent matrix). In the above example, the rank of LaTeX: A is 3, so the degree of difficulty is 5-3-1 = 1. If the last constraint were not present, only the first three rows comprise the exponent matrix, and its rank is 2. In that case, the degree of difficulty is 0. (Using the geometric dual, the solution reduces to solving a system of linear equations, which is what motivates the terminology.)

Also see superconsistent.

Since its inception, the posynomial form has been extended to signomials. In that case, the duality need not be strong, as there can be a duality gap. The general, or signomial, geometric program is of the form:

\min P_0(x) - Q_0(x): x > 0, \; P_k(x) - Q_k(x) \le 1

for LaTeX: k=1,\ldots,m, where LaTeX: P_k and LaTeX: Q_k are posynomials (LaTeX: k=0,\dots,m).

Global Cardinality constraint

This is a global constraint that is specified on LaTeX: n assignment variables LaTeX: \langle x_1 \dots x_n \rangle and LaTeX: n' count variables LaTeX: \langle c_{v_1} \dots c_{v_{n'}} \rangle, and enforces that each value LaTeX: v_i is assigned to exactly LaTeX: c_{v_i} of the assignment variables, LaTeX: x_j. The global cardinality constraint LaTeX: gcc(\langle x_1 \dots x_n \rangle, \langle c_{v_1} \dots c_{v_{n'}} \rangle) is a generalization of the all different constraint, which requires that every value is assigned to at most one variable.

Global constraint

A global constraint LaTeX: C={C(i)} is a family of scope-parameterized constraints (normally LaTeX:  i \geq 2 ), where LaTeX: i is the size of the scope of the constraint. The relation represented by the constraint is often defined implicitly (i.e., intensionally) via a compact set of characteristics that all satisfying tuples must exhibit. For example, the all-different constraint is satisfied by all assignments to its variables where no pair have the same value. Such implicit definitions are then expressed and enforced by the constraint propagator corresponding to the global constraint.

A global constraint may also be explicitly (i.e., extensionally) defined by providing a set of satisfying (or non-satisfying) tuples.

Among the most common global constraints are the all-different and global cardinality constraints.

Global convergence

See convergence.

Global optimization

Generally refers to mathematical programming without convexity assumptions, which is NP-hard. In general, there could be a local optimum that is not a global optimum. Some authors use this term to imply the stronger condition that there are multiple local optima. Some solution strategies are given as heuristic search methods (including those that guarantee global convergence, such as branch and bound). As a process associated with algorithm design, some regard this simply as attempts to ensure convergence to a global optimum (unlike a purely local optimization procedure, like steepest ascent). See Glover's linearization.

Global optimum

A solution to a mathematical program (as distinct from a local optimum). When the mathematical program is not convex, there could be (and typically are) many local optima that are not global optima.

Glover's linearization

Given a binary variable LaTeX: x and a linear function LaTeX: f(w) in discrete and/or continuous variables LaTeX: w in LaTeX: W for some set LaTeX: W, this linearization reformulates the product LaTeX: f(w)x with a (free) continuous variable LaTeX: z and enforces that LaTeX: z = f(w)x by adding four linear inequalities:

Lx \le z \le Ux, \; f(w) - U(1-x) \le z \le f(w) - L(1-x),

where the values LaTeX: L and LaTeX: U are defined as

L = \min \{ f(w) : w \in W^R\} \mbox{ and } U = \max \{f(w) : w \in W^R\},

and LaTeX: W^R is any relaxation of LaTeX: W.
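A brute-force check of the four inequalities on a toy instance; the function LaTeX: f and bounds below are invented for illustration, and at binary points the inequalities force LaTeX: z = f(w)x:

```python
from itertools import product

# Invented instance: f(w) = 2*w1 - w2 with w in {0,1}^2, so over the
# relaxation W^R = [0,1]^2 we have L = -1 and U = 2.
def f(w):
    return 2 * w[0] - w[1]

L, U = -1, 2

def feasible_z(w, x):
    """z values (on a fine grid) satisfying Glover's four inequalities."""
    fw = f(w)
    return [z / 4 for z in range(-8, 13)
            if L * x <= z / 4 <= U * x
            and fw - U * (1 - x) <= z / 4 <= fw - L * (1 - x)]

# For every binary (w, x), the inequalities pin z down to exactly f(w)*x.
for w in product([0, 1], repeat=2):
    for x in [0, 1]:
        zs = feasible_z(w, x)
        assert zs == [f(w) * x], (w, x, zs)
print("z = f(w)*x enforced at all binary points")
```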

Goal program

A goal is neither a constraint nor an objective because it is neither a firm requirement nor a function that must be at an extreme value. In English, a goal represents a relation that is desired, but it may be violated if there is adequate compensation. For example, we may have a budget of $1000, and we prefer to operate within it, but if spending just one more dollar yields a major improvement in operations, we would consider spending $1001.

A goal program is put into a standard form mathematical program in two steps:

  1. Add the variable, LaTeX: v, and the constraints, LaTeX: G(x) - v \le 0 and LaTeX: v \ge 0, to measure the level of violation. (If the goal were LaTeX: H(x) = 0, instead of LaTeX: G(x) \le 0, the added constraint would be LaTeX: H(x) + u - v = 0, where LaTeX: u and LaTeX: v are (non-negative) levels of violation below and above the goal, respectively.)
  2. Add the penalty term to the objective: LaTeX: P(v), where LaTeX: P(0)=0, and LaTeX: P is strictly increasing -- i.e., LaTeX: v' \ge v and LaTeX: v'_i > v_i for some LaTeX: i imply LaTeX: P(v') > P(v).

The resulting mathematical program represents the goal program. If the goal is satisfied, LaTeX: v=0; otherwise, the penalty term, LaTeX: P(v), reflects "adequate compensation" for the violation.

Golden mean

The golden mean, or golden ratio is the positive solution to the quadratic equation, LaTeX: x^2 + x - 1 = 0, which is LaTeX: (\sqrt{5}-1)/2, or approximately LaTeX: 0.618. This has the proportionality property: LaTeX:  x : 1 = (1-x) : x, which defines the placements of evaluations for the golden section search.

Golden section search

This finds an interval, LaTeX: [a,b], that contains a maximum of a unimodal function whose domain is LaTeX: [0,1], such that the length of the final interval, LaTeX: [a,b], satisfies: LaTeX: b-a < e (where LaTeX: e > 0 is specified). The golden ratio is denoted below by LaTeX: g, which is approximately 0.618.

  1. Initially compute LaTeX: f(g) and LaTeX: f(1-g), and set the interval of uncertainty to the original endpoints, LaTeX: a=0 and LaTeX: b=1. In general, we have LaTeX: x and LaTeX: y placed in LaTeX: [a,b], such that the distances from the endpoints are the same:
  2. Given the interval LaTeX: [a, b] contains evaluations LaTeX: f(x) and LaTeX: f(y), with LaTeX: x < y, compare: if LaTeX: f(x) > f(y), replace LaTeX: b with LaTeX: y; if LaTeX: f(x) < f(y), replace LaTeX: a with LaTeX: x. (If LaTeX: f(x)=f(y), this leaves the interval LaTeX: [x,y], in which case evaluate LaTeX: f(x+g(y-x)).) One of the following cases prevail:

              *                                          *
                        *                     *
       a------x---------y------b       a------x---------y------b
                        : drop :       : drop :
            f(x) > f(y)                     f(x) < f(y)

                   *         *
       a------x---------y------b
       : drop :         : drop :
            f(x) = f(y)

The golden ratio is the limit of ratios of successive Fibonacci numbers: LaTeX: F_{N-1} / F_{N} \rightarrow g. Thus, the golden section search approaches Fibonacci search as the number of function evaluations (N) becomes large.
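A sketch of the search in Python for a maximum on LaTeX: [0,1]; the test function is an arbitrary unimodal example:

```python
import math

g = (math.sqrt(5) - 1) / 2   # golden ratio, approximately 0.618

def golden_section_max(f, a=0.0, b=1.0, tol=1e-6):
    """Shrink [a, b] around a maximizer of a unimodal f, reusing one
    evaluation per iteration thanks to the golden-ratio placements."""
    x, y = b - g * (b - a), a + g * (b - a)   # symmetric interior points, x < y
    fx, fy = f(x), f(y)
    while b - a > tol:
        if fx > fy:                 # maximum cannot lie in (y, b]
            b, y, fy = y, x, fx     # old x becomes the new y
            x = b - g * (b - a)
            fx = f(x)
        else:                       # maximum cannot lie in [a, x)
            a, x, fx = x, y, fy     # old y becomes the new x
            y = a + g * (b - a)
            fy = f(y)
    return a, b

a, b = golden_section_max(lambda t: -(t - 0.3) ** 2)
print((a + b) / 2)   # approximately 0.3
```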

Gomory cut

This is an early cutting plane for an integer program. Consider LaTeX: \{ (x, s) : x \in \mathbb{Z}^n, \; s \in \mathbb{R}^m, Ax + s = b, x \ge 0, s \ge 0 \}, where LaTeX: A and LaTeX: b have integer values. For any LaTeX: y \in \mathbb{R}^m, let LaTeX: a = A^T y - \lfloor A^T y \rfloor and LaTeX: a_0 = y^T b - \lfloor y^T b \rfloor (i.e., LaTeX: a is the fractional part of the linear combination of the equations, and LaTeX: a_0 is the fractional part of the same linear combination of the right-hand sides). Also, let LaTeX: c = y - \lfloor y \rfloor (fractional part of LaTeX: y). Then, Gomory's cut is:

a^T x + c^T s \ge a_0.

The vector LaTeX: y is chosen such that LaTeX: a_0 > 0 and the current solution (with LaTeX: x = 0) violates this inequality in the LP relaxation.
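The cut coefficients are just fractional parts, as the following sketch shows; the single-equation data and multiplier are invented for illustration:

```python
import math

def frac(v):
    """Fractional part: v - floor(v)."""
    return v - math.floor(v)

def gomory_cut(A, b, y):
    """Coefficients (a, c, a0) of the cut a^T x + c^T s >= a0
    obtained from the multiplier vector y."""
    m, n = len(A), len(A[0])
    yA = [sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]
    a = [frac(v) for v in yA]                        # fractional part of y^T A
    c = [frac(v) for v in y]                         # fractional part of y
    a0 = frac(sum(y[i] * b[i] for i in range(m)))    # fractional part of y^T b
    return a, c, a0

# Invented data: one equation 3x1 + 2x2 + s = 7, multiplier y = (1/2,).
a, c, a0 = gomory_cut([[3, 2]], [7], [0.5])
print(a, c, a0)   # [0.5, 0.0] [0.5] 0.5
```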

Gomory function

A recursively defined class of functions on LaTeX: \mathbb{Q}^n to LaTeX: \mathbb{Q} (rational n-vectors to rationals) using three operations: (1) non-negative, rational linear combinations, (2) integer rounding, and (3) maximum. Letting LaTeX: R(v) be the integer round-up of LaTeX: v, i.e. the least integer not less than LaTeX: v, we say LaTeX: f is a Gomory function if it is the identity function, LaTeX: f(x)=x, or if it has one of the following forms:

  1. LaTeX: f(x) = ag(x) + bh(x) for some LaTeX: a, b \ge 0 and rational;
  2. LaTeX: f(x) = R(g(x));
  3. LaTeX: f(x) = \max \{g(x), h(x)\};
where LaTeX: g and LaTeX: h are Gomory functions. It can be shown this is equivalent to the point-wise maximum of finitely many Chvátal functions:

f(x) = \max \{C_1(x), C_2(x), \ldots, C_k(x)\},

where each LaTeX: C_i is a Chvátal function.

This arises in integer linear programming in several ways. One fundamental result is that the optimal value as a function of LaTeX: b is a Gomory function:

f^*(b) = \inf\{c^Tx: Ax = b, x \ge 0, \; x \in \mathbb{Z}^n \}.

Here is an example:

\begin{array}{rcl}
f^*(b_1, b_2) & = & \min \{ 7x_1 + 4x_2 + 3x_3 : \\
& & \;\; 5x_1 + 3x_2 + 2x_3 = b_1, \\
& & \;\; x_1 + x_2 + x_3 = b_2, \\
& & \;\; x \in \mathbb{Z}^3, \; x \ge 0 \} \\
& = & \max \{ b_1 + b_2, \; \lceil b_1/2 - b_2 / 2 \rceil \}
\end{array}

Note: The above definition assumes minimization; if the ILP is maximization, LaTeX: R(.) would be round-down (i.e., greatest integer not exceeding the value), and condition 3 would be the point-wise minimum.

Gomory group

The group associated with the corner polyhedron problem:

\left\{v \in \mathbb{Z}^n :  \sum_j r(A_j)v_j = r(b),\; v \ge 0 \right\},

where LaTeX: v corresponds to the nonbasic variables and LaTeX: r(d) is the remainder, LaTeX: d\!\mod |B|, where LaTeX: |B| is the determinant of the basis, LaTeX: B. The idea is that the nonbasic values chosen in this group, together with the basic levels imputed by the equations LaTeX: Ax=b, yield an integer solution (but a basic level might be negative).


Gradient

Vector of first partial derivatives of a function (assumed to be differentiable at least once). Typically denoted LaTeX: \nabla f(x), where LaTeX: f is the function and LaTeX: x is a point in its domain.

Gradient projection method

A feasible direction method by projecting the gradient into the working surface, LaTeX: \{x: Ax=b\}. Suppose LaTeX: A has full row rank. Then, LaTeX: P = I - A^T(AA^T)^{-1}A projects any vector into the null space of LaTeX: A: LaTeX: APy = (A-A)y = 0 for all LaTeX: y \in \mathbb{R}^n. The form of an iteration is LaTeX: x' = x + sd, where LaTeX: d is the projected gradient, LaTeX: P \nabla f(x), and LaTeX: s is determined by line search. Since LaTeX: Ad=0, LaTeX: Ax'=Ax, thus staying in the working surface. (This extends to nonlinear constraints by using the same correction procedure as the generalized reduced gradient method.)
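For a single constraint row, the projection LaTeX: P = I - A^T(AA^T)^{-1}A reduces to subtracting the component along LaTeX: A; a sketch with an invented gradient:

```python
# Projection onto the null space of a single-row A (working surface
# {x : x1 + x2 + x3 = b}), where (A A^T)^{-1} is just a scalar.
A = [1.0, 1.0, 1.0]
aa = sum(v * v for v in A)   # A A^T

def project(d):
    """Apply P = I - A^T (A A^T)^{-1} A to the vector d."""
    t = sum(A[j] * d[j] for j in range(3)) / aa
    return [d[j] - t * A[j] for j in range(3)]

grad = [3.0, 1.0, -1.0]          # invented gradient at the current point
d = project(grad)
print(d)                          # [2.0, 0.0, -2.0]
print(sum(A[j] * d[j] for j in range(3)))   # 0.0: Ad = 0, so Ax' = Ax
```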


Graph

In one context, this is the functional value and domain: LaTeX: \{(x, z): x \in X, \; z=f(x)\}, where LaTeX: f:X \rightarrow R. In another context, this is a (maybe undirected) network. In the former context, see also epigraph and hypograph. In the latter context, the notation LaTeX: (V,E) is sometimes used to mean a graph with vertex (or node) set LaTeX: V and edge (or link) set LaTeX: E. We say an edge is incident with the two nodes that define it, and those two nodes are called the endpoints of the edge. The endpoints are said to be adjacent nodes.

An edge whose endpoints are the same node is called a loop. Two edges with the same endpoints are called parallel. A graph is simple if it has no loops or parallel edges. Otherwise, it is called a multigraph. The degree of a node is the number of edges incident to it (counting a loop twice). If two edges have a common endpoint, they are said to be adjacent. A path is a sequence of edges that are successively adjacent. (If the graph is simple, a path can also be represented by the associated sequence of nodes.)

When the edge is directed (or oriented), it is called an arc. When all edges are directed, the graph is called a directed graph, or digraph. With additional properties (e.g., arc weights), this is a network. A path in a digraph can traverse its arcs in either the forward or backward direction. If all arcs are forward, it is a directed path, or dipath.

Here are some special graphs of interest (not exhaustive).

  • bipartite. nodes decompose into 2 sets such that every link has its endpoints in opposite sets. This applies, for example, to the standard transportation problem, where there are sources in one set and destinations in the other.
  • complete. There is a link between each pair of nodes. A complete graph on n nodes is usually denoted LaTeX: K_n. In the case of a bipartite graph, the term bicomplete is used to mean all possible links are there: between all pairs of nodes in the two sets. A bicomplete graph on sets with LaTeX: m and LaTeX: n nodes, resp., is usually denoted LaTeX: K_{mn}.
  • connected. There is a path between each pair of nodes. If the graph is directed, this divides into:
    1. strongly connected - arcs must be traversed in forward direction;
    2. weakly connected - arcs can be traversed in either direction.
  • cycle (denoted LaTeX: C_n for LaTeX: n nodes). The links are LaTeX: (1,2),(2,3), \ldots, (n-1,n), (n,1).
  • planar. The graph can be drawn in a plane (2D) such that no two links cross (they meet only at incident nodes). This has special properties in multi-commodity flows and shortest paths.
  • subgraph. LaTeX: (V',E') such that LaTeX: V' is a subset of LaTeX: V and LaTeX: E' is a subset of LaTeX: E.
  • tree. Connected and acyclic. One problem is finding a spanning tree that is optimal in some sense.
  • undirected. No link is directed.

Greedy algorithm

Applies when the optimization problem is to decide whether or not to include some element from a given set. A greedy algorithm begins with no elements and sequentially selects an element from the feasible set of remaining elements by myopic optimization. (The elements could have been sorted by some criterion, such as associated weights.) This results in an optimal solution to the problem if and only if there is an underlying matroid structure (for example, see spanning tree). Further details are in the supplement, Greedy Algorithms.
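The spanning tree case can be sketched with Kruskal's algorithm, the textbook instance of a greedy method that is optimal because of the underlying (graphic) matroid; the edge weights below are invented:

```python
def kruskal(n, edges):
    """Minimum spanning tree by the greedy method.
    edges: list of (weight, u, v) on nodes 0..n-1; returns the chosen edges."""
    parent = list(range(n))          # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    tree = []
    for w, u, v in sorted(edges):    # myopic: cheapest edge first
        ru, rv = find(u), find(v)
        if ru != rv:                 # feasible: no cycle created
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

edges = [(4, 0, 1), (1, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, edges)
print(sum(w for w, _, _ in mst))   # 6
```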

Groebner basis

This arises in parametric integer (linear) programming, varying the right-hand side LaTeX: (b). A precise definition is involved, stemming from ideal theory in abstract algebra, and there are many other applications outside of integer programming. In the context of ILP, a Gröbner basis is a minimal set of change directions, LaTeX: \{h^1, h^2, \ldots, h^k\}, called a test set, such that the objective function improves along those directions. To elaborate, suppose we have

\mbox{ILP}(b): \min c^Tx : Ax = b, \; x \in \mathbb{Z}^n, \; x \ge 0,

where LaTeX: A and LaTeX: b have integer elements. Let LaTeX: X^*(b) = \arg\!\min \{c^Tx : Ax = b, \;x \in \mathbb{Z}^n, \; x \ge 0\} and LaTeX: B = \{b: X^*(b) \neq \emptyset\} (i.e., LaTeX: \mbox{ILP}(b) is feasible and bounded). A Gröbner basis for LaTeX: A, holding LaTeX: c fixed, is a set, LaTeX: G_c = \{h^i\}, for which the following conditions hold:

  1. LaTeX: Ah^i = 0.
  2. LaTeX: c^T h^i < 0 (for minimization).
  3. For LaTeX: b \in B and LaTeX: x \not\in X^*(b), there exists LaTeX: h^i \in G_c such that LaTeX: x + h^i \ge 0.

Condition 1 says that LaTeX: A(x+h^i)=b if LaTeX: Ax=b, and condition 2 says LaTeX: c^T(x+h^i) < c^Tx. Therefore, if LaTeX: x is any feasible solution to LaTeX: \mbox{ILP}(b), either LaTeX: x is optimal for a particular right-hand side LaTeX: b, or condition 3 ensures there is an improvement by adding some vector, LaTeX: h^i \in G_c.

Here is a numerical example.

Group problem

Solving the corner polyhedron problem from the perspective of the Gomory group.

Haar condition

See randomized program.

Hadamard inequality

For an LaTeX: n \times n matrix LaTeX: A,

|\mbox{det} (A)| \le \|A_{:,1} \| \cdots \|A_{:,n}\| = \prod_{j=1}^n \| A_{:,j} \|,
where LaTeX: A_{:,j} is the j-th column of LaTeX: A and norms are LaTeX: L_2.
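A quick numeric check on a made-up 2x2 matrix, where the determinant can be computed directly:

```python
from math import sqrt

A = [[1.0, 2.0],
     [3.0, 4.0]]                      # hypothetical 2x2 matrix

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]          # = -2.0
col_norm = lambda j: sqrt(A[0][j]**2 + A[1][j]**2)   # L2 norm of column j
bound = col_norm(0) * col_norm(1)                    # sqrt(10)*sqrt(20) ~ 14.14

assert abs(det) <= bound              # Hadamard: |det(A)| <= product of column norms
```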


Half-line

A half-line is a collection of points of the form LaTeX: \{t \, {\bf v}: t \ge 0\}, where LaTeX: {\bf v} is in LaTeX: {\mathbb R}^n such that LaTeX: {\bf v} \neq 0 (some state LaTeX: \|{\bf v}\|=1, without loss of generality). Note the half-line is rooted at the origin -- see ray for translation.


Halfspace

Consider the hyperplane, LaTeX: \{x \in \mathbb{R}^n: a^T x = b\} (where LaTeX: a \in \mathbb{R}^n \backslash \{0\}). This partitions LaTeX: \mathbb{R}^n into two (closed) halfspaces: LaTeX: \{ x: a^Tx \le b\} and LaTeX: \{ x: a^Tx \ge b\}. The two associated open halfspaces are LaTeX: \{ x: a^T x < b\} and LaTeX: \{ x: a^Tx > b\}.

Hard constraint

A constraint is hard if it has to be satisfied in any feasible solution. Compare with soft constraint.

Hausdorff metric

This metric arises in normed spaces, giving a measure of distance between two (point) sets, LaTeX: S and LaTeX: T. Given a norm, let LaTeX: d be the distance function between a point and a set:

LaTeX:  d(x, T) = \inf \{\|x-y\|: y \in T\} and LaTeX:  d(S, y) = \inf \{\|x-y\|: x \in S\}.


Then define LaTeX:  D(S, T) = \sup \{d(x, T): x \in S\} and LaTeX:  D(T, S) = \sup \{d(S, y): y \in T\}.

Then, LaTeX: h(S, T) = \max \{D(S, T), D(T, S)\} is the Hausdorff metric. In words: for each LaTeX: x \in S, let LaTeX: y(x) be its closest point in LaTeX: T, and maximize this distance over all LaTeX: x; do the same with the roles of LaTeX: S and LaTeX: T reversed, and take the larger of the two values.

For example, let LaTeX: S be the interval, LaTeX: [-1, 1], and let LaTeX: T be the interval, LaTeX: [0, 3]. The Hausdorff distance between these two intervals is:

h(S, T) = \max \{D(S, T), D(T, S)\} = \max\{1, 2\} = 2.
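For finite point sets the definitions translate directly into code. The sketch below samples the two intervals on a common grid and reproduces the distance 2:

```python
def hausdorff(S, T):
    """Hausdorff distance between two finite sets of reals."""
    d = lambda x, A: min(abs(x - a) for a in A)   # point-to-set distance
    D_ST = max(d(x, T) for x in S)                # sup over x in S
    D_TS = max(d(y, S) for y in T)                # sup over y in T
    return max(D_ST, D_TS)

S = [i / 2 for i in range(-2, 3)]    # grid on [-1, 1]
T = [i / 2 for i in range(0, 7)]     # grid on [0, 3]
print(hausdorff(S, T))               # 2.0, matching the example above
```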

Hessenberg matrix

An upper Hessenberg matrix is a square matrix with LaTeX: A_{i, j} = 0 for LaTeX: i > j+1. A lower Hessenberg matrix is one whose transpose is upper Hessenberg.


Hessian

The matrix of second partial derivatives of a function (assumed to be twice differentiable). This is often denoted LaTeX: H_f(x), where LaTeX: f is the function and LaTeX: x is a point in its domain.


Heuristic

In mathematical programming, this usually means a procedure that seeks an optimal solution but does not guarantee it will find one, even if one exists. It is often used in contrast to an algorithm, so branch and bound would not be considered a heuristic in this sense. In AI, however, a heuristic is an algorithm (with some guarantees) that uses a heuristic function to estimate the "cost" of branching from a given node to a leaf of the search tree. (Also, in AI, the usual rules of node selection in branch and bound can be determined by the choice of heuristic function: best-first, breadth-first, or depth-first search.)

Heuristic function

A heuristic function is used to guide a heuristic search, much like a human would. In this sense, it is a rule of thumb. Formally, in AI, a heuristic function is defined on the state of a search tree and is used in the evaluation of nodes for expansion in what is sometimes called a best-first directed tree search strategy.

Heuristic search

In mathematical programming, this is any (purposeful) search procedure to seek a solution to a global optimization problem, notably to combinatorial programs. In AI, this is a (partially) informed search (vs. blind search), using a heuristic function for guidance.

Two types of (blind) search methods are:

  1. Breadth-first: expanding a node of least depth;
  2. Depth-first: expanding a node of greatest depth (requiring backtracking).
(See branch and bound.)

A specific class of local heuristic search algorithms is the greedy algorithm. Another type is the 1-Opt for the TSP.

Here are heuristic search strategies that are based on some biological metaphor:

  1. Ant colony optimization, based on how ants solve problems;
  2. Genetic algorithm, based on genetics and evolution;
  3. Neural networks, based on how the brain functions;
  4. Simulated annealing, based on thermodynamics;
  5. Tabu search, based on memory-response;
  6. Target analysis, based on learning.

Hirsch conjecture

If the basis of one basic feasible solution differs from another by k columns, there exist k feasible pivots to go from the first to the second. Posed by Warren M. Hirsch in 1957, the conjecture was shown to be false for unbounded polyhedra (Klee and Walkup, 1967) and later disproved for bounded polytopes as well (Santos, 2010).

Holder inequality

Holder's inequality states

LaTeX:  \|x y\|_1 \le \|x\|_p \|y\|_q ,

where LaTeX: \|\cdot\|_k denotes the LaTeX: L_k norm, LaTeX: p > 1, and LaTeX: 1/p + 1/q = 1. Equality holds if and only if LaTeX: |x|^p = a|y|^q for some scalar LaTeX: a. (Note: this is the Cauchy-Schwarz inequality if LaTeX: p = q = 2.)
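A numeric spot-check of the inequality with p = 3 (so q = 3/2) and made-up vectors:

```python
p, q = 3.0, 1.5                      # conjugate exponents: 1/p + 1/q = 1
x, y = [1.0, 2.0], [3.0, 4.0]        # hypothetical vectors

lhs = sum(abs(a * b) for a, b in zip(x, y))              # ||xy||_1 = 11.0
norm = lambda v, k: sum(abs(a) ** k for a in v) ** (1.0 / k)  # L_k norm
rhs = norm(x, p) * norm(y, q)        # ~ 11.6

assert lhs <= rhs                    # Holder's inequality holds
```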

Homogeneous function

A homogeneous function of degree p satisfies

LaTeX: f(ax) = (a^p)f(x) for all LaTeX: a.

It is positively homogeneous if we restrict LaTeX: a > 0. When the degree is not specified (even by context), it is generally assumed to be 1. For example, LaTeX: x is homogeneous, LaTeX: xy + x^2 is homogeneous of degree 2, LaTeX: x + x^2 is not homogeneous, and LaTeX: xy/(x+y) is positively homogeneous on the positive elements of LaTeX: {\mathbb R}^2.
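The degree-2 example can be checked numerically: scaling the argument by a scales LaTeX: xy + x^2 by LaTeX: a^2, while LaTeX: x + x^2 fails the test.

```python
f = lambda x, y: x * y + x ** 2      # homogeneous of degree 2

x, y, a = 1.5, -2.0, 3.0             # arbitrary test point and scale factor
assert abs(f(a * x, a * y) - a ** 2 * f(x, y)) < 1e-12   # f(ax) = a^2 f(x)

g = lambda x, y: x + x ** 2          # NOT homogeneous
assert abs(g(a * x, a * y) - a ** 2 * g(x, y)) > 1e-12   # scaling law fails
```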


Homotopy

A continuous transformation from one mapping to another. One example is the convex combination: LaTeX: t f(x) + (1-t)F(x). As LaTeX: t varies from 0 to 1, the function changes continuously from LaTeX: F to LaTeX: f.

Hungarian method

An algorithm to solve the standard assignment problem (presented by H. Kuhn in 1955). Let C be the LaTeX: n\times n cost matrix.

  • Step 0 (initialize). Subtract the least element of each row from that row of LaTeX: C. Then, do likewise for each column. The resulting matrix, LaTeX: C^0 has a zero in every row and column. (Further, a least cost assignment for LaTeX: C^0 is also a least cost assignment for LaTeX: C.) Re-define LaTeX: C=C^0.
  • Step 1 (cover zeros). Draw the minimum number of lines through the rows and columns to cover all zeros in LaTeX: C. If that minimum is LaTeX: n, you can assign LaTeX: i to LaTeX: j such that LaTeX: C_{ij}=0; then you can remove row LaTeX: i and column LaTeX: j, repeating the process to obtain an optimal assignment. Otherwise, if that minimum is less than LaTeX: n, continue.
  • Step 2 (revise LaTeX: C). Select the minimum uncovered element. Subtract this from each uncovered element and add it to each twice-covered element (i.e., to those with both a horizontal and vertical line intersecting). Return to step 1.


               _ 1  2  3  4  5 _ 
              |                 | 
              |  1  2  3  4  7  | 1
              |  7  4  2  6  5  | 2
          C = |  2  4  3  1  4  | 3
              |  2  3  1  2  3  | 4
              |  5  4  3  2  4  | 5
              |_               _| 

After subtracting min row elements:

               _ 1  2  3  4  5 _ 
              |                 | 
              |  0  1  2  3  6  | 1
              |  5  2  0  4  3  | 2
              |  1  3  2  0  3  | 3
              |  1  2  0  1  2  | 4
              |  3  2  1  0  2  | 5
              |_               _| 

After subtracting the minimum column elements:

               _ 1  2  3  4  5 _ 
              |        |  |  |  | 
              |--0--0--2--3--5--| 1
              |  5  1  0  4  1  | 2
              |  1  2  2  0  1  | 3
              |  1  1  0  1  0  | 4
              |  3  1  1  0  0  | 5
              |_       |  |  | _| 

The minimum number of lines to cover all zeros is 4: a horizontal line through row 1 and vertical lines through columns 3, 4 and 5. The minimum uncovered element is 1. Subtract it from each uncovered element and add it to each twice-covered element (cells (1,3), (1,4) and (1,5)):

               _ 1  2  3  4  5 _ 
              |                 | 
              |  0* 0  4  5  7  | 1
              |  4  0* 0  4  1  | 2
              |  0  2  2  0* 1  | 3
              |  0  0  0* 1  0  | 4
              |  2  0  1  0  0* | 5
              |_               _| 

The minimum cover is now 5, by drawing a line through each column. A minimum assignment is indicated by *. For example, we assign person 1 to job 1, person 2 to job 2, person 3 to job 4, person 4 to job 3, and person 5 to job 5. (There are alternative optimal assignments.)
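The matrix is small enough to verify the worked example by brute force over all 5! = 120 assignments (this is only a check of the result, not the Hungarian method itself):

```python
from itertools import permutations

C = [[1, 2, 3, 4, 7],
     [7, 4, 2, 6, 5],
     [2, 4, 3, 1, 4],
     [2, 3, 1, 2, 3],
     [5, 4, 3, 2, 4]]

# Try every permutation p, where person i is assigned to job p[i]:
best = min(permutations(range(5)),
           key=lambda p: sum(C[i][p[i]] for i in range(5)))
cost = sum(C[i][best[i]] for i in range(5))
print(best, cost)                    # (0, 1, 3, 2, 4) 11 -- the starred assignment
```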


Hypergraph

A set of nodes (or vertices), say V, plus a set of edges, say E, such that each member of E is a subset of V. When each member of E has exactly 2 nodes, [V,E] is a graph. The hypergraph is a convenient mathematical way to describe relations that involve more than two objects (nodes).

Specific cases:

  • An IIS hypergraph: each node represents an inequality and each edge represents an IIS.
  • A common graphical representation of non-binary constraint networks is as a hypergraph, where nodes represent variables and edges group variables together that belong to the same scope.


Hyperplane

An affine set of dimension LaTeX: (n-1), i.e. the set LaTeX:  \{x: a^T x = b\}, where LaTeX: a is a nonzero vector called the normal of the hyperplane.


Hypograph

Abbreviated LaTeX: \mbox{hypo}\{f,X\}. The hypograph of LaTeX: f over LaTeX: X is LaTeX: \{(x, z): x \in X, \;z \le f(x)\}, which is the region on or below the graph of LaTeX: f.

Implicit enumeration

Applied to integer programming, this is a systematic evaluation of all possible solutions without explicitly evaluating all of them. For example, suppose a feasible solution is at hand with objective value 100. Now suppose we solve a relaxation, such as fixing some of the variables and solving the resulting linear program, and we obtain a value of only 90. If we seek a maximum, we can ignore all possible solutions having the fixed values in the relaxation since they must all have an objective value of at most 90. This is often said to be equivalent to branch and bound; however, the early versions of these methods were related, but different. The early branch and bound was a best-first (even breadth-first) heuristic search strategy. Early implicit enumeration schemes were depth-first.

Implicit function theorem

Suppose LaTeX: h:R^n \rightarrow R^m, where LaTeX: n > m, and LaTeX: h is smooth. Further, suppose we can partition the variables, LaTeX: x = (y, z), such that LaTeX: y is m-dimensional with LaTeX: \nabla_y h(x) nonsingular at LaTeX: x^* = (y^*, z^*). Then, there exists LaTeX: \varepsilon > 0 for which there is an implicit function, LaTeX: f, on the neighborhood, LaTeX: N_{\varepsilon}(z^*) = \{z: \|z-z^*\| < \varepsilon\}, such that LaTeX: h(f(z), z)=0 for all LaTeX: z \in N_{\varepsilon}(z^*). Further, LaTeX: f is smooth with LaTeX: \nabla_z f(z^*) = -\left( \nabla_y h(x^*) \right)^{-1} \nabla_z h(x^*).
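A one-dimensional sketch with the assumed example h(y, z) = y^2 + z^2 - 1, whose implicit function near (y, z) = (0.8, 0.6) is f(z) = sqrt(1 - z^2); the theorem's derivative formula gives -(2y)^{-1}(2z) = -z/y, which a finite difference confirms:

```python
from math import sqrt

h = lambda y, z: y ** 2 + z ** 2 - 1   # h(f(z), z) = 0 near (y, z) = (0.8, 0.6)
f = lambda z: sqrt(1 - z ** 2)         # the implicit function y = f(z)

z = 0.6
y = f(z)                               # y = 0.8, so grad_y h = 2y is nonsingular
formula = -(2 * z) / (2 * y)           # -(grad_y h)^{-1} grad_z h = -z/y = -0.75

eps = 1e-6                             # central finite-difference estimate of f'(z)
numeric = (f(z + eps) - f(z - eps)) / (2 * eps)
assert abs(formula - numeric) < 1e-6   # both are -0.75
```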

Implied constraint

An implied constraint is a redundant constraint that (typically) improves propagation if it is added to a constraint model.

Implied equality

An inequality constraint, say LaTeX: g(x) \le 0, such that LaTeX: g(x)=0 for all feasible LaTeX: x.

Inactive constraint

A constraint that is not active (at a point).

Incidence matrix

A matrix, say LaTeX: M, that represents the incidence of edges or arcs to nodes in a graph or network. In the undirected case, LaTeX: M is binary valued: if edge LaTeX: k has endpoints LaTeX: i and LaTeX: j, then LaTeX: M_{ik} = M_{jk} = 1 and LaTeX: M_{rk} = 0 for LaTeX: r \neq i,j. In the directed case, the entry LaTeX: -1 indicates the tail: if the arc is directed from LaTeX: i to LaTeX: j, LaTeX: M_{ik} =-1 and LaTeX: M_{jk} = 1.
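A minimal sketch of building the directed incidence matrix for a made-up network; every column sums to zero (one tail, one head):

```python
def incidence_matrix(n_nodes, arcs):
    """arcs: list of (tail, head) pairs; returns the n x |arcs| matrix M."""
    M = [[0] * len(arcs) for _ in range(n_nodes)]
    for k, (i, j) in enumerate(arcs):
        M[i][k] = -1                  # arc k leaves node i (tail)
        M[j][k] = 1                   # arc k enters node j (head)
    return M

M = incidence_matrix(3, [(0, 1), (1, 2), (0, 2)])   # small example network
assert all(sum(col) == 0 for col in zip(*M))        # each column sums to zero
```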


Inconsistent

A system of inequalities and/or equations for which there is no feasible solution.

Incumbent solution

The current best solution found during an algorithmic search procedure. Typically in (mixed) integer programming this refers to the best known feasible solution in the branching tree.

Independent set

Given a graph, LaTeX: G=(V,E), an independent set (also called a stable set) is a subgraph induced by a subset of vertices, LaTeX: S, plus all edges with both endpoints in LaTeX: S, such that no two nodes in the subgraph are adjacent. A maximal independent set is one such that adding any node causes the subgraph to violate independence -- the added node is adjacent to some node already in the set. Given weights, LaTeX: w(v) for LaTeX: v \in V, the weight of an independent set, LaTeX: S, is the sum of the weights in LaTeX: S. The maximum independent set problem is to find an independent set of maximum weight. This is equivalent to finding a minimum weight vertex cover, say LaTeX: C, since LaTeX: V\backslash C is an independent set. (One can also refer to an independent set of arcs (or edges) as a subgraph with no two arcs (or edges) adjacent.)

Indicator function

A function indicating membership in a set, say LaTeX: A. Such functions are usually denoted LaTeX: \chi_A or LaTeX: 1_A and are evaluated as

LaTeX:  \chi_A (x) = 1_A = \left\{ \begin{array}{cl} 1, & x \in A \\ 0, & x \not\in A. \end{array} \right.


A relation of the form LaTeX: g(x) \le 0 or LaTeX: g(x) \ge 0. Such constraints are typical of mathematical programs. With equality allowed, these are called weak inequalities; strict inequalities are LaTeX: g(x) < 0 and LaTeX: g(x) > 0.

Also see variational inequality.

Inequality of degree k

An inequality, usually linear, with LaTeX: k variables. This arises in integer programming, where the variables are binary. In particular, an inequality of degree 2 can represent a precedence constraint, and it has special computational advantages in the node packing problem.


Infeasible

Not feasible.

Inference dual

See it in the list of particular duals.


Infimum

(abbr. Inf). The greatest lower bound on a (real-valued) function over (a subset of) its domain. If LaTeX: f is unbounded from below, LaTeX: \inf \{f\} = -\infty, and if the domain is empty, LaTeX: \inf \{f\} = \infty. Otherwise, suppose LaTeX: L is any lower bound: LaTeX: f(x) \ge L for all LaTeX: x \in X. LaTeX: L is a greatest lower bound if, for any LaTeX: \varepsilon > 0, there exists LaTeX: x in the domain for which LaTeX: f(x) \le L + \varepsilon. (That is, we can get arbitrarily close to LaTeX: L in the range of LaTeX: f.) Note that the infimum is always defined, and its range is in the extended reals. The infimum is the minimum if it is attained by some point in its domain.

Infinite program

An infinite [dimensional] program is a mathematical program with an infinite number of variables and constraints (also see semi-infinite program). Generally, this is approached as an abstract program.

Inner approximation

This solves a mathematical program by a sequence of approximations whose feasible regions are contained in the original feasible region. One example is the use of a barrier penalty method. Another example is polyhedral annexation.

Integer equivalent aggregation

Integer equivalent aggregation is a reduction of a system of linear algebraic equations with non-negative integer solutions to a single equation, which is a linear combination of the equations of the system and has the same set of non-negative integer solutions. For example, consider the system:

S = \{ (x,y,z) \in \{0,1\}^3 : x + y = 1 , y + z = 1\}.

By simply adding the equations, we have the equivalent description:

LaTeX: S = \{(x,y,z) \in \{0,1\}^3: x + 2y + z = 2\}.

Both sets consist of the two points LaTeX: (0,1,0) and LaTeX: (1,0,1).
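Enumerating {0,1}^3 confirms that the aggregated equation describes exactly the same set:

```python
from itertools import product

# All 0-1 points satisfying the original system x + y = 1, y + z = 1:
original = {(x, y, z) for x, y, z in product((0, 1), repeat=3)
            if x + y == 1 and y + z == 1}
# All 0-1 points satisfying the single aggregated equation:
aggregated = {(x, y, z) for x, y, z in product((0, 1), repeat=3)
              if x + 2 * y + z == 2}

print(sorted(original))              # [(0, 1, 0), (1, 0, 1)]
assert original == aggregated        # same solution set
```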

Integer polyhedron

The convex hull of the feasible region of a linear integer program, LaTeX: \mbox{conv}\{x \in \mathbb{Z}^n_+ : Ax \ge b\}.

Integer program

Abbreviated IP. The variables are required to be integer-valued. Historically, this term implied the mathematical program was otherwise linear, so one often qualifies a nonlinear integer program vs. a linear IP. Also see mixed integer program and combinatorial program.

Integer rounding

Rounding a non-integer solution to an integer neighbor. This will generally not yield an optimal solution to an integer program -- see Myths and Counterexamples in Mathematical Programming.

Interior penalty function

Same as barrier function.

Interior point method

A family of algorithms that stays in the strict interior of the feasible region, possibly through the use of a barrier function. The term grew from Karmarkar's algorithm to solve a linear program. In that case, the resulting solution (if it exists) is strictly complementary, which defines the (unique) optimal partition.

Interior solution

An optimal solution to a mathematical program that is in the relative interior of its set of optimal solutions. This arises in interior point methods, and relates to the strict complementarity of the solution.


An estimate of a value between two others. In LP, an interpolation estimate of the optimal objective value, say LaTeX: z(\alpha b+(1-\alpha)b'), is LaTeX: \alpha z(b) +(1-\alpha)z(b'), where LaTeX: b and LaTeX: b' are right-hand side vectors.

Intersection cut

See convexity cut.

Inventory balance equation

The constraint that says that the amount of inventory in the next time period must equal the current inventory plus what is produced or purchased minus what is lost or sold. Let LaTeX: y(t) be the inventory at the beginning of period LaTeX: t, with LaTeX: y(0) given. Then, the inventory equation is:

y(t+1) = ay(t) + P(t) - S(t),

where LaTeX: P(t) is the level of production (or somehow acquired), and LaTeX: S(t) is the level of sales (or somehow consumed). Typically, LaTeX: a=1, but if LaTeX: a < 1, it is called a loss factor, and if LaTeX: a > 1, it is called a gain factor.
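The balance equation is easy to simulate; the sketch below uses a = 1 and made-up production and sales levels:

```python
def inventory(y0, P, S, a=1.0):
    """Trace y(t) through the balance equation y(t+1) = a*y(t) + P(t) - S(t)."""
    y = [y0]
    for t in range(len(P)):
        y.append(a * y[t] + P[t] - S[t])
    return y

# Start with 10 units; produce 5 each period, sell 3 then 8 (hypothetical data):
print(inventory(10, P=[5, 5], S=[3, 8]))   # [10, 12.0, 9.0]
```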

The language used is for the inventory control in the production scheduling problem, but this has become a standard system of equations that appears in many mathematical programs. Thus, the meaning of the variables can be substantially different. One example is the steel beam assortment problem.

Inventory control problem

See production scheduling problem.

Inverse problem

Given a point in the decision space, find parameter values that render it optimal. For example, suppose we have the LP: LaTeX: \min \{c^T x : Ax = b, \,x \ge 0\}. Let LaTeX: B be a basis from LaTeX: A, and we ask for LaTeX: (b, c) for which this basis is optimal. This has a simple solution. Let LaTeX: A = [B \, N] and partition LaTeX: c and LaTeX: x conformally, so LaTeX: Ax = Bx_B + Nx_N and LaTeX: c^Tx = c^T_Bx_B + c^T_Nx_N Then, the set of LaTeX: (b, c) for which the associated basic solution is optimal is the following cone:

K_B = \{ (b, c) : B^{-1}b \ge 0, \, c_B B^{-1}N \le c_N\}.

A more difficult inverse problem is when there is some target value for the parameters. We might, for example, fix LaTeX: b and seek to minimize LaTeX: \|c - c^*\|^2 subject to LaTeX: (b, c) \in K_B, where LaTeX: c^* is a target value for LaTeX: c.

The problem can be combinatorial. We might want to minimize LaTeX: \|c - c^*\| for some norm where LaTeX: c's are costs on the arcs or edges of a network. The solution at hand might be a spanning tree or a TSP tour. We might also impose constraints on LaTeX: c directly, such as simple bounds.

The general inverse problem may thus be stated:

LaTeX: \min \{ \|p - p^* \| : p \in P, \, p \in \arg\min \{ f(x; p): x \in F(p)\} \},

where LaTeX: p^* is the target, LaTeX: ||\cdot|| is some norm, LaTeX: P is a non-empty, closed set, which generally does not contain LaTeX: p^*, and LaTeX: F(p) is the feasible region for LaTeX: x, given LaTeX: p.

There are variations, such as when only a partial solution is given. An example is a partial TSP tour or some partial scheduling of jobs, and we ask for cost values for which there is an optimal solution that contains these subsets.

Involutionary property

Of a transformation, it means the transformation is its own inverse: applying it twice returns the original object. In linear programming, duality is involutionary: the dual of the dual is the primal.

Irreducible infeasible subsystem

Abbreviated IIS. A subsystem of inequalities and equations such that the subsystem is inconsistent and every proper subsystem is consistent.


Irredundant

A system of constraints is irredundant if it contains no redundant constraint.

Isoperimetric problem

Among all closed plane curves with a given perimeter find one that maximizes the area. This is also known as Queen Dido's problem and serves as a classical problem for variational calculus. Its significance in mathematical programming is that it led to Lagrange's multiplier theorem. Restricting shapes to rectangles, the problem is:

\max \{xy : x+y=p, \, x \ge 0, \, y \ge 0\},

where LaTeX: 2p is the perimeter. For positive LaTeX: x and LaTeX: y and multiplier LaTeX: u, Lagrange's multiplier conditions require LaTeX: y-u=0 and LaTeX: x-u=0, so LaTeX: x=y, which means the solution is a square.
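A quick numeric confirmation for rectangles with p = 10: the area x(p - x) peaks at the square x = y = 5.

```python
p = 10.0                             # hypothetical half-perimeter: x + y = 10

# Evaluate the area x*(p - x) on a grid over [0, 10] with step 0.1:
areas = {x / 10: (x / 10) * (p - x / 10) for x in range(0, 101)}
best = max(areas, key=areas.get)     # side length with the largest area
print(best, areas[best])             # 5.0 25.0 -- the square wins
```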


Isoquant

Same as Contour.

Isotonic function

An order preserving function LaTeX: f: X \rightarrow \mathbb{R}^m, i.e. for any LaTeX: x and LaTeX: y in LaTeX: X

x > y \Rightarrow f(x) > f(y) \; \mbox{ and } \;
x < y \Rightarrow f(x) < f(y).


Jacobian

For a transformation, LaTeX: y=f(x), such that LaTeX: f is differentiable, the Jacobian is the determinant, LaTeX: \mbox{det}\left( \nabla f(x) \right). Historically, the number of LaTeX: y-variables equals the number of LaTeX: x-variables, say LaTeX: n, so LaTeX: \nabla f(x) is LaTeX: n \times n. The Jacobian matrix is defined as the matrix, LaTeX: \nabla f(x), allowing the number of functions to be less than the number of variables.


Jamming

Slow convergence, perhaps to a non-solution point -- see zigzag phenomenon.

Jensen inequality

Let LaTeX: f be convex on LaTeX: X, and let LaTeX: x be a random variable with expected value, LaTeX: E(x). Then, LaTeX: E(f(x)) \ge f(E(x)). For example, let LaTeX: X=R and LaTeX: f(x) = x^2. Suppose LaTeX: x is normally distributed with mean zero and standard deviation LaTeX: s. Then,

LaTeX: E(f(x)) = E(x^2) = s^2 \ge 0 = f(E(x)).
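A sample-based check with the convex function f(x) = x^2: for any finite sample, the empirical mean of the squares dominates the square of the mean.

```python
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]     # hypothetical sample with mean 0

f = lambda x: x ** 2                 # convex function
E = lambda vs: sum(vs) / len(vs)     # expectation under the empirical distribution

# Jensen: E(f(x)) >= f(E(x)); here 2.0 >= 0.0
assert E([f(x) for x in xs]) >= f(E(xs))
```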

Jeroslow formula

This arises in mixed integer linear programming as an extension of Gomory functions. Consider the following standard form

MILP: LaTeX: \max \{ c^T x + d^T y : Ax + Dy = b, \, (x, y) \ge 0, \,x \in \mathbb{Z}^n\},

where LaTeX: \mbox{rank}(D)=m (LaTeX: m is the number of rows). Let LaTeX: \{B^k\} be the set of bases of LaTeX: D such that there exists LaTeX: \{u^k\} to satisfy the dual conditions: LaTeX: u^k D \ge d and LaTeX: u^k B^k = d^k. Let LaTeX: \lfloor v \rfloor denote the floor (or round-down) function, and define LaTeX: \lfloor v\rfloor_k = B^k \lfloor (B^k)^{-1} v \rfloor. Let LaTeX: V_k be the set of integer linear combinations of vectors spanned by a dual feasible basis, LaTeX: B^k:

LaTeX: V_k = \{v: (B^k)^{-1} v \in \mathbb{Z}^m\}.

Let LaTeX: V = \cap_k V_k, and define

U_k = \{v \in V_k : (B^k)^{-1}v \ge 0, \, (B^k)^{-1}u \neq 0, \, 0 \le (B^k)^{-1}u \le (B^k)^{-1}v \Rightarrow u \not\in V \}.

Let G be a Gomory function on LaTeX: \mathbb{R}^m. A Jeroslow formula (or function) for given LaTeX: D is:

J(v) = \max_k \left( \max \{G(w) + u^k (v - w): w \in V, \, \lfloor v \rfloor_k - w \in U_k \} \right).

Job scheduling/sequencing

See scheduling jobs and sequencing problems.


k-consistency

A consistency notion in constraint programming.

Let LaTeX: P = (X, D, C) be a CSP.

  • Given a set of variables LaTeX: Y \subseteq X with LaTeX: |Y| = k -1 , a locally consistent instantiation LaTeX: I on LaTeX: Y is k-consistent iff for any kth variable LaTeX: x_{i_{k}} \in X \setminus Y there exists a value LaTeX: v_{i_{k}} \in D(x_{i_{k}}) such that LaTeX: I \cap \{ (x_{i_k}, v_{i_k}) \} is locally consistent.
  • The CSP LaTeX: P is k-consistent iff for any set LaTeX: Y of LaTeX: k-1 variables, any locally consistent instantiation on LaTeX: Y is k-consistent.

For example, 3-consistency ensures that any locally consistent instantiation of any pair of variables can be extended to an instantiation involving any third variable without violating any constraint. It is equivalent to path consistency. Similarly, 2-consistency is also known as arc consistency.


See n-Opt.

Kantorovich inequality

Let LaTeX: Q be a symmetric, positive definite matrix, with eigenvalues in LaTeX: [a,b], and let LaTeX: x \in \mathbb{R}^n. Then,

\frac{\| x \|^4}{(x^T Q x)(x^T Q^{-1}x)} \ge \frac{4 \, a \, b}{(a+b)^2}.

Its significance in mathematical programming is in convergence analysis: it bounds the convergence rate of Cauchy's steepest ascent.

Karmarkar algorithm

An interior point method for linear programming, which began a series of variations (e.g., see affine scaling). The original algorithm applies to the system LaTeX: Ax=0, for LaTeX: x \in S_n, i.e. x is an element of the n-dim. simplex, and assumes LaTeX: Ae=0 (so LaTeX: e/n is a feasible solution). It further assumes the optimal objective value is known to be zero. (All of these assumptions have been relaxed in subsequent developments.)

Here are the basic steps of Karmarkar's (original) algorithm, given LaTeX: x > 0, LaTeX: e^T x=1 and LaTeX: Ax=0.

  1. Form LaTeX: D = \mbox{diag}(x_j) and
B = \begin{bmatrix} AD \\ e \end{bmatrix}
    (assume LaTeX: \mbox{rank}(B)=m+1).
  2. Project: Project LaTeX: Dc to the null space of LaTeX: B: LaTeX: c^* = [I-B^T(BB^T)^{-1}B]Dc.
  3. Normalize: If LaTeX: c^* = 0, terminate since the solution is optimal; otherwise, normalize the ascent direction: LaTeX: d = c^* /(\|c^*\| \sqrt{n(n-1)}).
  4. Move: Move in the projected space to LaTeX: y = e/n - sd, where LaTeX: s is a fixed step size (= 1/4).
  5. Project back: Project back into x-space: LaTeX: x = Dy /(e^TDy).
The time complexity is LaTeX: O(n^{3.5}L^2 \ln(L)\ln(\ln(L))), where LaTeX: L is a measure of data storage for a given precision. The solution obtained is in the relative interior of the set of optimal solutions.

Kernel of basis

See basis.

Klee-Minty polytope

This is an example to show that the elementary simplex method does not have polynomial time complexity. The polytope is

\left\{ x : \left(\sum_{i=1}^{k-1} 2^{k-i+1} x_i \right) + x_k \le 5^k, \; \mbox{ for } k = 1, 2, \ldots, n, \; x \ge 0 \right\}.

Maximizing

\sum_{i=1}^n 2^{n-i} x_i

over this polytope shows the exponential complexity.

Knapsack problem

An integer program of the form,

LaTeX: \max \{c^Tx : x \in \mathbb{Z}^n, \, x \ge 0,\, \mbox{ and } a^Tx \le b \},

where LaTeX: a > 0. The original problem models the maximum value of a knapsack that is limited by volume or weight LaTeX: (b), where LaTeX: x_j is the number of items of type LaTeX: j put into the knapsack at unit return LaTeX: c_j, that uses LaTeX: a_j units per item.
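With integer data the problem can be solved by dynamic programming over the capacity b. Below is a sketch for this unbounded form (each x_j a nonnegative integer); the item values and sizes are made up:

```python
def knapsack(c, a, b):
    """Unbounded knapsack: max c'x  s.t.  a'x <= b, x integer >= 0.
    best[v] is the optimal value achievable with capacity v."""
    best = [0] * (b + 1)
    for v in range(1, b + 1):
        for cj, aj in zip(c, a):
            if aj <= v:              # item type fits in remaining capacity
                best[v] = max(best[v], best[v - aj] + cj)
    return best[b]

# Three item types: values c, sizes a, capacity b = 50 (hypothetical data):
print(knapsack(c=[60, 100, 120], a=[10, 20, 30], b=50))   # 300 (five type-1 items)
```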

The group knapsack problem has this form, except that it pertains to Gomory's corner polyhedron problem for general integer programming.

The multi-dimensional knapsack problem has more constraints (e.g., volume and weight), LaTeX: Ax \le b, where LaTeX: A \ge 0 with LaTeX: Ae > 0 and LaTeX: e^TA > 0.

Kuhn-Tucker conditions

Abbreviated KT and KKT (the latter credits Karush, whose earlier thesis contained the conditions).

This is the Lagrange Multiplier Rule, which Kuhn and Tucker published as an extension to allow inequality constraints. They also introduced the notion of a constraint qualification (which had been only the linear independence condition stemming from Lagrange's multiplier theorem). The most important contribution that sets KT apart from similar discoveries is the connection with saddle points that led to Lagrangian duality.

Kuhn-Tucker point

Abbreviated KKT point. A feasible point that satisfies the Kuhn-Tucker conditions.


LINDO

A solver using a simplified problem specification for Linear, Integer, Nonlinear and Dynamic optimization.


LINGO

A modeling language companion to LINDO.


LPL

A Linear Programming Language

LU decomposition

A factorization of a matrix LaTeX: A into a lower triangular matrix LaTeX: (L) and an upper triangular matrix LaTeX: (U) so that LaTeX: A=LU. The advantage is in solving the system LaTeX: Ax=b: first apply forward substitution to solve LaTeX: Ly=b, then backward substitution to solve LaTeX: Ux=y.
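A minimal sketch of the factor-and-substitute idea (Doolittle factorization without pivoting, so nonzero pivots are assumed; production codes pivot for numerical stability):

```python
def lu_solve(A, b):
    """Factor A = LU (no pivoting), then solve Ly = b and Ux = y."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):        # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        L[i][i] = 1.0                # unit diagonal of L
        for j in range(i + 1, n):    # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    y = [0.0] * n                    # forward substitution: Ly = b
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    x = [0.0] * n                    # backward substitution: Ux = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

print(lu_solve([[4.0, 3.0], [6.0, 3.0]], [10.0, 12.0]))   # [1.0, 2.0]
```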

Label correcting algorithm

Arises in labeling algorithms for the shortest path problem. At each iteration, a label is set to an estimate of the shortest path length from a given node. All labels become exact values at termination.

Label setting algorithm

Arises in labeling algorithms for the shortest path problem. At each iteration, a label becomes the actual shortest path length from some node. (Termination occurs when the destination node(s) are permanently labeled.)

Labeling algorithm

A class of algorithms for path-related problems on graphs and networks. To illustrate, consider a network LaTeX: [V,A] with arc values LaTeX: c.

  • The Ford-Fulkerson max flow labeling algorithm (LaTeX: c is capacity).
    • Input is source LaTeX: (s), destination LaTeX: (t), and capacities LaTeX: (c). Output is the max flow from LaTeX: s to LaTeX: t.
    • Initialize all flows to be zero and assign the label LaTeX: (-,\infty) to the source. All other nodes are unlabeled, and all nodes (including the source) are unscanned.
    • Labeling process: Select any labeled, unscanned node, say LaTeX: k with label LaTeX: (a,b). If node LaTeX: j is an unlabeled neighbor and LaTeX: x_{kj} < c_{kj}, assign the label LaTeX: (k+,b^*) to node LaTeX: j, where LaTeX: b^* = \min \{b, c_{kj}-x_{kj}\}. If node LaTeX: j is an unlabeled neighbor and LaTeX: x_{jk} > 0, assign the label LaTeX: (k-,b^*) to node LaTeX: j, where LaTeX: b^* = \min\{b, x_{jk}\}. (In either case, define node LaTeX: j as labeled, but unscanned, while node LaTeX: k is now labeled and scanned.) Repeat this until either the sink is labeled and unscanned, or until no more labels can be assigned. In the latter case terminate; in the former case, go to the flow change step.
    • Flow change: Let the sink LaTeX: (t) have label LaTeX: (a,b). If LaTeX: a is LaTeX: k+, replace LaTeX: x_{kt} by LaTeX: x_{kt}+b; if LaTeX: a is LaTeX: k-, replace LaTeX: x_{tk} by LaTeX: x_{tk}-b. In either case, apply the flow change (LaTeX: b or LaTeX: -b) along the path back to the source. The result is a new flow that is LaTeX: b more than it was. Erase labels, except on the source, and go to the labeling process.
  • Dijkstra's shortest path algorithm (LaTeX: c is distance).
    • Input is source LaTeX: (s), destinations LaTeX: (T \neq \emptyset), and distances LaTeX: (c). Output is a shortest path from LaTeX: s to each node in LaTeX: T.
    • Initialize labels: LaTeX: L_v = \infty for LaTeX: v \neq s and LaTeX: L_s=0. Set LaTeX: S=\emptyset.
    • Label: select LaTeX: u \in \arg\min \{L_v: v \in V\backslash S\} (terminate if LaTeX: S=V). Then, update LaTeX: S := S \cup \{u\}, and for LaTeX: v \in V\backslash S: if LaTeX: L_u + c_{uv} < L_v, set LaTeX: L_v := L_u + c_{uv}.
    • Terminate when LaTeX: T is a subset of LaTeX: S (i.e., all destination nodes are labeled with their shortest distance from node LaTeX: s).
  • A generic shortest path labeling algorithm from all nodes to destination node LaTeX: t (LaTeX: c is distance).
    • Input is destination LaTeX: (t) and distances LaTeX: (c). Output is a shortest path from each node to LaTeX: t.
    • Initialize labels: LaTeX: L_v is an estimate of the shortest path length from node LaTeX: v to node LaTeX: t, with LaTeX: L_t = 0.
    • Label: Find an arc, say LaTeX: (i, j), that violates the distance equations, say LaTeX: L_j > L_i + c_{ij}. If none, terminate; otherwise, update the label: LaTeX: L_j = L_i + c_{ij}.
    • The labeling step is repeated until there are no violations. At termination, LaTeX: L_j is the length of a shortest path from node LaTeX: j to node LaTeX: t.

Another type of labeling algorithm occurs in simplicial subdivision, used to compute a fixed point that represents an economic equilibrium.
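Dijkstra's labeling step above is commonly implemented with a priority queue of tentative labels; here is a sketch on a made-up network:

```python
import heapq

def dijkstra(graph, s):
    """graph: {node: [(neighbor, distance), ...]}; returns shortest distances from s."""
    L = {v: float('inf') for v in graph}     # labels (path-length estimates)
    L[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > L[u]:                         # stale entry; u was already scanned
            continue
        for v, c in graph[u]:                # relax arcs out of u
            if L[u] + c < L[v]:
                L[v] = L[u] + c
                heapq.heappush(heap, (L[v], v))
    return L

# Hypothetical network with arc distances:
g = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2), ('t', 6)],
     'b': [('t', 3)], 't': []}
print(dijkstra(g, 's'))                      # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```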

Lagrange conditions

The optimality equation (plus feasibility conditions) in Lagrange's multiplier theorem.

Lagrange multiplier

See Lagrange multiplier rule.

Lagrange multiplier rule

From the extension of Lagrange's multiplier theorem. Suppose

LaTeX: x^* \in \arg\max \{f(x): g(x) \le 0, \, h(x) = 0\},

where LaTeX: f, LaTeX: g, and LaTeX: h are smooth. Then, there exist multipliers LaTeX: (u, v) for which the following conditions hold:

  • LaTeX: \nabla_x [f(x^*) - u^T g(x^*) - v^T h(x^*)] = 0;
  • LaTeX: u \ge 0;
  • LaTeX: u^T g(x^*) = 0.

Since LaTeX: g(x^*) \le 0, the last condition, given LaTeX: u \ge 0, is equivalent to complementary slackness. These are considered first-order optimality conditions, though the Lagrange Multiplier Rule is not always valid -- see constraint qualifications.

For extensions see the Generalized Lagrange multiplier method.
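As a numeric illustration on a hypothetical instance: for max{-(x1^2 + x2^2) : 1 - x1 <= 0}, the maximizer is x* = (1, 0), and the multiplier works out to u = 2, so all three conditions can be checked directly.

```python
def grad_f(x):                 # f(x) = -(x1^2 + x2^2)
    return (-2 * x[0], -2 * x[1])

def grad_g(x):                 # g(x) = 1 - x1, written as g(x) <= 0
    return (-1.0, 0.0)

x_star, u = (1.0, 0.0), 2.0
g_val = 1 - x_star[0]

# stationarity: grad f(x*) - u * grad g(x*) = 0
stat = tuple(gf - u * gg for gf, gg in zip(grad_f(x_star), grad_g(x_star)))
print(stat == (0.0, 0.0))      # True: stationarity
print(u >= 0)                  # True: dual feasibility
print(u * g_val == 0)          # True: complementary slackness
```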

Lagrange multiplier theorem

Let LaTeX: f and LaTeX: h be smooth functions and suppose LaTeX: \nabla h(x^*) has full row rank. Then, LaTeX: x^* \in \arg\max \{f(x): h(x) = 0\} only if there exists LaTeX: v \in \mathbb{R}^m such that:

LaTeX: \nabla f(x^*) - v^T \nabla h(x^*) = 0.

The LaTeX: i-th component of LaTeX: v, LaTeX: v_i, is called the Lagrange multiplier associated with the LaTeX: i-th constraint, LaTeX: h_i(x)=0. Extensions to remove the rank condition and/or allow inequality constraints were by Carathéodory, John, Karush, and Kuhn & Tucker. Also see the Lagrange Multiplier Rule.


Lagrangian

For the mathematical program

LaTeX: \min \{ f(x) : g(x) \ge 0, \, h(x) = 0, \, x \in X \},
the Lagrangian is the function:

LaTeX: L(x, u, v) = f(x) - u^T g(x) - v^T h(x).

for LaTeX: x \in X and LaTeX: u \ge 0. Note that the Lagrange Multiplier Rule can be written as the first-order conditions for LaTeX: (x^*, u^*,v^*) to be a saddle point of LaTeX: L. In Lagrange's multiplier theorem (where LaTeX: X=\mathbb{R}^n and LaTeX: g is vacuous), this is simply that LaTeX: \nabla L(x^*,v^*)=0, which could be any type of stationary point.

Lagrangian duality

See it in the list of particular duals.

Lagrangian relaxation

The removal of constraints and putting them into the objective as in the Lagrangian function. Specifically, suppose we have

LaTeX: \max \{ f(x): x \in X, \, g(x) \le 0, \, h(x) = 0\}.

The Lagrangian relaxation is

LaTeX: \max \{f(x) - u^T g(x) - v^T h(x): x \in X\},

where LaTeX: u \ge 0 (LaTeX: v is unrestricted).

This is a relaxation of the original mathematical program because the constraints have been removed (expanding the feasible region), and for LaTeX: x feasible (i.e., satisfying these constraints), the relaxed objective satisfies:

LaTeX: f(x) - u^T g(x) - v^T h(x) \ge f(x),

because LaTeX: v^T h(x)=0 and LaTeX: u^Tg(x) \le 0.

The objective is the usual Lagrangian function when LaTeX: X is simply LaTeX: \mathbb{R}^n. It provides a foundation for Lagrangian duality and the Generalized Lagrange Multiplier Method.

Note that LaTeX: X could retain some constraints, such as in the separable form LaTeX: X = X_1 \times \ldots \times X_n. Now suppose LaTeX: f(x) = f_1(x_1) + \ldots + f_n(x_n), LaTeX: g(x) = g_1(x_1) + \ldots + g_n(x_n), and LaTeX: h(x) = h_1(x_1) + \ldots + h_n(x_n). Then, the Lagrangian relaxation decomposes into LaTeX: n independent mathematical programs for any multiplier value:

LaTeX: \max \{f(x) - u^T g(x) - v^T h(x): x \in X\} 
= \sum_{j=1}^n \max \{f_j(x_j) - u_j g_j(x_j) - v_j h_j(x_j): 
x_j \in X_j \}.
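A small sketch on a hypothetical binary knapsack: relaxing the single constraint makes the inner maximization decompose over the variables, and any u >= 0 gives a bound on the true optimum.

```python
from itertools import product

c, a, b = [10, 7, 4], [5, 4, 3], 8        # max c.x  s.t.  a.x <= b,  x binary

def relaxation_bound(u):
    """max{ c.x - u*(a.x - b) : x in {0,1}^3 } decomposes over the x_j:
    take x_j = 1 exactly when its reduced profit c_j - u*a_j is positive."""
    return u * b + sum(max(0.0, cj - u * aj) for cj, aj in zip(c, a))

opt = max(sum(cj * xj for cj, xj in zip(c, x))        # exact optimum by enumeration
          for x in product([0, 1], repeat=3)
          if sum(aj * xj for aj, xj in zip(a, x)) <= b)

print(opt)                     # 14, attained at x = (1, 0, 1)
print(relaxation_bound(1.5))   # 15.5, an upper bound for any u >= 0
```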

Lagrangian saddlepoint equivalence

The tuple LaTeX: (x^*, u^*, v^*) in LaTeX: X \times \mathbb{R}^m \times \mathbb{R}^M, with LaTeX: u^* \ge 0, is a saddlepoint of the Lagrangian LaTeX: L if, and only if, the strong duality properties hold:

  1. LaTeX: x^* \in \arg\max \{L(x, u^*, v^*): x \in X\}
  2. LaTeX: g(x^*) \le 0 and LaTeX: h(x^*) = 0 (LaTeX: x^* is feasible)
  3. LaTeX: (u^*)^T g(x^*) = 0 (LaTeX: x^* and LaTeX: u^* satisfy complementary slackness)

See the supplement on Lagrangian Saddle Point Equivalence for further information.


Lattice

A set LaTeX: S that contains LaTeX: x \vee y and LaTeX: x \wedge y for all LaTeX: x and LaTeX: y in LaTeX: S.

Lattice program

Minimizing a subadditive function on a lattice.

Lattice search

This finds the maximum of a unimodal function on a finite set, LaTeX: \{x_1, x_2, \ldots, x_m\}, by evaluating the function at points placed by the modified Fibonacci sequence: LaTeX: K_{n+2} = K_{n+1} + K_n + 1. This modification comes from the improvement obtained by dropping the inferior end point after an evaluation: if the set is in LaTeX: [a, b] and LaTeX: f(x) < f(y), the new interval is LaTeX: [x+1, b], dropping the point LaTeX: x from the remaining search set.

The method proceeds the same as Fibonacci search, except the placements follow the modified Fibonacci sequence, which is equivalent to LaTeX: K_n = F_{n+1} - 1, reflecting the improvement that accumulates as LaTeX: n increases, for the extra point dropped after each evaluation. The placement ratio, LaTeX: K_{n-1} / K_n, also approaches the golden mean, so lattice search also approaches the golden section search.

Some refer to Fibonacci search when they mean lattice search. The following table shows the difference. For example, with 20 evaluations, Fibonacci search can guarantee finding the maximum on a set of 10,946 points, while lattice search can find it on a set of 17,710 points. Golden section is included for comparison as well. With 20 evaluations it can find the maximum on a set of only 9,349 points.

           n        5     10     15     20
           F_n      8     89    987  10946
           K_n     12    143   1596  17710
        golden      6     76    843   9349 

Lattice search minimizes the maximum number of evaluations needed to find the maximum on a given set of points. If the number in that set is LaTeX: K_N, the number of evaluations is LaTeX: N. If the number in the set is not exactly some modified Fibonacci number, one can fill in dummy points at the end of the set. For example, if LaTeX: K_{N-1} < m < K_N, let the set be LaTeX: \{x_1, \ldots, x_m, y_1, \ldots, y_k\}, where LaTeX: k = K_N - m. Then, the (minimum) number of evaluations to guarantee finding the maximum of any unimodal function is LaTeX: N.
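The recurrence and the identity K_n = F_{n+1} - 1 (with the table's convention F_1 = 1, F_2 = 2) can be checked directly:

```python
def K(n):
    """Modified Fibonacci: K_{n+2} = K_{n+1} + K_n + 1, with K_1 = 1, K_2 = 2."""
    a, b = 1, 2
    for _ in range(n - 1):
        a, b = b, a + b + 1
    return a

def F(n):
    """Fibonacci numbers as tabulated above: F_1 = 1, F_2 = 2."""
    a, b = 1, 2
    for _ in range(n - 1):
        a, b = b, a + b
    return a

for n in (5, 10, 15, 20):
    print(n, F(n), K(n), K(n) == F(n + 1) - 1)   # reproduces the table; identity holds
```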

Level set

For the function LaTeX: f:X \rightarrow \mathbb{R}, the LaTeX: c-level set is LaTeX: \{x: x \in X, \, f(x) \le c\}. Sometimes, the 0-level set is called simply the level set of LaTeX: f.

Levenberg Marquardt algorithm

This is designed for solving nonlinear programs using the equation:

LaTeX: (\nabla^2 f(x) + p I)d = -\nabla f(x),

where LaTeX: \nabla^2 f(x) is the Hessian. For unconstrained optimization, the solution LaTeX: d serves as a direction vector for the iteration. This is also used in a trust region method. The parameter LaTeX: p is set to give a balance between Newton's method LaTeX: (p=0) and Cauchy's steepest descent LaTeX: (p >> 0). A low value of LaTeX: p gives the fast local convergence of Newton's method; a high value keeps LaTeX: \nabla^2 f(x) + pI positive definite, which guarantees a descent direction through difficult landscape curvature.
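A one-dimensional sketch of the effect of p, on the hypothetical quadratic f(x) = (x - 3)^2: p = 0 reproduces the Newton step, while a large p shrinks the step toward a small steepest-descent move.

```python
def lm_step(grad, hess, p):
    """One-dimensional version of (f''(x) + p) d = -f'(x); in general one
    solves the linear system with the Hessian plus p times the identity."""
    return -grad / (hess + p)

x = 0.0                                   # minimize f(x) = (x - 3)^2 from x = 0
grad, hess = 2 * (x - 3), 2.0

print(x + lm_step(grad, hess, 0.0))               # 3.0: pure Newton step, exact here
print(round(x + lm_step(grad, hess, 100.0), 4))   # 0.0588: small gradient-like step
```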

Lexicographic Ordering constraint

is a global constraint that allows the comparison of sequences of variables. Given two sequences LaTeX: x and LaTeX: y of LaTeX: n variables, LaTeX: \langle x_1 \dots x_n \rangle and LaTeX: \langle y_1 \dots y_n \rangle , let LaTeX: x \preceq_{lex} y denote the lexicographic ordering constraint. The constraint holds iff LaTeX: n = 0 or LaTeX: x_1 < y_1 or LaTeX: x_1 = y_1 and LaTeX: \langle x_2 \dots x_n \rangle \preceq_{lex} \langle y_2 \dots y_n \rangle.
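The recursive definition translates directly into a checker (the integer sequences below are hypothetical):

```python
def lex_leq(x, y):
    """x <=_lex y per the recursive definition: holds iff n = 0, or x1 < y1,
    or x1 = y1 and the tails satisfy the constraint."""
    if len(x) == 0:
        return True
    if x[0] < y[0]:
        return True
    return x[0] == y[0] and lex_leq(x[1:], y[1:])

print(lex_leq((1, 2, 4), (1, 3, 0)))   # True
print(lex_leq((1, 3, 0), (1, 2, 4)))   # False
print(lex_leq((2, 2), (2, 2)))         # True (equal sequences)
```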

Lexicographic order

A nonzero vector is lexicographically positive if its first non-zero coordinate is positive. The vector LaTeX: x is lexicographically greater than the vector LaTeX: y if LaTeX: x-y is lexicographically positive, and this defines the lexicographic order in LaTeX: \mathbb{R}^n. This is a total ordering in that every two vectors are either equal, or one is lexicographically greater than the other.

This was first used in mathematical programming to resolve cycling in the simplex method. It also provides a way to obtain solutions for multiple objectives with the property that LaTeX: x is a Pareto maximum if LaTeX: f(x) is lexicographically greater than or equal to LaTeX: f(y) for all feasible LaTeX: y.


Lifting

In integer programming, it is creating a valid inequality for the original set, given one for a lower dimensional set, by augmenting a binary variable that is absent from the original inequality. That is, suppose LaTeX: S is a subset of LaTeX: \{0,1\}^n and LaTeX: a^T x \le b is a valid inequality for LaTeX: S\cap \{x \in \{0,1\}^n: x_1=0\} (where it does not matter what LaTeX: a_1 is since LaTeX: x_1=0). Then, under certain conditions, the inequality is valid for LaTeX: S. The "conditions" are problem dependent, and LaTeX: a_1 is set as large as possible.

For example, suppose LaTeX: x_2 + x_3 + x_4 \le 2 is a valid inequality for LaTeX: S. A lifting is the strengthening, LaTeX: 2x_1 + x_2 + x_3 + x_4 \le 2, if it can be ascertained that this is a valid inequality for LaTeX: S. The coefficient of LaTeX: x_1 (LaTeX: 2 in the example) is determined by considering how large LaTeX: a_1 can be with LaTeX: x_1=1. (In this case, the lifted inequality is valid if LaTeX: x_2 + x_3 + x_4 \le 2 is valid for LaTeX: \{x \in S: x_1=0\} and if LaTeX: x_1=1 implies LaTeX: x_2 = x_3 = x_4 = 0.) The motivation for doing this arises in a branch and cut algorithm strategy. At a node in the search tree we might have a valid inequality, LaTeX: a^Tx \le b, when the branch had LaTeX: x_1=0, and we want to make it valid even if LaTeX: x_1=1. The inequality is valid for all

LaTeX: a_1 \le b - \max \{a_2 x_2 + \ldots + a_n x_n : x \mbox{ feasible and } x_1 = 1\}.
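A brute-force sketch of this coefficient computation, on a hypothetical set S = {x in {0,1}^4 : 2x1 + x2 + x3 + x4 <= 2}, for which x2 + x3 + x4 <= 2 is valid when x1 = 0, and x1 = 1 forces the other variables to 0:

```python
from itertools import product

# hypothetical feasible set S (chosen to match the example in the text)
S = [x for x in product([0, 1], repeat=4)
     if 2 * x[0] + x[1] + x[2] + x[3] <= 2]

a, b = [None, 1, 1, 1], 2     # valid inequality x2 + x3 + x4 <= 2 when x1 = 0

# largest valid lifting coefficient: a1 <= b - max{ a.x : x in S, x1 = 1 }
inner = max(sum(ai * xi for ai, xi in zip(a[1:], x[1:]))
            for x in S if x[0] == 1)
a1 = b - inner
print(a1)                     # 2, giving the lifted inequality 2*x1 + x2 + x3 + x4 <= 2
```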

Line search

Optimizing the objective along a line: LaTeX: \mbox{opt} \{ f(x + \alpha \, d): \alpha \ge 0\}, where LaTeX: x is a given point and LaTeX: d is a given direction vector (LaTeX: \alpha is a scalar). A coarse tolerance could be used, so "Opt" is not entirely accurate. (In fact, there are reasons to believe not optimizing is best.)

Here are some particular search methods in this glossary:

Line segment

The line segment with end points LaTeX: x and LaTeX: y in LaTeX: \mathbb{R}^n is

LaTeX: \{\alpha x + (1-\alpha)y: \alpha \in [0, 1]\},

which is denoted LaTeX: [x, y] and is the closed line segment. The open line segment is

LaTeX: \{\alpha x + (1-\alpha)y: \alpha \in (0, 1)\}.

Linear combination

The vector LaTeX: y is a linear combination of vectors LaTeX: x_1, x_2, \ldots, x_n if LaTeX: \textstyle y = \sum_{k=1}^n {a_k x_k}, where LaTeX: a_k is called the coefficient of LaTeX: x_k.

Linear convergence

See convergence.

Linear program

An optimization problem in which the objective and constraints are linear. Forms include LaTeX: \min \{c^Tx: Ax = b,\, x \ge 0\} and LaTeX: \max \{c^Tx : Ax \le b\}. The standard form assumes LaTeX: A has full row rank. Computer systems ensure this by having a logical variable LaTeX: (y) augmented, so the form appears, for example, as LaTeX: \min \{c^Tx: Ax + y = b,\, L \le (x, y) \le U\} (also allowing general bounds on the variables). The original variables LaTeX: (x) are called structural. Note that each logical variable can be a slack, surplus, or artificial variable, depending on the form of the original constraint. This computer form also represents a range constraint with simple bounds on the logical variable. Some bounds can be infinite (i.e., absent), and a free variable (logical or structural) is when both of its bounds are infinite.

Linearity interval

When a univariate function is piece-wise linear, it has the form LaTeX: a_i x + b_i for LaTeX: x in the interval LaTeX: [c_i, c_{i+1}], where LaTeX: a_i is not equal to LaTeX: a_{i+1}. (The phrase usually means the function is continuous.) This arises in linear programming when considering the optimal value as a function of varying right-hand sides (or cost coefficients) in fixed proportions: LaTeX: b+td (or LaTeX: c+td), where LaTeX: d is an admissible change vector and LaTeX: t is the (scalar) variable. Then, for the range of LaTeX: t where the LP has a solution, the optimal value function has this piece-wise linear (continuous) form, and the intervals that comprise the domain are called linearity intervals.


Linearization

The strategy of linearization is to reformulate a nonlinear program as a linear program through the introduction of auxiliary variables and constraints that are used to circumvent the nonlinearities. See standard linearization, Glover's linearization, and the Reformulation-Linearization Technique.

Lipschitz continuous

The function LaTeX: f: X \rightarrow \mathbb{R} is Lipschitz continuous if there exists a value LaTeX: K, called the Lipschitz constant, such that LaTeX: |f(x) - f(y)| \le K \|x-y\| for all LaTeX: x and LaTeX: y in LaTeX: X. This relation is called the Lipschitz condition. It is stronger than continuity because it limits the slope to be within LaTeX: [-K, K]. The generalized Lipschitz condition is that there exist LaTeX: K and a monotonically increasing function, LaTeX: m : \mathbb{R} \rightarrow \mathbb{R}, with LaTeX: m(z) \rightarrow 0 as LaTeX: z \rightarrow 0, for which LaTeX: |f(x)-f(y)| \le K m(\|x-y\|) for all LaTeX: x and LaTeX: y in LaTeX: X.

Local convergence

See convergence.

Local optimum

A feasible point LaTeX: x^* that is an optimal solution to the mathematical program whose feasible region is the intersection of the original region and some neighborhood of LaTeX: x^*.

Locally convex function

There exists a subset of the domain, such as the neighborhood of a point, such that the function is convex on that subset.

Location problem

See facility location problem.

Lockbox problem

A firm wants a check that it receives to clear as quickly as possible (so it can use the money). To reduce the number of days to clear a check, the firm opens offices, called lockboxes, in different cities to handle the checks. There is a cost to open a lockbox, and there are average annual losses calculated from interest rates and the number of days to clear a check sent from source LaTeX: i to lockbox city LaTeX: j.

Logical variable

See linear program.

Lot size problem

This is one of the oldest mixed-integer programs in operations research, first presented by Wagner and Whitin in 1958. The problem is to minimize cost while satisfying product demands over (discrete) time. Let LaTeX: y_t be the number of units produced in period LaTeX: t, for LaTeX: t=1,\ldots,T (LaTeX: T is called the planning horizon), and let LaTeX: x_t = 1 if a setup occurs in period LaTeX: t; LaTeX: 0 otherwise. Let LaTeX: D_t be the demand in period LaTeX: t, and let the demand from period LaTeX: i to period LaTeX: j, inclusive, be LaTeX: d_{ij} = \sum_{i \le t \le j} D_t. Then, a MILP formulation is:

LaTeX: \min \left\{ c^T x + d^T y: x \in \{0,1\}^T, \, y \ge 0, \,
\sum_{t\le i} y_t \ge d_{1i} \mbox{ for } i=1, \ldots,T-1, \,
\sum_{t\le T} y_t = d_{1T}, \,
d_{iT} x_i - y_i \ge 0 \mbox{ for } i=1, \ldots,T \right\}.

(This is preferred over the MILP that defines inventory balance equations. Even though they are equivalent, the LP relaxation of the above is sharper.)
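For small instances the problem can also be solved by the classical Wagner-Whitin dynamic program; the sketch below uses a hypothetical cost model with per-period setup costs and unit holding costs (not the c, d of the MILP above), exploiting the fact that some run in period t covers demand through some later period j.

```python
def wagner_whitin(D, setup, hold):
    """C[j] = min cost of meeting demands of periods 1..j, where the last
    production run is placed in some period t and covers periods t..j."""
    T = len(D)
    C = [0.0] + [float("inf")] * T
    for j in range(1, T + 1):
        for t in range(1, j + 1):
            holding = sum(hold[k - 1] * sum(D[k:j]) for k in range(t, j))
            C[j] = min(C[j], C[t - 1] + setup[t - 1] + holding)
    return C[T]

# three periods with demands 2, 3, 2: cheapest plan produces everything in period 1
print(wagner_whitin([2, 3, 2], [10, 10, 10], [1, 1, 1]))   # 17.0 = 10 + (3+2) + 2
```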

Lower semi-continuity

Suppose LaTeX: x^k \rightarrow x. Of a function, LaTeX: f is lower semi-continuous if

LaTeX: \lim_{k \rightarrow \infty} \inf_{j \ge k} f(x^j) \ge f(x).

Of a point-to-set map, LaTeX: A, let LaTeX: N_{\varepsilon}[S] be a neighborhood of the set LaTeX: S: for each LaTeX: \varepsilon > 0, there exists LaTeX: K such that for all LaTeX: k \ge K, LaTeX: A(x) is a subset of LaTeX: N_{\varepsilon}[A(x^k)]. A pathology to show that the feasibility map may not be lower semi-continuous is:

LaTeX: A(b) = L_b(g) = \{x \in [-1, 1]: g(x) \le b\},

where LaTeX: g(x) = x if LaTeX: x is in LaTeX: [-1, 0], and LaTeX: g(x) = 0 if LaTeX: x \in [0,1].

Lower triangular matrix

A square matrix, LaTeX: A, such that LaTeX: A_{ij}=0 for LaTeX: i < j.


MAX-CSP

A constraint satisfaction problem where some constraints may be violated, and the quality of a solution is measured by the number of satisfied constraints (i.e., an optimal solution maximizes the number of satisfied constraints).


MIMI

Manager for Interactive Modeling Interfaces - A modeling system that integrates mathematical programming, database management and artificial intelligence.


MINOS

Modular In-core Nonlinear Optimization System - An optimization system, available from Stanford Optimization Lab.


MODLER

Modeling by Object Driven Linear Elemental Relations


MOSEK

A software package for solving large-scale sparse linear, conic and convex optimization problems.


MPL

Mathematical Programming Language.

Main Page

Mathematical Programming Glossary©

This glossary contains terms specific to mathematical programming and related disciplines like economics, computer science, and mathematics. A thread of terms is constructed by following other terms that are defined within those already displayed. Terms can be removed from a thread by clicking the remove button. A new thread is initiated by following the title of one of the thread's terms.

The following are of general interest (you should read them if this is your first time here).

Please remember this is a glossary, not a survey, so no attempt is made to cite credits.

Manpower planning problem

This has many variations, but typically focuses on either short-term problems, like scheduling workers, or on long-term problems, like recruiting. There are workers of different categories; some relate to each other by level (e.g., copilot and pilot), others do not. For long-term manpower planning, inventory balance equations are typically used to represent changes in the number of employees in each category over time, including new hires and various forms of departures. For short-term scheduling, a simple model is the shortest path through a network. Often, more complicated combinatorial programs are required.

Marginal price

The rate at which the optimal objective value changes with respect to a perturbation of a right-hand side, like a supply, demand or capacity limit. When it exists, the marginal price is often equal to a most pessimistic dual price (e.g., consumer's marginal price is the greatest dual price, which reflects what another unit of demand would cost to satisfy).

Markov decision process

A stochastic dynamic program, whereby for each policy the resulting state variables comprise a Markov process (a stochastic process with the property that the conditional probability of a transition to any state depends only on the current state, and not on previous states).

Matching problem

Given a graph, a matching is a subgraph with the property that no two edges are incident to the same node. Subject to certain restrictions, the problem is to find a feasible matching that optimizes some objective, such as minimizing the total cost of the edges in the matching. A perfect matching is when each node is incident with exactly one edge.

The assignment problem is a classical perfect matching, whereby the graph is bipartite (nodes are two sets: people and jobs), and the matching must have all nodes (every person must do a job, and every job must be done). Then, the matching corresponds precisely to an assignment since each job node is incident with exactly one person node in the matching. Further, the cost of each such matching is precisely the total cost of the corresponding assignment, so this min-cost matching problem is the same as the assignment problem.
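For a tiny instance, the equivalence can be seen by brute force: each permutation of jobs is a perfect matching of people to jobs (the cost matrix below is hypothetical).

```python
from itertools import permutations

# cost[i][j] = cost of person i doing job j; a permutation p is a perfect matching
cost = [[4, 2, 8],
        [4, 3, 7],
        [3, 1, 6]]

best = min(permutations(range(3)),
           key=lambda p: sum(cost[i][p[i]] for i in range(3)))
total = sum(cost[i][best[i]] for i in range(3))
print(best, total)   # (0, 2, 1) 12
```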

Another type of matching problem, called maximum cardinality matching, is to find a matching with the greatest number of nodes, and this is also solvable in polynomial time. However, minimum weight – maximum cardinality is NP-complete.

Mathematical program

Commonly an optimization problem of the form

LaTeX: \max \{ f(x) : x \in X,\, g(x) \le 0, \, h(x) = 0\},

where LaTeX: X is a subset of LaTeX: \mathbb{R}^n and is the domain of LaTeX: f, LaTeX: g and LaTeX: h, which map into real spaces. The function LaTeX: f is called the objective function, which is typically real-valued. If not, then LaTeX: f maps into LaTeX: \mathbb{R}^p with LaTeX: p \ge 2, and the problem is a multiple objective problem. The feasible region is the collection of LaTeX: x that simultaneously satisfy LaTeX: x in LaTeX: X, LaTeX: g(x) \le 0, and LaTeX: h(x) = 0, which are the program's constraints.

Matrix norm

See norm.


Matroid

This is an abstraction of independence that includes vectors in linear algebra and circuits in graphs. First, we need some preliminary definitions. Let LaTeX: N = \{1, \ldots, n\} be a finite set, and let LaTeX: M = \{S_1, \ldots, S_m\} be a collection of subsets of LaTeX: N. LaTeX: (N,M) is an independence system if LaTeX: S_i in LaTeX: M implies every subset of LaTeX: S_i is in LaTeX: M. Elements of LaTeX: M are called independent sets, and the remaining subsets of LaTeX: N are called dependent sets. An independent set, say LaTeX: S_i, is maximal if LaTeX: S_i \cup \{k\} is not in LaTeX: M for any LaTeX: k \in N\backslash S_i. The max-cardinality independent set of any subset of LaTeX: N, say LaTeX: T, is given by:

LaTeX: m(T) = \max \{ |S_i|: S_i \in M, \, S_i \subseteq T\}.

A matroid is an independence system LaTeX: (N, M) in which for any subset of LaTeX: N, say LaTeX: T, every independent set in LaTeX: T that is maximal in LaTeX: T has cardinality LaTeX: m(T). The definition essentially means that maximal and maximum are the same. In other words, a system is a matroid if, and only if, a greedy algorithm yields a globally optimal solution to the associated max weight problem. An example is the spanning tree.
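The greedy characterization can be illustrated with the graphic matroid, in which the independent sets are the acyclic edge sets (forests); greedy on edge weights is then Kruskal's algorithm. The graph and weights below are hypothetical.

```python
def find(parent, v):
    while parent[v] != v:
        v = parent[v]
    return v

def greedy_max_forest(n, edges):
    """Greedy for the graphic matroid: scan edges by decreasing weight and keep
    an edge whenever it leaves the chosen set independent (acyclic)."""
    parent = list(range(n))
    chosen = []
    for w, u, v in sorted(edges, reverse=True):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:                       # no cycle: still an independent set
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen

edges = [(4, 0, 1), (3, 1, 2), (2, 0, 2), (5, 2, 3)]
tree = greedy_max_forest(4, edges)
print(sorted(tree))   # [(3, 1, 2), (4, 0, 1), (5, 2, 3)]: max-weight spanning tree
```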


Max flow-min cut theorem

The maximum flow through a (single commodity) capacitated network from a specified node, called the source, to another node, called the sink, equals the value of the minimum cutset. Originally proven directly from principles of networks, this was discovered to be a special case of the Duality Theorem of Linear Programming.

Max flow problem

In a network, LaTeX: [V, A], there are two special nodes in LaTeX: V: a source LaTeX: (s) and a sink LaTeX: (t). The problem is to maximize the flow from LaTeX: s to LaTeX: t subject to conservation of flow constraints at each node and flow bounds on each arc. The max flow labeling algorithm provides a constructive proof of the Max flow - Min cut theorem.

This can also be obtained from the duality of the following linear program:

maximize LaTeX: v:

LaTeX: 0 \le x_{ij} \le U_{ij} for LaTeX: (i, j) \in A;

LaTeX: \sum_j x_{ij} - \sum_j x_{ji} = 0 for LaTeX: i \in V \backslash \{s,t\};

LaTeX: \sum_j x_{sj} - \sum_j x_{js} = v;

LaTeX: \sum_j x_{tj} - \sum_j x_{jt} = -v.

LaTeX: U are upper bounds (capacities) on arc flows, and each sum is for LaTeX: j such that the indicated arc is in LaTeX: A.
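A minimal sketch of the labeling idea (breadth-first augmenting paths, i.e., the Edmonds-Karp refinement), on a hypothetical network:

```python
from collections import deque

def max_flow(cap, s, t):
    """BFS augmenting paths; cap is a dict of dicts of residual capacities,
    modified in place as flow is pushed."""
    flow = 0
    while True:
        pred = {s: None}
        q = deque([s])
        while q and t not in pred:            # labeling process (BFS)
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in pred:
                    pred[v] = u
                    q.append(v)
        if t not in pred:
            return flow                       # no augmenting path: flow is maximum
        path, v = [], t                       # recover the path back to the source
        while pred[v] is not None:
            path.append((pred[v], v))
            v = pred[v]
        b = min(cap[u][v] for u, v in path)   # bottleneck value of the flow change
        for u, v in path:
            cap[u][v] -= b
            cap[v].setdefault(u, 0)
            cap[v][u] += b
        flow += b

cap = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
print(max_flow(cap, "s", "t"))   # 5, matching the cut {s, a} of capacity 2 + 1 + 2
```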


Maximal

In a partially ordered set, a maximal element is one for which no element follows it in the ordering. This is not to be confused with maximum. For example, consider the following knapsack problem:

LaTeX: \max \{ 5x + y: (x, y) \in \mathbb{Z}^2, \, x \ge 0, \, y \ge 0, \, 3x + 2y \le 3\}.

The maximum value is LaTeX: 5, with LaTeX: (x, y) = (1, 0). Define the partial ordering to be the ordinary greater-than-or-equal-to, so that LaTeX: (x',y') > (x, y) means LaTeX: x' \ge x, LaTeX: y' \ge y, and either LaTeX: x' > x or LaTeX: y' > y. In words, LaTeX: (x', y') > (x, y) means we can put at least one more item into the knapsack, in which case we would increase our return. When we cannot do so, LaTeX: (x, y) is a maximal element. In particular, LaTeX: (0,1) is a maximal element, and its value is LaTeX: 1, which is not the maximum.

Another definition pertains to a subset with respect to some property such that no proper superset has that property. In the 0-1 knapsack problem with LaTeX: n variables, we define the subset of items taken (for any solution) as LaTeX: J=\{j: x_j=1\}. Then, we define LaTeX: J as maximal if no item can be added without violating the limit, i.e., LaTeX: \textstyle a_k > b - \sum_{j\in J} a_j for all LaTeX: k not in LaTeX: J.
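The knapsack example above can be enumerated to contrast the two notions:

```python
from itertools import product

# the example: max{ 5x + y : 3x + 2y <= 3, x and y nonnegative integers }
feasible = [(x, y) for x, y in product(range(4), repeat=2) if 3 * x + 2 * y <= 3]

def follows(p, q):             # p > q in the componentwise partial order
    return p[0] >= q[0] and p[1] >= q[1] and p != q

maximum = max(feasible, key=lambda p: 5 * p[0] + p[1])
maximal = [p for p in feasible if not any(follows(q, p) for q in feasible)]

print(maximum)                 # (1, 0): the maximum, value 5
print(sorted(maximal))         # [(0, 1), (1, 0)]: (0, 1) is maximal with value only 1
```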


Maximand

The objective function in a mathematical program whose sense of optimization is to maximize.


Maximum

(pl. maxima). A feasible point at which the supremum is achieved. (See Weierstrass' Theorem.) An LaTeX: \varepsilon-maximum is within LaTeX: \varepsilon of being a maximum: LaTeX: \textstyle f(x) \ge f^* - \varepsilon, where LaTeX: f^* is the supremum and LaTeX: \varepsilon > 0.

Maximum principle

Necessary conditions for a solution to the (constrained) problem of variational calculus, given in terms of nonlinear differential equations. Generally credited to Pontryagin (1962), it was derived as an extension of the Euler-Lagrange conditions for variational calculus, and later was derived from dynamic programming.


(or maximin). The objective is, itself, the value of another mathematical program:

LaTeX: f(x) = \inf \{f(x, y): y \in Y(x),\, g(x, y) \ge 0, \, h(x, y) = 0\}.

Then, the objective is to maximize LaTeX: f(x) (on some feasible region).

Memetic algorithm

This is a population-based approach for heuristic search in optimization problems. It uses the crossover operation of genetic algorithms, but combines it with local search heuristics. Implementations are typically parallel with an MIMD architecture.


Metaheuristic

This is a general framework for heuristics in solving hard problems. The idea of "meta" is that of level. An analogy is the use of a meta-language to explain a language. For computer languages, we use symbols, like brackets, in the meta-language to denote properties of the language being described, such as parameters that are optional. Examples of metaheuristics include genetic algorithms, simulated annealing, and tabu search.


Besides parameters that define specific algorithms within each framework, these metaheuristics also require some concept of representation, which is problem dependent. Also see GRASP.

Method of centers

This is an interior method defined for a mathematical program without equality constraints with a nonempty strict interior. Considering LaTeX: \max \{ f(x) : x \in X, \, g(x) \le 0 \}, define

LaTeX: G(w, x) = \min \{f(w)-f(x), -g_1(w), \ldots,-g_m(w)\}

Observe that LaTeX: G(w, x) > 0 if, and only if, LaTeX: f(w) > f(x) and LaTeX: g_i(w) < 0 for all LaTeX: i. Then, the algorithm map is LaTeX: \arg\max \{G(w,x): w \in \mathbb{R}^n\}.


Metric

A nonnegative, real-valued function, LaTeX: d, on pairs from a set LaTeX: S such that for each LaTeX: x, LaTeX: y, and LaTeX: z in LaTeX: S:

  1. LaTeX: d(x, y) \ge 0;
  2. LaTeX: d(x, y) = 0 \Leftrightarrow x=y;
  3. LaTeX: d(x, y) = d(y, x);
  4. LaTeX: d(x, y) + d(y, z) \ge d(x, z).

The function is also called a distance; the pair LaTeX: (S, d) is a metric space. Condition (4) is called the Triangle inequality. A space is metrizable if a metric can be defined. A metric is induced by a norm if one exists: LaTeX: d(x,y) = \| x - y \|. See Hausdorff metric.

Min-conflicts heuristic

The min-conflicts heuristic is a local search or heuristic repair method (see heuristic search) for solving CSPs. This heuristic randomly selects a variable in the scope of a violated constraint and assigns it to the value in its domain that minimizes the number of violated constraints. If more than one value attains that minimum, it chooses randomly among them.
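A minimal sketch on the classic 8-queens CSP (one queen per row; the instance and step limits are hypothetical). When a run stalls, the sketch simply restarts from a fresh random assignment.

```python
import random

def conflicts(cols, r, c=None):
    """Queens attacking the queen of row r if it sits in column c (default: current)."""
    c = cols[r] if c is None else c
    return sum(1 for r2 in range(len(cols))
               if r2 != r and (cols[r2] == c or abs(cols[r2] - c) == abs(r2 - r)))

def min_conflicts(n, rng, max_steps=1000):
    cols = [rng.randrange(n) for _ in range(n)]      # one queen per row
    for _ in range(max_steps):
        bad = [r for r in range(n) if conflicts(cols, r) > 0]
        if not bad:
            return cols                              # all constraints satisfied
        r = rng.choice(bad)                          # variable in a violated constraint
        scores = [conflicts(cols, r, c) for c in range(n)]
        best = min(scores)                           # value minimizing violations,
        cols[r] = rng.choice([c for c in range(n) if scores[c] == best])  # ties random
    return None                                      # stalled: caller may restart

rng = random.Random(0)
sol = None
while sol is None:                                   # restart until solved
    sol = min_conflicts(8, rng)
print(all(conflicts(sol, r) == 0 for r in range(8)))  # True
```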


Minimal

In a partially ordered set, a minimal element is one that does not follow another in the ordering. This is not to be confused with minimum.

Another definition pertains to a subset with respect to some property such that no proper subset has that property. In the 0-1 knapsack problem the set of items not taken is minimal with respect to the greedy algorithm of adding the most valuable item, if it fits, until no more items can fit. This does not necessarily produce a maximum objective value, but the converse is certainly true: in order that LaTeX: x be an optimal solution, LaTeX: \{j: x_j=0\} must be minimal with respect to adding items that fit. See maximal for an analogy and a numerical example.

Minimal inequality

In integer programming, a valid inequality is minimal if it is not dominated by any valid inequality. Originally, this was limited to not being able to decrease any coefficient and remain valid. For example, suppose LaTeX: 2x_1 + x_2 \ge 1 is a valid inequality. Then, if we decrease the first coefficient to obtain LaTeX: x_1 + x_2 \ge 1, either this is not valid or it dominates the former, rendering it non-minimal.

More generally, suppose LaTeX: a^Tx \ge b is a valid inequality, and we consider LaTeX: (a',b') such that LaTeX: a' \le t a and LaTeX: b' \ge t b for some positive LaTeX: t such that LaTeX: (a',b') \neq t (a,b). If LaTeX: (a')^T x \ge b' is a valid inequality, it dominates the original one because

LaTeX: \{ x \in \mathbb{R}^n : x \ge 0, \, (a')^T x \ge b' \} \subseteq \{ x \in \mathbb{R}^n : x \ge 0, \, a^T x \ge b \}.

For example, LaTeX: 4x_1 + 2x_2 \ge 3 dominates LaTeX: 2x_1 + x_2 \ge 1 (use LaTeX: t = 2), so if this is valid, LaTeX: 2x_1 + x_2 \ge 1 cannot be minimal. Every facet-defining inequality is minimal, but not conversely.


Minimand

The objective function in a mathematical program whose sense of optimization is to minimize.


The objective is, itself, the value of another mathematical program:

LaTeX: f(x) = \sup \{f(x, y): y \in Y(x),\, g(x, y) \ge 0, \, h(x, y) = 0\}.

Then, the objective is to minimize LaTeX: f(x) (on some feasible region).

Minimax theorem

Proven by von Neumann in 1928, this is a cornerstone of duality (and of game theory). It states that there exist LaTeX: x \in S_m and LaTeX: y \in S_n, where LaTeX: S_k denotes the simplex LaTeX: \{z \in \mathbb{R}^k: z \ge 0, \, \textstyle\sum_i z_i = 1\} of mixed strategies, such that LaTeX: x^T A y is a saddlepoint of the bilinear form:

LaTeX: \min \left\{ \max \{ u^T A v: v \in S_n\}: u \in S_m \right\} = 
\max \left\{ \min \{ u^T A v: u \in S_m\}: v \in S_n \right\} = x^TAy.

This extends to the following.

Let LaTeX: F:X \times Y \rightarrow \mathbb{R} be such that LaTeX: X and LaTeX: Y are non-empty, convex, compact sets, LaTeX: F(\cdot,y) is convex on LaTeX: X for each LaTeX: y \in Y, and LaTeX: F(x,\cdot) is concave on LaTeX: Y for each LaTeX: x \in X. Then, there exists a saddlepoint, LaTeX: (x^*,y^*) \in X\times Y, such that

LaTeX: \min \left\{ \max \{F(x,y): y \in Y\}: x \in X\right\} 
= \max \left\{ \min \{F(x,y): x \in X\}: y \in Y\right\} = F(x^*,y^*).
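A pure-strategy special case can be checked directly: for a payoff matrix that happens to have a saddlepoint, the min-max and max-min values over rows and columns coincide (the full theorem requires mixed strategies; the matrix below is hypothetical).

```python
# payoff matrix for the row player; rows are u's pure strategies, columns are v's
A = [[1, 2],
     [0, 3]]

minimax = min(max(row) for row in A)                 # min over u of max over v
maximin = max(min(A[i][j] for i in range(len(A)))    # max over v of min over u
              for j in range(len(A[0])))
print(minimax, maximin)   # 2 2: equal, so this game has a pure saddlepoint of value 2
```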


Minimum

(pl. minima). A feasible point at which the infimum is achieved. (See Weierstrass' Theorem.) An LaTeX: \varepsilon-minimum is within LaTeX: \varepsilon of being a minimum: LaTeX: f(x) \le f^* + \varepsilon, where LaTeX: f^* is the infimum and LaTeX: \varepsilon > 0.

Minkowski inequality

The Minkowski inequality is

LaTeX: \|x + y\|_p \le \|x\|_p + \|y\|_p,

where LaTeX: \|\cdot \|_p denotes the LaTeX: L_p norm and LaTeX: p \ge 1. For LaTeX: p > 1, equality holds if, and only if, one of the vectors is a nonnegative multiple of the other.

Mixed-integer program

A mixed-integer program (MIP) is a mathematical program in which some of the variables are required to be integer-valued. Historically, this term implied the mathematical program was otherwise linear, so older papers (and some recent ones) mean this, but most now refer to the following:

MILP: Mixed integer linear program
MINLP: Mixed integer nonlinear program
MIQP: Mixed integer quadratic program
Also see combinatorial program.

Modified Newton method

The Newton method is designed to find the root of a function, say LaTeX: \nabla f(x)=0, by the algorithm map LaTeX: A(x) = x - \nabla^2 f(x)^{-1} \nabla f(x). This need not converge, so a modification is to use line search, resulting in the algorithm map:

LaTeX: A(x) = x - s^* \nabla^2 f(x)^{-1} \nabla f(x),

where LaTeX: s^* \in \arg\max \{f(x - s\nabla^2 f(x)^{-1} \nabla f(x)): s \ge 0\}. More generally, we could have another step size rule; as long as it is chosen to converge, the modified algorithm is sometimes called the damped Newton method.
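A one-dimensional sketch, stated for minimization and with a backtracking (Armijo) step size standing in for the exact line search: for f(x) = sqrt(1 + x^2), the pure Newton map is x -> -x^3, which diverges from any |x| > 1, while the damped iteration converges.

```python
import math

f   = lambda x: math.sqrt(1 + x * x)      # pure Newton diverges from |x| > 1
df  = lambda x: x / math.sqrt(1 + x * x)
d2f = lambda x: (1 + x * x) ** -1.5

def damped_newton(x, tol=1e-10):
    """Newton direction with a backtracking step size chosen to force descent."""
    while abs(df(x)) > tol:
        d = -df(x) / d2f(x)               # Newton direction
        s = 1.0
        while f(x + s * d) > f(x) + 1e-4 * s * df(x) * d:
            s *= 0.5                      # damp until sufficient decrease
        x += s * d
    return x

# pure Newton map here is x -> -x**3 (diverges from x = 2); damping rescues it:
print(abs(damped_newton(2.0)) < 1e-6)     # True
```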


Monoid

A set, LaTeX: M, of rational vectors such that: (1) LaTeX: 0 \in M, and (2) if LaTeX: v, LaTeX: w are in LaTeX: M, then LaTeX: v+w is in LaTeX: M. The monoid is integral if it contains only integer vectors. One can think of a monoid as a discrete analogue of a convex cone.

Monotonic function

A function that either never decreases or never increases. A non-decreasing, or isotonic, function satisfies: LaTeX: f(x') \ge f(x) whenever LaTeX: x' \ge x (it is strictly increasing if LaTeX: f(x') > f(x) for LaTeX: x' > x). A non-increasing, or antitonic, function satisfies: LaTeX: f(x') \le f(x) whenever LaTeX: x' \ge x (it is strictly decreasing if LaTeX: f(x') < f(x) for LaTeX: x'> x). This extends to a vector function, where LaTeX: \mbox{range}(f) is in LaTeX: \mathbb{R}^n: LaTeX: (f(x)-f(y))^T (x-y) \ge 0.

Monte Carlo optimization

A class of algorithms that seek a maximum by sampling, using a pseudo-random number generator. One example is simulated annealing.

Moore-Penrose inverse

See generalized inverse.

More for less paradox

This was originally introduced in the context of linear network flows:

It is possible to send more flow from supply to demand nodes at lower cost, even if all arc costs are positive.

See Myths and Counter Examples in Mathematical Programming for a non-network example.


A modelling and solving environment and language, which was designed to fit with Xpress-MP.

Multi-commodity flow

A network flow problem in which more than one commodity shares arc capacities. This applies to communication networks as well as to shipments of different grains on barges of limited capacities. It is more complicated than the single-commodity flow problem and is generally NP-complete (except for some special cases). See [1] for one of the complexities.

Multi-stage decision process

See dynamic programming and the recourse model in stochastic programming.

Multilevel program

The "level" refers to sets of variables. A bilevel program has two sets:

LaTeX: \min \{ f(x, y): x \in X,\, y \in Y, \, h(x, y)=0, \, g(x, y) \le 0\}.

A reason for identifying levels is to apply a decomposition principle for algorithm design. One example is the bilinear program. Another is when one set of variables is constrained to be a solution to an inner optimization problem:

LaTeX: \min \{ f(x, y): x \in X,\, y \in Y,\, h(x, y)=0,\, g(x, y) \le 0,\, 
y \in \arg\max \{F(x, y): y \in S(x)\} \},

where LaTeX: S(x) is some subset of LaTeX: Y.

Multiple objectives

The problem has more than one objective function. Since these do not lie in a totally ordered set, a solution is often defined as an efficient point (sometimes called a Pareto optimum): LaTeX: x^* is feasible, and there does not exist another feasible point, LaTeX: x, such that LaTeX: f(x) \ge f(x^*) and LaTeX: f_i(x) > f_i(x^*) for some LaTeX: i, where LaTeX: i indexes the objective functions, and we assume we are maximizing. This reduces to the usual definition of an optimum when there is only one objective.

There have been two principal approaches to solving a multiple objective mathematical program (MOMP):

  1. weighted sum: Maximize LaTeX: w^T f(x), where LaTeX: w > 0.
  2. lexicographically: LaTeX: F_1 = \max \{f_1(x)\} and for LaTeX: i > 1, LaTeX: F_i = \max \{f_i(x): f_k(x) = F_k \mbox{ for } k=1, \ldots, i-1\}. This results in LaTeX: x^* for which LaTeX: f(x^*) is lexicographically greater than (or equal to) any feasible solution. (Note: the order is fixed in advance.)

Both methods yield an efficient point (if one exists). Under certain assumptions, both methods can be used to generate the set of all efficient points, called the efficient frontier. (Care must be exercised in doing so, however; for example, see MOP Myth #2 for issues that arise when varying the weights of the objectives.)
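As a small illustration of the weighted-sum method (the finite feasible set and its objective values below are made up), sweeping positive weights recovers only efficient points:

```python
# Weighted-sum method on a finite feasible set with two objectives (f1, f2).
points = {"a": (1, 4), "b": (3, 3), "c": (4, 1), "d": (2, 2)}

def weighted_best(w1, w2):
    """Maximize w1*f1 + w2*f2 over the feasible set (weights positive)."""
    return max(points, key=lambda k: w1 * points[k][0] + w2 * points[k][1])

# sweep the weights; every maximizer is an efficient point
efficient = {weighted_best(w, 1 - w) for w in (0.1, 0.3, 0.5, 0.7, 0.9)}
# "d" = (2, 2) is dominated by "b" = (3, 3), so it is never selected
```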

Mutation operation

See genetic algorithm.

Myopic optimization

Given any partial solution, assign another value that improves the objective the most. (Algorithms that produce solutions based on myopic optimization are called greedy algorithms.)


This is a heuristic for the TSP, though it can apply more generally to a problem that seeks a subgraph with all nodes. It is a local search, which considers a move by removing n edges from a trial solution, then replacing them with some other n edges. In TSP, the 2-Opt neighbors of the tour LaTeX: \textstyle (1,2,\dots,n) are among the pair-wise exchanges. The set of exchanges, however, is larger than the set of 2-Opt neighbors. For example, consider LaTeX: n=4, so the original tour is LaTeX: \textstyle (1,2,3,4). The six pair-wise exchanges of this permutation are:

      (2,1,3,4):  (1)*--(2)          (3,2,1,4):  (1)*--(2)
                    \  *                          |     *
                     \/                           |     |
                     /\                           |     |
                    /  *                          *     |
                  (4)*--(3)                      (4)--*(3)

      (4,2,3,1):  (1)   (2)          (1,3,2,4):  (1)   (2)
                   |*  * |                        |*  * |
                   | \/  |                        | \/  |
                   | /\  |                        | /\  |
                   */  \ *                        */  \ *
                  (4)   (3)                      (4)   (3)

      (1,4,3,2):  (1)*--(2)          (1,2,4,3):  (1)--*(2)
                   |     *                         *  /  
                   |     |                          \/   
                   |     |                          /\   
                   *     |                         *  \  
                  (4)--*(3)                      (4)--*(3) 

We see duplicates due to the equivalence of any cyclic permutation. Also, there are only two 2-Opt moves if the TSP is symmetric:

                  (1)---(2)                      (1)   (2)
                    \  /                          |\  / |
                     \/                           | \/  |
                     /\                           | /\  |
                    /  \                          |/  \ |
                  (4)---(3)                      (4)   (3) 

For LaTeX: n=2, the replacement is unique; that is, once two edges are chosen for removal, there is only one replacement. This is not true for LaTeX: n > 2, and part of the heuristic is to decide what replacement to make.
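One common way to enumerate 2-Opt moves in code is as segment reversals (a sketch; the loop bounds are chosen to skip reversals that leave the cyclic tour unchanged):

```python
# Enumerate the 2-Opt neighbors of a tour: remove two non-adjacent edges and
# reconnect by reversing the segment between them.
def two_opt_neighbors(tour):
    """Yield each tour obtained by reversing tour[i:j] (a 2-Opt move)."""
    n = len(tour)
    for i in range(n - 1):
        # for i == 0, stop at n - 1 so we never reverse all-but-one city,
        # which reproduces the same cyclic tour
        for j in range(i + 2, n if i > 0 else n - 1):
            yield tour[:i] + tour[i:j][::-1] + tour[j:]

neighbors = list(two_opt_neighbors([1, 2, 3, 4]))  # the two moves shown above
```

For a tour of n cities this yields n(n-3)/2 neighbors: 2 when n = 4, matching the two symmetric moves pictured above.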


Problems are divided into two categories: those for which there exists an algorithm to solve it with polynomial time complexity, and those for which there is no such algorithm. We denote the former class of problems by LaTeX: P. There are problems for which no known algorithm solves them in polynomial time, but there is also no proof that no such algorithm exists. Among these problems, which are not known to be in LaTeX: P (nor known to lie outside it), there is a subclass of problems known as NP-complete: those for which either all are solvable in polynomial time, or none are. Formally, a problem is NP if there exists an algorithm with polynomial time complexity that can certify a solution. For example, it is not known whether there exists a polynomial algorithm to solve a system of Diophantine equations, LaTeX: \textstyle Ax=b \mbox{ for } x \in \mathbb{Z}^n (integer n-vectors). However, we can certify any trial LaTeX: x in polynomial time, just by checking that it is in LaTeX: \textstyle \mathbb{Z}^n, then multiplying by LaTeX: A to compare with LaTeX: b. A problem, LaTeX: p, is NP-complete if it is NP and for any problem in NP, there exists a polynomial time algorithm to reduce it to LaTeX: p. A fundamental member of the NP-complete class is the satisfiability problem. It is an open question whether LaTeX: P=\mbox{NP}, and most regard the NP-complete problems as having exponential time complexity. Further details are in the supplement, Introduction to Computational Complexity.


An optimization problem whose solution would also solve an NP-complete problem (e.g., via a polynomial-time reduction). In that sense, NP-hard problems are at least as hard as NP-complete problems.

Near optimal

A point LaTeX: x that is within a small value, say LaTeX: e > 0, of optimality in some sense. The following are the common measures of nearness:

  • in value: LaTeX: x feasible and LaTeX: 
f(x) \ge f(x^*) - e.
  • in policy: LaTeX: \textstyle ||x - x^*|| \le e.
  • in resource level: LaTeX: 
x \in \arg\max \left \{f(y): y \in X, g(y) \le b, h(y) = c \right \},
where LaTeX: \textstyle ||(b, c)|| \le e.

Nearest neighbor algorithm

A type of greedy algorithm for combinatorial programs where there is a measure of nearness between neighbors. An example is the traveling salesman problem, where the procedure is to choose the next city to be one that is nearest to the current city in the sequence.
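A minimal sketch of this heuristic for the TSP, with a made-up symmetric distance table:

```python
# Nearest neighbor heuristic: greedily visit the closest unvisited city.
def nearest_neighbor_tour(dist, start):
    """Build a tour from `start` using a nested distance dict dist[i][j]."""
    tour, unvisited = [start], set(dist) - {start}
    while unvisited:
        here = tour[-1]
        tour.append(min(unvisited, key=lambda c: dist[here][c]))
        unvisited.remove(tour[-1])
    return tour

dist = {
    "A": {"B": 2, "C": 9, "D": 10},
    "B": {"A": 2, "C": 6, "D": 4},
    "C": {"A": 9, "B": 6, "D": 3},
    "D": {"A": 10, "B": 4, "C": 3},
}
tour = nearest_neighbor_tour(dist, "A")  # visits A, then B (2), D (4), C (3)
```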

Negative definite matrix

LaTeX: x^TAx < 0 for all nonzero LaTeX: x.

Negative semi-definite matrix

LaTeX: x^TAx \le 0 for all LaTeX: x.


For a normed space, the neighborhood of a point, LaTeX: x, is the open ball centered at LaTeX: \textstyle x: \left \{y: ||y-x|| < e \right \}, where LaTeX: e > 0. The closed ball, with LaTeX: \textstyle ||y-x|| \le e, is also a neighborhood. The usual notation (in this glossary) is LaTeX: N_e(x), and context dictates whether it is open or closed. This extends to the neighborhood of a set LaTeX: S by taking the union; equivalently, LaTeX: \textstyle N_e[S] = \left \{y: ||y-x|| \le e \mbox{ for some } x \in S \right \}.

In integer programming, the neighborhood could mean integer-valued neighbors of the form LaTeX: \textstyle |x-x^*| \le 1. In a combinatorial program, where variables are binary, integer-valued neighbors comprise all members of LaTeX: \textstyle \{0,1\}^n. In this case the neighborhood is defined relative to some subset of binary variables reachable by some operation that depends on the problem. This is what is generally meant by a neighborhood in heuristic search in general, and simulated annealing or tabu search in particular, where a move is defined in terms of neighbors. In that context, a neighborhood could be as simple as complementing the value of one variable, as deletions or additions in a knapsack problem, or it could be more complex, like an n-Opt neighbor of a TSP tour.

Nelder-Mead simplex method

This is a heuristic search (not guaranteed to find a solution) for unconstrained optimization. Let LaTeX: \textstyle \left \{x^0, x^1, \dots, x^n \right \} be LaTeX: n+1 points in LaTeX: \mathbb{R}^n that define a simplex. Define the extreme values: LaTeX: \textstyle f(x^u) = \max \left \{f(x^0), \dots, f(x^n) \right \} and LaTeX: \textstyle f(x^l) = \min \left \{f(x^0), \dots, f(x^n) \right \}. Seeking a maximum of LaTeX: f on LaTeX: \mathbb{R}^n, the idea is to replace LaTeX: x^l with a point having a better objective value.

    Here is an iteration:
  1. Define the centroid of the simplex without this point of least value: LaTeX: \textstyle c = \sum_i \left \{x^i: i \ne l \right \}/n.
  2. reflection: compute LaTeX: \textstyle r = c + a(c - x^l), where LaTeX: a > 0 (LaTeX: a is called the "reflection coefficient").
  3. expansion: If LaTeX: \textstyle f(r) > f(x^u) (i.e., we have a new best point), compute LaTeX: \textstyle x = c + b(c - x^l), where LaTeX: b > 1 (LaTeX: b is called the "expansion coefficient"). If LaTeX: \textstyle f(x) > f(r), x replaces LaTeX: x^l, and the iteration is complete. Otherwise, LaTeX: r replaces LaTeX: x^l, and the iteration is complete.
  4. At this step, the reflection point LaTeX: (r) is not better than the best of the simplex vertices (i.e., LaTeX: \textstyle f(r) \le f(x^u)). This is divided into the following cases.
    1. If LaTeX: \textstyle \min \left \{f(x^i): i \ne l \right \} \le f(r) \le f(x^u), replace LaTeX: x^l with LaTeX: r.
    2. If LaTeX: \textstyle \min \left \{f(x^i): i \ne l \right \} > f(r), define LaTeX: x^* as the better of the two: LaTeX: \textstyle f(x^*) = \max \left \{f(x^l), f(r) \right \} (with LaTeX: \textstyle x^* = x^l \mbox{ or } r, resp.). Then, take a contraction step: LaTeX: \textstyle x = c + d(x^* - c), where LaTeX: 0 < d < 1 (LaTeX: d is called the "contraction coefficient"). If LaTeX: \textstyle f(x) > f(x^*), x replaces LaTeX: x^l, and the iteration is complete. Otherwise, the last resort is to replace all LaTeX: x^i with: LaTeX: \textstyle x'^i = (x^i + x^u)/2.
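The iteration can be sketched in code as follows (a maximizing variant; the coefficient values LaTeX: a=1, b=2, d=0.5 and the quadratic test function are conventional, illustrative choices, not prescribed by the entry):

```python
# A compact Nelder-Mead sketch for maximization over R^n.
def nelder_mead_max(f, simplex, a=1.0, b=2.0, d=0.5, iters=200):
    """Maximize f from an initial simplex of n+1 points (lists of floats)."""
    def centroid(pts, skip):
        n = len(pts) - 1
        return [sum(p[k] for i, p in enumerate(pts) if i != skip) / n
                for k in range(len(pts[0]))]

    for _ in range(iters):
        vals = [f(p) for p in simplex]
        l = vals.index(min(vals))                    # worst vertex x^l
        u = vals.index(max(vals))                    # best vertex x^u
        c = centroid(simplex, l)
        r = [ck + a * (ck - xk) for ck, xk in zip(c, simplex[l])]  # reflect
        second_worst = min(v for i, v in enumerate(vals) if i != l)
        if f(r) > vals[u]:                           # try an expansion
            x = [ck + b * (ck - xk) for ck, xk in zip(c, simplex[l])]
            simplex[l] = x if f(x) > f(r) else r
        elif f(r) >= second_worst:                   # accept the reflection
            simplex[l] = r
        else:                                        # contract toward better of x^l, r
            xs = r if f(r) > vals[l] else simplex[l]
            x = [ck + d * (xk - ck) for ck, xk in zip(c, xs)]
            if f(x) > f(xs):
                simplex[l] = x
            else:                                    # last resort: shrink toward x^u
                simplex = [[(pk + uk) / 2 for pk, uk in zip(p, simplex[u])]
                           for p in simplex]
    return max(simplex, key=f)

# maximize a concave quadratic with peak at (1, -2)
best = nelder_mead_max(lambda p: -(p[0] - 1) ** 2 - (p[1] + 2) ** 2,
                       [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```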


A collection of nodes, V, sometimes called vertices, plus a collection of arcs, A, which are directed from one node to another. The two sets form a network, denoted LaTeX: N=[V,A]. As such, it can be considered a directed graph (see other terms, like special graphs).

Here are associated functions and data values:
  • tail of k-th arc LaTeX: (i, j) is node LaTeX: i; we sometimes write LaTeX: T(k).
  • head of k-th arc LaTeX: (i, j) is node LaTeX: j; we sometimes write LaTeX: H(k).
  • in-degree of node LaTeX: i is LaTeX: \textstyle |\left\{k: H(k)=i\right\}|.
  • out-degree of node LaTeX: i is LaTeX: \textstyle |\left\{k: T(k)=i\right\}|.
  • arc capacity limits the total flow across the arc at any one time.
  • node capacity limits the total flow through a node at any one time.
  • supply or demand at a node provides external input or an output requirement.

Network flows

This is an assignment of arc values, called flows, say LaTeX: f(k) for the k-th arc, that satisfy two types of constraints: (1) arc bounds, LaTeX: \textstyle L \le f \le U, and (2) node balances, LaTeX: \textstyle \mbox{Flow out of node } i - \mbox{ Flow into node } i = b(i). The flow out of node LaTeX: i can be expressed as LaTeX: \textstyle \sum_k \left \{f(k): T(k)=i \right \}, and the flow into node LaTeX: i can be expressed as LaTeX: \textstyle \sum_k \left \{f(k): H(k)=i \right \}.

If LaTeX: b(i) > 0, b(i) is a supply at node LaTeX: i (called a supply node); if LaTeX: b(i) < 0, -b(i) is a demand at node LaTeX: i (called a demand node). If LaTeX: b(i)=0, node LaTeX: i is simply a transshipment node, and the balance equation says that the flow into node LaTeX: i must equal the flow out of node LaTeX: i. Another way to express the node flow balance equations is with the node-arc incidence matrix: LaTeX: Mf=b.

Still another representation is to define flow variables, LaTeX: x(i, j) on LaTeX: \textstyle V \times V, which describes how much flow goes from node LaTeX: i to node LaTeX: j. This can be used as long as there are no parallel arcs - i.e., no two arcs have both the same tail and the same head. (In some applications, parallel arcs are needed, such as to increase capacity across a pair of arcs with an increased cost.) In this form, the capacity constraints are still of the form LaTeX: \textstyle L \le x \le U, but the node equations have a different form:

LaTeX: \sum_j \left \{x(i, j): (i, j) \in A \right \} - \sum_j \left \{x(j, i): (j, i) \in A \right \} = b(i) \mbox{ for all } i \in V.

The (linear) min cost network flow problem is to minimize total cost, LaTeX: \textstyle \sum_{ij} \left \{c(i,j)x(i,j): (i,j) \in A \right \}, where LaTeX: c is a unit cost of flow, subject to the flow bounds and balance equations.
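To make the balance equations concrete, here is a tiny made-up network checked in code (node names, flows, and the supply/demand vector are illustrative):

```python
# Verify node balance: flow out of node i minus flow into node i equals b(i).
flow = {("s", "u"): 2, ("s", "v"): 1, ("u", "v"): 1,
        ("u", "t"): 1, ("v", "t"): 2}        # f on each arc (tail, head)
b = {"s": 3, "u": 0, "v": 0, "t": -3}        # net outflow required at each node

def net_outflow(i):
    """Flow out of node i minus flow into node i."""
    out = sum(f for (tail, head), f in flow.items() if tail == i)
    into = sum(f for (tail, head), f in flow.items() if head == i)
    return out - into

balanced = all(net_outflow(i) == b[i] for i in b)   # True for this flow
```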

Neural network

(also called artificial neural network, abbr. ANN). A network where the nodes correspond to neurons and the arcs correspond to synaptic connections in the biological metaphor. The following properties are included:

neural state. Each node has a state variable, say x. In the brain, this could be the potassium level; in computing applications, it could be anything the modeler chooses.
arc weight. Each arc has a weight that affects the state of its neighboring nodes when firing. If the weight is positive, it is said to be excitatory; if it is negative, it is inhibitory.
state equations. The neural states change by some differential (or difference) equation, say LaTeX: \textstyle x' = F(x,w,t). Typically (but not necessarily), LaTeX: -F is the gradient of an energy function (in keeping with the biological metaphor), say LaTeX: \textstyle F(x,w,t) = -\nabla_x E(x,w,t), so that LaTeX: x(t) follows a path of steepest descent towards a minimum energy state.
learning mechanism. This could be equations to change the weights: LaTeX: \textstyle w' = L(x,w,t). Various learning mechanisms are represented this way, including a form of supervised learning that uses a training set to provide feedback on errors. Other elements can be learned besides the arc weights, including the topology of the network.

Newsboy problem

A newspaper is concerned with controlling the number of papers distributed to newsstands. The cost of a paper varies (i.e., Sunday vs. daily), and the demand is a random variable, LaTeX: q, with probability function LaTeX: P(q) = probability that demand equals LaTeX: q. Unsold papers are returned, with no salvage value the next day, so their production cost is lost. It is possible, however, for a newsstand to order more papers the same day. There are holding and shortage (penalty) costs. The problem is to determine a reorder policy that minimizes total expected cost. This problem was used to consider a reorder policy with a 2-parameter decision rule:

  1. LaTeX: s = inventory level at which an order is placed;
  2. LaTeX: S = inventory level to which to order.

Then, the decision rule to be employed is the following policy:

Order nothing if the inventory of papers is LaTeX: \ge s;
Order up to LaTeX: S (i.e., order LaTeX: S-x papers) if the inventory on hand is LaTeX: x < s.

The significance of this problem is that it was used to introduce the notion (and optimality) of an LaTeX: (s, S) policy in inventory theory.
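A toy evaluation of an LaTeX: (s, S) rule (the costs and the discrete demand distribution below are made up for illustration):

```python
# Expected one-period cost of an (s, S) policy under discrete demand.
def one_period_cost(s, S, x, P, K=4.0, h=1.0, p=5.0):
    """Start with x papers; fixed order cost K, unit holding cost h,
    unit shortage cost p; demand q occurs with probability P[q]."""
    cost, y = 0.0, x
    if x < s:                  # the (s, S) rule: order up to S
        cost, y = K, S
    for q, prob in P.items():  # expected holding + shortage cost
        cost += prob * (h * max(y - q, 0) + p * max(q - y, 0))
    return cost

P = {0: 0.1, 1: 0.4, 2: 0.4, 3: 0.1}        # hypothetical demand distribution
cost_low_stock = one_period_cost(s=1, S=3, x=0, P=P)   # orders up to 3
```

Comparing such expected costs over candidate pairs LaTeX: (s, S) is the brute-force way to pick the policy parameters on small instances.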

Newton method

(also called Newton-Raphson method). This is the iterative scheme to find a zero of a function:

LaTeX: x' = x - J(x)^{-1} F(x),

where LaTeX: \textstyle F:\mathbb{R}^n \to \mathbb{R}^n LaTeX: \textstyle (F \in C^1) and LaTeX: J(x) is the Jacobian of LaTeX: F (assumed nonsingular). In mathematical programming this is used to find a stationary point, where LaTeX: F=\nabla f and LaTeX: J=H_f. Lacking global convergence, this leads to the modified Newton method, sometimes called the damped Newton method. (See the associated myth, Myth NLP-13, to avoid misconception.)
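A sketch of the iteration for a 2x2 system with a hand-coded Jacobian (the system and starting point are made up; the 2x2 solve is written out explicitly to keep the example library-free):

```python
# Newton-Raphson for F(x) = 0 in two variables.
def newton_2d(F, J, x, tol=1e-12, iters=50):
    for _ in range(iters):
        f1, f2 = F(x)
        if abs(f1) + abs(f2) < tol:
            break
        (a, b), (c, d) = J(x)
        det = a * d - b * c                       # J(x) assumed nonsingular
        # dx = J(x)^{-1} F(x), via the 2x2 inverse formula
        dx = ((d * f1 - b * f2) / det, (-c * f1 + a * f2) / det)
        x = (x[0] - dx[0], x[1] - dx[1])
    return x

# solve x^2 + y^2 = 2, x - y = 0; from this start it converges to (1, 1)
root = newton_2d(lambda v: (v[0] ** 2 + v[1] ** 2 - 2, v[0] - v[1]),
                 lambda v: ((2 * v[0], 2 * v[1]), (1.0, -1.0)),
                 (2.0, 0.5))
```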

No-free-lunch theorem

In heuristic search, this is the proposition that all methods perform the same when averaged over all possible objective functions. The idea is that a particular search algorithm, like simulated annealing, may be designed to perform especially well for some functions, compared to a genetic algorithm, but when applied to a representative sample of all possible costs, they will perform exactly the same, on the average. This implies that to do especially well on some problems, the search method must do worse on others; hence, the "no-free-lunch" description.


A partial assignment of variables to values that does not lead to a solution. In other words, there does not exist a solution to the overall problem that satisfies all the assignments in the no-good.

In a backtracking search to find a solution, each dead-end corresponds to a no good. However, where no-goods become useful is when they can be learned and added to the constraint problem as implied constraints that remove many dead-ends that would otherwise have to be searched over.

In particular, no-good learning and reasoning are very important for modern techniques to solve SAT problems.

Node consistency

A simple consistency property concerning unary constraints: a variable is node consistent if all values within its domain are consistent with all unary constraints on the variable.

Node packing problem

See Packing problem.


A variable or column that is not basic.


A nondegenerate solution is one that is not degenerate. A nondegeneracy assumption is one that ensures every optimal solution is not degenerate. In general, a nondegenerate solution is strictly complementary. In linear programming, nondegeneracy is equivalent to uniqueness in both the primal and the dual.

Nonlinear program

At least one of the functions (objective or constraint) is not affine on LaTeX: X. (Usually, it is the rule of the function that classifies it as nonlinear; in particular, a linear integer program is not generally considered to be a nonlinear program (NLP).)


This is a function of a vector, say LaTeX: ||x||, that satisfies three properties:

  1. Homogeneous: LaTeX: \textstyle ||tx|| = |t| ||x|| \mbox{ for all (scalars)}, t.
  2. Positive: LaTeX: \textstyle ||x|| > 0 \mbox{ for } x \ne 0. (Note: LaTeX: ||0|| = 0 by homogeneity, so LaTeX: 0 is the unique vector with zero norm.)
  3. Subadditive: LaTeX: \textstyle ||x + y|| \le ||x|| + ||y||.

Norms that arise frequently in mathematical programming are:

Euclidean norm (on LaTeX: \mathbb{R}^n): LaTeX: \textstyle ||x|| = \sqrt{\sum_j x_j^2}

LaTeX: L_\infty (on LaTeX: \mathbb{R}^n): LaTeX: \textstyle ||x|| = \max_j\{|x_j|\} (= \lim L_p \mbox{ as } p \to \infty)

L_p LaTeX: \textstyle (\mbox{on } \mathbb{R}^n, \mbox{ for } p \ge 1): ||x|| = [\sum_j |x_j|^p]^{1/p}

Matrix norm (induced by vector norm): LaTeX: \textstyle ||A|| = \max \left \{||Ax||: ||x||=1 \right \}

Supremum norm (on function space): LaTeX: \textstyle ||F|| = \sup \left \{|F(x)|: x \in X \right \}

Normal cone

For a convex set LaTeX: S, the normal cone to LaTeX: S at LaTeX: x, for LaTeX: \textstyle x \in S, is LaTeX: \textstyle \left \{y : \langle v - x, y \rangle \le 0 \mbox{ for all } v \in S \right \}, where LaTeX: \langle \cdot, \cdot \rangle is an inner product.

Northwest corner rule

This constructs a feasible shipment for the transportation problem (with cost table LaTeX: c_{ij} = unit cost from source LaTeX: i to destination LaTeX: j; the rule itself does not use the costs), as follows. Start at the NW corner (source 1, destination 1), and allocate the most possible: the min of the supply and demand. If supply exceeds the demand, proceed to the next destination, and continue until all of supply 1 is allocated. Then, go to source 2 and repeat the allocation process, starting with the first (lowest index) destination whose demand has not been fulfilled. If demand exceeds the supply, proceed to the next source, and continue until all of demand 1 is allocated. Then, go to destination 2 and repeat the allocation process. Eventually, all demand must be fulfilled if the problem is feasible (i.e., total supply LaTeX: \ge total demand).
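The rule translates directly into code (supplies and demands below are made up; the instance is balanced, with total supply equal to total demand):

```python
# Northwest corner rule for a transportation problem.
def northwest_corner(supply, demand):
    """Return a feasible shipment matrix ship[i][j]."""
    supply, demand = supply[:], demand[:]       # work on copies
    ship = [[0] * len(demand) for _ in supply]
    i = j = 0
    while i < len(supply) and j < len(demand):
        q = min(supply[i], demand[j])           # allocate the most possible
        ship[i][j] = q
        supply[i] -= q
        demand[j] -= q
        if supply[i] == 0:
            i += 1                              # this supply is exhausted
        else:
            j += 1                              # this demand is fulfilled
    return ship

ship = northwest_corner([20, 30, 25], [10, 25, 40])
```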

Null space

Null space of matrix LaTeX: A. LaTeX: \textstyle \left \{x: Ax=0 \right \}. It is the orthogonal complement of its row space, LaTeX: \textstyle \left \{x: x=yA \mbox{ for some } y \in \mathbb{R}^m \right \}. Its dimension is LaTeX: n - \mbox{rank}(A), which complements the subspace spanned by its row space.

Numeric CSP

A constraint satisfaction problem is numeric if the constraints are numeric relations (e.g., log, sin, power, etc.) and the decision variables' domains are continuous.

Objective function

The (real-valued) function to be optimized. In a mathematical program in standard form, this is denoted LaTeX: f. Also see multiple objectives.

Online problem

A decision problem that is ongoing, so its data values are frequently changing with uncertainty. Here are some examples:

  1. Allocation of bandwidth during video conferencing on a telecommunications network;
  2. Investment decisions in a portfolio, selling and buying as the market changes;
  3. Scheduling jobs as they arrive;
  4. Load forecasting and dispatching for electric power.

In general, an online decision problem could be the allocation of resources that is done in real time with changes in the amount of resources, their costs and the demands for what they produce. A problem that is not online is sometimes called an offline problem.


For a mathematical program in standard form, LaTeX: \textstyle x^* \in X (the domain) is an optimal solution if it is a maximum (or a minimum):

  1. LaTeX: x^* is feasible;
  2. LaTeX: f(x^*) \ge f(x) for all feasible LaTeX: x (maximum value).

Some authors refer to an optimal solution when they mean a local optimum; others mean a member of the optimality region (which are global optima). In either case, the optimal value is the objective value, evaluated at an optimal solution. A solution is nearly optimal if it is feasible, but the optimality condition (2) is replaced by

LaTeX: f(x^*) \ge f(x) - t \mbox{ for all feasible } x,

where LaTeX: t > 0. (LaTeX: t = 0 corresponds to being optimal). Typically, LaTeX: t is specified as a small fraction, such as a cut-off tolerance for an algorithm to terminate finitely.

Optimal partition

Consider a primal-dual pair of linear programs:

LaTeX: \min \left \{cx: x \ge 0, Ax \ge b \right \} \mbox{ and } \max \left \{yb: y \ge 0, yA \le c \right \}.

Define primal surplus variables, LaTeX: s = Ax - b, and dual slack variables, LaTeX: d = c - yA. For any non-negative vector, LaTeX: v, define its support set as those indexes for which the coordinate value is positive: LaTeX: \textstyle I(v)= \left \{j: v_j > 0 \right \}. In an optimal solution, complementary slackness must hold, so that LaTeX: \textstyle j \in I(x) implies LaTeX: j \notin I(d), and LaTeX: \textstyle i \in I(s) implies LaTeX: i \notin I(y). Thus, the intersections, LaTeX: \textstyle I(x) \cap I(d) and LaTeX: \textstyle I(s) \cap I(y), are empty. If the solution is strictly complementary, these support sets yield partitions because then LaTeX: \textstyle I(x) \cup I(d) = \left \{1, \dots, n \right \} and LaTeX: \textstyle I(s) \cup I(y) = \left \{1, \dots, m \right \}. A strictly complementary solution is an interior solution (and conversely), so each interior solution yields a partition of these index sets.

A key fact is that every LP with an optimal solution must have an optimal interior solution. Further, every optimal interior solution yields the same partition, so it is called the optimal partition.

Optimal response function

The optimal value of the objective as a function of parameters, typically the right-hand side:

LaTeX: f^*(b, c) = \sup \left \{f(x): x \in X, g(x) \le b, h(x) = c\right\}.

Optimality gap

Generally the difference between a best known solution, e.g. the incumbent solution in mixed integer programming, and a value that bounds the best possible solution. One such measure is the duality gap. The term is often qualified as the absolute gap, which is the magnitude of the difference between the best known solution and the best bound, or as the relative gap, which is the absolute gap divided by the best bound.
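In code, for a minimization with a best lower bound (the numbers are illustrative):

```python
# Absolute and relative optimality gap for a minimization problem.
incumbent, best_bound = 104.0, 100.0        # best known solution vs. best bound
abs_gap = abs(incumbent - best_bound)       # 4.0
rel_gap = abs_gap / abs(best_bound)         # often reported as a percentage
```

A solver that stops when `rel_gap` falls below, say, 0.01 guarantees the incumbent is within 1% of optimal.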

Optimality region

Set of optimal points.

Optimization Software Library

(OSL). A library of optimization routines, callable from Fortran or C, produced by IBM.


(pl. optima). Minimum or maximum.

Order of convergence

See convergence.

Orthogonal complement

Orthogonal complement of a subspace, LaTeX: S. LaTeX: \textstyle \left \{y: yx = 0 \mbox{ for all } x \in S \right \}. In particular, if LaTeX: \textstyle S = \left \{x: x = Av \mbox{ for some } v \in \mathbb{R}^n \right \}, where LaTeX: A is an LaTeX: \textstyle m \times n matrix, its orthogonal complement is LaTeX: \textstyle \left \{y: yA = 0 \right \}.

Orthogonal matrix

A matrix, LaTeX: A, such that LaTeX: \textstyle AA^T=I.

Orthogonal vectors

Two vectors in LaTeX: \textstyle \mathbb{R}^n whose inner product is zero.

Out-of-kilter algorithm

Applies to network flows, where the balance equations hold every iteration, but bounds and duality (optimality) conditions could be violated. The dual prices, often called potentials in this context, are modified along with flows so as to move closer to both primal and dual feasibility.

Outer approximation

This solves a sequence of approximations to a mathematical program where the approximating problem contains the original feasible region. Examples are cutting plane algorithms and relaxation.

Over-constrained problem

A term generally meaning that a mathematical program has too many constraints in that either the feasible region can be described with a subset of the constraints or that it is empty.


A term that means the model is too optimistic in expecting an optimal solution to produce the benefits in its objective. For example, a time-staged model generally presumes perfect information over the planning horizon, and can be "over-optimizing" in its allocation of resources.


A square matrix with all of its principal minors positive. This includes all symmetric, positive definite matrices. Here is an example of a P-matrix that is not positive definite:

LaTeX: A = \begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix}.

The principal minors are positive, but LaTeX: \textstyle (1, 1)A(1, 1)^T < 0. The significance of this class is in the theorem:

The linear complementarity problem, defined by LaTeX: (M, q), has a unique solution for each LaTeX: q \in \mathbb{R}^n if, and only if, LaTeX: M is a P-matrix.

Packing problem

The set packing problem is the question, "Does a collection of sets contain at least LaTeX: K mutually disjoint subsets?" (LaTeX: K is a positive integer not greater than the number of sets given.) This is solvable in polynomial time if no set has more than 2 members. Otherwise, it has the NP-hard equivalent integer program: LaTeX: \textstyle \max \left \{ \sum_j x_j: Ax \le 1, x \in \left \{0, 1 \right \}^n\right\}, where LaTeX: x_j = 1 if element LaTeX: j is selected; else, LaTeX: x_j = 0. The matrix LaTeX: A has 0's and 1's, where the i-th row corresponds to the i-th set to be packed: LaTeX: A_{(i, j)}=1 means element j is in set i; else, LaTeX: A_{(i, j)}=0. The constraint LaTeX: Ax \le 1 means that at most one element of each set can be selected. If the IP maximum is less than LaTeX: K-1, or if it equals LaTeX: K-1 = number of elements, the answer to the set packing problem is "No". Otherwise, let LaTeX: \textstyle x_1, \dots, x_{(K-1)} be among those elements selected (re-index, if necessary). Let LaTeX: S_i be all sets containing LaTeX: x_i. Due to the packing constraints, no selected element can be in two sets, so these are disjoint. The remaining set, LaTeX: S_K, is composed of all other elements LaTeX: \textstyle (x_K, \dots, x_n), and must be disjoint from the other sets, so the answer to the set packing question is "Yes" (and we can construct the disjoint subsets).

A special case is the node packing problem on a graph: find a set of nodes of maximum cardinality such that no two of them are incident with the same edge (or arc). In that case, every row of LaTeX: A has exactly two 1's, and the constraints are sometimes called degree-2 inequalities. Another special case is the bin packing problem.

The IP formulation has the natural extension: LaTeX: \textstyle \max \left \{ cx: Ax \le b, x \in \left \{0,1 \right \}^n \right \}, where LaTeX: \textstyle c \ge 0 and LaTeX: \textstyle b \ge 1. This allows up to LaTeX: b_i elements to be selected from the i-th set, and elements can be weighted by value LaTeX: (c).
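The IP formulation can be checked by brute force on a tiny instance (the sets below are made up; full enumeration is only practical for small LaTeX: n):

```python
# Brute-force solve of max { sum_j x_j : Ax <= 1, x binary } on a 3x4 instance.
from itertools import product

A = [[1, 1, 0, 0],      # set 1 contains elements 1 and 2
     [0, 1, 1, 0],      # set 2 contains elements 2 and 3
     [0, 0, 1, 1]]      # set 3 contains elements 3 and 4

best = max((x for x in product((0, 1), repeat=4)
            if all(sum(a * v for a, v in zip(row, x)) <= 1 for row in A)),
           key=sum)
# at most one element per set, so the maximum number selected is 2
```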

Parallel algorithm

This is an algorithm that executes more than one instruction at a time by having a computing environment with more than one processor. The types of parallel computing environments are:

MIMD: Multiple Instruction/Multiple Data (includes massively parallel);
MISD: Multiple Instruction/Single Data (unusual);
SIMD: Single Instruction/Multiple Data (e.g., array processor).

Parallel tangents

(PARTAN). An algorithm developed from the zigzag phenomenon observed using Cauchy's steepest descent. It takes two gradient steps, then performs a line search on the line through the first and last points LaTeX: \textstyle (x^{k+2} - x^k). (For a quadratic objective, PARTAN is equivalent to the conjugate gradient method.)


A constant in a mathematical program, not subject to choice in the decision problem, but one that could vary outside the control of the decisions. Examples are supplies, demands, loss factors, exponents and coefficients in polynomial functions (of the decision variables). Not all coefficients are parameters, as many are zero by the logic of the model. For example, the only data for a standard transportation problem are the costs, supplies and demands. These can depend upon parameters, but the LP matrix does not – it is the incidence matrix of the network. In general, parameters are data-dependent constants, rather than logically fixed for all instances of the model. Some parameters are simply units of measurement, such as the amount of energy (Btu) in a ton of coal, whereas some parameters are uncertain, like demand for a product.

Parametric analysis

The sensitivity analysis of parameters, which can be vectors in any designated region.

Parametric programming

Solving a family of mathematical programs over a parameter space. For example, consider the family of linear programs:

\min \left \{ cx: Ax = b + td, x \ge 0 \right \},

where t is a (scalar) parameter and d is a specified direction vector. Starting with LaTeX: t=0, the LP is solved, then an optimal basis is found (if possible) that remains optimal as LaTeX: t is increased. The max value of LaTeX: t is determined; if this is finite, the basis changes to a new optimal basis (for the max LaTeX: t value) such that LaTeX: t can be further increased, if possible. This is continued until either LaTeX: t cannot be increased any further or a basis is found that remains optimal on the interval, LaTeX: \textstyle [t_k, \infty), where LaTeX: \textstyle 0 < t_1 < t_2 < \dots < t_k are the break points of the optimal objective value as a function of the parameter.

Pareto optimum

See multiple objectives.

Partial conjugate gradient method

The conjugate gradient method is performed for some number of iterations, say LaTeX: k (< n), then it is restarted. (If LaTeX: k=0, this is the special case of Cauchy's steepest descent.)

Partial quasi-Newton method

A quasi-Newton method is performed for some number of iterations, say LaTeX: k (< n), then it is restarted.

Partial solution

Some of the variables are specified. This arises in a branch and bound algorithm, where some variables have been fixed by the branching process. It also arises in bidding algorithms.

Partially ordered set

(or poset). A set plus a binary relation on that set that is reflexive, antisymmetric and transitive. This arises in the presence of precedence constraints, and other relations that arise in combinatorial programs.

Particle Swarm Optimization

(PSO). This is an evolutionary algorithm based on swarm behavior of animals, like bird flocking. A move from a state is influenced by directions of states in its neighborhood. A consensus function is used to average neighbors' best fitness values, and this is combined with the original state's fitness to obtain a new position for the state. It applies to continuous variables, using a velocity term to derive state increments. The fundamental equation that governs state evolution is

v_p = wv_p + A r (x^*_p - x_p) + B r' (g^* - x_p)

where p is a particle, or state, and

LaTeX: v_p = velocity vector of particle LaTeX: p
LaTeX: x_p = position vector of particle LaTeX: p
LaTeX: x^*_p = previous position of particle LaTeX: p giving best fitness value
LaTeX: g^*_p = position of globally best fitness value
LaTeX: w = parameter, called "inertia weight"
LaTeX: r, r' are pseudo-random numbers in LaTeX: [0,1]
LaTeX: A, B = positive parameters, generally in LaTeX: [1,2]
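The update above can be sketched in a few lines. The following minimal Python version (stated for minimization, with illustrative parameter values LaTeX: w=0.7, A=B=1.5, an illustrative quadratic objective, and per-coordinate random numbers — a common variant) applies the velocity equation followed by the position update LaTeX: x_p = x_p + v_p:

```python
import random

def pso(f, dim, n_particles=20, iters=200, w=0.7, A=1.5, B=1.5, seed=0):
    """Minimize f over R^dim with the basic PSO velocity/position update."""
    rng = random.Random(seed)
    x = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]            # each particle's best position so far
    gbest = min(pbest, key=f)[:]           # globally best position so far
    for _ in range(iters):
        for p in range(n_particles):
            for j in range(dim):
                r, r2 = rng.random(), rng.random()
                # v_p = w*v_p + A*r*(x*_p - x_p) + B*r'*(g* - x_p)
                v[p][j] = (w * v[p][j] + A * r * (pbest[p][j] - x[p][j])
                           + B * r2 * (gbest[j] - x[p][j]))
                x[p][j] += v[p][j]         # position update
            if f(x[p]) < f(pbest[p]):
                pbest[p] = x[p][:]
                if f(pbest[p]) < f(gbest):
                    gbest = pbest[p][:]
    return gbest

# Illustrative objective with minimum at (1, -1)
best = pso(lambda z: (z[0] - 1) ** 2 + (z[1] + 1) ** 2, dim=2)
```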

Partitioning problem

This is the combination of packing and covering. Its IP form is LaTeX: \textstyle \mbox{Opt}\left \{cx: Ax=b, x \in \left \{0,1 \right \}^n \right \}, where Opt could be Min or Max, and LaTeX: A is a 0-1 matrix. The equation LaTeX: \textstyle A(i,.) x = b_i means that exactly LaTeX: b_i elements must be selected from set LaTeX: i. In particular, a multiple choice constraint is LaTeX: \textstyle \sum_j x_j = 1.

Path

As a (differentiable) curve, see path following. As a finite sequence, see graph or network.

Path consistency

A consistency property that is similar to arc consistency but that considers pairs of variables instead of just one. A pair of variables LaTeX: (x,y) is path-consistent with another variable LaTeX: z if for every consistent value pair for the binary constraint on LaTeX: x and LaTeX: y there exists a value for LaTeX: z such that all binary constraints between LaTeX: (x,z) and LaTeX: (y,z) are satisfied.

Path following

In this context a path is a piecewise differentiable curve in space. The idea is to follow such a path (if one exists) from some initial point to a solution. One example of a path following algorithm is the barrier penalty function, with the path created by the parameter LaTeX: u in the solution, LaTeX: x^*(u) in LaTeX: \textstyle \argmax \left \{f(x) + uP(x): x \in X^0\right\}, where LaTeX: X^0 is the strict interior of the feasible region. (This is called the central path, or the path of the analytic center.)

Pattern search

This is a heuristic (in the sense of not guaranteeing convergence) for unconstrained optimization. It consists of two basic steps, where LaTeX: x is a current point and LaTeX: w is a vector of widths (both having been initialized).

  1. Exploration move. Set LaTeX: y=x and do for LaTeX: \textstyle j=1,\dots,n:
    If LaTeX: \textstyle f(y + w_j e_j) > f(y), replace LaTeX: y with LaTeX: y + w_j e_j;
    otherwise, if LaTeX: f(y - w_j e_j) > f(y), replace LaTeX: y with LaTeX: y - w_j e_j.
    If LaTeX: x=y at the end of this (no coordinate step improved), reduce the widths and repeat the exploration. Otherwise, set LaTeX: v = y-x (the change) and LaTeX: x = y (move to the better point), then perform a pattern move.
  2. Pattern move. If LaTeX: f(x + v) > f(x), replace LaTeX: x with LaTeX: x+v. In either case, return to the exploration move.

The idea is to perform exploration moves as long as they are successful. Then, saving the last direction LaTeX: (v) when it was successful, this is used to advance LaTeX: x if it is an improvement, giving a new base point LaTeX: (x) for the next series of exploration moves. Termination occurs when the widths become too small.
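The steps above can be sketched as follows. This is a simplified version (for maximization, with an illustrative shrink factor, tolerance, and test objective, not the exact Hooke-Jeeves bookkeeping):

```python
def pattern_search(f, x, w, shrink=0.5, tol=1e-6):
    """Maximize f by alternating exploration and pattern moves (sketch)."""
    n = len(x)
    while max(w) > tol:
        # Exploration move: try +/- width steps on each coordinate.
        y = x[:]
        for j in range(n):
            for step in (w[j], -w[j]):
                cand = y[:]
                cand[j] += step
                if f(cand) > f(y):
                    y = cand
                    break
        if y == x:
            w = [wj * shrink for wj in w]          # exploration failed: reduce widths
        else:
            v = [yj - xj for yj, xj in zip(y, x)]  # save the successful direction
            x = y
            cand = [xj + vj for xj, vj in zip(x, v)]
            if f(cand) > f(x):                     # pattern move
                x = cand
    return x

# Illustrative concave objective with maximum at (1, -2)
opt = pattern_search(lambda z: -(z[0] - 1) ** 2 - (z[1] + 2) ** 2,
                     [0.0, 0.0], [1.0, 1.0])
```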

Penalty function

The traditional concept is to augment the objective with a function and one positive constant, so that the original mathematical program is replaced by solving a parametric family of the form LaTeX: \textstyle \max \left \{f(x) - uP(x): x \in X^0\right\}. The function, LaTeX: P, is called a penalty function if it satisfies LaTeX: P(x) > 0 for LaTeX: x not feasible and LaTeX: P(x)=0 if LaTeX: x is feasible. The set LaTeX: X^0 depends on the type of penalty function, and there are two classical types, each requiring LaTeX: P to be continuous: interior (or barrier) and exterior. A penalty function is exact if there exists a finite parameter value such that its maximum is a solution to the original mathematical program.

More generally, a penalty function could involve many parameters, such as the Lagrangian, or it could be stated in parameter free form. A general model is the generalized penalty-function/surrogate dual, see the supplement on duals. The notion of an exact penalty function leads to a strong dual.
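As a small illustration of the exterior case, consider maximizing LaTeX: f(x) = -(x-2)^2 subject to LaTeX: x \le 1, with penalty LaTeX: P(x) = \max(0, x-1)^2; here the penalized maximizer is LaTeX: x^*(u) = 1 + 1/(1+u), which approaches the constrained solution LaTeX: x=1 as LaTeX: u grows. The sketch below (all names and values illustrative) locates each penalized maximizer by ternary search on the concave penalized objective:

```python
def argmax_ternary(g, lo, hi, iters=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

f = lambda x: -(x - 2) ** 2            # objective (unconstrained max at x=2)
P = lambda x: max(0.0, x - 1) ** 2     # exterior penalty for x <= 1
# Penalized maximizers approach the constrained solution x = 1 as u grows.
xs = [argmax_ternary(lambda x, u=u: f(x) - u * P(x), -10, 10)
      for u in (1, 10, 100, 1000)]
```

Note that no finite LaTeX: u makes the maximizer exactly feasible here, which is why exactness of a penalty function is a special property.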

Perturbation

An imprecise term that means changing some parameter value (or function) and seeing what effect this has on the solution. This embodies continuity and differentiability properties of the optimal responses: objective value, policy set, and perhaps dual values too. This is what is sometimes called marginal analysis of solutions, and changes are limited to some neighborhood of the initial values. Some use perturbation function to mean the optimal objective value, even for large changes in the parameters, as in parametric programming.

Phase I and Phase II

Phase I of a mathematical program is finding a feasible solution, and Phase II is entered with a feasible solution to find an optimal solution. The standard Phase I is:

\min \left \{ \sum_i u_i + \sum_i (v_i + w_i) : u, v, w \ge 0, \; g(x) - u \le 0, \; h(x) - v + w = 0 \right \}.

Then, for any LaTeX: x, one could set LaTeX: \textstyle u = g(x)^+, v = h(x)^+ and w = -h(x)^- to have a feasible solution to the Phase I mathematical program. The optimal objective value is 0 if, and only if, LaTeX: u=0 and LaTeX: v=w=0, in which case LaTeX: x is feasible in the original mathematical program (to commence Phase II). The Phase I problem is sometimes called an elastic program (thinking of the artificial variables LaTeX: (u,v,w) as providing "elastic" behavior in the levels of constraint violation).

In linear programming, the standard Phase I & II are:

I. Min LaTeX: ev: LaTeX: \textstyle x, v \ge 0, Ax + v = b, and
II. Max LaTeX: cx: LaTeX: \textstyle x \ge 0, Ax = b, where LaTeX: b \ge 0 (multiply row LaTeX: i by LaTeX: -1 for LaTeX: i: b_i < 0).

Pivot

This is the algebra associated with an iteration of Gauss-Jordan elimination, using the forward transformation. The tableaux for a pivot on element LaTeX: a(p,q) LaTeX: (\ne 0), which means nonbasic variable LaTeX: x_q enters the basis in exchange for basic variable LaTeX: x_p, are as follows:

Before pivot:

\begin{array}{c|c|cc}
& & \multicolumn{2}{c}{\mbox{Nonbasic}} \\
\mbox{Basic Var.} & \mbox{Level} & x_j & x_q \\ \hline
x_i & b_i & a(i,j) & a(i,q) \\
x_p & b_p & a(p,j) & a(p,q)^* \\ \hline
\mbox{obj} & -z & d_j & d_q
\end{array}

After pivot:

\begin{array}{c|c|cc}
& & \multicolumn{2}{c}{\mbox{Nonbasic}} \\
\mbox{Basic Var.} & \mbox{Level} & x_j & x_p \\ \hline
x_i & b_i - b_p a(i,q)/a(p,q) & a(i,j) - a(i,q)a(p,j)/a(p,q) & -a(i,q)/a(p,q) \\
x_q & b_p/a(p,q) & a(p,j)/a(p,q) & 1/a(p,q) \\ \hline
\mbox{obj} & -z - b_p d_q/a(p,q) & d_j - a(p,j)d_q/a(p,q) & -d_q/a(p,q)
\end{array}

A pivot is primal degenerate if the associated basic solution LaTeX: (x) does not change (i.e., the nonbasic variable enters the basis, but its level remains at the same bound value, in which case no basic variable changes level). Similarly, the pivot is dual degenerate if the associated dual solution (i.e., pricing vector and reduced costs) does not change. For dealing with degenerate pivots, see Bland's rule and the TNP rule.
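The after-pivot formulas can be applied mechanically to a tableau. A sketch (numbers illustrative), treating the Level column like any other column and replacing the pivot column with the exchange entries:

```python
def pivot(T, p, q):
    """Apply the exchange-form pivot on entry T[p][q] (must be nonzero).

    Column q is replaced by the substitution rates with respect to the
    leaving variable; all other columns follow the usual update rule.
    """
    apq = T[p][q]
    m, n = len(T), len(T[0])
    new = [[0.0] * n for _ in range(m)]
    for j in range(n):
        if j != q:
            new[p][j] = T[p][j] / apq            # a(p,j)/a(p,q)
    new[p][q] = 1.0 / apq                        # 1/a(p,q)
    for i in range(m):
        if i == p:
            continue
        for j in range(n):
            if j != q:
                new[i][j] = T[i][j] - T[i][q] * T[p][j] / apq
        new[i][q] = -T[i][q] / apq               # -a(i,q)/a(p,q)
    return new

# Rows: x_i, x_p, obj; columns: Level, x_j, x_q (illustrative values)
T = [[4.0, 1.0, 2.0],
     [6.0, 2.0, 3.0],
     [0.0, 1.0, 2.0]]
T2 = pivot(T, p=1, q=2)
```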

Pivot selection

In the simplex method, this is a tactic to select a basis exchange. The incoming column is based on its effect on the objective function, and the outgoing column is based on its effect on feasibility.

Point-to-set map

The image of the map is a set: if LaTeX: \textstyle F:X \to 2^Y, then LaTeX: F(x) is a subset of LaTeX: Y for each LaTeX: x \in X. An example is the feasibility map.

Pointed cone

A convex cone, LaTeX: C, such that LaTeX: \textstyle C \cap (-C) = \{0\}. An example is LaTeX: \textstyle C=\mathbb{R}^{n+} (so LaTeX: \textstyle -C=\mathbb{R}^{n-}).

Polar cone

The polar cone of a set LaTeX: \textstyle S \subseteq \mathbb{R}^n is LaTeX: \textstyle \left \{(y, y^0)\in \mathbb{R}^{n+1}: yx \le y^0 \mbox{ for all } x \in S \right \}.

Policy iteration

This is an algorithm for infinite horizon dynamic programs (generally stochastic) that proceeds by improving policies to satisfy the fundamental equation:

F(s) = r^*(s) + \sum_{s'} P(s,s') F(s'),

where LaTeX: F(s) is the maximum expected value when starting in state LaTeX: s, r^*(s) is the immediate expected return when in state LaTeX: s and following an optimal policy (a decision rule), and LaTeX: P(s,s') is probability of a transition from state LaTeX: s to state LaTeX: s' in one time period.

The algorithm has some policy, LaTeX: x^*(s), at the beginning of an iteration. This determines an approximation of LaTeX: F, which is exact if the equation holds. If the equation is violated, the violation identifies how the policy can be improved. This changes the policy for the next iteration. Convergence depends upon the underlying Markov process (e.g., whether it is ergodic). Another approach is value iteration.
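A minimal sketch for a hypothetical two-state, two-action Markov decision process follows. A discount factor (gamma) is included so that policy evaluation converges (the fundamental equation above is the undiscounted form); all rewards and transition probabilities are made up for illustration:

```python
# Hypothetical MDP: r[s][a] = expected reward, P[s][a][s2] = transition prob.
r = [[1.0, 0.0], [2.0, 0.0]]
P = [[[1.0, 0.0], [0.0, 1.0]],    # state 0: action 0 stays, action 1 moves to 1
     [[0.0, 1.0], [1.0, 0.0]]]    # state 1: action 0 stays, action 1 moves to 0
gamma = 0.9
S = range(2)

def evaluate(policy, sweeps=2000):
    """Iterate F(s) = r(s, pi(s)) + gamma * sum_{s'} P(s,s') F(s')."""
    F = [0.0, 0.0]
    for _ in range(sweeps):
        F = [r[s][policy[s]]
             + gamma * sum(P[s][policy[s]][s2] * F[s2] for s2 in S)
             for s in S]
    return F

policy = [0, 0]
while True:
    F = evaluate(policy)
    # Improvement: pick the action maximizing the right-hand side at each state.
    new_policy = [max((0, 1), key=lambda a: r[s][a]
                      + gamma * sum(P[s][a][s2] * F[s2] for s2 in S))
                  for s in S]
    if new_policy == policy:
        break                      # policy is optimal: equation is satisfied
    policy = new_policy
```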

Polyhedral annexation

This is a cutting plane approach to find an optimal solution known to lie at an extreme point of a polyhedron, LaTeX: P. The general algorithm is to start at some extreme point and solve the polyhedral annexation problem. This will result in ascertaining that the extreme point is (globally) optimal, or it will generate a recession direction from which a convexity cut is used to exclude the extreme point. The approach generates a sequence of shrinking polyhedra by eliminating the cut region. Its name comes from the idea of annexing a new polyhedron to exclude from the original, homing in on the extreme point solution.


When applied to reverse convex programming, the convex set LaTeX: (S) in the polyhedral annexation problem is the objective level set, LaTeX: \{x: f(x) \le f(x^*)-t\}, where LaTeX: {t} > 0 is an optimality tolerance, and LaTeX: x^* is the best (extreme point) solution so far. The initial polyhedron is the feasible region, so this is an inner approximation strategy.

When applied to mixed integer programming, the original polyhedron is the LP relaxation, and the convex set LaTeX: (S) for the polyhedral annexation problem can vary according to the particular problem (exploiting structure, especially to find facets of the feasible region).

Polyhedral annexation problem

Given a convex cone LaTeX: C, a polytope LaTeX: P contained in LaTeX: C, and a compact, convex set LaTeX: S with LaTeX: \textstyle 0 \in \mbox{int}(S), find a point LaTeX: y in LaTeX: P\backslash S, or ascertain that LaTeX: P is contained in LaTeX: S.

Polyhedron

(pl. polyhedra). A set that equals the intersection of a finite number of halfspaces. This is generally written as LaTeX: \textstyle \left \{x: Ax \le b \right \}, where the representation LaTeX: (A,b) is not unique. It is often useful to separate the implied equalities: LaTeX: \textstyle \left \{x: Ax \le b, Ex = c \right \}, so that the relative interior is LaTeX: \textstyle \left \{x: Ax < b, Ex = c \right \}. The system, LaTeX: \textstyle \left \{Ax \le b, Ex = c \right \}, is a prime representation if it is irredundant, and it is minimal if it is irredundant and contains no implied equality. A polyhedron is degenerate if it contains an extreme point that is the intersection of more than LaTeX: n halfspaces (where LaTeX: n is the dimension of the polyhedron). An example is the pyramid.

Polymatroid

Let LaTeX: N be a finite set and let LaTeX: f be a submodular function on the subsets of LaTeX: N with LaTeX: f(\emptyset)=0. The polymatroid associated with LaTeX: (N,f) is the polytope:

\left \{x \in \mathbb{R}^{n+}: \sum_{j \in S} x_j \le f(S) \mbox{ for all } S \subseteq N \right \}.

Polytope

A bounded polyhedron.

Pooling of inventory

When there are two or more demand points for a product, it may be possible to save money if the separate inventories for these demand points can be combined or "pooled." There is a "square root economy of scale" from pooling for two of the simplest inventory models: a) the Economic Order Quantity (EOQ) model, and b) the Newsboy (NV) model.

For the EOQ case, where we want to minimize the fixed charge of ordering plus the holding cost of inventory carried, if we have two separate inventories for two demand points, each with constant demand rate LaTeX: D, fixed charge of ordering LaTeX: K, and holding cost/unit of inventory LaTeX: h, then the minimum total cost is LaTeX: \textstyle \sqrt{2\, D\, K\, h} at each demand point for a total cost of LaTeX: \textstyle 2\, \sqrt{2\, D\, K\, h}. If we combine the two demand streams and carry a single inventory for them, then the minimum total cost is LaTeX: \textstyle \sqrt{2\, (2\, D)\, K\, h} = 2\, \sqrt{D\, K\, h}, which decreases the total cost by a factor of LaTeX: \textstyle 1/\sqrt{2}.

For the NV case, if we have two demands, each independently Normal distributed with mean LaTeX: D and standard deviation LaTeX: \sigma, a specified service level LaTeX: \textstyle \alpha = \prob \left \{\mbox{Demand} < \mbox{inventory}\right\} at an inventory point, and LaTeX: z_\alpha being the number of standard deviations corresponding to a left tail probability of LaTeX: \alpha, then we must carry inventory LaTeX: \textstyle D + z_\alpha \sigma at each inventory point for a total inventory of LaTeX: \textstyle 2D + 2z_\alpha \sigma. If we can combine the two demands so we carry just one inventory, then the mean and standard deviation for the combined demands are LaTeX: 2D and LaTeX: \sqrt{2}\sigma. Thus, the total inventory we must carry is LaTeX: \textstyle 2D + \sqrt{2}z_\alpha\sigma, and there is a square root economy of pooling in the amount of safety stock we must carry. If we say that the system service level should be LaTeX: \alpha, then the benefits of pooling are even greater.

There is an interesting anomaly wherein pooling can increase inventory. Specifically, for the NV problem, if instead of there being a constraint on service level, the objective is simply to minimize the expected cost of unsatisfied demand and the cost of surplus inventory, and the demand distributions at each of two demand points (with associated inventory) are right skewed, e.g., the mean is greater than the median, then pooling the two inventories may in fact result in a higher total optimal inventory. For example, if the cost/unit short is 5, the cost/unit of inventory left over is 4, the demands at each of two demand points are independent Poisson distributed with mean 3.3, then the optimal amount to stock at each location is 3. If, however, the demands are pooled and a single inventory is carried so the demand is Poisson with mean 6.6, then the optimal amount to stock is 7. So pooling increased the optimal inventory. See Myths and Counter Examples for an additional example.
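The stock levels in this example follow from the critical-fractile rule for the discrete newsvendor: stock the smallest LaTeX: s whose cumulative probability reaches LaTeX: 5/(5+4). A short sketch that reproduces the numbers:

```python
from math import exp, factorial

def poisson_cdf(s, lam):
    """P(D <= s) for D ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(s + 1))

def newsvendor_poisson(lam, c_short, c_over):
    """Smallest stock s with CDF(s) >= c_short / (c_short + c_over)."""
    ratio = c_short / (c_short + c_over)
    s = 0
    while poisson_cdf(s, lam) < ratio:
        s += 1
    return s

# Two demand points, each Poisson(3.3); cost/unit short = 5, cost/unit over = 4
each = newsvendor_poisson(3.3, 5, 4)    # stock at each separate demand point
pooled = newsvendor_poisson(6.6, 5, 4)  # stock for the pooled (Poisson 6.6) demand
```

Here `pooled` exceeds `2 * each`, illustrating the anomaly: pooling the right-skewed demands raises the total optimal inventory.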

Pooling problem

A type of blending problem, as follows.

  1. A set of sources, indexed by LaTeX: \textstyle 1,\dots,N_s. Each source supplies raw feeds in limited amounts, such as crude oil.
  2. A set of attributes, indexed by LaTeX: \textstyle 1,\dots,N_a. Each supply contains one or more attributes. An example of an attribute is the percent of sulfur in a crude oil supply.
  3. A set of products, indexed by LaTeX: \textstyle 1,\dots,N_d, for which there is demand; each product has a quality defined by its attributes. For example, if percent sulfur is an attribute, each refined petroleum product has a range of percent sulfur: 1.5% sulfur is a higher quality product than 2.5% percent sulfur.
  4. A set of pools, indexed by LaTeX: \textstyle {1},\dots,N_p. Each pool is formed by inflows from sources, whose attributes are linearly mixed to determine the attributes of the pool, and hence its quality. Pools combine to form products (see Flow of Materials below). For example, suppose a pool receives flow values LaTeX: x and LaTeX: y from two sources whose attribute values are LaTeX: A and LaTeX: B, respectively. Then, the pool's attribute value is the average: LaTeX: \textstyle (Ax+By)/(x+y).
                            Flow of Materials
                Sources          Pools         Products
                         S(s,p)         P(p,d)
      SUPPLY --->(s)------------->(p)------------>(d)--->DEMAND

The objective is to minimize cost, which is the sum of source flow costs, subject to four types of constraints:

  1. Limited supplies: LaTeX: \textstyle \sum_p S(s,p) \le \mbox{SUPPLY}(s) (the left side is the total flow out of source LaTeX: s into the pools).

  2. Balances at pools: LaTeX: \textstyle \sum_s S(s,p) - \sum_d P(p,d) = 0 (total flow into pool LaTeX: p from the sources equals total flow out of the pool to the products).

  3. Quality constraints:
L(p,a) \le Q(p,a) = \frac{\sum_s w(s,a)S(s,p)}{\sum_s S(s,p)} \le U(p,a).
    The numerator is a weighted sum, where the weights LaTeX: (w) are the (given) quality values of attribute LaTeX: a at the sources LaTeX: (s). The denominator is the total flow into the pool LaTeX: (p).

  4. Demand requirements: LaTeX: \textstyle \sum_p P(p,d) \ge \mbox{DEMAND}(d) (the left side is the total flow into product LaTeX: d from the pools).

Portfolio selection problem

In its elementary form, this is the same as the capital budgeting problem, except that the objective is to minimize the risk, rather than maximize expected return. Let LaTeX: x_j be the percent of capital invested in the j-th opportunity (e.g., stock or bond), so LaTeX: x must satisfy LaTeX: \textstyle x \ge 0 and LaTeX: \textstyle \sum_j x_j = 1. Let LaTeX: v_j be the expected return per unit of investment in the j-th opportunity, so that LaTeX: vx is the total rate of return per unit of capital invested. It is required to have a lower limit on this rate: LaTeX: \textstyle vx \ge b (where LaTeX: b is between LaTeX: \min v_j and LaTeX: \max v_j). Subject to this rate of return constraint, the objective is the quadratic form, LaTeX: x^TQx, where LaTeX: Q is the variance-covariance matrix associated with the investments (i.e., if the actual return rate is LaTeX: V_j, then LaTeX: \textstyle Q(i,j) = E[(V_i - v_i)(V_j - v_j)]).
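For the equality-constrained case (ignoring LaTeX: x \ge 0 and treating the return constraint as binding), the KKT conditions reduce to a small linear system in the two multipliers. A sketch with a hypothetical diagonal LaTeX: Q (all data illustrative; in this instance the solution happens to satisfy LaTeX: x \ge 0, so dropping that constraint is harmless):

```python
# Hypothetical data: three assets, diagonal covariance (variances only).
q = [0.10, 0.20, 0.40]          # variances; Q = diag(q)
v = [0.05, 0.10, 0.15]          # expected returns
b = 0.10                        # required portfolio return

# KKT for min x'Qx s.t. sum(x) = 1, v.x = b:
#   2*q[i]*x[i] + l1 + l2*v[i] = 0  =>  x[i] = -(l1 + l2*v[i]) / (2*q[i])
a  = sum(1.0 / (2 * qi) for qi in q)
bb = sum(vi / (2 * qi) for vi, qi in zip(v, q))
c  = sum(vi * vi / (2 * qi) for vi, qi in zip(v, q))
# Constraints become: a*l1 + bb*l2 = -1 and bb*l1 + c*l2 = -b
det = a * c - bb * bb
l1 = (-c + b * bb) / det        # Cramer's rule on the 2x2 system
l2 = (bb - a * b) / det
x = [-(l1 + l2 * vi) / (2 * qi) for vi, qi in zip(v, q)]
variance = sum(qi * xi * xi for qi, xi in zip(q, x))
```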

Positive definite matrix

The symmetric matrix LaTeX: A is positive definite if LaTeX: \textstyle x^T Ax > 0 for all nonzero LaTeX: x. Equivalently, the matrix is positive definite if all its eigenvalues are positive.

Also see Positive semi-definite matrix

Positive semi-definite matrix

The symmetric matrix LaTeX: A is positive semi-definite if LaTeX: \textstyle x^T Ax \ge 0 for all LaTeX: x.

Also see positive definite matrix.

Postman problem

See Chinese postman problem.

Posynomial

A positive sum of monomials: LaTeX: \textstyle \sum_i c(i) \prod_{j} [x(j)^{a(i,j)}], where LaTeX: c > 0. Each monomial is the product,

\prod_j [x(j)^{a(i,j)}] = x(1)^{a(i,1)} \times x(2)^{a(i,2)} \times \dots \times x(n)^{a(i,n)},

and LaTeX: [a(i, j)] is called the exponent matrix. This is the fundamental function in geometric programming.

Example: LaTeX: \textstyle x(1)x(2) + 2x(3)/x(1)^{2} is a posynomial with 2 monomial terms and 3 variables. The exponent matrix is 2 by 3, showing the exponent in each monomial (row) of each variable (column):

\begin{bmatrix}
1 & 1 & 0 \\
-2 & 0 & 1
\end{bmatrix}
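Evaluating a posynomial from its coefficient vector and exponent matrix is direct; the sketch below uses the example above, evaluated at the illustrative point LaTeX: x=(2,3,4):

```python
from math import prod

def posynomial(c, E, x):
    """Evaluate sum_i c[i] * prod_j x[j]**E[i][j] (c > 0)."""
    return sum(ci * prod(xj ** eij for xj, eij in zip(x, row))
               for ci, row in zip(c, E))

# x1*x2 + 2*x3/x1^2 with exponent matrix [[1,1,0],[-2,0,1]], at x = (2,3,4):
# 2*3 + 2*4/4 = 8
val = posynomial([1.0, 2.0], [[1, 1, 0], [-2, 0, 1]], (2.0, 3.0, 4.0))
```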


Generally the same as presolve, but the purpose need not be to speed up optimization. Instead, it could be aimed at deepening one's knowledge about the mathematical program, such as what is forced by implication of the constraints versus what is determined by the economic trade-off.

Precedence constraint

When ordering objects, like jobs to be performed, this is a constraint that restricts the order: i must precede j, denoted LaTeX: \textstyle i << j. If order really means time, and if the model has decision variables LaTeX: t_i and LaTeX: t_j to denote the start times of LaTeX: i and LaTeX: j, resp., this precedence constraint can be written as LaTeX: \textstyle t_j \ge t_i + T_i, where LaTeX: T_i is the time job LaTeX: i takes. On the other hand, a precedence constraint need not correspond to real time. For example, LaTeX: i << j could mean if project LaTeX: j is not selected, we cannot select project LaTeX: i. In that case, suppose the model has binary variables LaTeX: x_i and LaTeX: x_j, where LaTeX: x_i=1 means project LaTeX: i is selected, and LaTeX: x_i=0 means it is not selected. Then, the precedence constraint LaTeX: i << j is represented as: LaTeX: x_i \le x_j.

Preconditioning

A generalization of scaling that modifies the hessian so as to improve convergence properties. In the case of quadratic programming, this typically means multiplying the quadratic form matrix, LaTeX: Q, by a preconditioner, LaTeX: P, so that the condition number of LaTeX: PQ is as close to 1 as possible. There are many variations, including splitting a matrix, LaTeX: Q = P - S, to achieve a better conditioned matrix in the sense of its eigenvalue (and eigenspace) structure that governs convergence properties of algorithms like steepest ascent and conjugate gradient.

Predictor-corrector algorithm

This is a class of algorithms whose origin is in ordinary differential equations. One first "predicts" a solution with either a functional approximation or an algorithm, getting "close" to a solution. Then, this is the initial point for a "corrector" method, such as using Newton's method to get "closer" to the curve. Although it could be used to characterize many methods in mathematical programming, it arises most naturally in path following methods. The term began being widely used in connection with following the central path in an interior point method. One such algorithm is using the affine scaling direction as a predictor and centering as a corrector.

Presolve

A heuristic applied to reduce the problem in some way before starting an algorithm. In linear programming, for example, one might scan for an equation of the form LaTeX: x=0, then simply fix LaTeX: x at zero, thus reducing the number of variables and constraints. Further details are in the supplement.

Other pricing vectors are possible to obtain information about the activity's rates of substitution without actually computing LaTeX: \textstyle r = [B^{-1}]A_j. If LaTeX: \textstyle p = v[B^{-1}], then LaTeX: \textstyle pA_j = vr, and LaTeX: v = c_B is the special case to get the reduced cost. Another special case is to obtain information about whether the (nonbasic) activity would need to perform a degenerate pivot. For LP in standard from, let LaTeX: v_i=1 if LaTeX: x_i=0 and LaTeX: v_i=0 if LaTeX: \textstyle x_i > 0. Then, LaTeX: \textstyle vr = \sum_i \left \{r_i: x_i = 0\right\}. Thus, if LaTeX: pA_j > 0, activity LaTeX: j must have a positive tableau value with respect to some basic variable whose level is zero, so the pivot would have to be degenerate.

Pricing

This is a tactic in the simplex method, by which each variable is evaluated for its potential to improve the value of the objective function. Let LaTeX: \textstyle p = c_B[B^{-1}], where LaTeX: B is a basis, and LaTeX: c_B is a vector of costs associated with the basic variables. The vector LaTeX: p is sometimes called a dual solution, though it is not feasible in the dual before termination. LaTeX: p is also called a simplex multiplier or pricing vector. The price of the j-th variable is LaTeX: \textstyle c_j - pA_j. The first term is its direct cost LaTeX: (c_j) and the second term is an indirect cost, using the pricing vector to determine the cost of inputs and outputs in the activity's column LaTeX: (A_j). The net result is called the reduced cost, and its value determines whether this activity could improve the objective value.

Other pricing vectors are possible to obtain information about the activity's rates of substitution without actually computing LaTeX: \textstyle r = [B^{-1}]A_j. If LaTeX: \textstyle p = v[B^{-1}], then LaTeX: \textstyle pA_j = vr, and LaTeX: v = c_B is the special case to get the reduced cost. Another special case is to obtain information about whether the (nonbasic) activity would need to perform a degenerate pivot. For LP in standard form, let LaTeX: v_i=1 if LaTeX: x_i=0 and LaTeX: v_i=0 if LaTeX: x_i > 0. Then, LaTeX: \textstyle vr = \sum_i \left \{r_i: x_i = 0 \right \}. Thus, if LaTeX: pA_j > 0, activity LaTeX: j must have a positive tableau value with respect to some basic variable whose level is zero, so the pivot would have to be degenerate.

Primal degeneracy

See Degeneracy

Primal method

An algorithm for which each iterate is feasible in the (primal) mathematical program.

Primal program

The original mathematical program, cited in the context of having its dual.

Prime representation

A set represented by an irredundant system of inequalities. (See polyhedron.)

Principle of optimality

The idea that an optimal path, or trajectory, is composed of optimal sub-paths. Equivalently, once we reach a particular state in a sequential decision process, the remaining decisions must be optimal with respect to that state. This is the foundation of dynamic programming.

Problem

There are various problems that have evolved that are popularly cited by name.

Product form of basis

See factored form of basis.

Product mix problem

To determine what mix of products to make from shared resources. For example, if a company has apples and cranberries, one could make a certain amount of apple juice, cranberry juice, cranapple juice, and applecran juice, each differing in the percentages of apples and cranberries. More extensive examples abound to illustrate one of the early, fundamental applications of linear programming.

Production scheduling problem

To determine levels of production over time. Constraints include demand requirements (possibly with backordering), capacity limits (including warehouse space for inventory), and resource limits. One basic model is as follows.

Notation:

LaTeX: x_t = level of production in period t (before demand);
LaTeX: y_t = level of inventory at the end of period t;
LaTeX: U_t = production capacity in period t;
LaTeX: W_t = warehouse capacity in period t;
LaTeX: h_t = holding cost (per unit of inventory);
LaTeX: p_t = production cost (per unit of production);
LaTeX: D_t = demand at the end of period t.

Then, the mathematical program is

\min \{ p^T x + h^T y : 0 \le (x, y) \le (U, W), \, y_t = y_{t-1} + x_t - D_t \mbox{ for } t=1,\dots,T \}.

LaTeX: y_0 is the given initial inventory, and LaTeX: T is the planning horizon.

The condition that LaTeX: y \ge 0 means there is no backordering. Other tacit assumptions, which could be relaxed to broaden the scope of the model, are as follows.

  • Quantities could be indexed by product – e.g., LaTeX: x_{k,t} = level of production of product LaTeX: k at time LaTeX: t. In such cases, competition for warehouse space and other resources results in more equations.
  • There could be an ending inventory constraint, such as LaTeX: y_T=y_0.
  • There could be additional decision variables to allow capacity expansion.
  • Costs could be nonlinear functions.
  • This could be a stochastic program; for example, with demands not known with certainty.

Also see warehouse problem.
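The inventory-balance recursion makes it easy to check a candidate plan and compute its cost. A sketch (all data illustrative):

```python
def schedule_cost(x, D, y0, U, W, p, h):
    """Check feasibility of a production plan and return its total cost.

    Inventory balance: y_t = y_{t-1} + x_t - D_t, with 0 <= y_t <= W_t
    (y_t >= 0 means no backordering) and 0 <= x_t <= U_t.
    """
    y, ys = y0, []
    for t in range(len(x)):
        assert 0 <= x[t] <= U[t], "production capacity violated"
        y = y + x[t] - D[t]
        assert 0 <= y <= W[t], "inventory infeasible (backorder or overflow)"
        ys.append(y)
    return (sum(pt * xt for pt, xt in zip(p, x))
            + sum(ht * yt for ht, yt in zip(h, ys)))

# Illustrative 3-period instance: inventories end at 1, 1, 0
cost = schedule_cost(x=[3, 3, 0], D=[2, 3, 1], y0=0,
                     U=[3, 3, 3], W=[5, 5, 5],
                     p=[1, 1, 1], h=[0.5, 0.5, 0.5])
```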

Projected gradient method

A feasible direction is obtained for the region LaTeX: \textstyle \left \{x: Ax=b \right \} by projecting the gradient of the objective function onto the null space of LaTeX: \textstyle A: d = P \nabla f(x), where LaTeX: \textstyle P = I - A^T[AA^T]^{-1}A (note LaTeX: A(Py) = 0 for all LaTeX: y, so LaTeX: \textstyle \left \{Py \right \} is the null space of LaTeX: A). Thus, LaTeX: x + ad is feasible for all feasible LaTeX: x and all LaTeX: a > 0 (since LaTeX: Ad=0). Further, if LaTeX: \textstyle P \nabla f(x) \ne 0, LaTeX: \textstyle f(x+ad) \approx f(x) + a \nabla f(x)^T P \nabla f(x) > f(x) for small LaTeX: \textstyle a > 0, so the objective value improves at each iteration until the projected gradient is null. At the point where LaTeX: \textstyle P \nabla f(x)=0, LaTeX: x satisfies first-order conditions.
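A sketch for the plane LaTeX: \textstyle \{x: x_1+x_2+x_3 = 1\}, where the projector works out to LaTeX: \textstyle P = I - \frac{1}{3}J (LaTeX: J the all-ones matrix); the fixed step size and quadratic objective are illustrative choices:

```python
def matvec(M, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]

# Region {x : Ax = b} with A = [1 1 1], b = 1; P = I - A'(AA')^{-1}A = I - J/3.
P = [[ 2/3, -1/3, -1/3],
     [-1/3,  2/3, -1/3],
     [-1/3, -1/3,  2/3]]

c = [3.0, 0.0, 0.0]                        # maximize f(x) = -||x - c||^2
x = [1.0, 0.0, 0.0]                        # feasible start (components sum to 1)
for _ in range(300):
    grad = [-2 * (xi - ci) for xi, ci in zip(x, c)]
    d = matvec(P, grad)                    # projected gradient direction (Ad = 0)
    x = [xi + 0.1 * di for xi, di in zip(x, d)]
```

The iterates stay on the plane and converge to the projection of LaTeX: c onto it, LaTeX: (7/3, -2/3, -2/3), where the projected gradient vanishes.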

Propagation

See constraint propagation.

Proper optimum

A local optimum that is uniquely optimal in some feasible neighborhood.

Pseudo-boolean function

This maps LaTeX: \textstyle \left \{0,1 \right \}^n \to \mathbb{R}.

Pseudo-boolean program

LaTeX: \textstyle X = \left \{0,1 \right \}^n and all functions are pseudo-boolean.

Pseudo-inverse

(of a matrix). Same as generalized inverse.

Pseudo-monotone function

LaTeX: \textstyle f: \mathbb{R}^n \to \mathbb{R}^n is pseudo-monotone over LaTeX: X (subset of LaTeX: \textstyle \mathbb{R}^n) if

f(y)^T (x - y) \ge 0 \mbox{ implies } f(x)^T (x - y) \ge 0 \mbox{ for all } x, y \in X.

The gradient of a pseudoconvex function is a pseudo-monotone function.

Pseudoconcave function

Negative is pseudoconvex.

Pseudoconvex function

LaTeX: f is in LaTeX: C^1 and satisfies the property: LaTeX: \textstyle \nabla f(x)(y-x) \ge 0 implies LaTeX: \textstyle f(y) \ge f(x). While every differentiable