# 100% Rule

This pertains to sensitivity analysis in linear programming. In its original form, it uses the convexity of the set of admissible changes in the rim data to test whether a particular change is admissible: any combination of changes can occur as long as the total percentage deviation from the coordinate extremes does not exceed 100%. (Note: this applies to right-hand sides (b) and costs (c) separately.)

More generally, suppose the optimal value remains constant if the cost coefficient vector $LaTeX: c$ in a linear program is replaced with any of $LaTeX: c + d^1, c+d^2, \ldots , \mbox{ or } c + d^k$ (we could have $LaTeX: k = n$ and let $LaTeX: d^j$ be the j-th coordinate extreme for $LaTeX: c_j$, but that is not necessary). Then, the optimal objective value is the same for $LaTeX: \textstyle c + \sum_{i=1}^k \mu_i d^i$, provided $LaTeX: \mu_i \ge 0$ and $LaTeX: \textstyle \sum_{i=1}^k \mu_i \le 1$.

The same applies to convex combinations of changes in the right-hand side $LaTeX: b$ (possibly including the origin, which represents no change). If optimality is preserved when $LaTeX: b$ is replaced with any of $LaTeX: b + f^1, b + f^2, \ldots, b + f^q$, then it is also preserved for the right-hand side $LaTeX: \textstyle b + \sum_{i=1}^q \beta_i f^i$, provided $LaTeX: \beta_i \ge 0$ and $LaTeX: \textstyle \sum_{i=1}^q \beta_i \le 1$.
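
As a sketch, the rule reduces to summing, over the changed coefficients, the fraction of each one-at-a-time allowable range that the proposed change consumes. The data below are hypothetical ranging output, purely for illustration:

```python
def hundred_percent_rule(deltas, allow_inc, allow_dec):
    """Check the 100% rule for proposed simultaneous changes.

    deltas[j]    -- proposed change in coefficient j (may be negative)
    allow_inc[j] -- allowable increase for coefficient j alone
    allow_dec[j] -- allowable decrease for coefficient j alone
    (allowable ranges come from ordinary one-at-a-time ranging output)

    Returns True if the total fraction of the one-at-a-time ranges
    used is at most 100%, which certifies the change is admissible.
    """
    used = 0.0
    for d, inc, dec in zip(deltas, allow_inc, allow_dec):
        if d > 0:
            used += d / inc
        elif d < 0:
            used += -d / dec
    return used <= 1.0

# Example: two cost coefficients, each allowed to move +4 / -2 alone.
# Raising c1 by 2 uses 50% of its range; lowering c2 by 1 uses 50%.
print(hundred_percent_rule([2.0, -1.0], [4.0, 4.0], [2.0, 2.0]))  # True: exactly 100%
```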

# ABS algorithm

This is a family of algorithms to solve a linear system of $LaTeX: m$ equations, $LaTeX: Ax = b$, with $LaTeX: n$ variables such that $LaTeX: n \ge m$. Its distinguishing property is that the k-th iterate, $LaTeX: x^k$, is a solution to the first $LaTeX: k$ equations. While algorithms in this class date back to 1934, the family was recognized and formally developed by J. Abaffy, C.G. Broyden and E. Spedicato, so the algorithm family name bears their initials. The ABS algorithm is given by the following.

Notation: $LaTeX: A_{(i,:)}$ is the i-th row of $LaTeX: A$.

1. Initialize: choose $LaTeX: x^0 \in \mathbb{R}^n$ and a nonsingular $LaTeX: H^1 \in \mathbb{R}^{n \times n}$; set $LaTeX: i = 1$.
2. Set search direction $LaTeX: d^i = H^i z$ for any $LaTeX: z$ such that $LaTeX: A_{(i,:)} H^i z \neq 0$. Compute residuals: $LaTeX: r = Ax^i - b$.
3. Iterate: $LaTeX: x^{i+1} = x^i - s d^i$, where $LaTeX: s$ is the step size: $LaTeX: s = r_i / A_{(i,:)} d^i$.
4. Test for convergence (update residuals, if used in the test): Is the solution within tolerance? If so, stop; else, continue.
5. Update: $LaTeX: H^{i+1} = H^i - D^i$, where $LaTeX: D^i$ is the rank 1 update $LaTeX: H^i A_{(i,:)}^T w^T H^i$ for $LaTeX: w$ such that $LaTeX: w^T H^i A_{(i,:)}^T = 1$.
6. Increment: let $LaTeX: i = i + 1$ and go to step 2.

The ABS algorithm extends to nonlinear equations and to linear diophantine equations.
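
The steps above can be sketched as follows. This is a minimal illustration assuming $LaTeX: A$ has full row rank, using the symmetric choices $LaTeX: H^1 = I$ and $LaTeX: z = w =$ (current row), as in Huang's variant of the family:

```python
import numpy as np

def abs_solve(A, b):
    """One member of the ABS family (Huang's choices: H^1 = I, z = w = i-th row,
    x^0 = 0).  After step i, x satisfies the first i equations."""
    m, n = A.shape
    x = np.zeros(n)
    H = np.eye(n)
    for i in range(m):
        a = A[i]                              # i-th row of A
        Ha = H @ a                            # H^i applied to the row
        d = Ha                                # search direction (H symmetric here)
        r_i = a @ x - b[i]                    # residual of equation i
        x = x - (r_i / (a @ d)) * d           # equation i now holds exactly
        H = H - np.outer(Ha, Ha) / (a @ Ha)   # rank-1 update annihilating row i
    return x

A = np.array([[2.0, 0.0], [1.0, 1.0]])
b = np.array([2.0, 3.0])
x = abs_solve(A, b)
print(x)   # satisfies both equations
```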

# AIMMS

AIMMS is Advanced Integrated Multidimensional Modeling Software.

# AMPL

AMPL is A Mathematical Programming Language.

# ANALYZE

A computer-assisted analysis system to analyze mathematical programs, including an artificially intelligent level of support.

# Abstract program

This is a mathematical program defined on an abstract vector space (generally a Banach space). It unifies the ordinary mathematical program with control theory.

# Active constraint

An inequality constraint that holds with equality (at a point). Same as tight, but not to be confused with binding. The set of all active constraints generally includes equality constraints as well.

# Active set method

A technique that partitions inequality constraints into active constraints (or sufficiently close to be deemed active) and inactive constraints and then ignores the inactive constraints for the iteration. The active set for this iteration is sometimes called the working set. The new point is selected by moving on the surface defined by the working set (called the working surface).

# Activity analysis

An approach to micro-economics in which a system is composed of elementary functions, called activities; it influenced the early growth of linear programming. An insightful way to view an activity in a standard form linear program is that negative coefficients represent input requirements and positive coefficients represent outputs:

$LaTeX: \begin{bmatrix} c_j \\ \cdots \\ - \\ \cdots \\ + \end{bmatrix} \begin{matrix} \leftarrow & \mbox{cost or revenue} \\ \\ \leftarrow & \mbox{input to activity} j \\ \\ \leftarrow & \mbox{output from activity} j \end{matrix}$

In general, the reduced cost represents the net worth of the activity for the prices $LaTeX: p$,

$LaTeX: d_j = c_j - p^T A_{(:,j)} = \mbox{ input cost } - \mbox{ output revenue}.$

This leads to an economic interpretation of not only linear programming but also of the simplex method: agents (activity owners) respond instantaneously to changes in prices, and the activity with the greatest net revenue wins a bid to become active (basic), thus changing the prices for the next time (iteration).
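
With hypothetical data, the reduced costs are a single matrix-vector computation; the columns below follow the sign convention above (negative entries are inputs, positive entries are outputs):

```python
import numpy as np

# Reduced costs d_j = c_j - p^T A_{(:,j)} for two hypothetical activities.
A = np.array([[-0.4, -1.0],
              [-0.6,  0.0],
              [ 1.0,  0.6]])
c = np.array([2.0, 1.0])       # direct cost of running each activity at unit level
p = np.array([1.0, 2.0, 3.0])  # prices on the three goods

d = c - p @ A                  # reduced cost (net worth) of each activity
print(d)                       # [0.6 0.2]: neither activity is profitable at these prices
```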

In this context, activities can be regarded as transformations, from inputs to outputs. Three prevalent transformations are: form, place and time.

Here are examples:

$LaTeX: \begin{bmatrix} c \\ \cdots \\ -0.4 \\ -0.6 \\ \cdots \\ 1 \end{bmatrix} \begin{matrix} \\ \\ \mbox{ apples} \\ \mbox{ cranberries} \\ \\ \mbox{ cranapple} \end{matrix} \;\;\;\;\;\; \begin{bmatrix} c \\ \cdots \\ -1 \\ \cdots \\ 0.6 \\ 0.4 \end{bmatrix} \begin{matrix} \\ \\ \mbox{ Chicago} \\ \\ \mbox{ Denver} \\ \mbox{ Seattle} \end{matrix} \;\;\;\;\;\; \begin{bmatrix} c \\ \cdots \\ -1 \\ \cdots \\ 1 \end{bmatrix} \begin{matrix} \\ \\ t \\ \\ t+1 \end{matrix}$

# Acyclic

Means without cycles. Arises in graphs and networks, as well as in the context of algorithms (e.g., not cycling back to some previous solution).

# Adjacency matrix

The $LaTeX: n \times n$ binary matrix, say $LaTeX: A$, that represents node adjacency in a simple graph: $LaTeX: A_{i,j} = 1$ if node $LaTeX: i$ is adjacent to node $LaTeX: j$. In the undirected case, $LaTeX: A$ is symmetric. In the directed case, adjacency means there is an arc from $LaTeX: i$ to $LaTeX: j$. In the case of a multigraph, $LaTeX: A_{i,j}$ is the number of edges whose endpoints are nodes $LaTeX: i$ and $LaTeX: j$.

# Adjacent basis

Two bases are adjacent if they differ by exactly one column (so one is reachable from the other by a single pivot).

# Adjoint

The classical adjoint of a matrix $LaTeX: A$ is the transpose of the matrix of cofactors

$LaTeX: \mbox{adj }A = \left[ A^{i,j} \right]_{i,j}^T,$

where $LaTeX: A^{i,j}$ is the (i,j) cofactor: $LaTeX: (-1)^{i+j}$ times the determinant of the submatrix of $LaTeX: A$ formed by removing the i-th row and the j-th column. The Hermitian adjoint (conjugate transpose or Hermitian transpose), $LaTeX: A^*$, is the transpose of the conjugate. We have that $LaTeX: A^* = A^T$ if $LaTeX: A$ is real-valued.

# Admissible

Generically, something that satisfies required conditions. One example is the admissibility of a direction of change of some parameter, such as a right-hand side of a linear program (LP) changing from $LaTeX: b$ to $LaTeX: b + t h$, where $LaTeX: t$ is a scalar and $LaTeX: h$ is the direction of change. For a positive value of $LaTeX: t$, no matter how small, this might cause the LP to become infeasible; in that case it is common to call the direction vector inadmissible. For a perturbation to be admissible in this context, the mathematical program must have a solution in some neighborhood of that direction. In the example, $LaTeX: h$ is admissible if the (primal) LP is feasible for all $LaTeX: t \in [0, t^*)$ for some $LaTeX: t^* > 0$.

Admissibility need not be for directions of change in the context of sensitivity analysis. The term is used elsewhere and defined for the context in which it is used. For example, a primal ascent algorithm could define a direction, say $LaTeX: d$, to be admissible if $LaTeX: x + td$ is feasible and $LaTeX: f(x+td) \ge f(x)$ for all $LaTeX: t \in [0, t^*)$ for some $LaTeX: t^* > 0$.

# Advanced basis

Initiating the simplex method with a known feasible basis instead of with logical variables as in Phase I. The advanced basis may have been optimal for some parameter values that have since changed. It is usually better to start with an advanced basis because fewer iterations are generally required to reach the (new) optimal basis.

# Affine combination

The affine combination of $LaTeX: \{x^1, x^2, \ldots, x^K\}$ is a linear combination $LaTeX: \textstyle \sum_{k=1}^K a_k x^k$ such that $LaTeX: \textstyle \sum_{k=1}^K a_k = 1$.

# Affine function

Let $LaTeX: f : X \rightarrow \mathbb{R}$ with $LaTeX: X \subseteq \mathbb{R}^n$. Then, $LaTeX: f$ is affine if $LaTeX: X$ is a convex set and

$LaTeX: f(\theta x + (1-\theta) y) = \theta f(x) + (1-\theta)f(y),$

for all $LaTeX: x, y \in X$ and $LaTeX: \theta \in [0, 1]$.

Equivalently, $LaTeX: f$ is affine if it is both convex and concave. Moreover, if $LaTeX: X = \mathbb{R}^n$, $LaTeX: f$ is the translation of a linear function: $LaTeX: a^T x + b$.

# Affine hull

The affine hull of a set $LaTeX: S$ is the intersection of all affine sets containing $LaTeX: S$. Equivalently, the set of all affine combinations of points in the set.

# Affine independence

The set $LaTeX: \{x^0, x^1, \ldots, x^k \}$ is affinely independent if its affine hull is k-dimensional. Equivalently, $LaTeX: \{ x^1-x^0, \ldots, x^k-x^0 \}$ is linearly independent. In $LaTeX: \mathbb{R}^2$, any two distinct vectors are affinely independent (k=1). Three vectors must form a triangle; otherwise, they are collinear and hence affinely dependent.
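
The rank characterization gives an immediate test; a small numpy sketch:

```python
import numpy as np

def affinely_independent(points):
    """Check affine independence of k+1 points x^0..x^k by testing whether
    the differences x^1-x^0, ..., x^k-x^0 are linearly independent."""
    X = np.asarray(points, dtype=float)
    D = X[1:] - X[0]
    return np.linalg.matrix_rank(D) == len(D)

print(affinely_independent([[0, 0], [1, 0], [0, 1]]))  # triangle: True
print(affinely_independent([[0, 0], [1, 1], [2, 2]]))  # collinear: False
```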

# Affine scaling

A variant of Karmarkar's algorithm for the linear program in standard form: $LaTeX: \min \{c^T x : x \ge 0, \; Ax = b \}$, where $LaTeX: A$ has full row rank.

Here are the basic steps (of the primal method).

1. Given $LaTeX: x > 0$, let $LaTeX: X = \mbox{diag}(x)$.
2. Estimate dual: $LaTeX: y = [AX^2A^T]^{-1} AX^2c$ and $LaTeX: d = c - A^T y$.
3. Move: $LaTeX: x = x - aX^2d / \|Xd\|_{\infty}$, where $LaTeX: a$ is a parameter in $LaTeX: (0, 1)$.

Multiplication of $LaTeX: A$ and $LaTeX: c$ by $LaTeX: X$ is a scaling operation, using current levels as scale values, i.e., $LaTeX: AX$ multiplies column $LaTeX: j$ of $LaTeX: A$ by $LaTeX: x_j$, and $LaTeX: Xc$ multiplies $LaTeX: c_j$ by $LaTeX: x_j$. Note that the scaled direction $LaTeX: X^2d$ is in the null space of $LaTeX: A$, i.e., $LaTeX: AX^2d = 0$, so the move preserves $LaTeX: Ax = b$. This affine scaling direction is used in its own right in interior methods. It minimizes the duality gap subject to $LaTeX: Ax = b$ and $LaTeX: x$ in an ellipsoid (replacing $LaTeX: x \ge 0$).
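
The three steps can be sketched directly in numpy. This is illustrative only: it assumes a starting interior feasible point is given, and uses a tiny hypothetical LP:

```python
import numpy as np

def affine_scaling(A, c, x, a=0.5, iters=40):
    """Primal affine scaling sketch for min c^T x s.t. Ax = b, x >= 0,
    started from an interior feasible point x > 0 (assumed given).
    Each step preserves Ax = b because A X^2 d = 0."""
    for _ in range(iters):
        X = np.diag(x)
        X2 = X @ X
        y = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c)   # dual estimate
        d = c - A.T @ y                                  # reduced cost
        x = x - a * (X2 @ d) / np.abs(X @ d).max()       # scaled move
    return x

# Tiny illustrative LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0 (optimum (1, 0)).
A = np.array([[1.0, 1.0]])
c = np.array([1.0, 2.0])
x = affine_scaling(A, c, np.array([0.5, 0.5]))
print(x)   # approaches (1, 0)
```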

# Affine set

(or affine manifold, affine variety, linear variety, flat.)

A set $LaTeX: S$ that contains the line through any two of its points --i.e. $LaTeX: x, y \in S$ implies $LaTeX: ax + (1-a)y \in S$ for all $LaTeX: a \in \mathbb{R}$. The dimension of an affine set is that of the (unique) subspace parallel to it. Dimensions of 0, 1 and 2 are called points, lines and planes, respectively.

# Aggregation

Aggregation is combining units to reduce problem dimensions. One form of aggregation is to combine basic entities at the data level, such as regions, time periods, and materials. The dimensions of the mathematical program, which include numbers of variables and constraints, are generally reduced by aggregation at the entity level. Another form of aggregation is the use of surrogates, including the strong form: integer equivalent aggregation.

# Algorithm

An iterative method that generates a sequence of the form $LaTeX: x^{k+1} = S_k (A_k (x^k))$, where $LaTeX: x^0$ is given (called the initial point). $LaTeX: A_k$ is an algorithm map that yields a set of candidate points, given the current point, $LaTeX: x^k$, and $LaTeX: S_k$ is a selection function (in case $LaTeX: A_k$ yields more than one point). Note that the algorithm map and selection function can depend on the iteration number $LaTeX: k$. A concrete example is of the form $LaTeX: x^{k+1} = x^k + s_k d^k$, where $LaTeX: s_k$ is a scalar parameter, called the step size, and $LaTeX: d^k$ is a vector, called the direction of change. The step size may require a selection from a line search if there are many optimal solutions; one rule is to select the least value (perhaps above some threshold). In practice, algorithms have a variety of tolerances to control their computer representation, progression and termination.

A stochastic algorithm is one whose algorithm map is a random variable. Unlike the meaning in computer science, most people in mathematical programming mean an algorithm whereby an iterate is selected by some random mechanism. One example is simulated annealing. When $LaTeX: x^k$ is a random variable, there is a meaning to convergence that differs from ordinary (deterministic) sequences.

An online algorithm is one that obtains a solution for an online problem. Typically, this is a heuristic that obtains a "good" solution because there is not enough time to guarantee optimality.

# Alldifferent constraint

is a global constraint that is imposed on $LaTeX: n$ variables, $LaTeX: x_1, \dots, x_n$, stating that every variable is assigned a different value.

In other words, it represents the set of disequalities $LaTeX: x_1 \neq x_2, x_1 \neq x_3, \dots, x_{n-1} \neq x_n$. Typically, however, propagation algorithms for alldifferent are far more effective than propagating the individual disequality constraints.
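
A small brute-force check (illustrative): on three variables with domain {0, 1, 2}, alldifferent admits exactly the assignments satisfying all pairwise disequalities, namely the 3! permutations:

```python
from itertools import permutations, product

def alldifferent(xs):
    """True if every variable takes a different value."""
    return len(set(xs)) == len(xs)

domain = range(3)
via_constraint = {t for t in product(domain, repeat=3) if alldifferent(t)}
via_disequalities = {t for t in product(domain, repeat=3)
                     if all(t[i] != t[j] for i in range(3) for j in range(i + 1, 3))}
print(via_constraint == via_disequalities)          # True: same solution set
print(via_constraint == set(permutations(domain)))  # True: the 3! = 6 permutations
```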

# Almost complementary

A pair of vectors, $LaTeX: w, z \in \mathbb{R}^n_+$, such that every coordinate pair except one satisfies complementary slackness, i.e., $LaTeX: w_i z_i = 0$ for all but one $LaTeX: i$.

# Alternative systems

Two systems of inequalities and/or equations such that exactly one of them has a solution: a solution to one system certifies that the other has none. Several alternative systems are widely used in mathematical programming, especially in the study of duality.

• Fourier (1826)
$LaTeX: Ax \le a, \; Bx < b$ (not vacuous)
or
$LaTeX: A^Ty + B^Tv = 0, \; y \ge 0, \; v \ge 0, \; a^Ty + b^T v < 0$
• Gordan (1873)
$LaTeX: Ax > 0$
or
$LaTeX: A^Ty = 0, \; y \ge 0, y \neq 0$
• Farkas (1902)
$LaTeX: Ax = b, \; x \ge 0$
or
$LaTeX: A^Ty \ge 0, \; b^Ty < 0$

A variant is

$LaTeX: Ax \le b$
or
$LaTeX: A^Ty = 0, \; b^Ty < 0, \; y \ge 0$
• Fredholm (1903)
$LaTeX: Ax = b$
or
$LaTeX: A^Ty = 0, \; b^Ty > 0$
• Stiemke (1915)
$LaTeX: Ax = 0, \; x > 0$
or
$LaTeX: A^Ty \ge 0, \; A^T y \ne 0$
• Motzkin (1936)
$LaTeX: Ax \le a, \; Bx < b, \; Cx = c$ ($LaTeX: B$ not vacuous)
or
$LaTeX: A^Ty + B^Tv + C^Tw = 0, \; y \ge 0, \; v \ge 0, \; v \neq 0, \; a^Ty + b^Tv +c^Tw \le 0$
• Ville (1938)
$LaTeX: Ax > 0, \; x \ge 0$
or
$LaTeX: A^Ty \le 0, \; y \ge 0, \; y \neq 0$
• Tucker (1956)
$LaTeX: Ax \ge 0, \; Ax \neq 0, \; Bx \ge 0, \; Cx = 0$
or
$LaTeX: A^Ty + B^Tv + C^Tw = 0, \; y > 0, \; v \ge 0$
• Gale (1960)
$LaTeX: Ax \le b$
or
$LaTeX: A^Ty = 0, \; b^Ty = -1, \; y \ge 0$

### Integer Systems

• Papadimitriou and Steiglitz (1982) Let $LaTeX: A$ be an $LaTeX: m \times n$ integer matrix, $LaTeX: a^* = \max_{(i,j)} |A_{i,j}|$, $LaTeX: M = (ma^*)^{m+1}$, and $LaTeX: b$ an integer $LaTeX: m$-vector.
$LaTeX: Ax = 0, x \geq 0, \; x \neq 0, \; x \text{ integer-valued}$
or
$LaTeX: A^Ty \geq 1, \; y \in \{ 0, \pm 1, \pm 2, \ldots, \pm M \}^m$.

### Convex Systems

For the following, $LaTeX: f$ is convex on $LaTeX: X$.

• Fan, Glicksburg and Hoffman (1957)
$LaTeX: f(x) < 0$
or
$LaTeX: y^T f(x) \geq 0$ for all $LaTeX: x \in X$, for some $LaTeX: y \geq 0, \; y \neq 0$.
(See NLP Myth 12 to avoid misconception.)

# Analytic center

Given a set $LaTeX: \{x \in X : g(x) \ge 0 \}$, assumed non-empty and compact with a non-empty strict interior, where $LaTeX: g$ is concave on $LaTeX: X$, its analytic center is the (unique) solution to the maximum entropy problem:

$LaTeX: \max \left\{ \sum_i \ln(g_i(x)) : x \in X, g(x) > 0 \right\}.$

Note that the analytic center depends on how the set is defined -- i.e., the nature of $LaTeX: g$, rather than just the set, itself. For example, consider the analytic center of the box, $LaTeX: [0,1]^n$. One form is to have $LaTeX: 2n$ functions as: $LaTeX: \{ x : x \ge 0, 1-x \ge 0\}$. In this case, the analytic center is $LaTeX: x^*_j = y^*$ for all $LaTeX: j$, where $LaTeX: y^*$ is the solution to:

$LaTeX: \max \left\{ \ln(y) + \ln(1 - y) : 0 < y < 1 \right\}.$

Since $LaTeX: y^* = 1/2$, the analytic center is what we usually think of as the center of the box. However, the upper bounds could be defined by $LaTeX: 1 - (x_j)^p \ge 0$ for all $LaTeX: j$, where $LaTeX: p > 1$ (so $LaTeX: 1 - (x_j)^p$ is concave). This changes the functional definition, but not the set -- it's still the unit box. The analytic center is skewed towards the corner because the defining mathematical program is:

$LaTeX: \max \left\{ \ln(y) + \ln(1 - y^p) : 0 < y < 1 \right\}.$

The solution is $LaTeX: y^* = (1/(1+p))^{(1/p)}$, so the analytic center approaches $LaTeX: (1,1,...,1)$ as $LaTeX: p \rightarrow \infty$.
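
The closed form can be checked numerically; a simple grid search over the one-dimensional defining problem (an illustrative sketch, not a serious optimizer):

```python
import math

def argmax_on_grid(f, n=200000):
    """Crude grid search for the maximizer of f over (0, 1)."""
    best_y, best_v = None, -math.inf
    for i in range(1, n):
        y = i / n
        v = f(y)
        if v > best_v:
            best_y, best_v = y, v
    return best_y

# Compare the grid argmax of ln(y) + ln(1 - y**p) with (1/(1+p))**(1/p).
for p in (1, 2, 5):
    y_grid = argmax_on_grid(lambda y: math.log(y) + math.log(1 - y**p))
    y_star = (1.0 / (1 + p)) ** (1.0 / p)
    print(p, y_grid, y_star)   # the two agree to grid resolution
```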

# Ant colony optimization

A heuristic search approach to solving a combinatorial program based on the behavior of ant colonies, particularly their ability to collectively determine shortest paths through the cumulative effect of pheromones.

# Anticycling rule

A rule that prevents cycling, such as Bland's rule for linear programming.

# Approximation algorithm

An algorithm that reaches a feasible solution to an NP-complete problem in polynomial time with a guaranteed bound on the quality of the solution. For minimization, with $LaTeX: Z^*$ being a positive optimal value, the bound generally has the form:

$LaTeX: Z \le rZ^* \mbox{ for some } r > 1,$
where $LaTeX: Z$ is the value of the feasible solution.
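
For illustration, a classic approximation algorithm with $LaTeX: r = 2$ is the matching-based heuristic for minimum vertex cover (a sketch; the instance is hypothetical):

```python
def vertex_cover_2approx(edges):
    """Classic 2-approximation for minimum vertex cover: repeatedly pick an
    uncovered edge and add both endpoints.  The chosen edges form a matching,
    any cover needs one endpoint per matching edge, so Z <= 2 * Z*."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover

# A 4-cycle: the optimum cover has 2 vertices; the heuristic returns at most 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
cover = vertex_cover_2approx(edges)
print(cover)
print(all(u in cover or v in cover for u, v in edges))  # True: feasible cover
```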

# Arborescent sets

Any two sets that are either disjoint, or one is a proper subset of the other.

# Arc consistency

A consistency property in constraint programming used as a basis for inference. A variable $LaTeX: x$ is arc-consistent with another variable $LaTeX: y$ if for every value in the domain of $LaTeX: x$ there exists a value in the domain of $LaTeX: y$ such that the binary constraint $LaTeX: C(x,y)$ is satisfied. A binary constraint $LaTeX: C(x,y)$ is arc-consistent if both of its variables are arc-consistent with respect to each other.

# Argmax

The optimality region for maximization:

$LaTeX: \arg\max \{ f(x) : x \in X \} = \{x^* \in X : f(x^*) \ge f(x) \mbox{ for all } x \in X \}.$

It is the "argument of the maximum."

# Argmin

The optimality region for a minimization problem:

$LaTeX: \arg\min \{ f(x) : x \in X \} = \{x^* \in X : f(x^*) \le f(x) \mbox{ for all } x \in X \}.$

It is the "argument of the minimum."

# Artificial variable

A variable, say $LaTeX: v$, added to an equation, $LaTeX: h(x) = 0$. The resulting system, $LaTeX: h(x) + v = 0$, is feasible upon letting $LaTeX: v = -h(x)$ for a chosen $LaTeX: x$. The objective function is modified to penalize nonzero values of $LaTeX: v$. Often, $LaTeX: v \ge 0$ is required, multiplying $LaTeX: h$ by -1 if necessary. This originated in linear programming, where a Phase I objective is used to find a solution with $LaTeX: v=0$ (or ascertain that the original system has no feasible solution) by minimizing $LaTeX: \textstyle \sum_i v_i$ (ignoring the original objective, $LaTeX: c^T x$).

# Assembly line balancing problem

A combinatorial program. An assembly line consists of $LaTeX: m$ work stations, connected by a conveyor belt. Each station performs a set of tasks that takes a cycle time, $LaTeX: c$. Each task must be assigned to exactly one station such that precedence constraints are satisfied. One model minimizes the total cycle time, given the number of stations. Another minimizes the number of stations, given the maximum allowable cycle time. There are variants of this simple form.

# Assignment polytope

The assignment polytope is

$LaTeX: \textstyle \left\{ x \in \mathbb{R}^{n^2} : x \ge 0, \; \sum_i x_{i,j} = 1 \mbox{ for } j = 1, 2, \ldots, n, \; \sum_j x_{i,j} = 1 \mbox{ for } i = 1, 2, \ldots, n \right\}.$

The name originates from an interpretation of its extreme points in the assignment problem. If $LaTeX: x_{i,j} = 1$, assign person $LaTeX: i$ to task $LaTeX: j$. Each extreme point has the property that each element of $LaTeX: x$ is 0 or 1. The sums describing the polytope require that each row ($LaTeX: i$) and each column ($LaTeX: j$) has exactly one 1 in it. This means every person is assigned to some task, and every task is assigned to be done by one person. The polytope is also the set of doubly stochastic matrices, whose extreme points are the permutation matrices.

# Assignment problem

Choose an assignment of people to jobs so as to minimize total cost. The ordinary model is that of a matching problem. Although the usual assignment problem is solvable in polynomial time (as a linear program), important extensions are the following NP-complete problems:

• Quadratic assignment (QAP). The objective is a quadratic form, which can have just cross terms, so it need not be convex (in fact, it can be quasiconcave, in which case a solution occurs at an extreme point). The QAP includes the traveling salesman problem as a special case (this is what is used with a neural network model of the TSP).
• Multi-dimensional assignment. More than two indices; the variables are of the form $LaTeX: x_{i,j,k, \ldots}$. The constraints sum over all but one index -- for example, the 3-index assignment problem is

$LaTeX: \textstyle \begin{array}{rrcl} \min & \sum_{i,j,k} c_{i,j,k} x_{i,j,k} & & \\ \\ \mbox{s.t.} & \sum_{j,k} x_{i,j,k} & = & 1 \mbox{ for all } i, \\ & \sum_{i,k} x_{i,j,k} & = & 1 \mbox{ for all } j, \\ & \sum_{i,j} x_{i,j,k} & = & 1 \mbox{ for all } k. \end{array}$
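
The ordinary (2-index) problem can be brute-forced for tiny instances, since each permutation is an extreme point of the assignment polytope (a sketch with a hypothetical cost matrix):

```python
from itertools import permutations

def assignment_bruteforce(C):
    """Brute-force the ordinary assignment problem for tiny n:
    enumerate all permutations (person i -> task perm[i])."""
    n = len(C)
    best = min(permutations(range(n)),
               key=lambda perm: sum(C[i][perm[i]] for i in range(n)))
    return best, sum(C[i][best[i]] for i in range(n))

C = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
perm, cost = assignment_bruteforce(C)
print(perm, cost)   # (1, 0, 2) with total cost 5
```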

# Asymptotic LP

A linear program in which the coefficients are functions of a single parameter, usually denoting time (some authors require the functions to be rational, i.e., of the form $LaTeX: p(t)/q(t)$, where $LaTeX: p$ and $LaTeX: q$ are polynomials). The problem is to find a steady state solution, i.e., one that is optimal (or nearly optimal) for all sufficiently large values of the time parameter.

# Asymptotic stability

The system $LaTeX: x^{k+1} = A x^k$ is asymptotically stable in the large if $LaTeX: \textstyle \lim_{k \rightarrow \infty} x^k = 0$ for any initial $LaTeX: x^0$. Originally developed for differential (and difference) equations, this applies to a sequence generated by an algorithm.
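
A quick numeric illustration (hypothetical 2x2 system whose spectral radius is below 1, a sufficient condition for the iterates to vanish):

```python
import numpy as np

# x^{k+1} = A x^k with spectral radius < 1 drives any start to the origin.
A = np.array([[0.5, 0.2],
              [0.1, 0.3]])
assert max(abs(np.linalg.eigvals(A))) < 1   # spectral radius check

x = np.array([10.0, -7.0])
for _ in range(100):
    x = A @ x
print(np.linalg.norm(x))   # essentially zero after 100 steps
```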

# Atleast constraint

is a global constraint that states the minimum number of occurrences of a particular value among an array/set of variables. Typically it is written as atleast($LaTeX: n$, $LaTeX: \langle x_1, \dots, x_m \rangle$, $LaTeX: v$), which is true if value $LaTeX: v$ is assigned at least $LaTeX: n$ times among the variables $LaTeX: \langle x_1, \dots, x_m \rangle$.

# Atmost1 constraint

is a global constraint stating that $LaTeX: n$ sets $LaTeX: S_1, \dots, S_n$ of known cardinality should intersect pairwise in at most one element. If $LaTeX: (s_1, \dots, s_n) \in \mbox{atmost1}(S_1, \dots, S_n, c_1, \dots, c_n)$ then $LaTeX: s_1\subset S_1, \dots , s_n\subset S_n$, $LaTeX: |s_1|=c_1, \dots, |s_n|=c_n$ and $LaTeX: |s_i\cap s_j| \leq 1$ for all $LaTeX: 1\leq i < j \leq n$.

# Atmost constraint

is a global constraint that states the maximum number of occurrences of a particular value among an array/set of variables. Typically it is written as atmost($LaTeX: n$, $LaTeX: \langle x_1, \dots, x_m \rangle$, $LaTeX: v$), which is true if value $LaTeX: v$ is assigned at most $LaTeX: n$ times among the variables $LaTeX: \langle x_1, \dots, x_m \rangle$.

# Augmented Lagrangian

The Lagrangian augmented by a term that retains the stationary properties of a solution to the original mathematical program but alters the Hessian in the subspace defined by the active set of constraints. The added term is sometimes called a penalty term, as it decreases the value of the augmented Lagrangian for $LaTeX: x$ off the surface defined by the active constraints. For example, the penalty term could be proportional to the squared norm of the active constraints:

$LaTeX: L_a (x, y) = L(x, y) + p G(x)^T G(x),$

where $LaTeX: p$ is a parameter (negative for maximization), $LaTeX: G(x)$ is the vector of active constraints (at $LaTeX: x$), and $LaTeX: L$ is the usual Lagrangian. In this case, suppose $LaTeX: x^*$ is a Kuhn-Tucker point with multiplier $LaTeX: y^*$. Then, $LaTeX: L_a(x^*, y^*) = L(x^*, y^*)$ (since $LaTeX: G(x^*) = 0$). Further, the gradient of the penalty term with respect to $LaTeX: x$ is $LaTeX: 0$ at $LaTeX: x^*$, and the Hessian of the penalty term is simply $LaTeX: p A(x^*)^T A(x^*)$, where $LaTeX: A(x^*)$ is the matrix whose rows are the gradients of the active constraints. Since $LaTeX: A(x^*)^TA(x^*)$ is positive semi-definite, the penalty term has the effect of increasing the eigenvalues (decreasing them if maximizing).

# Augmenting path

In network flows, an augmenting path is a path from a source to a sink such that every forward arc has flow less than capacity and every backward arc has positive flow, so the flow along the path can be increased.

# Automatic differentiation

A process for evaluating derivatives of a function that depends only on an algorithmic specification of the function, such as a computer program. (The term "automatic" stems from having the derivatives produced when executing the program to evaluate the function.)

There are two modes:

1. Forward - direct parse and evaluation, introducing intermediate variables in the usual way, as rules of differentiation are applied.
2. Reverse - evaluates entries in the code list first, and lastly differentiates the intermediate and independent variables in reverse order.

While the forward mode is straightforward and easier to implement, the reverse mode is computationally superior for scalar functions, as its cost does not grow with the number of independent variables. More generally, if $LaTeX: f : \mathbb{R}^n \rightarrow \mathbb{R}^m$, the forward mode has complexity of O(n), and the reverse mode has complexity of O(m) (relative to the cost of evaluating $LaTeX: f$).
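
Forward mode can be sketched with "dual numbers" that carry a value and a derivative through each operation, applying the chain rule as the function is evaluated (a minimal illustration, not a production AD tool):

```python
import math

class Dual:
    """Minimal forward-mode AD: (value, derivative) pairs propagated
    operation by operation via the chain rule."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__
    def sin(self):
        return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

def f(x):
    return x * x + 2 * x.sin()        # f(x) = x^2 + 2 sin x

x = Dual(1.0, 1.0)                    # seed dx/dx = 1
y = f(x)
print(y.val, y.dot)                   # f(1) and f'(1) = 2*1 + 2*cos(1)
```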

# BFGS method

This is a method to solve an unconstrained nonlinear program that proceeds as follows.

1. Start with any symmetric, negative definite matrix, say $LaTeX: B^0$, and any initial point $LaTeX: x^0$. Set $LaTeX: k = 0$.
2. Direction: Solve $LaTeX: B^k d^k = -\nabla f(x^k)$.
3. Step size: $LaTeX: s^k \in \arg\max \{ f(x^k + td^k) : t \ge 0 \}$.
4. Change in position: $LaTeX: p^k = s^k d^k$.
5. New point: $LaTeX: x^{k+1} = x^k + p^k$.
6. Change in gradient: $LaTeX: q^k = \nabla f(x^{k+1}) - \nabla f(x^k)$.
7. Update $LaTeX: B^k$ to form $LaTeX: B^{k+1}$ with the BFGS update.
8. Let $LaTeX: k = k+1$ and return to step 2 to complete the iteration.

Compare this with the DFP method; a key difference is that DFP estimates the inverse Hessian, whereas BFGS estimates the Hessian itself (and solves a linear system for the direction). The two are related, and empirical evidence suggests that the BFGS method is less prone to accumulated error.

# BFGS update

A way to update an approximation of the Hessian, used for unconstrained optimization. The update is

$LaTeX: B' = B - \frac{Bpp^TB}{p^TBp} + \frac{qq^T}{q^Tp},$

where $LaTeX: B'$ is the update from $LaTeX: B$ and $LaTeX: p$ and $LaTeX: q$ are defined in the BFGS method for the $LaTeX: k$-th iteration. Taking the inverse to compare with the DFP update,

$LaTeX: H' = H + \frac{p^Tq + q^T H q}{(q^T p)^2} \left( pp^T \right) - \frac{1}{p^T q} \left( pq^T H + H qp^T \right),$

where $LaTeX: H'$ is the update from $LaTeX: H$.
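
As a consistency check, the standard BFGS update of $LaTeX: B$ and the corresponding update of $LaTeX: H = B^{-1}$ can be verified numerically to remain inverses. This sketch uses a random positive definite $LaTeX: B$; the curvature condition $LaTeX: p^Tq > 0$ is arranged by construction:

```python
import numpy as np

def bfgs_update(B, p, q):
    """Standard BFGS update of a Hessian approximation B (needs p^T q > 0)."""
    Bp = B @ p
    return B - np.outer(Bp, Bp) / (p @ Bp) + np.outer(q, q) / (q @ p)

def bfgs_inverse_update(H, p, q):
    """The matching update of H = B^{-1} (obtained via Sherman-Morrison)."""
    pq = p @ q
    Hq = H @ q
    return (H + ((pq + q @ Hq) / pq**2) * np.outer(p, p)
              - (np.outer(p, Hq) + np.outer(Hq, p)) / pq)

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
B = M @ M.T + 4 * np.eye(4)          # symmetric positive definite
p = rng.standard_normal(4)
q = B @ p + 0.1 * p                  # ensures p^T q > 0 here
Bn = bfgs_update(B, p, q)
Hn = bfgs_inverse_update(np.linalg.inv(B), p, q)
print(np.allclose(Bn @ Hn, np.eye(4)))   # True: the two updates are inverses
```

The update also satisfies the secant condition B'p = q, which the test below checks as well.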

# Backbone

Originally, this was a set of propositions that must be true in every solution of a satisfiability problem. If there is no solution, one seeks a set of truth values that maximizes the number of clauses that are true. Then, the backbone is the set of propositions that must be true in every optimal solution. This has been extended to various combinatorial programs, for which the backbone is

$LaTeX: \{j : x_j = 1 \mbox{ for all } x \in X^* \},$

where $LaTeX: x$ is a vector of binary variables in the IP formulation and $LaTeX: X^*$ is usually the optimality region, although it could allow some ball around the optimal objective value. Some results suggest that the difficulty of an NP-hard problem can be further analyzed by the size of its backbone.

# Backjumping

A backtracking algorithm that, in contrast to classic backtracking algorithms, records partial justifications for inferences that are made during the tree search. These justifications allow the search to soundly jump back several levels in the search tree in case of an inconsistency.

# Backtracking

An algorithm applied in tree search that builds up candidate solutions by making one decision at a time (i.e., branching, see branch and bound) and checks for feasibility at each decision. In case an infeasible state is reached, the algorithm "backtracks", returning to a previous decision and changing it.

In constraint programming a narrower definition is often used: backtracking is a recursive algorithm that traverses the search tree depth-first, such that at each step a variable is assigned a value (all possible values are assigned in turn) and then the partial assignment is checked for consistency, i.e., the algorithm checks whether all constraints whose variables are fully assigned are satisfied. If there is an inconsistency, the algorithm prunes the sub-tree and backtracks one step.
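
The CP definition can be illustrated with the classic n-queens model: one variable (a column) per row, each value tried in turn, consistency checked after every assignment (a minimal sketch):

```python
def n_queens(n):
    """Depth-first backtracking: place one queen per row, check consistency
    after each assignment, and backtrack on conflict."""
    solutions = []
    cols = []                         # cols[r] = column of the queen in row r
    def consistent(r, c):
        return all(c != cc and abs(r - rr) != abs(c - cc)
                   for rr, cc in enumerate(cols))
    def search(r):
        if r == n:
            solutions.append(tuple(cols))
            return
        for c in range(n):            # try every value in the domain
            if consistent(r, c):
                cols.append(c)
                search(r + 1)
                cols.pop()            # backtrack
    search(0)
    return solutions

print(len(n_queens(6)))   # 4 solutions on a 6x6 board
```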

# Backward Transformation

Backward Transformation (abbr. BTRAN). This applies to the factored system, $LaTeX: u E_1 E_2 \ldots E_n = c$, where each $LaTeX: E_i$ is an elementary matrix. The recursion is:

$LaTeX: u_n E_n = c$
$LaTeX: u_{n-1} E_{n-1} = u_n$
...
$LaTeX: u_{1} E_{1} = u_2$,
for which $LaTeX: u_1$ is a solution to the original system.

# Backward substitution

The recursion to solve a nonsingular upper triangular system, $LaTeX: Ux = b$. It starts with $LaTeX: x_n = b_n / U_{n,n}$, then

$LaTeX: x_j = \frac{1}{U_{j, j}} \left( b_j - \sum_{i > j} U_{j,i} x_i \right) \mbox{ for } j = n-1, \ldots ,1.$
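
The recursion can be implemented directly; a minimal numpy sketch:

```python
import numpy as np

def backward_substitution(U, b):
    """Solve Ux = b for a nonsingular upper triangular U, working from
    the last row up: x_j = (b_j - sum_{i>j} U[j,i] x_i) / U[j,j]."""
    n = len(b)
    x = np.zeros(n)
    for j in range(n - 1, -1, -1):
        x[j] = (b[j] - U[j, j + 1:] @ x[j + 1:]) / U[j, j]
    return x

U = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 2.0],
              [0.0, 0.0, 4.0]])
b = np.array([9.0, 13.0, 8.0])
x = backward_substitution(U, b)
print(x)   # [2. 3. 2.]
```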

# Backward triangularization

An algorithm to rearrange a matrix by recursively assigning a singleton column to the rear (with its only row as its pivot). The recursion applies to the submatrix defined by omitting the assigned column and row (and reducing remaining column counts accordingly). This results in a lower-triangular rearrangement if and only if such a rearrangement exists. Otherwise, the algorithm reaches a submatrix with all columns having more than one element, of which one is selected to be a spike that is assigned to the rear of the matrix. The process then continues on the submatrix.

Example 1: Let the nonzeros be denoted by *:

$LaTeX: A = \begin{bmatrix} * & & & \\ * & * & * & * \\ * & & & * \\ * & & * & * \end{bmatrix}$

The column counts are $LaTeX: (4,1,2,3)$. We thus put column $LaTeX: 2$ at the rear, assigned to pivot on row $LaTeX: 2$, and consider the remaining $LaTeX: 3\times 3$ submatrix:

$LaTeX: A^1 = \begin{bmatrix} * & & & \cdot & \\ * & & * & \cdot & \\ * & * & * & \cdot & \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ * & * & * & \cdot & * \end{bmatrix}$

The remaining column counts (for $LaTeX: A^1$) are $LaTeX: (3,1,2)$, so column $LaTeX: 2$ is placed at the (new) rear, pivoting on row $LaTeX: 3$:

$LaTeX: A^2 = \left[\begin{array}{cc|cc} * & & & \\ * & * & & \\ \hline * & * & * & \\ * & * & * & * \end{array}\right]$

The remaining column counts are $LaTeX: (2,1)$, so we reach the final rearrangement, which is triangular:

$LaTeX: A^3 = \left[\begin{array}{cccc} * & & & \\ * & * & & \\ * & * & * & \\ * & * & * & * \end{array}\right]$

Example 2: Column counts are shown below the matrix.

$LaTeX: \begin{array}{l} A = \left[\begin{array}{cccc} * & & & \\ * & * & * & * \\ * & * & & * \\ * & & * & * \end{array}\right] \\ \phantom{A = \left[\right.}~ \begin{array}{cccc} 4 & 2 & 2 & 3 \end{array} \end{array}$

All column counts are greater than one, so we select a column to be a spike, and we assign it to the rear with a pivot row (some algorithms delay the pivot row assignment). For this example, we choose a column with minimum count and pick the pivot row with maximum count.

$LaTeX:

\begin{array}{l@{\,\Rightarrow\,}l} A^1 = \left[\begin{array}{ccc | c} * & & & \\ * & * & & \\ * & & * & * \\ \hline * & * & * & * \end{array}\right] & A^2 = \left[\begin{array}{cc |cc} * & & & \\ * & * & & * \\ \hline * & & * & \\ * & * & * & * \end{array}\right] = A^3. \end{array}

$

The final rearrangement ($LaTeX: A^3$) has one spike.
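The procedure above can be sketched in code. This is a minimal, hypothetical helper (`rearrange` is not a standard name) that takes each row as a set of column indices; the tie-breaking choices, a minimum-count column for the spike and a maximum-count pivot row, are the ones used in Example 2:

```python
def rearrange(row_sets):
    """Rearrange a 0/1 matrix toward lower-triangular form.

    `row_sets` is a list of sets: row_sets[i] holds the column indices
    with a nonzero in row i.  Returns (col_order, row_order, spikes):
    columns and rows listed front to back, plus the columns chosen as
    spikes.  A sketch only; real sparse codes use cleverer structures.
    """
    m = len(row_sets)
    active_rows = set(range(m))
    active_cols = {j for r in row_sets for j in r}
    rear_cols, rear_rows, spikes = [], [], []

    def count(j):
        return sum(1 for i in active_rows if j in row_sets[i])

    while active_cols:
        # look for a singleton column in the active submatrix
        singleton = next((j for j in sorted(active_cols) if count(j) == 1), None)
        if singleton is not None:
            pivot = next(i for i in active_rows if singleton in row_sets[i])
        else:
            # no singleton: choose a spike (min count), pivot row of max count
            singleton = min(active_cols, key=count)
            pivot = max(active_rows, key=lambda i: len(row_sets[i] & active_cols))
            spikes.append(singleton)
        rear_cols.append(singleton)       # assigned to the rear
        rear_rows.append(pivot)
        active_cols.discard(singleton)
        active_rows.discard(pivot)

    # assignments were made rear-first, so reverse for front-to-back order
    return rear_cols[::-1], rear_rows[::-1], spikes

# Example 1 above (0-indexed): columns come out (1,4,3,2), rows (1,3,4,2)
# in 1-based terms, with no spikes
cols, piv_rows, spikes = rearrange([{0}, {0, 1, 2, 3}, {0, 3}, {0, 2, 3}])
```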

# Balance equation

A constraint of the form $LaTeX: I(x) - O(x) = 0$, where $LaTeX: I(x)$ is the input and $LaTeX: O(x)$ is the output, each as a function of the decision variables, $LaTeX: x$. Typically this is linear, conserving flows, measured in units of volume or mass. In an ordinary network in which there are no gains or losses, this appears as:

$LaTeX: I(x) - O(x) = \sum_{i} x_{i,j} - \sum_{k} x_{j,k} = 0.$

# Barrier function

$LaTeX: P$ is a barrier function on the strict interior, say $LaTeX: I$, of the set $LaTeX: S$ if $LaTeX: P$ is continuous, and $LaTeX: P(x) \rightarrow \infty$ as $LaTeX: x$ approaches any point in $LaTeX: S\backslash I$. Barrier functions arise in penalty function methods for certain classes of problems that include convex programs of the form:

$LaTeX: \min \{ f(x) : g(x) \ge 0 \},$

where $LaTeX: I = \{ x : g(x) > 0 \}$. Two popular choices are

$LaTeX: \textstyle P(x) = \sum_i 1 / g_i(x),$

and

$LaTeX: \textstyle P(x) = -\sum_i \ln (g_i(x)).$
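As a small illustration of how the logarithmic barrier is used, the sketch below minimizes $LaTeX: f(x)=x$ subject to $LaTeX: g(x)=x-1 \ge 0$; both the toy problem and the ternary-search minimizer are illustrative choices, not part of the definition:

```python
import math

def barrier_objective(x, mu):
    # f(x) = x with logarithmic barrier -mu*ln(g(x)), where g(x) = x - 1
    return x - mu * math.log(x - 1.0)

def minimize_1d(phi, lo, hi, tol=1e-10):
    """Ternary search for the minimizer of a unimodal function on [lo, hi]."""
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

# as mu -> 0 the barrier minimizers (here 1 + mu) approach the optimum x* = 1
path = [minimize_1d(lambda x, mu=mu: barrier_objective(x, mu), 1.0 + 1e-12, 10.0)
        for mu in (1.0, 0.1, 0.01)]
```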

# Barycenter

This is the center of mass: the point at which a mass (body) can be considered as being concentrated without altering the effect of the attraction that the earth has upon it. It is the average position of particles located at positions $LaTeX: x^i$, weighted by their masses $LaTeX: m_i$,

$LaTeX: \frac{\sum_i m_i x^i}{\sum_i m_i}.$

In mathematical programming, this is usually used to mean the simple average of the extreme points of a polytope,

$LaTeX: \textstyle \frac{1}{m} \sum_i x^i,$

where $LaTeX: \{x^1, \ldots, x^m\}$ is the set of extreme points. This could be a weighted average,

$LaTeX: \textstyle \sum_i w_i x^i,$

where $LaTeX: w \in S_m$.

# Basic

Associated with a submatrix of $LaTeX: A$, say $LaTeX: B$, whose columns comprise a basis for $LaTeX: \mathbb{R}^m$ (i.e., $LaTeX: B$ consists of $LaTeX: m$ linearly independent columns of $LaTeX: A$).

Here are some related terms that arise in linear programming.

• Adjacent basis. One that differs in exactly one column from a given basis.
• Basic column. A column of the basis matrix.
• Basic variable. The variable, say $LaTeX: x_j$, associated with the $LaTeX: j$-th column of the basis matrix.
• Basic level. The value of a basic variable.
• Basic solution. The solution, $LaTeX: x$, obtained by setting nonbasic values to some bound value, like 0, resulting in a unique solution for the basic variables. That is, $LaTeX: Ax=b$ is equivalent to $LaTeX: Bu + Nv = b$, where $LaTeX: A=[B \; N]$ and $LaTeX: x=[u^T \; v^T]^T$. Upon fixing the value of $LaTeX: v$, the nonsingularity of $LaTeX: B$ gives the basic solution with $LaTeX: u = B^{-1}(b - Nv)$. In a standard linear program, $LaTeX: v=0$, and hence $LaTeX: u = B^{-1} b$.
• Basic feasible solution. A basic solution that is feasible, i.e., the basic values satisfy their bounds. (In a standard LP, this means $LaTeX: B^{-1}b \ge 0$.)
• Basis kernel. After performing forward triangularization, if the basis does not triangularize completely, backward triangularization is applied. The result is a (rearranged) blocking of the basis into three segments:
|\
| \  <--- Forward triangle
|__\ ______
|   |      |
|   |      |  <--- Kernel
|   |______|
|          |\
|          | \  <--- Backward triangle
|__________|__\

Each row and column in the kernel has at least 2 nonzeros.

# Basic solution

See terms associated with being basic.

# Benders decomposition

This decomposition separates complicating variables. The (original) semi-linear model is:

$LaTeX: \max \{ c^Tx + F(y) : x \in X, \; y \in Y, \; Ax + G(y) \le b \}.$

Generalized Benders decomposition considers the following mathematical program:

$LaTeX: \max \{ f(x, y) : g(x, y) \le 0, \; h(x, y) = 0, \; x \in X, \; y \in Y \}.$

Suppose this is not a convex program, but that with each fixed $LaTeX: y \in Y$, the remaining mathematical program (with $LaTeX: x$ being the only variable) is convex. The idea is to consider

$LaTeX: v(y) = \sup \{ f(x, y) : g(x, y) \le 0, \; h(x, y) = 0, \; x \in X \},$

and solve $LaTeX: \max \{v(y) : y \in Y\}$, with evaluations of $LaTeX: v$ requiring the solution to the convex program.

# Biconcave function

A function whose negative is biconvex.

# Biconvex function

A function $LaTeX: f(x, y)$ on $LaTeX: X \times Y$ that is convex on $LaTeX: X$ for each $LaTeX: y \in Y$ and convex on $LaTeX: Y$ for each $LaTeX: x \in X$. An example is $LaTeX: f(x,y) = xy$ on $LaTeX: \mathbb{R}^2$, which is not convex in both $LaTeX: x$ and $LaTeX: y$ together.

# Biconvex program

A nonlinear program with each function biconvex or biconcave, such that if $LaTeX: x$ or $LaTeX: y$ is fixed, the resulting mathematical program in $LaTeX: y$ or $LaTeX: x$, resp., is a convex program. A special case is the bilinear program.

# Bidding algorithm

Also called an auction algorithm. This proceeds by generating a sequence of partial solutions (e.g., a partial assignment), terminating when the solution is complete. There is a pricing vector and a bidding increment whereby the solution is revised and extended by a myopic optimization scheme (which depends on the particular problem).

# Big-M method

An alternative to using Phase I, a large number $LaTeX: M$ is used as a linear penalty in the composite objective:

$LaTeX: \min \{ c^T x + M \; e^T v: Ax + v = b, \; x, v \ge 0 \},$

where $LaTeX: v$ is a vector of artificial variables and $LaTeX: e = (1, 1, \ldots, 1)^T$.

# Bilevel program

A multilevel program with two levels (sets of variables).

# Bilinear function

A special quadratic function, $LaTeX: x^T Q y + c^Tx + b^Ty$, that is linear in $LaTeX: y$ for each $LaTeX: x$, and linear in $LaTeX: x$ for each $LaTeX: y$.

# Bilinear program

A nonlinear program whose objective and/or constraint functions are bilinear. An example is the pooling problem.

# Bin packing problem

Partition a set of integers, $LaTeX: \{c_1, \ldots, c_n\}$, into bins such that the sum of all integers in each bin does not exceed the bin capacity, say $LaTeX: b$, so as to minimize the number of bins required. The problem is NP-hard.
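Since the problem is NP-hard, heuristics are common in practice. The following is a sketch of the first-fit decreasing heuristic (an illustrative method, not part of the definition above): sort the items in decreasing order and place each into the first bin with enough remaining capacity, opening a new bin when necessary.

```python
def first_fit_decreasing(items, capacity):
    """First-fit decreasing heuristic for bin packing (not exact)."""
    bins = []      # each bin is a list of item sizes
    space = []     # remaining capacity of each bin
    for c in sorted(items, reverse=True):
        for k, s in enumerate(space):
            if c <= s:                 # first bin the item fits in
                bins[k].append(c)
                space[k] -= c
                break
        else:                          # no bin fits: open a new one
            bins.append([c])
            space.append(capacity - c)
    return bins

bins = first_fit_decreasing([5, 7, 5, 2, 4, 2, 5], capacity=10)
```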

# Binary CSP

A constraint satisfaction problem is binary if all constraints are either unary or binary relations (wrt the decision variables). That is, the cardinality of the scope of the constraint is either 1 or 2. For example, $LaTeX: x > 2$ (unary) or $LaTeX: x \neq y$ (binary).

# Binary constraint

A constraint is binary if there are exactly two variables in its scope.

# Binary relation

A subset of $LaTeX: S \times T$, say $LaTeX: R$. When $LaTeX: T=S$, we refer to just the relation on the set $LaTeX: S$.

Here are 5 properties that distinguish relations on a set ($LaTeX: x$, $LaTeX: y$, $LaTeX: z$ are in $LaTeX: S$):

1. antisymmetric - $LaTeX: (x, y) \in R$ and $LaTeX: x \ne y$ imply $LaTeX: (y, x) \not\in R$.
2. irreflexive - $LaTeX: (x, x) \not\in R$.
3. reflexive - $LaTeX: (x, x) \in R$.
4. symmetric - $LaTeX: (x, y) \in R$ implies $LaTeX: (y, x) \in R$.
5. transitive - $LaTeX: (x, y)$ and $LaTeX: (y, z)$ both in $LaTeX: R$ imply $LaTeX: (x, z) \in R$.
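These five properties are easy to test mechanically for a finite relation. The sketch below (`relation_properties` is a hypothetical helper) represents $LaTeX: R$ as a set of pairs:

```python
def relation_properties(R, S):
    """Return the names of the five properties that relation R holds on set S."""
    props = {
        "antisymmetric": all((y, x) not in R for (x, y) in R if x != y),
        "irreflexive": all((x, x) not in R for x in S),
        "reflexive": all((x, x) in R for x in S),
        "symmetric": all((y, x) in R for (x, y) in R),
        "transitive": all((x, w) in R
                          for (x, y) in R for (z, w) in R if y == z),
    }
    return {name for name, holds in props.items() if holds}

# the strict order "<" on {1, 2, 3}: antisymmetric, irreflexive, transitive
less = {(1, 2), (1, 3), (2, 3)}
```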

# Binary variable

A decision variable with two feasible values, usually 0 or 1.

# Binding constraint

A constraint whose removal changes the optimality region. (Some authors require that the removal results in a strict improvement in the objective value.)

# Bisection

Taking the midpoint of endpoints known to contain a solution. If the solution is in the interval (a,b), bisection chooses x = (a+b)/2 as the next point to evaluate. The result reduces the interval to (a,x) or (x,b), so the length is cut in half with each evaluation.
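A minimal sketch of bisection for root finding, assuming $LaTeX: f$ changes sign on the bracketing interval:

```python
def bisect_root(f, a, b, tol=1e-12):
    """Locate a root of f in (a, b), assuming f(a) and f(b) differ in sign.

    Each evaluation halves the bracketing interval, so the error after n
    steps is (b - a) / 2**n.
    """
    fa = f(a)
    while b - a > tol:
        x = 0.5 * (a + b)          # midpoint of the current interval
        fx = f(x)
        if fx == 0.0:
            return x
        if (fa < 0.0) != (fx < 0.0):
            b = x                  # sign change in (a, x)
        else:
            a, fa = x, fx          # sign change in (x, b)
    return 0.5 * (a + b)

root = bisect_root(lambda x: x * x - 2.0, 0.0, 2.0)   # converges to sqrt(2)
```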

# Bland rule

This is for pivot selection in the simplex method to avoid cycling:

• If more than one (nonbasic) column has a negative (for minimization) reduced cost, choose the one with lowest index.
• If more than one (basic) column has the same determining value to leave the basis, select the one with the lowest index.

Beale's example of cycling shows that Bland's Rule must apply to all candidates for entering the basis, not just those with the most negative reduced cost; see the supplement on cycling. The fifth tableau is the first time Bland's Rule is needed to break the cycle. Variable $LaTeX: x_1$ is chosen to enter the basis rather than $LaTeX: x_5$.

# Blending problem

A blend is some combination of materials to make another material. Given raw materials, their blends make intermediate materials, called stock, and/or final materials, called products (there can be other blending activities that make products from stocks). The raw materials have purchase costs and the blending activities have costs of operation and maintenance. The products have either fixed demands or selling prices, and the problem is to find blends that minimize total net cost (or maximize profit). This arises in refinery operations, in the food industry, and in other process industries. The problem can sometimes be solved with linear programming, but there can be complicating relations that require nonlinear programming, such as the pooling problem.

# Block pivot

Given a detached coefficient form, a pivot is performed on a nonsingular submatrix of nonbasic variables (rather than just one element). Suppose the detached coefficient form is partitioned into 2 row blocks and 5 column blocks (with the last column corresponding to the right-hand side):

$LaTeX: A = \left[ \begin{array}{ccccc} I & 0 & P & R & e \\ 0 & I & Q & S & f \end{array} \right].$
(Note: each $LaTeX: I$ is an identity matrix of appropriate dimension.) If $LaTeX: P$ is nonsingular, the basis can change by entering all (nonbasic) variables associated with the columns of $LaTeX: P$ for all basic variables associated with the rows of $LaTeX: P$.

# Blocking polyhedron

The blocking polyhedron of a polyhedron $LaTeX: P$ is $LaTeX: \{ y \in \mathbb{R}^n_+ : y^T x \ge 1 \mbox{ for all } x \in P\}$.

# Bolzano-Weierstrass theorem

A fundamental existence theorem guaranteeing a maximum and a minimum.

A continuous function on a non-empty compact set achieves a minimum and a maximum over the set.

A useful corollary to the Bolzano-Weierstrass theorem is:

Suppose $LaTeX: f$ is continuous on its non-empty feasible region, $LaTeX: F$, and that $LaTeX: F$ is a closed set. Then, $LaTeX: f$ achieves a maximum on $LaTeX: F$ if there exists $LaTeX: x^0$ such that $LaTeX: \{ x \in F : f(x) \ge f(x^0) \}$ is bounded.

In particular, if we have a quadratic objective of the form $LaTeX: f(x) = x^T Q x + c^T x$, it is sufficient for $LaTeX: Q$ to be negative definite. Although often credited to Weierstrass alone, it was proven by Bolzano in 1817, and it was known to Cauchy near the end of the 19th century.

# Bordered hessian

The complete Hessian of the Lagrangian:

$LaTeX: \left[ \begin{array}{cc} H_x[L(x,u,v)] & G^T \\ G & 0 \end{array} \right],$
where
$LaTeX: G = \left[ \begin{array}{cc} \nabla g(x) & 0 \\ 0 & \nabla h(x) \end{array} \right].$
Here we let $LaTeX: H_x[L(x,u,v)]$ be the Hessian of the Lagrangian (L) with respect to $LaTeX: x$ only. The southwest and northeast blocks are the constraint gradients because they give the cross derivative between $LaTeX: x$ and the Lagrange multipliers $LaTeX: (u,v)$.

# Bottleneck TSP problem

The problem asks whether there is a traveling salesman problem tour whose value is less than some given value.

# Bottleneck assignment problem

This is the assignment problem, where the "cost", $LaTeX: c_{i, j}$, is the rate at which person $LaTeX: i$ can perform job $LaTeX: j$. The objective is to maximize the rate of production of the entire assembly line, which is the minimum rate over the assignments.

# Bottleneck transportation problem

This is the transportation problem with a traversal time instead of arc cost, and the objective is to minimize the time it takes to satisfy the demands:

$LaTeX: \min \left\{ \max \{ t_{ij} : x_{ij} > 0 \} : x \ge 0, \sum_j x_{ij} \le s_i, \sum_i x_{ij} \ge d_j \right\},$

where $LaTeX: x_{ij}$ is the flow from source $LaTeX: i$ to destination $LaTeX: j$, and $LaTeX: t_{ij}$ is the time it takes to ship across arc $LaTeX: (i, j)$ (independent of the quantity shipped).

# Bounds consistency

A consistency property for variables over integer domains where consistency is only defined on the minimum and maximum values of the integer domain.

# Box constraint

Simple bounds of the form $LaTeX: L \le x \le U$.

# Braess paradox

The phenomenon that, in a traffic network in which travel times depend on link (arc) flows, the addition of a link (arc) can increase travel time for every user if all users minimize their own travel time. Hence, the paradox is that building a new road can make traffic congestion worse. See Myths and Counter Examples in Mathematical Programming for an example.

# Branch and bound

Born in integer programming (IP), branch and bound (B&B) is more generally used in global optimization, including continuous functions. A prototype is as follows.

1. Initialize set list with feasible region ($LaTeX: X$ plus all constraints), say $LaTeX: S$, with associated bounds $LaTeX: -\infty$, $LaTeX: \infty$ (finite bounds on the objective $LaTeX: f$ can be computed, if computationally preferable, and the initial set could be a superset of the feasible region). Initialize $LaTeX: f^*$ equal to the lower bound ($LaTeX: -\infty$, unless an initial feasible solution is known).
2. Branch by selecting the $LaTeX: k$-th set from list, say $LaTeX: S_k$, and splitting into a partition, say $LaTeX: S_k = A \vee B$ such that the interior of $LaTeX: A$ does not intersect the interior of $LaTeX: B$. (Could partition $LaTeX: S_k$ into more than 2 subsets.)
3. Bound $LaTeX: f$ in $LaTeX: A$ and/or in $LaTeX: B$, say $LaTeX: L(S) \le f(x) \le U(S)$ for all $LaTeX: x \in S$, for $LaTeX: S = A$ and/or $LaTeX: B$. Define $LaTeX: U(S) = -\infty$ if it is determined that $LaTeX: S$ is empty (in which case $LaTeX: S$ will be discarded in the next step).
4. Test if $LaTeX: U(S) \le f^* + t$, for $LaTeX: S = A$ and/or $LaTeX: B$. If not, put $LaTeX: S$ on list. (If the test passes, $LaTeX: S$ is discarded by not being put on the list. Note: $LaTeX: f^*$ is the best solution value to date, and $LaTeX: t$ is a tolerance that says we want a solution whose objective value is within $LaTeX: t$ of the optimal value.) Further, if $LaTeX: L(S) > f^*$ and the associated point, $LaTeX: x(S)$, is feasible (where $LaTeX: f(x(S)) = L(S)$), replace $LaTeX: f^*$ with $LaTeX: L(S)$ and $LaTeX: x^*$ with $LaTeX: x(S)$. If the list is not empty, go to branch.

B&B terminates when the list becomes empty, and the best feasible solution obtained is $LaTeX: x^*$ with value $LaTeX: f^*$, which is within $LaTeX: t$ of optimality. (If $LaTeX: f^* = -\infty$, then there is no feasible solution.) Notes:

1. A particular bounding rule for IP is to solve an LP relaxation, whose value is an upper bound on the integer-valued maximum. The lower bounds do not change unless an integer solution is obtained from the LP, in which case the best solution to date can change.
2. Branching rules can range from breadth-first, i.e., expand all possible branches from a node in the search tree before going deeper into the tree, to depth-first, i.e., expand the deepest node first (and backtrack, as necessary). For example, suppose we branch on the value of $LaTeX: x_j$, which could be $LaTeX: 0$ or $LaTeX: 1$. The breadth-first rule would solve the LPs for each of the two cases. The depth-first rule would solve for $LaTeX: x_j=0$ and postpone the complementary case, $LaTeX: x_j=1$, to consider another variable dichotomy, going deeper into the search tree. To illustrate, the following are search progressions in $LaTeX: \{0, 1\}^2$.

Breadth-first:

 ____.             ____.____          ____.____
 |                 |       |          |       |
(x=0)             (x=0)  (x=1)      (x=0)   (x=1)
                                      /
                                   (y=0)
   1                 2                 3

 ____.____          ____.____          ____.____
 |       |          |       |          |       |
(x=0)  (x=1)      (x=0)   (x=1)      (x=0)   (x=1)
 /\                /\      /           /\      /\
(y=0)(y=1)     (y=0)(y=1)(y=0)    (y=0)(y=1)(y=0)(y=1)
   4                 5                  6



Depth-first:

 ____.             ____.             ____.
 |                 |                 |
(x=0)             (x=0)             (x=0)
                   /                 /\
                (y=0)            (y=0)(y=1)
   1                 2                 3

 ____.____          ____.____          ____.____
 |       |          |       |          |       |
(x=0)  (x=1)      (x=0)   (x=1)      (x=0)   (x=1)
 /\                /\      /           /\      /\
(y=0)(y=1)     (y=0)(y=1)(y=0)    (y=0)(y=1)(y=0)(y=1)
   4                 5                  6


3. A best-first search is to select the node (i.e., set on list) that seems most promising, such as one with the greatest upper bound. Typically, there are multiple criteria for branching, and these change depending on whether a feasible solution has been obtained.
4. Backtracking need not go in exactly the reverse order (i.e., need not have a LIFO rule), but care must be taken when reordering the path.
5. When the problem is not finite (not IP), branching rules need to consider the hyper-volume of each set in a partition to ensure convergence.
6. The B&B methods we consider in OR are special cases of heuristic search in Artificial Intelligence (AI).
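To make the prototype concrete, here is a minimal depth-first branch and bound for the 0/1 knapsack problem. This is a sketch: the function name and the fractional-knapsack (LP relaxation) bound are illustrative choices, not part of the prototype above. Branching fixes one variable to 1 or 0; a node is discarded when its bound cannot beat the incumbent.

```python
def knapsack_bb(values, weights, capacity):
    """Depth-first branch and bound for max-value 0/1 knapsack."""
    # consider items in decreasing value/weight order (weights assumed > 0)
    order = sorted(range(len(values)),
                   key=lambda j: values[j] / weights[j], reverse=True)

    def bound(k, cap):
        # greedy fractional fill of the remaining items: an upper bound
        total = 0.0
        for j in order[k:]:
            if weights[j] <= cap:
                cap -= weights[j]
                total += values[j]
            else:
                total += values[j] * cap / weights[j]
                break
        return total

    best = 0
    stack = [(0, capacity, 0)]     # (next item index in `order`, space, value)
    while stack:
        k, cap, val = stack.pop()
        best = max(best, val)      # packed items form a feasible solution
        if k == len(order) or val + bound(k, cap) <= best:
            continue               # exhausted, or bound cannot beat incumbent
        j = order[k]
        stack.append((k + 1, cap, val))                     # x_j = 0 branch
        if weights[j] <= cap:                               # x_j = 1 branch
            stack.append((k + 1, cap - weights[j], val + values[j]))
    return best
```

The x_j = 1 branch is pushed last, so the LIFO stack explores it first, which tends to find good incumbents early.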

# Branch and cut

A hybrid of Branch and bound and cutting plane methods.

# Branch and price

A hybrid of branch and bound and column generation; the price operation is the same as column generation.

# Branching heuristic

See branching rule.

# Branching rule

A branching rule (or branching heuristic) specifies how a search tree is constructed during search. More specifically, it is a procedure or set of conditions for choosing the next variable to be branched on. It is part of the search strategy.

In constraint programming a branching rule is commonly referred to as a variable ordering heuristic.

# Breakpoint

A point where some property of a function changes, such as its slope. For example, a piecewise-linear, univariate function with $LaTeX: k$ breakpoints has the form:

$LaTeX: f(x) = a_i x + b_i \mbox{ for } c_i \le x < c_{i+1},$
where $LaTeX: i=1, \ldots, k$, $LaTeX: c_{k+1}$ is defined to be infinity, and $LaTeX: c_1$ is the least value of $LaTeX: x$. This function is continuous if adjacent pieces agree at the breakpoints, and the slope changes there if $LaTeX: a_i \neq a_{i+1}$ for $LaTeX: i=1, \ldots, k-1$.
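A piecewise-linear function of this form can be evaluated by locating the piece containing $LaTeX: x$. The sketch below (a hypothetical helper) uses half-open intervals $LaTeX: c_i \le x < c_{i+1}$, with the last piece extending to infinity:

```python
import bisect

def piecewise_linear(a, b, c):
    """Build f(x) = a[i]*x + b[i] on [c[i], c[i+1]), with the last piece
    extending to infinity.  The function is continuous when adjacent
    pieces agree at the breakpoints."""
    def f(x):
        if x < c[0]:
            raise ValueError("x below the domain")
        i = min(bisect.bisect_right(c, x) - 1, len(a) - 1)
        return a[i] * x + b[i]
    return f

# two pieces meeting continuously at the breakpoint x = 1:
#   f(x) = x on [0, 1),  f(x) = 2x - 1 on [1, infinity)
f = piecewise_linear(a=[1.0, 2.0], b=[0.0, -1.0], c=[0.0, 1.0])
```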

See BFGS update.

# Broyden family

This is a family of algorithms to solve an unconstrained nonlinear program whose objective function is in $LaTeX: C^2$. The algorithms in this family differ in how they update the approximation of the Hessian inverse (assumed to exist). These are of the form:

$LaTeX: H = aH^{\mbox{BFGS}} + (1-a)H^{\mbox{DFP}},$

with the two matrices being the Broyden-Fletcher-Goldfarb-Shanno update (BFGS) and the Davidon-Fletcher-Powell update (DFP), respectively. If $LaTeX: a$ is in $LaTeX: [0, 1]$, this is called the Broyden convex family.

# Bundle method

A class of nonsmooth-optimization methods for the numerical minimization of a convex function, $LaTeX: \theta : \mathbb{R}^n \rightarrow \mathbb{R}$. At the $LaTeX: k$th iteration the technique uses the available information contained in a bundle $LaTeX: (\theta(v^1), g_1, \ldots , \theta(v^k), g_k)$ that is obtained from an oracle; here each $LaTeX: g_i$ is a subgradient of $LaTeX: \theta$ at $LaTeX: v^i$.

Important applications are Lagrangian relaxation and column generation applied to a primal problem

$LaTeX: \max \{ p(x) : x \in X, \; h(x) = 0 \}.$

In this context, for a given $LaTeX: v \in \mathbb{R}^n$,

$LaTeX: \theta(v) = \max \{ p(x) - v^T h(x) : x \in X \}$

results from the maximization of the Lagrange function; each $LaTeX: g_i$ is $LaTeX: -h(x_i)$, with $LaTeX: x^i$ maximizing the Lagrangian at $LaTeX: v^i$. Minimizing $LaTeX: \theta$ is the dual problem.

If minimizing $LaTeX: \theta(v)$, a basic method to construct the next iterate $LaTeX: v^{k+1}$ uses the cutting-plane paradigm: one minimizes the piecewise-linear function

$LaTeX: \theta_k(v) = \max \{ \theta (v^i) + g_i^T (v - v^i) : i = 1, \ldots, k \},$

which can be viewed as a model of $LaTeX: \theta$ (in fact $LaTeX: \theta_k \le \theta$). Minimizing $LaTeX: \theta_k$ is a linear program, which is called the restricted master program in the context of column generation. One of the deficiencies of this method is instability, i.e., the sequence $LaTeX: v^k$ usually oscillates uncontrollably. Several possibilities exist to stabilize the process; one, adopted in the bundle approach, replaces the minimization of $LaTeX: \theta_k$ by the quadratic program

$LaTeX: \min_v \{ \theta_k(v) + (1/(2t)) |v - C|^2 \}.$

Here the positive $LaTeX: t$ is a stability parameter that is usually adjusted at each iteration, and the stability center $LaTeX: C$ is some $LaTeX: v^i$ ($LaTeX: i \le k$) taken from the bundle, typically one that gave the best $LaTeX: \theta$-value.

# CPLEX

A collection of mathematical programming software solvers.

# Capacity expansion

This is an augmentation to any mathematical program that has constraints representing a capacity limit. Suppose $LaTeX: g(x) \le b$ represents the limitation of using more than $LaTeX: b$ units of capacity, where $LaTeX: g(x)$ is the actual amount used for policy $LaTeX: x$. Then, this becomes a capacity expansion model by replacing the constraint with

$LaTeX: g(x) - vC \le b \;\mbox{ and }\; 0 \le v \le 1.$
Here $LaTeX: C$ is the maximum amount of new capacity, with $LaTeX: vC$ the portion brought online by choice of $LaTeX: v$. This is also reflected in the objective function with an added cost term, say $LaTeX: F(v)$, such that $LaTeX: F(0)=0$ and $LaTeX: F$ increasing on $LaTeX: [0, 1]$. If fractional expansion of the capacity is not possible (i.e., one builds a new plant or not), the model further requires $LaTeX: v$ to be binary-valued (0 or 1), and $LaTeX: F(1)$ is called the fixed charge.

# Capital budgeting problem

In its elementary form there is a fixed amount of capital, say $LaTeX: C$, that can be allocated to any of $LaTeX: n$ investments. Each investment has a minimum level, say $LaTeX: L_j$, and a maximum level, say $LaTeX: U_j$. The expected return on investment is a function, $LaTeX: v_j(x_j)$, where $LaTeX: x_j$ is the level of the $LaTeX: j$-th investment opportunity ($LaTeX: L_j \le x_j \le U_j$). Risk is measured by a standard deviation from the expected return, say $LaTeX: s_j(x_j)$. The problem is to maximize total expected return, subject to a budget constraint: $LaTeX: \sum_j x_j \le C$, and a risk constraint: $LaTeX: \sum_j \left( v_j(x_j) + a_j s_j(x_j) \right) \le b$, where $LaTeX: a_j$ and $LaTeX: b$ are parameters. The returns on the investments could be correlated. Then, if $LaTeX: Q$ is the variance-covariance matrix, the risk constraint is quadratic: $LaTeX: v^Tx + x^T Q x \le b$. (Also see the portfolio selection problem.)

# Caratheodory conditions

For the classical Lagrange form, $LaTeX: \min \{f(x): x \in \mathbb{R}^n, \; h(x) = 0 \}$, where $LaTeX: f$ and $LaTeX: h$ are smooth, the following conditions are necessary for a feasible $LaTeX: x$ to be optimal: there exists $LaTeX: (y_0, y) \in \mathbb{R}^{m+1}\backslash \{0\}$, called multipliers, such that

$LaTeX:

y_0 \nabla f(x) - y^T \nabla h(x) = 0.

$

This reduces to the Lagrange Multiplier Rule when $LaTeX: y_0$ is not zero (divide by $LaTeX: y_0$), which must be the case if $LaTeX: \nabla h(x)$ has full row rank.

# Caterer problem

A caterer has booked his services for the next $LaTeX: T$ days. He requires $LaTeX: r_t$ fresh napkins on the t-th day, $LaTeX: t=1,\ldots,T$. He sends his soiled napkins to the laundry, which has 3 speeds of service: $LaTeX: s=1, 2$, or $LaTeX: 3$ days. The faster the service, the higher the cost, $LaTeX: c_s$, of laundering a napkin. He can also purchase new napkins at a cost, $LaTeX: c_0$. With an initial stock of $LaTeX: N$ napkins, the caterer wishes to minimize his total cost. (This can be formulated as a transportation problem.)

# Cauchy-Schwarz inequality

This is the inequality

$LaTeX: |x^T y| \le \|x\| \|y\|,$

where $LaTeX: \|*\|$ is the Euclidean norm. (Note: for $LaTeX: \|x\| = \|y\| = 1$, $LaTeX: x^T y$ is the cosine of the angle between vectors $LaTeX: x$ and $LaTeX: y$.)

# Ceiling

The integer round-up of a value, $LaTeX: x$:

$LaTeX: \mbox{Ceiling}(x) = \min \{ z : z \in \mathbb{Z}, \; z \ge x \}.

$

Examples: $LaTeX: \mbox{Ceiling}(5)=5$, $LaTeX: \mbox{Ceiling}(5.001)=6$, $LaTeX: \mbox{Ceiling}(-1.2) = -1$. Also, see floor.

# Central path

A differentiable curve where each point is the analytic center of a polyhedron associated with a linear program. Specifically, suppose we have the primal-dual pair:

$LaTeX: P: \; \max \left\{c^Tx : x \ge 0, \; Ax \le b \right\}.$
and
$LaTeX: \; D: \; \min \left\{ y^Tb : y \ge 0, \; A^Ty \ge c \right\}.$

Let $LaTeX: s = b - Ax$ be the primal slack. Then, the primal central path is:

$LaTeX: x^*(\mu) = \arg\max \left\{c^Tx + \mu \left(\sum_j \ln(x_j) + \sum_i \ln(s_i)\right) : x > 0, \; Ax + s = b, \; s > 0 \right\},$
for $LaTeX: \mu > 0$, where $LaTeX: \mu$ is the penalty parameter. Let $LaTeX: d = A^Ty - c$ be the dual surplus. Then, the dual central path is:

$LaTeX: y^*(\mu) = \arg\min \left\{ y^Tb - \mu \left(\sum_j \ln(d_j) + \sum_i \ln(y_i)\right): y > 0, \; A^Ty - d = c, \; d > 0 \right\},$
for $LaTeX: \mu > 0$. A key fact is that if the LP optimality region is nonempty and bounded, then the central path converges to its analytic center as $LaTeX: \mu \downarrow 0$.

# Certificate

A set of sufficient conditions for some property to hold.

A certificate of optimality is a set of conditions that certify some feasible solution, $LaTeX: x$, is optimal. One example comes from duality: let $LaTeX: y$ be feasible in any dual with objective value $LaTeX: F(y)$. Then, $LaTeX: F(y)=f(x)$ is a certificate of optimality. In particular, if we have a linear program:

$LaTeX: \max \{ c^Tx : Ax = b, \; x \ge 0 \},$

a certificate that a feasible $LaTeX: x$ is optimal is the set of dual conditions:

$LaTeX: A^Ty \ge c \;\;\mbox{ and }\;\; c^Tx = y^T b.$

(Note the general use of duality does not require any convexity assumption, and the dual can be weak.)

A certificate of unboundedness is a collection of conditions that certify a mathematical program is unbounded. An example from duality is that a primal feasible solution is found and the dual is determined to be infeasible. In particular, if we have a linear program:

$LaTeX: \max \{c^Tx : Ax = b, \; x \ge 0 \},$

a certificate that this is unbounded is the existence of a feasible $LaTeX: x$ and the determination that $LaTeX: A^Ty \ge c$ implies a contradiction.

A certificate of infeasibility is a set of conditions that certify a mathematical program is infeasible. One class comes from duality: a dual sequence is found whose objective diverges. In particular, if we have a linear program:

$LaTeX: \max \{ c^T x : Ax = b, \; x \ge 0\},$

then a certificate that this is infeasible is the existence of a sequence $LaTeX: y^k$ such that $LaTeX: A^T y^k \ge c$ for all $LaTeX: k$ and $LaTeX: b^T y^k \rightarrow -\infty$ as $LaTeX: k \rightarrow \infty$.

# Character of solution

The idea is to define character by solution status. In linear [integer] programming, columns might be partitioned into

• basic vs. nonbasic at Lower bound vs. nonbasic at Upper bound
• positive in some optimal solution vs. zero in every optimal solution

Rows might be partitioned into

• inactive vs. active
• non-binding vs. binding
• positive dual price in some optimal solution vs. zero dual price in every optimal solution

Only some rows and columns might be partitioned, and others are free to change. The character is the property we want to fix by the partition. For example, if the LP has activities for operating a plant, we might want to specify the plant operation as a character of the solution, never mind specific levels (or maybe some minimal throughput is specified).

Then, one conducts sensitivity analysis by asking for preservation of the character. Classical LP sensitivity analysis does this by preserving the optimality of the basis; interior point analysis seeks to preserve the optimality of the partition. Both are special cases, and one might want to focus on only a portion of the LP. Here are some properties of interest:

1. If the character holds at [c', b'] and at [c", b"], it holds for all [c, b] on the line segment joining them.
2. The stability region is a convex set.

This notion of character extends naturally to combinatorial optimization, such as preserving a key portion of a TSP tour or job-order in a schedule.

# Characteristic cone

The characteristic cone of a polyhedron, $LaTeX: P$, is

$LaTeX: \{y: x + y \in P \;\mbox{ for all }\; x \in P\}.$

If $LaTeX: P = \{x : Ax \le b\}$, then its characteristic cone is $LaTeX: \{y: Ay \le 0\}$, which is the same as the recession cone.

# Chemical equilibrium problem

The problem is

$LaTeX: \min \left\{ c^Tx + \sum_j \left(x_j \ln(x_j) \right) : x > 0, \; \sum\limits_j x_j = M, \; Ax=b \right\},$
where the objective is the Gibbs free energy function with $LaTeX: x_j$ being the number of moles of species $LaTeX: j$, $LaTeX: b_i$ being the atomic weight of atom of type $LaTeX: i$, and $LaTeX: A_{i,j}$ being the number of atoms of type $LaTeX: i$ in one mole of species $LaTeX: j$. The equation, $LaTeX: Ax=b$, is mass balance. (Its significance is that it was an early application to show how an LP approach can apply to a nonlinear mathematical program.)

# Chinese postman problem

A mailman starts from the post office, delivers the mail on his route, covering each street at least once, then returns to the post office. The problem is to find a traversal such that the total distance travelled is minimized. (Some streets are one-way, and the postman is driving.) Abstractly, the problem is to find a least-cost cycle in a graph that uses each edge and arc (forward direction only) at least once. This differs from the Traveling Salesman Problem in that the postman can traverse an edge more than once. A related problem is to determine whether the postman can cover his route within a prescribed distance (while starting and ending at his post office). If so, the cycle is produced. This is computationally equivalent to the minimization problem (in a complexity sense), and is sometimes taken as the problem definition.

# Chinese remainder theorem

Let $LaTeX: m_1, m_2, \ldots, m_k$ be relatively prime positive integers, and let $LaTeX: b_1, b_2, \ldots, b_k$ be any integers. Then, there exists $LaTeX: x$ such that $LaTeX: x \equiv b_i \pmod{m_i}$ for each $LaTeX: i$. The value of $LaTeX: x$ is unique modulo the product $LaTeX: m_1 m_2 \cdots m_k$.
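The standard constructive proof yields a short implementation. The sketch below builds the solution from modular inverses (Python's three-argument `pow` computes a modular inverse for exponent -1 since Python 3.8):

```python
from math import prod

def crt(residues, moduli):
    """Solve x = b_i (mod m_i) for pairwise relatively prime moduli.

    Returns the unique solution modulo the product of the moduli.
    """
    M = prod(moduli)
    x = 0
    for b, m in zip(residues, moduli):
        Mi = M // m                       # product of the other moduli
        x += b * Mi * pow(Mi, -1, m)      # Mi * inv(Mi) = 1 (mod m)
    return x % M

# the classical instance from Sunzi: x = 2 (mod 3), 3 (mod 5), 2 (mod 7)
x = crt([2, 3, 2], [3, 5, 7])   # x = 23
```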

# Cholesky factorization

Given an $LaTeX: m \times m$ symmetric matrix $LaTeX: A$, a lower triangular matrix, $LaTeX: L$, is obtained, called the Cholesky factor, such that $LaTeX: A = LL^T$. This is particularly useful for solving linear systems, $LaTeX: Ax=b$, by using forward substitution, $LaTeX: Ly=b$, then backward substitution, $LaTeX: L^T x = y$. The original algorithm assumes $LaTeX: A$ is positive definite, but it can apply more generally. This is also called Cholesky decomposition.
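A minimal dense sketch of the factorization and the two triangular solves, for small positive definite systems (production codes use blocked or sparse variants):

```python
def cholesky(A):
    """Return lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (A[i][i] - s) ** 0.5
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve A x = b given A = L L^T: forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):                       # forward:  L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):             # backward: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

A = [[4.0, 2.0], [2.0, 3.0]]
x = solve_cholesky(cholesky(A), [8.0, 7.0])   # solves A x = b
```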

# Chvatal cut

A cutting plane of the form,

$LaTeX: \sum_j F(A_j) x_j \le F(b)$,

where $LaTeX: F$ is a Chvatal function.

# Chvatal function

This class is defined recursively:

• $LaTeX: F$ is a Chvatal function (of rank 0) if it is a linear function.
• If $LaTeX: F$ and $LaTeX: G$ are Chvatal functions, then $LaTeX: aF+bG$ is a Chvatal function for any nonnegative $LaTeX: a$ and $LaTeX: b$, and its rank is $LaTeX: \max \{\mbox{rank}(F), \mbox{rank}(G)\}$. (Some require $LaTeX: a$ and $LaTeX: b$ to be rational.)
• If $LaTeX: F$ is a Chvatal function of rank $LaTeX: r$, then $LaTeX: \lceil F \rceil$ (the round-up of $LaTeX: F$) is a Chvatal function, and its rank is $LaTeX: r+1$ if $LaTeX: \lceil F \rceil$ is not equal to $LaTeX: F$.

This arises in integer linear programming in several ways, see Gomory function.

# Closed form solution

This is where one could express a solution (e.g., optimal) by an equation of the form:

$LaTeX: x^* = F(p),$

where $LaTeX: F$ is some function of the parameters, $LaTeX: p$. This is typically meant to suggest that no algorithm is needed, but one must be careful since $LaTeX: F$ is not limited in any way, such as being "easy" to evaluate. For example, under mild assumptions, the coordinates of a global maximum of a function, $LaTeX: f$, on the domain $LaTeX: X$, are given by the following integral:

$LaTeX: \displaystyle x^*_j = \lim_{w \rightarrow \infty} \frac{\int_{X} x_j e^{wf(x)} dx} {\int_{X} e^{w f(x)} dx}.$

(The idea is to weight "mass" at the global maximum, and it is valid if the set of global maxima is convex.) This is not too useful, and the integral would typically have to be evaluated with a numerical method - i.e., an algorithm.
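For illustration, the limit can be approximated numerically by fixing a large weight $LaTeX: w$ and discretizing the integrals; the function, weight, and grid below are arbitrary choices:

```python
import math

def weighted_argmax(f, w, n=2000):
    # Approximate the weighted-integral formula for the argmax of f on
    # [0, 1] with a finite w, using a simple uniform-grid discretization
    # (the grid spacing cancels in the ratio of the two sums).
    xs = [i / n for i in range(n + 1)]
    ws = [math.exp(w * f(x)) for x in xs]
    return sum(x * wt for x, wt in zip(xs, ws)) / sum(ws)

f = lambda x: -(x - 0.7) ** 2  # unique global maximum at x = 0.7
x_star = weighted_argmax(f, w=200.0)
print(x_star)  # close to 0.7
```

Note that the "closed form" still had to be evaluated by an algorithm, which is the point made above.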

# Closed function

In convex analysis, this is a function whose epigraph is a closed set. (If $LaTeX: f$ is concave, it is its hypograph that must be closed.)

# Closed map

Let $LaTeX: A: X \rightarrow 2^X$ be a point-to-set map and suppose $LaTeX: x^k \rightarrow x$ and $LaTeX: y^k \rightarrow y$ are such that $LaTeX: y^k \in A(x^k)$ for all $LaTeX: k$. Then, $LaTeX: A$ is closed at $LaTeX: x$ if $LaTeX: y \in A(x)$. The map $LaTeX: A$ is closed on a subset of $LaTeX: X$, say $LaTeX: S$, if $LaTeX: A$ is closed at each $LaTeX: x$ in $LaTeX: S$.

# Closed set

A set is closed if it contains all its limit points, i.e., if $LaTeX: x^k \rightarrow x$ and each $LaTeX: x^k$ is in $LaTeX: X$, then $LaTeX: x$ is in $LaTeX: X$. By the logic of the definition, the empty set is closed (it is also open). Any intersection of closed sets is a closed set, so the feasibility region of a mathematical program in standard form is closed if each constraint function is continuous. ($LaTeX: g$ need only be lower semi-continuous for the constraint $LaTeX: g(x) \le 0$.) An example of a set that is not closed is $LaTeX: \{x: x < 0\}$. The point 0 can be approached by a sequence in the set, say $LaTeX: x^k = -1/k$, but 0 is not in the set. The closure of a set is the union of the set and all of its limit points.

# Closure condition

The condition that the closure of a nonempty strict interior equal the level set: $LaTeX: \mbox{cl}(\{x \in X: g(x) < 0\}) = \{x \in X: g(x) \le 0\}$. Here is an example where the closure condition fails:

$LaTeX: g(x) = \left\{ \begin{array}{cl} -1-x, & x < -1 \\ 0, & -1 \le x < 0 \\ -1 + (x-1)^2, & 0 \le x < 2 \\ x-2, & 2 \le x. \end{array} \right.$

Note that $LaTeX: g$ is continuous and quasiconvex. Its level set is $LaTeX: \{ x: g(x) \le 0 \} = [-1, 2]$, but the strict interior is $LaTeX: \{x : g(x) < 0\} = (0, 2)$, so its closure is only $LaTeX: [0, 2]$. We lose the flat portion, $LaTeX: [-1, 0)$.

This is important in the use of interior point methods and in stability. Here are some functions that satisfy the closure condition:

When equality constraints are present, there are two forms of extension of the closure condition: to consider the relative strict interior, and to consider "feasible sequences". The first generally assumes $LaTeX: h$ is affine, so the closure condition becomes:

$LaTeX: \mbox{cl}( \{ x: Ax=b, \; g(x) < 0 \}) = \{ x: Ax=b, g(x) \le 0\}.$
The second defines a family of maps:

$LaTeX: I_i^- = \{ x \in X: g(x) < 0, \; h_k(x) = 0 \; \mbox{ for } k \neq i, \; h_i(x) < 0 \}$

and

$LaTeX: I_i^+ = \{ x \in X : g(x) < 0, \; h_k(x) = 0 \; \mbox{ for } k \neq i, \; h_i(x) > 0 \}.$

Then, the closure condition is that $LaTeX: I_i^-$ and $LaTeX: I_i^+$ are not empty, and

$LaTeX: \mbox{cl}(I_i^-) = \{ x \in X : g(x) \le 0, \; h_k(x) = 0 \; \mbox{ for } k \neq i, \; h_i(x) \le 0 \}$

and

$LaTeX: \mbox{cl}(I_i^+) = \{ x \in X : g(x) \le 0, \; h_k(x) = 0 \; \mbox{ for } k \neq i, \; h_i(x) \ge 0 \}.$

# Coercive function

The function $LaTeX: f:\mathbb{R}^n \rightarrow \mathbb{R}^n$ is coercive with respect to $LaTeX: X \subseteq \mathbb{R}^n$ if there exists a vector $LaTeX: x^0 \in X$ such that for any sequence $LaTeX: x^k \in X$ having the property that $LaTeX: \|x^k \| \rightarrow \infty$, we also have

$LaTeX: \lim_{k \rightarrow \infty} | (f(x^k) - f(x^0))^T (x^k - x^0) | = \infty,$
where any norm can be used. This arises in variational inequalities (and complementarity problems).

Some people use a different definition, where $LaTeX: f : \mathbb{R}^n \rightarrow \mathbb{R}$ (i.e., a scalar, real-valued function):

$LaTeX: |f(x)| / \| x \| \rightarrow \infty, \;\mbox{ as }\; \|x\| \rightarrow \infty$
Note that the two definitions differ, even for $LaTeX: n=1$. For example, $LaTeX: f(x)=|x|$ is coercive under the first definition and is not under the second. For the bilinear function, $LaTeX: f(x,y)=x^TAy$, for some square matrix $LaTeX: A$, $LaTeX: f$ is coercive if there exists $LaTeX: a > 0$ such that
$LaTeX: y^TAy \ge a\|y\|^2.$

# Column generation

This pertains to solving a linear program whose columns are generated during pricing. Typically, the number of columns is astronomically large, possibly infinite. An example is when solving the randomized program, as with the Generalized Lagrange Multiplier method. In that case, column generation consists of maximizing the Lagrangian. A similar view applies to Dantzig-Wolfe decomposition. From the dual view, this is a cutting plane method since generating a column in the primal corresponds to generating a constraint in the dual.

The concept applies to any mathematical program, and the randomized program model highlights the duality and how a completely general mathematical program can be considered with (generalized) linear programming.

# Combinatorial program

Let $LaTeX: N = \{1, ..., n\}$ and consider a finite collection of subsets, say $LaTeX: \{S_1, S_2, ..., S_m\}$. For each subset there is an objective function value, $LaTeX: f(S_k)$, and the problem is to optimize $LaTeX: f(S_k)$. Typically, the feasible subsets are represented by inclusion or exclusion of members such that they satisfy certain conditions. This then becomes a special class of integer programs whose decision variables are binary valued: $LaTeX: x_{i,k}=1$ if the i-th element is in $LaTeX: S_k$; otherwise, $LaTeX: x_{i,k}=0$. (IP formulations are not always easy, and often there is more than one formulation, some better than others.)

Also see integer program.

# Compact formulation

In integer programming, this refers to having a polynomial number of constraints. For example, look at the travelling salesman formulations. The linear form has an exponential number of "subtour elimination constraints," so it is not compact.

# Compact set

This has a general meaning in mathematical analysis. For mathematical programming, where we work in finite-dimensional Euclidean vector spaces, it means the set is closed and bounded. This is often an assumption about the feasibility region in order to ensure the existence of an optimum value for a continuous objective function – see Weierstrass' theorem. Any finite set is compact. An open interval $LaTeX: (a,b)$ is not compact because its endpoints, $LaTeX: a$ and $LaTeX: b$ are limit points but are excluded from the set.

# Compatibility theory

The idea that a solution's character does not change for a particular perturbation. In linear programming the character could be an optimal basis, and the theory is concerned with whether a particular basis remains optimal when the data is changed in a prescribed direction. A Fundamental Theorem of Basis Compatibility is the following:

$LaTeX: h$ is an admissible direction for perturbing $LaTeX: (b, c)$ if, and only if, it is compatible with some equilibrium basis.

The range of compatibility of a basis, $LaTeX: B$, for a direction, $LaTeX: h$, is the greatest step for which $LaTeX: B$ remains optimal:

$LaTeX: \sup \{t : B \;\mbox{ is optimal for the LP defined by }\; r + th\}.$

The basis spectrum is the greatest range:

$LaTeX: \sup\{ \mbox{range}(B; h) : B \;\mbox{ is optimal for the LP defined by }\; r\}.$

# Complementarity condition

Satisfying complementary slackness.

# Complementarity problem

Let $LaTeX: F: \mathbb{R}^n \rightarrow \mathbb{R}^n$. The complementarity problem (CP) is to find $LaTeX: z \ge 0$ such that $LaTeX: F(z) \ge 0$ and $LaTeX: z^T F(z) = 0$. It is complementary because every solution has the property that either $LaTeX: z_j=0$ or $LaTeX: F_j(z)=0$ (or both) for each $LaTeX: j$. The linear complementarity problem (LCP) is when $LaTeX: F(z)=Mz+q$.

The problem generalizes to allow bounds so that $LaTeX: L \le z \le U$. Then, $LaTeX: F(z)$ is required to satisfy:

• $LaTeX: F_j(z) \ge 0$ if $LaTeX: z_j = L_j$;
• $LaTeX: F_j(z) \le 0$ if $LaTeX: z_j = U_j$;
• $LaTeX: F_j(z) = 0$ if $LaTeX: L_j < z_j < U_j$.
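A Python sketch that checks these bounded complementarity conditions at a candidate point (the LCP data and tolerance are illustrative):

```python
def is_complementary(z, F, L, U, tol=1e-9):
    # Check the bounded complementarity conditions at z: F_j(z) >= 0 where
    # z_j is at its lower bound, F_j(z) <= 0 at its upper bound, and
    # F_j(z) = 0 where z_j is strictly between its bounds.
    Fz = F(z)
    for zj, fj, lj, uj in zip(z, Fz, L, U):
        if abs(zj - lj) <= tol:
            if fj < -tol:
                return False
        elif abs(zj - uj) <= tol:
            if fj > tol:
                return False
        elif abs(fj) > tol:
            return False
    return True

# LCP example: F(z) = Mz + q with L = 0 and U = +infinity.
M = [[2.0, 1.0], [1.0, 2.0]]
q = [-5.0, -6.0]
F = lambda z: [sum(M[i][j] * z[j] for j in range(2)) + q[i] for i in range(2)]
# z = (4/3, 7/3) gives Mz + q = 0, hence solves the LCP.
z = [4.0 / 3.0, 7.0 / 3.0]
ok = is_complementary(z, F, [0.0, 0.0], [float("inf")] * 2)
print(ok)  # True
```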

# Complementary slackness

The condition that two non-negative vectors are orthogonal. It arises in the Kuhn-Tucker conditions, where the Lagrange multiplier, say $LaTeX: y$, must be orthogonal to the (inequality) constraint functional value: $LaTeX: y^T g(x)=0$. This means either $LaTeX: y_i=0$ or $LaTeX: g_i(x)=0$ for each $LaTeX: i$ -- that is, if a constraint is not active, its Lagrange multiplier must be zero.

# Complementary variables

If $LaTeX: u$ and $LaTeX: v$ are the two vectors that satisfy complementary slackness, each pair, $LaTeX: u_i$ and $LaTeX: v_i$ are called complementary variables. In linear programming, primal levels and dual prices are complementary variables.

# Complexity

A measure of computer time or space to solve a problem by an algorithm as a function of the problem's dimensions. Suppose $LaTeX: T(n)$ is the time it takes to solve an instance of the problem with dimension $LaTeX: n$. Then, the algorithm has (worst-case) time complexity $LaTeX: K(n)$, if the greatest time it could take to solve an instance of the problem is $LaTeX: O(K(n))$. When $LaTeX: K(n)$ is a polynomial, we say the algorithm has polynomial time complexity. The Klee-Minty polytope shows that the elementary simplex method does not have polynomial time complexity.

The average time complexity is the average (rather than worst) time of an algorithm (or class of algorithms) over some class of problems, for which some probability distribution is assumed. Absent a modifier, the complexity of an algorithm is taken to mean its worst-case time complexity.

Whereas complexity refers to the performance of an algorithm, see the notion of NP-completeness for the related meaning of problem complexity. The standard notation to describe complexity in terms of problem dimensions, say $LaTeX: n$, is $LaTeX: O(K(n))$. This big-O notation means the following: a function, $LaTeX: f:\mathbb{Z}^+ \rightarrow \mathbb{R}$, is $LaTeX: O(K(n))$ if there exists a constant, $LaTeX: c$, and $LaTeX: N$ in $LaTeX: \mathbb{Z}^+$, such that $LaTeX: f(n) \le cK(n)$ for all $LaTeX: n \ge N$. For example, if an algorithm requires $LaTeX: 5n^3 + 2n + 10$ fundamental operations on a problem of size $LaTeX: n$, its time complexity is $LaTeX: O(n^3)$. See the supplement on complexity for more information.
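The big-O definition can be checked directly for the operation count above; the witness constants below were chosen by inspection:

```python
# Empirical check that f(n) = 5n^3 + 2n + 10 is O(n^3), witnessed by
# c = 6 and N = 3: we need n^3 >= 2n + 10, which holds from n = 3 on.
f = lambda n: 5 * n**3 + 2 * n + 10
c, N = 6, 3
holds = all(f(n) <= c * n**3 for n in range(N, 1000))
print(holds)  # True
```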

# Complicating variables

Those variables which, when fixed at particular values, render the remaining mathematical program relatively easy to solve. One example is the use of binary variables for fixed charges in an otherwise linear program. More generally, see Benders' decomposition.

# Component

In a graph (or network), this is a maximal connected subgraph. (The number of components is 1 if the graph is connected.)
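A Python sketch that counts the components of an undirected graph with union-find (node labels and edges are illustrative):

```python
def count_components(n, edges):
    # Count connected components of an undirected graph on nodes 0..n-1
    # using union-find with path halving.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv  # merge the two components
    return len({find(i) for i in range(n)})

# Two components: {0, 1, 2} and {3, 4}.
print(count_components(5, [(0, 1), (1, 2), (3, 4)]))  # 2
```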

# Composite concave program

A global optimization problem whose objective is a composite concave function.

# Composite function

A function of a function, of the form $LaTeX: g(f(x))$.

# Concave function

The real valued function $LaTeX: f$ is concave if its domain, say $LaTeX: X$, is a convex set, and for $LaTeX: x, y \in X$ and $LaTeX: 0 \le a \le 1$:

$LaTeX: f(ax + (1-a)y) \ge af(x) + (1-a)f(y).$
Equivalently, $LaTeX: f$ is concave if its hypograph is a convex set or if its negative is a convex function.

# Condition number

This is $LaTeX: \|A\| \|A^{-1}\|$ when $LaTeX: A$ is nonsingular and $LaTeX: \|\cdot\|$ is some matrix norm. This arises in convergence analysis, where $LaTeX: A$ is the hessian. Whenever $LaTeX: A$ is symmetric and positive definite (as the hessian of a strictly convex function), and the matrix norm is that induced by the Euclidean norm (i.e., $LaTeX: \|A\| := \max \{\|Ay\|: \|y\|=1\}$), the condition number is the ratio of its largest to smallest eigenvalue. It is often used to measure the convergence rate of an ascent algorithm, due to Kantorovich's inequality. (Unfortunately, it is erroneously used to measure the goodness of an algorithm – see Myths and Counterexamples in Mathematical Programming.)
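For the symmetric positive definite case, the eigenvalue ratio can be computed directly; this sketch handles only $LaTeX: 2 \times 2$ matrices, using their closed-form eigenvalues:

```python
import math

def cond_spd_2x2(A):
    # Condition number of a symmetric positive definite 2x2 matrix in the
    # Euclidean-induced norm: the ratio of largest to smallest eigenvalue.
    # Eigenvalues of [[a, b], [b, c]] are (a+c)/2 +/- sqrt(((a-c)/2)^2 + b^2).
    (a, b), (_, c) = A
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return (mean + radius) / (mean - radius)

# A well-conditioned and an ill-conditioned hessian:
print(cond_spd_2x2([[2.0, 0.0], [0.0, 1.0]]))    # 2.0
print(cond_spd_2x2([[100.0, 0.0], [0.0, 1.0]]))  # 100.0
```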

# Cone

A set, $LaTeX: C$, with the property that if $LaTeX: x \in C$, then $LaTeX: a x \in C$, for all positive real $LaTeX: a$. A convex cone is a cone that is also a convex set. Equivalently, $LaTeX: C$ is a convex cone if $LaTeX: C = C + C$. (An example of a cone that is not convex is the union of the axes.) A polyhedral cone is a cone that is also a polyhedron; equivalently, $LaTeX: C$ is a polyhedral cone if there exists a matrix $LaTeX: A$ such that $LaTeX: C = \{ x : Ax \le 0\}$. An example of a cone that is not polyhedral is $LaTeX: \{(x,y,z): x^2+y^2-z^2=0\}$.

A quadratic cone is of the form $LaTeX: \{x: x^T Q x \le 0\}$, where $LaTeX: Q$ is any (real) matrix. If $LaTeX: Q$ is negative semi-definite, the cone is all of $LaTeX: \mathbb{R}^n$ space. If $LaTeX: Q$ is positive definite, the cone is just the origin, $LaTeX: \{0\}$. So, quadratic cones usually arise when $LaTeX: Q$ is indefinite. Example: $LaTeX: \{x : x \ge 0, \; 2x_1x_2 \ge \|(x_3,...,x_n)\|^2\}$. See each of the following special cones:

# Cone of optimality

Given an LP in standard form, let $LaTeX: B$ be an optimal basis (for which the dual is feasible). Then, the cone of optimality of $LaTeX: B$ is $LaTeX: \{b : B^{-1}b \ge 0\}$. It is so called because $LaTeX: B$ remains an optimal basis for any $LaTeX: b$ in this cone.

# Conic program

The standard form is

$LaTeX: \min \{ c^T x : Ax=b, \; x \in K\}$,

where $LaTeX: K$ is a cone (not necessarily convex). This is more general than it looks. Suppose $LaTeX: S$ is any non-empty set, and we have the mathematical program:

$LaTeX: \min \{ c^T x : x \in S\}$.

Then, defining $LaTeX: K = \{t(1,x): x \in S, \; t \ge 0\}$, we have that the above problem is equivalent to the conic program:

$LaTeX: \min \{ c^Tx : (x_0, x) \in K, \; x_0 = 1 \}$.

This problem class also includes the semi-definite program, as $LaTeX: x$ could be a matrix.

# Conjugate directions

Direction vectors chosen in an algorithm that are conjugate with previous directions, with respect to the hessian of the objective. For the unconstrained quadratic, $LaTeX: x^T Qx + c^Tx$ (where $LaTeX: Q$ is symmetric), direction vector $LaTeX: d$ is conjugate if $LaTeX: d^T Q d^k = 0$ for all previous direction vectors, $LaTeX: d^k$.
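One way to generate conjugate directions is Gram-Schmidt orthogonalization in the $LaTeX: Q$-inner product; a Python sketch (the matrix and starting vectors are illustrative):

```python
def q_conjugate_directions(Q, vectors):
    # Gram-Schmidt with the Q-inner product <u, v> = u^T Q v: produces
    # directions that are mutually conjugate with respect to Q.
    def qdot(u, v):
        return sum(u[i] * Q[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    dirs = []
    for v in vectors:
        d = list(v)
        for p in dirs:
            # Subtract the Q-projection of v onto each earlier direction.
            coef = qdot(v, p) / qdot(p, p)
            d = [di - coef * pi for di, pi in zip(d, p)]
        dirs.append(d)
    return dirs

Q = [[4.0, 1.0], [1.0, 3.0]]  # symmetric positive definite
d0, d1 = q_conjugate_directions(Q, [[1.0, 0.0], [0.0, 1.0]])
# d0^T Q d1 should vanish:
check = sum(d0[i] * Q[i][j] * d1[j] for i in range(2) for j in range(2))
print(abs(check) < 1e-12)  # True
```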

# Conjugate duality

See the supplement on particular duals.

# Conjugate function

The convex conjugate of $LaTeX: f$ on $LaTeX: X$, denoted $LaTeX: f^*$ on $LaTeX: X^*$, is the greatest convex approximation from below:

$LaTeX: f^*(x^*) = \sup \{ x^T x^* - f(x): x \in X\}$,

and $LaTeX: X^* = \{x^*: f^*(x^*) < \infty\}$ (i.e., $LaTeX: X^*$ is the effective domain of $LaTeX: f^*$). The concave conjugate of $LaTeX: f$ on $LaTeX: X$, denoted $LaTeX: f^{\wedge}$ on $LaTeX: X^{\wedge}$, is the least concave approximation from above:

$LaTeX: f^{\wedge}(x^*) = \inf\{ x^T x^* - f(x): x \in X\}$,

and $LaTeX: X^{\wedge} = \{x^*: f^{\wedge}(x^*) > -\infty\}$. This is a foundation for Lagrangian duality, viewed in response space.

# Conjugate gradient method

An ascent method for unconstrained optimization such that successive directions are conjugate with respect to the hessian for a quadratic objective function.

# Conjugate vectors

With respect to a symmetric matrix, $LaTeX: A$, $LaTeX: u$ and $LaTeX: v$ are conjugate if $LaTeX: u^TAv = 0$. (In particular, if $LaTeX: A=I$, $LaTeX: u$ and $LaTeX: v$ are conjugate if they are orthogonal.)

# Conjunctive normal form

A logical expression that is written as a conjunction,

$LaTeX: C_1 \wedge C_2 \wedge \ldots \wedge C_m$,

where $LaTeX: \wedge$ is the conjunction (logical and). Each $LaTeX: C_i$ is called a clause and has the form

$LaTeX: L_{i_1} \vee L_{i_2} \vee \ldots \vee L_{i_n}$,

where $LaTeX: \vee$ is the disjunction (logical inclusive or) and each $LaTeX: L_i$ is a literal -- a proposition or its negation.
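A Python sketch that evaluates a CNF formula and tests satisfiability by brute force (encoding literals as signed integers is a common convention; the formula is illustrative):

```python
from itertools import product

# A CNF formula as a list of clauses; each clause is a list of literals.
# A literal is a nonzero integer naming a proposition; a negative sign
# denotes negation. Example formula: (x1 or not x2) and (x2 or x3).
cnf = [[1, -2], [2, 3]]

def satisfies(assignment, cnf):
    # assignment maps proposition index -> bool; the conjunction holds
    # when every clause contains at least one true literal.
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in cnf)

# Brute-force satisfiability check over the 3 propositions:
sat = any(satisfies(dict(zip([1, 2, 3], bits)), cnf)
          for bits in product([False, True], repeat=3))
print(sat)  # True
```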

# Connected network

In the undirected case, a graph is connected if, for any pair of nodes, there is a path that contains both of them. In a directed graph, or network, the path may be required to be directed (i.e., follow the orientations of the arcs), in which case the network is strongly connected; or, the path may be allowed to ignore the arc orientations, in which case the network is weakly connected.

# Consistency

In general, a logical theory is consistent if it contains no contradictions.

In constraint programming, a (partial) variable assignment is called consistent with respect to a set of constraints if none of the constraints is violated.

More generally, in CP, different definitions of consistency are used as a basis for inference. See:

# Consistent

Pertains to a system of equations and inequalities, for which there exists a feasible solution.

# Constraint

Usually this is a relation of the form of an inequality, $LaTeX: g(x) \le 0$, or equation, $LaTeX: h(x) = 0$. More generally, it can be any restriction the decision variables must satisfy. For example, some regard "x must be integer-valued" as a constraint, while others would say that this is simply the domain of the decision variables in the mathematical program. There are other relations, such as the logical form of a precedence constraint: IF $LaTeX: x=0$ THEN $LaTeX: y=0$.

In constraint programming, a constraint is an arbitrary relation imposed on a set of variables $LaTeX: X = \{x_1, \dots, x_n\}$ where each variable must be assigned one value from its domain. A constraint holds if all variables in $LaTeX: X$ are assigned values from their domain and the relation is satisfied.

# Constraint graph

Constraint graphs are used to conceptualize the set of constraints and their scopes in a CSP.

A primal constraint graph represents variables by nodes and associates an arc with any two nodes that appear in the scope of the same constraint.

A dual constraint graph represents each constraint with a node and associates a labeled arc with any two nodes whose scopes share variables. The arcs are labeled by the shared variables.

# Constraint optimization problem

The general aim of a constraint optimization problem is to find an optimal solution to a set of constraints with respect to some objective function $LaTeX: obj$. For example, consider a CSP $LaTeX: P = \{ C, x_1 \in D_1, \dots, x_n \in D_n\}$ together with a function $LaTeX: obj : Sol \to \mathbb{R}$. $LaTeX: obj$ is a mapping from solutions to $LaTeX: P$ (i.e., all elements of the set $LaTeX: Sol$) to the set of real numbers, $LaTeX: \mathbb{R}$.

A solution, $LaTeX: d$, to a constraint optimization problem is a complete assignment of variables that satisfies all constraints $LaTeX: C \in P$ and for which the value $LaTeX: obj(d)$ is optimal (either minimal or maximal, depending on the sense of the optimization).

# Constraint programming

Constraint Programming (CP) is a paradigm to solve satisfaction or optimization problems by a two-step approach: propagation and tree search. Given a CSP the decision variables’ domains are tightened by first propagating the constraints. If after this step all domains contain only one value then a solution has been found. If at least one domain is empty, the CSP has no solutions. Otherwise, the second step, tree search, is initiated: by specifying a variable ordering heuristic and a value ordering heuristic, a search tree is constructed which is then traversed according to one of various search strategies. If the problem is a satisfaction problem, the first solution found is returned or it is proved that no solution exists. For an optimization problem the whole tree is traversed in a branch-and-bound search and the best solution returned.

# Constraint propagation

Constraint propagation is a general term for inference techniques used in constraint programming. The basic technique consists in explicitly forbidding values or combinations of values for some variables of a problem because a given subset of its constraints cannot be satisfied otherwise. Constraint propagation appears under different names depending on both historical period and author including constraint relaxation, constraint inference, local consistency enforcing, and rule iteration.

Typically, the propagation process is repeated until a "fixed point" is reached where no further values can be removed from the domain of any variable.
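A toy illustration of propagating to such a fixed point, pruning the bounds of two domains for the single constraint $LaTeX: x < y$ (real solvers propagate many constraint types through a queue; the function name and domains are illustrative):

```python
def propagate_less(dx, dy):
    # Prune the domains of x and y for the constraint x < y, repeating
    # until a fixed point: no value of x can reach max(dy) or beyond,
    # and no value of y can be at min(dx) or below.
    changed = True
    while changed:
        changed = False
        new_dx = [v for v in dx if v < max(dy)]
        new_dy = [v for v in dy if v > min(dx)]
        if new_dx != dx or new_dy != dy:
            dx, dy, changed = new_dx, new_dy, True
    return dx, dy

print(propagate_less([1, 2, 3], [1, 2, 3]))  # ([1, 2], [2, 3])
```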

# Constraint propagator

A constraint propagator is an algorithm (often associated with a global constraint) which implements a constraint propagation algorithm. Typically, the algorithm prunes the domains of the variables involved in the constraint, removing values for which it can be shown that they participate in no solutions given the current search node.

A constraint propagator is also known as a filtering algorithm.

# Constraint qualification

Conditions on the constraint functions ($LaTeX: g$ and $LaTeX: h$) that are sufficient to make the Lagrange Multiplier Rule valid. Here is an example to illustrate what can go wrong:

$LaTeX: \max x : x^2 \le 0$.

Since $LaTeX: x=0$ is the only feasible solution, it is optimal. The Lagrange Multiplier Rule requires that there exist $LaTeX: u$ for which $LaTeX: f' - u^T g' = 0$, but $LaTeX: f' = 1$ and $LaTeX: g'= 0$, so no such $LaTeX: u$ exists. Slater used this example in illustrating his interiority condition. The classical qualification, given by Lagrange's multiplier theorem without inequality constraints, is that $LaTeX: \nabla h(x)$ have full row rank, which stems from the Implicit Function Theorem. Another constraint qualification is that all constraint functions be affine (even with redundant constraints). Each constraint qualification gives a sufficient condition for the Lagrange Multiplier Theorem to be valid. A constraint qualification is necessary if it must hold in order to guarantee that the Lagrange Multiplier Rule is valid for all smooth $LaTeX: f$ having optimal value at $LaTeX: x$.

# Constraint satisfaction problem

A classical Constraint Satisfaction Problem (CSP) is defined by a finite set of variables $LaTeX: X = \{x_1, \dots, x_n\}$ where each variable $LaTeX: x_i$ is associated with a finite domain of integers $LaTeX: D_i= \{1, \dots, N_i\}$. Furthermore, a finite set of constraints $LaTeX: \mathcal{C} = \{C_1, \dots, C_m \}$ is imposed on the variables, where a constraint $LaTeX: C_i$ is an arbitrary relation on a subset of variables $LaTeX: X$. A solution to the CSP is a value assignment to each variable such that the value lies within the respective domain and all constraints are satisfied.
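A brute-force solver in a few lines of Python makes the definition concrete (illustrative only; practical solvers combine propagation with search):

```python
from itertools import product

def solve_csp(domains, constraints):
    # Brute-force CSP solver: domains maps each variable name to a list of
    # values; constraints is a list of (scope, predicate) pairs. Returns
    # the first complete assignment satisfying all constraints, else None.
    names = list(domains)
    for values in product(*(domains[v] for v in names)):
        assignment = dict(zip(names, values))
        if all(pred(*(assignment[v] for v in scope))
               for scope, pred in constraints):
            return assignment
    return None

domains = {"x": [1, 2, 3], "y": [1, 2, 3], "z": [1, 2, 3]}
constraints = [
    (("x", "y"), lambda x, y: x < y),
    (("y", "z"), lambda y, z: y != z),
    (("x", "z"), lambda x, z: x + z == 4),
]
sol = solve_csp(domains, constraints)
print(sol)  # {'x': 1, 'y': 2, 'z': 3}
```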

There exist different variants and sub-classes of constraint satisfaction problem:

# Constraint solver

A constraint solver typically consists of two parts:

A solution is found as soon as every variable is assigned a value from its domain and all constraints are satisfied.

A constraint solver can typically search for one solution (returning the first solution it finds, if one exists) or for all solutions.

# Continuous program

A mathematical program with continuous variables. Moreover, the term is commonly used to further imply that the objective and constraint functions are continuous, an assumption that allows certain analytical results. For example, Weierstrass first proved that continuous functions attain both a maximum and a minimum over a compact set. If the problem contains both continuous and integer variables, then the problem is said to be a mixed integer program.

# Contour

Also called an isoquant of a function, $LaTeX: f$. The projection of its graph into its domain: $LaTeX: \{x \in X : f(x) = c\}$, where $LaTeX: c$ is the contour value.

# Contraction map

Also called a contractor. A function that brings points closer together, forming the foundation for convergence by successive approximation. Mathematically, we have $LaTeX: f: X \rightarrow X$, where $LaTeX: X$ is in a normed space, with norm of $LaTeX: x$ denoted by $LaTeX: \|x\|$. $LaTeX: f$ is a contractor if there exists a constant $LaTeX: K$ in $LaTeX: [0,1)$ such that $LaTeX: \|f(x)-f(y)\| \le K \|x-y\|$ for all $LaTeX: x$ and $LaTeX: y$ in $LaTeX: X$. See the supplement on fixed points.
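A Python sketch of successive approximation with a contractor; $LaTeX: f = \cos$ is a contraction on $LaTeX: [0, 1]$ with constant $LaTeX: K = \sin(1) < 1$ (the tolerance and iteration cap are illustrative):

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    # Successive approximation x_{k+1} = f(x_k); converges to the unique
    # fixed point when f is a contraction on a set containing the iterates.
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x = fixed_point(math.cos, 0.5)
print(x)  # about 0.739085, the unique solution of cos(x) = x
```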

# Convergence

Specifically, convergence of an algorithm. The algorithm is represented as the point-to-set map, $LaTeX: x' \in A(x)$, where there is some selection function to choose $LaTeX: x'$ if $LaTeX: A(x)$ has more than one member. Convergence means that the sequence, $LaTeX: x^k$, has a limit point, say $LaTeX: x$, such that $LaTeX: x$ satisfies certain conditions. Ideally, the conditions are that $LaTeX: x$ is an optimal solution, at least locally, but this is often not the definition used in a convergence theorem. (For example, in unconstrained optimization, most algorithms converge to a stationary point, which need not be even a local optimum.)

Let $LaTeX: \textstyle s^k \rightarrow 0$, where $LaTeX: s^k$ is a scalar series pertaining to the sequence $LaTeX: x^k$. Typically, $LaTeX: s^k = \|x^k - x\|$, which is sometimes called policy convergence (to a solution, $LaTeX: x$). We could also have $LaTeX: s^k = f(x^k) - f^*$, where $LaTeX: f$ is the objective function, in which case the concern is with value convergence to an optimal value, $LaTeX: f^*$.

Some related terms and concepts:

• Dual convergence. Dual values, notably Lagrange multipliers, converge to a dual solution.
• Geometric convergence. Same as linear convergence, but usually used when the sequence is exactly a geometric progression: $LaTeX: s^k = r^k s^0$, where $LaTeX: r^k$ is $LaTeX: r$ to the power of $LaTeX: k$.
• Global convergence. Usually means the same as globally convergent, but some mean that the algorithm converges to a global solution.
• Globally convergent. Convergence to a solution from any starting point.
• Linear convergence. Order is 1 and rate is less than 1 (see below).
• Local convergence. Some mean locally convergent, but some mean that the limit point is a local optimum (or just satisfies necessary conditions, see Myths and Counter Examples in Mathematical Programming to avoid misconception).
• Locally convergent. There exists an open neighborhood of $LaTeX: 0$ such that $LaTeX: s^k \rightarrow 0$ from any $LaTeX: s^0$ in the neighborhood.
• Order of convergence. The order is $LaTeX: \sup\{ p : \lim_{k \rightarrow \infty} \|s^{k+1}\| / \|s^k\|^p < \infty \}$. The order being 1 is linear convergence and the order being 2 is quadratic convergence. Most (non-finite) algorithms to solve mathematical programs are between linear and quadratic.

Let $LaTeX: v^k = \|s^k\|$ and suppose $LaTeX: v^0 = 1$ (for notational convenience). The term order derives from the approximate equation, $LaTeX: v^{k+1} := r(v^k)^p$, where $LaTeX: r$ is the rate. If this equation were exact, we would have $LaTeX: v^k = r^k$ if $LaTeX: p=1$, and $LaTeX: v^k = r^{(p^k-1)/(p-1)}$ if $LaTeX: p > 1$, for all $LaTeX: k$. If $LaTeX: r=0.1$ and $LaTeX: p=1$, we gain one decimal place each time: $LaTeX: v^1 = 0.1$, $LaTeX: v^2 =0.01$, $LaTeX: v^3 = 0.001$, etc. If $LaTeX: p=2$, the number of accurate decimal places approximately doubles each iteration: $LaTeX: v^1 = 0.1$, $LaTeX: v^2 = 0.001$, $LaTeX: v^3 = 0.0000001$, etc. Unfortunately, the underlying equation is not exact – see Myths and Counter Examples in Mathematical Programming to avoid misconception. Some authors call this q-order (or Q-order) convergence to distinguish it from variations of the definition of order. Each definition is designed to capture the notion of how many digits of accuracy are added as the sequence approaches its limit.

• Rate of convergence. This is generally used to mean the limiting ratio: $LaTeX: \lim_{k \rightarrow \infty} \|s^{k+1}\| / \|s^k\|^p$, given the order is $LaTeX: p$.
• Sublinear convergence. Order is 1 and rate is 1 (slower than all linear convergence), e.g., $LaTeX: s^k = 1/k$.
• Superlinear convergence. Order is 1 and rate is 0 (faster than all linear convergence), e.g., $LaTeX: s^k = (1/k)^k$.
• Stochastic convergence. This applies when the successor point is a random variable, as in simulated annealing. Here are the most common types of convergence for a sequence of random variables, $LaTeX: X_n \rightarrow X$.
• with probability $LaTeX: 1$ or almost everywhere (abbr., a.e.). $LaTeX: P\{\lim_{n \rightarrow \infty} X_n = X\} = 1$.
• in probability. $LaTeX: P\{\|X_n - X\| > e\} \rightarrow 0$ for all $LaTeX: e > 0$.
• in distribution. The sequence of cumulative distribution functions (cdf), converges point-wise: $LaTeX: F_n(x) \rightarrow F(x)$ for all $LaTeX: x$ at which $LaTeX: F$ is continuous, where $LaTeX: F_n$ is the cdf of $LaTeX: X_n$ and $LaTeX: F$ is the cdf of $LaTeX: X$.
• in p-th mean. $LaTeX: E\{\|X_n - X\|^p\} \rightarrow 0$. For $LaTeX: p=2$, this is called convergence in quadratic mean or in mean square.
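The idealized recursion $LaTeX: v^{k+1} = r(v^k)^p$ discussed above can be simulated to see the digit-gaining behavior of linear versus quadratic convergence (a sketch; the rate and order values are those from the example):

```python
def simulate(r, p, iters):
    # Iterate v_{k+1} = r * v_k^p starting from v_0 = 1 and record each v_k.
    v, out = 1.0, []
    for _ in range(iters):
        v = r * v ** p
        out.append(v)
    return out

lin = simulate(0.1, 1, 3)   # roughly [0.1, 0.01, 0.001]: one digit per step
quad = simulate(0.1, 2, 3)  # roughly [0.1, 0.001, 1e-07]: digits double
print(lin)
print(quad)
```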

# Convex combination

An affine combination of vectors whose coefficients are non-negative, i.e., $LaTeX: \textstyle\sum_k a_k x^k$, where $LaTeX: a_k \ge 0$ and $LaTeX: \textstyle\sum_k a_k = 1$.

# Convex cost flow problem

The min-cost network flow problem with a nonlinear convex cost function.

# Convex function

Let $LaTeX: f:X\rightarrow R$. Then $LaTeX: f$ is convex if $LaTeX: X$ is a convex set, and for $LaTeX: x, y \in X$ and $LaTeX: 0 \le a \le 1$:

$LaTeX: f(ax + (1-a)y) \le af(x) + (1-a)f(y)$.

Equivalently, its epigraph is convex.

# Convex hull

(of a set). The intersection of all convex supersets (which can be limited to halfspaces). Equivalently, the set of all convex combinations of points in the set (which can be limited to convex combinations of at most n+1 points, in n dimensions, which is known as Carathéodory's Theorem).

# Convex program

The mathematical program $LaTeX: \min \{f(x) : x \in X \}$ is convex if the objective function $LaTeX: f$ is convex, and the feasible region $LaTeX: X$ is a convex set. In particular, the mathematical program $LaTeX: \min \{f(x) : g(x) \le 0, \; h(x) = 0 \}$ is convex if $LaTeX: f$ and $LaTeX: g$ are convex and $LaTeX: h$ is affine. See the supplement, Convex Cones, Sets, and Functions.

Here are some key properties of a convex program:

# Convex set

If any two points are in the set, so is their line segment, i.e.

$LaTeX: x, y \in X \Rightarrow (1 - a) x + a y \in X$,

for any $LaTeX: 0 \le a \le 1$. See Myths and Counter Examples in Mathematical Programming.

# Convex simplex method

This extends the general strategy of the simplex method in linear programming to maximize a concave objective over linear constraints of the form $LaTeX: Ax=b$ and $LaTeX: x \ge 0$. (A form of local convergence applies when the objective is not concave, but is smooth.) A tableau is maintained, but nonbasic variables need not be at zero level. The partition is used to compute a generalized reduced cost of the form $LaTeX: d(x) = \nabla f(x) - y^T A$. The $LaTeX: y$ vector is determined by the basic portion: $LaTeX: 0 = \nabla_B f(x) - y^TB$ (so $LaTeX: y = (\nabla_B f(x))B^{-1}$).

As in the simplex method, pricing is used to determine whether there is an improving nonbasic vector. If not, the algorithm terminates (satisfying the Kuhn-Tucker conditions); otherwise, a line search is used to determine the new point, changing only the basic variables to compensate for the change in the one nonbasic level. If this causes a basic variable to reach zero, the basis is changed with the pivot operation.

# Convexity cut

A class of cutting planes derived by considering a convex set in a polyhedron, P. The form of the cut is

$LaTeX: \sum_i \frac{y_i}{t_i} \ge 1$,

and it is derived as follows.

Let $LaTeX: x_0$ be an extreme point of a given polyhedron, which contains a given set, $LaTeX: S$. Suppose $LaTeX: C$ is a convex set in $LaTeX: \mathbb{R}^n$ whose interior contains $LaTeX: x_0$ but does not contain any point of $LaTeX: S$. Let $LaTeX: v_1, v_2, \ldots, v_n$ be linearly independent vectors in $LaTeX: \mathbb{R}^n$, and let $LaTeX: t_i$ be such that $LaTeX: t_i > 0$ and $LaTeX: [x_0, x_0 + t_i v_i] \subseteq C$ for all $LaTeX: i=1, 2, \ldots, n$ (e.g., the $LaTeX: v_i$ could be the directions of the edges of $LaTeX: P$ emanating from $LaTeX: x_0$). Define $LaTeX: V = [v_1, \ldots, v_n]$ and $LaTeX: M = [V \, \mbox{diag}(t_i)]^{-1}$ (i.e., $LaTeX: M(i,.) = V^{-1}(i,.) / t_i$). Then, the cutting plane $LaTeX: \{x: M(x-x_0) \ge 1\}$ excludes $LaTeX: x_0$ without excluding any other $LaTeX: x$ in $LaTeX: S$ for which $LaTeX: V^{-1}(x-x_0) \ge 0$. The cut, $LaTeX: M(x-x_0)\ge 1$, is equivalent to the above form, $LaTeX: \textstyle \sum_i y_i / t_i \ge 1$, where $LaTeX: y = V^{-1}(x - x_0) \ge 0$ (i.e., $LaTeX: x = x_0 + Vy$).

One special case is the intersection cut for a 0-1 integer program:

• $LaTeX: S = \{(x, s): x \in \{0, 1\}^n, \; s \in \mathbb{Z}^m, \; Ax + s = b \}$;
• $LaTeX: P = \{(x, s): x \in [0, 1]^n, \; s \in \mathbb{R}^m, \; Ax + s = b\} \cap \{x: Cx \le c\}$, where $LaTeX: \{Cx \le c\}$ are the cuts already added;
• $LaTeX: x_0$ is an extreme point of $LaTeX: P$, obtained by solving the LP relaxation with a simplex method, so $LaTeX: x_0 = (u, w)$ is a basic optimum;
• $LaTeX: V = B^{-1}N$, where $LaTeX: B$ is the basis that transforms the original equations to $LaTeX: u + B^{-1}Nw = B^{-1}b$;
• $LaTeX: C$ is a sphere, $LaTeX: \{y: \|y\|^2 \le 1/4\}$.

# Corner polyhedron problem

Gomory's relaxed IP that drops the non-negativity constraints of basic variables. Suppose $LaTeX: x = (u,0)$ is a basic solution to the LP, $LaTeX: \mbox{opt}\{c^Tx : Ax=b, \; x \ge 0\}$, where $LaTeX: u$ is the vector of basic variables. Then, the corner polyhedron problem is

$LaTeX: \mbox{opt} \{d^Tv: u \in \mathbb{Z}^m, \; v \in \mathbb{Z}^{n-m}_+, \; Bu + Nv = b\}$,

(dropping $LaTeX: u \ge 0$), where $LaTeX: d$ is the reduced cost of $LaTeX: v$ in the LP solution. (Note: $LaTeX: c^Tx = d^Tv$ for all $LaTeX: x=(u,v)$ such that $LaTeX: Ax=b$.)

# Correlation matrix

See elliptope.

# Cost of capital

Used in computing an objective function, it is the interest rate. The units are currency per time period (e.g., dollars per year). One must be careful in computing this. For example, the marginal cost of capital can be greater than the marginal interest rate because the latter applies to the entire borrowing, not just the last dollar borrowed. (This is the "classical" definition, which does not consider advances in option theory.)

# Covering problem

The idea is to select enough members from each of a specified collection of sets; doing so covers the sets. Subject to this, there is a cost for the elements selected, and the objective is to minimize total cost. The IP form of the usual set covering problem is

$LaTeX: \min \{c^Tx: Ax \ge 1, \; x \in \{0,1\}^n\}$,

where $LaTeX: x_j=1$ if element $LaTeX: j$ is selected; else, $LaTeX: x_j=0$. The matrix $LaTeX: A$ has 0's and 1's, where the $LaTeX: i$-th row corresponds to the $LaTeX: i$-th set to be covered: $LaTeX: A_{i, j}=1$ means element $LaTeX: j$ is in set $LaTeX: i$; else, $LaTeX: A_{i, j}=0$. The constraint $LaTeX: Ax \ge 1$ means that at least one element of each set must be selected. This could be extended to require $LaTeX: b_i$ elements of set $LaTeX: i$ be selected by writing $LaTeX: Ax \ge b$. Also see vertex cover.
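For a tiny instance, the IP above can be solved by plain enumeration. A minimal sketch (the instance data is made up for illustration; real problems need an IP solver):

```python
# Brute-force solver for min {c'x : Ax >= 1, x in {0,1}^n}.
# Exponential in n, so only usable on toy instances.
from itertools import product

def set_cover(A, c):
    """A: 0/1 rows, one per set to be covered; c: element costs."""
    n = len(c)
    best_cost, best_x = float("inf"), None
    for x in product((0, 1), repeat=n):
        # Ax >= 1: every set must contain a selected element.
        if all(sum(aij * xj for aij, xj in zip(row, x)) >= 1 for row in A):
            cost = sum(cj * xj for cj, xj in zip(c, x))
            if cost < best_cost:
                best_cost, best_x = cost, x
    return best_cost, best_x

# Three sets {1,2}, {2,3}, {1,3} over elements 1..3, unit costs:
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
c = [1, 1, 1]
```

Here no single element touches all three sets, so the minimum cover uses two elements and has cost 2.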

# Cramer rule

To solve $LaTeX: Ax=b$, where $LaTeX: A$ is nonsingular, the j-th coordinate is given by:

$LaTeX: x_j = \frac{ \mbox{det}(A^j)}{\mbox{det}(A)}$,

where $LaTeX: det(\cdot)$ is the determinant, and $LaTeX: A^j$ is the matrix obtained if column $LaTeX: j$ of $LaTeX: A$ is replaced by $LaTeX: b$:

$LaTeX: A^j = [A(., 1) \; A(., 2) \; \ldots \; A(., j-1) \; b \; A(., j+1) \; \ldots \; A(., n)]$.
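A minimal sketch of the rule, using a pure-Python cofactor-expansion determinant; this is fine for tiny systems but numerically poor and factorially slow in general:

```python
# Cramer's rule: x_j = det(A^j) / det(A), where A^j replaces column j
# of A by b.

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def cramer(A, b):
    """Solve Ax = b for nonsingular A (lists of lists / list)."""
    d = det(A)
    x = []
    for j in range(len(A)):
        Aj = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(A)]
        x.append(det(Aj) / d)
    return x
```

For example, $LaTeX: 2x_1 + x_2 = 3$, $LaTeX: x_1 + 3x_2 = 5$ gives $LaTeX: x = (4/5, 7/5)$.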

# Crash

This is a heuristic to generate an initial point for an algorithm. It is from the early linear programming computer systems that used a variety of heuristics to generate an initial basic solution.

# Criss-cross method

A method in linear programming that chooses a pivot by possibly crossing from primal to dual, and vice versa. The least-index criss-cross method is a finite pivot method that chooses a pivot based on the least index of a column or row for which there is improvement in either the primal or dual infeasibility. It terminates at a basis when one of the following termination conditions is reached:

1. Both the associated primal and dual solutions are feasible (implies optimality because complementary slackness holds by construction).
2. There is evidence of primal and/or dual infeasibility (like the tests in the simplex method).
(Note: any basis can be used to start, and there is only one phase.)

# Critical path

A longest path in a network, where length units are time durations. The nodes represent tasks, and the arcs represent precedence constraints. The path is critical because the associated tasks determine the total completion time of the project. Moreover, at least one of their duration times must decrease in order to decrease the total completion time.
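The completion time can be computed by a longest-path recursion over the precedence DAG. A minimal sketch, where the names `durations` and `preds` are illustrative, not standard:

```python
# Longest path in a project network: the earliest finish time of a task
# is its duration plus the latest finish time among its predecessors.

def project_length(durations, preds):
    """durations: {task: time}; preds: {task: [predecessor tasks]}.
    Returns the total completion time (length of a critical path)."""
    finish = {}

    def f(t):
        if t not in finish:
            finish[t] = durations[t] + max(
                (f(p) for p in preds.get(t, [])), default=0)
        return finish[t]

    return max(f(t) for t in durations)
```

For tasks a (3), b (2), and c (4), with c requiring both a and b, the critical path is a then c, of length 7.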

# Critical point

A critical point of a smooth function is an element of the domain for which the first partial derivatives are zero or infinite.

# Cumulative constraint

is a global constraint that models scheduling problems as follows. Given is a collection of tasks $LaTeX: T=\{t_1, \dots, t_n\}$, such that each task $LaTeX: t_i$ is associated with four variables:

• release time, $LaTeX: r_i$, is the earliest time at which it can begin executing,
• deadline (or due date), $LaTeX: d_i$, is the time by which it must complete,
• processing time, $LaTeX: p_i$, is the length of time it takes to complete,
• capacity requirement, $LaTeX: c_i$, is the capacity of the resource that it uses while it executes.

In addition, we are given the capacity variable $LaTeX: C$ of the resource.

The $LaTeX: cumulative(\{s_1 \dots s_n\},\{p_1 \dots p_n\}, \{c_1 \dots c_n\}, C)$ constraint assigns a starting time $LaTeX: s_i$ for each task $LaTeX: t_i$ such that:

• $LaTeX: r_i\leq s_i \leq d_i - p_i$: the task starts at or after its release time and ends at or before its deadline, and
• $LaTeX: \forall u \sum_{i|s_i\leq u\leq s_i+p_i} c_i\leq C$: at any time unit $LaTeX: u$, the capacity of the resource is not exceeded.
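These two conditions can be checked directly for a candidate assignment of start times. A sketch of a checker (not a propagator), assuming integer times and the usual half-open execution interval $LaTeX: [s_i, s_i + p_i)$:

```python
# Verify a candidate solution of cumulative({s}, {p}, {c}, C) against
# the window condition r_i <= s_i <= d_i - p_i and the capacity
# condition: at every time u, the total load of running tasks is <= C.

def cumulative_ok(s, p, c, r, d, C):
    n = len(s)
    if any(not (r[i] <= s[i] <= d[i] - p[i]) for i in range(n)):
        return False                       # some time window violated
    horizon = max(s[i] + p[i] for i in range(n))
    for u in range(horizon):
        load = sum(c[i] for i in range(n) if s[i] <= u < s[i] + p[i])
        if load > C:
            return False                   # capacity exceeded at time u
    return True
```

For two unit-demand tasks of length 2 on a capacity-1 resource, staggered starts (0 and 2) are accepted while simultaneous starts are rejected.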

# Cut search

An algorithm strategy for global optimization (notably for integer programming) that consists of two alternating phases: search and cut. The search phase finds linearly independent vectors emanating from a root point, $LaTeX: x_0$, to set up a probe for the cut phase. Usually, $LaTeX: x_0$ is an extreme point, so the search is an edge probe phase that extends the edges of the cone rooted at $LaTeX: x_0$ until they intersect the candidate solution points. Then, a cutting plane is added (usually a convexity cut) to exclude the root point. (A new root point is obtained by solving the new approximating problem on the smaller polyhedron.)

# Cutset

In a weakly connected network, this is a set of arcs whose removal decomposes the network into exactly two components. Separating node $LaTeX: s$ from node $LaTeX: t$, the value of the cutset is the sum of the capacities of its arcs from the component containing $LaTeX: s$ to the one containing $LaTeX: t$.

# Cutting plane

A hyperplane whose halfspace cuts off a particular point, such as a non-integer extreme point solution of a linear program relaxation of an integer program. The cut is such that it does not eliminate any solution to the original problem. For example, suppose we want to solve

$LaTeX: \max \{x: (x,y) \in \{0,1\}^2, \; 2x + 2y \le 1 \}$.

The only feasible point is $LaTeX: (0, 0)$. The LP relaxation is

$LaTeX: \max \{x: (x,y) \in [0,1]^2, \; 2x + 2y \le 1\}$,

and an optimal solution is at $LaTeX: (1/2, 0)$. One cutting plane is $LaTeX: \{(x,y): x + y \le 1/3\}$. A deeper one is $LaTeX: \{(x,y): x + y \le 0\}$. (When adding a cutting plane to the LP, a new relaxation is obtained, and the solution is closer to the IP solution.)
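The claims in this example are easy to verify numerically; a small sketch:

```python
# Both cuts remove the fractional LP optimum (1/2, 0) while keeping the
# only integer-feasible point (0, 0).

def cut_shallow(x, y):
    return x + y <= 1 / 3   # the first cutting plane

def cut_deep(x, y):
    return x + y <= 0       # the deeper cut

lp_opt = (0.5, 0.0)         # optimum of the LP relaxation
ip_opt = (0.0, 0.0)         # the only feasible integer point
```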

# Cutting stock problem

Determine a way to cut standard sheets of material into various shapes (like clothes parts) to minimize waste. This is a (linear) integer programming model: patterns are specified, and $LaTeX: A_{i, j, k}$ is the amount of the $LaTeX: i$-th stock (e.g., sheet or roll of material) used to generate the $LaTeX: j$-th output by the $LaTeX: k$-th pattern. Then, let $LaTeX: x_k$ be the level of the $LaTeX: k$-th pattern used and $LaTeX: y = Ax$. Thus, $LaTeX: \textstyle s_i = \sum_j y_{i,j}$ is the amount of the $LaTeX: i$-th stock used, which is limited by its availability: $LaTeX: s \le S$; and $LaTeX: \textstyle v_j = \sum_i y_{i, j}$ is the amount of the $LaTeX: j$-th output generated, which is required to be in some range, say $LaTeX: L \le v \le U$ (allowing some demand overruns or underruns). Some models seek to minimize the total waste: $LaTeX: \textstyle \sum_i S_i - s_i$. Other models consider cost too. The most common problems are 2-dimensional (odd shapes from sheets of material); the 1-dimensional case is called the trim problem. In the latter case, the stock index $LaTeX: i$ is not needed. For example, consider a stock of rolls of paper with a given width, which must be slit into rolls of various widths. Then, $LaTeX: A_{j, k}$ is how much of a stock roll is used by the $LaTeX: k$-th pattern to generate a roll of the $LaTeX: j$-th width, and $LaTeX: \textstyle y_j = \sum_k A_{j, k} x_k$ is the total amount of stock used to generate rolls of the $LaTeX: j$-th width.

# Cyclic descent

An algorithm that seeks to optimize a multivariate function by optimizing each coordinate with a univariate optimization technique (keeping the other coordinates fixed). This is repeated until a fixed point is reached. In general, it is possible for such a fixed point not to be an optimum (even locally) because a simultaneous change in variables could result in an improvement. An example is given by:

$LaTeX: f(x, y) = (y - x^2)(y - 2x^2) \; \mbox{ at } \; (x, y) = (0,0)$.

$LaTeX: f(0, y)$ has a minimum at $LaTeX: y=0$, and $LaTeX: f(x, 0)$ has a minimum at $LaTeX: x=0$. However, $LaTeX: f$ decreases with the simultaneous change, $LaTeX: y=(3/2)x^2$. Along this parabola, $LaTeX: f(x,y) = -x^4/4$, which is negative for $LaTeX: x$ nonzero. Thus, $LaTeX: (0, 0)$ is not a local minimum of $LaTeX: f$. For further discussion about this example, see Myths and Counter Examples.
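The algebra above is straightforward to confirm numerically; a small sketch:

```python
# The cyclic-descent counterexample: (0, 0) minimizes f along each
# coordinate axis, yet f is negative along the parabola y = (3/2)x^2.

def f(x, y):
    return (y - x ** 2) * (y - 2 * x ** 2)

# f(0, y) = y^2 >= 0 and f(x, 0) = 2x^4 >= 0, so coordinate-wise
# search stalls at the origin; f(x, 1.5x^2) = -x^4/4 < 0 for x != 0.
```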

# Cycling

In linear programming, this means revisiting a basis, mainly referring to the simplex method: the algorithm cycles through the same sequence of bases without converging to an optimal one. See the cycling supplement.

# DFP method

This is a method to solve an unconstrained nonlinear program that proceeds as follows.

1. Start with any symmetric, negative definite matrix, say $LaTeX: H$ (e.g., $LaTeX: -I$), and any point, say $LaTeX: x$. Compute $LaTeX: g=\nabla f(x)$, and set each of the following:
2. direction: $LaTeX: d = -Hg$.
3. step size: $LaTeX: s \in \arg\!\max \{ f(x + td): t \ge 0\}$.
4. change in position: $LaTeX: p = sd$.
5. new point and gradient: $LaTeX: x' = x + p$ and $LaTeX: g' = \nabla f(x')$.
6. change in gradient: $LaTeX: q = g' - g$.
7. Replace $LaTeX: x$ with $LaTeX: x'$ and update $LaTeX: H$ by the DFP update to complete the iteration.

# DFP update

This is a way to update an approximation of an inverse Hessian, used for unconstrained optimization. Using the notation in the DFP method, the update is:

$LaTeX: H' = H + \frac{pp^T}{p^Tq} - \frac{H q q^T H}{q^T H q}$.

Note: $LaTeX: \textstyle pp^T$ is a rank 1 matrix (the outer product of the vector $LaTeX: p$ with itself).
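A minimal sketch of the update on plain Python lists, using the notation above ($LaTeX: H$ the current inverse-Hessian approximation, $LaTeX: p$ the step, $LaTeX: q$ the gradient change):

```python
# DFP update: H' = H + p p'/(p'q) - (H q)(H q)'/(q'H q).

def dfp_update(H, p, q):
    n = len(p)
    Hq = [sum(H[i][j] * q[j] for j in range(n)) for i in range(n)]
    pq = sum(p[i] * q[i] for i in range(n))        # p'q
    qHq = sum(q[i] * Hq[i] for i in range(n))      # q'Hq (H symmetric)
    return [[H[i][j] + p[i] * p[j] / pq - Hq[i] * Hq[j] / qHq
             for j in range(n)] for i in range(n)]
```

By construction the update satisfies the secant condition $LaTeX: H'q = p$, which the assertions below verify for one step starting from $LaTeX: H = I$.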

# Dantzig-Wolfe decomposition

The principle applied here is that the total problem consists of:

1. subprograms corresponding to its almost independent parts;
2. a master program that ties together the subprograms.

In linear programming, this applies to the following block-diagonal structure, linked by the master constraints.

    .---------------------.
    | Master constraints  |
    |_____________________|
    .----.
    | B1 |
    |____|
          .----.
          | B2 |
          |____|
                .
                 .
                  .
                   .----.
                   | Bk |
                   |____|


The procedure operates on the master LP with blocks used to generate extreme points as part of the pricing.

Same as fathoming. Used by scientists working on combinatorial optimization problems, this term is more descriptive than the older term of "fathoming" a node. The idea is that a condition has been met ensuring that following some branch (i.e., setting values of variables) cannot lead to an optimal solution. This is often just a dominance property.

# Decision variable

A decision variable represents a problem entity for which a choice must be made. For instance, a decision variable might represent the position of a queen on a chessboard, for which there are 100 different possibilities (choices) on a 10x10 chessboard or the start time of an activity in a scheduling problem. Each possible choice is represented by a value, hence the set of possible choices constitutes the domain that is associated with a variable.

# Decomposition principle

The idea of decomposing a mathematical program into two or more sets of variables and associated constraints. The purpose is to separate some portion with special structure from the rest of the mathematical program.

Here are some particular ones that are in this glossary:

# Decoupling principle

Arises in sensitivity analysis, as a corollary to the Compatibility Theorem:

An equilibrium basis is compatible with the change $LaTeX: (db, dc)$ if, and only if, it is compatible with both $LaTeX: (db, 0)$ and $LaTeX: (0, dc)$.

# Degeneracy

The solution $LaTeX: (x, y, s)$ to the primal-dual pair of linear programs:

$LaTeX: \min \{c^Tx : x \ge 0, \; Ax = b\}$ and $LaTeX: \max \{y^Tb: s \ge 0, \; yA + s = c\}$

is degenerate if it is not strictly complementary, i.e., $LaTeX: x_i = s_i = 0$ for some $LaTeX: i$. The pair $LaTeX: (x_i, s_i)$ is primal degenerate if there is an optimal solution $LaTeX: x'$ such that $LaTeX: x'_i > 0$. Similarly, the pair is dual degenerate if there is a dual optimal solution $LaTeX: (y', s')$ such that $LaTeX: s'_i > 0$.

With regard to a basic optimal solution, such a solution is (primal) degenerate when some basic variable is at one of its bound values (canonically zero). A basic optimum is dual degenerate if one of its nonbasic variables has a zero reduced cost. Geometrically, this corresponds to a degenerate polyhedron. Suppose we have $LaTeX: \{x: Ax \le b\}$ (where $LaTeX: A$ is $LaTeX: m$ by $LaTeX: n$). This polyhedron is degenerate if there exists an extreme point contained in more than $LaTeX: n$ of the hyperplanes. For example, a pyramid in $LaTeX: \mathbb{R}^3$ is degenerate because four planes meet at its apex.

Algorithmically, degeneracy can cause cycling in the simplex method.

# Degeneracy graph

An undirected graph, where the nodes represent bases and the edges their adjacency. The degeneracy graph is a way to probe deeply into a degenerate extreme point (represented by more than one basic feasible solution). Among other things, it reveals good pivot steps through the point en route to an optimum, and it provides a clear, efficient approach to sensitivity analysis.

Here are some terms for a degeneracy graph associated with a (basic) solution, $LaTeX: x$:

• degeneracy power of $LaTeX: x$: the number of bases corresponding to $LaTeX: x$.
• internal node: a node whose neighbors are all in this degeneracy graph.
• transition column: a tableau column such that a pivot exists for which that column enters the basis, and the new basis corresponds to a transition node.
• transition node: a node with at least one neighbor outside this degeneracy graph (so a pivot moves to a different solution).
• Transition Node Pivoting (TNP) rule: a lexicographic rule using a special column to avoid internal nodes (but cycling is possible).

See Degeneracy.

# Degree-2 inequality

Arises in mixed integer linear programming models for combinatorial programs on graphs, where each row of the $LaTeX: A$ matrix has 2 non-zeroes (usually 1's), as in the node packing problem.

# Detached coefficient form

Given the system $LaTeX: Ax=b$, we augment the objective equation: $LaTeX: c^{T}x - z = 0$. Then, detaching the variables, the form is:

$LaTeX: \left[ \begin{array}{rrr} A & 0 & b \\ c & {-1} & 0 \end{array} \right].$

Upon multiplying $LaTeX: Ax = b$ by the inverse of a basis, say $LaTeX: B$, where $LaTeX: A=[B \; N]$, the detached coefficient form becomes

$LaTeX: \left[ \begin{array}{ccrc} I & B^{-1}N & 0 & B^{-1}b \\ c_B & c_N & -1 & 0 \end{array} \right]$,

where $LaTeX: c=[c_{B} \; c_{N}]$ (conformal with the partition of the columns of $LaTeX: A$). Now if we subtract $LaTeX: c_B$ times the first $LaTeX: m$ rows from the last row, this becomes

$LaTeX: \left[ \begin{array}{ccrc} I & B^{-1}N & 0 & B^{-1}b \\ 0 & c_N - c_B B^{-1}N & -1 & -c_B B^{-1} b \end{array} \right].$

Recognizing that $LaTeX: B^{-1}b$ is the vector of basic levels, and the associated objective value $LaTeX: (z)$ is $LaTeX: c_B B^{-1}b$, we can drop the first $LaTeX: m$ columns plus column $LaTeX: n+1$ (corresponding to $LaTeX: z$) and form the tableau associated with this basis (adding labels to identify the basic variables in the rows and nonbasic variables in the columns).

# Dichotomous search

This finds the maximum of a unimodal function on an interval, $LaTeX: (a, b)$, by evaluating points placed near the center, approximating the bisection method. With $LaTeX: \varepsilon$ being a small positive value, let $LaTeX: x = (a+b)/2 - \varepsilon$ and $LaTeX: y = (a+b)/2 + \varepsilon$. Then, if $LaTeX: f(x) < f(y)$, the new interval of uncertainty is $LaTeX: (x, b)$; if $LaTeX: f(x) > f(y)$, it is $LaTeX: (a, y)$; if $LaTeX: f(x) = f(y)$, it is $LaTeX: (x, y)$. With $LaTeX: N=2M$ function evaluations, the interval of uncertainty can be reduced to within $LaTeX: (b-a)r$, where

$LaTeX: r = 2^{-M} + [1 - 2^{-M}]\varepsilon$.

The same reduction occurs with $LaTeX: N=2M+1$, as the evaluations are used in pairs. For example, with $LaTeX: N=20$, $LaTeX: 2^{-10}\approx 0.000977$, so the interval reduction is $LaTeX: 0.000977 + 0.999023\varepsilon$. See Fibonacci search for a more efficient method.
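A minimal sketch of the procedure; the tolerances `eps` and `tol` are illustrative choices, with `tol` kept well above `2 * eps` so the interval keeps shrinking:

```python
# Dichotomous search for the maximum of a unimodal f on (a, b):
# evaluate two points straddling the midpoint and discard the side
# that cannot contain the maximum.

def dichotomous_max(f, a, b, eps=1e-6, tol=1e-4):
    while b - a > tol:
        x = (a + b) / 2 - eps
        y = (a + b) / 2 + eps
        if f(x) < f(y):
            a = x          # maximum lies in (x, b)
        elif f(x) > f(y):
            b = y          # maximum lies in (a, y)
        else:
            a, b = x, y    # maximum lies in (x, y)
    return (a + b) / 2
```

For the unimodal $LaTeX: f(t) = -(t-1)^2$ on $LaTeX: (0, 3)$, the search homes in on the maximizer $LaTeX: t=1$.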

# Diet problem

Select levels of food consumption so as to minimize total cost subject to nutrient requirements. As a linear program,

$LaTeX: \min \{c^Tx : Ax \ge b, \; x \ge 0\}$,

where $LaTeX: x_j$ is the level of food $LaTeX: j$ at unit cost $LaTeX: c_j$. The $LaTeX: i$-th nutrient requirement (e.g., amount of vitamin C) is $LaTeX: b_i$, and $LaTeX: A_{i, j}$ is the amount of nutrient $LaTeX: i$ in each unit of food $LaTeX: j$.

# Digraph

Directed graph.

# Dimension

• Of a set: the dimension of the smallest affine space that contains the set. The dimension of an affine space is $LaTeX: k$ if the maximum number of affinely independent points in the set is $LaTeX: k+1$. Of particular interest is the column space of a matrix $LaTeX: A$,
$LaTeX: \mbox{col}(A) = \{x: x=Ay \mbox{ for some } y \in \mathbb{R}^n\}$.
The dimension of this is the rank of $LaTeX: A$. Another is the row space of $LaTeX: A$,
$LaTeX: \mbox{row}(A) = \{x: x=A^Tu \mbox{ for some } u \in \mathbb{R}^m\}$.
The dimension of this is also the rank of $LaTeX: A$.
• Of an expression: the units of measurement (e.g., tons of apples, meters of rope, hours of labor). In a relation, such as $LaTeX: y = ax + b$, the units of $LaTeX: y$, $LaTeX: ax$ and $LaTeX: b$ must all be the same. Dimensional analysis is concerned with determining such consistency and inferring whatever units are missing.

# Diophantine equations

A system of equations whose variables must be integer-valued. A famous one is in Fermat's Last Theorem:

$LaTeX: x^n + y^n = z^n.$

In general, the linear Diophantine system, $LaTeX: Ax = b$ with $LaTeX: x \in \mathbb{Z}^n$, can be solved in polynomial time. However, with bounds, $LaTeX: 0 \le x \le u$, the problem is NP-complete.

# Directed tree search

A class of algorithms designed to systematically search a decision space, where nodes correspond to partial solutions (sometimes called states). Examples are branch and bound and implicit enumeration. Also see heuristic search.

# Directional arc consistency

Directional arc consistency is a form of arc consistency.

An ordering is associated with the variables in the constraint network, and constraints are made arc consistent only in the direction of the ordering. For instance, if $LaTeX: x_i \le_{order} x_j$ and constraint $LaTeX: c_{ij}$ exists, directional arc consistency enforces that each value in the domain of $LaTeX: x_i$ is consistent with at least one value in the domain of $LaTeX: x_j$, but not necessarily the other way around.

# Directional derivative

The limit (if it exists) of the rate of change along a specified direction, say $LaTeX: h$,

$LaTeX: \lim_{t \rightarrow 0^+} \frac{f(x+th) - f(x)}{t}$.

In particular, the $LaTeX: i$-th partial derivative is with $LaTeX: h=e_i$. (Note that the directional derivative, as defined above, depends on the scale: the derivative for $LaTeX: kh$ is $LaTeX: k$ times the derivative for $LaTeX: h$. To avoid scale dependence, some authors require $LaTeX: \|h\| = 1$.) Recently, some people have called this a B-derivative (or Bouligand derivative), and functions that have directional derivatives in all feasible directions are B-differentiable. Some require the convergence to be uniform.

# Discount rate

Also a discount factor. This accounts for the time value of money and arises naturally in financial models, such as a portfolio selection problem. A discount rate of 7% means $1 earned a year from now has a present value of approximately $0.93 (since $LaTeX: 1/1.07 \approx 0.9346$). If $1 is earned $LaTeX: n$ years from now, and the discount rate is $LaTeX: r$, the present value is $LaTeX: 1/(1+r)^n$ dollars. In continuous-time models, there are variations, such as defining the present value of $LaTeX: K$ dollars at time $LaTeX: t$ to be $LaTeX: Ke^{-rt}$. In infinite horizon dynamic programming, the discount factor serves to make value iteration a contraction map. In that case, the fixed point of the stationary equation,

$LaTeX: F(s) = \mbox{opt} \{r(x, s) + a F(T(s, x)): x \in X(s)\}$,

is obtained by successive approximation - i.e., value iteration for an infinite horizon DP. This converges to a unique fixed point, $LaTeX: F$, if $LaTeX: 0 < a < 1$. Here, the DP notation is used, where $LaTeX: a$ is the discount factor.

# Discrete program

A mathematical program in which the variables are restricted to a discrete set, commonly the integers. Subcases include integer programs and combinatorial programs.

# Disjunctive constraint

This is a global constraint enforcing the fact that two activities $LaTeX: a_i$ and $LaTeX: a_j$ requiring the same unary resource cannot overlap in time. Therefore, either $LaTeX: a_i$ precedes $LaTeX: a_j$ or $LaTeX: a_j$ precedes $LaTeX: a_i$. In general, the disjunctive constraint achieves arc consistency on the formula: $LaTeX: end(a_i)\leq start(a_j) \vee end(a_j)\leq start(a_i)$.

For discrete (cumulative) resource capacity, the disjunctive constraint holds when $LaTeX: c_i +c_j > C$ where $LaTeX: c_i$ is the capacity requirement of task $LaTeX: a_i$, $LaTeX: c_j$ is the capacity requirement of task $LaTeX: a_j$ and $LaTeX: C$ the capacity of the resource.

# Disjunctive normal form

A logical expression that is written as a disjunction,

$LaTeX: D_1 \vee D_2 \vee \ldots \vee D_m$,

where $LaTeX: \vee$ is the disjunction (logical inclusive or). Each $LaTeX: D_i$ has the form

$LaTeX: L_{i_1} \wedge L_{i_2} \wedge \ldots \wedge L_{i_n}$,

where $LaTeX: \wedge$ is the conjunction (logical and) and each $LaTeX: L_{i_j}$ is a literal – a proposition or its negation.

# Disjunctive program

In its most general form, this seeks a solution to one of a finite list of mathematical programs:

$LaTeX: x^* \in \mbox{argopt} \{f_k(x): x \in X_k \}$,

for some $LaTeX: k=1,...,K$. Usually, they have the same objective functions, $LaTeX: f_k=f$ for all $LaTeX: k$, so the mathematical program with disjunctive constraints has the following form:

$LaTeX: \mbox{opt} \left\{ f(x): x \in X, \left(g_1(x) \le 0 \mbox{ or } g_2(x) \le 0 \mbox{ or } \ldots \mbox{ or } g_K(x) \le 0 \right) \right\}$.

This can be reformulated as a mixed-integer program, using binary variables to represent the disjunction.

# Divide and conquer

An algorithm strategy, such as dynamic programming, that divides a problem into parts. Suppose $LaTeX: T(n)$ is the time complexity for a problem of size $LaTeX: n$, and that dividing the problem into $LaTeX: s$ subproblems, each of size $LaTeX: n/b$, takes $LaTeX: S(n)$ additional time. Then, $LaTeX: T(n) = sT(n/b) + S(n)$ is called the divide-and-conquer recurrence relation. If $LaTeX: S(n)=c > 0$ (a constant) and $LaTeX: b > 1$, then $LaTeX: T(n) = O(n^{\log_b(s)})$ for $LaTeX: n=b, b^2, b^3, \ldots$, and $LaTeX: T(n) = O(\log(n))$ if $LaTeX: s=1$. For example, if a bisection method is used (as in searching a sorted list), the divide-and-conquer recurrence is $LaTeX: T(n) = T(n/2) + 2$ for $LaTeX: n=2, 4, 8, \ldots$. This implies $LaTeX: T(n) = O(\log(n))$ since $LaTeX: s=1$ and $LaTeX: b=2$.
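The bisection example can be made concrete by counting comparisons; a minimal sketch:

```python
# Binary search on a sorted list, counting the two comparisons made per
# halving step, illustrating T(n) = T(n/2) + 2 and hence T(n) = O(log n).

def bisect_count(a, target):
    """Return (found, number_of_comparisons_used)."""
    lo, hi, comps = 0, len(a), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comps += 2                  # one '<' test and one '>' test
        if a[mid] < target:
            lo = mid + 1
        elif a[mid] > target:
            hi = mid
        else:
            return True, comps
    return False, comps
```

On a list of $LaTeX: n = 1024 = 2^{10}$ elements, no search uses more than about $LaTeX: 2\log_2 n$ comparisons.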

# Domain

The domain of a variable is a set of all possible values that can be assigned to the variable. When the domain contains numbers only, the variables are called numerical variables. The domain of a numerical variable may be further restricted to integers, rational numbers or real numbers. When the domain contains boolean values only, the variables are called boolean variables. When the domain contains an enumerated type of objects, the variables are called symbolic variables (e.g. a variable that represents a day of the week).

A common propagation technique in constraint programming is to dynamically remove values from the domain of a variable when it can be inferred that the value does not belong to any solution, given the set of variable assignments that have already been made.

# Dominance

This is used in many contexts, but the general meaning is that something is uniformly better than something else. For example, consider two activities in a linear program, say $LaTeX: j$ and $LaTeX: k$, such that:

• $LaTeX: j$ has greater cost: $LaTeX: c_j \ge c_k$
• $LaTeX: j$ produces less of each requirement: $LaTeX: A_{i, j} \le A_{i, k}$ for $LaTeX: i$ such that we require $LaTeX: A_{i,*}x \ge b_i$
• $LaTeX: j$ consumes more of each resource: $LaTeX: A_{i, j} \ge A_{i, k}$ for $LaTeX: i$ such that we require $LaTeX: A_{i,*}x \le b_i$
• $LaTeX: j$ produces or consumes at the same rate of goods to be conserved: $LaTeX: A_{i, j} = A_{i, k}$ for $LaTeX: i$ such that we require $LaTeX: A_{i,*}x = b_i$

Then, activity $LaTeX: k$ dominates activity $LaTeX: j$.

# Doubly stochastic matrix

A non-negative matrix such that each row and each column sums to 1.

# Dual

Another mathematical program with the property that its objective is always a bound on the original mathematical program, called the primal. Suppose the dual is $LaTeX: \min \{ F(y): y \in Y\}$. Then, $LaTeX: F(y) \ge f(x)$ for all feasible $LaTeX: x$ in the primal and all $LaTeX: y$ in Y. This immediately implies that if the primal is feasible, the dual cannot be unbounded, and vice versa: if the dual is feasible, the primal cannot be unbounded. It also implies that if the dual is unbounded, the primal must be infeasible (and vice versa). A dual provides a sufficiency test for optimality, for if feasible $LaTeX: x$ and $LaTeX: y$ can be found such that $LaTeX: f(x)=F(y)$, it follows that $LaTeX: x$ is optimal in the primal and $LaTeX: y$ is optimal in the dual. Weak duality is when this sufficient condition is all that can be guaranteed. Strong duality is when a finite optimal value to one problem ensures the existence of an optimal solution to the other and that their optimal objective values are equal. There are particular duals of significance.

# Dual degeneracy

See Degeneracy.

# Dual method

An algorithm for which each iterate is feasible in a dual mathematical program. An example is that the class of cutting plane methods, associated with the Lagrangian dual, is dual to the column generation method of Dantzig-Wolfe decomposition.

# Dual norm

The dual of the $LaTeX: L_p$ norm, where $LaTeX: p \ge 1$, is the $LaTeX: L_q$ norm, where $LaTeX: (1/p) + (1/q) = 1$. Note, if $LaTeX: p=1$ then $LaTeX: q=\infty$, and if $LaTeX: q=1$ then $LaTeX: p=\infty$. This is based on Hölder's inequality.

# Dual price

A dual variable associated with a primal constraint.

# Duality

The study and use of dual mathematical programs.

# Duality gap

Numerically, it is the difference between primal and dual objective values. The phrase by itself means the presence (or existence) of a gap -- i.e., that the value is not zero.

# Duality theorems

These establish the fundamental duality relations (weak or strong). Several significant ones are given here.

# Dynamic CSP

Dynamic CSPs deal with the problems that are subject to exogenous change over time. A changing problem is viewed as a sequence of (static) CSPs in which one problem in the sequence is transformed to another through restrictions (i.e. additions of constraints) and/or relaxations (removal of constraints). Various solution definitions exist, most dealing with some notion of robustness (i.e., a solution to one static CSP is likely to remain a solution after a dynamic change in the problem) or ease of modification (i.e., a solution to one static CSP can be easily modified to become a solution for the next one).

# Dynamic program

Based on the Principle of Optimality, this was originally concerned with optimal decisions over time. For continuous time, it addresses problems in variational calculus. For discrete time, each period is sometimes called a stage, and the dynamic program (DP) is called a multi-stage decision process. Here is the Fundamental Recurrence Equation for an additive process:

$LaTeX: F(t, s) = \mbox{opt} \{r(t, s, x) + aF(t', s'): x \in X(t, s) \mbox { and } s'=T(t, s, x)\}$,

where $LaTeX: F(t, s)$ is the optimal total return upon reaching time $LaTeX: t$ in state $LaTeX: s$, and $LaTeX: x$ is decision variable(s); $LaTeX: s$, $LaTeX: s'$ are state variables; $LaTeX: t$, $LaTeX: t'$ are time; $LaTeX: X(t, s)$ is the decision space (usually depends only on state); $LaTeX: r(t, s, x)$ is the immediate return; $LaTeX: T(t, s, x)$ is the state transform.

In words, the optimal return upon reaching time $LaTeX: t$ in state $LaTeX: s$ equals the optimal return from an immediate decision plus the optimal return for the remainder of the process upon the transition to state $LaTeX: s'$. This new state is a function of the current time, state, and decision. For discrete time, $LaTeX: t'=t+1$ for the forward equation, and $LaTeX: t'=t-1$ for the backward equation. The multiplier, $LaTeX: a$, is generally a positive scalar, such as a discount factor to yield the present value of money. (For some problems $LaTeX: a=1$, in which case the infinite horizon model might be ill-posed.) More generally, the process need not be additive, so that '+' could be another composition operation.

A decision rule is a function, $LaTeX: x^*(t, s)$, that specifies an optimal value of $LaTeX: x$ upon reaching time $LaTeX: t$ in state $LaTeX: s$. For a completely deterministic process, backtracking is used to obtain the usual form of an optimal solution from a known initial or final state. The process could be stochastic, in which case $LaTeX: r$ and $LaTeX: F$ are expected values, and the state transform is random with probability distribution, $LaTeX: P[T(t,s,x) = s' | s,x]$. Thus, $LaTeX: F(t',s')$ is replaced by $LaTeX: \sum_{s'} F(t',s')P[T(t,s,x)=s' | s,x]$.

DP is a technique that applies to static problems too. For example, consider the following recurrence equation for the knapsack problem:

$LaTeX: F(n, s) = \max \{c(n) x + F(n-1, s-a(n)x): x \in \{0,1, \ldots, \lfloor s/a(n) \rfloor\} \}$.

In words, this says that the max total return with $LaTeX: n$ objects having total knapsack space $LaTeX: s$ equals the max, over the choice $LaTeX: x$ of how much of the $LaTeX: n$-th object we put into the knapsack, of the immediate return, $LaTeX: c(n)x$, plus the max total return from the remaining $LaTeX: n-1$ objects with the remaining space, $LaTeX: s - a(n)x$. The value of $LaTeX: x$ (the stage decision variable) is any non-negative integer up to the space allowed: $LaTeX: \lfloor s/a(n) \rfloor$ is the max number of objects of this type that could fit in the knapsack. There could also be explicit bounds on $LaTeX: x$, as in the 0-1 knapsack problem, in which case the policy set, $LaTeX: X(n,s)$, contains only those values.
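As a sketch, the recurrence can be implemented with memoization in a few lines of Python (the function name is illustrative, and the unbounded, integer-data case is assumed):

```python
from functools import lru_cache

def knapsack_max(c, a, S):
    # F(n, s): max total return using the first n object types with space s,
    # per the recurrence F(n, s) = max { c(n) x + F(n-1, s - a(n) x) }.
    @lru_cache(maxsize=None)
    def F(n, s):
        if n == 0:
            return 0
        # x ranges over 0 .. floor(s / a(n)), the most copies of object n that fit
        return max(c[n - 1] * x + F(n - 1, s - a[n - 1] * x)
                   for x in range(s // a[n - 1] + 1))
    return F(len(c), S)
```

For example, with returns $LaTeX: c = (10, 40)$, sizes $LaTeX: a = (5, 4)$ and space $LaTeX: S = 10$, two copies of the second object are best, for a total return of 80.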

# Economic order quantity

Abbreviated EOQ. This is the level of inventory to order that minimizes the sum of holding and ordering costs. The inventory function, $LaTeX: I(t)$, is the following periodic sawtooth function, where $LaTeX: T$ is the time between orders, and $LaTeX: Q$ is the ordering quantity:

$LaTeX: I(t) = Q - dt \;$ for $LaTeX: \; 0 \le t \le T$,

where $LaTeX: d$ is the rate of demand (inventory units per time units), and $LaTeX: I(t) = I(t-T)$ for $LaTeX: t > T$. The inventory becomes zero at $LaTeX: T = Q/d$, which requires a new order of $LaTeX: Q$ units. The model is thus:

$LaTeX: \min \; (1/2) hdT + K/T$,

where $LaTeX: h$ is the holding cost (currency per inventory unit per time unit), so $LaTeX: (1/2) hdT$ is the average holding cost, and $LaTeX: K$ is the fixed cost of ordering, so $LaTeX: K/T$ is the average ordering cost. The solution is $LaTeX: T^* = (2K/hd)^{1/2}$, which yields the Economic Order Quantity (EOQ): $LaTeX: Q^* = dT^* = (2Kd/h)^{1/2}$. See the more general production scheduling problem.
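As a quick sketch in Python (the function name is illustrative), the optimal cycle time and order quantity follow directly from the formulas above:

```python
import math

def eoq(K, d, h):
    # K: fixed ordering cost, d: demand rate, h: holding cost per unit per time
    T = math.sqrt(2 * K / (h * d))   # optimal time between orders, T* = (2K/hd)^(1/2)
    Q = math.sqrt(2 * K * d / h)     # economic order quantity, Q* = d T*
    return T, Q
```

For instance, with $LaTeX: K=100$, $LaTeX: d=50$ and $LaTeX: h=4$, this gives $LaTeX: T^*=1$ and $LaTeX: Q^*=50$.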

# Edge-finding

The edge-finding rule is a constraint propagation technique that consists of deducing whether an activity $LaTeX: a_i$ can, cannot, or must be executed before or after a set of activities $LaTeX: \Omega, a_i \notin \Omega$ that all (including $LaTeX: a_i$) require the same resource. Two types of conclusions can then be drawn: new ordering relations (“edges” in the graph representing the possible orderings of activities) and new time-bounds (i.e., tightening of the earliest and latest start and end times of activities).

An edge-finding algorithm is a procedure that performs all such deductions.

# Effective domain

The domain of a function for which its value is finite.

# Efficient frontier

The collection of efficient points; see multiple objectives.

# Eigenvalue

If a (scalar) value, $LaTeX: \lambda$, satisfies $LaTeX: Ax = \lambda x$ for some vector, $LaTeX: x \neq 0$, it is an eigenvalue of the matrix $LaTeX: A$, and $LaTeX: x$ is an eigenvector. In mathematical programming this arises in the context of convergence analysis, where $LaTeX: A$ is the Hessian of some merit function, such as the objective or Lagrangian.

# Elastic program

Same as Phase I; the artificial variables are called elastic, with the idea that the constraints can "stretch".

# Element constraint

A global constraint that is true iff a variable $LaTeX: z$ is the $LaTeX: y$-th element of an array $LaTeX: X$. More specifically, it involves a 1-dimensional variable array $LaTeX: X = \langle x_1, \dots, x_n \rangle$, an integer variable $LaTeX: y$, and a variable $LaTeX: z$, where $LaTeX: X[y] = z$; it is typically written as "element(X,y,z)".
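A minimal Python sketch of the constraint check on fully assigned (ground) variables, assuming 1-based indexing as in the definition (the function name is illustrative; a real solver would also propagate over partial assignments):

```python
def element(X, y, z):
    # element(X, y, z): true iff the y-th entry of X equals z (1-based index)
    return 1 <= y <= len(X) and X[y - 1] == z
```

For example, element([5, 7, 9], 2, 7) holds, while element([5, 7, 9], 2, 5) does not.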

# Elementary matrix

A matrix formed by performing a single row operation on the identity matrix. These arise in linear programming, particularly for the product form of the basis: $LaTeX: B = E_1 E_2 ... E_m$, where each $LaTeX: E_i$ is an elementary matrix.

# Elementary simplex method

See simplex method.

# Elementary vector

Let $LaTeX: V$ be a subspace of $LaTeX: \mathbb{R}^n$. For $LaTeX: v \in V$, let $LaTeX: S(v)$ denote its support set: $LaTeX: \{j: v_j \neq 0\}$. Then, a nonzero $LaTeX: v$ is an elementary vector of $LaTeX: V$ if there does not exist $LaTeX: v' \in V$ such that $LaTeX: v' \neq 0$ and $LaTeX: S(v')$ is a proper subset of $LaTeX: S(v)$. This extends to where $LaTeX: V$ is not a subspace, such as the collection of non-negative vectors in $LaTeX: \mathbb{R}^n$, in which case $LaTeX: S(v)$ is the set of coordinates with positive value. This is of particular importance in defining the optimal partition of an LP solution.

# Ellipsoid

An ellipsoid centered at $LaTeX: c$ is the set

$LaTeX: E(c,Q) = \{x: (x-c)^T Q (x-c) \le 1\}$,

where $LaTeX: Q$ is a symmetric, positive definite matrix.

# Ellipsoid method

This algorithm seeks to solve $LaTeX: Ax \le b$ by iterations that begin with $LaTeX: x=0$ and $LaTeX: Q=\mbox{diag}(2^{-L})$, where $LaTeX: L$ is a measure of the smallest datum that can be represented -- i.e., $LaTeX: x$ is a solution if, and only if, it satisfies $LaTeX: Ax \le b + 2^{-L}$. If the current iterate, $LaTeX: x$, does not satisfy this, a new pair $LaTeX: (x', Q')$ is defined by choosing one of the violated inequalities, say $LaTeX: A_{i,*}x > b_i$. Then, let $LaTeX: v = A_{i,*}^T$, and apply the following rules:

1. $LaTeX: x' = x - \frac{1}{n+1} \, \frac{Qv}{\sqrt{v^T Q v}}$
2. $LaTeX: Q' = \frac{n^2}{n^2 - 1} \left( Q - \frac{2}{n+1} \, \frac{Q v v^T Q}{v^T Q v} \right)$
Now $LaTeX: E(x',Q')$ defines an ellipsoid centered at $LaTeX: x'$, and it contains all the feasible points in $LaTeX: E(x,Q)$. After at most $LaTeX: O(n^2 L)$ iterations, the algorithm terminates with a solution, or with the fact that no solution exists.

By treating the objective as a constraint, say $LaTeX: c^T x \le c^Tx^* - e$ and/or $LaTeX: c^T x \ge c^Tx^* + e$, the ellipsoid method can be used to solve a linear program. Its significance is that it was the first algorithm for linear programming shown to have polynomial time complexity.
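A single center-and-shape update can be sketched in pure Python (the function name is illustrative); this follows the standard form of the update, in which $LaTeX: Q$ is the positive definite matrix describing the current ellipsoid and $LaTeX: v$ is the (column) normal of a violated inequality:

```python
from math import sqrt

def ellipsoid_update(x, Q, v):
    # One step of the ellipsoid method for a violated constraint A_i x > b_i,
    # with v = A_i^T: shift the center against v and shrink the ellipsoid.
    n = len(x)
    Qv = [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]
    vQv = sum(v[i] * Qv[i] for i in range(n))
    step = 1.0 / ((n + 1) * sqrt(vQv))
    x_new = [x[i] - step * Qv[i] for i in range(n)]
    coef = n * n / (n * n - 1.0)
    Q_new = [[coef * (Q[i][j] - (2.0 / (n + 1)) * Qv[i] * Qv[j] / vQv)
              for j in range(n)] for i in range(n)]
    return x_new, Q_new
```

For $LaTeX: n=2$ with $LaTeX: Q$ the identity and $LaTeX: v=(1,0)$, the center moves to $LaTeX: (-1/3, 0)$ and the ellipsoid shrinks along $LaTeX: v$ while stretching in the orthogonal direction.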

# Elliptope

This arises in semi-definite programming. It is the set of all real, square symmetric matrices whose diagonal entries are all equal to one and whose eigenvalues are all non-negative. The dimension of the set is $LaTeX: n(n-1)/2$ (where the matrix is $LaTeX: n \times n$). For example, consider $LaTeX: n=2$:

$LaTeX: M = \left[ \begin{array}{cc} 1 & a \\ a & 1 \end{array} \right]$.

Then, $LaTeX: M$ is in the elliptope for $LaTeX: n=2$ if, and only if, $LaTeX: -1 \le a \le 1$, i.e. it is diagonally dominant (note the dimension is 1). An elliptope is also called a set of correlation matrices.

# Epigraph

This is the region on or above the graph of a function $LaTeX: f$:

$LaTeX: \mbox{epi}(f,X) = \{(x, z): x \in X, \; z \ge f(x)\}$.

# Equilibrium basis

In linear programming this is a basis that is optimal in both the primal and dual. It arises in compatibility theory and degeneracy graphs.

# Equivalent CSPs

Consider two constraint satisfaction problems $LaTeX: P_1$ and $LaTeX: P_2$ and a sequence $LaTeX: X$ of their common variables. We say that $LaTeX: P_1$ and $LaTeX: P_2$ are equivalent w.r.t $LaTeX: X$ if

• for every solution $LaTeX: d$ to $LaTeX: P_1$ a solution to $LaTeX: P_2$ exists that coincides with $LaTeX: d$ on the variables in X, and
• for every solution $LaTeX: e$ to $LaTeX: P_2$ a solution to $LaTeX: P_1$ exists that coincides with $LaTeX: e$ on the variables in X.

# Euclidean norm

See norm.

# Euler-Lagrange equation

This is a necessary condition for $LaTeX: y$ to solve the classical problem in variational calculus:

$LaTeX: \mbox{div}(F_{\zeta} (x, y(x), \nabla y(x))) = F_{\lambda}(x, y(x), \nabla y(x))$,

where $LaTeX: F(x, \lambda, \zeta): \Omega \times \mathbb{R} \times \mathbb{R}^n \rightarrow \mathbb{R}$.

# Evolutionary algorithm

This is a term that applies to any metaheuristic based on some biological metaphor that is a dynamical system whose evolution rules have a stochastic component. In particular, it includes genetic algorithms, scatter search, Particle Swarm Optimization, and some neural networks. It emerged as a unification of related computational methods, including cellular automata, which was conceived by Ulam and von Neumann in the 1940s. This unification is not universal. Some researchers cling to the GA metaphor but say that evolutionary algorithms have more operators; some say that the GA population is composed of bit strings, whereas evolutionary algorithms can have a broader range of population structures. It is the concept of evolution dynamics, which is population-based and stochastic, that distinguishes this class of algorithms from other methods of global optimization. Evolutionary computation uses the evolutionary algorithm as its foundation; typically, it is massively parallel.

# Exact penalty function

A penalty function is exact if there exists a finite parameter value such that its maximum is a solution to the original mathematical program.

# Existence of solution

This pertains to whether a mathematical program has a solution. One source of non-existence is that the feasible region is empty. Another is that it is unbounded, and it is possible to have a sequence of feasible points whose objective value diverges. In linear programming, these are the only sources of non-existence. In nonlinear programming, however, we can have asymptotes such that the objective is bounded but cannot achieve its infimum or supremum. See the Bolzano-Weierstrass theorem, which is fundamental. For linear programming, see theorems of the alternative.

# Expanding subspace theorem

Let $LaTeX: S_k$ be the subspace spanned by the Q-conjugate vectors $LaTeX: d^0, \ldots, d^{k-1}$. Let $LaTeX: x^0$ be any initial point in $LaTeX: \mathbb{R}^n$, and let $LaTeX: x^k$ be generated by these conjugate directions with optimal line search:

$LaTeX: \displaystyle x^{k+1} = x^k + s_k d^k \;\; \mbox{ and } \;\; s_k = \frac{-(Qx^k + c)^T d^k}{(d^k)^T Q d^k}.$

Then, $LaTeX: x^k$ minimizes $LaTeX: \textstyle \frac{1}{2} x^{T} Q x + c^{T} x$ on the affine set, $LaTeX: x^0 + S_k$.

# Explicitly quasiconcave function

Its negative is explicitly quasiconvex.

# Explicitly quasiconvex function

A quasiconvex function that is also strictly quasiconvex. Its main property is that it rules out flat portions that cause the closure condition to fail. (Every convex function is explicitly quasiconvex.)

# Exponent matrix

The matrix of exponents of the variables appearing in the posynomials of a geometric program.

# Extended reals

Typically denoted by $LaTeX: \mathbb{R}^*$, this is the set of real numbers augmented with positive and negative infinity, i.e. $LaTeX: \mathbb{R}^{*} = \mathbb{R} \cup \{\infty, -\infty\}$.

# Extensional constraint

Constraints on a set of variables $LaTeX: X =\{x_1, \dots, x_n\}$ can be stated in an extensional fashion by specifying a table that summarizes all allowed (or in an alternative formulation, all disallowed) combinations of assignments to the variables in $LaTeX: X$.

For instance, the extensional constraint $LaTeX: table(\{x,y\},\{[1,2],[1,3],[2,3]\})$ expresses the intensional constraint $LaTeX: x < y$ where $LaTeX: x$ and $LaTeX: y$ can take values $LaTeX: \{1,2,3\}$.

Also called a table constraint.

# Exterior penalty function

A penalty function in which the associated algorithm generates infeasible points, approaching feasibility in the limit towards a solution. An example is to maximize $LaTeX: f(x) - u h(x)^2$ over $LaTeX: x \in X$ and let $LaTeX: u \rightarrow \infty$ in order to approach the solution to

$LaTeX: \max \{f(x): x \in X, \; h(x)=0\}$.

If, during the penalty iterations, $LaTeX: h(x^*)=0$ for some finite $LaTeX: u$, then $LaTeX: x^*$ solves the original mathematical program. Otherwise, the idea is that $LaTeX: x^*(u)$ approaches the feasibility condition, $LaTeX: h(x^*)=0$, as $LaTeX: u$ gets large (though this need not happen without assumptions on $LaTeX: f$ and $LaTeX: h$).

# Extrapolation

The idea of estimating a value by extending information at hand outside its immediate range. In linear programming, an extrapolation estimate of the optimal objective value uses the dual price $LaTeX: (y)$ as a slope: $LaTeX: z(b + h) \approx z(b) + y^T h$. For a sequence $LaTeX: x^k$, an extrapolation is an estimate of a limit point.

# Extreme point

A point in the closure of a set, say $LaTeX: S$, that is not the midpoint of any open line segment with end points in the closure of $LaTeX: S$. Equivalently, $LaTeX: x$ is an extreme point of a closed set, $LaTeX: S$, if there do not exist $LaTeX: y, z \in S \setminus \{x\}$ for which $LaTeX: \textstyle x = (y + z)/2$. When $LaTeX: S$ is a polyhedron of the standard form, $LaTeX: S=\{x: Ax=b, \; x \ge 0\}$, with $LaTeX: A$ of full row rank, we have one of the fundamental theorems of linear programming that underlies the simplex method:

$LaTeX: x$ is an extreme point of the feasible region if, and only if, $LaTeX: x$ is a basic feasible solution.

# Extreme ray

An extreme ray of a closed set $LaTeX: S$ is a ray in $LaTeX: S$ that cannot be expressed as a (simple) sum of other rays in $LaTeX: S$. For example, the axes, $LaTeX: \{t e_j: t \ge 0\}$, are extreme rays of the non-negative vectors in $LaTeX: \mathbb{R}^n$. The ray $LaTeX: \{t e: t \ge 0\}$, however, is the sum of the axes since $LaTeX: (t,...,t) = t e_1 + \ldots + t e_n$. The recession direction of an extreme ray is sometimes called an extreme direction.

# Extreme value

An extremum (pl. extrema). A minimum or maximum.

# FTRAN

This applies to the factored system,

$LaTeX: E_1 E_2 ... E_n x = b$,

where each $LaTeX: E_i$ is an elementary matrix. The recursion is:

$LaTeX: \begin{array}{rcl} E_1 x_1 & = & b \\ E_2 x_2 & = & x_1 \\ & \vdots & \\ E_n x_n & = & x_{n-1} \end{array}$

In the end, $LaTeX: x = x_n$ is the solution. Each elementary system is solved as follows. For notational convenience, suppose the system is $LaTeX: Ex = y$, and $LaTeX: v$ is the distinguished $LaTeX: p$-th column of $LaTeX: E$. Then,

$LaTeX: x(p) = y(p)/v(p)$, and $LaTeX: x(i) = y(i) - v(i)x(p)$ for $LaTeX: i \ne p$.

This is what is done in the (revised) simplex method, and each elementary solution is a pivot operation.
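A Python sketch of the recursion (function names illustrative), with each elementary matrix stored by its distinguished column $LaTeX: v$ and position $LaTeX: p$ (0-based here):

```python
def solve_elementary(v, p, y):
    # Solve E x = y, where E is the identity with its p-th column replaced by v:
    # x(p) = y(p)/v(p), and x(i) = y(i) - v(i) x(p) for i != p.
    x = list(y)
    x[p] = y[p] / v[p]
    for i in range(len(y)):
        if i != p:
            x[i] = y[i] - v[i] * x[p]
    return x

def ftran(etas, b):
    # FTRAN recursion: solve E_1 E_2 ... E_k x = b by successive elementary solves,
    # starting with E_1. Each (v, p) pair describes one elementary factor.
    x = list(b)
    for v, p in etas:
        x = solve_elementary(v, p, x)
    return x
```

For example, with $LaTeX: v = (2, 4, 6)^T$ in position $LaTeX: p = 1$ and $LaTeX: y = (5, 8, 10)^T$, the solve gives $LaTeX: x = (1, 2, -2)^T$.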

# Face

A convex subset, say $LaTeX: S$, of a convex set, say $LaTeX: C$, such that for any $LaTeX: x$ and $LaTeX: y$ in $LaTeX: C$

$LaTeX: \{(1 - \alpha) x + \alpha y : 0 \le \alpha \le 1 \} \cap \mbox{ri}(S) \neq \emptyset \Rightarrow x, \; y \in S$.

The set $LaTeX: C$ is itself a face of $LaTeX: C$, and most authors consider the empty set a face. The faces of zero dimension are the extreme points of $LaTeX: C$. When $LaTeX: C$ is a polyhedron, i.e. $LaTeX: \{x : Ax \le b\}$, the faces are the subsystems with some inequalities holding with equality: $LaTeX: \{x: Bx = c, \; Dx \le d\}$, where $LaTeX: A = [B \; D]$ and $LaTeX: b = [c^T \; d^T]^T$.

# Facet

A face of a convex set, not equal to the set, of maximal dimension. If the set is polyhedral, say $LaTeX: P = \{x: Ax \le b\}$, where the defining inequalities are irredundant, then the facets are in one-to-one correspondence with $LaTeX: \{x \in P: A_{i,:} x = b_i\}$ for $LaTeX: i$ such that the equality is not forced – i.e., there exists $LaTeX: x$ in $LaTeX: P$ for which $LaTeX: A_{i,:} x < b_i$. Here $LaTeX: A_{i,:}$ is the i-th row of $LaTeX: A$.

# Facet-defining inequality

In integer programming, an inequality, $LaTeX: a^Tx \ge b$, that is valid for the integer polyhedron, $LaTeX: P$, and such that $LaTeX: \{x \in P: a^Tx = b\}$ is a facet of $LaTeX: P$.

# Facility location problem

Find a location (say x-y coordinates) that minimizes a weighted sum of distances to each of several locations. An example is to locate a hospital or school to serve a community defined by population centers. Originally considered by Fermat in the 17th century, there are now many variations of this problem. One variation is the location of an obnoxious facility, such as a garbage dump. In that case the objective is to maximize total (weighted) distance from the given points (there are constraints, such as cost, that require the location to be within some distance of the points).

# Factorable function

This has a recursive definition, relative to a collection of elementary operations (e.g., sum, product) on other factorable functions, starting with some set of primitive functions. For example, starting with the power functions, $LaTeX: ax^n$, for any $LaTeX: a$ and $LaTeX: n$, and allowing only the sum operation, we can obtain all polynomials. The polynomial $LaTeX: x + 2x^2 + 9x^3$ factors into 3 parts.

# Factorable program

All functions (objective and constraints) are factorable. An example is the separable program, provided that each of the univariate functions is factorable. Another is the geometric program, where each posynomial is factorable.

# Factored form of basis

Originally for linear programming, this pertains to writing a basis $LaTeX: B$ in the form, $LaTeX: B = F_1 F_2 ... F_k$ , where $LaTeX: F_i$ are factors. Two forms of interest are: elementary factors, where each $LaTeX: F_i$ is an elementary matrix, and LU decomposition: $LaTeX: B=LU$, where $LaTeX: L$ is lower triangular and $LaTeX: U$ is upper triangular. ($LaTeX: L$ and $LaTeX: U$ might be factored, notably into elementary matrices, which are lower and upper triangular, resp.) These have inexpensive updates when $LaTeX: B$ changes to an adjacent basis.

# Fail first principle

The fail first principle is an idea of what a variable ordering heuristic should attempt to do in tree search. It suggests that the variable to be assigned next should be the one which is most likely to lead to a dead-end. The heuristic aims at proving that the search is in a sub-tree with no feasible solutions as soon as possible.

One way of operationalizing this principle is to choose the next variable to be assigned as the one that is most constrained. The extent to which a variable is constrained can be measured in different ways. One simple measure is the current size of the domain: under this measure, the variable with the smallest domain is assigned next. Another common measure is the number of constraints in which a variable appears, selecting the variable that appears in the largest number. More sophisticated estimates exist.

# Faithfully convex

A convex function is faithfully convex if it is constant along some segment only if it is constant along the whole line containing this segment.

# Fallacy of averages

Imagine standing with one foot on a keg of ice and the other in a fire. Your average body temperature may be fine, but the extremes will kill you! The fallacy of averages is the fallacious results you may get when replacing data with their expected values. Formally, the fallacy is stated as $LaTeX: E(XY) \ne E(X)E(Y)$ – viz., the covariance is not zero. Another form of this fallacy is that $LaTeX: E(f(X)) \ne f(E(X))$ (unless $LaTeX: f$ is linear). In particular, suppose we have

$LaTeX: \max f(x; p): g(x; p) \le 0$,

where $LaTeX: p$ is a vector of parameters with some uncertainty. The fallacy of averages suggests that it is a mistake to replace $LaTeX: p$ with its expected value for at least 2 reasons: (1) members of $LaTeX: p$ may be correlated, and (2) the average values of $LaTeX: f$ and $LaTeX: g$ need not equal the functional values at the mean.

# Farkas lemma

This result gives an alternative system, of which there are many. Farkas' Lemma states that exactly one of the following is consistent,

$LaTeX: Ax = b, \; x \ge 0$   (exclusive or)   $LaTeX: A^Ty \ge 0, \; b^Ty < 0$.

An often used variant is

$LaTeX: Ax \ge b$   (exclusive or)   $LaTeX: A^Ty = 0, \; y \ge 0, \; b^Ty < 0$.

# Fathom

Arises in directed tree search. A node is fathomed if it is determined that no completion from this point could produce a better solution than one already obtained. This could happen by bounding the objective, or it could happen by determining there is no feasible solution with the partial specifications corresponding to the node.

# Feasibility map

The point-to-set map that maps a right-hand side to a feasible region:

$LaTeX: (b, c) \mapsto \{x \in X: g(x) \le b, \; h(x) = c\}$.

# Feasible

A point is feasible if it satisfies all constraints. The feasible region (or feasibility region) is the set of all feasible points. A mathematical program is feasible if its feasible region is not empty.

The term feasible doesn't imbue other properties such as convexity or connectedness. For example, the feasible region of a nonlinear program could be $LaTeX: \{ x : x^2 \ge 1 \}$, which is the disjoint union $LaTeX: \{x : x \ge 1 \} \cup \{x : x \le -1\}$.

# Feasible direction

Given a feasible point $LaTeX: x$, a direction vector, say $LaTeX: d$, is feasible if there exists $LaTeX: s > 0$ such that $LaTeX: x + sd$ is feasible. The method of feasible directions is designed to choose a feasible direction at each iteration, along which (as $LaTeX: s$ becomes positive) the objective value improves. Such directions exist for continuous mathematical programs (where the line segment $LaTeX: [x, x + sd]$ is feasible for some $LaTeX: s > 0$) unless $LaTeX: x$ is a local optimum. (Note: with nonlinear constraints, there is typically no feasible direction according to this (classical) definition. See the generalized reduced gradient method.)

# Fibonacci search

This finds the maximum of a unimodal function on an interval, $LaTeX: [a, b]$, by evaluating points placed according to a fibonacci sequence, $LaTeX: F_N$. If there are $LaTeX: F_N$ points in the interval, only $LaTeX: N$ evaluations are needed. In the continuous case, we begin with some interval of uncertainty, $LaTeX: [a,b]$, and we reduce its length to $LaTeX: (b-a)/F_N$. The ratio, $LaTeX: g_n = F_{n-1}/F_n$, is the key to the placements.

Here is the method for the continuous case:

1. Initialization. Let $LaTeX: x = a + (1 - g_N)(b-a)$ and $LaTeX: y = a + g_N(b-a)$. Evaluate $LaTeX: f(x)$ and $LaTeX: f(y)$ and set $LaTeX: n=N$.
2. Iteration. If $LaTeX: f(x) < f(y)$, reduce the interval to $LaTeX: (x, b]$ (i.e., set $LaTeX: a=x$), decrement $LaTeX: n$ to $LaTeX: n-1$, and set $LaTeX: x = y$ and $LaTeX: y = a + g_n(b-a)$. If $LaTeX: f(x) \ge f(y)$, reduce the interval to $LaTeX: [a, y)$ (i.e., set $LaTeX: b=y$), decrement $LaTeX: n$ to $LaTeX: n-1$, and set $LaTeX: y = x$ and $LaTeX: x = a + (1-g_n)(b-a)$.

The fibonacci search method minimizes the maximum number of evaluations needed to reduce the interval of uncertainty to within the prescribed length. For example, it reduces the length of a unit interval $LaTeX: [0,1]$ to 1/10946 = 0.00009136 with only 20 evaluations. In the case of a finite set, fibonacci search finds the maximum value of a unimodal function on 10,946 points with only 20 evaluations, but this can be improved – see lattice search.

For very large $LaTeX: N$, the placement ratio $LaTeX: (g_N)$ approaches the golden mean, and the method approaches the golden section search. Here is a comparison of interval reduction lengths for fibonacci, golden section and dichotomous search methods. In each case, $LaTeX: N$ is the number of evaluations, and the entries are the lengths to which each method reduces a unit interval of uncertainty. For example, with 20 evaluations dichotomous search reduces the interval of uncertainty to $LaTeX: 0.0009765$ of its original length (with separation value near $LaTeX: 0$). The most reduction comes from fibonacci search, which is more than an order of magnitude better, at $LaTeX: 0.0000914$. Golden section is close (and gets closer as $LaTeX: N$ gets larger).

     N   Fibonacci     Dichotomous     Golden section
         1/F_N         1/2^(N/2)       1/(1.618)^(N-1)
    =================================================
     5   0.125         0.25            0.146
    10   0.0112        0.0312          0.0131
    15   0.00101       0.00781         0.00118
    20   0.0000914     0.000976        0.000107
    25   0.00000824    0.0002441       0.0000096
    =================================================
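Here is a Python sketch of the continuous method described above (the function name is illustrative); it maximizes a unimodal $LaTeX: f$ on $LaTeX: [a, b]$ and returns the midpoint of the final interval of uncertainty:

```python
def fibonacci_search(f, a, b, N):
    # Build F_0, ..., F_N with F_0 = F_1 = 1.
    F = [1, 1]
    while len(F) <= N:
        F.append(F[-1] + F[-2])
    g = lambda n: F[n - 1] / F[n]          # placement ratio g_n = F_{n-1}/F_n

    # Initialization: two symmetric evaluation points.
    x = a + (1 - g(N)) * (b - a)
    y = a + g(N) * (b - a)
    fx, fy = f(x), f(y)
    for n in range(N, 1, -1):
        if fx < fy:                        # keep (x, b]; reuse y as the new x
            a = x
            x, fx = y, fy
            y = a + g(n - 1) * (b - a)
            fy = f(y)
        else:                              # keep [a, y); reuse x as the new y
            b = y
            y, fy = x, fx
            x = a + (1 - g(n - 1)) * (b - a)
            fx = f(x)
    return (a + b) / 2
```

Each reduction multiplies the interval length by $LaTeX: g_n$, and the product $LaTeX: g_N g_{N-1} \cdots g_2 = 1/F_N$ gives the stated final length, $LaTeX: (b-a)/F_N$.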



# Fibonacci sequence

The numbers satisfying: $LaTeX: F_{n+2} = F_{n+1} + F_n$, with initial conditions $LaTeX: F_0 = F_1 = 1$. As shown in the following table, the fibonacci sequence grows rapidly after $LaTeX: N=10$, and $LaTeX: F_N$ becomes astronomical for $LaTeX: N > 50$,

      N      F_N
    ================
      0        1
      1        1
      2        2
      3        3
      4        5
      5        8
      6       13
      7       21
      8       34
      9       55
     10       89
     20    10946
     50    2.0E10
    100    5.7E20
    ================


Named after the 13th century mathematician who discovered it, this sequence has many interesting properties, notably for an optimal univariate optimization strategy, called fibonacci search.

# Fill-in

Arises in the context of sparse matrices subjected to certain operations. In particular, a basis may be factored in a product of elementary matrices to represent Gaussian elimination. The nonzeroes that appear in positions that were not in the original matrix are called fill-in coefficients. An example of a matrix that has no fill-in for this factorization is one that is lower triangular. In that case the factors appear as:

$LaTeX: \left[ \begin{array}{ccc} * & 0 & 0 \\ * & * & 0 \\ * & * & * \end{array} \right] = \left[ \begin{array}{ccc} * & 0 & 0 \\ * & 1 & 0 \\ * & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & * & 0 \\ 0 & * & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & * \end{array} \right]$

On the other hand, a lower triangular matrix could cause fill-in for some other factorization, such as the Cholesky factorization. A completely dense matrix also has no fill-in since there were no zeroes to begin with. Here is an example of fill-in, taking the original order of rows and columns in the product form:

$LaTeX: \left[ \begin{array}{ccc} * & 0 & * \\ * & * & 0 \\ * & * & * \end{array} \right] = \left[ \begin{array}{ccc} * & 0 & 0 \\ * & 1 & 0 \\ * & 0 & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & * & 0 \\ 0 & * & 1 \end{array} \right] \left[ \begin{array}{ccc} 1 & 0 & * \\ 0 & 1 & \# \\ 0 & 0 & * \end{array} \right],$

where $LaTeX: \#$ is fill-in.

# First-order conditions

This descends from classical optimization, using Taylor's Theorem. For unconstrained optimization, this is simply that $LaTeX: \nabla f(x^*) = 0$. (In variational calculus, it is the Euler-Lagrange equation.) For constrained optimization, the first-order conditions of a regular mathematical program are given by the Lagrange Multiplier Rule.

# Fixed charge

A cost that is some value, say $LaTeX: C$, regardless of the level as long as the level is positive; otherwise the fixed charge is zero. This is represented by $LaTeX: Cv$, where $LaTeX: v$ is a binary variable. If $LaTeX: v=0$, the fixed charge is 0; if $LaTeX: v=1$, the fixed charge is $LaTeX: C$. An example is whether to open a plant $LaTeX: (v=1)$ or not $LaTeX: (v=0)$. To apply this fixed charge to the non-negative variable $LaTeX: x$, the constraint $LaTeX: x \le Mv$ is added to the mathematical program, where $LaTeX: M$ is a very large value, known to exceed any feasible value of $LaTeX: x$. Then, if $LaTeX: v=0$ (e.g., not opening the plant that is needed for $LaTeX: x > 0$), $LaTeX: x=0$ is forced by the upper bound constraint. If $LaTeX: v=1$ (e.g., plant is open), $LaTeX: x \le Mv$ is a redundant upper bound. Fixed charge problems are mathematical programs with fixed charges. (See Myths and Counterexamples in Mathematical Programming to avoid a misconception.)

# Fixed point

The fixed point of a function, $LaTeX: f:X \rightarrow X$, is an element $LaTeX: x \in X$ such that $LaTeX: f(x)=x$. For a point-to-set map, $LaTeX: F:X \rightarrow 2^X$, we instead have that $LaTeX: x \in F(x)$. The study of fixed points has been at the foundation of algorithms.

# Fixed variable

A variable whose value is fixed, either explicitly or implicitly. Explicit fixing arises for convenience of scenario design, particularly in linear programming, and during an algorithm's progression, particularly in integer programming. Implicit fixing occurs from a forcing substructure.

# Fleet mix problem

To determine how many of each type of aircraft should be in a fleet to meet demands and minimize total cost. Here is an integer linear program model:

$LaTeX: \min c^T x : a \ge Ax \ge b, \; x_j \in \{0,1,...,N_j\},$

where $LaTeX: N_j$ is the number of aircraft of type $LaTeX: j$ available; $LaTeX: A_{ij}$ is the capacity of aircraft type $LaTeX: j$ for mission $LaTeX: i$; $LaTeX: a_i$ is the greatest number of missions of type $LaTeX: i$ that can be flown; $LaTeX: b_i$ is the least number of missions of type $LaTeX: i$ that must be flown. The variables $LaTeX: x_j$ are the number of aircraft of type $LaTeX: j$ in the fleet, and $LaTeX: c_j$ is the maintenance cost of type $LaTeX: j$. If the aircraft must be purchased, binary variables are introduced, as $LaTeX: x_j - N_j y_j \le 0$, with a fixed charge, $LaTeX: fy$, in the objective $LaTeX: (f > 0)$. There could be additional constraints, such as a budget on total purchases $LaTeX: (fy \le f_0)$ or on total maintenance $LaTeX: (gx \le g_0)$.
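For small instances, the model can be checked by brute-force enumeration; here is a Python sketch (names illustrative) with the mission bounds written neutrally as lo <= Ax <= hi:

```python
from itertools import product

def fleet_mix_bruteforce(c, A, lo, hi, N):
    # Minimize c.x subject to lo_i <= (A x)_i <= hi_i and 0 <= x_j <= N_j, x integer.
    best_cost, best_x = None, None
    for x in product(*(range(Nj + 1) for Nj in N)):
        Ax = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]
        if all(lo[i] <= Ax[i] <= hi[i] for i in range(len(A))):
            cost = sum(c[j] * x[j] for j in range(len(x)))
            if best_cost is None or cost < best_cost:
                best_cost, best_x = cost, x
    return best_cost, best_x
```

For example, with costs $LaTeX: c=(3,5)$, a single mission row $LaTeX: A=(1\;2)$, bounds $LaTeX: 4 \le Ax \le 10$, and availability $LaTeX: N=(5,5)$, the cheapest feasible fleet is $LaTeX: x=(0,2)$ at cost 10. An integer programming solver replaces this enumeration in practice.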

# Floor

This is the integer round-down of a value, $LaTeX: x$:

$LaTeX: \mbox{Floor}(x) = \max \{z: z \in \mathbb{Z}, \; z \le x\}.$

Examples: $LaTeX: \mbox{Floor}(5)=5$, $LaTeX: \mbox{Floor}(4.999)=4$, $LaTeX: \mbox{Floor}(-1.2) = -2$. Also, see ceiling.

# Flow augmenting path

This arises in the Ford-Fulkerson labeling algorithm to find a maximum flow in a network with arc flow bounds, $LaTeX: L$ and $LaTeX: U$. Given a flow, $LaTeX: f$, from a source to a sink, a flow augmenting path, relative to $LaTeX: f$, is a path from that source to the sink such that

$LaTeX: f(x, y) < U(x, y)$

along all forward arcs, and

$LaTeX: f(x, y) > L(x, y)$

along all backward arcs.

The flow, $LaTeX: f$, is a maximum flow from the source to the sink if and only if there is no flow augmenting path relative to $LaTeX: f$. If there is a flow augmenting path, the flow can be changed along the path such that total flow increases (by at least 1 if all data are integer).

# Forced equality

This is an inequality constraint that must hold with equality in every feasible solution.

# Forcing substructure

A subsystem of the constraints and the variables in them that forces some value. For example, the following system is a forcing substructure because each variable, which is required to be non-negative, is forced to be zero.

$LaTeX: \begin{array}{rcl} -x - y & = & 0 \\ y - z & = & 0 \\ x, \; y, \; z & \ge & 0. \end{array}$

# Forward checking

Forward checking is a propagation procedure that guarantees that at each step of the search, all the constraints between already assigned variables and not yet assigned variables are arc consistent.

Formally, let $LaTeX: N = (X, D, C)$ be a binary constraint network and $LaTeX: Y \subseteq X$ such that $LaTeX: |D(x_i)| = 1$ for all $LaTeX: x_i \in Y$. $LaTeX: N$ is forward checking consistent according to the instantiation $LaTeX: I$ on $LaTeX: Y$ iff $LaTeX: I$ is locally consistent and for all $LaTeX: x_i \in Y$, for all $LaTeX: x_j \in X \setminus Y$, for all $LaTeX: c(x_i, x_j) \in C$, $LaTeX: c(x_i, x_j)$ is arc consistent.

# Forward substitution

The recursion to solve a nonsingular lower triangular system, $LaTeX: Lx=b$. It starts with $LaTeX: x_1 = b_1 / L_{1,1}$, then

$LaTeX: x_j = \left( b_j - \sum_{i < j} L_{j, i} x_i \right) / L_{j,j}.$

for $LaTeX: j=2,...,n$.

See FTRAN.
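A Python sketch of the recursion (the function name is illustrative):

```python
def forward_substitution(L, b):
    # Solve the nonsingular lower triangular system L x = b:
    # x_j = (b_j - sum_{i<j} L[j][i] * x_i) / L[j][j]
    n = len(b)
    x = [0.0] * n
    for j in range(n):
        s = sum(L[j][i] * x[i] for i in range(j))
        x[j] = (b[j] - s) / L[j][j]
    return x
```

For instance, solving with $LaTeX: L$ having rows $LaTeX: (2,0,0)$, $LaTeX: (1,3,0)$, $LaTeX: (4,5,6)$ and $LaTeX: b=(2,7,32)$ gives $LaTeX: x=(1,2,3)$.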

# Forward triangularization

An algorithm to rearrange a matrix by recursively assigning a singleton row to the front (with its only column, as its pivot). The recursion applies to the submatrix defined by omitting the assigned row and column (and reducing other row counts accordingly). This results in a lower triangular rearrangement if and only if such a rearrangement exists.

# Fourier-Motzkin elimination

A procedure by which a variable is eliminated in a system of linear inequalities. The resulting inequalities, which include some new ones, have a solution if and only if the original system has a solution. Eventually, this reduces the system to one variable, or an inconsistency is detected during the elimination. The one variable can be assigned any value in its range; then a backtracking procedure can be executed to obtain values of the other variables, which were eliminated. Originally presented by Fourier in 1826, the method was analyzed by Motzkin in 1933 in light of new developments in the theory of linear inequalities to solve linear programs. Its computational complexity is exponential, and it gave way to the simplex method.
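One elimination step can be sketched in Python (names illustrative) for a system $LaTeX: Ax \le b$: rows with a zero coefficient on the eliminated variable are kept, and each pair of rows with opposite-sign coefficients is combined so that the variable cancels:

```python
def fm_eliminate(A, b, j):
    # Eliminate variable j from A x <= b; returns (A', b') over the remaining variables.
    pos = [i for i in range(len(A)) if A[i][j] > 0]
    neg = [i for i in range(len(A)) if A[i][j] < 0]
    zero = [i for i in range(len(A)) if A[i][j] == 0]
    drop = lambda row: row[:j] + row[j + 1:]
    newA = [drop(A[i]) for i in zero]
    newb = [b[i] for i in zero]
    for p in pos:
        for q in neg:
            ap, aq = A[p][j], -A[q][j]   # both positive scale factors
            # aq * row_p + ap * row_q has a zero coefficient on variable j
            newA.append(drop([aq * A[p][k] + ap * A[q][k] for k in range(len(A[p]))]))
            newb.append(aq * b[p] + ap * b[q])
    return newA, newb
```

For example, eliminating $LaTeX: y$ from $LaTeX: \{y \le x, \; y \ge 2, \; x \le 5\}$ leaves $LaTeX: \{x \le 5, \; x \ge 2\}$.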

# Fractional program

Objective and/or constraint functions are sums of ratios of the form $LaTeX: V(x)/D(x)$, where typically $LaTeX: D(x) > 0$ over some domain $LaTeX: X$. The case of one term is special, and the linear fractional program has affine numerator and denominator. The linear 1-term case, $LaTeX: (ax+b)/(cx+d)$ (where $LaTeX: cx+d > 0$ over the feasible region), is equivalent to solving a parametric linear program:

$LaTeX: \mbox{opt } (ax+b) - u(cx+d) : x \in X,$
where $LaTeX: u$ is the parameter.

# Frank-Wolfe Theorem

If a quadratic function is bounded from below on a nonempty polyhedron, it attains a minimum there. In mathematical notation: let $LaTeX: f$ be a quadratic function and $LaTeX: X$ a nonempty polyhedron. If there exists $LaTeX: L$ such that $LaTeX: f(x) \ge L$ for all $LaTeX: x \in X$, then there exists $LaTeX: x^* \in X$ such that $LaTeX: f(x^*) \le f(x)$ for all $LaTeX: x \in X$.

# Free variable

A variable with no (explicit) sign restriction. (The term is generally used in linear programming because the standard form includes $LaTeX: x \ge 0$.)

# Fritz John conditions

These extend Carathéodory's conditions to deal with inequality constraints:

Suppose $LaTeX: x^*$ is in $LaTeX: \arg\!\max\{f(x): x \in \mathbb{R}^n, \; g(x) \le 0\}$, where $LaTeX: f$ and $LaTeX: g$ are smooth. Then, there exist multipliers $LaTeX: (y_0, y) \ge 0$, not all zero, such that

$LaTeX: y_0 \nabla f(x^*) - y^T \nabla g(x^*) = 0 \mbox{ and } y^Tg(x^*) = 0.$

# Fuzzy CSP

In a fuzzy constraint satisfaction problem, each constraint $LaTeX: C$ and instantiation $LaTeX: I$ are assigned a degree of satisfaction $LaTeX: sat(C,I)$, which is a value in the [0,1] real interval indicating the degree to which $LaTeX: C$ is satisfied by $LaTeX: I$. If this value is 1, $LaTeX: C$ is satisfied, and if this value is 0, $LaTeX: C$ is violated. In the most common interpretation of fuzzy CSPs, the task is to find an instantiation $LaTeX: I$ for which the minimum of $LaTeX: sat(C,I)$ with $LaTeX: C$ ranging over all the constraints (i.e. the smallest degree of satisfaction for the instantiation $LaTeX: I$) is maximal.
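This max-min task can be solved by brute force on a small hypothetical fuzzy CSP (the two constraints below are invented for illustration):

```python
from itertools import product

# Hypothetical toy fuzzy CSP: two variables with domain {1,2,3} and two
# soft constraints returning a degree of satisfaction in [0,1].
domains = {"x": [1, 2, 3], "y": [1, 2, 3]}

def c1(v):  # prefer x noticeably smaller than y
    return max(0.0, min(1.0, (v["y"] - v["x"]) / 2.0))

def c2(v):  # prefer y close to 2
    return 1.0 - abs(v["y"] - 2) / 2.0

def best_instantiation(domains, constraints):
    """Maximize the minimum degree of satisfaction over all instantiations."""
    names = sorted(domains)
    best, best_deg = None, -1.0
    for values in product(*(domains[n] for n in names)):
        inst = dict(zip(names, values))
        deg = min(c(inst) for c in constraints)
        if deg > best_deg:
            best, best_deg = inst, deg
    return best, best_deg

inst, deg = best_instantiation(domains, [c1, c2])
print(inst, deg)  # {'x': 1, 'y': 2} 0.5
```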

# Fuzzy math program

Some (or all) constraints are fuzzified by replacing the level set, $LaTeX: G_i = \{x \in X : g_i(x) \le 0\}$, with the requirement that $LaTeX: u_{G_i}(x) \ge a_i$, where $LaTeX: u_{G_i}$ is a membership function that is given (or derived from primitive membership functions on the level sets of each $LaTeX: g_i$), and $LaTeX: a_i$ is a parameter in $LaTeX: [0,1]$ to be set for each instance of the model. Typically (but not necessarily), $LaTeX: a_i = 1$ means that the constraint must hold, and the lower the value of $LaTeX: a_i$, the more $LaTeX: x$'s are admitted. (This appears similar to the chance-constraint model of stochastic programming, but it is more closely related to goal programming.) While fuzzy sets are used to represent uncertainty, they can also be used to represent preferences, e.g. a requirement $LaTeX: u_S(x) \ge u_T(x)$ could mean that it is preferred that $LaTeX: x$ be in $LaTeX: S$ at least as much as it is preferred that $LaTeX: x$ be in $LaTeX: T$.

# Fuzzy set

Given a universe set, $LaTeX: X$, and a membership function, $LaTeX: u : X \rightarrow [0,1]$, a fuzzy set is a collection of pairs: $LaTeX: \{(x, u(x)): x \in X\}$. Often, the membership function is subscripted by the set name, say $LaTeX: u_S$. Generally, for all $LaTeX: x \in X$, $LaTeX: u_{X}(x)=1$, and $LaTeX: u_{\emptyset}(x)=0$. In the context of uncertainty, the value $LaTeX: u_{S}(x)$ is used to model the statement of how possible it is for $LaTeX: x$ to be in $LaTeX: S$. For this reason, $LaTeX: u_S$ is sometimes called the possibility function of $LaTeX: S$. What we consider the usual set (without a membership function) is called a crisp set in fuzzy mathematics.

Fuzzy set operations, say on fuzzy sets $LaTeX: S$ and $LaTeX: T$, with membership functions $LaTeX: u_S$ and $LaTeX: u_T$, resp., are defined by the following:

• Union: $LaTeX: u_{S\vee T}(x) = \max \{u_S(x), u_T(x)\}$.
• Intersection: $LaTeX: u_{S\wedge T}(x) = \min \{ u_S(x), u_T(x)\}$.
• Complement: $LaTeX: u_{\sim S}(x) = 1 - u_S(x)$.

One must be careful when using fuzzy sets to represent uncertainty (which is not the only type of interpretation – see fuzzy mathematical program). In particular, if $LaTeX: u_S(x) = 1/2$, its complement is also $LaTeX: 1/2$. Thus, $LaTeX: u_{S\vee \sim S}(x) = 1/2$, despite the fact that $LaTeX: S\vee \sim S = X$ (in ordinary set theory). Similarly, $LaTeX: u_{S\wedge \sim S}(x) = 1/2$, despite the fact that $LaTeX: S\wedge \sim S = \emptyset$. This illustrates the fact that $LaTeX: u_S$ need not equal $LaTeX: u_T$ even if $LaTeX: S=T$ as crisp sets.
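These operations, and the $LaTeX: u_S(x) = 1/2$ caveat above, can be sketched in Python:

```python
# Membership functions as plain Python functions; the operations mirror the
# max/min/complement definitions of fuzzy union, intersection, and complement.
def f_union(uS, uT):        return lambda x: max(uS(x), uT(x))
def f_intersection(uS, uT): return lambda x: min(uS(x), uT(x))
def f_complement(uS):       return lambda x: 1.0 - uS(x)

uS = lambda x: 0.5            # a point where membership is exactly 1/2
u_not_S = f_complement(uS)

print(f_union(uS, u_not_S)(0))         # 0.5, although S v ~S = X for crisp sets
print(f_intersection(uS, u_not_S)(0))  # 0.5, although S ^ ~S is empty for crisp sets
```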

While the fuzzy set is fundamental for fuzzy mathematical programming, other concepts in fuzzy mathematics also apply, such as fuzzy arithmetic, fuzzy graphs and fuzzy logic.

# GAMS

Acronym for the General Algebraic Modeling System.

# GRASP

GRASP stands for Greedy Randomized Adaptive Search Procedures. It is a metaheuristic that uses restarts to search many portions of the solution space. From each start, some heuristic is employed to reach a local optimum. The choice for the local search can be problem-specific, such as using an n-opt neighborhood for the TSP, or it can use another metaheuristic, like simulated annealing or tabu search.

# Game theory

In general, a (mathematical) game can be played by one player, such as a puzzle, but its main connection with mathematical programming is if there are at least two players, and they are in conflict. Each player chooses a strategy that maximizes his payoff. If there are exactly two players and one player's loss is the other's gain, the game is called zero sum. In this case, a payoff matrix, $LaTeX: A$, is given where $LaTeX: A_{ij}$ is the payoff to player 1, and the loss to player 2, if player 1 uses strategy i and player 2 uses strategy j. In this representation each row of $LaTeX: A$ corresponds to a strategy of player 1, and each column corresponds to a strategy of player 2. If $LaTeX: A$ is $LaTeX: m \times n$, this means player 1 has m strategies, and player 2 has n strategies. Here is an example of a $LaTeX: 2 \times 3$ payoff matrix:

$LaTeX: A = \left[ \begin{array}{rrr} 10 & -5 & 20 \\ -15 & 5 & -15 \end{array} \right]$

A primary connection with mathematical programming is duality. Suppose $LaTeX: (x_1, x_2, \ldots, x_m)$ and $LaTeX: (y_1, y_2, \ldots, y_n)$ are strategy probability vectors chosen by the two players, i.e. in a particular play, $LaTeX: x_i$ is the probability player 1 chooses strategy i and $LaTeX: y_j$ is the probability player 2 chooses strategy j. Then, the value, $LaTeX: v$, of the game is the expected payoff:

$LaTeX: v = x A y^T = \sum_{ij} x_i A_{ij} y_j$

where player 1 seeks to maximize $LaTeX: v$ and player 2 seeks to minimize $LaTeX: v$. A pure strategy is a unit vector. For example, if $LaTeX: x = (1, 0, 0, \ldots, 0)$, then player 1 is using his first strategy 100% of the time; if $LaTeX: y = (0, 1, 0, 0, \ldots, 0)$, player 2 is using his second strategy 100% of the time. The expected payoff is simply the cell entry, $LaTeX: A_{12}$, which is -5 in the example (player 2 pays 5 units to player 1). The strategies that are not pure are called mixed. For example, $LaTeX: x = (1/2, 1/2)$ means player 1 plays each of the two strategies 50% of the time (flipping a coin on each play); $LaTeX: y =(1/4, 3/4, 0)$ means player 2 plays his first strategy 25% of the time and his second the remaining 75%. The expected payoff is -5/8.
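The expected-payoff computations above can be checked in Python:

```python
# Payoff matrix from the 2x3 example above.
A = [[10, -5, 20],
     [-15, 5, -15]]

def expected_payoff(x, A, y):
    """v = x A y^T for strategy probability vectors x and y."""
    return sum(x[i] * A[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

# Pure strategies: row 1 vs column 2 gives the cell entry A[0][1].
print(expected_payoff([1, 0], A, [0, 1, 0]))            # -5
# Mixed strategies from the text: x=(1/2,1/2), y=(1/4,3/4,0).
print(expected_payoff([0.5, 0.5], A, [0.25, 0.75, 0]))  # -0.625, i.e. -5/8
```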

The fundamental duality theorem of linear programming (see the supplement on mathematical programming duals) provides the equivalent primal-dual formulation:

• Primal (Player 1): $LaTeX: \max v : x \ge 0, \; \sum_i x_i = 1, \; v \le \sum_i x_i A_{ij} \; \forall j$
• Dual (Player 2): $LaTeX: \min v : y \ge 0, \; \sum_j y_j = 1, \; v \ge \sum_j A_{ij} y_j \; \forall i$

This is the simplest 2-person game; more generally, there can be n persons, and subsets of the players can form cooperative agreements. If this is not allowed, the game is called non-cooperative.

A combinatorial game is a cooperative game with restrictions on the formation of coalitions that induce a combinatorial structure, such as a matroid.

# Gauge function

A nonnegative, convex function, from $LaTeX: \mathbb{R}^n$ to the extended reals that is positively homogeneous and $LaTeX: f(0)=0$. A norm is a gauge function. Here is an example that is not a norm:

$LaTeX: f(x,y,z) = \left\{ \begin{array}{rl} \| (x-z,\; x-y) \|_2, & 2x - y - z = 0 \\ \infty, & \mbox{otherwise.} \end{array} \right.$

# Gauss-Jordan elimination

A method to solve $LaTeX: Ax=b$ that performs elementary row and column operations on $LaTeX: A$ to annihilate successive elements of $LaTeX: A$ in order to reduce $LaTeX: A$ to an identity matrix. On paper, the same operations are applied to $LaTeX: b$, then the solution is obtained by solving the resulting identity system. In a computer, the matrices effecting the elementary operations are saved as elementary matrices, say $LaTeX: E_i$ for the i-th operation. Then, the system is equivalent to $LaTeX: E_1 E_2 \ldots E_n x = b$, and forward transformation is applied to solve for $LaTeX: x$. This is what is done in the (revised) simplex method, and each iteration is a pivot operation.

# Gauss-Seidel method

An iterative method to solve $LaTeX: Ax=b$ using the latest values of the components of $LaTeX: x$ during the updates. Specifically, $LaTeX: A$ is split into $LaTeX: L-T$, where $LaTeX: L$ is lower triangular, and the update solves $LaTeX: Lx' = Tx + b$. (Surprisingly, history indicates Gauss did not know about this method, and Seidel did not recommend it.) Often people use the name to mean the use of latest values, rather than the specific splitting. In parallel computation, for example, the latest value of $LaTeX: x$ is deliberately not used during the computation of $LaTeX: x'$, since doing so would forfeit the parallelism.
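A Python sketch of the sweep (equivalent to the splitting with $LaTeX: L$ the lower triangle of $LaTeX: A$, diagonal included; the diagonally dominant example is illustrative):

```python
def gauss_seidel(A, b, x0=None, iters=50):
    """Gauss-Seidel iteration for A x = b: sweep through the rows, using
    the newest values of x as soon as they are available."""
    n = len(b)
    x = list(x0) if x0 else [0.0] * n
    for _ in range(iters):
        for i in range(n):
            # x[j] for j < i already holds this sweep's updated value.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x

# Diagonally dominant system (guarantees convergence); solution is (1, 2).
A = [[4.0, 1.0], [2.0, 5.0]]
b = [6.0, 12.0]
print(gauss_seidel(A, b))
```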

# Gaussian elimination

A method to solve $LaTeX: Ax=b$ that performs elementary row operations on $LaTeX: A$ to annihilate successive elements of $LaTeX: A$ in order to reduce $LaTeX: A$ to an upper triangular matrix, $LaTeX: U$. On paper, the same operations are applied to $LaTeX: b$, then the solution is obtained by solving the resulting upper triangular system. In a computer, the product of the matrices effecting the elementary row operations is a lower triangular matrix, $LaTeX: L$, with unit diagonal. Once this phase is completed, the system $LaTeX: Ax=b$ becomes $LaTeX: LUx=b$. This is then solved in two steps: forward substitution solves $LaTeX: Ly=b$; then backward substitution solves $LaTeX: Ux=y$. (Of course, computer implementations vary.)
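A Python sketch of the two phases (Doolittle-style, no pivoting, so it assumes nonzero pivots; production codes pivot for stability):

```python
def lu_solve(A, b):
    """Solve A x = b by Gaussian elimination (LU factorization without
    pivoting), then forward and backward substitution."""
    n = len(b)
    U = [row[:] for row in A]   # work on copies
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]      # row multiplier, stored in L
            L[i][k] = m
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    # Forward substitution: L y = b (unit diagonal).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    # Backward substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

print(lu_solve([[2.0, 1.0], [4.0, 5.0]], [3.0, 9.0]))  # [1.0, 1.0]
```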

# Generalized Lagrange multiplier method

The Generalized Lagrange Multiplier method (GLM) solves a sequence of Lagrangian optimization (relaxation) problems, searching the multiplier space by some method to seek a minimum of the maximum Lagrangian function.

1. Given $LaTeX: (u, v)$ with $LaTeX: u \ge 0$, find $LaTeX: x \in \arg\!\max \{f(x) - u^T g(x) - v^T h(x) : x \in X\} = L^*(u,v)$.
2. Test if 0 is in the convex hull of $LaTeX: S(x, u, v) := \{(b, c): g(x) \le b, \; h(x) = c,\; u^T g(x) =u^T b\}$. If so, stop ($LaTeX: x$ solves the original mathematical program if 0 is in $LaTeX: S(x, u, v)$; otherwise, search alternative optima; if no maximum generates 0 as a member of $LaTeX: S$, then the right-hand side (0) is in a duality gap).
3. Given $LaTeX: (x, u, v)$, choose $LaTeX: (u', v')$ to reduce the value of $LaTeX: L^*(u, v)$.

This can be viewed from several vantage points.

# Generalized arc consistency

An extension of arc consistency to constraints with more than two variables in their scope.

A variable $LaTeX: x$ is generalized arc consistent (GAC) with a constraint if every value of the variable can be extended to all the other variables of the constraint in such a way that the constraint is satisfied.

Generalized arc consistency is one of the most commonly enforced forms of consistency in constraint programming.

GAC has also been called: hyper-arc consistency and domain consistency.

# Generalized equation

This has the form: 0 is in $LaTeX: \{f(x)\} + T(x)$, where $LaTeX: f$ is a function from $LaTeX: \mathbb{R}^n$ into $LaTeX: \mathbb{R}^n$ and $LaTeX: T$ is a point-to-set map that maps $LaTeX: x$ into subsets of $LaTeX: \mathbb{R}^n$. If $LaTeX: T$ is absent, the condition reduces to an ordinary equation, $LaTeX: f(x)=0$. Usually, $LaTeX: T$ is assumed to satisfy the monotonicity condition:

$LaTeX: (x-x')^T(y-y') \ge 0 \mbox{ for } y \in T(x) \mbox{ and } y' \in T(x').$

This includes a variational inequality (and hence the complementarity problem) as a special case.

# Generalized inverse

Suppose $LaTeX: A$ is any $LaTeX: m \times n$ matrix. $LaTeX: A^+$ is a generalized inverse of $LaTeX: A$ if $LaTeX: A^+$ is $LaTeX: n \times m$ and $LaTeX: AA^+A = A$. Then, a fundamental theorem for linear equations is:

The equation, $LaTeX: Ax = b$, has a solution if and only if $LaTeX: AA^+b = b$ for any generalized inverse, $LaTeX: A^+$, in which case the solutions are of the form:
$LaTeX: x = A^+b + (I - A^+A)y$   for any   $LaTeX: y \in \mathbb{R}^n.$


The Moore-Penrose class additionally requires that $LaTeX: AA^+$ and $LaTeX: A^+A$ be symmetric (hermitian, if $LaTeX: A$ is complex). In particular, if $LaTeX: \mbox{rank}(A) = m$, $LaTeX: A^+ = A^T (AA^T)^{-1}$ is a Moore-Penrose inverse. Note that $LaTeX: AA^+ = I$, and $LaTeX: A^+A$ is a projection matrix for the subspace $LaTeX: \{x: x = A^T y \mbox{ for some } y \in \mathbb{R}^m\}$, since $LaTeX: x = A^T y$ implies $LaTeX: A^+Ax = A^T (AA^T)^{-1} AA^T y = A^T y = x$. Further, $LaTeX: I-A^+A$ projects into its orthogonal complement, which is the null space of $LaTeX: A$, i.e. $LaTeX: A(I-A^+A)x = (A-A)x = 0$ for any $LaTeX: x \in \mathbb{R}^n$.
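The theorem can be checked in Python on a small full-row-rank example (the matrix is hypothetical), using exact rational arithmetic and $LaTeX: A^+ = A^T(AA^T)^{-1}$:

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(r) for r in zip(*X)]

# Full-row-rank 2x3 example; here A^+ = A^T (A A^T)^{-1}.
A = [[F(1), F(0), F(1)],
     [F(0), F(1), F(1)]]
AAT = matmul(A, transpose(A))                  # [[2,1],[1,2]]
det = AAT[0][0] * AAT[1][1] - AAT[0][1] * AAT[1][0]
inv = [[ AAT[1][1] / det, -AAT[0][1] / det],   # 2x2 inverse by formula
       [-AAT[1][0] / det,  AAT[0][0] / det]]
Aplus = matmul(transpose(A), inv)

print(matmul(matmul(A, Aplus), A) == A)  # True: A A^+ A = A
print(matmul(A, Aplus))                  # the 2x2 identity, since rank(A) = m
```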

# Generalized network

A network in which the flow that reaches the destination could be different from the flow that left the source. In the incidence matrix, instead of

$LaTeX: \left[ \begin{array}{crc} \ldots & \vdots & \ldots \\ \ldots & -1 & \ldots \\ \ldots & \vdots & \ldots \\ \ldots & +1 & \ldots \\ \ldots & \vdots & \ldots \end{array} \right] \begin{array}{l} \leftarrow \mbox{ source row } (i) \\ \\ \\ \leftarrow \mbox{ destination row } (j) \end{array}$

we have

$LaTeX: \left[ \begin{array}{crc} \ldots & \vdots & \ldots \\ \ldots & -1 & \ldots \\ \ldots & \vdots & \ldots \\ \ldots & g & \ldots \\ \ldots & \vdots & \ldots \end{array} \right] \begin{array}{l} \leftarrow \mbox{ source row } (i) \\ \\ \\ \leftarrow \mbox{ destination row } (j) \end{array}$

where $LaTeX: g > 0$. If $LaTeX: g < 1$, there is a loss; if $LaTeX: g > 1$, there is a gain.

# Generalized reduced gradient method

The Generalized reduced gradient method (GRG) generalizes the reduced gradient method by allowing nonlinear constraints and arbitrary bounds on the variables. The form is:

$LaTeX: \max f(x) : h(x) = 0, \; L \le x \le U,$

where $LaTeX: h$ has dimension $LaTeX: m$. The method supposes we can partition $LaTeX: x = (v, w)$ such that:

• $LaTeX: v$ has dimension $LaTeX: m$ (and $LaTeX: w$ has dimension $LaTeX: n-m$);
• the values of $LaTeX: v$ are strictly within their bounds: $LaTeX: L_v < v < U_v$ (this is a nondegeneracy assumption);
• $LaTeX: \nabla_v h(x)$ is nonsingular at $LaTeX: x = (v,w)$.

As in the linear case, for any $LaTeX: w$ there is a unique value, $LaTeX: v(w)$, such that $LaTeX: h(v(w),w)=0$ (cf. Implicit Function Theorem), which implies that $LaTeX: dv/dw = -(\nabla_v h(x))^{-1} \nabla_w h(x)$. The idea is to choose the direction of the independent variables to be the reduced gradient: $LaTeX: \nabla_w (f(x) - y^T h(x))$, where $LaTeX: y^T = \nabla_v f(x) (\nabla_v h(x))^{-1}$. Then, the step size is chosen and a correction procedure applied to return to the surface, $LaTeX: h(x)=0$.

The main steps (except the correction procedure) are the same as the reduced gradient method, changing the working set as appropriate.

# Generalized upper bound

A Generalized upper bound (GUB) is an upper bound constraint on a sum of variables: $LaTeX: \textstyle\sum_{j \in J} x_j \le U$. A collection of these that do not overlap (i.e., index sets $LaTeX: J$ are disjoint) comprise the foundation for the generalized upper bounding technique in linear programming. This is where the summation equation is marked in the data structure, rather than represented explicitly as a row in the LP, and the basis is handled in its reduced dimension to gain computational efficiency in both time and space.

# Genetic algorithm

Genetic algorithms (GAs) are a class of algorithms inspired by the mechanisms of genetics, which have been applied to global optimization (especially for combinatorial programs). A GA requires the specification of three operations (each typically probabilistic) on objects, called "strings" (these could be real-valued vectors):

1. Reproduction - combining strings in the population to create a new string (offspring);

Example: Taking 1st character from 1st parent + rest of string from 2nd parent:

$LaTeX: [001001] + [111111] \rightarrow [011111]$
2. Mutation - spontaneous alteration of characters in a string;

Example: Flipping just the left-most character:

$LaTeX: [001001] \rightarrow [101001]$

3. Crossover - combining strings to exchange values, creating new strings in their place.

Example: With crossover location at 2:

$LaTeX: [001001] \mbox{ and } [111111] \rightarrow [001111], \; [111001]$
These can combine to form hybrid operators, and the reproduction and crossover operations can include competition within populations. Here is a generic GA (strategy):

0. Initialize population.
1. Select parents for reproduction and GA operators (viz., mutation and crossover).
2. Perform operations to generate intermediate population and evaluate their fitness values.
3. Select members of population to remain with new generation.

Repeat 1-3 until some stopping rule is reached.

The original GA (1973, by John Holland) used crossover and total population replacement. This means a population with 2N objects (called chromosomes) forms N pairings of parents that produce 2N offspring. The offspring comprise the new generation, and they become the total population, replacing their parents. More generally, a population of size N produces an intermediate population of N+M, from which Ñ members are kept to form the new population. One way to choose which Ñ survive is by those with the greatest fitness values – survival of the fittest.
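The generic strategy above can be sketched in Python on the OneMax problem (maximize the number of 1-bits); the problem, population size, and rates are all illustrative:

```python
import random

random.seed(1)

def fitness(s):                 # OneMax: count of 1-bits (a stand-in problem)
    return sum(s)

def crossover(p1, p2, point):   # exchange tails at the crossover location
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(s, rate=0.05):       # flip each bit with small probability
    return [1 - b if random.random() < rate else b for b in s]

def ga(n_bits=12, pop_size=20, generations=40):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # survival of the fittest
        children = []
        while len(children) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            c1, c2 = crossover(p1, p2, random.randrange(1, n_bits))
            children += [mutate(c1), mutate(c2)]
        pop = parents + children[:pop_size - len(parents)]
    return max(pop, key=fitness)

best = ga()
print(fitness(best))   # typically n_bits or close to it
```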

See convergence.

# Geometric mean

Given a nonnegative $LaTeX: x \in \mathbb{R}^n$ and a weight vector, $LaTeX: a$ in the simplex, $LaTeX: S_n$, the geometric mean is the value:

$LaTeX: x(1)^{a(1)} \, x(2)^{a(2)} \cdots x(n)^{a(n)}.$

For example, for $LaTeX: a=(1/2, 1/2)$, the geometric mean of $LaTeX: x = (1, 9)$ is $LaTeX: \sqrt{1}\,\sqrt{9}=3$. A fundamental inequality that provides a foundation for geometric programming is that the geometric mean is never greater than the arithmetic mean:

$LaTeX: \textstyle\prod_j x(j)^{a(j)} \le \sum_j a(j)\,x(j),$

for all nonnegative $LaTeX: x \in \mathbb{R}^n$ and $LaTeX: a \in S_n$. Further, equality holds only if $LaTeX: x(j)$ is constant for all $LaTeX: j$.
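A quick numerical check of the inequality (and of the example above) in Python:

```python
def geometric_mean(x, a):
    """Weighted geometric mean: product of x_j ** a_j."""
    p = 1.0
    for xj, aj in zip(x, a):
        p *= xj ** aj
    return p

def arithmetic_mean(x, a):
    """Weighted arithmetic mean: sum of a_j * x_j."""
    return sum(aj * xj for xj, aj in zip(x, a))

x, a = [1.0, 9.0], [0.5, 0.5]
print(geometric_mean(x, a))   # 3.0, the example above
print(arithmetic_mean(x, a))  # 5.0 >= 3.0, as the inequality asserts
```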

# Geometric program

A mathematical program of the form

$LaTeX: \min P_0(x) : P_k(x) \le 1, \; x > 0,$

where each $LaTeX: P_k$ is a posynomial (k=0,...,m). Conventionally, the monomials are indexed sequentially, so that the posynomials appear as:

$LaTeX: P_{0} = \sum_{i=1}^{N_0} c_{i} \prod_{j} x_{j}^{a_{i,j}}$
$LaTeX: P_{1} = \sum_{i=N_0 + 1}^{N_1} c_{i} \prod_{j} x_j^{a_{i,j}}$
$LaTeX: \vdots$
$LaTeX: P_m = \sum_{i=N_{m-1} + 1}^{N_m} c_i \prod_j x_j^{a_{i,j}}$

For example, consider:

$LaTeX: \min 1/y^3 + 5y/z^2: x, y, z > 0, \; 1/(xyz) \le 1, \; x/y + z/x \le 1.$

Then, $LaTeX: n=3$ (variables), $LaTeX: m=2$ (constraints), $LaTeX: N_0=2$ (monomials in objective), $LaTeX: N_1=3$ ($LaTeX: N_0 + 1$ monomial in constraint 1), and $LaTeX: N_2 = 5$ ($LaTeX: N_1 + 2$ monomials in constraint 2).

The exponent matrix is the $LaTeX: N_m \times n$ matrix whose i-th row contains the exponents of the variables in the i-th monomial. The example has 5 rows and 3 columns:

$LaTeX: A = \left[ \begin{array}{rrr} 0 & -3 & 0 \\ 0 & 1 & -2 \\ -1 & -1 & -1 \\ 1 & -1 & 0 \\ -1 & 0 & 1 \end{array} \right]$

The degree of difficulty is the number of terms in all posynomials ($LaTeX: N_m$) minus the number of independent variables (one less than the rank of the exponent matrix). In the above example, the rank of $LaTeX: A$ is 3, so the degree of difficulty is 5-3-1 = 1. If the last constraint were not present, only the first three rows comprise the exponent matrix, and its rank is 2. In that case, the degree of difficulty is 0. (Using the geometric dual, the solution reduces to solving a system of linear equations, which is what motivates the terminology.)

Also see superconsistent.

Since its inception, the posynomial form has been extended to signomials. In that case, the duality need not be strong, as there can be a duality gap. The general, or signomial, geometric program is of the form:

$LaTeX: \min P_0(x) - Q_0(x): x > 0, \; P_k(x) - Q_k(x) \le 1 \mbox{ for } k=1,\ldots,m,$

where $LaTeX: P_k$ and $LaTeX: Q_k$ are posynomials ($LaTeX: k=0,\dots,m$).

# Global Cardinality constraint

This is a global constraint that is specified on $LaTeX: n$ assignment variables $LaTeX: \langle x_1 \dots x_n \rangle$ and $LaTeX: n'$ count variables $LaTeX: \langle c_{v_1} \dots c_{v_{n'}} \rangle$, and enforces that each value $LaTeX: v_i$ is assigned to exactly $LaTeX: c_{v_i}$ of the assignment variables, $LaTeX: x_j$. The global cardinality constraint $LaTeX: gcc(x_1 \dots x_n, c_{v_1} \dots c_{v_{n'}})$ is a generalization of the all different constraint, which requires that every value is assigned to at most one variable.

# Global constraint

A global constraint $LaTeX: C=\{C(i)\}$ is a family of scope-parameterized constraints (normally $LaTeX: i \geq 2$), where $LaTeX: i$ is the size of the scope of the constraint. The relation represented by the constraint is often defined implicitly (i.e., intensionally) via a compact set of characteristics that all satisfying tuples must exhibit. For example, the all-different constraint is satisfied by all assignments to its variables where no pair have the same value. Such implicit definitions are then expressed and enforced by the constraint propagator corresponding to the global constraint.

Global constraint may also be explicitly (i.e., extensionally) defined by providing a set of satisfying (or non-satisfying) tuples.

Among the most common global constraints are the all different constraint and the global cardinality constraint.


# Global optimization

Generally refers to mathematical programming without convexity assumptions; such problems are NP-hard in general, and there could be a local optimum that is not a global optimum. Some authors use this term to imply the stronger condition that there are multiple local optima. Some solution strategies are given as heuristic search methods (including those that guarantee global convergence, such as branch and bound). As a process associated with algorithm design, some regard this simply as attempts to assure convergence to a global optimum (unlike a purely local optimization procedure, like steepest ascent). See Glover's linearization.

# Global optimum

A solution to a mathematical program (as distinct from a local optimum). When the mathematical program is not convex, there could be (and typically are) many local optima that are not global optima.

# Glover's linearization

Given a binary variable $LaTeX: x$ and a linear function $LaTeX: f(w)$ in discrete and/or continuous variables $LaTeX: w$ in $LaTeX: W$ for some set $LaTeX: W$, this linearization reformulates the product $LaTeX: f(w)x$ with a (free) continuous variable $LaTeX: z$ and enforces that $LaTeX: z = f(w)x$ by adding four linear inequalities:

$LaTeX: Lx \le z \le Ux, \; f(w) - U(1-x) \le z \le f(w) - L(1-x),$

where the values $LaTeX: L$ and $LaTeX: U$ are defined as

$LaTeX: L = \min \{ f(w) : w \in W^R\} \mbox{ and } U = \max \{f(w) : w \in W^R\},$

and $LaTeX: W^R$ is any relaxation of $LaTeX: W$.
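A brute-force check in Python (with hypothetical values for $LaTeX: L$, $LaTeX: U$, and $LaTeX: f(w)$) that the four inequalities admit only $LaTeX: z = f(w)x$ for binary $LaTeX: x$:

```python
def feasible_z(fw, x, L, U, eps=1e-9, grid=2001):
    """Return all z on a grid over [L-1, U+1] satisfying Glover's four
    inequalities for fixed f(w) = fw and binary x."""
    zs = []
    for k in range(grid):
        z = L - 1 + (U - L + 2) * k / (grid - 1)
        if (L * x - eps <= z <= U * x + eps and
                fw - U * (1 - x) - eps <= z <= fw - L * (1 - x) + eps):
            zs.append(z)
    return zs

L, U, fw = -2.0, 4.0, 1.5
for x in (0, 1):
    zs = feasible_z(fw, x, L, U)
    print(x, min(zs), max(zs))   # the only feasible z is fw * x
```

When $LaTeX: x=0$ the first pair pins $LaTeX: z=0$; when $LaTeX: x=1$ the second pair pins $LaTeX: z=f(w)$.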

# Goal program

A goal is neither a constraint nor an objective: it is neither a firm requirement nor a function that must be at an extreme value. A goal represents a relation that is desired, but it may be violated if there is adequate compensation. For example, we may have a budget of $1000, and we prefer to operate within it, but if spending just one more dollar yields a major improvement in operations, we would consider spending $1001.

A goal program is put into a standard form mathematical program in two steps:

1. Add the variable, $LaTeX: v$, and the constraints, $LaTeX: G(x) - v \le 0$ and $LaTeX: v \ge 0$, to measure the level of violation. (If the goal were $LaTeX: H(x) = 0$, instead of $LaTeX: G(x) \le 0$, the added constraint would be $LaTeX: H(x) + u - v = 0$, where $LaTeX: u$ and $LaTeX: v$ are (non-negative) levels of violation below and above the goal, respectively.)
2. Add the penalty term to the objective: $LaTeX: P(v)$, where $LaTeX: P(0)=0$, and $LaTeX: P$ is strictly increasing -- i.e., $LaTeX: v' \ge v$ and $LaTeX: v'_i > v_i$ for some $LaTeX: i$ imply $LaTeX: P(v') > P(v)$.

The resulting mathematical program represents the goal program. If the goal is satisfied, $LaTeX: v=0$; otherwise, the penalty term, $LaTeX: P(v)$, reflects "adequate compensation" for the violation.

# Golden mean

The golden mean, or golden ratio is the positive solution to the quadratic equation, $LaTeX: x^2 + x - 1 = 0$, which is $LaTeX: (\sqrt{5}-1)/2$, or approximately $LaTeX: 0.618$. This has the proportionality property: $LaTeX: x : 1 = (1-x) : x$, which defines the placements of evaluations for the golden section search.

# Golden section search

This finds an interval, $LaTeX: [a,b]$, that contains a maximum of a unimodal function whose domain is $LaTeX: [0,1]$, such that the length of the final interval, $LaTeX: [a,b]$, satisfies: $LaTeX: b-a < e$ (where $LaTeX: e > 0$ is specified). The golden ratio is denoted below by $LaTeX: g$, which is approximately 0.618.

1. Initially compute $LaTeX: f(g)$ and $LaTeX: f(1-g)$, and set the interval of uncertainty to the original endpoints, $LaTeX: a=0$ and $LaTeX: b=1$. In general, we have $LaTeX: x$ and $LaTeX: y$ placed in $LaTeX: [a,b]$, such that the distances from the endpoints are the same:

          g(b-a)
   |----------------|
   a------x---------y------b
          |----------------|
                g(b-a)


2. Given the interval $LaTeX: [a, b]$ contains evaluations $LaTeX: f(x)$ and $LaTeX: f(y)$, with $LaTeX: x < y$, compare: if $LaTeX: f(x) > f(y)$, replace $LaTeX: b$ with $LaTeX: y$; if $LaTeX: f(x) < f(y)$, replace $LaTeX: a$ with $LaTeX: x$. (If $LaTeX: f(x)=f(y)$, this leaves the interval $LaTeX: [x,y]$, in which case evaluate $LaTeX: f(x+g(y-x))$.) One of the following cases prevails:

          *                                        *
   a------x---------y------b      a------x---------y------b
                    : drop :      : drop :
       f(x) > f(y)                    f(x) < f(y)
       keep [a,y]; new x at           keep [x,b]; new y at
       distance g(y-a) from y         distance g(b-x) from x

          *         *
   a------x---------y------b
   : drop :         : drop :
       f(x) = f(y)
       keep [x,y]


The golden ratio is the limit of ratios of successive Fibonacci numbers: $LaTeX: F_{N-1} / F_{N} \rightarrow g$. Thus, the golden section search approaches Fibonacci search as the number of function evaluations ($LaTeX: N$) becomes large.
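A compact Python sketch of the search (maximization; the reuse of one interior point per iteration relies on the identity $LaTeX: g^2 = 1 - g$):

```python
def golden_section_max(f, a=0.0, b=1.0, tol=1e-6):
    """Locate a maximum of a unimodal f on [a, b] to within tol."""
    g = (5 ** 0.5 - 1) / 2          # golden ratio, about 0.618
    x = b - g * (b - a)             # interior points placed symmetrically
    y = a + g * (b - a)
    fx, fy = f(x), f(y)
    while b - a > tol:
        if fx > fy:                 # maximum lies in [a, y]
            b, y, fy = y, x, fx     # old x is reused as the new y
            x = b - g * (b - a)
            fx = f(x)
        else:                       # maximum lies in [x, b]
            a, x, fx = x, y, fy     # old y is reused as the new x
            y = a + g * (b - a)
            fy = f(y)
    return (a + b) / 2

# Unimodal example: f(t) = -(t - 0.3)^2 has its maximum at t = 0.3.
print(round(golden_section_max(lambda t: -(t - 0.3) ** 2), 4))  # 0.3
```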

# Gomory cut

This is an early cutting plane for an integer program. Consider $LaTeX: \{ (x, s) : x \in \mathbb{Z}^n, \; s \in \mathbb{R}^m, Ax + s = b, x \ge 0, s \ge 0 \}$, where $LaTeX: A$ and $LaTeX: b$ have integer values. For any $LaTeX: y \in \mathbb{R}^m$, let $LaTeX: a = A^T y - \lfloor A^T y \rfloor$ and $LaTeX: a_0 = y^T b - \lfloor y^T b \rfloor$ (i.e., $LaTeX: a$ is the fractional part of the linear combination of the equations, and $LaTeX: a_0$ is the fractional part of the same linear combination of the right-hand sides). Also, let $LaTeX: c = y - \lfloor y \rfloor$ (fractional part of $LaTeX: y$). Then, Gomory's cut is:

$LaTeX: a^T x + c^T s \ge a_0.$

The vector $LaTeX: y$ is chosen such that $LaTeX: a_0 > 0$ and the current solution (with $LaTeX: x = 0$) violates this inequality in the LP relaxation.
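The cut construction can be sketched in Python on a hypothetical one-equation instance:

```python
import math

def gomory_cut(A, b, y):
    """Form the Gomory cut a^T x + c^T s >= a0 from multiplier vector y,
    for integer A and b, following the fractional-part recipe above."""
    m, n = len(A), len(A[0])
    frac = lambda v: v - math.floor(v)
    a = [frac(sum(y[i] * A[i][j] for i in range(m))) for j in range(n)]
    a0 = frac(sum(y[i] * b[i] for i in range(m)))
    c = [frac(yi) for yi in y]
    return a, c, a0

# One equation, 2*x1 + 3*x2 + s = 7, with y = (1/2):
a, c, a0 = gomory_cut([[2, 3]], [7], [0.5])
print(a, c, a0)   # [0.0, 0.5] [0.5] 0.5, i.e. the cut x2/2 + s/2 >= 1/2
```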

# Gomory function

A recursively defined class of functions on $LaTeX: \mathbb{Q}^n$ to $LaTeX: \mathbb{Q}$ (rational n-vectors to rationals) using three operations: (1) non-negative, rational linear combinations, (2) integer rounding, and (3) maximum. Letting $LaTeX: R(v)$ be the integer round-up of $LaTeX: v$, i.e. the least integer not less than $LaTeX: v$, we say $LaTeX: f$ is a Gomory function if it is the identity function, $LaTeX: f(x)=x$, or if it has one of the following forms:

1. $LaTeX: f(x) = ag(x) + bh(x)$ for some $LaTeX: a, b \ge 0$ and rational;
2. $LaTeX: f(x) = R(g(x))$;
3. $LaTeX: f(x) = \max \{g(x), h(x)\}$;
where $LaTeX: g$ and $LaTeX: h$ are Gomory functions. It can be shown this is equivalent to the point-wise maximum of finitely many Chvátal functions:

$LaTeX: f(x) = \max \{C_1(x), C_2(x), \ldots, C_k(x)\},$

where each $LaTeX: C_i$ is a Chvátal function.

This arises in integer linear programming in several ways. One fundamental result is that the optimal value as a function of $LaTeX: b$ is a Gomory function:

$LaTeX: f^*(b) = \inf\{c^Tx: Ax = b, x \ge 0, \; x \in \mathbb{Z}^n \}.$

Here is an example:

$LaTeX: \begin{array}{rcl} f^*(b_1, b_2) & = & \min \{ 7x_1 + 4x_2 + 3x_3 : \\ & & \;\;\;\; 5x_1 + 3x_2 + 2x_3 = b_1, \\ & & \;\;\;\; x_1 + x_2 + x_3 = b_2, \\ & & \;\;\;\; x \in \mathbb{Z}^3, \; x \ge 0 \} \\ & = & \max \{ b_1 + b_2, \; \lceil b_1/2 - b_2 / 2 \rceil \} \end{array}$

Note: The above definition assumes minimization; if the ILP is maximization, $LaTeX: R(.)$ would be round-down (i.e., greatest integer not exceeding the value), and condition 3 would be the point-wise minimum.

# Gomory group

The group associated with the corner polyhedron problem:

$LaTeX: \left\{v \in \mathbb{Z}^n : \sum_j r(A_j)v_j = r(b),\; v \ge 0 \right\},$

where $LaTeX: v$ corresponds to the nonbasic variables and $LaTeX: r(d)$ is the remainder, $LaTeX: d\!\mod |B|$, where $LaTeX: |B|$ is the determinant of the basis, $LaTeX: B$. The idea is that the nonbasic value chosen in this group, together with the basic levels imputed by the equations $LaTeX: Ax=b$, yields an integer solution (but a basic level might be negative).

# Gradient

Vector of first partial derivatives of a function (assumed to be differentiable at least once). Typically denoted $LaTeX: \nabla f(x)$, where $LaTeX: f$ is the function and $LaTeX: x$ is a point in its domain.

# Gradient projection method

A feasible direction method obtained by projecting the gradient into the working surface, $LaTeX: \{x: Ax=b\}$. Suppose $LaTeX: A$ has full row rank. Then, $LaTeX: P = I - A^T(AA^T)^{-1}A$ projects any vector into the null space of $LaTeX: A$: $LaTeX: APy = (A-A)y = 0$ for all $LaTeX: y \in \mathbb{R}^n$. The form of an iteration is $LaTeX: x' = x + sd$, where $LaTeX: d$ is the projected gradient, $LaTeX: P \nabla f(x)$, and $LaTeX: s$ is determined by line search. Since $LaTeX: Ad=0$, $LaTeX: Ax'=Ax$, thus staying in the working surface. (This extends to nonlinear constraints by using the same correction procedure as the generalized reduced gradient method.)
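A Python sketch for the special case $LaTeX: A = [1 \; 1 \; 1]$, where $LaTeX: Pd$ reduces to subtracting the mean of $LaTeX: d$ (the gradient vector and step size below are hypothetical):

```python
# Single equality constraint x1 + x2 + x3 = 1, so A = [1 1 1] and
# P = I - A^T (A A^T)^{-1} A = I - (1/3) A^T A.
def project(d):
    """Project d into the null space of A = [1 1 1]: closed form of P d."""
    mean = sum(d) / 3.0
    return [di - mean for di in d]

def step(x, grad, s):
    """One gradient projection iteration x' = x + s * (P grad)."""
    d = project(grad)
    return [xi + s * di for xi, di in zip(x, d)]

x = [1/3, 1/3, 1/3]      # feasible start: components sum to 1
g = [1.0, 0.0, -1.0]     # gradient of a hypothetical f at x
x_new = step(x, g, 0.1)
print(x_new, sum(x_new))  # the sum stays 1: the iterate remains on the surface
```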

# Graph

In one context, this is the functional value and domain: $LaTeX: \{(x, z): x \in X, \; z=f(x)\}$, where $LaTeX: f:X \rightarrow R$. In another context, this is a (maybe undirected) network. In the former context, see also epigraph and hypograph. In the latter context, the notation $LaTeX: (V,E)$ is sometimes used to mean a graph with vertex (or node) set $LaTeX: V$ and edge (or link) set $LaTeX: E$. We say an edge is incident with the two nodes that define it, and those two nodes are called the endpoints of the edge. The endpoints are said to be adjacent nodes.

An edge whose endpoints are the same node is called a loop. Two edges with the same endpoints are called parallel. A graph is simple if it has no loops or parallel edges. Otherwise, it is called a multigraph. The degree of a node is the number of edges incident to it (counting a loop twice). If two edges have a common endpoint, they are said to be adjacent. A path is a sequence of edges that are successively adjacent. (If the graph is simple, a path can also be represented by the associated sequence of nodes.)

When the edge is directed (or oriented), it is called an arc. When all edges are directed, the graph is called a directed graph, or digraph. With additional properties (e.g., arc weights), this is a network. A path in a digraph can traverse its arcs in either the forward or backward direction. If all arcs are forward, it is a directed path, or dipath.

Here are some special graphs of interest (not exhaustive).

• bipartite. Nodes decompose into two sets such that every link has its endpoints in opposite sets. This applies, for example, to the standard transportation problem, where there are sources in one set and destinations in the other.
• complete. There is a link between each pair of nodes. A complete graph on n nodes is usually denoted $LaTeX: K_n$. In the case of a bipartite graph, the term bicomplete is used to mean all possible links are there: between all pairs of nodes in the two sets. A bicomplete graph on sets with $LaTeX: m$ and $LaTeX: n$ nodes, resp., is usually denoted $LaTeX: K_{mn}$.
• connected. There is a path between each pair of nodes. If the graph is directed, this divides into:
1. strongly connected - arcs must be traversed in forward direction;
2. weakly connected - arcs can be traversed in either direction.
• cycle (denoted $LaTeX: C_n$ for $LaTeX: n$ nodes). The links are $LaTeX: (1,2),(2,3), \ldots, (n-1,n), (n,1)$.
• planar. The graph can be drawn in a plane (2D) such that no two links cross (they meet only at incident nodes). This has special properties in multi-commodity flows and shortest paths.
• subgraph. $LaTeX: (V',E')$ such that $LaTeX: V'$ is a subset of $LaTeX: V$ and $LaTeX: E'$ is a subset of $LaTeX: E$.
• tree. Connected and acyclic. One problem is finding a spanning tree that is optimal in some sense.
• undirected. No link is directed.

# Greedy algorithm

Applies when the optimization problem is to decide whether or not to include some element from a given set. A greedy algorithm begins with no elements and sequentially selects an element from the feasible set of remaining elements by myopic optimization. (The elements could have been sorted by some criterion, such as associated weights.) This results in an optimal solution to the problem if and only if there is an underlying matroid structure (for example, see spanning tree). Further details are in the supplement, Greedy Algorithms.
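
Kruskal's method for a minimum spanning tree is the classic instance of a greedy algorithm over a matroid. A minimal sketch (edge list and weights are illustrative):

```python
# Kruskal's greedy MST: sort edges by weight and accept an edge whenever it
# does not close a cycle (graphic-matroid independence, tested by union-find).

def kruskal(n, edges):
    """edges: list of (weight, u, v); nodes are 0..n-1."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                        # greedily accept: stays acyclic
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

edges = [(4, 0, 1), (1, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
mst = kruskal(4, edges)
```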

# Groebner basis

This arises in parametric integer (linear) programming, varying the right-hand side $LaTeX: (b)$. A precise definition is involved, stemming from ideal theory in abstract algebra, and there are many other applications outside of integer programming. In the context of ILP, a Gröbner basis is a minimal set of change directions, $LaTeX: \{h^1, h^2, \ldots, h^k\}$, called a test set, such that the objective function improves along those directions. To elaborate, suppose we have

$LaTeX: \mbox{ILP}(b): \min c^Tx : Ax = b, \; x \in \mathbb{Z}^n, \; x \ge 0,$

where $LaTeX: A$ and $LaTeX: b$ have integer elements. Let $LaTeX: X^*(b) = \arg\!\min \{c^Tx : Ax = b, \;x \in \mathbb{Z}^n, \; x \ge 0\}$ and $LaTeX: B = \{b: X^*(b) \neq \emptyset\}$ (i.e., $LaTeX: \mbox{ILP}(b)$ is feasible and bounded). A Gröbner basis for $LaTeX: A$, holding $LaTeX: c$ fixed, is a set, $LaTeX: G_c = \{h^i\}$, for which the following conditions hold:

1. $LaTeX: Ah^i = 0$.
2. $LaTeX: c^T h^i < 0$ (for minimization).
3. For $LaTeX: b \in B$ and feasible $LaTeX: x \not\in X^*(b)$, there exists $LaTeX: h^i \in G_c$ such that $LaTeX: x + h^i \ge 0$.

Condition 1 says that $LaTeX: A(x+h^i)=b$ if $LaTeX: Ax=b$, and condition 2 says $LaTeX: c^T(x+h^i) < c^Tx$. Therefore, if $LaTeX: x$ is any feasible solution to $LaTeX: \mbox{ILP}(b),$ either $LaTeX: x$ is optimal for a particular right-hand side $LaTeX: b$, or condition 3 ensures there is an improvement by adding some vector, $LaTeX: h^i \in G_c$.
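
A toy illustration (not a computed Gröbner basis): for the hypothetical ILP $LaTeX: \min \{x_1 + x_2 : x_1 - x_2 = 0, \, x \in \mathbb{Z}^2, \, x \ge 0\}$, the single direction $LaTeX: h = (-1, -1)$ satisfies all three conditions, so repeatedly adding it from any feasible point reaches the optimum:

```python
# Test-set descent on a toy ILP: A = [1, -1], b = 0, c = (1, 1).
# Real Groebner bases require computer algebra; this only shows the mechanics.

A = (1, -1)
c = (1, 1)
h = (-1, -1)
assert sum(a * hi for a, hi in zip(A, h)) == 0    # condition 1: Ah = 0
assert sum(ci * hi for ci, hi in zip(c, h)) < 0   # condition 2: c^T h < 0

x = (7, 7)                                        # a feasible start
while all(xi + hi >= 0 for xi, hi in zip(x, h)):  # condition 3: keep improving
    x = tuple(xi + hi for xi, hi in zip(x, h))
```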


# Group problem

Solving the corner polyhedron problem from the perspective of the Gomory group.

# Hadamard inequality

For an $LaTeX: n \times n$ matrix $LaTeX: A$,

$LaTeX: |\mbox{det} (A)| \le \|A_{:,1} \| \cdots \|A_{:,n}\| = \prod_{j=1}^n \| A_{:,j} \|,$

where $LaTeX: A_{:,j}$ is the j-th column of $LaTeX: A$ and norms are $LaTeX: L_2$.

# Half-line

A half-line is a collection of points of the form $LaTeX: \{t {\bf v} : t \ge 0\}$, where $LaTeX: {\bf v}$ is in $LaTeX: {\mathbb R}^n$ such that $LaTeX: {\bf v} \neq 0$ (some require $LaTeX: \|{\bf v}\|=1$, without loss of generality). Note the half-line is rooted at the origin -- see ray for translation.

# Halfspace

Consider the hyperplane, $LaTeX: \{x \in \mathbb{R}^n: a^T x = b\}$ (where $LaTeX: a \in \mathbb{R}^n \backslash \{0\}$). This partitions $LaTeX: \mathbb{R}^n$ into two (closed) halfspaces: $LaTeX: \{ x: a^Tx \le b\}$ and $LaTeX: \{ x: a^Tx \ge b\}$. The two associated open halfspaces are $LaTeX: \{ x: a^T x < b\}$ and $LaTeX: \{ x: a^Tx > b\}$.

# Hard constraint

A constraint is hard if it has to be satisfied in any feasible solution. Compare with soft constraint.

# Hausdorff metric

This metric arises in normed spaces, giving a measure of distance between two (point) sets, $LaTeX: S$ and $LaTeX: T$. Given a norm, let $LaTeX: d$ be the distance function between a point and a set:

$LaTeX: d(x, T) = \inf \{\|x-y\|: y \in T\}$ and $LaTeX: d(S, y) = \inf \{\|x-y\|: x \in S\}$.

Define

$LaTeX: D(S, T) = \sup \{d(x, T): x \in S\}$ and $LaTeX: D(T, S) = \sup \{d(S, y): y \in T\}.$

Then, $LaTeX: h(S, T) = \max \{D(S, T), D(T, S)\}$ is the Hausdorff metric. In words: for each $LaTeX: x \in S$, find the distance to its closest point in $LaTeX: T$, and take the largest such distance; do the same with the roles of $LaTeX: S$ and $LaTeX: T$ reversed; the metric is the larger of the two values.

For example, let $LaTeX: S$ be the interval, $LaTeX: [-1, 1]$, and let $LaTeX: T$ be the interval, $LaTeX: [0, 3]$. The Hausdorff distance between these two intervals is:

$LaTeX: h(S, T) = \max \{D(S, T), D(T, S)\} = \max\{1, 2\} = 2.$
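
The interval example can be reproduced by brute force on fine discretizations of $LaTeX: S$ and $LaTeX: T$ (the grid resolution is an arbitrary choice):

```python
# Hausdorff distance between two finite point sets, here discretizations of
# the intervals S = [-1, 1] and T = [0, 3] from the example.

def hausdorff(S, T):
    d = lambda x, Y: min(abs(x - y) for y in Y)   # distance from point to set
    D_ST = max(d(x, T) for x in S)
    D_TS = max(d(y, S) for y in T)
    return max(D_ST, D_TS)

S = [i / 100 for i in range(-100, 101)]   # [-1, 1]
T = [i / 100 for i in range(0, 301)]      # [0, 3]
h = hausdorff(S, T)                       # the worst point is y = 3, at distance 2
```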

# Hessenberg matrix

An upper Hessenberg matrix is a square matrix with $LaTeX: A_{i, j} = 0$ for $LaTeX: i > j+1$. A lower Hessenberg matrix is one whose transpose is upper Hessenberg.

# Hessian

The matrix of second partial derivatives of a function (assumed to be twice differentiable). This is often denoted $LaTeX: H_f(x)$, where $LaTeX: f$ is the function and $LaTeX: x$ is a point in its domain.

# Heuristic

In mathematical programming, this usually means a procedure that seeks an optimal solution but does not guarantee it will find one, even if one exists. It is often used in contrast to an algorithm, so branch and bound would not be considered a heuristic in this sense. In AI, however, a heuristic is an algorithm (with some guarantees) that uses a heuristic function to estimate the "cost" of branching from a given node to a leaf of the search tree. (Also, in AI, the usual rules of node selection in branch and bound can be determined by the choice of heuristic function: best-first, breadth-first, or depth-first search.)

# Heuristic function

A heuristic function is used to guide a heuristic search, much like a human would do. In this sense, it is a rule of thumb. Formally, in AI, a heuristic function is defined on the state of a search tree and is used in the evaluation of nodes for expansion in what is sometimes called a best-first directed tree search strategy.

# Heuristic search

In mathematical programming, this is any (purposeful) search procedure to seek a solution to a global optimization problem, notably to combinatorial programs. In AI, this is a (partially) informed search (vs. blind search), using a heuristic function for guidance.

Two types of (blind) search methods are:

1. Breadth-first: expanding a node of least depth;
2. Depth-first: expanding a node of greatest depth (requiring backtracking).
(See branch and bound.)

A specific class of local heuristic search algorithms is the greedy algorithm. Another type is the 1-Opt for the TSP.

Here are heuristic search strategies that are based on some biological metaphor:

1. Ant colony optimization, based on how ants solve problems;
2. Genetic algorithm, based on genetics and evolution;
3. Neural networks, based on how the brain functions;
4. Simulated annealing, based on thermodynamics;
5. Tabu search, based on memory-response;
6. Target analysis, based on learning.

# Hirsch conjecture

If the basis of one basic feasible solution differs from another by $LaTeX: k$ columns, there exist $LaTeX: k$ feasible pivots to go from the first to the second. It was posed by Warren M. Hirsch in 1957, shown to be false for unbounded polyhedra by Klee and Walkup in 1967, and eventually disproved for bounded polytopes by Santos in 2010.

# Holder inequality

Hölder's inequality states

$LaTeX: \|x y\|_1 \le \|x\|_p \|y\|_q$,

where $LaTeX: \|\cdot\|_k$ denotes the $LaTeX: L_k$ norm, $LaTeX: p > 1$, and $LaTeX: 1/p + 1/q = 1$. Equality holds if and only if $LaTeX: |x|^p = a|y|^q$ (componentwise) for some scalar $LaTeX: a$. (Note: this is the Cauchy-Schwarz inequality if $LaTeX: p = q = 2$.)
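
A numerical spot-check (not a proof) on an arbitrary vector pair with $LaTeX: p = 3$, $LaTeX: q = 3/2$:

```python
# Check ||xy||_1 <= ||x||_p ||y||_q for one made-up pair of vectors.

p, q = 3.0, 1.5                 # conjugate exponents: 1/p + 1/q = 1
x = [1.0, -2.0, 3.0]
y = [0.5, 1.0, -1.0]

lhs = sum(abs(a * b) for a, b in zip(x, y))                 # ||xy||_1
norm = lambda v, k: sum(abs(a) ** k for a in v) ** (1 / k)  # L_k norm
rhs = norm(x, p) * norm(y, q)
```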

# Homogeneous function

A homogeneous function of degree p satisfies

$LaTeX: f(ax) = (a^p)f(x)$ for all $LaTeX: a$.

It is positively homogeneous if we restrict $LaTeX: a > 0$. When the degree is not specified (even by context), it is generally assumed to be 1. For example, $LaTeX: x$ is homogeneous, $LaTeX: xy + x^2$ is homogeneous of degree 2, $LaTeX: x + x^2$ is not homogeneous, and $LaTeX: xy/(x+y)$ is positively homogeneous on the positive elements of $LaTeX: {\mathbb R}^2$.

# Homotopy

A continuous transformation from one mapping to another. One example is the convex combination: $LaTeX: t f(x) + (1-t)F(x)$. As $LaTeX: t$ varies from 0 to 1, the function changes continuously from $LaTeX: F$ to $LaTeX: f$.

# Hungarian method

An algorithm to solve the standard assignment problem (presented by H. Kuhn in 1955). Let C be the $LaTeX: n\times n$ cost matrix.

• Step 0 (initialize). Subtract the least element of each row from that row of $LaTeX: C$. Then, do likewise for each column. The resulting matrix, $LaTeX: C^0$, has a zero in every row and column. (Further, a least cost assignment for $LaTeX: C^0$ is also a least cost assignment for $LaTeX: C$.) Re-define $LaTeX: C=C^0$.
• Step 1 (cover zeros). Draw the minimum number of lines through the rows and columns to cover all zeros in $LaTeX: C$. If that minimum is $LaTeX: n$, you can assign $LaTeX: i$ to $LaTeX: j$ such that $LaTeX: C_{ij}=0$; then you can remove row $LaTeX: i$ and column $LaTeX: j$, repeating the process to obtain an optimal assignment. Otherwise, if that minimum is less than $LaTeX: n$, continue.
• Step 2 (revise $LaTeX: C$). Select the minimum uncovered element. Subtract this from each uncovered element and add it to each twice-covered element (i.e., to those with both a horizontal and vertical line intersecting). Return to step 1.

Example:


                Job
             _  1  2  3  4  5  _
            |                    |
            |   1  2  3  4  7    | 1
            |   7  4  2  6  5    | 2
        C = |   2  4  3  1  4    | 3
            |   2  3  1  2  3    | 4
            |   5  4  3  2  4    | 5
            |_                  _|


After subtracting min row elements:


         _  1  2  3  4  5  _
        |                    |
        |   0  1  2  3  6    | 1
        |   5  2  0  4  3    | 2
        |   1  3  2  0  3    | 3
        |   1  2  0  1  2    | 4
        |   3  2  1  0  2    | 5
        |_                  _|


After subtracting the minimum column elements:


         _  1  2  3  4  5  _
        |         |  |  |    |
        |---0--0--2--3--4----| 1
        |   5  1  0  4  1    | 2
        |   1  2  2  0  1    | 3
        |   1  1  0  1  0    | 4
        |   3  1  1  0  0    | 5
        |_        |  |  |   _|


The minimum number of lines to cover all zeros is 4: a horizontal line through row 1 and vertical lines through columns 3, 4 and 5. The minimum uncovered element is 1. Subtract it from each uncovered element and add it to each twice-covered element (cells (1,3), (1,4) and (1,5)):


         _  1  2  3  4  5  _
        |                    |
        |   0* 0  3  4  5    | 1
        |   4  0* 0  4  1    | 2
        |   0  1  2  0* 1    | 3
        |   0  0  0* 1  0    | 4
        |   2  0  1  0  0*   | 5
        |_                  _|


The minimum cover is now 5, by drawing a line through each column. A minimum assignment is indicated by *. For example, we assign person 1 to job 1, person 2 to job 2, person 3 to job 4, person 4 to job 3, and person 5 to job 5. (There are alternative optimal assignments.)
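
The example is small enough to verify by brute force: enumerating all $LaTeX: 5! = 120$ assignments confirms that the assignment found above, with cost $LaTeX: 1 + 4 + 1 + 1 + 4 = 11$, is optimal.

```python
# Brute-force check of the Hungarian-method example: the cost matrix C is the
# one given above; the optimal total cost should be 11.
from itertools import permutations

C = [[1, 2, 3, 4, 7],
     [7, 4, 2, 6, 5],
     [2, 4, 3, 1, 4],
     [2, 3, 1, 2, 3],
     [5, 4, 3, 2, 4]]

best = min(permutations(range(5)),
           key=lambda p: sum(C[i][p[i]] for i in range(5)))
best_cost = sum(C[i][best[i]] for i in range(5))
```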

# Hypergraph

A set of nodes (or vertices), say V, plus a set of edges, say E, such that each member of E is a subset of V. When each member of E has exactly 2 nodes, [V,E] is a graph. The hypergraph is a convenient mathematical way to describe relations that involve more than two objects (nodes).

Specific cases:

• An IIS hypergraph: each node represents an inequality and each edge represents an IIS.
• A common graphical representation of non-binary constraint networks is as a hypergraph, where nodes represent variables and edges group variables together that belong to the same scope.

# Hyperplane

An affine set of dimension $LaTeX: (n-1)$, i.e. the set $LaTeX: \{x: a^T x = b\}$, where $LaTeX: a$ is a nonzero vector called the normal of the hyperplane.

# Hypograph

Abbreviated $LaTeX: \mbox{hypo}\{f,X\}$. The hypograph of $LaTeX: f$ over $LaTeX: X$ is $LaTeX: \{(x, z): x \in X, \;z \le f(x)\}$, which is the region on or below the graph of $LaTeX: f$.

# Implicit enumeration

Applied to integer programming, this is a systematic evaluation of all possible solutions without explicitly evaluating all of them. For example, suppose a feasible solution is at hand with objective value 100. Now suppose we solve a relaxation, such as fixing some of the variables and solving the resulting linear program, and we obtain a value of only 90. If we seek a maximum, we can ignore all possible solutions having the fixed values in the relaxation since they must all have an objective value of at most 90. This is often said to be equivalent to branch and bound; however, the early versions of these methods were related but different: early branch and bound was a best-first (even breadth-first) search strategy, while early implicit enumeration schemes were depth-first.

# Implicit function theorem

Suppose $LaTeX: h:\mathbb{R}^n \rightarrow \mathbb{R}^m$, where $LaTeX: n > m$, and $LaTeX: h$ is smooth. Further, suppose we can partition the variables, $LaTeX: x = (y, z)$, such that $LaTeX: y$ is m-dimensional with $LaTeX: \nabla_y h(x)$ nonsingular at $LaTeX: x^* = (y^*, z^*)$. Then, there exists $LaTeX: \varepsilon > 0$ for which there is an implicit function, $LaTeX: f$, on the neighborhood, $LaTeX: N_{\varepsilon}(z^*) = \{z: \|z-z^*\| < \varepsilon\}$, such that $LaTeX: h(f(z), z)=0$ for all $LaTeX: z \in N_{\varepsilon}(z^*)$. Further, $LaTeX: f$ is smooth with $LaTeX: \nabla_z f(z^*) = -\left( \nabla_y h(x^*) \right)^{-1} \nabla_z h(x^*)$.

# Implied constraint

An implied constraint is a redundant constraint that (typically) improves propagation if it is added to a constraint model.

# Implied equality

An inequality constraint, say $LaTeX: g(x) \le 0$, such that $LaTeX: g(x)=0$ for all feasible $LaTeX: x$.

# Inactive constraint

A constraint that is not active (at a point).

# Incidence matrix

A matrix, say $LaTeX: M$, that represents the incidence of edges or arcs to nodes in a graph or network. In the undirected case, $LaTeX: M$ is binary valued: if edge $LaTeX: k$ has endpoints $LaTeX: i$ and $LaTeX: j$, then $LaTeX: M_{ik} = M_{jk} = 1$ and $LaTeX: M_{rk} = 0$ for $LaTeX: r \neq i,j$. In the directed case, the entry $LaTeX: -1$ indicates the tail: if the arc is directed from $LaTeX: i$ to $LaTeX: j$, $LaTeX: M_{ik} =-1$ and $LaTeX: M_{jk} = 1$.
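
A minimal sketch of building such a matrix for a small, made-up digraph:

```python
# Node-arc incidence matrix of a digraph: -1 marks the tail of an arc,
# +1 its head; each column therefore sums to zero.

def incidence(n_nodes, arcs):
    M = [[0] * len(arcs) for _ in range(n_nodes)]
    for k, (i, j) in enumerate(arcs):   # arc k directed from i to j
        M[i][k] = -1
        M[j][k] = 1
    return M

arcs = [(0, 1), (0, 2), (1, 2)]
M = incidence(3, arcs)
```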

# Inconsistent

A system of inequalities and/or equations for which there is no feasible solution.

# Incumbent solution

The current best solution found during an algorithmic search procedure. Typically in (mixed) integer programming this refers to the best known feasible solution in the branching tree.

# Independent set

Given a graph, $LaTeX: G=(V,E)$, an independent set (also called a stable set) is a subgraph induced by a subset of vertices, $LaTeX: S$, plus all edges with both endpoints in $LaTeX: S$, such that no two nodes in the subgraph are adjacent. A maximal independent set is one such that adding any node causes the subgraph to violate independence -- the added node is adjacent to some node already in the set. Given weights, $LaTeX: w(v)$ for $LaTeX: v \in V$, the weight of an independent set, $LaTeX: S$, is the sum of the weights in $LaTeX: S$. The maximum independent set problem is to find an independent set of maximum weight. This is equivalent to finding a minimum weight vertex cover, say $LaTeX: C$, since $LaTeX: V\backslash C$ is an independent set. (One can also refer to an independent set of arcs (or edges) as a subgraph with no two arcs (or edges) adjacent.)

# Indicator function

A function indicating membership in a set, say $LaTeX: A$. Such functions are usually denoted $LaTeX: \chi_A$ or $LaTeX: 1_A$ and are evaluated as

$LaTeX: \chi_A (x) = 1_A(x) = \left\{ \begin{array}{cl} 1, & x \in A \\ 0, & x \not\in A. \end{array} \right.$

# Inequality

A relation of the form $LaTeX: g(x) \le 0$ or $LaTeX: g(x) \ge 0$. Such constraints are typical of mathematical programs. With equality allowed, these are called weak inequalities; strict inequalities are $LaTeX: g(x) < 0$ and $LaTeX: g(x) > 0$.

Also see variational inequality.

# Inequality of degree k

An inequality, usually linear, with $LaTeX: k$ variables. This arises in integer programming, where the variables are binary. In particular, an inequality of degree 2 can represent a precedence constraint, and it has special computational advantages in the node packing problem.

# Infeasible

Not feasible.

# Inference dual

See it in the list of particular duals.

# Infimum

(abbr. Inf). The greatest lower bound on a (real-valued) function over (a subset of) its domain. If $LaTeX: f$ is unbounded from below, $LaTeX: \inf\{f\} = -\infty$, and if the domain is empty, $LaTeX: \inf\{f\} = \infty$. Otherwise, suppose $LaTeX: L$ is any lower bound: $LaTeX: f(x) \ge L$ for all $LaTeX: x \in X$. $LaTeX: L$ is a greatest lower bound if, for any $LaTeX: \varepsilon > 0$, there exists $LaTeX: x$ in the domain for which $LaTeX: f(x) \le L + \varepsilon$. (That is, we can get arbitrarily close to $LaTeX: L$ in the range of $LaTeX: f$.) Note that the infimum is always defined, and its range is in the extended reals. The infimum is the minimum if it is attained by some point in its domain.

# Infinite program

An infinite [dimensional] program is a mathematical program with an infinite number of variables and constraints (also see semi-infinite program). Generally, this is approached as an abstract program.

# Inner approximation

This solves a mathematical program by a sequence of approximations whose feasible regions are contained in the original feasible region. One example is the use of a barrier penalty method. Another example is polyhedral annexation.

# Integer equivalent aggregation

Integer equivalent aggregation is a reduction of a system of linear algebraic equations with non-negative integer solutions to a single equation, which is a linear combination of the equations of the system and has the same set of non-negative integer solutions. For example, consider the system:

$LaTeX: S = \{ (x,y,z) \in \{0,1\}^3 : x + y = 1 , y + z = 1\}.$

By simply adding the equations, we have the equivalent description:

$LaTeX: S = \{(x,y,z) \in \{0,1\}^3: x + 2y + z = 2\}$.

Both sets consist of the two points $LaTeX: (0,1,0)$ and $LaTeX: (1,0,1)$.
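
The claim is easy to verify by enumeration:

```python
# Enumerate {0,1}^3 and confirm the system and its aggregation have the same
# binary solution set.
from itertools import product

system = {(x, y, z) for x, y, z in product((0, 1), repeat=3)
          if x + y == 1 and y + z == 1}
aggregate = {(x, y, z) for x, y, z in product((0, 1), repeat=3)
             if x + 2 * y + z == 2}
```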

# Integer polyhedron

The convex hull of the feasible region of a linear integer program: $LaTeX: \{x \in \mathbb{Z}^n_+ : Ax \ge b\}$.

# Integer program

Abbreviated IP. The variables are required to be integer-valued. Historically, this term implied the mathematical program was otherwise linear, so one often qualifies a nonlinear integer program vs. a linear IP. Also see mixed integer program and combinatorial program.

# Integer rounding

Rounding a non-integer solution to an integer neighbor. This will generally not yield an optimal solution to an integer program -- see Myths and Counterexamples in Mathematical Programming.

# Interior penalty function

Same as barrier function.

# Interior point method

A family of algorithms that stays in the strict interior of the feasible region, possibly through the use of a barrier function. The term grew from Karmarkar's algorithm to solve a linear program. In that case, the resulting solution (if it exists) is strictly complementary, which defines the (unique) optimal partition.

# Interior solution

An optimal solution to a mathematical program that is in the relative interior of its set of optimal solutions. This arises in interior point methods, and relates to the strict complementarity of the solution.

# Interpolation

An estimate of a value between two others. In LP, an interpolation estimate of the optimal objective value, say $LaTeX: z(\alpha b+(1-\alpha)b')$, is $LaTeX: \alpha z(b) +(1-\alpha)z(b')$, where $LaTeX: b$ and $LaTeX: b'$ are right-hand side vectors.

# Intersection cut

See convexity cut.

# Inventory balance equation

The constraint that says that the amount of inventory in the next time period must equal the current inventory plus what is produced or purchased minus what is lost or sold. Let $LaTeX: y(t)$ be the inventory at the beginning of period $LaTeX: t$, with $LaTeX: y(0)$ given. Then, the inventory equation is:

$LaTeX: y(t+1) = ay(t) + P(t) - S(t),$

where $LaTeX: P(t)$ is the level of production (or somehow acquired), and $LaTeX: S(t)$ is the level of sales (or somehow consumed). Typically, $LaTeX: a=1$, but if $LaTeX: a < 1,$ it is called a loss factor, and if $LaTeX: a > 1,$ it is called a gain factor.
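
A short simulation of the balance equation (the schedules are made-up numbers, with $LaTeX: a = 1$):

```python
# Roll the inventory balance equation forward from a given starting level.

def inventory(y0, P, S, a=1.0):
    """y(t+1) = a*y(t) + P(t) - S(t), returned as the full trajectory."""
    y = [y0]
    for t in range(len(P)):
        y.append(a * y[-1] + P[t] - S[t])
    return y

y = inventory(y0=10, P=[5, 5, 8], S=[4, 6, 7])   # [10, 11, 10, 11]
```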

The language used is for the inventory control in the production scheduling problem, but this has become a standard system of equations that appears in many mathematical programs. Thus, the meaning of the variables can be substantially different. One example is the steel beam assortment problem.

# Inverse problem

Given a point in the decision space, find parameter values that render it optimal. For example, suppose we have the LP: $LaTeX: \min \{c^T x : Ax = b, \,x \ge 0\}$. Let $LaTeX: B$ be a basis from $LaTeX: A$, and we ask for $LaTeX: (b, c)$ for which this basis is optimal. This has a simple solution. Let $LaTeX: A = [B \, N]$ and partition $LaTeX: c$ and $LaTeX: x$ conformally, so $LaTeX: Ax = Bx_B + Nx_N$ and $LaTeX: c^Tx = c^T_Bx_B + c^T_Nx_N$. Then, the set of $LaTeX: (b, c)$ for which the associated basic solution is optimal is the following cone:

$LaTeX: K_B = \{ (b, c) : B^{-1}b \ge 0, \, c_B B^{-1}N \le c_N\}.$

A more difficult inverse problem is when there is some target value for the parameters. We might, for example, fix $LaTeX: b$ and seek to minimize $LaTeX: \|c - c^*\|^2$ subject to $LaTeX: (b, c) \in K_B$, where $LaTeX: c^*$ is a target value for $LaTeX: c$.

The problem can be combinatorial. We might want to minimize $LaTeX: \|c - c^*\|$ for some norm where $LaTeX: c$'s are costs on the arcs or edges of a network. The solution at hand might be a spanning tree or a TSP tour. We might also impose constraints on $LaTeX: c$ directly, such as simple bounds.

The general inverse problem may thus be stated: given a point, $LaTeX: \hat{x}$,

$LaTeX: \min \{ \|p - p^* \| : p \in P, \; \hat{x} \in \arg\min \{ f(x; p): x \in F(p)\}\},$

where $LaTeX: p^*$ is the target, $LaTeX: \|\cdot\|$ is some norm, $LaTeX: P$ is a non-empty, closed set, which generally does not contain $LaTeX: p^*$, and $LaTeX: F(p)$ is the feasible region for $LaTeX: x$, given $LaTeX: p$.

There are variations, such as when only a partial solution is given. An example is a partial TSP tour or some partial scheduling of jobs, and we ask for cost values for which there is an optimal solution that contains these subsets.

# Involutionary property

Of a transformation, it means something is its own inverse. In linear programming, the dual of the dual is the primal.

# Irreducible infeasible subsystem

Abbreviated IIS. A subsystem of inequalities and equations such that the subsystem is inconsistent and every proper subsystem is consistent.

# Irredundant

A system of constraints is irredundant if it contains no redundant constraint.

# Isoperimetric problem

Among all closed plane curves with a given perimeter, find one that maximizes the area. This is also known as Queen Dido's problem and serves as a classical problem for variational calculus. Its significance in mathematical programming is that it led to Lagrange's multiplier theorem. Restricting shapes to rectangles, the problem is:

$LaTeX: \max \{xy : x+y=p, \, x \ge 0, \, y \ge 0\},$

where $LaTeX: 2p$ is the perimeter. For positive $LaTeX: x$ and $LaTeX: y$ and multiplier $LaTeX: u$, Lagrange's multiplier conditions require $LaTeX: y-u=0$ and $LaTeX: x-u=0$, so $LaTeX: x=y$, which means the solution is the square.

# Isoquant

Same as Contour.

# Isotonic function

An order preserving function $LaTeX: f: X \rightarrow \mathbb{R}^m$, i.e. for any $LaTeX: x$ and $LaTeX: y$ in $LaTeX: X$

$LaTeX: x > y \Rightarrow f(x) > f(y) \; \mbox{ and } \; x < y \Rightarrow f(x) < f(y).$

# Jacobian

For a transformation, $LaTeX: y=f(x)$, such that $LaTeX: f$ is differentiable, the Jacobian is the determinant, $LaTeX: \mbox{det}\left( \nabla f(x) \right)$. Historically, the number of $LaTeX: y$-variables equals the number of $LaTeX: x$-variables, say $LaTeX: n$, so $LaTeX: \nabla f(x)$ is $LaTeX: n \times n$. The Jacobian matrix is defined as the matrix, $LaTeX: \nabla f(x)$, allowing the number of functions to be less than the number of variables.

# Jamming

Slow convergence, perhaps to a non-solution point -- see zigzag phenomenon.

# Jensen inequality

Let $LaTeX: f$ be convex on $LaTeX: X$, and let $LaTeX: x$ be a random variable with expected value, $LaTeX: E(x)$. Then, $LaTeX: E(f(x)) \ge f(E(x))$. For example, let $LaTeX: X=R$ and $LaTeX: f(x) = x^2$. Suppose $LaTeX: x$ is normally distributed with mean zero and standard deviation $LaTeX: s$. Then,

$LaTeX: E(f(x)) = E(x^2) = s^2 \ge 0 = f(E(x))$.
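
A quick empirical check with $LaTeX: f(x) = x^2$, treating a small, arbitrary sample as the distribution: the mean of squares always dominates the square of the mean.

```python
# Jensen's inequality for the convex function f(x) = x^2 over an empirical
# distribution: E(x^2) >= (E(x))^2, with the gap equal to the variance.

sample = [-3.0, -1.0, 0.0, 2.0, 5.0]
mean = sum(sample) / len(sample)                  # E(x)
mean_sq = sum(x * x for x in sample) / len(sample)  # E(f(x))
```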

# Jeroslow formula

This arises in mixed integer linear programming as an extension of Gomory functions. Consider the following standard form

MILP: $LaTeX: \max \{ c^T x + d^T y : Ax + Dy = b, \, (x, y) \ge 0, \,x \in \mathbb{Z}^n\},$

where $LaTeX: \mbox{rank}(D)=m$ ($LaTeX: m$ is the number of rows). Let $LaTeX: \{B^k\}$ be the set of bases of $LaTeX: D$ such that there exists $LaTeX: \{u^k\}$ to satisfy the dual conditions: $LaTeX: u^k D \ge d$ and $LaTeX: u^k B^k = d^k$, where $LaTeX: d^k$ is the subvector of $LaTeX: d$ for the columns of $LaTeX: B^k$. Let $LaTeX: \lfloor v \rfloor$ denote the floor (or round-down) function, and define $LaTeX: \lfloor v\rfloor_k = B^k \lfloor (B^k)^{-1} v \rfloor$. Let $LaTeX: V_k$ be the set of integer linear combinations of vectors spanned by a dual feasible basis, $LaTeX: B^k$:

$LaTeX: V_k = \{v: (B^k)^{-1} v \in \mathbb{Z}^m\}.$

Let $LaTeX: V = \cap_k V_k$, and define

$LaTeX: U_k = \{v \in V_k : (B^k)^{-1}v \ge 0, \mbox{ and } (B^k)^{-1}u \neq 0, \; 0 \le (B^k)^{-1}u \le (B^k)^{-1}v \Rightarrow u \not\in V \}.$

Let $LaTeX: G$ be a Gomory function on $LaTeX: \mathbb{R}^m$. A Jeroslow formula (or function) for given $LaTeX: D$ is:

$LaTeX: J(v) = \max_k \left( \max \{G(w) + u^k (v - w): w \in V, \, \lfloor v \rfloor_k - w \in U_k \} \right).$

# K-consistency

A consistency notion in constraint programming.

Let $LaTeX: P = (X, D, C)$ be a CSP.

• Given a set of variables $LaTeX: Y \subseteq X$ with $LaTeX: |Y| = k -1$, a locally consistent instantiation $LaTeX: I$ on $LaTeX: Y$ is k-consistent iff for any kth variable $LaTeX: x_{i_{k}} \in X \setminus Y$ there exists a value $LaTeX: v_{i_{k}} \in D(x_{i_{k}})$ such that $LaTeX: I \cap \{ (x_{i_k}, v_{i_k}) \}$ is locally consistent.
• The CSP $LaTeX: P$ is k-consistent iff for any set $LaTeX: Y$ of $LaTeX: k-1$ variables, any locally consistent instantiation on $LaTeX: Y$ is k-consistent.

For example, 3-consistency ensures that any locally consistent instantiation of any pair of variables can be extended to an instantiation involving any third variable without violating any constraint. It is equivalent to path consistency. Similarly, 2-consistency is also known as arc consistency.

# k-Opt

See n-Opt.

# Kantorovich inequality

Let $LaTeX: Q$ be a symmetric, positive definite matrix, with eigenvalues in $LaTeX: [a,b]$, and let $LaTeX: x \in \mathbb{R}^n$. Then,

$LaTeX: \frac{(x^T x)^2}{(x^T Q x)(x^T Q^{-1}x)} \ge \frac{4 \, a \, b}{(a+b)^2}.$

Its significance in mathematical programming is in convergence analysis: it bounds the convergence rate of Cauchy's steepest ascent.
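
A numeric check for $LaTeX: Q = \mbox{diag}(a, b)$: with $LaTeX: x = (1, 1)$, the Kantorovich bound $LaTeX: 4ab/(a+b)^2$ is attained exactly.

```python
# Kantorovich ratio (x^T x)^2 / ((x^T Q x)(x^T Q^{-1} x)) for a diagonal Q,
# evaluated at the extremal vector x = (1, 1).

a, b = 1.0, 4.0
x = (1.0, 1.0)
xQx = a * x[0] ** 2 + b * x[1] ** 2        # x^T Q x      = 5
xQinvx = x[0] ** 2 / a + x[1] ** 2 / b     # x^T Q^{-1} x = 1.25
ratio = (x[0] ** 2 + x[1] ** 2) ** 2 / (xQx * xQinvx)
bound = 4 * a * b / (a + b) ** 2           # 16/25 = 0.64
```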

# Karmarkar algorithm

An interior point method for linear programming, which began a series of variations (e.g., see affine scaling). The original algorithm applies to the system $LaTeX: Ax=0$, for $LaTeX: x \in S_n$, i.e., $LaTeX: x$ is an element of the n-dimensional simplex, and assumes $LaTeX: Ae=0$ (so $LaTeX: e/n$ is a feasible solution). It further assumes the optimal objective value is known to be zero. (All of these assumptions have been relaxed in subsequent developments.)

Here are the basic steps of Karmarkar's (original) algorithm, given $LaTeX: x > 0$, $LaTeX: e^T x=1$ and $LaTeX: Ax=0$.

1. Form $LaTeX: D = \mbox{diag}(x_j)$ and
$LaTeX: B = \begin{bmatrix} AD \\ e \end{bmatrix}$
(assume $LaTeX: \mbox{rank}(B)=m+1$).
2. Project $LaTeX: Dc$ into the null space of $LaTeX: B$: $LaTeX: c^* = [I-B^T(BB^T)^{-1}B]Dc$. If $LaTeX: c^* = 0$, terminate: the solution is optimal.
3. Normalize the direction: $LaTeX: d = c^* /(\|c^*\| \sqrt{n(n-1)})$.
4. Move in the projected space to $LaTeX: y = e/n - sd$, where $LaTeX: s$ is a fixed step size (= 1/4).
5. Project back into x-space: $LaTeX: x = Dy /(e^TDy)$.

The time complexity is $LaTeX: O(n^{3.5}L^2 \ln(L)\ln(\ln(L)))$, where $LaTeX: L$ is a measure of data storage for a given precision. The solution obtained is in the relative interior of the set of optimal solutions.

See basis.

# Klee-Minty polytope

This is an example to show that the elementary simplex method does not have polynomial time complexity. The polytope is

$LaTeX: \left\{ x : \left(\sum_{i=1}^{k-1} 2^{k-i+1} x_i \right) + x_k \le 5^k \;\mbox{ for } k = 1, 2, \ldots, n, \; x \ge 0 \right\}.$

Maximizing

$LaTeX: \sum_{i=1}^n 2^{n-i} x_i$

over this polytope shows the exponential complexity.

# Knapsack problem

An integer program of the form,

$LaTeX: \max \{c^Tx : x \in \mathbb{Z}^n, \, x \ge 0,\, \mbox{ and } a^Tx \le b \},$

where $LaTeX: a > 0$. The original problem models the maximum value of a knapsack that is limited by volume or weight $LaTeX: (b)$, where $LaTeX: x_j$ is the number of items of type $LaTeX: j$ put into the knapsack at unit return $LaTeX: c_j$, that uses $LaTeX: a_j$ units per item.
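
When $LaTeX: a$ and $LaTeX: b$ are positive integers, a standard dynamic program solves this (unbounded) knapsack in $LaTeX: O(nb)$ time. A sketch with illustrative item data:

```python
# Unbounded knapsack DP: value[cap] = best value achievable with capacity cap,
# where each item type may be used any number of times.

def knapsack(c, a, b):
    """max c^T x  s.t.  a^T x <= b,  x integer >= 0."""
    value = [0] * (b + 1)
    for cap in range(1, b + 1):
        for cj, aj in zip(c, a):
            if aj <= cap:
                value[cap] = max(value[cap], value[cap - aj] + cj)
    return value[b]

best = knapsack(c=[60, 100, 120], a=[1, 2, 3], b=5)   # five copies of item 1
```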

The group knapsack problem has this form, except that it pertains to Gomory's corner polyhedron problem for general integer programming.

The multi-dimensional knapsack problem has more constraints (e.g., volume and weight), $LaTeX: Ax \le b$, where $LaTeX: A \ge 0$ with $LaTeX: Ae > 0$ and $LaTeX: e^TA > 0$.

# Kuhn-Tucker conditions

Abbreviated KT and KKT (latter credits Karush, who had the conditions in his earlier thesis).

This is the Lagrange Multiplier Rule, which Kuhn and Tucker published as an extension to allow inequality constraints. They also introduced the notion of a constraint qualification (which had been only the linear independence condition stemming from Lagrange's multiplier theorem). The most important contribution that sets KT apart from similar discoveries is the connection with saddle points that led to Lagrangian duality.

# Kuhn-Tucker point

Abbreviated KKT point. A feasible point that satisfies the Kuhn-Tucker conditions.

# LINDO

A solver using a simplified problem specification for Linear, Integer, Nonlinear and Dynamic optimization.

# LINGO

A modeling language companion to LINDO.

# LPL

A Linear Programming Language

# LU decomposition

A factorization of a matrix $LaTeX: A$ into a lower triangular matrix $LaTeX: (L)$ and an upper triangular matrix $LaTeX: (U)$ so that $LaTeX: A=LU$. The advantage is that to solve the system $LaTeX: Ax=b$, we first apply forward substitution to solve $LaTeX: Ly=b$, then backward substitution to solve $LaTeX: Ux=y$.
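
The two triangular solves can be sketched directly (a hand-rolled illustration on a 2-by-2 example; production codes use library factorizations with pivoting):

```python
def lu_solve(L, U, b):
    """Solve A x = b given A = L U (square L, U with nonzero diagonals)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                     # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):           # backward substitution: U x = y
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

L = [[1.0, 0.0], [3.0, 1.0]]               # unit lower triangular
U = [[2.0, 1.0], [0.0, -2.0]]              # upper triangular
# A = L U = [[2, 1], [6, 1]]; solve A x = (5, 11)
print(lu_solve(L, U, [5.0, 11.0]))         # -> [1.5, 2.0]
```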

# Label correcting algorithm

Arises in labeling algorithms for the shortest path problem. At each iteration, a label is set to an estimate of the shortest path from a given node. All labels become exact values at termination.

# Label setting algorithm

Arises in labeling algorithms for the shortest path problem. At each iteration, a label becomes the actual shortest path from some node. (Termination occurs when the destination node(s) are permanently labeled.)

# Labeling algorithm

A class of algorithms for path-related problems on graphs and networks. To illustrate, consider a network $LaTeX: [V,A]$ with arc values $LaTeX: c$.

• The Ford-Fulkerson max flow labeling algorithm ($LaTeX: c$ is capacity).
• Input is source $LaTeX: (s)$, destination $LaTeX: (t)$, and capacities $LaTeX: (c)$. Output is the max flow from $LaTeX: s$ to $LaTeX: t$.
• Initialize all flows to be zero and assign the label $LaTeX: (-,\infty)$ to the source. All other nodes are unlabeled, and all nodes (including the source) are unscanned.
• Labeling process: Select any labeled, unscanned node, say $LaTeX: k$ with label $LaTeX: (a,b)$. If node $LaTeX: j$ is an unlabeled neighbor and $LaTeX: x_{kj} < c_{kj}$, assign the label $LaTeX: (k+,b^*)$ to node $LaTeX: j$, where $LaTeX: b^* = \min \{b, c_{kj}-x_{kj}\}$. If node $LaTeX: j$ is an unlabeled neighbor and $LaTeX: x_{jk} > 0$, assign the label $LaTeX: (k-,b^*)$ to node $LaTeX: j$, where $LaTeX: b^* = \min\{b, x_{jk}\}$. (In either case, define node $LaTeX: j$ as labeled, but unscanned, while node $LaTeX: k$ is now labeled and scanned.) Repeat this until either the sink is labeled and unscanned, or until no more labels can be assigned. In the latter case terminate; in the former case, go to the flow change step.
• Flow change: Let the sink $LaTeX: (t)$ have label $LaTeX: (a,b)$. If $LaTeX: a$ is $LaTeX: k+$, replace $LaTeX: x_{kt}$ by $LaTeX: x_{kt}+b$; if it is $LaTeX: k-$, replace $LaTeX: x_{tk}$ by $LaTeX: x_{tk}-b$. In either case, apply the flow change ($LaTeX: b$ or $LaTeX: -b$) along the path back to the source. The result is a new flow that is $LaTeX: b$ more than it was. Erase labels, except on the source, and go to the labeling process.
• Dijkstra's shortest path algorithm ($LaTeX: c$ is distance).
• Input is source $LaTeX: (s)$, destinations $LaTeX: (T \neq \emptyset)$, and distances $LaTeX: (c)$. Output is a shortest path from $LaTeX: s$ to each node in $LaTeX: T$.
• Initialize labels: $LaTeX: L_v = \infty$ for $LaTeX: v \neq s$ and $LaTeX: L_s=0$. Set $LaTeX: S=\emptyset$.
• Label: select $LaTeX: u \in \arg\min \{L_v: v \in V\backslash S\}$ (terminate if $LaTeX: S=V$). Then, update $LaTeX: S := S \cup \{u\}$, and for $LaTeX: v \in V\backslash S$: if $LaTeX: L_u + c_{uv} < L_v$, set $LaTeX: L_v := L_u + c_{uv}.$
• Terminate when $LaTeX: T$ is a subset of $LaTeX: S$ (i.e., all destination nodes are labeled with their shortest distance from node $LaTeX: s$).
• A generic shortest path labeling algorithm from all nodes to destination node $LaTeX: t$ ($LaTeX: c$ is distance).
• Input is destination $LaTeX: (t)$ and distances $LaTeX: (c)$. Output is a shortest path from each node to $LaTeX: t$.
• Initialize labels, $LaTeX: L_v$ is the estimate of a shortest path length from node $LaTeX: v$ to node $LaTeX: t$, with $LaTeX: L_t = 0.$
• Label: Find an arc, say $LaTeX: (i, j)$, that violates the distance equations, say $LaTeX: L_j > L_i + c_{ij}$. If none, terminate; otherwise, update the label: $LaTeX: L_j = L_i + c_{ij}$.
• The labeling step is repeated until there are no violations. At termination, $LaTeX: L_j$ is the length of a shortest path from node $LaTeX: j$ to node $LaTeX: t$.
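
The label-setting steps of Dijkstra's algorithm above can be sketched compactly; the toy network below is illustrative:

```python
import heapq

def dijkstra(V, c, s):
    """c: dict (i, j) -> arc distance. Returns shortest-path labels from s."""
    L = {v: float('inf') for v in V}
    L[s] = 0
    S, heap = set(), [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)         # node with smallest tentative label
        if u in S:
            continue
        S.add(u)                           # L_u is now permanent
        for (i, j), cij in c.items():      # relax arcs leaving u
            if i == u and L[u] + cij < L[j]:
                L[j] = L[u] + cij
                heapq.heappush(heap, (L[j], j))
    return L

c = {('s', 'a'): 2, ('s', 'b'): 5, ('a', 'b'): 1, ('a', 't'): 6, ('b', 't'): 2}
print(dijkstra(['s', 'a', 'b', 't'], c, 's'))   # L_t = 5, via s-a-b-t
```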

Another type of labeling algorithm occurs in simplicial subdivision, used to compute a fixed point that represents an economic equilibrium.

# Lagrange conditions

The optimality equation (plus feasibility conditions) in Lagrange's multiplier theorem.

# Lagrange multiplier rule

From the extension of Lagrange's multiplier theorem. Suppose

$LaTeX: x^* \in \arg\max \{f(x): g(x) \le 0, \, h(x) = 0\},$

where $LaTeX: f$, $LaTeX: g$, and $LaTeX: h$ are smooth. Then, there exist multipliers $LaTeX: (u, v)$ for which the following conditions hold:

• $LaTeX: \nabla_x [f(x^*) - u^T g(x^*) - v^T h(x^*)] = 0$;
• $LaTeX: u \ge 0$;
• $LaTeX: u^T g(x^*) = 0$.

Since $LaTeX: g(x^*) \le 0,$ the last condition, given $LaTeX: u \ge 0$, is equivalent to complementary slackness. These are considered first-order optimality conditions, though the Lagrange Multiplier Rule is not always valid -- see constraint qualifications.

For extensions see the Generalized Lagrange multiplier method.

# Lagrange multiplier theorem

Let $LaTeX: f$ and $LaTeX: h$ be smooth functions and suppose $LaTeX: \nabla h(x^*)$ has full row rank. Then, $LaTeX: x^* \in \arg\max \{f(x): h(x) = 0\}$ only if there exists $LaTeX: v \in \mathbb{R}^m$ such that:

$LaTeX: \nabla f(x^*) - v^T \nabla h(x^*) = 0.$

The $LaTeX: i$-th component of $LaTeX: v$, $LaTeX: v_i,$ is called the Lagrange multiplier associated with the $LaTeX: i$-th constraint, $LaTeX: h_i(x)=0$. Extensions to remove the rank condition and/or allow inequality constraints were by Carathéodory, John, Karush, and Kuhn & Tucker. Also see the Lagrange Multiplier Rule.

# Lagrangian

For the mathematical program

$LaTeX: \min \{ f(x) : g(x) \ge 0, \, h(x) = 0, \, x \in X \},$

the Lagrangian is the function:

$LaTeX: L(x, u, v) = f(x) - u^T g(x) - v^T h(x).$

for $LaTeX: x \in X$ and $LaTeX: u \ge 0$. Note that the Lagrange Multiplier Rule can be written as the first-order conditions for $LaTeX: (x^*, u^*,v^*)$ to be a saddle point of $LaTeX: L$. In Lagrange's multiplier theorem (where $LaTeX: X=\mathbb{R}^n$ and $LaTeX: g$ is vacuous), this is simply that $LaTeX: \nabla L(x^*,v^*)=0$, which could be any type of stationary point.

# Lagrangian duality

See it in the list of particular duals.

# Lagrangian relaxation

The removal of constraints and putting them into the objective as in the Lagrangian function. Specifically, suppose we have

$LaTeX: \max \{ f(x): x \in X, \, g(x) \le 0, \, h(x) = 0\}.$

The Lagrangian relaxation is

$LaTeX: \max \{f(x) - u^T g(x) - v^T h(x): x \in X\},$

where $LaTeX: u \ge 0$ ($LaTeX: v$ is unrestricted).

This is a relaxation of the original mathematical program because the constraints have been removed (expanding the feasible region), and for $LaTeX: x$ feasible (i.e., satisfying these constraints), the relaxed objective satisfies:

$LaTeX: f(x) - u^T g(x) - v^T h(x) \ge f(x),$

because $LaTeX: v^T h(x)=0$ and $LaTeX: u^Tg(x) \le 0.$

The objective is the usual Lagrangian function when $LaTeX: X$ is simply $LaTeX: \mathbb{R}^n$. It provides a foundation for Lagrangian duality and the Generalized Lagrange Multiplier Method.

Note that $LaTeX: X$ could retain some constraints, such as in the separable form $LaTeX: X = X_1 \times \ldots \times X_n$. Now suppose $LaTeX: f(x) = f_1(x_1) + \ldots + f_n(x_n),$ $LaTeX: g(x) = g_1(x_1) + \ldots + g_n(x_n),$ and $LaTeX: h(x) = h_1(x_1) + \ldots + h_n(x_n).$ Then, the Lagrangian relaxation decomposes into $LaTeX: n$ independent mathematical programs for any multiplier value:

$LaTeX: \max \{f(x) - u^T g(x) - v^T h(x): x \in X\} = \sum_{j=1}^n \max \{f_j(x_j) - u_j g_j(x_j) - v_j h_j(x_j): x_j \in X_j \}.$

The tuple $LaTeX: (x^*, u^*, v^*)$ in $LaTeX: X \times \mathbb{R}^m \times \mathbb{R}^M$, with $LaTeX: u^* \ge 0$, is a saddle point of the Lagrangian $LaTeX: L$ if, and only if, the strong duality properties hold:

1. $LaTeX: x^* \in \arg\max \{L(x, u^*, v^*): x \in X\}$
2. $LaTeX: g(x^*) \le 0$ and $LaTeX: h(x^*) = 0$ ($LaTeX: x^*$ is feasible)
3. $LaTeX: (u^*)^T g(x^*) = 0$ ($LaTeX: x^*$ and $LaTeX: u^*$ satisfy complementary slackness)

See the supplement on Lagrangian Saddle Point Equivalence for further information.
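
A numerical sketch of the relaxation bound (all data invented): for a 0-1 knapsack $LaTeX: \max \{c^Tx : a^Tx \le b, \, x \in \{0,1\}^n\}$, keep $LaTeX: X = \{0,1\}^n$ and relax the knapsack constraint with multiplier $LaTeX: u \ge 0$; the relaxation then decomposes variable by variable.

```python
from itertools import product

def lagrangian_bound(c, a, b, u):
    # max over x in {0,1}^n of c.x - u*(a.x - b)
    #   = u*b + sum_j max(0, c_j - u*a_j)   (decomposes per variable)
    return u * b + sum(max(0.0, cj - u * aj) for cj, aj in zip(c, a))

c, a, b = [10, 7, 3], [4, 3, 1], 5
exact = max(sum(cj * xj for cj, xj in zip(c, x))
            for x in product([0, 1], repeat=3)
            if sum(aj * xj for aj, xj in zip(a, x)) <= b)
for u in (0.0, 2.0, 2.5, 4.0):
    assert lagrangian_bound(c, a, b, u) >= exact   # always an upper bound
print(exact, lagrangian_bound(c, a, b, 2.5))       # -> 13 13.0 (no gap at u = 2.5)
```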

# Lattice

A set $LaTeX: S$ that contains $LaTeX: x \vee y$ and $LaTeX: x \wedge y$ for all $LaTeX: x$ and $LaTeX: y$ in $LaTeX: S$.

# Lattice program

Minimizing a subadditive function on a lattice.

# Lattice search

This finds the maximum of a unimodal function on a finite set, $LaTeX: \{x_1, x_2, \ldots, x_m\}$, by evaluating the function at points placed by the modified fibonacci sequence: $LaTeX: K_{n+2} = K_{n+1} + K_n + 1$. This modification comes from the improvement obtained by dropping the inferior end point after an evaluation: if the set is in $LaTeX: [a, b]$ and $LaTeX: f(x) < f(y)$, the new interval is $LaTeX: [x+1, b]$ dropping the point $LaTeX: x$ from the remaining search set.

The method proceeds the same as fibonacci search, except the placements follow the modified fibonacci sequence, which is equivalent to $LaTeX: K_n = F_{n+1} - 1$, reflecting the improvement that accumulates as $LaTeX: n$ increases, for the extra point dropped after each evaluation. The placement ratio, $LaTeX: K_{n-1} / K_n$, also approaches the golden mean, so lattice search also approaches the golden section search.

Some refer to fibonacci search when they mean lattice search. The following table shows the difference. For example, with 20 evaluations, fibonacci search can guarantee finding the maximum on a set of 10,946 points, while lattice search can find it on a set of 17,710 points. Golden section is included for comparison as well. With 20 evaluations it can find the maximum on a set of only 9,349 points.

n                  5     10     15     20
=========================================
F_n                8     89    987  10946
K_n               12    143   1596  17710
golden section     6     76    843   9349
=========================================


Lattice search minimizes the maximum number of evaluations needed to find the maximum on a given set of points. If the number in that set is $LaTeX: K_N$, the number of evaluations is $LaTeX: N$. If the number in the set is not exactly some modified fibonacci number, one can fill in dummy points at the end of the set. For example, if $LaTeX: K_{N-1} < m < K_N,$ let the set be $LaTeX: \{x_1, \ldots, x_m, y_1, \ldots, y_k\}$, where $LaTeX: k = K_N - m$. Then, the (minimum) number of evaluations to guarantee finding the maximum of any unimodal function is $LaTeX: N$.
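
The columns of the table can be reproduced from the recurrences (a quick sketch; the code uses the convention $LaTeX: F_1 = 1, F_2 = 2$ implied by the table):

```python
def fib(n):                        # F_1 = 1, F_2 = 2, F_{n+2} = F_{n+1} + F_n
    a, b = 1, 2
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def modified_fib(n):               # K_n = F_{n+1} - 1, so K_{n+2} = K_{n+1} + K_n + 1
    return fib(n + 1) - 1

print([fib(n) for n in (5, 10, 15, 20)])            # -> [8, 89, 987, 10946]
print([modified_fib(n) for n in (5, 10, 15, 20)])   # -> [12, 143, 1596, 17710]
```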

# Level set

For the function $LaTeX: f:X \rightarrow \mathbb{R},$ the $LaTeX: c$-level set is $LaTeX: \{x: x \in X, \, f(x) \le c\}$. Sometimes, the 0-level set is called simply the level set of $LaTeX: f$.

# Levenberg Marquardt algorithm

This is designed for solving nonlinear programs using the equation:

$LaTeX: (\nabla^2 f(x) + p I)d = -\nabla f(x),$

where $LaTeX: \nabla^2 f(x)$ is the Hessian. For unconstrained optimization, the solution $LaTeX: d$ serves as a direction vector for the iteration. This is also used in a trust region method. The parameter $LaTeX: p$ is set to give a balance between Newton's method $LaTeX: (p=0)$ and Cauchy's steepest descent $LaTeX: (p \gg 0)$. A high value of $LaTeX: p$ compensates for difficult landscape curvature (e.g., a Hessian that is not positive definite) and guarantees some descent; a low value yields the rapid convergence of Newton's method.
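
One step of the update equation on a toy quadratic (a hand-rolled 2-by-2 solve by Cramer's rule; the function and the values of $LaTeX: p$ are illustrative):

```python
def lm_step(grad, hess, p):
    """Solve (H + p I) d = -g for a 2x2 system by Cramer's rule."""
    a, b = hess[0][0] + p, hess[0][1]
    c, d = hess[1][0], hess[1][1] + p
    det = a * d - b * c
    g1, g2 = -grad[0], -grad[1]
    return [(g1 * d - b * g2) / det, (a * g2 - c * g1) / det]

# f(x) = x1^2 + 10*x2^2 at x = (1, 1): grad = (2, 20), Hessian = diag(2, 20)
g, H = [2.0, 20.0], [[2.0, 0.0], [0.0, 20.0]]
print(lm_step(g, H, 0.0))     # -> [-1.0, -1.0]: the Newton step, straight to the minimum
print(lm_step(g, H, 100.0))   # large p: nearly proportional to -g (steepest descent)
```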

# Lexicographic Ordering constraint

is a global constraint that allows the comparison of sequences of variables. Given two sequences $LaTeX: x$ and $LaTeX: y$ of $LaTeX: n$ variables, $LaTeX: \langle x_1 \dots x_n \rangle$ and $LaTeX: \langle y_1 \dots y_n \rangle$, let $LaTeX: x \preceq_{lex} y$ denote the lexicographic ordering constraint. The constraint holds iff $LaTeX: n = 0$, or $LaTeX: x_1 < y_1$, or ($LaTeX: x_1 = y_1$ and $LaTeX: \langle x_2 \dots x_n \rangle \preceq_{lex} \langle y_2 \dots y_n \rangle$).

# Lexicographic order

A nonzero vector is lexicographically positive if its first non-zero coordinate is positive. The vector $LaTeX: x$ is lexicographically greater than the vector $LaTeX: y$ if $LaTeX: x-y$ is lexicographically positive, and this defines the lexicographic order in $LaTeX: \mathbb{R}^n.$ This is a total ordering in that every two vectors are either equal, or one is lexicographically greater than the other.

This was first used in mathematical programming to resolve cycling in the simplex method. It also provides a way to obtain solutions for multiple objectives with the property that $LaTeX: x$ is a Pareto maximum if $LaTeX: f(x)$ is lexicographically greater than or equal to $LaTeX: f(y)$ for all feasible $LaTeX: y$.
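
The definitions translate directly to code (a sketch; Python's built-in tuple comparison implements the same ordering for equal-length numeric tuples):

```python
def lex_positive(v):
    """True iff the first nonzero coordinate of v is positive."""
    for vi in v:
        if vi != 0:
            return vi > 0
    return False                  # the zero vector is not lexicographically positive

def lex_greater(x, y):
    """x is lexicographically greater than y iff x - y is lex-positive."""
    return lex_positive([xi - yi for xi, yi in zip(x, y)])

assert lex_greater([0, 3, -5], [0, 2, 9])   # decided at the second coordinate
assert not lex_greater([1, 0], [1, 0])      # equal vectors: neither is greater
print("lexicographic order checks passed")
```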

# Lifting

In integer programming, it is creating a valid inequality for the original set, given one for a lower dimensional set, by augmenting a binary variable that is absent from the original inequality. That is, suppose $LaTeX: S$ is a subset of $LaTeX: \{0,1\}^n$ and $LaTeX: a^T x \le b$ is a valid inequality for $LaTeX: S\cap \{x \in \{0,1\}^n: x_1=0\}$ (where it does not matter what $LaTeX: a_1$ is since $LaTeX: x_1=0$). Then, under certain conditions, the inequality is valid for $LaTeX: S$. The "conditions" are problem dependent, and $LaTeX: a_1$ is set as large as possible.

For example, suppose $LaTeX: x_2 + x_3 + x_4 \le 2$ is a valid inequality for $LaTeX: S$. A lifting is the strengthening, $LaTeX: 2x_1 + x_2 + x_3 + x_4 \le 2$, if it can be ascertained that this is a valid inequality for $LaTeX: S$. The coefficient of $LaTeX: x_1$ ($LaTeX: 2$ in the example) is determined by considering how large $LaTeX: a_1$ can be with $LaTeX: x_1=1$. (In this case, the lifted inequality is valid if $LaTeX: x_2 + x_3 + x_4 \le 2$ is valid for $LaTeX: \{x \in S: x_1=0\}$ and if $LaTeX: x_1=1$ implies $LaTeX: x_2 = x_3 = x_4 = 0.$) The motivation for doing this arises in a branch and cut algorithm strategy. At a node in the search tree we might have a valid inequality, $LaTeX: a^Tx \le b$, when the branch had $LaTeX: x_1=0$, and we want to make it valid even if $LaTeX: x_1=1$. The inequality is valid for all

$LaTeX: a_1 \le b - \max \{a_2 x_2 + \ldots + a_n x_n : x \mbox{ feasible and } x_1 = 1\}.$
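
The maximization above can be carried out by brute force on a toy set (the set $LaTeX: S$ below is invented so as to reproduce the example in the text):

```python
from itertools import product

# S: the 0-1 points satisfying 2*x1 + x2 + x3 + x4 <= 2 (an illustrative set)
S = [x for x in product([0, 1], repeat=4) if 2*x[0] + x[1] + x[2] + x[3] <= 2]

b = 2
# a1 <= b - max { x2 + x3 + x4 : x in S, x1 = 1 }
a1 = b - max(x[1] + x[2] + x[3] for x in S if x[0] == 1)
print(a1)   # -> 2, giving the lifted inequality 2*x1 + x2 + x3 + x4 <= 2
```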

# Line search

Optimizing the objective along a line: $LaTeX: \mbox{opt} \{ f(x + \alpha \, d): \alpha \ge 0\}$, where $LaTeX: x$ is a given point and $LaTeX: d$ is a given direction vector ($LaTeX: \alpha$ is a scalar). A coarse tolerance could be used, so "opt" is not entirely accurate. (In fact, there are reasons to believe not optimizing is best.)
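
One common inexact line search is backtracking with an Armijo-style sufficient-decrease test; a minimal sketch (the constants and the objective are illustrative):

```python
def backtracking(f, fx, slope, x, d, alpha=1.0, beta=0.5, sigma=1e-4):
    """Shrink alpha until f(x + alpha*d) <= f(x) + sigma*alpha*slope,
    where slope = grad f(x).d < 0 for a descent direction d."""
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + sigma * alpha * slope:
        alpha *= beta
    return alpha

f = lambda x: (x[0] - 3.0) ** 2       # gradient at x = 0 is -6
x, d = [0.0], [1.0]                   # d is a descent direction here
alpha = backtracking(f, f(x), -6.0, x, d)
assert f([x[0] + alpha * d[0]]) < f(x)
print(alpha)                          # -> 1.0 (the first trial step is accepted)
```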

Some particular search methods appearing in this glossary include fibonacci search, golden section search, and lattice search.

# Line segment

The line segment with end points $LaTeX: x$ and $LaTeX: y$ in $LaTeX: \mathbb{R}^n$ is

$LaTeX: \{\alpha x + (1-\alpha)y: \alpha \in [0, 1]\},$

which is denoted $LaTeX: [x, y]$ and is the closed line segment. The open line segment is

$LaTeX: \{\alpha x + (1-\alpha)y: \alpha \in (0, 1)\}.$

# Linear combination

The vector $LaTeX: y$ is a linear combination of vectors $LaTeX: x_1, x_2, \ldots, x_n$ if $LaTeX: \textstyle y = \sum_{k=1}^n {a_k x_k}$, where $LaTeX: a_k$ is called the coefficient of $LaTeX: x_k$.

See convergence.

# Linear program

An optimization problem in which the objective and constraints are linear. Forms include $LaTeX: \min \{c^Tx: Ax = b,\, x \ge 0\}$ and $LaTeX: \max \{c^Tx : Ax \le b\}$. The standard form assumes $LaTeX: A$ has full row rank. Computer systems ensure this by having a logical variable $LaTeX: (y)$ augmented, so the form appears, for example, as $LaTeX: \min \{c^Tx: Ax + y = b,\, L \le (x, y) \le U\}$ (also allowing general bounds on the variables). The original variables $LaTeX: (x)$ are called structural. Note that each logical variable can be a slack, surplus, or artificial variable, depending on the form of the original constraint. This computer form also represents a range constraint with simple bounds on the logical variable. Some bounds can be infinite (i.e., absent), and a variable (logical or structural) is free when both of its bounds are infinite.

# Linearity interval

When a univariate function is piece-wise linear, it has the form $LaTeX: a_i x + b_i$ for $LaTeX: x$ in the interval $LaTeX: [c_i, c_{i+1}]$, where $LaTeX: a_i$ is not equal to $LaTeX: a_{i+1}$. (The phrase usually means the function is continuous.) This arises in linear programming when considering the optimal value as a function of varying right-hand sides (or cost coefficients) in fixed proportions: $LaTeX: b+td$ (or $LaTeX: c+td$), where $LaTeX: d$ is an admissible change vector and $LaTeX: t$ is the (scalar) variable. Then, for the range of $LaTeX: t$ where the LP has a solution, the optimal value function has this piece-wise linear (continuous) form, and the intervals that comprise the domain are called linearity intervals.

# Linearization

The strategy of linearization is to reformulate a nonlinear program as a linear program through the introduction of auxiliary variables and constraints that are used to circumvent the nonlinearities. See standard linearization, Glover's linearization, and Reformulation-Linearization-Technique.

# Lipschitz continuous

The function $LaTeX: f: X \rightarrow \mathbb{R}$ is Lipschitz continuous if there exists a value $LaTeX: K$, called the Lipschitz constant, such that $LaTeX: |f(x) - f(y)| \le K \|x-y\|$ for all $LaTeX: x$ and $LaTeX: y$ in $LaTeX: X$. This relation is called the Lipschitz condition. It is stronger than continuity because it limits the slope to be within $LaTeX: [-K, K]$. The generalized Lipschitz condition is that there exists a monotonically increasing function $LaTeX: m : \mathbb{R} \rightarrow \mathbb{R}$, with $LaTeX: m(z) \rightarrow 0$ as $LaTeX: z \rightarrow 0$, such that there exists $LaTeX: K$ for which $LaTeX: |f(x)-f(y)| \le K m(\|x-y\|)$ for all $LaTeX: x$ and $LaTeX: y$ in $LaTeX: X.$

See convergence.

# Local optimum

A feasible point $LaTeX: x^*$ that is an optimal solution to the mathematical program whose feasible region is the intersection of the original region and some neighborhood of $LaTeX: x^*$.

# Locally convex function

There exists a subset of the domain, such as the neighborhood of a point, such that the function is convex on that subset.

# Lockbox problem

A firm wants a check that it receives to clear as quickly as possible (so it can use the money). To reduce the number of days to clear a check, the firm opens offices, called lockboxes, in different cities to handle the checks. There is a cost to open a lockbox, and there are average annual losses calculated from interest rates and the number of days to clear a check sent from source $LaTeX: i$ to lockbox city $LaTeX: j$.

# Logical variable

See linear program.

# Lot size problem

This is one of the oldest mixed-integer programs in operations research, first presented by Wagner and Whitin in 1958. The problem is to minimize cost while satisfying product demands over (discrete) time. Let $LaTeX: y_t$ be the number of units produced in period $LaTeX: t$, for $LaTeX: t=1,\ldots,T$ ($LaTeX: T$ is called the planning horizon), and let $LaTeX: x_t = 1$ if a setup occurs in period $LaTeX: t$; $LaTeX: 0$ otherwise. Let $LaTeX: D_t$ be the demand in period $LaTeX: t$, and let the demand from period $LaTeX: i$ to period $LaTeX: j,$ inclusive, be $LaTeX: d_{ij} = \sum_{i \le t \le j} D_t.$ Then, a MILP formulation is:

$LaTeX: \min \left\{ c^T x + d^T y: x \in \{0,1\}^T, \, y \ge 0, \, \sum_{t\le i} y_t \ge d_{1i} \mbox{ for } i=1, \ldots,T-1, \, \sum_{t\le T} y_t = d_{1T}, \, d_{iT} x_i - y_i \ge 0 \mbox{ for } i=1, \ldots,T \right\}.$

(This is preferred over the MILP that defines inventory balance equations. Even though they are equivalent, the LP relaxation of the above is sharper.)
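
Because some optimal solution produces each period's demand entirely in a single period, the problem can also be solved by a Wagner-Whitin style dynamic program; a sketch with invented data (setup costs $LaTeX: c$, unit production costs $LaTeX: d$, demands $LaTeX: D$):

```python
def lot_size(c, d, D):
    """min c.x + d.y, where the setup in period i covers demands of periods i..j."""
    T = len(D)
    best = [0.0] + [float('inf')] * T      # best[j] = min cost for periods 1..j
    for j in range(1, T + 1):
        for i in range(1, j + 1):          # last setup occurs in period i
            qty = sum(D[i - 1:j])          # it produces d_{ij} units
            best[j] = min(best[j], best[i - 1] + c[i - 1] + d[i - 1] * qty)
    return best[T]

print(lot_size(c=[90, 90, 90], d=[1, 2, 1], D=[50, 60, 40]))   # -> 240.0 (one setup, in period 1)
```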

# Lower semi-continuity

Suppose $LaTeX: x^k \rightarrow x$. Of a function, $LaTeX: f$ is lower semi-continuous if

$LaTeX: \lim_{k \rightarrow \infty} \inf_{j \ge k} f(x^j) \ge f(x).$

Of a point-to-set map, $LaTeX: A$, let $LaTeX: N_{\varepsilon}[S]$ be a neighborhood of the set $LaTeX: S$: for each $LaTeX: \varepsilon > 0$, there exists $LaTeX: K$ such that for all $LaTeX: k \ge K$, $LaTeX: A(x)$ is a subset of $LaTeX: N_{\varepsilon}[A(x^k)]$. A pathology to show that the feasibility map may not be lower semi-continuous is:

$LaTeX: A(b) = L_b(g) = \{x \in [-1, 1]: g(x) \le b\},$

where $LaTeX: g(x) = x$ if $LaTeX: x$ is in $LaTeX: [-1, 0]$, and $LaTeX: g(x) = 0$ if $LaTeX: x \in [0,1].$

# Lower triangular matrix

A square matrix, $LaTeX: A$, such that $LaTeX: A_{ij}=0$ for $LaTeX: i < j$.

# MAX CSP

A constraint satisfaction problem where some constraints may be violated, and the quality of a solution is measured by the number of satisfied constraints (i.e., an optimal solution satisfies the maximum number of constraints).

# MIMI

Manager for Interactive Modeling Interfaces - A modeling system that integrates mathematical programming, database management and artificial intelligence.

# MINOS

Modular In-core Nonlinear Optimization System - An optimization system, available from Stanford Optimization Lab.

# MODLER

Modeling by Object Driven Linear Elemental Relations

# MOSEK

A software package for solving large-scale sparse linear, conic and convex optimization problems.

# MPL

Mathematical Programming Language.


# Main Page

This glossary contains terms specific to mathematical programming and related disciplines like economics, computer science, and mathematics.

Please remember this is a glossary, not a survey, so no attempt is made to cite credits.

# Manpower planning problem

This has many variations, but typically focuses on either short-term problems, like scheduling workers, or on long-term problems, like recruiting. There are workers of different categories; some relate to each other by level (e.g., copilot and pilot), others do not. For long-term manpower planning, inventory balance equations are typically used to represent changes in the number of employees in each category over time, including new hires and various forms of departures. For short-term scheduling, a simple model is the shortest path through a network. Often, more complicated combinatorial programs are required.

# Marginal price

The rate at which the optimal objective value changes with respect to a perturbation of a right-hand side, like a supply, demand or capacity limit. When it exists, the marginal price is often equal to a most pessimistic dual price (e.g., consumer's marginal price is the greatest dual price, which reflects what another unit of demand would cost to satisfy).

# Markov decision process

A stochastic dynamic program, whereby for each policy the resulting state variables comprise a Markov process (a stochastic process with the property that the conditional probability of a transition to any state depends only on the current state, and not on previous states).

# Matching problem

Given a graph, a matching is a subgraph with the property that no two edges are incident to the same node. Subject to certain restrictions, the problem is to find a feasible matching that optimizes some objective, such as minimizing the total cost of the edges in the matching. A perfect matching is when each node is incident with exactly one edge.

The assignment problem is a classical perfect matching, whereby the graph is bipartite (nodes are two sets: people and jobs), and the matching must have all nodes (every person must do a job, and every job must be done). Then, the matching corresponds precisely to an assignment since each job node is incident with exactly one person node in the matching. Further, the cost of each such matching is precisely the total cost of the corresponding assignment, so this min-cost matching problem is the same as the assignment problem.

Another type of matching problem, called maximum cardinality matching, is to find a matching with the greatest number of edges, and this is also solvable in polynomial time. However, minimum-weight, maximum-cardinality matching is NP-complete.

# Mathematical program

Commonly an optimization problem of the form

$LaTeX: \max \{ f(x) : x \in X,\, g(x) \le 0, \, h(x) = 0\},$

where $LaTeX: X$ is a subset of $LaTeX: \mathbb{R}^n$ and is the domain of $LaTeX: f$, $LaTeX: g$ and $LaTeX: h,$ which map into real spaces. The function $LaTeX: f$ is called the objective function, which is typically real-valued. If not, then $LaTeX: f$ maps into $LaTeX: \mathbb{R}^p$ with $LaTeX: p \ge 2,$ and the problem is a multiple objective problem. The feasible region is the collection of $LaTeX: x$ that simultaneously satisfy $LaTeX: x$ in $LaTeX: X,$ $LaTeX: g(x) \le 0,$ and $LaTeX: h(x) = 0,$ which are the program's constraints.

See norm.

# Matroid

This is an abstraction of independence that includes vectors in linear algebra and circuits in graphs. First, we need some preliminary definitions. Let $LaTeX: N = \{1, \ldots, n\}$ be a finite set, and let $LaTeX: M = \{S_1, \ldots, S_m\}$ be a collection of subsets of $LaTeX: N$. $LaTeX: (N,M)$ is an independence system if $LaTeX: S_i$ in $LaTeX: M$ implies every subset of $LaTeX: S_i$ is in $LaTeX: M$. Elements of $LaTeX: M$ are called independent sets, and the remaining subsets of $LaTeX: N$ are called dependent sets. An independent set, say $LaTeX: S_i$, is maximal if $LaTeX: S_i \cup \{k\}$ is not in $LaTeX: M$ for any $LaTeX: k \in N\backslash S_i$. The max-cardinality independent set of any subset of $LaTeX: N$, say $LaTeX: T,$ is given by:

$LaTeX: m(T) = \max \{ |S_i|: S_i \in M, \, S_i \subseteq T\}.$

A matroid is an independence system $LaTeX: (N, M)$ in which for any subset of $LaTeX: N$, say $LaTeX: T$, every independent set in $LaTeX: T$ that is maximal in $LaTeX: T$ has cardinality $LaTeX: m(T)$. The definition essentially means that maximal and maximum are the same. In other words, a system is a matroid if, and only if, a greedy algorithm yields a globally optimal solution to the associated max weight problem. An example is the spanning tree.
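
The greedy characterization can be illustrated on the graphic matroid of a small graph, where the independent sets are the acyclic edge subsets and greedy (Kruskal-style, with a union-find test for independence) finds a maximum-weight forest; the data below are invented:

```python
def find(parent, v):
    """Follow parent pointers to the root of v's component."""
    while parent[v] != v:
        v = parent[v]
    return v

def greedy_max_forest(nodes, weighted_edges):
    """Greedy on the graphic matroid: heaviest edge first, skipping cycles."""
    parent = {v: v for v in nodes}
    total, chosen = 0, []
    for w, u, v in sorted(weighted_edges, reverse=True):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:               # the edge keeps the chosen set independent
            parent[ru] = rv
            total += w
            chosen.append((u, v))
    return total, chosen

edges = [(4, 'a', 'b'), (3, 'b', 'c'), (2, 'a', 'c'), (5, 'c', 'd')]
total, chosen = greedy_max_forest('abcd', edges)
print(total)   # -> 12: edges (c,d), (a,b), (b,c); edge (a,c) would close a cycle
```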


# Max flow-min cut theorem

The maximum flow through a (single commodity) capacitated network from a specified node, called the source, to another node, called the sink, equals the value of the minimum cutset. Originally proven directly from principles of networks, this was discovered to be a special case of the Duality Theorem of Linear Programming.

# Max flow problem

In a network, $LaTeX: [V, A],$ there are two special nodes in $LaTeX: V:$ a source $LaTeX: (s)$ and a sink $LaTeX: (t)$. The problem is to maximize the flow from $LaTeX: s$ to $LaTeX: t$ subject to conservation of flow constraints at each node and flow bounds on each arc. The max flow labeling algorithm provides a constructive proof of the Max flow - Min cut theorem.

This can also be obtained from the duality of the following linear program:

maximize $LaTeX: v:$

$LaTeX: 0 \le x_{ij} \le U_{ij}$ for $LaTeX: (i, j) \in A$;

$LaTeX: \sum_j x_{ij} - \sum_j x_{ji} = 0$ for $LaTeX: i \in V \backslash \{s,t\};$

$LaTeX: \sum_j x_{sj} - \sum_j x_{js} = v;$

$LaTeX: \sum_j x_{tj} - \sum_j x_{jt} = -v.$

$LaTeX: U$ are upper bounds (capacities) on arc flows, and each sum is for $LaTeX: j$ such that the indicated arc is in $LaTeX: A.$
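
An augmenting-path sketch of the problem (Ford-Fulkerson with breadth-first search; the capacities below are invented):

```python
from collections import deque

def max_flow(U, s, t):
    """U: dict (i, j) -> capacity. Returns the maximum flow value from s to t."""
    res = {}                                    # residual capacities
    def add(i, j, cap):
        res.setdefault(i, {}).setdefault(j, 0)
        res[i][j] += cap
    for (i, j), u in U.items():
        add(i, j, u)
        add(j, i, 0)                            # reverse arc allows cancellation
    value = 0
    while True:
        pred, q = {s: None}, deque([s])         # breadth-first search for a path
        while q and t not in pred:
            i = q.popleft()
            for j, cap in res[i].items():
                if cap > 0 and j not in pred:
                    pred[j] = i
                    q.append(j)
        if t not in pred:
            return value                        # no augmenting path remains
        path, j = [], t
        while pred[j] is not None:              # recover the path back to s
            path.append((pred[j], j))
            j = pred[j]
        delta = min(res[i][j] for i, j in path) # bottleneck residual capacity
        for i, j in path:
            res[i][j] -= delta
            res[j][i] += delta
        value += delta

U = {('s', 'a'): 3, ('s', 'b'): 2, ('a', 'b'): 1, ('a', 't'): 2, ('b', 't'): 3}
print(max_flow(U, 's', 't'))   # -> 5, matching the capacity of the cut {s}
```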

# Maximal

In a partially ordered set, a maximal element is one for which no element follows it in the ordering. This is not to be confused with maximum. For example, consider the following knapsack problem:

$LaTeX: \max \{ 5x + y: (x, y) \in \mathbb{Z}^2, \, x \ge 0, \, y \ge 0, \, 3x + 2y \le 3\}.$

The maximum value is $LaTeX: 5$, with $LaTeX: (x, y) = (1, 0).$ Define the partial ordering to be the ordinary greater-than-or-equal-to, so that $LaTeX: (x',y') > (x, y)$ means $LaTeX: x' \ge x,$ $LaTeX: y' \ge y,$ and either $LaTeX: x' > x$ or $LaTeX: y' > y$. In words, $LaTeX: (x', y') > (x, y)$ means we can put at least one more item into the knapsack, in which case we would increase our return. When we cannot do so, $LaTeX: (x, y)$ is a maximal element. In particular, $LaTeX: (0,1)$ is a maximal element, and its value is $LaTeX: 1$, which is not the maximum.

Another definition pertains to a subset with respect to some property such that no proper superset has that property. In the 0-1 knapsack problem with $LaTeX: n$ variables, we define the subset of items taken (for any solution) as $LaTeX: J=\{j: x_j=1\}.$ Then, we define $LaTeX: J$ as maximal if no item can be added without violating the limit, i.e., $LaTeX: \textstyle a_k > b - \sum_{j\in J} a_j$ for all $LaTeX: k$ not in $LaTeX: J.$
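
The knapsack example above can be checked by enumeration (a sketch):

```python
from itertools import product

# feasible integer points of max {5x + y : 3x + 2y <= 3, x, y >= 0 integer}
feasible = [(x, y) for x, y in product(range(4), repeat=2) if 3*x + 2*y <= 3]

def follows(q, p):
    """q strictly follows p in the ordering: q >= p componentwise, q != p."""
    return q != p and all(qi >= pi for qi, pi in zip(q, p))

maximal = [p for p in feasible if not any(follows(q, p) for q in feasible)]
print(sorted(maximal))                        # -> [(0, 1), (1, 0)]: both maximal
print(max(5 * x + y for x, y in feasible))    # -> 5: the maximum, only at (1, 0)
```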

# Maximand

The objective function in a mathematical program whose sense of optimization is to maximize.

# Maximum

(pl. maxima). A feasible point at which the supremum is achieved. (See Weierstrass' Theorem.) An $LaTeX: \varepsilon$-maximum is within $LaTeX: \varepsilon$ of being a maximum: $LaTeX: \textstyle f(x) \ge f^* - \varepsilon$, where $LaTeX: f^*$ is the supremum and $LaTeX: \varepsilon > 0.$

# Maximum principle

Necessary conditions for a solution to the (constrained) problem of variational calculus, given in terms of nonlinear differential equations. Generally credited to Pontryagin (1962), it was derived as an extension of the Euler-Lagrange conditions for variational calculus, and later was derived from dynamic programming.

# Maxmin

(or maximin). The objective is, itself, the value of another mathematical program:

$LaTeX: f(x) = \inf \{f(x, y): y \in Y(x),\, g(x, y) \ge 0, \, h(x, y) = 0\}.$

Then, the objective is to maximize $LaTeX: f(x)$ (on some feasible region).

# Memetic algorithm

This is a population-based approach for heuristic search in optimization problems. It uses the crossover operation of genetic algorithms, but combines it with local search heuristics. Implementations are typically parallel with an MIMD architecture.

# Metaheuristic

This is a general framework for heuristics in solving hard problems. The idea of meta is that of level. An analogy is the use of a meta-language to explain a language. For computer languages, we use symbols, like brackets, in the meta-language to denote properties of the language being described, such as parameters that are optional. Examples of metaheuristics include genetic algorithms, simulated annealing, and tabu search.

Besides parameters that define specific algorithms within each framework, these metaheuristics also require some concept of representation, which is problem dependent. Also see GRASP.

# Method of centers

This is an interior method, defined for a mathematical program without equality constraints whose strict interior is nonempty. Considering $LaTeX: \max \{ f(x) : x \in X, \, g(x) \le 0 \},$ define

$LaTeX: G(w, x) = \min \{f(w)-f(x), -g_1(w), \ldots,-g_m(w)\}.$

Observe that $LaTeX: G(w, x) > 0$ if, and only if, $LaTeX: f(w) > f(x)$ and $LaTeX: g_i(w) < 0$ for all $LaTeX: i$. Then, the algorithm map is $LaTeX: \arg\max \{G(w,x): w \in \mathbb{R}^n\}.$

# Metric

A nonnegative, real-valued function, $LaTeX: d$, on pairs from a set $LaTeX: S$ such that for each $LaTeX: x,$ $LaTeX: y,$ and $LaTeX: z$ in $LaTeX: S:$

1. $LaTeX: d(x, y) = 0 \Leftrightarrow x = y;$
2. $LaTeX: d(x, y) = d(y, x);$
3. $LaTeX: d(x, y) + d(y, z) \ge d(x, z).$

The function is also called a distance; the pair $LaTeX: (S, d)$ is a metric space. Condition (3) is called the Triangle inequality. A space is metrizable if a metric can be defined. A metric is induced by a norm if one exists: $LaTeX: d(x,y) = \| x - y \|.$ See Hausdorff metric.

# Min-conflicts heuristic

The min-conflicts heuristic is a local search or heuristic repair method (see heuristic search) for solving CSPs. This heuristic randomly selects a variable in the scope of a violated constraint and assigns it the value in its domain that minimizes the number of violated constraints. If there is more than one such value, it chooses randomly among them.
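
A minimal pure-Python sketch of this heuristic, using the n-queens problem as the CSP (the problem choice, seed, and step limit are illustrative assumptions, not part of the definition):

```python
import random

def min_conflicts_queens(n, max_steps=10000, seed=0):
    """Solve n-queens by min-conflicts: queens[c] = row of the queen in column c."""
    rng = random.Random(seed)
    queens = [rng.randrange(n) for _ in range(n)]

    def conflicts(col, row):
        # number of queens attacking square (row, col), ignoring column col itself
        return sum(1 for c in range(n) if c != col and
                   (queens[c] == row or abs(queens[c] - row) == abs(c - col)))

    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(c, queens[c]) > 0]
        if not conflicted:
            return queens                  # no violated constraints: solved
        col = rng.choice(conflicted)       # random variable in a violated constraint
        scores = [conflicts(col, r) for r in range(n)]
        best = min(scores)
        # value minimizing the number of violated constraints, ties broken randomly
        queens[col] = rng.choice([r for r in range(n) if scores[r] == best])
    return None                            # step limit reached without a solution

solution = min_conflicts_queens(8)
```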

# Minimal

In a partially ordered set, a minimal element is one that does not follow another in the ordering. This is not to be confused with minimum.

Another definition pertains to a subset with respect to some property such that no proper subset has that property. In the 0-1 knapsack problem the set of items not taken is minimal with respect to the greedy algorithm of adding the most valuable item, if it fits, until there are no more items that can fit. This does not necessarily produce a maximum objective value, but the converse is certainly true: in order that $LaTeX: x$ be an optimal solution, $LaTeX: \{j: x_j=0\}$ must be minimal with respect to adding items that fit. See maximal for analogy and numerical example.

# Minimal inequality

In integer programming, a valid inequality is minimal if it is not dominated by any other valid inequality. Originally, this was limited to not being able to decrease any coefficient and remain valid. For example, suppose $LaTeX: 2x_1 + x_2 \ge 1$ is a valid inequality. Then, if we decrease the first coefficient to obtain $LaTeX: x_1 + x_2 \ge 1,$ either this is not valid or it dominates the former, rendering it non-minimal.

More generally, suppose $LaTeX: a^Tx \ge b$ is a valid inequality, and we consider $LaTeX: (a',b')$ such that $LaTeX: a' \le t a$ and $LaTeX: b' \ge t b$ for some positive $LaTeX: t$ such that $LaTeX: (a',b') \neq t (a,b).$ If $LaTeX: (a')^T x \ge b'$ is a valid inequality, it dominates the original one because

$LaTeX: \left \{ x \in \mathbb{R}^n : x \ge 0, \, (a')^T x \ge b' \right \} \subseteq \left \{ x \in \mathbb{R}^n : x \ge 0, \, a^T x \ge b \right \}.$

For example, $LaTeX: 4x_1 + 2x_2 \ge 3$ dominates $LaTeX: 2x_1 + x_2 \ge 1$ (use $LaTeX: t = 2$), so if $LaTeX: 4x_1 + 2x_2 \ge 3$ is valid, $LaTeX: 2x_1 + x_2 \ge 1$ cannot be minimal. Every facet-defining inequality is minimal, but not conversely.

# Minimand

The objective function in a mathematical program whose sense of optimization is to minimize.

# Minimax

The objective is, itself, the value of another mathematical program:

$LaTeX: f(x) = \sup \{f(x, y): y \in Y(x),\, g(x, y) \ge 0, \, h(x, y) = 0\}.$

Then, the objective is to minimize $LaTeX: f(x)$ (on some feasible region).

# Minimax theorem

Proven by von Neumann in 1928, this is a cornerstone of duality (and of game theory). It states that there exists $LaTeX: x \in S_m$ and $LaTeX: y \in S_n$ such that $LaTeX: x^T A y$ is a saddlepoint of the bilinear form:

$LaTeX: \min \left\{ \max \{ u^T A v: v \in S_n\}: u \in S_m \right\} = \max \left\{ \min \{ u^T A v: u \in S_m\}: v \in S_n \right\} = x^TAy.$

This extends to the following.

Let $LaTeX: F:X \times Y \rightarrow \mathbb{R}$ be such that $LaTeX: X$ and $LaTeX: Y$ are non-empty, convex, compact sets, $LaTeX: F(\cdot,y)$ is convex on $LaTeX: X$ for each $LaTeX: y \in Y$, and $LaTeX: F(x,\cdot)$ is concave on $LaTeX: Y$ for each $LaTeX: x \in X$. Then, there exists a saddlepoint, $LaTeX: (x^*,y^*) \in X \times Y,$ such that

$LaTeX: \min \left\{ \max \{F(x,y): y \in Y\}: x \in X\right\} = \max \left\{ \min \{F(x,y): x \in X\}: y \in Y\right\} = F(x^*,y^*).$
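
For pure strategies, both sides of the identity can be computed directly; here is a small sketch with a made-up payoff matrix, checking for a pure saddlepoint (in general the two values coincide only over mixed strategies $LaTeX: S_m, S_n$):

```python
def maximin(A):
    # best guaranteed payoff for the row player over pure strategies
    return max(min(row) for row in A)

def minimax(A):
    # best guaranteed cap for the column player over pure strategies
    cols = list(zip(*A))
    return min(max(col) for col in cols)

A = [[3, 1, 4],
     [2, 2, 2]]
# maximin(A) == minimax(A) == 2, so (row 2, column 2) is a pure saddlepoint
```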

# Minimum

(pl. minima). A feasible point at which the infimum is achieved. (See Weierstrass' Theorem.) An $LaTeX: \varepsilon$-minimum is within $LaTeX: \varepsilon$ of being a minimum: $LaTeX: f(x) \le f^* + \varepsilon,$ where $LaTeX: f^*$ is the infimum and $LaTeX: \varepsilon > 0$.

# Minkowski inequality

The Minkowski inequality is

$LaTeX: \|x + y\|_p \le \|x\|_p + \|y\|_p,$

where $LaTeX: \|\cdot \|_p$ denotes the $LaTeX: L_p$ norm and $LaTeX: p \ge 1$. For $LaTeX: p > 1,$ equality holds if, and only if, one of the vectors is a nonnegative multiple of the other.
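
A quick numerical check of the inequality, with arbitrary sample vectors and the $LaTeX: L_p$ norm computed straight from its definition:

```python
def lp_norm(x, p):
    # L_p norm from the definition: (sum_j |x_j|^p)^(1/p)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x, y = [3.0, -1.0, 2.0], [0.5, 4.0, -2.0]
checks = []
for p in (1, 2, 3, 10):
    lhs = lp_norm([a + b for a, b in zip(x, y)], p)
    rhs = lp_norm(x, p) + lp_norm(y, p)
    checks.append(lhs <= rhs + 1e-12)   # Minkowski: ||x+y||_p <= ||x||_p + ||y||_p
```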

# Mixed-integer program

A mixed-integer program (MIP) is a mathematical program in which some of the variables are required to be integer-valued. Historically, this term implied the mathematical program was otherwise linear, so older papers (and some recent ones) mean this, but most now refer to the following:

• MILP: Mixed integer linear program
• MINLP: Mixed integer nonlinear program

Also see combinatorial program.

# Modified Newton method

The Newton method is designed to find the root of a function, say $LaTeX: \nabla f(x)=0$, by the algorithm map $LaTeX: A(x) = x - \nabla^2 f(x)^{-1} \nabla f(x).$ This need not converge, so a modification is to use line search, resulting in the algorithm map:

$LaTeX: A(x) = x - s^*\nabla^2 f(x)^{-1} \nabla f(x),$

where $LaTeX: s^* \in \arg\max \{f(x - s\nabla^2 f(x)^{-1} \nabla f(x)): s \ge 0\}.$ More generally, we could have another step size rule; as long as it is chosen to converge, the modified algorithm is sometimes called the damped Newton method.
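
A sketch of a damped Newton step in one dimension, with a simple backtracking rule standing in for the exact line search over $LaTeX: s$ (the test function, log(x) - x, and the starting point are illustrative choices):

```python
import math

def damped_newton_max(f, df, d2f, x, tol=1e-10, max_iter=100):
    """Maximize a smooth 1-D function: Newton direction plus backtracking on s."""
    for _ in range(max_iter):
        g = df(x)
        if abs(g) < tol:
            return x
        d = -g / d2f(x)                   # Newton direction: -f''(x)^{-1} f'(x)
        s = 1.0
        # crude backtracking in place of the exact arg max over s >= 0
        while f(x + s * d) < f(x) and s > 1e-12:
            s *= 0.5
        x = x + s * d
    return x

x_star = damped_newton_max(lambda x: math.log(x) - x,
                           lambda x: 1.0 / x - 1.0,
                           lambda x: -1.0 / x ** 2,
                           x=0.2)
# converges to the maximizer x = 1 of log(x) - x
```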

# Monoid

A set, $LaTeX: M$, defined on rational vectors, such that: (1) $LaTeX: 0 \in M$, and (2) if $LaTeX: v,$ $LaTeX: w$ are in $LaTeX: M$, then $LaTeX: v+w$ is in $LaTeX: M.$ The monoid is integral if it contains only integer vectors. One can think of a monoid as a discrete analogue of a convex cone.

# Monotonic function

A function that either never decreases or never increases. A non-decreasing, or isotonic, function satisfies: $LaTeX: f(x') \ge f(x)$ whenever $LaTeX: x' \ge x$ (it is strictly increasing if $LaTeX: f(x') > f(x)$ for $LaTeX: x' > x$). A non-increasing, or antitonic, function satisfies: $LaTeX: f(x') \le f(x)$ whenever $LaTeX: x' \ge x$ (it is strictly decreasing if $LaTeX: f(x') < f(x)$ for $LaTeX: x'> x$). This extends to a vector function with $LaTeX: \mbox{range}(f)$ in $LaTeX: \mathbb{R}^n$: $LaTeX: f$ is monotonic if $LaTeX: (f(x)-f(y))^T (x-y) \ge 0$ for all $LaTeX: x, y.$

# Monte Carlo optimization

A class of algorithms that seek a maximum by sampling, using a pseudo-random number generator. One example is simulated annealing.

# More-for-less paradox

This was originally introduced in the context of linear network flows:

It is possible to send more flow from supply to demand nodes at lower cost, even if all arc costs are positive.

See Myths and Counter Examples in Mathematical Programming for a non-network example.

# Mosel

A modelling and solving environment and language, which was designed to fit with Xpress-MP.

# Multi-commodity flow

A network flow problem in which more than one commodity shares arc capacities. This applies to communication networks as well as to shipments of different grains on barges of limited capacities. It is more complicated than the single-commodity flow problem, and is generally NP-complete (except for some special cases). See [1] for one of the complexities.

# Multi-stage decision process

See dynamic programming and the recourse model in stochastic programming.

# Multilevel program

The "level" refers to sets of variables. A bilevel program has two sets:

$LaTeX: \min \{ f(x, y): x \in X,\, y \in Y, \, g(x, y) \le 0, \, h(x, y)=0\}.$

A reason for identifying levels is to apply a decomposition principle for algorithm design. One example is the bilinear program. Another is when one set of variables is constrained to be a solution to an inner optimization problem:

$LaTeX: \min \{ f(x, y): x \in X,\, y \in Y,\, g(x, y) \le 0,\, h(x, y)=0,\, y \in \arg\max \{F(x, y): y \in S(x)\} \},$

where $LaTeX: S(x)$ is some subset of $LaTeX: Y$.

# Multiple objectives

The problem has more than one objective function. Since these do not lie in a totally ordered set, a solution is often defined as an efficient point (sometimes called a Pareto optimum): $LaTeX: x^*$ is feasible, and there does not exist another feasible point, $LaTeX: x$, such that $LaTeX: f(x) \ge f(x^*)$ and $LaTeX: f_i(x) > f_i(x^*)$ for some $LaTeX: i$, where $LaTeX: i$ indexes the objective functions, and we assume we are maximizing. This reduces to the usual definition of an optimum when there is only one objective.

There have been two principal approaches to solving a multiple objective mathematical program (MOMP):

1. weighted sum: Maximize $LaTeX: w^T f(x)$, where $LaTeX: w > 0.$
2. lexicographically: $LaTeX: F_1 = \max \{f_1(x)\}$ and for $LaTeX: i > 1,$ $LaTeX: F_i = \max \{f_i(x): f_k(x) = F_k \mbox{ for } k=1, \ldots, i-1\}.$ This results in $LaTeX: x^*$ for which $LaTeX: f(x^*)$ is lexicographically greater than (or equal to) any feasible solution. (Note: the order is fixed in advance.)

Both methods yield an efficient point (if one exists). Under certain assumptions, both methods can be used to generate the set of all efficient points, called the efficient frontier. (Care must be exercised in doing so, however; for example, see MOP Myth #2 for issues that arise when varying the weights of the objectives.)
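
For a finite set of objective vectors, both ideas are easy to demonstrate; here is a small sketch (the points are made up) that filters the efficient points and takes a weighted-sum maximum:

```python
def efficient_points(points):
    """Return the Pareto-efficient subset (maximizing every coordinate)."""
    def dominates(a, b):
        # a dominates b: at least as good everywhere, strictly better somewhere
        return (all(ai >= bi for ai, bi in zip(a, b)) and
                any(ai > bi for ai, bi in zip(a, b)))
    return [p for p in points if not any(dominates(q, p) for q in points)]

def weighted_sum_argmax(points, w):
    # the weighted-sum approach: maximize w^T f(x) with w > 0
    return max(points, key=lambda p: sum(wi * pi for wi, pi in zip(w, p)))

pts = [(1, 5), (2, 4), (3, 3), (2, 2), (4, 1)]
eff = efficient_points(pts)              # (2, 2) is dominated, e.g. by (2, 4)
best = weighted_sum_argmax(pts, (1.0, 1.0))
```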

# Myopic optimization

Given any partial solution, assign another value that improves the objective the most. (Algorithms that produce solutions based on myopic optimization are called greedy algorithms.)

# N-Opt

This is a heuristic for the TSP, though it can apply more generally to a problem that seeks a subgraph with all nodes. It is a local search, which considers a move by removing n edges from a trial solution, then replacing them with some other n edges. In TSP, the 2-Opt neighbors of the tour $LaTeX: \textstyle (1,2,\dots,n)$ are pair-wise exchanges. The set of exchanges, however, is larger than the set of 2-Opt neighbors. For example, consider $LaTeX: n=4,$ so the original tour is $LaTeX: \textstyle (1,2,3,4).$ The six pair-wise exchanges of this permutation are:

    (2,1,3,4)   (3,2,1,4)   (4,2,3,1)
    (1,3,2,4)   (1,4,3,2)   (1,2,4,3)

[diagrams of these six tours on the four nodes omitted]


We see duplicates due to the equivalence of any cyclic permutation. Also, there are only two 2-Opt moves if the TSP is symmetric:

    (1,3,2,4)   (1,2,4,3)

[diagrams of these two tours omitted]


For $LaTeX: n=2,$ the replacement is unique - that is, once two edges are chosen for removal, there is only one replacement. This is not true for $LaTeX: n > 2,$ and part of the heuristic is to decide what replacement to make.
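
The 2-Opt neighbors above can be enumerated mechanically; here is a sketch for the symmetric TSP, where reversing the segment between two removed edges is the standard move:

```python
from itertools import combinations

def two_opt_neighbors(tour):
    """All tours obtained by removing two non-adjacent edges and reconnecting:
    reverse the segment between positions i+1 and j."""
    n = len(tour)
    neighbors = []
    for i, j in combinations(range(n), 2):
        if j - i < 2 or (i == 0 and j == n - 1):
            continue                       # adjacent edges: the move changes nothing
        new = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
        neighbors.append(tuple(new))
    return neighbors

print(two_opt_neighbors([1, 2, 3, 4]))   # -> [(1, 3, 2, 4), (1, 2, 4, 3)]
```

For the 4-city tour this reproduces the two distinct 2-Opt moves noted above.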

# NP-complete

Problems are divided into two categories: those for which there exists an algorithm to solve them with polynomial time complexity, and those for which there is no such algorithm. We denote the former class of problems by $LaTeX: P.$ There are problems for which no polynomial-time algorithm is known, but there is also no proof that no such algorithm exists. Among these problems that are not known to be in $LaTeX: P$ (or known to be outside it), there is a subclass of problems known as NP-complete: those for which either all are solvable in polynomial time, or none are. Formally, a problem is in NP if there exists an algorithm with polynomial time complexity that can certify a solution. For example, it is not known whether there exists a polynomial algorithm to solve a system of Diophantine equations, $LaTeX: \textstyle Ax=b \mbox{ for } x \in \mathbb{Z}^n$ (integer n-vectors). However, we can certify any trial $LaTeX: x$ in polynomial time, just by checking that it is in $LaTeX: \textstyle \mathbb{Z}^n,$ then multiplying by $LaTeX: A$ to compare with $LaTeX: b.$ A problem, $LaTeX: p,$ is NP-complete if it is in NP and, for any problem in NP, there exists a polynomial time algorithm to reduce it to $LaTeX: p.$ A fundamental member of the NP-complete class is the satisfiability problem. It is an open question whether $LaTeX: P=\mbox{NP},$ and most regard the NP-complete problems as having exponential time complexity. Further details are in the supplement, Introduction to Computational Complexity.

# NP-hard

An optimization problem that relies upon the solution of an NP-complete problem. In that sense, NP-hard problems are at least as hard as NP-complete problems.

# Near optimal

A point $LaTeX: x$ that is within a small value, say $LaTeX: e > 0,$ of optimality in some sense. The following are the common measures of nearness:

• in value: $LaTeX: x$ feasible and $LaTeX: f(x) \ge f(x^*) - e.$
• in policy: $LaTeX: \textstyle ||x - x^*|| \le e.$
• in resource level: $LaTeX: x \in \arg\max \left \{f(y): y \in X, g(y) \le b, h(y) = c \right \},$ where $LaTeX: \textstyle ||(b, c)|| \le e.$

# Nearest neighbor algorithm

A type of greedy algorithm for combinatorial programs where there is a measure of nearness between neighbors. An example is the traveling salesman problem, where the procedure is to choose the next city to be one that is nearest the current city in the sequence.
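
A minimal sketch for the traveling salesman example (the distance matrix is made up):

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy tour: repeatedly visit the nearest unvisited city."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        here = tour[-1]
        tour.append(min(unvisited, key=lambda c: dist[here][c]))
        unvisited.remove(tour[-1])
    return tour

D = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
print(nearest_neighbor_tour(D))   # -> [0, 1, 3, 2]
```

As with any greedy heuristic, the tour produced need not be optimal.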

# Negative definite matrix

$LaTeX: x^TAx < 0$ for all nonzero $LaTeX: x.$

# Negative semi-definite matrix

$LaTeX: x^TAx \le 0$ for all $LaTeX: x.$

# Neighborhood

For a normed space, the neighborhood of a point, $LaTeX: x,$ is the open ball centered at $LaTeX: \textstyle x: \left \{y: ||y-x|| < e \right \},$ where $LaTeX: e > 0.$ The closed ball, with $LaTeX: \textstyle ||y-x|| \le e,$ is also a neighborhood. The usual notation (in this glossary) is $LaTeX: N_e(x),$ and context dictates whether it is open or closed. This extends to the neighborhood of a set $LaTeX: S$ by taking the union; equivalently, $LaTeX: \textstyle N_e[S] = \left \{y: ||y-x|| \le e \mbox{ for some } x \in S \right \}.$

In integer programming, the neighborhood could mean integer-valued neighbors of the form $LaTeX: \textstyle |x-x^*| \le 1.$ In a combinatorial program, where variables are binary, integer-valued neighbors comprise all members of $LaTeX: \textstyle \{0,1\}^n.$ In this case the neighborhood is defined relative to some subset of binary variables reachable by some operation that depends on the problem. This is what is generally meant by a neighborhood in heuristic search, and in simulated annealing or tabu search in particular, where a move is defined in terms of neighbors. In that context, a neighborhood could be as simple as complementing the value of one variable, as in deletions or additions in a knapsack problem, or it could be more complex, like an n-Opt neighbor of a TSP tour.

# Nelder-Mead simplex method

This is a heuristic search (not guaranteed to find a solution) for unconstrained optimization. Let $LaTeX: \textstyle \left \{x^0, x^1, \dots, x^n \right \}$ be $LaTeX: n+1$ points in $LaTeX: \mathbb{R}^n$ that define a simplex. Define the extreme values: $LaTeX: \textstyle f(x^u) = \max \left \{f(x^0), \dots, f(x^n) \right \}$ and $LaTeX: \textstyle f(x^l) = \min \left \{f(x^0), \dots, f(x^n) \right \}.$ Seeking a maximum of $LaTeX: f$ on $LaTeX: \mathbb{R}^n,$ the idea is to replace $LaTeX: x^l$ with a point having a better objective value.

Here is an iteration:
1. Define the centroid of the simplex without this point of least value: $LaTeX: \textstyle c = \sum_i \left \{x^i: i \ne l \right \}/n.$
2. reflection: compute $LaTeX: \textstyle r = c + a(c - x^l),$ where $LaTeX: a > 0$ ($LaTeX: a$ is called the "reflection coefficient").
3. expansion: If $LaTeX: \textstyle f(r) > f(x^u)$ (i.e., we have a point better than the best vertex), compute $LaTeX: \textstyle x = c + b(c - x^l),$ where $LaTeX: b > 1$ ($LaTeX: b$ is called the "expansion coefficient"). If $LaTeX: \textstyle f(x) > f(r), x$ replaces $LaTeX: x^l,$ and the iteration is complete. Otherwise, $LaTeX: r$ replaces $LaTeX: x^l,$ and the iteration is complete.
4. At this step, the reflection point $LaTeX: (r)$ is not better than the best of the simplex vertices (i.e., $LaTeX: \textstyle f(r) \le f(x^u)$). This is divided into the following cases.
   1. If $LaTeX: \textstyle \min \left \{f(x^i): i \ne l \right \} \le f(r) \le f(x^u),$ replace $LaTeX: x^l$ with $LaTeX: r.$
   2. If $LaTeX: \textstyle \min \left \{f(x^i): i \ne l \right \} > f(r),$ define $LaTeX: x^*$ as the better of the two: $LaTeX: \textstyle f(x^*) = \max \left \{f(x^l), f(r) \right \}$ (with $LaTeX: \textstyle x^* = x^l \mbox{ or } r,$ resp.). Then, take a contraction step: $LaTeX: \textstyle x = c + d(x^* - c),$ where $LaTeX: 0 < d < 1$ ($LaTeX: d$ is called the "contraction coefficient"). If $LaTeX: \textstyle f(x) > f(x^*), x$ replaces $LaTeX: x^l,$ and the iteration is complete. Otherwise, the last resort is to replace all $LaTeX: x^i$ with: $LaTeX: \textstyle x'^i = (x^i + x^u)/2.$
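
The iteration can be sketched in a few dozen lines; this is an illustrative maximization version with common coefficient choices (a = 1, b = 2, d = 0.5) and a made-up quadratic test function, not a production implementation:

```python
def nelder_mead_max(f, simplex, a=1.0, b=2.0, d=0.5, iters=200):
    """Maximize f over R^n by the iteration above (a = reflection,
    b = expansion, d = contraction coefficient)."""
    pts = [list(p) for p in simplex]
    n = len(pts) - 1
    for _ in range(iters):
        vals = [f(p) for p in pts]
        l = vals.index(min(vals))                 # worst vertex x^l
        u = vals.index(max(vals))                 # best vertex x^u
        c = [sum(p[k] for i, p in enumerate(pts) if i != l) / n
             for k in range(n)]                   # centroid without x^l
        r = [c[k] + a * (c[k] - pts[l][k]) for k in range(n)]   # reflection
        second_worst = min(v for i, v in enumerate(vals) if i != l)
        if f(r) > vals[u]:                        # expansion
            e = [c[k] + b * (c[k] - pts[l][k]) for k in range(n)]
            pts[l] = e if f(e) > f(r) else r
        elif f(r) >= second_worst:                # accept the reflection
            pts[l] = r
        else:                                     # contract toward the better of x^l, r
            xs = pts[l] if vals[l] >= f(r) else r
            x = [c[k] + d * (xs[k] - c[k]) for k in range(n)]
            if f(x) > f(xs):
                pts[l] = x
            else:                                 # last resort: shrink toward x^u
                pts = [[(p[k] + pts[u][k]) / 2 for k in range(n)] for p in pts]
    vals = [f(p) for p in pts]
    return pts[vals.index(max(vals))]

best = nelder_mead_max(lambda p: -(p[0] - 1) ** 2 - (p[1] + 2) ** 2,
                       [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
# approaches the maximizer (1, -2)
```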

# Network

A collection of nodes, V, sometimes called vertices, plus a collection of arcs, A, which are directed from one node to another. The two sets form a network, denoted $LaTeX: N=[V,A].$ As such, it can be considered a directed graph (see other terms, like special graphs).

Here are associated functions and data values:
• tail of k-th arc $LaTeX: (i, j)$ is node $LaTeX: i;$ we sometimes write $LaTeX: T(k).$
• head of k-th arc $LaTeX: (i, j)$ is node $LaTeX: j;$ we sometimes write $LaTeX: H(k).$
• in-degree of node $LaTeX: i$ is $LaTeX: \textstyle |\left\{k: H(k)=i\right\}|.$
• out-degree of node $LaTeX: i$ is $LaTeX: \textstyle |\left\{k: T(k)=i\right\}|.$
• arc capacity limits the total flow across the arc at any one time.
• node capacity limits the total flow through a node at any one time.
• supply or demand at a node provides external input or an output requirement.

# Network flows

This is an assignment of arc values, called flows, say $LaTeX: f(k)$ for the k-th arc, that satisfy two types of constraints: (1) arc bounds, $LaTeX: \textstyle L \le f \le U,$ and (2) node balances, $LaTeX: \textstyle \mbox{Flow out of node } i - \mbox{ Flow into node } i = b(i).$ The flow out of node $LaTeX: i$ can be expressed as $LaTeX: \textstyle \sum_k \left \{f(k): T(k)=i \right \},$ and the flow into node $LaTeX: i$ can be expressed as $LaTeX: \textstyle \sum_k \left \{f(k): H(k)=i \right \}.$

If $LaTeX: b(i) > 0, b(i)$ is a supply at node $LaTeX: i$ (called a supply node); if $LaTeX: b(i) < 0, -b(i)$ is a demand at node $LaTeX: i$ (called a demand node). If $LaTeX: b(i)=0,$ node $LaTeX: i$ is simply a transshipment node, and the balance equation says that the flow into node $LaTeX: i$ must equal the flow out of node $LaTeX: i.$ Another way to express the node flow balance equations is with the node-arc incidence matrix: $LaTeX: Mf=b.$

Still another representation is to define flow variables, $LaTeX: x(i, j)$ on $LaTeX: \textstyle V \times V,$ which describes how much flow goes from node $LaTeX: i$ to node $LaTeX: j.$ This can be used as long as there are no parallel arcs - i.e., no two arcs have both the same tail and the same head. (In some applications, parallel arcs are needed, such as to increase capacity across a pair of arcs with an increased cost.) In this form, the capacity constraints are still of the form $LaTeX: \textstyle L \le x \le U,$ but the node equations have a different form:

$LaTeX: \sum_j \left \{x(i, j): (i, j) \in A \right \} - \sum_j \left \{x(j, i): (j, i) \in A \right \} = b(i) \mbox{ for all } i \in V.$

The (linear) min cost network flow problem is to minimize total cost, $LaTeX: \textstyle \sum_{ij} \left \{c(i,j)x(i,j): (i,j) \in A \right \},$ where $LaTeX: c$ is a unit cost of flow, subject to the flow bounds and balance equations.
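
The node-arc incidence form Mf = b is easy to demonstrate on a small made-up network:

```python
def incidence_matrix(n_nodes, arcs):
    """Node-arc incidence: +1 at the tail, -1 at the head of each arc,
    so (M f)_i = flow out of node i minus flow into node i."""
    M = [[0] * len(arcs) for _ in range(n_nodes)]
    for k, (tail, head) in enumerate(arcs):
        M[tail][k] = 1
        M[head][k] = -1
    return M

arcs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
M = incidence_matrix(4, arcs)
f = [2, 1, 1, 1, 2]                    # a flow of 3 units from node 0 to node 3
b = [sum(M[i][k] * f[k] for k in range(len(arcs))) for i in range(4)]
print(b)   # -> [3, 0, 0, -3]: node 0 supplies 3 units, node 3 demands 3
```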

# Neural network

(also called artificial neural network, abbr. ANN). A network where the nodes correspond to neurons and the arcs correspond to synaptic connections in the biological metaphor. The following properties are included:

• neural state. Each node has a state variable, say $LaTeX: x.$ In the brain, this could be the potassium level; in computing applications, it could be anything the modeler chooses.
• arc weight. Each arc has a weight that affects the state of its neighboring nodes when firing. If the weight is positive, it is said to be excitatory; if it is negative, it is inhibitory.
• state equations. The neural states change by some differential (or difference) equation, say $LaTeX: \textstyle x' = F(x,w,t).$ Typically (but not necessarily), $LaTeX: -F$ is the gradient of an energy function (in keeping with the biological metaphor), say $LaTeX: \textstyle F(x,w,t) = -\nabla_x E(x,w,t),$ so that $LaTeX: x(t)$ follows a path of steepest descent towards a minimum energy state.
• learning mechanism. This could be equations to change the weights: $LaTeX: \textstyle w' = L(x,w,t).$ Various learning mechanisms are represented this way, including a form of supervised learning that uses a training set to provide feedback on errors. Other elements can be learned besides the arc weights, including the topology of the network.

# Newsboy problem

A newspaper is concerned with controlling the number of papers to be distributed to newsstands. The cost of a paper varies (i.e., Sunday vs. daily), and the demand is a random variable, $LaTeX: q,$ with probability function $LaTeX: P(q)$ = probability that demand equals $LaTeX: q.$ Unsold papers are returned, with no salvage value the next day. It is possible, however, for a newsstand to order more papers the same day. There are holding and shortage (penalty) costs. The problem is to determine a reorder policy so as to minimize total expected cost. This problem was used to consider a reorder policy with a 2-parameter decision rule:

1. $LaTeX: s$ = inventory level at which an order is placed;
2. $LaTeX: S$ = inventory level to which to order.

Then, the decision rule to be employed is the following policy:

Order nothing if the inventory of papers is $LaTeX: \ge s;$
Order up to $LaTeX: S$ (i.e., order $LaTeX: S - x$) if the inventory on hand is $LaTeX: x < s.$

The significance of this problem is that it was used to introduce the notion (and optimality) of an $LaTeX: (s, S)$ policy in inventory theory.
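
A simplified one-period version of the trade-off can be computed directly; in this sketch the demand distribution and cost rates are made up, and only the stock level S is chosen (the full (s, S) analysis is multi-period):

```python
def expected_cost(S, demand_pmf, c_hold, c_short):
    """One-period expected holding + shortage cost when stocking S papers."""
    return sum(p * (c_hold * max(S - q, 0) + c_short * max(q - S, 0))
               for q, p in demand_pmf.items())

# a hypothetical demand distribution P(q)
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}
best_S = min(range(0, 5),
             key=lambda S: expected_cost(S, pmf, c_hold=1.0, c_short=3.0))
print(best_S)   # -> 3
```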

# Newton method

(also called Newton-Raphson method). This is the iterative scheme to find a zero of a function:

$LaTeX: x' = x - J(x)^{-1} F(x),$

where $LaTeX: \textstyle F:\mathbb{R}^n \to \mathbb{R}^n$ $LaTeX: \textstyle (F \in C^1)$ and $LaTeX: J(x)$ is the Jacobian of $LaTeX: F$ (assumed nonsingular). In mathematical programming this is used to find a stationary point, where $LaTeX: F=\nabla f$ and $LaTeX: J=H_f.$ Lacking global convergence, this leads to the modified Newton method, sometimes called the damped Newton method. (See the associated myth, [[./myths/myth-NLP.html#13|Myth NLP-13]] to avoid misconception.)
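
A sketch of the iteration for a small system (the function and starting point are illustrative; the 2x2 Jacobian is inverted in closed form):

```python
def newton_2d(F, J, x, iters=20):
    """Newton-Raphson for F: R^2 -> R^2 with analytic Jacobian J."""
    for _ in range(iters):
        f1, f2 = F(x)
        (a, b), (c, d) = J(x)
        det = a * d - b * c              # invert the 2x2 Jacobian directly
        dx = (d * f1 - b * f2) / det
        dy = (a * f2 - c * f1) / det
        x = (x[0] - dx, x[1] - dy)       # x' = x - J(x)^{-1} F(x)
    return x

F = lambda p: (p[0] ** 2 + p[1] ** 2 - 4.0, p[0] - p[1])
J = lambda p: ((2 * p[0], 2 * p[1]), (1.0, -1.0))
root = newton_2d(F, J, (1.0, 2.0))
# converges quadratically to (sqrt(2), sqrt(2))
```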

# No-free-lunch theorem

In heuristic search, this is the proposition that all methods perform the same when averaged over all possible objective functions. The idea is that a particular search algorithm, like simulated annealing, may be designed to perform especially well for some functions, compared to a genetic algorithm, but when applied to a representative sample of all possible costs, they will perform exactly the same, on the average. This implies that to do especially well on some problems, the search method must do worse on others; hence, the "no-free-lunch" description.

# No-good

A partial assignment of variables to values that does not lead to a solution. In other words, there does not exist a solution to the overall problem that satisfies all the assignments in the no-good.

In a backtracking search to find a solution, each dead-end corresponds to a no-good. However, no-goods become useful when they can be learned and added to the constraint problem as implied constraints that remove many dead-ends that would otherwise have to be searched over.

In particular, no-good learning and reasoning are very important for modern techniques to solve SAT problems.

# Node consistency

A simple consistency property concerning unary constraints: a variable is node consistent if all values within its domain are consistent with all unary constraints on the variable.

# Node packing problem

See Packing problem.

# Nonbasic

A variable or column that is not basic.

# Nondegeneracy

A nondegenerate solution is one that is not degenerate. A nondegeneracy assumption is one that ensures every optimal solution is not degenerate. In general, a nondegenerate solution is strictly complementary. In linear programming, nondegeneracy is equivalent to uniqueness in both the primal and the dual.

# Nonlinear program

At least one of the functions (objective or constraint) is not affine on $LaTeX: X.$ Usually, it is the rule of the function that classifies it as nonlinear. In particular, a linear integer program is not generally considered to be a nonlinear program (NLP).

# Norm

This is a function of a vector, say $LaTeX: ||x||,$ that satisfies three properties:

1. Homogeneous: $LaTeX: \textstyle ||tx|| = |t|\, ||x||$ for all scalars $LaTeX: t.$
2. Positive: $LaTeX: \textstyle ||x|| > 0 \mbox{ for } x \ne 0.$ (Note: $LaTeX: ||0|| = 0$ by homogeneity, so $LaTeX: 0$ is the unique vector with zero norm.)
3. Subadditive: $LaTeX: \textstyle ||x + y|| \le ||x|| + ||y||.$

Norms that arise frequently in mathematical programming are:

Euclidean norm (on $LaTeX: \mathbb{R}^n$): $LaTeX: \textstyle ||x|| = \sqrt{\sum_j x_j^2}$

$LaTeX: L_\infty$ (on $LaTeX: \mathbb{R}^n$): $LaTeX: \textstyle ||x|| = \max_j\{|x_j|\} \; (= \lim L_p \mbox{ as } p \to \infty)$

$LaTeX: L_p$ (on $LaTeX: \mathbb{R}^n,$ for $LaTeX: p \ge 1$): $LaTeX: \textstyle ||x|| = \left[\sum_j |x_j|^p\right]^{1/p}$

Matrix norm (induced by vector norm): $LaTeX: \textstyle ||A|| = \max \left \{||Ax||: ||x||=1 \right \}$

Supremum norm (on function space): $LaTeX: \textstyle ||F|| = \sup \left \{|F(x)|: x \in X \right \}$

# Normal cone

For a convex set $LaTeX: S,$ the normal cone to $LaTeX: S$ at $LaTeX: x,$ for $LaTeX: \textstyle x \in S,$ is $LaTeX: \textstyle \left \{y : \langle y, v - x \rangle \le 0 \mbox{ for all } v \in S \right \},$ where $LaTeX: \langle \cdot, \cdot \rangle$ is an inner product.

# Northwest corner rule

This constructs a feasible shipment for the transportation problem, from the cost table ($LaTeX: c_{ij}$ = unit cost from source $LaTeX: i$ to destination $LaTeX: j$), as follows. Start at the NW corner (source 1, destination 1), and allocate the most possible: the min of the supply and demand. If supply exceeds the demand, proceed to the next destination, and continue until all of supply 1 is allocated. Then, go to source 2 and repeat the allocation process, starting with the first (lowest index) destination whose demand has not been fulfilled. If demand exceeds the supply, proceed to the next source, and continue until all of demand 1 is allocated. Then, go to destination 2 and repeat the allocation process. Eventually, all demand must be fulfilled if the problem is feasible (i.e., total supply $LaTeX: \ge$ total demand).
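
A sketch of the rule (the supply and demand vectors are made up):

```python
def northwest_corner(supply, demand):
    """Feasible shipment by the northwest corner rule (x[i][j] = amount from i to j)."""
    s, d = list(supply), list(demand)
    x = [[0] * len(d) for _ in s]
    i = j = 0
    while i < len(s) and j < len(d):
        q = min(s[i], d[j])        # allocate the most possible at the current cell
        x[i][j] = q
        s[i] -= q
        d[j] -= q
        if s[i] == 0 and i < len(s) - 1:
            i += 1                 # supply i exhausted: move to the next source
        else:
            j += 1                 # demand j fulfilled: move to the next destination
    return x

print(northwest_corner([20, 30, 25], [10, 25, 20, 20]))
# -> [[10, 10, 0, 0], [0, 15, 15, 0], [0, 0, 5, 20]]
```

Note that the cost table plays no role; the rule only constructs a feasible starting shipment.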

# Null space

Null space of matrix $LaTeX: A.$ $LaTeX: \textstyle \left \{x: Ax=0 \right \}.$ It is the orthogonal complement of its row space, $LaTeX: \textstyle \left \{x: x=yA \mbox{ for some } y \in \mathbb{R}^m \right \}.$ Its dimension is $LaTeX: n - \mbox{rank}(A),$ which complements the subspace spanned by its row space.

# Numeric CSP

A constraint satisfaction problem is numeric if the constraints are numeric relations (e.g., log, sin, power, etc.) and the decision variables' domains are continuous.

# Objective function

The (real-valued) function to be optimized. In a mathematical program in standard form, this is denoted $LaTeX: f.$ Also see multiple objectives.

# Online problem

A decision problem that is ongoing, so its data values are frequently changing with uncertainty. Here are some examples:

1. Allocation of bandwidth during video conferencing on a telecommunications network;
2. Investment decisions in a portfolio, selling and buying as the market changes;
3. Scheduling jobs as they arrive;
4. Load forecasting and dispatching for electric power.

In general, an online decision problem could be the allocation of resources that is done in real time with changes in the amount of resources, their costs and the demands for what they produce. A problem that is not online is sometimes called an offline problem.

# Optimal

For a mathematical program in standard form, $LaTeX: \textstyle x^* \in X$ (the domain) is an optimal solution if it is a maximum (or a minimum):

1. $LaTeX: x^*$ is feasible;
2. $LaTeX: f(x^*) \ge f(x)$ for all feasible $LaTeX: x$ (maximum value).

Some authors refer to an optimal solution when they mean a local optimum; others mean a member of the optimality region (which are global optima). In either case, the optimal value is the objective value, evaluated at an optimal solution. A solution is nearly optimal if it is feasible, but the optimality condition (2) is replaced by

$LaTeX: f(x^*) \ge f(x) - t \mbox{ for all feasible } x,$

where $LaTeX: t > 0.$ ($LaTeX: t = 0$ corresponds to being optimal). Typically, $LaTeX: t$ is specified as a small fraction, such as a cut-off tolerance for an algorithm to terminate finitely.

# Optimal partition

Consider a primal-dual pair of linear programs:

$LaTeX: \min \left \{cx: x \ge 0, Ax \ge b \right \} \mbox{ and } \max \left \{yb: y \ge 0, yA \le c \right \}.$

Define primal surplus variables, $LaTeX: s = Ax - b,$ and dual slack variables, $LaTeX: d = c - yA.$ For any non-negative vector, $LaTeX: v,$ define its support set as those indexes for which the coordinate value is positive: $LaTeX: \textstyle I(v)= \left \{j: v_j > 0 \right \}.$ In an optimal solution, complementary slackness must hold, so that $LaTeX: \textstyle j \in I(x)$ implies $LaTeX: j \notin I(d),$ and $LaTeX: \textstyle i \in I(s)$ implies $LaTeX: i \notin I(y).$ Thus, the intersections, $LaTeX: \textstyle I(x) \cap I(d)$ and $LaTeX: \textstyle I(s) \cap I(y),$ are empty. If the solution is strictly complementary, these support sets yield partitions because then $LaTeX: \textstyle I(x) \cup I(d) = \left \{1, \dots, n \right \}$ and $LaTeX: \textstyle I(s) \cup I(y) = \left \{1, \dots, m \right \}.$ A strictly complementary solution is an interior solution (and conversely), so each interior solution yields a partition of these index sets.

A key fact is that every LP with an optimal solution must have an optimal interior solution. Further, every optimal interior solution yields the same partition, so it is called the optimal partition.

# Optimal response function

The optimal value of the objective as a function of parameters, typically the right-hand side:

$LaTeX: f^*(b, c) = \sup \left \{f(x): x \in X, g(x) \le b, h(x) = c\right\}.$

# Optimality gap

Generally the difference between a best known solution, e.g. the incumbent solution in mixed integer programming, and a value that bounds the best possible solution. One such measure is the duality gap. The term is often qualified as the absolute gap, which is the magnitude of the difference between the best known solution and the best bound, or as the relative gap, which is the absolute gap divided by the best bound.

# Optimality region

Set of optimal points.

# Optimization Software Library

(OSL). A library of optimization routines, callable from Fortran or C, produced by IBM.

# Optimum

(pl. optima). Minimum or maximum.

# Order of convergence

See convergence.

# Orthogonal complement

Orthogonal complement of a subspace, $LaTeX: S.$ $LaTeX: \textstyle \left \{y: yx = 0 \mbox{ for all } x \in S \right \}.$ In particular, if $LaTeX: \textstyle S = \left \{x: x = Av \mbox{ for some } v \in \mathbb{R}^n \right \},$ where $LaTeX: A$ is an $LaTeX: \textstyle m \times n$ matrix, its orthogonal complement is $LaTeX: \textstyle \left \{y: yA = 0 \right \}.$

# Orthogonal matrix

A matrix, $LaTeX: A,$ such that $LaTeX: \textstyle AA^T=I.$

# Orthogonal vectors

Two vectors in $LaTeX: \textstyle \mathbb{R}^n$ whose inner product is zero.

# Out-of-kilter algorithm

Applies to network flows, where the balance equations hold every iteration, but bounds and duality (optimality) conditions could be violated. The dual prices, often called potentials in this context, are modified along with flows so as to move closer to both primal and dual feasibility.

# Outer approximation

This solves a sequence of approximations to a mathematical program where the approximating problem contains the original feasible region. Examples are cutting plane algorithms and relaxation.

# Over-constrained problem

A term generally meaning that a mathematical program has too many constraints in that either the feasible region can be described with a subset of the constraints or that it is empty.

# Over-optimize

A term that means the model is too optimistic in expecting an optimal solution to produce the benefits in its objective. For example, a time-staged model generally presumes perfect information over the planning horizon, and can be "over-optimizing" in its allocation of resources.

# P-matrix

A square matrix with all of its principal minors positive. This includes all symmetric, positive definite matrices. Here is an example of a P-matrix that is not positive definite:

$LaTeX: A = \begin{bmatrix} 1 & -3 \\ 0 & 1 \end{bmatrix}$

The principal minors are positive, but $LaTeX: \textstyle (1, 1)A(1, 1)^t < 0.$ The significance of this class is in the theorem:

The linear complementarity problem, defined by $LaTeX: (M, q),$ has a unique solution for each q in Rn if, and only if, $LaTeX: M$ is a P-matrix.
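The example above can be verified numerically. This sketch (assuming NumPy is available) enumerates every principal minor, then evaluates the quadratic form at $LaTeX: (1,1)$:

```python
from itertools import combinations
import numpy as np

def is_P_matrix(A):
    """Check that every principal minor of the square matrix A is positive."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    for k in range(1, n + 1):
        for idx in combinations(range(n), k):
            sub = A[np.ix_(idx, idx)]     # principal submatrix on rows/cols idx
            if np.linalg.det(sub) <= 0:
                return False
    return True

A = np.array([[1.0, -3.0], [0.0, 1.0]])   # the example from the entry
x = np.array([1.0, 1.0])
print(is_P_matrix(A))       # all principal minors are positive
print(float(x @ A @ x))     # yet the quadratic form at (1,1) is negative
```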

# Packing problem

The set packing problem is the question, "Does a collection of sets contain at least $LaTeX: K$ mutually disjoint subsets?" ($LaTeX: K$ is a positive integer not greater than the number of sets given.) This is solvable in polynomial time if no set has more than 2 members. Otherwise, it has the NP-hard equivalent integer program: $LaTeX: \textstyle \max \left \{ \sum_j x_j: Ax \le 1, x \in \left \{0, 1 \right \}^n\right\},$ where $LaTeX: x_j = 1$ if element $LaTeX: j$ is selected; else, $LaTeX: x_j = 0.$ The matrix $LaTeX: A$ has 0's and 1's, where the i-th row corresponds to the i-th set to be packed: $LaTeX: A_{(i, j)}=1$ means element j is in set i; else, $LaTeX: A_{(i, j)}=0.$ The constraint $LaTeX: Ax \le 1$ means that at most one element of each set can be selected. If the IP maximum is less than $LaTeX: K-1,$ or if it equals $LaTeX: K-1 =$ number of elements, the answer to the set packing problem is "No". Otherwise, let $LaTeX: \textstyle x_1, \dots, x_{(K-1)}$ be among those elements selected (re-index, if necessary). Let $LaTeX: S_i$ be all sets containing $LaTeX: x_i.$ Due to the packing constraints, no selected element can be in two sets, so these are disjoint. The remaining set, $LaTeX: S_K,$ is composed of all other elements $LaTeX: \textstyle (x_K, \dots, x_n),$ and must be disjoint from the other sets, so the answer to the set packing question is "Yes" (and we can construct the disjoint subsets).

A special case is the node packing problem on a graph: find a set of nodes of maximum cardinality such that no two of them are incident with the same edge (or arc). In that case, every row of $LaTeX: A$ has exactly two 1's, and the constraints are sometimes called [[Degree-2_inequality|degree-2 inequalities]]. Another special case is the bin packing problem.

The IP formulation has the natural extension: $LaTeX: \textstyle \max \left \{ cx: Ax \le b, x \in \left \{0,1 \right \}^n \right \},$ where $LaTeX: \textstyle c \ge 0$ and $LaTeX: \textstyle b \ge 1.$ This allows up to $LaTeX: b_i$ elements to be selected from the i-th set, and elements can be weighted by value $LaTeX: (c).$
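The IP formulation can be checked by brute force on a tiny instance; the data below are hypothetical, and the enumeration is only a sketch (exact methods are needed for real instances):

```python
from itertools import product

def max_packing(A):
    """Brute-force the set packing IP: max sum(x) s.t. A x <= 1, x binary.

    A is a 0-1 matrix given as a list of rows; row i is the incidence
    vector of set i over the elements j.
    """
    m, n = len(A), len(A[0])
    best = 0
    for x in product((0, 1), repeat=n):
        if all(sum(A[i][j] * x[j] for j in range(n)) <= 1 for i in range(m)):
            best = max(best, sum(x))
    return best

# Two sets over four elements: {0,1} and {1,2} (columns are elements 0..3).
A = [[1, 1, 0, 0],
     [0, 1, 1, 0]]
print(max_packing(A))  # 3: pick elements 0, 2, 3 (element 1 would overfill a set)
```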

# Parallel algorithm

This is an algorithm that executes more than one instruction at a time by having a computing environment with more than one processor. The types of parallel computing environments are:

 MIMD: Multiple Instruction/Multiple Data (includes massively parallel); MISD: Multiple Instruction/Single Data (unusual); SIMD: Single Instruction/Multiple Data (e.g., array processor).

# Parallel tangents

(PARTAN). An algorithm developed from the zigzag phenomenon observed using Cauchy's steepest descent. It takes two gradient steps, then performs a line search on the line through the first and last points $LaTeX: \textstyle (x^{k+2} - x^k).$ (For a quadratic objective, PARTAN is equivalent to the conjugate gradient method.)

# Parameter

A constant in a mathematical program, not subject to choice in the decision problem, but one that could vary outside the control of the decisions. Examples are supplies, demands, loss factors, exponents and coefficients in polynomial functions (of the decision variables). Not all coefficients are parameters, as many are zero by the logic of the model. For example, the only data for a standard transportation problem are the costs, supplies and demands. These can depend upon parameters, but the LP matrix does not – it is the incidence matrix of the network. In general, parameters are data-dependent constants, rather than logically fixed for all instances of the model. Some parameters are simply units of measurement, such as the amount of energy (Btu) in a ton of coal, whereas some parameters are uncertain, like demand for a product.

# Parametric analysis

The sensitivity analysis of parameters, which can be vectors in any designated region.

# Parametric programming

Solving a family of mathematical programs over a parameter space. For example, consider the family of linear programs:

$LaTeX: \min \left \{ cx: Ax = b + td, x \ge 0 \right \},$

where t is a (scalar) parameter and d is a specified direction vector. Starting with $LaTeX: t=0,$ the LP is solved, then an optimal basis is found (if possible) that remains optimal as $LaTeX: t$ is increased. The max value of $LaTeX: t$ is determined; if this is finite, the basis changes to a new optimal basis (for the max $LaTeX: t$ value) such that $LaTeX: t$ can be further increased, if possible. This is continued until either $LaTeX: t$ cannot be increased any further or a basis is found that remains optimal on the interval, $LaTeX: \textstyle [t_k, \infty),$ where $LaTeX: \textstyle 0 < t_1 < t_2 < \dots < t_k$ are the break points of the optimal objective value as a function of the parameter.

# Partial conjugate gradient method

The conjugate gradient method is performed for some number of iterations, say $LaTeX: k (< n),$ then it is restarted. (If $LaTeX: k=0,$ this is the special case of Cauchy's steepest descent.)

# Partial quasi-Newton method

A quasi-Newton method is performed for some number of iterations, say $LaTeX: k (< n),$ then it is restarted.

# Partial solution

Some of the variables are specified. This arises in a branch and bound algorithm, where some variables have been fixed by the branching process. It also arises in bidding algorithms.

# Partially ordered set

(or poset). A set plus a binary relation on that set that is reflexive, antisymmetric and transitive. This arises in the presence of precedence constraints, and other relations that arise in combinatorial programs.

# Particle Swarm Optimization

(PSO). This is an evolutionary algorithm based on swarm behavior of animals, like bird flocking. A move from a state is influenced by directions of states in its neighborhood. A consensus function is used to average neighbors' best fitness values, and this is combined with the original state's fitness to obtain a new position for the state. It applies to continuous variables, using a velocity term to derive state increments. The fundamental equation that governs state evolution is

$LaTeX: v_p = wv_p + A r (x^*_p - x_p) + B r' (g^* - x_p)$

where p is a particle, or state, and

 $LaTeX: v_p =$ velocity vector of particle $LaTeX: p$
 $LaTeX: x_p =$ position vector of particle $LaTeX: p$
 $LaTeX: x^*_p =$ previous position of particle $LaTeX: p$ giving best fitness value
 $LaTeX: g^* =$ position of globally best fitness value
 $LaTeX: w =$ parameter, called "inertia weight"
 $LaTeX: r, r' =$ pseudo-random numbers in $LaTeX: [0,1]$
 $LaTeX: A, B =$ positive parameters, generally in $LaTeX: [1,2]$
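The governing equation can be coded directly. The sketch below applies one velocity/position update to a single particle, coordinatewise; the particle data are hypothetical:

```python
import random

def pso_step(x, v, x_best, g_best, w=0.7, A=1.5, B=1.5):
    """One update for a single particle, following the equation above:
    v <- w v + A r (x* - x) + B r' (g* - x), then x <- x + v."""
    new_x, new_v = [], []
    for xi, vi, pi, gi in zip(x, v, x_best, g_best):
        r, r2 = random.random(), random.random()
        vi = w * vi + A * r * (pi - xi) + B * r2 * (gi - xi)
        new_v.append(vi)
        new_x.append(xi + vi)
    return new_x, new_v

random.seed(0)
x, v = [0.0, 0.0], [0.0, 0.0]
x_best, g_best = [1.0, 1.0], [2.0, 2.0]   # hypothetical best positions
x, v = pso_step(x, v, x_best, g_best)
print(x)  # the particle moves toward its personal and global bests
```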

# Partitioning problem

This is the combination of packing and covering. Its IP form is $LaTeX: \textstyle \mbox{Opt}\left \{cx: Ax=b, x \in \left \{0,1 \right \}^n \right \},$ where Opt could be Min or Max, and $LaTeX: A$ is a 0-1 matrix. The equation $LaTeX: \textstyle A(i,.) x = b_i$ means that exactly $LaTeX: b_i$ elements must be selected from set $LaTeX: i.$ In particular, a multiple choice constraint is $LaTeX: \textstyle \sum_j x_j = 1.$

# Path

As a (differentiable) curve, see path following. As a finite sequence, see graph or network.

# Path consistency

A consistency property that is similar to arc consistency but that considers pairs of variables instead of just one. A pair of variables $LaTeX: (x,y)$ is path-consistent with another variable $LaTeX: z$ if for every consistent value pair for the binary constraint on $LaTeX: x$ and $LaTeX: y$ there exists a value for $LaTeX: z$ such that all binary constraints on $LaTeX: (x,z)$ and $LaTeX: (y,z)$ are satisfied.

# Path following

In this context a path is a piecewise differentiable curve in space. The idea is to follow such a path (if one exists) from some initial point to a solution. One example of a path following algorithm is the barrier penalty function, with the path created by the parameter $LaTeX: u$ in the solution, $LaTeX: x^*(u)$ in $LaTeX: \textstyle \argmax \left \{f(x) + uP(x): x \in X^0\right\},$ where $LaTeX: X^0$ is the strict interior of the feasible region. (This is called the central path, or the path of the analytic center.)

# Pattern search

This is a heuristic (in the sense of not guaranteeing convergence) for unconstrained optimization. It consists of two basic steps, where $LaTeX: x$ is a current point and $LaTeX: w$ is a vector of widths (both having been initialized).

1. Exploration move. Set $LaTeX: y=x$ and do for $LaTeX: \textstyle j=1,\dots,n:$
If $LaTeX: \textstyle f(y + w_j e_j) > f(y),$ replace $LaTeX: y$ with $LaTeX: y + w_j e_j.$
If $LaTeX: f(y - w_j e_j) > f(y),$ replace $LaTeX: y$ with $LaTeX: y - w_j e_j.$
If $LaTeX: x=y$ at the end of this, go perform a pattern move. Otherwise, set $LaTeX: v = y-x$ (the change) and $LaTeX: x = y$ (move to better point).
2. Pattern move. If $LaTeX: f(x + v) > f(x),$ replace $LaTeX: x$ with $LaTeX: x+v.$ Otherwise, reduce widths and return to exploration move.

The idea is to perform exploration moves as long as they are successful. Then, saving the last direction $LaTeX: (v)$ when it was successful, this is used to advance $LaTeX: x$ if it is an improvement, giving a new base point $LaTeX: (x)$ for the next series of exploration moves. Termination occurs when the widths become too small.
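The two steps can be sketched in Python for a maximization objective, as in the entry. The control flow is slightly simplified: widths shrink whenever an exploration move fails, and a pattern move is attempted after each successful exploration.

```python
def pattern_search(f, x, widths, shrink=0.5, tol=1e-6):
    """Pattern search (maximization). Terminates when all widths fall below tol."""
    w = list(widths)
    while max(w) > tol:
        # Exploration move: try +/- w[j] along each coordinate direction.
        y = list(x)
        for j in range(len(x)):
            for step in (w[j], -w[j]):
                trial = list(y)
                trial[j] += step
                if f(trial) > f(y):
                    y = trial
                    break
        if y == x:
            w = [shrink * wj for wj in w]   # no improvement: reduce widths
        else:
            v = [yj - xj for yj, xj in zip(y, x)]   # last successful direction
            x = y
            # Pattern move: advance along v if that improves the objective.
            trial = [xj + vj for xj, vj in zip(x, v)]
            if f(trial) > f(x):
                x = trial
    return x

# Maximize f(z) = -(z1 - 1)^2 - (z2 + 2)^2, whose maximum is at (1, -2).
f = lambda z: -(z[0] - 1.0) ** 2 - (z[1] + 2.0) ** 2
sol = pattern_search(f, [0.0, 0.0], [1.0, 1.0])
print(sol)  # converges to [1.0, -2.0]
```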

# Penalty function

The traditional concept is to augment the objective with a function and one positive constant, so that the original mathematical program is replaced by solving a parametric family of the form $LaTeX: \textstyle \max \left \{f(x) - uP(x): x \in X^0\right\}.$ The function, $LaTeX: P,$ is called a penalty function if it satisfies $LaTeX: P(x) > 0$ for $LaTeX: x$ not feasible and $LaTeX: P(x)=0$ if $LaTeX: x$ is feasible. The set $LaTeX: X^0$ depends on the type of penalty function, and there are two classical types, each requiring $LaTeX: P$ to be continuous: interior (or barrier) and exterior. A penalty function is exact if there exists a finite parameter value such that its maximum is a solution to the original mathematical program.

More generally, a penalty function could involve many parameters, such as the Lagrangian, or it could be stated in parameter free form. A general model is the generalized penalty-function/surrogate dual, see the supplement on duals. The notion of an exact penalty function leads to a strong dual.

# Perturbation

An imprecise term that means changing some parameter value (or function) and seeing what effect this has on the solution. This embodies continuity and differentiability properties of the optimal responses: objective value, policy set, and perhaps dual values too. This is what is sometimes called marginal analysis of solutions, and changes are limited to some neighborhood of the initial values. Some use perturbation function to mean the optimal objective value, even for large changes in the parameters, as in parametric programming.

# Phase I and Phase II

Phase I of a mathematical program is finding a feasible solution, and Phase II is entered with a feasible solution to find an optimal solution. The standard Phase I is:

$LaTeX: \min \sum_i u_i + \sum_i (v_i + w_i): u, v, w \ge 0,$

$LaTeX: g(x) - u \le 0, h(x) - v + w = 0.$

Then, for any $LaTeX: x,$ one could set $LaTeX: \textstyle u = g(x)^+, \; v = h(x)^+, \mbox{ and } w = -h(x)^-$ to have a feasible solution to the Phase I mathematical program. The optimal objective value is 0 if, and only if, $LaTeX: u=0$ and $LaTeX: v=w=0,$ in which case $LaTeX: x$ is feasible in the original mathematical program (to commence Phase II). The Phase I problem is sometimes called an elastic program (thinking of the artificial variables $LaTeX: (u,v,w)$ as providing "elastic" behavior in the levels of constraint violation).
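The construction of a feasible Phase I point from the positive and negative parts can be sketched as follows; the constraint values are hypothetical:

```python
def elastic_variables(g_vals, h_vals):
    """Given g(x) and h(x) at any point x, construct the artificial variables
    u = g(x)^+, v = h(x)^+, w = -h(x)^- that are feasible in the Phase I
    program above (g(x) - u <= 0 and h(x) - v + w = 0 hold by construction)."""
    u = [max(gi, 0.0) for gi in g_vals]
    v = [max(hi, 0.0) for hi in h_vals]
    w = [max(-hi, 0.0) for hi in h_vals]
    return u, v, w

# Hypothetical constraint values at some x: g(x) = (2, -1), h(x) = (3, -4).
g, h = [2.0, -1.0], [3.0, -4.0]
u, v, w = elastic_variables(g, h)
# The Phase I objective sum(u) + sum(v) + sum(w) is 0 only if x is feasible.
print(u, v, w)
```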

In linear programming, the standard Phase I & II are:

I. Min ev: $LaTeX: \textstyle x, v \ge 0, Ax + v = b,$ and
II. Max cx: $LaTeX: \textstyle x \ge 0, Ax = b, \mbox{ where } b \ge 0 \mbox{ (multiply by } -1 \mbox{ for } i: b_i < 0).$

# Pivot

This is the algebra associated with an iteration of Gauss-Jordan elimination, using the forward transformation. The tableaux for a pivot on element $LaTeX: a(p,q)$ $LaTeX: (\ne 0),$ which means nonbasic variable $LaTeX: x_q$ enters the basis in exchange for basic variable $LaTeX: x_p,$ are as follows:

Before pivot:

$LaTeX: \begin{array}{ll|ll} Basic & & \multicolumn{2}{c}{Nonbasic} \\ Var. & Level & x_j & x_q \\ \hline \hline x_i & b_i & a(i,j) & a(i,q) \\ x_p & b_p & a(p,j) & a(p,q)^* \\ \hline \hline obj & -z & d_j & d_q \\ \hline \hline \end{array}$

After pivot:

$LaTeX: \begin{array}{ll|lr} Basic & & \multicolumn{2}{c}{Nonbasic} \\ Var. & Level & x_j & x_q \\ \hline \hline x_i & b_i - b_p a(i,q)/a(p,q) & a(i,j) - a(i,q)a(p,j)/a(p,q) & -a(i,q)/a(p,q) \\ x_q & b_p/a(p,q) & a(p,j)/a(p,q) & 1/a(p,q) \\ \hline \hline obj & -z - b_p d_q/a(p,q) & d_j - a(p,j)d_q/a(p,q) & -d_q/a(p,q) \\ \hline \hline \end{array}$

A pivot is primal degenerate if the associated basic solution $LaTeX: (x)$ does not change (i.e., the nonbasic variable enters the basis, but its level remains at the same bound value, in which case no basic variable changes level). Similarly, the pivot is dual degenerate if the associated dual solution (i.e., pricing vector and reduced costs) does not change. For dealing with degenerate pivots, see Bland's rule and the TNP rule.
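The tableau update formulas above can be checked with a short routine. This sketch treats all entries uniformly (the level column and objective row follow the same pattern as the body of the tableau):

```python
def pivot(T, p, q):
    """Gauss-Jordan pivot on element T[p][q] of the tableau T (list of rows),
    using the before/after formulas in the tableaux above."""
    a_pq = T[p][q]
    if a_pq == 0:
        raise ValueError("pivot element must be nonzero")
    m, n = len(T), len(T[0])
    new = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            if i == p and j == q:
                new[i][j] = 1.0 / a_pq                      # pivot element
            elif i == p:
                new[i][j] = T[p][j] / a_pq                  # pivot row
            elif j == q:
                new[i][j] = -T[i][q] / a_pq                 # pivot column
            else:
                new[i][j] = T[i][j] - T[i][q] * T[p][j] / a_pq
    return new

T = [[2.0, 1.0],
     [3.0, 4.0]]
print(pivot(T, 1, 1))  # pivot on T[1][1] = 4
```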

# Pivot selection

In the simplex method, this is a tactic to select a basis exchange. The incoming column is based on its effect on the objective function, and the outgoing column is based on its effect on feasibility.

# Point-to-set map

Image of map is a set. If $LaTeX: \textstyle F:X \to 2^Y,$ this means $LaTeX: F(x)$ is a subset of $LaTeX: Y$ for each $LaTeX: x \in X.$ An example is the feasibility map.

# Pointed cone

A convex cone, $LaTeX: C,$ such that $LaTeX: \textstyle C \cap (-C) = \{0\}.$ An example is $LaTeX: \textstyle C=\mathbb{R}^{n+}$ (so $LaTeX: \textstyle -C=\mathbb{R}^{n-}$).

# Polar cone

The polar cone of a set $LaTeX: \textstyle S \subseteq \mathbb{R}^n$ is $LaTeX: \textstyle \left \{(y, y^0)\in \mathbb{R}^{n+1}: yx \le y^0 \mbox{ for all } x \in S \right \}.$

# Policy iteration

This is an algorithm for infinite horizon dynamic programs (generally stochastic) that proceeds by improving policies to satisfy the fundamental equation:

$LaTeX: F(s) = r^*(s) + \sum_{s'} P(s,s') F(s'),$

where $LaTeX: F(s)$ is the maximum expected value when starting in state $LaTeX: s, r^*(s)$ is the immediate expected return when in state $LaTeX: s$ and following an optimal policy (a decision rule), and $LaTeX: P(s,s')$ is probability of a transition from state $LaTeX: s$ to state $LaTeX: s'$ in one time period.

The algorithm has some policy, $LaTeX: x^*(s),$ at the beginning of an iteration. This determines an approximation of $LaTeX: F,$ which is exact if the equation holds. If the equation is violated, the violation identifies how the policy can be improved. This changes the policy for the next iteration. Convergence depends upon the underlying Markov process (e.g., whether it is ergodic). Another approach is value iteration.
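The iteration can be sketched for a tiny Markov decision process. The entry's fundamental equation has no discount factor; the sketch below adds one (gamma, an assumption not in the entry) so that policy evaluation is a well-posed linear solve, and the MDP data are hypothetical:

```python
import numpy as np

def policy_iteration(P, r, gamma=0.9):
    """Policy iteration for a small discounted MDP.
    P[a] is the transition matrix under action a; r[a] is the reward vector."""
    n_actions, n_states = len(P), len(r[0])
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) F = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([r[policy[s]][s] for s in range(n_states)])
        F = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily against the current F.
        Q = np.array([r[a] + gamma * np.array(P[a]) @ F for a in range(n_actions)])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, F
        policy = new_policy

P = [[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay in the current state
     [[0.0, 1.0], [0.0, 1.0]]]   # action 1: move to (or stay in) state 1
r = [[0.0, 1.0], [0.5, 2.0]]     # immediate expected returns r(a, s)
policy, F = policy_iteration(P, r)
print(policy, F)
```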

# Polyhedral annexation

This is a cutting plane approach to find an optimal solution known to lie at an extreme point of a polyhedron, $LaTeX: P.$ The general algorithm is to start at some extreme point and solve the polyhedral annexation problem. This will result in ascertaining that the extreme point is (globally) optimal, or it will generate a recession direction from which a convexity cut is used to exclude the extreme point. The approach generates a sequence of shrinking polyhedra by eliminating the cut region. Its name comes from the idea of annexing a new polyhedron to exclude from the original, homing in on the extreme point solution.

Notes:

When applied to reverse convex programming, the convex set $LaTeX: (S)$ in the polyhedral annexation problem is the objective level set, $LaTeX: \{x: f(x) \le f(x^*)-t\},$ where $LaTeX: {t} > 0$ is an optimality tolerance, and $LaTeX: x^*$ is the best (extreme point) solution so far. The initial polyhedron is the feasible region, so this is an inner approximation strategy.

When applied to mixed integer programming, the original polyhedron is the LP relaxation, and the convex set $LaTeX: (S)$ for the polyhedral annexation problem can vary according to the particular problem (exploiting structure, especially to find facets of the feasible region).

# Polyhedral annexation problem

Given a convex cone $LaTeX: C,$ a polytope $LaTeX: P$ contained in $LaTeX: C,$ and a compact, convex set $LaTeX: S$ with 0 in int$LaTeX: (S),$ find a point $LaTeX: y$ in $LaTeX: P\backslash S,$ or ascertain that $LaTeX: P$ is contained in $LaTeX: S.$

# Polyhedron

(pl. polyhedra). A set that equals the intersection of a finite number of halfspaces. This is generally written as $LaTeX: \textstyle \left \{x: Ax \le b \right \},$ where the representation $LaTeX: (A,b)$ is not unique. It is often useful to separate the implied equalities: $LaTeX: \textstyle \left \{x: Ax \le b, Ex = c \right \},$ so that the relative interior is $LaTeX: \textstyle \left \{x: Ax < b, Ex = c \right \}.$ The system, $LaTeX: \textstyle \left \{Ax \le b, Ex = c \right \},$ is a prime representation if it is irredundant, and it is minimal if it is irredundant and contains no implied equality. A polyhedron is degenerate if it contains an extreme point that is the intersection of more than $LaTeX: n$ halfspaces (where $LaTeX: n$ is the dimension of the polyhedron). An example is the pyramid.

# Polymatroid

Let $LaTeX: N$ be a finite set and let $LaTeX: f$ be a submodular function on the subsets of $LaTeX: N$ with $LaTeX: f(\emptyset)=0.$ The polymatroid associated with $LaTeX: (N,f)$ is the polytope:

$LaTeX: \left \{x \in \mathbb{R}^{n+}: \sum_{j \in S} x_j \le f(S) \mbox{ for all } S \subseteq N \right \}.$

# Polytope

A bounded polyhedron.

# Pooling of inventory

When there are two or more demand points for a product, it may be possible to save money if the separate inventories for these demand points can be combined or "pooled." There is a "square root economy of scale" from pooling for two of the simplest inventory models: a) the Economic Order Quantity (EOQ) model, and b) the Newsboy (NV) model.

For the EOQ case, where we want to minimize the fixed charge of ordering plus the holding cost of inventory carried, if we have two separate inventories for two demand points, each with constant demand rate $LaTeX: D,$ fixed charge of ordering $LaTeX: K,$ and holding cost/unit of inventory $LaTeX: h,$ then the minimum total cost is $LaTeX: \textstyle \sqrt{2\, D\, K\, h}$ at each demand point for a total cost of $LaTeX: \textstyle 2\, \sqrt{2\, D\, K\, h}.$ If we combine the two demand streams and carry a single inventory for them, then the minimum total cost is $LaTeX: \textstyle \sqrt{2\, (2\, D)\, K\, h} = 2\, \sqrt{D\, K\, h},$ which decreases the total cost by a factor of $LaTeX: \textstyle 1/\sqrt{2}.$

For the NV case, if we have two demands, each independently Normal distributed with mean $LaTeX: D$ and standard deviation $LaTeX: \sigma,$ a specified service level $LaTeX: \textstyle \alpha = \prob \left \{\mbox{Demand} < \mbox{inventory}\right\}$ at an inventory point, and $LaTeX: z_\alpha$ being the number of standard deviations corresponding to a left tail probability of $LaTeX: \alpha,$ then we must carry inventory $LaTeX: \textstyle D + z_\alpha \sigma$ at each inventory point for a total inventory of $LaTeX: \textstyle 2D + 2z_\alpha \sigma.$ If we can combine the two demands so we carry just one inventory, then the mean and standard deviation for the combined demands are $LaTeX: 2D$ and $LaTeX: \sqrt{2}\sigma.$ Thus, the total inventory we must carry is $LaTeX: \textstyle 2D + \sqrt{2}z_\alpha\sigma,$ and there is a square root economy of pooling in the amount of safety stock we must carry. If we say that the system service level should be $LaTeX: \alpha,$ then the benefits of pooling are even greater.
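The square-root economy in both cases can be verified numerically; the demand, cost, and service-level figures below are hypothetical:

```python
from math import sqrt

def eoq_cost(D, K, h):
    """Minimum EOQ cost sqrt(2 D K h), as in the entry above."""
    return sqrt(2 * D * K * h)

D, K, h = 100.0, 50.0, 2.0
separate = 2 * eoq_cost(D, K, h)   # two independent inventories
pooled = eoq_cost(2 * D, K, h)     # one pooled inventory for both demands
print(pooled / separate)           # 1/sqrt(2): the EOQ pooling economy

# Newsboy safety stock: pooling replaces 2*z*sigma with sqrt(2)*z*sigma.
z, sigma = 1.645, 10.0             # z for roughly a 95% service level
print((sqrt(2) * z * sigma) / (2 * z * sigma))  # again 1/sqrt(2)
```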

There is an interesting anomaly wherein pooling can increase inventory. Specifically, for the NV problem, if instead of there being a constraint on service level, the objective is simply to minimize the expected cost of unsatisfied demand and the cost of surplus inventory, and the demand distributions at each of two demand points (with associated inventory) are right skewed, e.g., the mean is greater than the median, then pooling the two inventories may in fact result in a higher total optimal inventory. For example, if the cost/unit short is 5, the cost/unit of inventory left over is 4, the demands at each of two demand points are independent Poisson distributed with mean 3.3, then the optimal amount to stock at each location is 3. If, however, the demands are pooled and a single inventory is carried so the demand is Poisson with mean 6.6, then the optimal amount to stock is 7. So pooling increased the optimal inventory. See Myths and Counter Examples for an additional example.

# Pooling problem

A type of blending problem, as follows.

Given:
1. A set of sources, indexed by $LaTeX: \textstyle 1,\dots,N_s.$ Each source supplies raw feeds in limited amounts, such as crude oil.
2. A set of attributes, indexed by $LaTeX: \textstyle 1,\dots,N_a.$ Each supply contains one or more attributes. An example of an attribute is the percent of sulfur in a crude oil supply.
3. A set of products, indexed by $LaTeX: \textstyle 1,\dots,N_d,$ for which there is demand; each product has a quality defined by its attributes. For example, if percent sulfur is an attribute, each refined petroleum product has a range of percent sulfur: 1.5% sulfur is a higher quality product than 2.5% sulfur.
4. A set of pools, indexed by $LaTeX: \textstyle {1},\dots,N_p.$ Each pool is formed by inflows from sources, whose attributes are linearly mixed to determine the attributes of the pool, and hence its quality. Pools combine to form products (see Flow of Materials below). For example, suppose a pool receives flow values $LaTeX: x$ and $LaTeX: y$ from two sources whose attribute values are $LaTeX: A$ and $LaTeX: B,$ respectively. Then, the pool's attribute value is the average: $LaTeX: \textstyle (Ax+By)/(x+y).$
                     Flow of Materials
                     ~~~~~~~~~~~~~~~~~
 Sources             Pools             Products
           S(s,p)            P(p,d)
 SUPPLY --->(s)------------->(p)------------->(d)--->DEMAND

The objective is to minimize cost, which is the sum of source flow costs, subject to four types of constraints:

1. Limited supplies: $LaTeX: \textstyle \sum_p S(s,p) \le \mbox{SUPPLY}(s),$ where the left-hand side is the total flow out of source $LaTeX: s$ (into pools).

2. Balances at pools: $LaTeX: \textstyle \sum_s S(s,p) - \sum_d P(p,d) = 0,$ i.e., the total flow into pool $LaTeX: p$ (from sources) equals the total flow out (to products).

3. Quality constraints:
$LaTeX: L(p,a) \le Q(p,a) = \frac{\sum_s w(s,a)S(s,p)}{\sum_s S(s,p)} \le U(p,a).$
The numerator is a weighted sum, where the weights (w) are the (given) quality values of attribute a at the sources (s). The denominator is the total flow into the pool (p).

4. Demand requirements: $LaTeX: \textstyle \sum_p P(p,d) \ge \mbox{DEMAND}(d),$ where the left-hand side is the total flow into product $LaTeX: d$ (from pools).
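The linear-mixing rule behind the quality constraints can be sketched for a single pool; the sulfur percentages and flow values below are hypothetical:

```python
def pool_quality(w, S):
    """Flow-weighted attribute value of a pool, Q = sum_s w[s]*S[s] / sum_s S[s],
    matching the quality constraint above (w: source attribute values, S: flows)."""
    total = sum(S)
    return sum(ws * Ss for ws, Ss in zip(w, S)) / total

# Two sources with sulfur percentages 1.5 and 2.5 feeding flows 30 and 10:
print(pool_quality([1.5, 2.5], [30.0, 10.0]))  # 1.75
```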

# Portfolio selection problem

In its elementary form, this is the same as the capital budgeting problem, except that the objective is to minimize the risk, rather than maximize expected return. Let $LaTeX: x_j$ be the percent of capital invested in the j-th opportunity (e.g., stock or bond), so $LaTeX: x$ must satisfy $LaTeX: \textstyle x \ge 0$ and $LaTeX: \textstyle \sum_j x_j = 1.$ Let $LaTeX: v_j$ be the expected return per unit of investment in the j-th opportunity, so that $LaTeX: vx$ is the sum total rate of return per unit of capital invested. It is required to have a lower limit on this rate: $LaTeX: \textstyle vx \ge b$ (where $LaTeX: b$ is between $LaTeX: \min v_j$ and $LaTeX: \max v_j$). Subject to this rate of return constraint, the objective is the quadratic form, $LaTeX: x^TQx,$ where $LaTeX: Q$ is the variance-covariance matrix associated with the investments (i.e., if the actual return rate is $LaTeX: V_j,$ then $LaTeX: \textstyle Q(i,j) = E[(V_i - v_i)(V_j - v_j)]).$
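A minimal sketch of evaluating the objective and constraints for hypothetical two-asset data (the returns, covariance matrix, and threshold below are invented for illustration; finding the minimizing x requires a quadratic programming solver):

```python
import numpy as np

def portfolio_risk(x, Q):
    """Portfolio variance x^T Q x for weights x."""
    x = np.asarray(x)
    return float(x @ np.asarray(Q) @ x)

v = np.array([0.10, 0.06])         # expected returns
Q = np.array([[0.04, 0.01],
              [0.01, 0.02]])       # variance-covariance matrix
x = np.array([0.5, 0.5])           # candidate weights: x >= 0, sum to 1
assert float(v @ x) >= 0.07        # rate-of-return constraint with b = 0.07
print(portfolio_risk(x, Q))        # 0.02
```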

# Positive definite matrix

The symmetric matrix $LaTeX: A$ is positive definite if $LaTeX: \textstyle x^T Ax > 0$ for all nonzero $LaTeX: x.$ Equivalently, the matrix is positive definite if all its eigenvalues are positive.

Also see positive semi-definite matrix.
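The eigenvalue characterization gives a direct numerical test; a NumPy sketch:

```python
import numpy as np

def is_positive_definite(A):
    """Eigenvalue test for positive definiteness of a symmetric matrix:
    A is positive definite iff all eigenvalues are positive."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

print(is_positive_definite(np.array([[2.0, 1.0], [1.0, 2.0]])))  # eigenvalues 1, 3
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))  # eigenvalues -1, 3
```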

# Positive semi-definite matrix

The symmetric matrix $LaTeX: A$ is positive semi-definite if $LaTeX: \textstyle x^T Ax \ge 0$ for all $LaTeX: x.$

Also see positive definite matrix.

# Posynomial

A positive sum of monomials: $LaTeX: \textstyle \sum_i c(i) \prod_{j} [x(j)^{a(i,j)}],$ where $LaTeX: c > 0.$ Each monomial is the product,

$LaTeX: \prod_j [x(j)^{a(i,j)}] = x(1)^{a(i,1)} \times x(2)^{a(i,2)} \times \dots \times x(n)^{a(i,n)},$

and $LaTeX: [a(i, j)]$ is called the exponent matrix. This is the fundamental function in geometric programming.

Example: $LaTeX: \textstyle x(1)x(2) + 2x(3)/x(1)^{2}$ is a posynomial with 2 monomial terms and 3 variables. The exponent matrix is 2 by 3, showing the exponent in each monomial (row) of each variable (column):

$LaTeX: \begin{bmatrix} 1 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}$
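The example can be evaluated directly from its coefficient vector and exponent matrix; a short sketch:

```python
def prod_pow(row, x):
    """The monomial prod_j x[j]**a[i][j] for one row of the exponent matrix."""
    p = 1.0
    for aij, xj in zip(row, x):
        p *= xj ** aij
    return p

def posynomial(c, E, x):
    """Evaluate sum_i c[i] * prod_j x[j]**E[i][j]; requires x > 0 and c > 0."""
    return sum(ci * prod_pow(row, x) for ci, row in zip(c, E))

# The example above: x1*x2 + 2*x3/x1^2, with exponent rows (1,1,0) and (-2,0,1).
c = [1.0, 2.0]
E = [[1, 1, 0], [-2, 0, 1]]
x = [2.0, 3.0, 4.0]
print(posynomial(c, E, x))  # 2*3 + 2*4/4 = 8.0
```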

# Pre-processing

Generally the same as presolve, but the purpose need not be to speed up optimization. Instead, it could be aimed at deepening one's knowledge about the mathematical program, such as what is forced by implication of the constraints versus what is determined by the economic trade-off.

# Precedence constraint

When ordering objects, like jobs to be performed, this is a constraint that restricts the order: i must precede j, denoted $LaTeX: \textstyle i << j.$ If order really means time, and if the model has decision variables $LaTeX: t_i$ and $LaTeX: t_j$ to denote the start times of $LaTeX: i$ and $LaTeX: j,$ resp., this precedence constraint can be written as $LaTeX: \textstyle t_j \ge t_i + T_i,$ where $LaTeX: T_i$ is the time job $LaTeX: i$ takes. On the other hand, a precedence constraint need not correspond to real time. For example, $LaTeX: i << j$ could mean if project $LaTeX: j$ is not selected, we cannot select project $LaTeX: i$. In that case, suppose the model has binary variables $LaTeX: x_i$ and $LaTeX: x_j,$ where $LaTeX: x_i=1$ means project $LaTeX: i$ is selected, and $LaTeX: x_i=0$ means it is not selected. Then, the precedence constraint $LaTeX: i << j$ is represented as: $LaTeX: x_i \le x_j.$

# Preconditioning

A generalization of scaling that modifies the hessian so as to improve convergence properties. In the case of quadratic programming, this typically means multiplying the quadratic form matrix, $LaTeX: Q,$ by a preconditioner, $LaTeX: P,$ so that the condition number of $LaTeX: PQ$ is as close to 1 as possible. There are many variations, including splitting a matrix, $LaTeX: Q = P - S,$ to achieve a better conditioned matrix in the sense of its eigenvalue (and eigenspace) structure that governs convergence properties of algorithms like steepest ascent and conjugate gradient.

# Predictor-corrector algorithm

This is a class of algorithms whose origin is in ordinary differential equations. One first "predicts" a solution with either a functional approximation or an algorithm, getting "close" to a solution. Then, this is the initial point for a "corrector" method, such as using Newton's method to get "closer" to the curve. Although it could be used to characterize many methods in mathematical programming, it arises most naturally in path following methods. The term began being widely used in connection with following the central path in an interior point method. One such algorithm is using the affine scaling direction as a predictor and centering as a corrector.

# Presolve

A heuristic applied to reduce the problem in some way before starting an algorithm. In linear programming, for example, one might scan for an equation of the form $LaTeX: x=0,$ then simply fix $LaTeX: x$ at zero, thus reducing the number of variables and constraints. Further details are in the supplement.

# Pricing

This is a tactic in the simplex method, by which each variable is evaluated for its potential to improve the value of the objective function. Let $LaTeX: \textstyle p = c_B[B^{-1}],$ where $LaTeX: B$ is a basis, and $LaTeX: c_B$ is a vector of costs associated with the basic variables. The vector $LaTeX: p$ is sometimes called a dual solution, though it is not feasible in the dual before termination. $LaTeX: p$ is also called a simplex multiplier or pricing vector. The price of the j-th variable is $LaTeX: \textstyle c_j - pA_j.$ The first term is its direct cost $LaTeX: (c_j)$ and the second term is an indirect cost, using the pricing vector to determine the cost of inputs and outputs in the activity's column $LaTeX: (A_j).$ The net result is called the reduced cost, and its value determines whether this activity could improve the objective value.

Other pricing vectors are possible to obtain information about the activity's rates of substitution without actually computing $LaTeX: \textstyle r = [B^{-1}]A_j.$ If $LaTeX: \textstyle p = v[B^{-1}],$ then $LaTeX: \textstyle pA_j = vr,$ and $LaTeX: v = c_B$ is the special case to get the reduced cost. Another special case is to obtain information about whether the (nonbasic) activity would need to perform a degenerate pivot. For LP in standard form, let $LaTeX: v_i=1$ if $LaTeX: x_i=0$ and $LaTeX: v_i=0$ if $LaTeX: x_i > 0.$ Then, $LaTeX: \textstyle vr = \sum_i \left \{r_i: x_i = 0 \right \}.$ Thus, if $LaTeX: pA_j > 0,$ activity $LaTeX: j$ must have a positive tableau value with respect to some basic variable whose level is zero, so the pivot would have to be degenerate.

See Degeneracy
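As a small numeric sketch (the LP data and basis below are purely illustrative), the pricing computation $LaTeX: p = c_B[B^{-1}]$ and reduced costs $LaTeX: c_j - pA_j$ can be written with NumPy:

```python
import numpy as np

# A sketch of simplex pricing on a small LP in standard form
# (min c'x s.t. Ax = b, x >= 0); all data here are illustrative.
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
c = np.array([-2.0, -3.0, 0.0, 0.0])

basis = [0, 3]                      # indices of the basic columns
B = A[:, basis]
p = c[basis] @ np.linalg.inv(B)     # pricing vector p = c_B B^{-1}
reduced = c - p @ A                 # reduced costs c_j - p A_j

# Basic variables always price out to zero; a negative reduced cost
# flags a nonbasic variable that can improve a minimization objective.
```

Here the basic columns price out to zero exactly, and the sign of each nonbasic reduced cost determines whether that activity could improve the objective value.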

# Primal method

An algorithm for which each iterate is feasible in the (primal) mathematical program.

# Primal program

The original mathematical program, cited in the context of having its dual.

# Prime representation

A set represented by an irredundant system of inequalities. (See polyhedron.)

# Principle of optimality

The idea that an optimal path, or trajectory, is composed of optimal sub-paths. Equivalently, once we reach a particular state in a sequential decision process, the remaining decisions must be optimal with respect to that state. This is the foundation of dynamic programming.
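A minimal illustration of the principle: in a shortest-path recursion on a small directed acyclic graph (the nodes and edge costs below are made up), the optimal cost to each node is built from the optimal costs to its predecessors.

```python
import math

# Hypothetical DAG: node -> list of (successor, edge cost).
edges = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('D', 5)],
    'C': [('D', 1)],
    'D': [],
}
order = ['A', 'B', 'C', 'D']    # a topological order of the DAG

dist = {v: math.inf for v in order}
dist['A'] = 0                    # start node
for v in order:                  # Bellman recursion: extend optimal
    for w, cost in edges[v]:     # sub-paths by one edge at a time
        dist[w] = min(dist[w], dist[v] + cost)
```

Once `dist[v]` is final, every remaining decision from `v` is made optimally with respect to that state, which is exactly the dynamic programming recursion.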

# Problems

Various problems have evolved that are popularly cited by name.

# Product mix problem

To determine what mix of products to make from shared resources. For example, if a company has apples and cranberries, one could make a certain amount of apple juice, cranberry juice, cranapple juice, and applecran juice, each differing in the percentages of apples and cranberries. More extensive examples abound to illustrate one of the early, fundamental applications of linear programming.
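A tiny two-product instance (all coefficients invented for illustration) can be solved by enumerating the vertices of the feasible polygon; a realistic model would use an LP solver instead.

```python
from itertools import combinations
import numpy as np

# Toy product mix: two juices share two fruit supplies.
#   maximize 3 x1 + 4 x2
#   s.t.     x1 +   x2 <= 100   (apples)
#            x1 + 3 x2 <= 240   (cranberries)
#            x1, x2 >= 0
# Rows of A give all constraints in the form A x <= b.
A = np.array([[1.0, 1.0], [1.0, 3.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([100.0, 240.0, 0.0, 0.0])

best, best_x = -np.inf, None
for i, j in combinations(range(4), 2):
    M = A[[i, j]]
    if abs(np.linalg.det(M)) < 1e-9:
        continue                            # constraints are parallel
    x = np.linalg.solve(M, b[[i, j]])       # candidate vertex
    if np.all(A @ x <= b + 1e-9):           # keep feasible vertices only
        val = 3 * x[0] + 4 * x[1]
        if val > best:
            best, best_x = val, x
```

The optimum occurs where both resource constraints bind, i.e., the mix uses up both fruit supplies exactly.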

# Production scheduling problem

To determine levels of production over time. Constraints include demand requirements (possibly with backordering), capacity limits (including warehouse space for inventory), and resource limits. One basic model is as follows.

Let

$LaTeX: x_t =$ level of production in period t (before demand);
$LaTeX: y_t =$ level of inventory at the end of period t;
$LaTeX: U_t =$ production capacity in period t;
$LaTeX: W_t =$ warehouse capacity in period t;
$LaTeX: h_t =$ holding cost (per unit of inventory);
$LaTeX: p_t =$ production cost (per unit of production);
$LaTeX: D_t =$ demand at the end of period t.

Then, the mathematical program is

$LaTeX: \min \{ p^T x + h^T y : 0 \le (x, y) \le (U, W), \, y_t = y_{t-1} + x_t - D_t \mbox{ for } t=1,\dots,T \}.$

$LaTeX: y_0$ is the given initial inventory, and $LaTeX: T$ is the planning horizon.

The condition that $LaTeX: y \ge 0$ means there is no backordering. Other tacit assumptions, which could be relaxed to broaden the model's scope, are as follows.

• Quantities could be indexed by product, e.g., $LaTeX: x_{k,t} =$ level of production of product $LaTeX: k$ at time $LaTeX: t.$ In such cases, competition for warehouse space and other resources results in more equations.
• There could be an ending inventory constraint, such as $LaTeX: y_T=y_0.$
• There could be additional decision variables to allow capacity expansion.
• Costs could be nonlinear functions.
• This could be a stochastic program; for example, with demands not known with certainty.

Also see warehouse problem.
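A quick numeric check of the inventory balance $LaTeX: y_t = y_{t-1} + x_t - D_t$ for a given production plan (all data hypothetical):

```python
import numpy as np

# Verify a candidate production plan against the inventory recursion
# and the capacity bounds; the numbers below are illustrative.
T = 4
x = np.array([50.0, 60.0, 40.0, 30.0])   # production per period
D = np.array([40.0, 55.0, 45.0, 35.0])   # demand per period
U, W = 80.0, 100.0                        # production / warehouse capacity
y0 = 10.0                                 # initial inventory

y = np.empty(T + 1)
y[0] = y0
for t in range(T):                        # inventory balance recursion
    y[t + 1] = y[t] + x[t] - D[t]

# Feasible iff production respects capacity and inventory stays in [0, W]
# (y >= 0 encodes the no-backordering assumption).
feasible = bool(np.all(x <= U) and np.all((0 <= y[1:]) & (y[1:] <= W)))
```

Optimizing over such plans (rather than just checking one) is the linear program stated above.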

# Projected gradient

A feasible direction is obtained for the region $LaTeX: \textstyle \left \{x: Ax=b \right \}$ by projecting the gradient of the objective function onto the null space of $LaTeX: A$: $LaTeX: \textstyle d = P \nabla f(x),$ where $LaTeX: \textstyle P = I - A^T[AA^T]^{-1}A$ (note $LaTeX: A[Py] = 0$ for all $LaTeX: y,$ so $LaTeX: \textstyle \left \{Py \right \}$ is the null space of $LaTeX: A$). Thus, $LaTeX: x + ad$ is feasible for all feasible $LaTeX: x$ and all $LaTeX: a > 0$ (since $LaTeX: Ad=0$). Further, if $LaTeX: \textstyle P \nabla f(x) \ne 0,$ then for maximization $LaTeX: \textstyle f(x+ad) \approx f(x) + a \nabla f(x)^T P \nabla f(x) > f(x)$ for small $LaTeX: a > 0,$ so the objective value improves at each iteration until the projected gradient vanishes. At a point where $LaTeX: \textstyle P \nabla f(x)=0,$ $LaTeX: x$ satisfies first-order conditions.
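One projection step can be sketched in NumPy (the constraint data, objective, and step size below are assumptions for illustration):

```python
import numpy as np

# One projected-gradient step for max f(x) s.t. Ax = b.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([3.0])

def grad_f(x):
    # Gradient of the concave f(x) = -||x - (3, 2, 1)||^2 / 2.
    return np.array([3.0, 2.0, 1.0]) - x

# Projector onto the null space of A: P = I - A'(AA')^{-1}A.
P = np.eye(3) - A.T @ np.linalg.inv(A @ A.T) @ A

x = np.array([1.0, 1.0, 1.0])   # a feasible point: Ax = b
d = P @ grad_f(x)               # projected gradient direction

# A d = 0, so x + a d remains feasible for any step size a.
x_new = x + 0.5 * d
```

Because `A @ d` is zero, the iterate stays on the affine set, and the step improves the objective whenever the projected gradient is nonzero.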

# Proper optimum

A local optimum that is uniquely optimal in some feasible neighborhood.

# Pseudo-boolean function

This maps $LaTeX: \textstyle \left \{0,1 \right \}^n \to \mathbb{R}.$

# Pseudo-boolean program

$LaTeX: \textstyle X = \left \{0,1 \right \}^n$ and all functions are pseudo-boolean.
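For small $LaTeX: n,$ a pseudo-boolean program can be solved by brute-force enumeration of $LaTeX: \textstyle \left \{0,1 \right \}^n$; the multilinear polynomial below is an arbitrary illustration.

```python
from itertools import product

# A toy pseudo-boolean program: minimize f over {0,1}^3 by enumeration.
def f(x1, x2, x3):
    # Pseudo-boolean functions have a multilinear polynomial form.
    return 2*x1 - 3*x2 + x1*x2 + 4*x2*x3

best = min(product((0, 1), repeat=3), key=lambda x: f(*x))
```

Enumeration is exponential in $LaTeX: n,$ so practical methods exploit the polynomial structure instead.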

# Pseudo-inverse

(of a matrix). Same as generalized inverse.

# Pseudo-monotone function

$LaTeX: \textstyle f: \mathbb{R}^n \to \mathbb{R}^n$ is pseudo-monotone over $LaTeX: X$ (subset of $LaTeX: \textstyle \mathbb{R}^n$) if

$LaTeX: f(y)^T (x - y) \ge 0 \mbox{ implies } f(x)^T (x - y) \ge 0 \mbox{ for all } x, y \in X.$

The gradient of a pseudoconvex function is a pseudo-monotone function.
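A numeric spot-check of this property (a sampling test, not a proof): $LaTeX: f(x) = x + x^3$ is pseudoconvex on $LaTeX: \mathbb{R},$ and its derivative should satisfy the implication above at sampled pairs.

```python
import numpy as np

# Spot-check pseudo-monotonicity of F(x) = f'(x) for f(x) = x + x^3.
def F(x):
    return 1.0 + 3.0 * x**2     # derivative; strictly positive on R

rng = np.random.default_rng(0)
violations = 0
for _ in range(10_000):
    x, y = rng.uniform(-5.0, 5.0, size=2)
    # Pseudo-monotone: F(y)(x - y) >= 0 must imply F(x)(x - y) >= 0.
    if F(y) * (x - y) >= 0 and F(x) * (x - y) < 0:
        violations += 1
```

Since $LaTeX: F > 0$ here, the hypothesis $LaTeX: F(y)(x-y) \ge 0$ forces $LaTeX: x \ge y,$ and then $LaTeX: F(x)(x-y) \ge 0$ follows, so no sampled pair violates the definition.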

# Pseudoconcave function

A function whose negative is pseudoconvex.

# Pseudoconvex function

$LaTeX: f$ is in $LaTeX: C^1$ and satisfies the property: $LaTeX: \textstyle \nabla f(x)(y-x) \ge 0$ implies $LaTeX: \textstyle f(y) \ge f(x).$ While every differentiable