100% Rule 
This pertains to sensitivity analysis in linear programming. In its original form, it uses the convexity of the set of admissible changes in the rim data to test whether a particular change is admissible: any combination of changes can occur as long as the total percentage deviation from the coordinate extremes does not exceed 100%. (Note: this applies to right-hand sides (b) and costs (c) separately.)
More generally, suppose the optimal value remains constant if the cost coefficient vector $c$ in a linear program is replaced with any of $c^1, \dots, c^K$ (we could have $K = n$ and let $c^j$ be the $j$th coordinate extreme for $c$, but that is not necessary). Then, the optimal objective value is the same for $c = \sum_j \lambda_j c^j$, provided $\lambda \ge 0$ and $\sum_j \lambda_j = 1$.
The same applies for a convex combination of changes in the right-hand side (maybe with the origin, which is no change). If the objective value remains optimal at $x^*$ if $b$ is replaced with any of $b^1, \dots, b^K$, then it is also optimal for the right-hand side $b = \sum_j \lambda_j b^j$ so long as $\lambda \ge 0$ and $\sum_j \lambda_j = 1$.
ABS algorithm 
This is a family of algorithms to solve a linear system of equations, $Ax = b$, with $m$ equations in $n$ variables such that $m \le n$. Its distinguishing property is that the $k$th iterate, $x^k$, is a solution to the first $k$ equations. While algorithms in this class date back to 1934, the family was recognized and formally developed by J. Abaffy, C.G. Broyden and E. Spedicato, so the algorithm family name bears their initials. The ABS algorithm is given by the following.
Notation: is the ith row of .
 Initialize: choose $x^1$ and $H_1$ so that it is nonsingular; set $k = 1$.
 Set search direction for any that does not satisfy . Compute residuals: .
 Iterate: , where is the step size: .
 Test for convergence (update residuals, if used in test): Is solution within tolerance? If so, stop; else, continue.
 Update: , where is a rank 1 update of the form for such that .
 Increment: let $k = k + 1$ and go to step 1.
The ABS algorithm extends to nonlinear equations and to linear diophantine equations.
AIMMS 
AIMMS is Advanced Integrated Multidimensional Modeling Software.
AMPL 
AMPL is A Mathematical Programming Language.
ANALYZE 
A computer-assisted analysis system to analyze mathematical programs, including an artificially intelligent level of support.
Abstract program 
This is a mathematical program defined on an abstract vector space (generally a Banach space). It unifies the ordinary mathematical program with control theory.
Active constraint 
An inequality constraint that holds with equality (at a point). Same as tight, but not to be confused with binding. The set of all active constraints generally includes equality constraints as well.
Active set method 
A technique that partitions inequality constraints into active constraints (or sufficiently close to be deemed active) and inactive constraints and then ignores the inactive constraints for the iteration. The active set for this iteration is sometimes called the working set. The new point is selected by moving on the surface defined by the working set (called the working surface).
Activity analysis 
An approach to microeconomics for which a system is composed of elementary functions, called activities, which influenced the early growth of linear programming. An insightful way to view an activity in a standard form linear program is that negative coefficients represent input requirements and positive coefficients represent outputs:
In general, the reduced cost represents the net worth of the activity for the prices ,
This leads to an economic interpretation of not only linear programming but also of the simplex method: agents (activity owners) respond instantaneously to changes in prices, and the activity with the greatest net revenue wins a bid to become active (basic), thus changing the prices for the next time (iteration).
In this context, activities can be regarded as transformations, from inputs to outputs. Three prevalent transformations are: form, place and time.
Here are examples:
Acyclic 
Means without cycles. Arises in graphs and networks, as well as in the context of algorithms (e.g., not cycling back to some previous solution).
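As an illustration, acyclicity of a directed graph can be tested with a depth-first search; a minimal Python sketch (function and color names are illustrative, not from this glossary):

```python
def is_acyclic(adj):
    """Detect whether a directed graph (given as adjacency lists) is
    acyclic, using depth-first search with three node colors."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in adj}

    def dfs(v):
        color[v] = GRAY            # v is on the current search path
        for w in adj[v]:
            if color[w] == GRAY:   # back edge: a cycle exists
                return False
            if color[w] == WHITE and not dfs(w):
                return False
        color[v] = BLACK           # v and its descendants are cycle-free
        return True

    return all(color[v] != WHITE or dfs(v) for v in adj)
```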
Adjacency matrix 
The binary matrix, say $A$, that represents node adjacency in a simple graph: $A_{ij} = 1$ if node $i$ is adjacent to node $j$. In the undirected case, $A$ is symmetric. In the directed case, adjacency means there is an arc from $i$ to $j$. In the case of a multigraph, $A_{ij}$ is the number of edges whose endpoints are nodes $i$ and $j$.
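A minimal Python sketch of building such a matrix from an edge list (the function name and edge-list representation are illustrative):

```python
def adjacency_matrix(n, edges, directed=False):
    """Build the binary adjacency matrix of a simple graph on nodes
    0..n-1: A[i][j] = 1 if there is an edge (arc) from i to j.
    In the undirected case the matrix is made symmetric."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        if not directed:
            A[j][i] = 1
    return A
```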
Adjacent basis 
Two bases are adjacent if they differ by exactly one column (so one is reachable from the other by a single pivot).
Adjoint 
The classical adjoint of a matrix $A$ is the transpose of the matrix of cofactors:
$\mathrm{adj}(A) = [C_{ij}]^T, \quad C_{ij} = (-1)^{i+j} M_{ij},$
where $M_{ij}$ is the determinant of the submatrix of $A$ formed by removing the $i$th row and the $j$th column. The Hermitian adjoint (conjugate transpose or Hermitian transpose), $A^H$, is the transpose of the conjugate. We have that $A^H = A^T$ if $A$ is real-valued.
Admissible 
Generically means something that satisfies conditions. One example is the admissibility of a direction of change of some parameter, such as a right-hand side of a linear program (LP) from $b$ to $b + \theta h$, where $\theta$ is some scalar and $h$ is the direction of change. For a positive value of $\theta$, this might cause the LP to become infeasible, no matter how small it is. In that case it is common to call the direction vector inadmissible. For a perturbation to be admissible in this context, it is required that the mathematical program have a solution in some neighborhood of that direction. For the example, $h$ is admissible if the (primal) LP is feasible for all $\theta \in [0, t)$ for some $t > 0$.
Admissibility need not be for directions of change in the context of sensitivity analysis. The term is used elsewhere and defined for the context in which it is used. For example, a primal ascent algorithm could define a direction, say $d$, to be admissible if $x + \theta d$ is feasible and $f(x + \theta d) \ge f(x)$ for all $\theta \in [0, t)$ for some $t > 0$.
Advanced basis 
Initiating the simplex method with a known feasible basis instead of with logical variables as in Phase I. The advanced basis could be optimal for some parameter values, but those values have changed. It is usually better to start with an advanced basis because fewer iterations are generally required to reach the (new) optimal basis.
Affine combination 
The affine combination of $x^1, \dots, x^k$ is a linear combination $\sum_i a_i x^i$ such that $\sum_i a_i = 1$.
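A small Python sketch that forms an affine combination of points, checking that the coefficients sum to 1 (names are illustrative):

```python
def affine_combination(points, weights):
    """Linear combination of points whose coefficients sum to 1.
    Points are given as equal-length coordinate tuples."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("coefficients must sum to 1")
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points))
            for d in range(dim)]
```

For example, equal weights on two points give their midpoint.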
Affine function 
Let $f: X \to \mathbb{R}$. Then, $f$ is affine if $X$ is a convex set and
$f(ax + (1 - a)y) = a f(x) + (1 - a) f(y)$
for all $x, y \in X$ and $a \in [0, 1]$.
Equivalently, $f$ is affine if it is both convex and concave. Moreover, if $X = \mathbb{R}^n$, $f$ is the translation of a linear function: $f(x) = a + bx$.
Affine hull 
The affine hull of a set is the intersection of all affine sets containing . Equivalently, the set of all affine combinations of points in the set.
Affine independence 
The set $\{x^0, x^1, \dots, x^k\}$ is affinely independent if its affine hull is $k$-dimensional. Equivalently, $\{x^1 - x^0, \dots, x^k - x^0\}$ is linearly independent. For $n = 2$, any two distinct vectors are affinely independent ($k = 1$). Three vectors need to form a triangle; otherwise, they are collinear and are affinely dependent.
Affine scaling 
A variant of Karmarkar's algorithm for the linear program in standard form: $\min\{cx : Ax = b,\ x \ge 0\}$, where $A$ has full row rank.
Here are the basic steps (of the primal method).
 Given $x^k > 0$, let $X = \mathrm{diag}(x^k)$.
 Estimate dual: $y = (AX^2A^T)^{-1} AX^2 c$ and $d = c - A^T y$.
 Move: $x^{k+1} = x^k - s\,X^2 d / \|Xd\|$, where $s$ is a parameter in $(0, 1)$.
Multiplication of $A$ and $c$ by $X$ is a scaling operation, using current levels as scale values, i.e., $AX$ multiplies column $j$ of $A$ by $x_j^k$, and $Xc$ multiplies $c_j$ by $x_j^k$. Note that the reduced cost satisfies $AX^2 d = 0$, i.e., $X^2 d$ is in the null space of $A$, so $-X^2 d$ is a feasible direction. This affine scaling direction is used in its own right in interior methods. It minimizes the duality gap subject to $Ax = b$ and $x$ in an ellipsoid (replacing $x \ge 0$).
Affine set 
(or affine manifold, affine variety, linear variety, flat.)
A set that contains the line through any two of its points, i.e., $x, y \in S$ implies $ax + (1 - a)y \in S$ for all $a \in \mathbb{R}$. The dimension of an affine set is that of the (unique) subspace parallel to it. Dimensions of 0, 1 and 2 are called points, lines and planes, respectively.
Aggregation 
Aggregation is combining units to reduce problem dimensions. One form of aggregation is to combine basic entities at the data level, such as regions, time periods, and materials. The dimensions of the mathematical program, which include numbers of variables and constraints, are generally reduced by aggregation at the entity level. Another form of aggregation is the use of surrogates, including the strong form: integer equivalent aggregation.
Algorithm 
An iterative method that generates a sequence of the form $x^{k+1} \in S(A(x^k))$, where $x^0$ is given (called the initial point). $A$ is an algorithm map that yields a set of policies, given the current point, $x^k$, and $S$ is a selection function (in case $A(x^k)$ has more than one policy). Note that the algorithm map and selection function can depend on the iteration number $k$. A concrete example is of the form $x^{k+1} = x^k + s^k d^k$, where $s^k$ is a scalar parameter, called the step size, and $d^k$ is a vector, called the direction of change. The step size may require a selection from a line search if there are many optimal solutions; one is to select the least value (perhaps above some threshold). In practice, algorithms have a variety of tolerances to control their computer representation, progression and termination.
A stochastic algorithm is one whose algorithm map is a random variable. Unlike the meaning in computer science, most people in mathematical programming mean an algorithm whereby an iterate is selected by some random mechanism. One example is simulated annealing. When is a random variable, there is a meaning to convergence that differs from ordinary (deterministic) sequences.
An online algorithm is one that obtains a solution for an online problem. Typically, this is a heuristic that obtains a "good" solution because there is not enough time to guarantee optimality.
Alldifferent constraint 
is a global constraint imposed on variables $x_1, \dots, x_n$, stating that every variable is assigned a different value.
In other words, it represents the set of disequalities $x_i \ne x_j$ for all $i \ne j$; however, algorithms for alldifferent typically propagate far more efficiently than the equivalent set of disequality constraints.
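As an illustration, a naive propagation scheme for alldifferent (far weaker than the specialized filtering algorithms just mentioned) can be sketched in Python: whenever a domain becomes a singleton, its value is removed from every other domain, repeating to a fixed point (names are illustrative):

```python
def propagate_alldifferent(domains):
    """Naive alldifferent propagation: if a variable's domain is a
    singleton, remove that value from every other domain; repeat
    until no change. Returns None if some domain is wiped out."""
    domains = [set(d) for d in domains]
    changed = True
    while changed:
        changed = False
        for i, d in enumerate(domains):
            if len(d) == 1:
                v = next(iter(d))
                for j, e in enumerate(domains):
                    if j != i and v in e:
                        e.discard(v)
                        if not e:          # empty domain: inconsistent
                            return None
                        changed = True
    return domains
```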
Almost complementary 
A pair of vectors, $(x, y)$, such that each coordinate pair, except one, satisfies complementary slackness, i.e., $x_j y_j = 0$ for all but one $j$.
Alternative systems 
Two systems of inequalities and/or equations such that there exists a solution to one system, or there exists a solution to the other system, and there cannot exist a solution to both systems. Several alternative systems are widely used in mathematical programming, especially in the study of duality.
 Fourier (1826)
 $\{x : Ax \ge b\}$ (not vacuous)
 or
 $\{y : yA = 0,\ yb > 0,\ y \ge 0\}$ (not vacuous)
 Gordan (1873)
 $Ax > 0$
 or
 $yA = 0,\ y \ge 0,\ y \ne 0$
 Farkas (1902)
 $Ax = b,\ x \ge 0$
 or
 $yA \ge 0,\ yb < 0$
 A variant is
 $Ax \le b,\ x \ge 0$
 or
 $yA \ge 0,\ y \ge 0,\ yb < 0$
 Fredholm (1903)
 $Ax = b$
 or
 $yA = 0,\ yb \ne 0$
 Stiemke (1915)
 $Ax = 0,\ x > 0$
 or
 $yA \ge 0,\ yA \ne 0$
 Motzkin (1936)
 $Ax > 0,\ Bx \ge 0,\ Cx = 0$ ($A$ not vacuous)
 or
 $yA + zB + wC = 0,\ y \ge 0,\ z \ge 0$ ($y \ne 0$, not vacuous)
 Ville (1938)
 $Ax > 0,\ x \ge 0$
 or
 $yA \le 0,\ y \ge 0,\ y \ne 0$
 Tucker (1956)
 $Ax \ge 0,\ Ax \ne 0,\ Bx \ge 0,\ Cx = 0$
 or
 $yA + zB + wC = 0,\ y > 0,\ z \ge 0$
 Gale (1960)
 $Ax \le b$
 or
 $yA = 0,\ yb < 0,\ y \ge 0$
Integer Systems
 Papadimitriou and Steiglitz (1982)
Let $A$ be an $m \times n$ integer matrix, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, and $b$ an integer vector.
 $Ax = b$ for some integer $x$
 or
 $yA$ integer, $yb$ not integer, for some $y$.
Convex Systems
For the following, each $f_i$ is convex on the convex set $X$.
 Fan, Glicksberg and Hoffman (1957)
 $f(x) < 0$ for some $x \in X$
 or
 $y \ge 0,\ y \ne 0,\ yf(x) \ge 0$ for all $x \in X$.
Analytic center 
Given the set $S = \{x \in X : g(x) \ge 0\}$, which we assume is nonempty and compact, such that $g$ is concave on $X$, with a nonempty strict interior, its analytic center is the (unique) solution to the maximum entropy problem:
$\max\left\{\sum_i \ln g_i(x) : g(x) > 0\right\}.$
Note that the analytic center depends on how the set is defined, i.e., the nature of $g$, rather than just the set itself. For example, consider the analytic center of the box, $[0, 1]^n$. One form is to have $2n$ functions: $g_j(x) = x_j$ and $g_{n+j}(x) = 1 - x_j$ for $j = 1, \dots, n$. In this case, the analytic center is $x_j = t$ for all $j$, where $t$ is the solution to:
$\max\{\ln t + \ln(1 - t) : 0 < t < 1\}.$
Since $t = 1/2$, the analytic center is what we usually think of as the center of the box. However, the upper bounds could be defined by $g_{n+j}(x) = (1 - x_j)^p$ for all $j$, where $p > 1$ (so $g_{n+j}$ is concave). This changes the functional definition, but not the set: it's still the unit box. The analytic center is skewed towards the corner at the origin because the defining mathematical program is:
$\max\left\{\sum_j \ln x_j + p \sum_j \ln(1 - x_j) : 0 < x < 1\right\}.$
The solution is $x_j = 1/(1 + p)$ for all $j$, so the analytic center approaches $0$ as $p \to \infty$.
Ant colony optimization 
A heuristic search approach to solving a combinatorial program based on the behavior of ant colonies, particularly their ability to collectively determine shortest paths through the cumulative effect of pheromones.
Anticycling rule 
A rule that prevents cycling, such as Bland's rule for linear programming.
Approximation algorithm 
An algorithm that reaches a feasible solution to an NP-complete problem in polynomial time with a guaranteed bound on the quality of the solution. For minimization, with $f(x^*)$ being a positive optimal value, the bound generally has the form: $f(x)/f(x^*) \le r$ for some ratio $r > 1$.
Arborescent sets 
Any two sets that are either disjoint, or one is a proper subset of the other.
Arc consistency 
A consistency property in constraint programming used as a basis for inference. A variable $x$ is arc-consistent with another variable $y$ if for every value $a$ in the domain of $x$ there exists a value $b$ in the domain of $y$ such that the binary constraint on $(x, y)$ is satisfied. A binary constraint is arc-consistent if both of its variables are arc-consistent with respect to each other.
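The pruning step that enforces this property (often called "revise" in arc consistency algorithms such as AC-3) can be sketched in Python (names are illustrative):

```python
def revise(dx, dy, constraint):
    """Make x arc-consistent with y: keep only the values a in x's
    domain dx that have a supporting value b in y's domain dy with
    constraint(a, b) true. Returns the revised domain and whether
    any value was pruned."""
    supported = {a for a in dx if any(constraint(a, b) for b in dy)}
    return supported, supported != set(dx)
```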
Argmax 
The optimality region for maximization:
$\operatorname{argmax}\{f(x) : x \in X\} = \{x^* \in X : f(x^*) \ge f(x) \text{ for all } x \in X\}.$
It is the "argument of the maximum."
Argmin 
The optimality region for a minimization problem:
$\operatorname{argmin}\{f(x) : x \in X\} = \{x^* \in X : f(x^*) \le f(x) \text{ for all } x \in X\}.$
It is the "argument of the minimum."
Artificial variable 
A variable, say $v$, added to an equation, $Ax = b$. The resulting system, $Ax + v = b$, is feasible upon letting $v = b - Ax^0$ for a chosen $x^0$. The objective function is modified to penalize nonzero values of $v$. Often, $v \ge 0$ is required, multiplying the equation by $-1$ if necessary. This originated in linear programming, where a Phase I objective is used to find a solution with $v = 0$ (or ascertain that the original system has no feasible solution) by minimizing $\sum_i v_i$ (ignoring the original objective, $cx$).
Assembly line balancing problem 
A combinatorial program. An assembly line consists of work stations, connected by a conveyor belt. Each station performs a set of tasks that takes a cycle time, . Each task must be assigned to exactly one station such that precedence constraints are satisfied. One model minimizes the total cycle time, given the number of stations. Another minimizes the number of stations, given the maximum allowable cycle time. There are variants of this simple form.
Assignment polytope 
The assignment polytope is
$\{x \in \mathbb{R}^{n \times n} : x \ge 0,\ \textstyle\sum_i x_{ij} = 1 \text{ for all } j,\ \sum_j x_{ij} = 1 \text{ for all } i\}.$
The name originates from an interpretation of its extreme points in the assignment problem. If $x_{ij} = 1$, assign person $i$ to task $j$. Each extreme point has the property that each element of $x$ is 0 or 1. The sums describing the polytope require that each row ($i$) and each column ($j$) has exactly one 1 in it. This means every person is assigned to some task, and every task is assigned to be done by one person. The polytope is also the set of doubly stochastic matrices, whose extreme points are the permutation matrices.
Assignment problem 
Choose an assignment of people to jobs so as to minimize total cost. The ordinary model is that of a matching problem. Although the usual assignment problem is solvable in polynomial time (as a linear program), important extensions are the following NP-complete problems:
 Quadratic assignment (QAP). The objective is a quadratic form, which can have just cross terms, so it need not be convex (in fact, it can be quasiconcave, in which case a solution occurs at an extreme point). The QAP includes the traveling salesman problem as a special case (this is what is used with a neural network model of the TSP).
 Multidimensional assignment. More than 2 indexes; the variables are of the form $x_{ijk\cdots}$. The constraints sum over all but one index. For example, the 3-index assignment problem is
$\min \sum_{i,j,k} c_{ijk} x_{ijk} : \sum_{j,k} x_{ijk} = 1 \text{ for all } i,\ \sum_{i,k} x_{ijk} = 1 \text{ for all } j,\ \sum_{i,j} x_{ijk} = 1 \text{ for all } k,\ x_{ijk} \in \{0, 1\}.$
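For the ordinary (2-index) assignment problem, a brute-force Python sketch over permutations illustrates the model; practical methods use the Hungarian algorithm or the LP formulation instead (names are illustrative):

```python
from itertools import permutations

def assign_min_cost(cost):
    """Brute-force assignment: person i gets job perm[i]; minimize
    the total cost over all permutations (O(n!), illustration only)."""
    n = len(cost)
    best_perm, best_val = None, float("inf")
    for perm in permutations(range(n)):
        val = sum(cost[i][perm[i]] for i in range(n))
        if val < best_val:
            best_val, best_perm = val, perm
    return best_perm, best_val
```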
Asymptotic LP 
A linear program in which the coefficients are functions of a single parameter, say $t$, usually denoting time (some authors require the functions to be rational, i.e., of the form $p(t)/q(t)$, where $p$ and $q$ are polynomials). The problem is to find a steady state solution, i.e., one that is optimal (or nearly optimal) for all sufficiently large values of the time parameter.
Asymptotic stability 
The system is asymptotically stable in the large if for any initial . Originally developed for differential (and difference) equations, this applies to a sequence generated by an algorithm.
Atleast constraint 
is a global constraint that states the minimum number of occurrences of a particular value among an array/set of variables. Typically it is written as atleast($n$, $x$, $v$), which is true if value $v$ is assigned at least $n$ times among the variables $x$.
See also atmost constraint.
Atmost1 constraint 
is a global constraint stating that sets of known cardinality should intersect pairwise in at most one element: if $|X_i| = c_i$ for all $i$, then $|X_i \cap X_j| \le 1$ for all $i \ne j$.
See also atmost constraint.
Atmost constraint 
is a global constraint that states the maximum number of occurrences of a particular value among an array/set of variables. Typically it is written as atmost($n$, $x$, $v$), which is true if value $v$ is assigned at most $n$ times among the variables $x$.
See also the atmost1 constraint and the atleast constraint.
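The atleast and atmost counting constraints can be checked directly on a full assignment; a minimal Python sketch (names are illustrative):

```python
def atleast(n, variables, v):
    """atleast(n, x, v): value v is assigned at least n times among x."""
    return sum(1 for x in variables if x == v) >= n

def atmost(n, variables, v):
    """atmost(n, x, v): value v is assigned at most n times among x."""
    return sum(1 for x in variables if x == v) <= n
```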
Auction algorithm 
See bidding algorithm.
Augmented Lagrangian 
The Lagrangian augmented by a term that retains the stationarity properties of a solution to the original mathematical program but alters the Hessian in the subspace defined by the active set of constraints. The added term is sometimes called a penalty term, as it decreases the value of the augmented Lagrangian for $x$ off the surface defined by the active constraints. For example, the penalty term could be proportional to the squared norm of the active constraints:
$L_A(x, u) = L(x, u) + p \|g_A(x)\|^2,$
where $p$ is a parameter (negative for maximization), $g_A$ is the vector of active constraints (at $x$), and $L$ is the usual Lagrangian. In this case, suppose $(x^*, u^*)$ is a Kuhn-Tucker point. Then, $L_A(x^*, u^*) = L(x^*, u^*)$ (since $g_A(x^*) = 0$). Further, the gradient of the penalty term with respect to $x$ is $0$ at $x^*$, and the Hessian of the penalty term is simply $2p A^T A$, where $A$ is the matrix whose rows are the gradients of the active constraints. Since $A^T A$ is positive semidefinite, the penalty term has the effect of increasing the eigenvalues (decreasing them if maximizing).
Augmenting path 
In network flows, an augmenting path is a path from a source to a sink such that every forward arc has flow less than capacity and every backward arc has positive flow.
Automatic differentiation 
A process for evaluating derivatives of a function that depends only on an algorithmic specification of the function, such as a computer program. (The term "automatic" stems from having the derivatives produced when executing the program to evaluate the function.)
There are two modes:
 Forward  direct parse and evaluation, introducing intermediate variables in the usual way, as rules of differentiation are applied.
 Reverse  evaluates entries in the code list first, and lastly differentiates the intermediate and independent variables in reverse order.
While the forward mode is straightforward and easier to implement, the reverse mode is computationally superior for scalar functions, as its complexity is independent of the number of intermediate variables. More generally, if $f: \mathbb{R}^n \to \mathbb{R}^m$, the forward mode has complexity $O(n)$, and the reverse mode has complexity $O(m)$.
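The forward mode can be illustrated with dual numbers, which carry a value and a derivative through each arithmetic operation; a minimal Python sketch supporting addition and multiplication (class and function names are illustrative):

```python
class Dual:
    """Forward-mode AD via dual numbers: each Dual carries a value and
    a derivative, and the rules of differentiation are applied
    operation by operation (here just + and *)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = Dual._lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__

    def __mul__(self, other):
        o = Dual._lift(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f'(x) by seeding the derivative slot with 1."""
    return f(Dual(x, 1.0)).dot
```

For example, the derivative of $x^2 + 3x$ at $x = 2$ evaluates to $7$.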
BFGS method 
This is a method to solve an unconstrained nonlinear program that proceeds as follows.
 Start with any symmetric, negative definite matrix, say , and any initial point . Set
 Direction: Solve .
 Step size: .
 Change in position: .
 New point: .
 Change in gradient: .
 Update to form with the BFGS update.
 Let to complete the iteration.
Compare this with the DFP method, and note a key difference is that DFP estimates the inverse Hessian, whereas BFGS estimates the Hessian (and solves the linear system). There is a correspondence, and empirical evidence suggests that the BFGS method has less of a problem with error.
BFGS update 
A way to update an approximation of the Hessian, used for unconstrained optimization. The update is
where is the update from and and are defined in the BFGS method for the th iteration. Taking the inverse to compare with the DFP update,
where is the update from .
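A minimal Python sketch of the BFGS update for a Hessian approximation $B$, using the standard form $B_+ = B - \frac{Bss^TB}{s^TBs} + \frac{yy^T}{y^Ts}$ (names are illustrative; note the update satisfies the secant condition $B_+ s = y$):

```python
def bfgs_update(B, s, y):
    """BFGS update of a Hessian approximation B (nested lists), given
    the change in position s and the change in gradient y:
    B+ = B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s)."""
    n = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    sBs = sum(s[i] * Bs[i] for i in range(n))
    ys = sum(y[i] * s[i] for i in range(n))
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys
             for j in range(n)] for i in range(n)]
```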
BFS 
See basic feasible solution.
Backbone 
Originally, this was a set of propositions that must be true in every solution of a satisfiability problem. If there is no solution, one seeks a set of truth values that maximizes the number of clauses that are true. Then, the backbone is the set of propositions that must be true in every optimal solution. This has been extended to various combinatorial programs, for which the backbone is
$\{j : x_j = 1 \text{ for all } x \in X^*\},$
where $x$ is a vector of binary variables in the IP formulation and $X^*$ is usually the optimality region, although it could allow some ball around the objective value. Some results suggest that the difficulty of an NP-hard problem can be further analyzed by the size of its backbone.
Backjumping 
A backtracking algorithm that, in contrast to classic backtracking algorithms, records partial justifications for inferences that are made during the tree search. These justifications allow the search to soundly jump back several levels in the search tree in case of an inconsistency.
Backtracking 
An algorithm applied in tree search that builds up candidate solutions by making one decision at a time (i.e., branching, see branch and bound) and checks for feasibility at each decision. In case an infeasible state is reached, the algorithm "backtracks", returning to a previous decision and changing it.
In constraint programming a narrower definition is often used: backtracking is a recursive algorithm that traverses the search tree depth-first, such that at each step a variable is assigned a value (all possible values are assigned in turn) and then the partial assignment is checked for consistency, i.e., the algorithm checks whether all constraints whose variables are fully assigned are satisfied. If there is an inconsistency, the algorithm prunes the subtree and backtracks one step.
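The constraint-programming version can be sketched in Python: variables are assigned in index order, and each constraint is written to pass until all of its variables are assigned (names are illustrative):

```python
def backtrack(domains, constraints, assignment=None):
    """Depth-first backtracking search. Constraints are predicates on
    the partial assignment that must return True while their variables
    are not yet fully assigned."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        return dict(assignment)        # all variables assigned: a solution
    var = len(assignment)              # next unassigned variable
    for value in domains[var]:
        assignment[var] = value
        if all(c(assignment) for c in constraints):
            result = backtrack(domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]            # backtrack: undo the decision
    return None

def neq(i, j):
    """A disequality x_i != x_j that passes until both are assigned."""
    return lambda a: i not in a or j not in a or a[i] != a[j]
```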
Backward Transformation 
Backward Transformation (abbr. BTRAN). This applies to the factored system, , where each is an elementary matrix. The recursion is:
 ...
 ,
Backward substitution 
The recursion to solve a nonsingular upper triangular system, $Ux = b$. It starts with $x_n = b_n / U_{nn}$, then
$x_i = \left(b_i - \sum_{j > i} U_{ij} x_j\right) / U_{ii} \quad \text{for } i = n-1, \dots, 1.$
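A minimal Python sketch of the recursion (names are illustrative):

```python
def backward_substitution(U, b):
    """Solve Ux = b for a nonsingular upper-triangular U (nested lists).
    Starts with the last unknown and works upward."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x
```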
Backward triangularization 
An algorithm to rearrange a matrix by recursively assigning a singleton column to the rear (with its only row as its pivot). The recursion applies to the submatrix defined by omitting the assigned column and row (and reducing remaining column counts accordingly). This results in a lower-triangular rearrangement if and only if such a rearrangement exists. Otherwise, the algorithm reaches a submatrix with all columns having more than one element, of which one is selected to be a spike that is assigned to the rear of the matrix. The process then continues on the submatrix.
Example 1: Let the nonzeros be denoted by *:
The column counts are . We thus put column at the rear, assigned to pivot on row , and consider the remaining submatrix:
The remaining column counts (for ) are , so column is placed at the (new) rear, pivoting on row :
The remaining column counts are , so we reach the final rearrangement, which is triangular:
Example 2: Column counts are shown below the matrix.
All column counts are greater than one, so we select a column to be a spike, and we assign it to the rear with a pivot row (some algorithms delay the pivot row assignment). For this example, we choose a column with minimum count and pick the pivot row with maximum count.
The final rearrangement () has one spike.
Balance equation 
A constraint of the form , where is the input and is the output, each as a function of the decision variables, . Typically this is linear, conserving flows, measured in units of volume or mass. In an ordinary network in which there are no gains or losses, this appears as:
Barrier function 
$P$ is a barrier function on the strict interior, say $S^0$, of the set $S$ if $P$ is continuous and $P(x) \to \infty$ as $x$ approaches any point in $S \setminus S^0$. Barrier functions arise in penalty function methods for certain classes that include convex programs of the form:
$\max\{f(x) : g(x) \ge 0\},$
where $S = \{x : g(x) \ge 0\}$. Two popular choices are
$P(x) = -\sum_i \ln g_i(x)$
and
$P(x) = \sum_i 1 / g_i(x).$
Barycenter 
This is the center of mass and is the point at which a mass (body) can be considered as being concentrated without altering the effect of the attraction that the earth has upon it. This is the average position of particles located at positions $x^i$ weighted by their masses $m_i$:
$\bar{x} = \frac{\sum_i m_i x^i}{\sum_i m_i}.$
In mathematical programming, this is usually used to mean the simple average of the extreme points of a polytope,
$\bar{x} = \frac{1}{|E|} \sum_{x \in E} x,$
where $E$ is the set of extreme points. This could be a weighted average,
$\bar{x} = \sum_{x \in E} w_x x,$
where $w \ge 0$ and $\sum_{x \in E} w_x = 1$.
Basic 
Associated with a submatrix of , say , whose columns comprise a basis for (i.e., consists of linearly independent columns of , which is a basis for ).
Here are some related terms that arise in linear programming.
 Adjacent basis. One that differs in exactly one column from a given basis.
 Basic column. A column of the basis matrix.
 Basic variable. The variable, say , associated with the th column of the basis matrix.
 Basic level. The value of a basic variable.
 Basic solution. The solution, , obtained by setting nonbasic values to some bound value, like 0, resulting in a unique solution for the basic variables. That is, is equivalent to , where and . Upon fixing the value of , the nonsingularity of gives the basic solution with . In a standard linear program, , and hence .
 Basic feasible solution. A basic solution that is feasible i.e., the basic values satisfy their bounds. (In a standard LP, this means .)
 Basis kernel. After performing forward triangularization, if the basis does not triangularize completely, backward triangularization is applied. The result is a (rearranged) blocking of the basis into three segments:
(Schematically: a forward triangle at the front, the kernel block in the middle, and a backward triangle at the rear.)
Each row and column in the kernel has at least 2 nonzeroes.
Basic solution 
See terms associated with being basic.
Benders decomposition 
This decomposition separates complicating variables. The (original) semilinear model is:
Generalized Benders decomposition considers the following mathematical program:
Suppose this is not a convex program, but that with each fixed , the remaining mathematical program (with being the only variable) is convex. The idea is to consider
and solve , with evaluations of requiring the solution to the convex program.
Biconcave function 
A function whose negative is biconvex.
Biconvex function 
A function $f$ on $X \times Y$ that is convex on $X$ for each $y \in Y$ and convex on $Y$ for each $x \in X$. An example is $f(x, y) = xy$ on $\mathbb{R} \times \mathbb{R}$, which is not convex in $(x, y)$ jointly.
Biconvex program 
A nonlinear program with each function biconvex or biconcave, such that if or is fixed, the resulting mathematical program in or , resp., is a convex program. A special case is the bilinear program.
Bidding algorithm 
Also called an auction algorithm. This proceeds by generating a sequence of partial solutions (e.g., a partial assignment), terminating when the solution is complete. There is a pricing vector and a bidding increment whereby the solution is revised and extended by a myopic optimization scheme (which depends on the particular problem).
Big-M method 
An alternative to using Phase I, a large number, $M$, is used as a linear penalty in the composite objective:
$\min\{cx + M \textstyle\sum_i v_i\},$
where $v$ is the vector of artificial variables and $M > 0$ is large.
Bilevel program 
A multilevel program with two levels (sets of variables).
Bilinear function 
A special quadratic function, $f(x, y)$, that is linear in $x$ for each $y$, and linear in $y$ for each $x$.
Bilinear program 
A nonlinear program whose objective and/or constraint functions are bilinear. An example is the pooling problem.
Bin packing problem 
Partition a set of integers into bins such that the sum of all integers in each bin does not exceed the bin capacity, say $C$, so as to minimize the number of bins required. The problem is NP-hard.
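Since the problem is NP-hard, heuristics are common in practice; a Python sketch of the well-known first-fit decreasing heuristic (not exact; names are illustrative):

```python
def first_fit_decreasing(items, capacity):
    """First-fit decreasing heuristic for bin packing: sort items in
    decreasing size, place each into the first bin with room, and open
    a new bin when none fits. A heuristic, not an exact method."""
    bins = []
    for item in sorted(items, reverse=True):
        for b in bins:
            if sum(b) + item <= capacity:
                b.append(item)
                break
        else:
            bins.append([item])
    return bins
```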
Binary CSP 
A constraint satisfaction problem is binary if all constraints are either unary or binary relations (with respect to the decision variables). That is, the cardinality of the scope of the constraint is either 1 or 2. For example, $x_1 \le 5$ (unary) or $x_1 \ne x_2$ (binary).
Binary constraint 
A constraint is binary if there are exactly two variables in its scope.
Binary relation 
A subset of $S \times T$, say $R$. When $T = S$, we refer to just the relation $R$ on the set $S$.
Here are 5 properties that distinguish relations on a set ($x$, $y$, $z$ are in $S$):
 antisymmetric: $xRy$ and $yRx$ imply $x = y$.
 irreflexive: $xRx$ does not hold for any $x$.
 reflexive: $xRx$ holds for all $x$.
 symmetric: $xRy$ implies $yRx$.
 transitive: $xRy$ and $yRz$ imply $xRz$.
Binary variable 
A decision variable with two feasible values, usually 0 or 1.
Binding constraint 
A constraint whose removal changes the optimality region. (Some authors require that the removal results in a strict improvement in the objective value.)
Bisection 
Taking the midpoint of endpoints known to contain a solution. If the solution is in the interval (a,b), bisection chooses x = (a+b)/2 as the next point to evaluate. The result reduces the interval to (a,x) or (x,b), so the length is cut in half with each evaluation.
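A minimal Python sketch for root finding by bisection, assuming the function changes sign on the initial interval (names are illustrative):

```python
def bisect_root(f, a, b, tol=1e-10):
    """Bisection: assumes f(a) and f(b) have opposite signs, so a root
    lies in (a, b); the interval length is halved at each evaluation."""
    fa = f(a)
    while b - a > tol:
        x = (a + b) / 2.0
        fx = f(x)
        if fx == 0:
            return x
        if (fa < 0) != (fx < 0):   # sign change in (a, x): keep that half
            b = x
        else:                      # otherwise the root is in (x, b)
            a, fa = x, fx
    return (a + b) / 2.0
```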
Bland rule 
This is for pivot selection in the simplex method to avoid cycling:
 If more than one (nonbasic) column has a negative (for minimization) reduced cost, choose the one with lowest index.
 If more than one (basic) column has the same determining value to leave the basis, select the one with the lowest index.
Beale's example of cycling shows that Bland's rule must apply to all candidates for entering the basis, not just those with the most negative reduced cost; see the supplement on cycling. The fifth tableau is the first time Bland's rule is needed to break the cycle. Variable is chosen to enter the basis rather than .
Blending problem 
A blend is some combination of materials to make another material. Given raw materials, their blends make intermediate materials, called stock, and/or final materials, called products (there can be other blending activities that make products from stocks). The raw materials have purchase costs and the blending activities have costs of operation and maintenance. The products have either fixed demands or selling prices, and the problem is to find blends that minimize total net cost (or maximize profit). This arises in refinery operations, in the food industry, and in other process industries. The problem can sometimes be solved with linear programming, but there can be complicating relations that require nonlinear programming, such as the pooling problem.
Block pivot 
Given a detached coefficient form, a pivot is performed on a nonsingular submatrix of nonbasic variables (rather than just one element). Suppose the detached coefficient form is partitioned into 2 row blocks and 5 column blocks (with the last column corresponding to the right-hand side):
Blocking polyhedron 
The blocking polyhedron of a polyhedron $P$ is $\{y \ge 0 : yx \ge 1 \text{ for all } x \in P\}$.
Bolzano-Weierstrass theorem 
A fundamental existence theorem guaranteeing a maximum and a minimum.
A continuous function on a nonempty compact set achieves a minimum and a maximum over the set.
A useful corollary to the Bolzano-Weierstrass theorem is:
Suppose $f$ is continuous on its nonempty feasible region, $F$, and that $F$ is a closed set. Then, $f$ achieves a maximum on $F$ if there exists $x^0 \in F$ for which the upper level set $\{x \in F : f(x) \ge f(x^0)\}$ is bounded.
In particular, if we have a quadratic objective of the form , it is sufficient for to be negative definite. Although often credited to Weierstrass alone, it was proven by Bolzano in 1817, and it was known to Cauchy near the end of the 19th century.
Bordered Hessian 
The complete Hessian of the Lagrangian:
$\begin{bmatrix} 0 & \nabla g(x) \\ \nabla g(x)^T & \nabla_x^2 L(x, u) \end{bmatrix}.$
Bottleneck TSP problem 
The problem asks if there is a traveling salesman problem tour whose value is less than some given value.
Bottleneck assignment problem 
This is the assignment problem, where the "cost", , is the rate at which person can perform job . The objective is to maximize the rate of production of the entire assembly line.
Bottleneck transportation problem 
This is the transportation problem with a traversal time instead of arc cost, and the objective is to minimize the time it takes to satisfy the demands:
$\min \max\{t_{ij} : x_{ij} > 0\},$
where $x_{ij}$ is the flow from source $i$ to destination $j$, and $t_{ij}$ is the time it takes to ship across arc $(i, j)$ (independent of the quantity shipped).
Bounds consistency 
A consistency property for variables over integer domains where consistency is only defined on the minimum and maximum values of the integer domain.
Box constraint 
Simple bounds of the form $L \le x \le U$.
Braess paradox 
The phenomenon that in a traffic network in which travel times depend on link (arc) flows the addition of a link (arc) can increase travel time for every user if all users minimize their own travel time. Hence, the paradox is that building a new road can make traffic congestion worse. See Myths and Counter Examples in Mathematical Programming for an example.
Branch and bound 
Born in integer programming (IP), branch and bound (B&B) is more generally used in global optimization, including continuous functions. A prototype is as follows.
 Initialize set list with feasible region ( plus all constraints), say , with associated bounds , (finite bounds on the objective can be computed, if computationally preferable, and the initial set could be a superset of the feasible region). Initialize equal to the lower bound (, unless an initial feasible solution is known).
 Branch by selecting the th set from list, say , and splitting into a partition, say such that the interior of does not intersect the interior of . (Could partition into more than 2 subsets.)
 Bound in and/or in , say for all , for and/or . Define if it is determined that is empty (in which case will be discarded in the next step).
 Test if , for and/or . If not, put on list. (If the test passes, is discarded by not being put on the list. Note: is best solution value to date, and is a tolerance that says we want to get a solution whose objective value is within of the optimal value.) Further, if and the associated point, , is feasible (where ), replace with and with . If the list is not empty, go to branch.
B&B terminates when the list becomes empty, and the best feasible solution obtained is with value , which is within of optimality. (If , then there is no feasible solution.) Notes:
 A particular bounding rule for IP is to solve an LP relaxation, whose value is an upper bound on the integer-valued maximum. The lower bounds do not change unless an integer solution is obtained from the LP, in which case the best solution to date can change.
 Branching rules can range from breadth-first, i.e. expand all possible branches from a node in the search tree before going deeper into the tree, to depth-first, i.e. expand the deepest node first (and backtrack, as necessary). For example, suppose we branch on the value of a binary variable x, which could be 0 or 1. The breadth-first rule would solve the LPs for each of the two cases. The depth-first rule would solve for x=0 and postpone the complementary case, to consider another variable dichotomy, going deeper into the search tree. To illustrate, the following are search progressions in the tree.
Breadth-first: solve (x=0) and (x=1) first, then expand each with (y=0) and (y=1), visiting all nodes at one depth before going deeper.
Depth-first: solve (x=0), then go directly deeper to (y=0), backtracking to (y=1) and, only later, to the (x=1) branch.
 A bestfirst search is to select the node (i.e., set on list) that seems most promising, such as one with the greatest upper bound. Typically, there are multiple criteria for branching, and these change depending on whether a feasible solution has been obtained.
 Backtracking need not go in exactly the reverse order (i.e., need not have a LIFO rule), but care must be taken when reordering the path.
 When the problem is not finite (not IP), branching rules need to consider the hypervolume of each set in a partition to ensure convergence.
 The B&B methods we consider in OR are special cases of heuristic search in Artificial Intelligence (AI).
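The prototype above can be sketched for a tiny 0/1 knapsack, using the LP (fractional) relaxation as the upper bound and a depth-first branching rule; all item data, names, and the bounding function are illustrative, not part of the glossary entry:

```python
# Depth-first branch and bound for a tiny 0/1 knapsack (illustrative data).
# Upper bound at each node: the LP relaxation (greedy fractional fill).

values = [60, 100, 120]   # hypothetical item values
weights = [10, 20, 30]    # hypothetical item weights
capacity = 50

def lp_bound(i, value, room):
    """Upper bound: fill remaining capacity fractionally, best ratio first."""
    items = sorted(range(i, len(values)),
                   key=lambda j: values[j] / weights[j], reverse=True)
    bound = value
    for j in items:
        take = min(weights[j], room)
        bound += values[j] * take / weights[j]
        room -= take
        if room == 0:
            break
    return bound

best = 0  # incumbent value (v* in the prototype)

def branch(i, value, room):
    global best
    if value > best:
        best = value            # new incumbent
    if i == len(values) or lp_bound(i, value, room) <= best:
        return                  # fathom: no items left, or bound test fails
    if weights[i] <= room:      # branch on x_i = 1 first (depth-first)
        branch(i + 1, value + values[i], room - weights[i])
    branch(i + 1, value, room)  # then the complementary case x_i = 0

branch(0, 0, capacity)          # best ends at 220 (items 2 and 3)
```

The depth-first rule appears in the order of the two recursive calls; swapping the bookkeeping to a node list would give the breadth-first or best-first variants.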
Branch and cut 
A hybrid of Branch and bound and cutting plane methods.
Branch and price 
The price operation is the same as column generation.
Branching heuristic 
See branching rule.
Branching rule 
A branching rule (or branching heuristic) specifies how a search tree is constructed during search. More specifically, it is a procedure or set of conditions for choosing the next variable to be branched on. It is part of the search strategy.
In constraint programming a branching rule is commonly referred to as a variable ordering heuristic.
Breakpoint 
A point where some property of a function changes, such as its slope. For example, a piecewise linear, univariate function with breakpoints has the form:
Broyden-Fletcher-Goldfarb-Shanno method 
Broyden-Fletcher-Goldfarb-Shanno update 
See BFGS update.
Broyden family 
This is a family of algorithms to solve an unconstrained nonlinear program whose objective function is twice continuously differentiable. The algorithms in this family differ by how they update the approximation of the hessian inverse (assumed to exist). These are of the form:
with the two matrices being the Broyden-Fletcher-Goldfarb-Shanno update (BFGS) and the Davidon-Fletcher-Powell update (DFP), respectively. If is in , this is called the Broyden convex family.
Bundle method 
A class of nonsmooth optimization methods for the numerical minimization of a convex function, . At the th iteration the technique uses the available information contained in a bundle that is obtained from an oracle; here each is a subgradient of at .
Important applications are Lagrangian relaxation and column generation applied to a primal problem
In this context, for a given ,
results from the maximization of the Lagrange function; each is , with maximizing the Lagrangian at . Minimizing is the dual problem.
If minimizing , a basic method to construct the next iterate uses the cutting-plane paradigm: one minimizes the piecewise-linear function
which can be viewed as a model of (in fact ). Minimizing is a linear program, which is called the restricted master program in the context of column generation. One of the deficiencies of this method is instability, i.e., the sequence usually oscillates uncontrollably. Several possibilities exist to stabilize the process; one, adopted in the bundle approach, replaces the minimization of by the quadratic program
Here the positive is a stability parameter that is usually adjusted at each iteration, and the stability center is
some () taken from the bundle, typically one that gave the best value.
CPLEX 
A collection of mathematical programming software solvers.
CSP 
See constraint satisfaction problem.
Calculus of variations 
See variational calculus.
Capacity expansion 
This is an augmentation to any mathematical program that has constraints representing a capacity limit. Suppose represents the limitation of using more than units of capacity, where is the actual amount used for policy . Then, this becomes a capacity expansion model by replacing the constraint with
Capital budgeting problem 
In its elementary form there is a fixed amount of capital, say , that can be allocated to any of investments. Each investment has a minimum level, say , and a maximum level, say . The expected return on investment is a function, , where is the level of the th investment opportunity (). Risk is measured by a standard deviation from the expected return, say . The problem is to maximize total expected return, subject to a budget constraint: , and a risk constraint: , where and are parameters. The returns on the investments could be correlated. Then, if is the variancecovariance matrix, the risk constraint is quadratic: . (Also see the portfolio selection problem.)
Caratheodory conditions 
For the classical Lagrange form, , where and are smooth, the following conditions are necessary for a feasible to be optimal: there exists , called multipliers, such that
This reduces to the Lagrange Multiplier Rule when is not zero (divide by ), which must be the case if has full row rank.
Caterer problem 
A caterer has booked his services for the next days. He requires fresh napkins on the tth day, . He sends his soiled napkins to the laundry, which has 3 speeds of service: , or days. The faster the service, the higher the cost, , of laundering a napkin. He can also purchase new napkins at a cost, . With an initial stock of napkins, the caterer wishes to minimize his total cost. (This can be formulated as a transportation problem.)
Cauchy-Schwarz inequality 
This is the inequality |x'y| <= ||x|| ||y||,
where ||.|| is the Euclidean norm. (Note: for nonzero x and y, x'y/(||x|| ||y||) is the cosine of the angle between the vectors x and y.)
Ceiling 
The integer round-up of a value, x: the least integer that is not less than x.
Examples: ceiling(2.1) = 3, ceiling(-2.1) = -2, ceiling(5) = 5. Also, see floor.
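In code these are standard library functions; a quick check of the round-up and round-down behavior (the sample values are my own):

```python
import math

# Ceiling rounds up to the nearest integer; floor rounds down.
# Note the behavior on negative values: ceil(-2.1) = -2, floor(-2.1) = -3.
ceil_examples = [math.ceil(2.1), math.ceil(-2.1), math.ceil(3.0)]
floor_examples = [math.floor(2.1), math.floor(-2.1), math.floor(3.0)]
```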
Central path 
A differentiable curve where each point is the analytic center of a polyhedron associated with a linear program. Specifically, suppose we have the primal-dual pair:
Let be the primal slack. Then, the primal central path is:
Certainty equivalent 
See stochastic program.
Certificate 
A certificate of optimality is a set of conditions that certify some feasible solution, , is optimal. One example comes from duality: let be feasible in any dual with objective value . Then, is a certificate of optimality. In particular, if we have a linear program:
a certificate that a feasible is optimal is the set of dual conditions:
(Note the general use of duality does not require any convexity assumption, and the dual can be weak.)
A certificate of unboundedness is a collection of conditions that certify a mathematical program is unbounded. An example from duality is that a primal feasible solution is found and the dual is determined to be infeasible. In particular, if we have a linear program:
a certificate that this is unbounded is the existence of a feasible x and the determination that implies a contradiction.
A certificate of infeasibility is a set of conditions that certify a mathematical program is infeasible. One class comes from duality: a dual sequence is found whose objective diverges. In particular, if we have a linear program:
then a certificate that this is infeasible is the existence of a sequence such that for all and as .
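The optimality certificate is easy to check mechanically. For a made-up LP of the form max c'x s.t. Ax <= b, x >= 0 (with dual min b'y s.t. A'y >= c, y >= 0), equal primal and dual objective values at feasible points certify optimality; the data below are illustrative:

```python
# Weak-duality certificate check for a made-up LP:
#   max c.x  s.t.  A x <= b, x >= 0   (dual:  min b.y  s.t.  A'y >= c, y >= 0)
# If x is primal feasible, y is dual feasible, and c.x == b.y,
# then x is optimal (and y is optimal for the dual).

c = [3, 2]
A = [[1, 1],
     [1, 0]]
b = [4, 2]

x = [2, 2]   # candidate primal solution
y = [2, 1]   # candidate dual solution

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

primal_feasible = (all(dot(row, x) <= bi for row, bi in zip(A, b))
                   and all(xi >= 0 for xi in x))
dual_feasible = (all(dot([A[i][j] for i in range(len(A))], y) >= cj
                     for j, cj in enumerate(c))
                 and all(yi >= 0 for yi in y))
certificate = primal_feasible and dual_feasible and dot(c, x) == dot(b, y)
```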
Chance constraint 
See stochastic program.
Character of solution 
The idea is to define character by solution status. In linear [integer] programming, columns might be partitioned into
 basic vs. nonbasic at lower bound vs. nonbasic at upper bound
 positive in some optimal solution vs. zero in every optimal solution
Rows might be partitioned into
 inactive vs. active
 nonbinding vs. binding
 positive dual price in some optimal solution vs. zero dual price in every optimal solution
Only some rows and columns might be partitioned, and others are free to change. The character is the property we want to fix by the partition. For example, if the LP has activities for operating a plant, we might want to specify the plant operation as a character of the solution, never mind specific levels (or maybe some minimal throughput is specified).
Then, one conducts sensitivity analysis by asking for preservation of the character. Classical LP sensitivity analysis does this by preserving the optimality of the basis; interior point analysis seeks to preserve the optimality of the partition. Both are special cases, and one might want to focus on only a portion of the LP. Here are some properties of interest:
 If the character holds at [c', b'] and at [c", b"], it holds for all [c, b] in their line segment.
 The stability region is a convex set.
This notion of character extends naturally to combinatorial optimization, such as preserving a key portion of a TSP tour or joborder in a schedule.
Characteristic cone 
The characteristic cone of a polyhedron is
If , then its characteristic cone is , which is the same as the recession cone.
Chemical equilibrium problem 
The problem is
Chinese postman problem 
A mailman starts from the post office, delivers the mail on his route, covering each street at least once, then returns to the post office. The problem is to find a traversal such that the total distance travelled is minimized. (Some streets are oneway, and the postman is driving.) Abstractly, the problem is to find a leastcost cycle in a graph that uses each edge and arc (forward direction only) at least once. This differs from the Traveling Salesman Problem in that the postman can traverse an edge more than once. A related problem is to determine whether the postman can cover his route within a prescribed distance (while starting and ending at his post office). If so, the cycle is produced. This is computationally equivalent to the minimization problem (in a complexity sense), and is sometimes taken as the problem definition.
Chinese remainder theorem 
Let be relatively prime positive integers, and let be any integers. Then, there exists such that for each . The value of is uniquely determined by and .
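A constructive sketch in Python (the function name and sample data are mine; the three-argument pow for modular inverses needs Python 3.8+):

```python
# Chinese remainder theorem via iterative construction.
# The moduli must be pairwise relatively prime.

def crt(residues, moduli):
    """Return the unique x in [0, prod(moduli)) with x = r_i (mod m_i)."""
    x, m = 0, 1
    for r, mi in zip(residues, moduli):
        # Solve x + m*t = r (mod mi) for t, using the inverse of m mod mi.
        t = ((r - x) * pow(m, -1, mi)) % mi
        x += m * t
        m *= mi
    return x

sol = crt([2, 3, 2], [3, 5, 7])   # classic example: x = 23 (mod 105)
```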
Cholesky factorization 
Given a symmetric matrix A, a lower triangular matrix, L, is obtained, called the Cholesky factor, such that A = LL'. This is particularly useful for solving linear systems, Ax = b, by using forward substitution, Ly = b, then backward substitution, L'x = y. The original algorithm assumes A is positive definite, but it can apply more generally. This is also called Cholesky decomposition.
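A bare-bones sketch of the factor-and-substitute procedure just described (assuming the matrix is symmetric positive definite; no pivoting or error checks, and the sample system is made up):

```python
# Cholesky factorization A = L L' for symmetric positive definite A,
# then solving A x = b by forward substitution (L y = b) and
# backward substitution (L' x = y).

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (A[i][i] - s) ** 0.5
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve(A, b):
    L = cholesky(A)
    n = len(b)
    y = [0.0] * n
    for i in range(n):               # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):     # backward substitution: L' x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

x = solve([[4.0, 2.0], [2.0, 3.0]], [2.0, 3.0])   # solution is [0, 1]
```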
Chvatal cut 
A cutting plane of the form,
where is a Chvatal function.
Chvatal function 
This class is defined recursively:
 is a Chvatal function (of rank 0) if it is a linear function.
 If and are Chvatal functions, then is a Chvatal function for any nonnegative and , and its rank is . (Some require and to be rational.)
 If is a Chvatal function of rank , then is a Chvatal function, and its rank is if is not equal to .
This arises in integer linear programming in several ways, see Gomory function.
Closed form solution 
This is where one could express a solution (e.g., optimal) by an equation of the form x* = F(p), where F is some function of the parameters, p. This is typically meant to suggest that no algorithm is needed, but one must be careful since F is not limited in any way, such as being "easy" to evaluate. For example, under mild assumptions, the coordinates of a global maximum of a function on its domain are given by the following integral:
(The idea is to weight "mass" at the global maximum, and it is valid if the set of global maxima is convex.) This is not too useful, and the integral would typically have to be evaluated with a numerical method – i.e., an algorithm.
Closed function 
In convex analysis, this is where its epigraph is a closed set. (If f is concave, it is its hypograph that must be closed.)
Closed map 
Let be a pointtoset map and suppose and are such that for all . Then, is closed at if . The map is closed on a subset of , say , if is closed at each in .
Closed set 
A set is closed if it contains all its limit points, i.e. if and each is in , then is in . By the logic of the definition, an empty set is closed (it is also open). A finite number of intersections of closed sets is a closed set, so the feasibility region of a mathematical program in standard form is closed if each constraint function is continuous. ( can be only upper semicontinuous for the constraint, .) An example of a set that is not closed is: . The point 0 can be approached by a sequence, say , but 0 is not in the set. The closure of a set is the union of the set and all of its limit points.
Closure condition 
The condition that the closure of a nonempty strict interior be equal to the level set: . Here is an example where the closure condition fails:
Note that is continuous and quasiconvex. Its level set is , but the strict interior, is , so its closure is only . We lose the flat portion, .
This is important in the use of interior point methods and in stability. Here are some functions that satisfy the closure condition:
 explicitly quasiconvex
 positively homogeneous (all degrees positive or all negative)
 monotonic (all strictly increasing or all strictly decreasing)
When equality constraints are present, there are two forms of extension of the closure condition: to consider the relative strict interior, and to consider "feasible sequences". The first generally assumes is affine, so the closure condition becomes:
and
Then, the closure condition is that and are not empty, and
and
Coercive function 
The function is coercive with respect to if there exists a vector such that for any sequence having the property that , we also have
Some people use a different definition, where (i.e., a scalar, realvalued function):
Column generation 
This pertains to solving a linear program whose columns are generated during pricing. Typically, the number of columns is astronomically large, possibly infinite. An example is when solving the randomized program, as with the Generalized Lagrange Multiplier method. In that case, column generation consists of maximizing the Lagrangian. A similar view applies to Dantzig-Wolfe decomposition. From the dual view, this is a cutting plane method since generating a column in the primal corresponds to generating a constraint in the dual.
The concept applies to any mathematical program, and the randomized program model highlights the duality and how a completely general mathematical program can be considered with (generalized) linear programming.
Combinatorial program 
Let and consider a finite collection of subsets, say . For each subset there is an objective function value, , and the problem is to optimize . Typically, the feasible subsets are represented by inclusion or exclusion of members such that they satisfy certain conditions. This then becomes a special class of integer programs whose decision variables are binary valued: if the ith element is in ; otherwise, . (IP formulations are not always easy, and often there is more than one formulation, some better than others.)
 Here are some examples:
 assignment
 covering, cutting stock
 knapsack
 matching
 packing, partitioning
 routing
 sequencing, scheduling (jobs), shortest path, spanning tree
 traveling salesman
Also see integer program.
Compact formulation 
In integer programming, this refers to having a polynomial number of constraints. For example, look at the traveling salesman formulations. The linear form has an exponential number of "subtour elimination constraints," so it is not compact. The quadratic assignment formulation is compact.
Compact set 
This has a general meaning in mathematical analysis. For mathematical programming, where we work in finite-dimensional Euclidean vector spaces, it means the set is closed and bounded. This is often an assumption about the feasibility region in order to ensure the existence of an optimum value for a continuous objective function – see Weierstrass' theorem. Any finite set is compact. An open interval is not compact because its endpoints are limit points but are excluded from the set.
Compatibility theory 
The idea that a solution's character does not change for a particular perturbation. In linear programming the character could be an optimal basis, and the theory is concerned with whether a particular basis remains optimal when the data is changed in a prescribed direction. A Fundamental Theorem of Basis Compatibility is the following:
is an admissible direction for perturbing if, and only if, it is compatible with some equilibrium basis.
The range of compatibility of a basis, , for a direction, , is the greatest step for which remains optimal:
The basis spectrum is the greatest range:
Complementarity condition 
Satisfying complementary slackness.
Complementarity problem 
Let . The complementarity problem (CP) is to find such that and . It is complementary because every solution has the property that either or (or both) for each j. The linear complementarity problem (LCP) is when .
The problem generalizes to allow bounds so that . Then, is required to satisfy:

if
if
if .
Complementary slackness 
The condition that two nonnegative vectors are orthogonal. It arises in the Kuhn-Tucker conditions, where the vector of Lagrange multipliers must be orthogonal to the vector of (inequality) constraint functional values. This means, for each constraint, either its multiplier or its constraint value is zero – that is, if a constraint is not active, its Lagrange multiplier must be zero.
Complementary variables 
If and are the two vectors that satisfy complementary slackness, each pair, and are called complementary variables. In linear programming, primal levels and dual prices are complementary variables.
Complexity 
A measure of computer time or space to solve a problem by an algorithm as a function of the problem's dimensions. Suppose T(n) is the time it takes to solve an instance of the problem with dimension n. Then, the algorithm has (worst-case) time complexity T, if the greatest time it could take to solve an instance of dimension n is T(n). When T is a polynomial, we say the algorithm has polynomial time complexity. The Klee-Minty polytope shows that the elementary simplex method does not have polynomial time complexity.
The average time complexity is the average (rather than worst) time of an algorithm (or class of algorithms) over some class of problems, for which some probability distribution is assumed. Absent a modifier, the complexity of an algorithm is taken to mean its worstcase time complexity.
Whereas complexity refers to the performance of an algorithm, see the notion of NP-completeness for the related meaning of problem complexity. The standard notation to describe complexity in terms of problem dimensions, say n, is O(g(n)). This "big-O" notation means the following: a function, f, is O(g) if there exist a constant, K, and an N, such that f(n) <= K g(n) for all n >= N. For example, if an algorithm requires K n^2 fundamental operations on a problem of size n, its time complexity is O(n^2). See the supplement on complexity for more information.
Complicating variables 
Those variables which, when fixed at particular values, render the remaining mathematical program relatively easy to solve. One example is the use of binary variables for fixed charges in an otherwise linear program. More generally, see Benders' decomposition.
Component 
In a graph (or network), this is a maximal connected subgraph. (The number of components is 1 if the graph is connected.)
Composite concave program 
A global optimization problem whose objective is a composite concave function.
Composite function 
A function of a function, of the form f(g(x)).
Concave function 
The real valued function is concave if its domain, say , is a convex set, and for and :
Condition number 
This is when is nonsingular and is some matrix norm. This arises in convergence analysis, where is the hessian. Whenever is symmetric and positive definite (as the hessian of a strictly convex function), and the matrix norm is that induced by the Euclidean norm (i.e., ), the condition number is the ratio of the extreme values of the eigenvalues. It is often used to measure the convergence rate of an ascent algorithm, due to Kantorovich's inequality. (Unfortunately, it is erroneously used to measure the goodness of an algorithm – see Myths and Counterexamples in Mathematical Programming.)
Cone 
A set, , with the property that if , then , for all positive real . A convex cone is a cone that is also a convex set. Equivalently, is a convex cone if . (An example of a cone that is not convex is the union of the axes.) A polyhedral cone is a cone that is also a polyhedron; equivalently, is a polyhedral cone if there exists a matrix such that . An example of a cone that is not polyhedral is .
A quadratic cone is of the form , where is any (real) matrix. If is negative semidefinite, the cone is all of space. If is positive definite, the cone is just the origin, . So, quadratic cones usually arise when is indefinite. Example: . See each of the following special cones:
Cone of optimality 
Given an LP in standard form, let be an optimal basis (for which the dual is feasible). Then, the cone of optimality of is . It is so called because remains an optimal basis for any in this cone.
Conic program 
The standard form is
where is a cone (not necessarily convex). This is more general than it looks. Suppose is any nonempty set, and we have the mathematical program:
Then, defining , we have that the above problem is equivalent to the conic program:
This problem class also includes the semidefinite program, as could be a matrix.
Conjugate directions 
Direction vectors chosen in an algorithm that are conjugate with previous directions, with respect to the hessian of the objective. For the unconstrained quadratic, (where is symmetric), direction vector is conjugate if for all previous direction vectors, .
Conjugate duality 
See the supplement on particular duals.
Conjugate function 
The convex conjugate of on , denoted on , is the greatest convex approximation from below:
and , i.e. is the effective domain of ). The concave conjugate of on , denoted on , is the least concave approximation from above:
and . This is a foundation for Lagrangian duality, viewed in response space.
Conjugate gradient method 
An ascent method for unconstrained optimization such that successive directions are conjugate with respect to the hessian for a quadratic objective function.
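A sketch of the linear conjugate gradient method for the quadratic objective (1/2)x'Qx - b'x, equivalently the system Qx = b; the data and helper names are made up, and Q is assumed symmetric positive definite:

```python
# Linear conjugate gradient: minimize (1/2)x'Qx - b'x, i.e. solve Qx = b,
# for symmetric positive definite Q. Successive directions are Q-conjugate.

def conjugate_gradient(Q, b, x0, iters):
    def matvec(M, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))
    x = list(x0)
    r = [bi - qxi for bi, qxi in zip(b, matvec(Q, x))]   # residual = -gradient
    d = list(r)                                          # first search direction
    for _ in range(iters):
        rr = dot(r, r)
        if rr < 1e-20:
            break                                        # already at the minimizer
        Qd = matvec(Q, d)
        alpha = rr / dot(d, Qd)                          # exact line search step
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * qdi for ri, qdi in zip(r, Qd)]
        beta = dot(r, r) / rr                            # keeps new d Q-conjugate
        d = [ri + beta * di for ri, di in zip(r, d)]
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0], 2)
```

For an n-by-n system, exact arithmetic reaches the minimizer in at most n iterations; here n = 2.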
Conjugate vectors 
With respect to a symmetric matrix, , and are conjugate if . (In particular, if , and are conjugate if they are orthogonal.)
Conjunctive normal form 
A logical expression that is written as a conjunction,
where is the conjunction (logical and). Each is called a clause and has the form
where is the disjunction (logical inclusive or) and each is a literal – a proposition or its negation.
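A CNF formula is easy to represent and evaluate programmatically; in this illustrative encoding (my own convention, not from the glossary), literal k denotes the kth proposition and -k its negation:

```python
# A CNF formula as a list of clauses; each clause is a list of literals.
# Literal k means variable k, literal -k means its negation.
# An assignment maps variable index -> bool. The formula holds when
# every clause contains at least one true literal.

def evaluate_cnf(clauses, assignment):
    def literal(lit):
        value = assignment[abs(lit)]
        return value if lit > 0 else not value
    return all(any(literal(l) for l in clause) for clause in clauses)

# (x1 or not x2) and (x2 or x3)
formula = [[1, -2], [2, 3]]
sat = evaluate_cnf(formula, {1: True, 2: False, 3: True})   # satisfied
```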
Connected network 
In the undirected case, a graph is connected if, for any pair of nodes, there is a path that contains both of them. In a directed graph, or network, the path may be required to be directed (i.e., follow the orientations of the arcs), in which case the network is strongly connected; or, the path may be allowed to ignore the arc orientations, in which case the network is weakly connected.
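Connectivity in the undirected case can be checked with a breadth-first search from any node; a small sketch with made-up adjacency lists:

```python
from collections import deque

# Connectivity of an undirected graph via breadth-first search.
# The graph is an adjacency list mapping node -> list of neighbors.

def is_connected(adj):
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(adj)   # connected iff the search reaches every node

graph = {1: [2], 2: [1, 3], 3: [2], 4: []}   # node 4 is isolated: not connected
connected = is_connected(graph)
path_graph = {1: [2], 2: [1, 3], 3: [2]}     # a path: connected
```

Strong connectivity of a directed network can be checked similarly, e.g. by a search following arc orientations from each node.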
Consistency 
In general, a logical theory is consistent, if it contains no contradictions.
In constraint programming, a (partial) variable assignment is called consistent with respect to a set of constraints if none of the constraints is violated.
More generally, in CP, different definitions of consistency are used as a basis for inference. See:
Consistent 
Pertains to a system of equations and inequalities, for which there exists a feasible solution.
Constraint 
Usually this is a relation of the form of an inequality, g(x) <= 0, or equation, h(x)=0. More generally, it can be any restriction the decision variables must satisfy. For example, some regard "x must be integervalued" as a constraint, while others would say that this is simply the domain of the decision variables in the mathematical program. There are other relations, such as the logical form of a precedence constraint: IF x=0 THEN y=0.
In constraint programming, a constraint is an arbitrary relation imposed on a set of variables, where each variable must be assigned one value from its domain. A constraint holds if all variables in its scope are assigned values from their domains and the relation is satisfied.
Constraint graph 
Constraint graphs are used to conceptualize the set of constraints and their scopes in a CSP.
A primal constraint graph represents variables by nodes and associates an arc with any two nodes whose variables appear together in the scope of the same constraint.
A dual constraint graph represents each constraint with a node and associates a labeled arc with any two nodes whose scopes share variables. The arcs are labeled by the shared variables.
Constraint optimization problem 
The general aim of a constraint optimization problem is to find an optimal solution to a set of constraints with respect to some objective function, obj. For example, consider a CSP together with a function obj that maps solutions to the CSP (i.e., complete assignments satisfying all the constraints) to the set of real numbers.
A solution, , to a constraint optimization problem is a complete assignment of variables that satisfies all constraints and for which the value is optimal (either minimal or maximal, depending on the sense of the optimization).
See also constraint satisfaction problem.
Constraint programming 
Constraint Programming (CP) is a paradigm to solve satisfaction or optimization problems by a two-step approach: propagation and tree search. Given a CSP, the decision variables' domains are tightened by first propagating the constraints. If after this step all domains contain only one value, then a solution has been found. If at least one domain is empty, the CSP has no solutions. Otherwise, the second step, tree search, is initiated: by specifying a variable ordering heuristic and a value ordering heuristic, a search tree is constructed, which is then traversed according to one of various search strategies. If the problem is a satisfaction problem, the first solution found is returned, or it is proved that no solution exists. For an optimization problem, the whole tree is traversed in a branch-and-bound search and the best solution returned.
Constraint propagation 
The process by which the constraints are used to prune inconsistent values from the domains of the variables. Typically, the propagation process is repeated until a "fixed point" is reached where no further values can be removed from the domain of any variable.
Constraint propagator 
A constraint propagator is an algorithm (often associated with a global constraint) that implements constraint propagation for that constraint. Typically, the algorithm prunes the domains of the variables involved in the constraint, removing values for which it can be shown that they participate in no solution at the current search node.
A constraint propagator is also known as a filtering algorithm.
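As a toy sketch (not any particular solver's API), here is a propagator that enforces bounds consistency for the single constraint x + y = c, iterating until a fixed point; the domains and names are made up:

```python
# Toy bounds-consistency propagator for the constraint x + y = c.
# Domains are intervals [lo, hi]; pruning repeats until a fixed point.

def propagate_sum(dom_x, dom_y, c):
    changed = True
    while changed:
        changed = False
        for a, b in ((dom_x, dom_y), (dom_y, dom_x)):
            lo = max(a[0], c - b[1])   # a >= c - max(b)
            hi = min(a[1], c - b[0])   # a <= c - min(b)
            if (lo, hi) != (a[0], a[1]):
                a[0], a[1] = lo, hi
                changed = True
    return dom_x, dom_y

x, y = propagate_sum([0, 10], [4, 7], 9)   # x is pruned to [2, 5]
```

A real propagation engine would interleave many such propagators, re-running those whose variables' domains changed.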
Constraint qualification 
Conditions on the constraint functions ( and ) that are sufficient to make the Lagrange Multiplier Rule valid. Here is an example to illustrate what can go wrong:
Since is the only feasible solution, it is optimal. The Lagrange Multiplier Rule requires that there exist for which , but and , so no such exists. Slater used this example in illustrating his interiority condition. The classical qualification, given by Lagrange's multiplier theorem without inequality constraints, is that have full row rank, which stems from the Implicit Function Theorem. Another constraint qualification is that all constraint functions be affine (even with redundant constraints). Each constraint qualification gives a sufficient condition for the Lagrange Multiplier Theorem to be valid. A constraint qualification is necessary if it must hold in order to guarantee that the Lagrange Multiplier Rule is valid for all smooth having optimal value at .
Constraint satisfaction problem 
A classical Constraint Satisfaction Problem (CSP) is defined by a finite set of variables where each variable is associated with a finite domain of integers . Furthermore, a finite set of constraints is imposed on the variables, where a constraint is an arbitrary relation on a subset of variables . A solution to the CSP is a value assignment to each variable such that the value lies within the respective domain and all constraints are satisfied.
There exist different variants and subclasses of constraint satisfaction problem:
Constraint solver 
A constraint solver typically consists of two parts:
 a propagation engine that prunes the variable domains according to the constraint propagator of each constraint.
 a search engine that branches on different values for the variables according to the selected search strategy.
A solution is found as soon as every variable is assigned a value from its domain and all constraints are satisfied.
A constraint solver can typically search for one solution (returning the first solution it finds, if one exists) or for all solutions.
Continuous program 
A mathematical program with continuous variables. Moreover, the term is commonly used to further imply that the objective and constraint functions are continuous, an assumption that allows certain analytical results. For example, Weierstrass first proved that continuous functions attain both a maximum and a minimum over a compact set. If the problem contains both continuous and integer variables, then the problem is said to be a mixed integer program.
Contour 
Also called an isoquant of a function, . The projection of its graph into its domain: , where is the contour value.
Contraction map 
Also called a contractor. A function that brings points closer together, forming the foundation for convergence by successive approximation. Mathematically, we have , where is in a normed space, with norm of denoted by . is a contractor if there exists a constant in such that for all and in . See the supplement on fixed points.
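A minimal numeric illustration (function and names are mine): g(x) = cos(x) is a contractor on [0, 1], since |g'(x)| = |sin(x)| <= sin(1) < 1 there, so successive approximation converges to its unique fixed point:

```python
import math

# Successive approximation for a contraction: g(x) = cos(x) on [0, 1]
# has |g'(x)| <= sin(1) < 1, so the iteration x <- g(x) converges to
# the unique fixed point x = cos(x), about 0.739085.

def fixed_point(g, x0, tol=1e-12, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x_star = fixed_point(math.cos, 0.5)
```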
Convergence 
Specifically, convergence of an algorithm. The algorithm is represented as the point-to-set map, x' ∈ A(x), where there is some selection function to choose x' if A(x) has more than one member. Convergence means that the sequence, {x^k}, has a limit point, say x*, such that x* satisfies certain conditions. Ideally, the conditions are that x* is an optimal solution, at least locally, but this is often not the definition used in a convergence theorem. (For example, in unconstrained optimization, most algorithms converge to a stationary point, which need not be even a local optimum.)
Let {s^k} → 0, where {s^k} is a scalar sequence pertaining to the iterates {x^k}. Typically, s^k = ||x^k − x*||, which is sometimes called policy convergence (to a solution, x*). We could also have s^k = |f(x^k) − f(x*)|, where f is the objective function, in which case the concern is with value convergence to an optimal value, f(x*).
Some related terms and concepts:
 Dual convergence. Dual values, notably Lagrange multipliers, converge to a dual solution.
 Geometric convergence. Same as linear convergence, but usually used when the sequence is exactly a geometric progression: s^k = c r^k.
 Global convergence. Usually means the same as globally convergent, but some mean that the algorithm converges to a global solution.
 Globally convergent. Convergence to a solution from any starting point.
 Linear convergence. Order is 1 and rate is less than 1 (see below).
 Local convergence. Some mean locally convergent, but some mean that the limit point is a local optimum (or just satisfies necessary conditions, see Myths and Counter Examples in Mathematical Programming to avoid misconception).
 Locally convergent. There exists an open neighborhood of 0 such that {s^k} → 0 from any s^0 in the neighborhood.
 Order of convergence. The order is the largest p such that lim sup_k s^{k+1}/(s^k)^p < ∞.
The order being 1 is linear convergence and the order being 2 is quadratic convergence. Most (nonfinite) algorithms to solve mathematical programs are between linear and quadratic.
Let p be the order and suppose s^k > 0 (for notational convenience). The term order derives from the approximate equation, s^{k+1} ≈ r(s^k)^p, where r is the rate. If this equation were exact, we would have s^k = r^k s^0 if p = 1, and s^k = r^{(p^k − 1)/(p − 1)} (s^0)^{p^k} if p > 1, for all k. If p = 1 and r = 0.1, we gain one decimal place each time: s^1 = 0.1 s^0, s^2 = 0.01 s^0, s^3 = 0.001 s^0, etc. If p = 2, the number of accurate decimal places approximately doubles each iteration: 0.1, 0.01, 0.0001, etc. Unfortunately, the underlying equation is not exact – see Myths and Counter Examples in Mathematical Programming to avoid misconception. Some authors call this q-order (or Q-order) convergence to distinguish it from variations of the definition of order. Each definition is designed to capture the notion of how many digits of accuracy are added as the sequence approaches its limit.
 Rate of convergence. This is generally used to mean the limiting ratio: r = lim_k s^{k+1}/(s^k)^p, given the order is p.
 Sublinear convergence. Order is 1 and rate is 1 (slower than all linear convergence), e.g., s^k = 1/k.
 Superlinear convergence. Order is 1 and rate is 0 (faster than all linear convergence), e.g., s^k = 1/k!.
 Stochastic convergence. This applies when the successor point is a random variable, as in simulated annealing. Here are the most common types of convergence for a sequence of random variables, {X^k} → X.
 with probability 1 or almost everywhere (abbr., a.e.). P(X^k → X) = 1.
 in probability. P(|X^k − X| > ε) → 0 for all ε > 0.
 in distribution. The sequence of cumulative distribution functions (cdf), {F_k}, converges pointwise: F_k(x) → F(x) for all x at which F is continuous, where F_k is the cdf of X^k and F is the cdf of X.
 in pth mean. E[|X^k − X|^p] → 0. For p = 2, this is called convergence in quadratic mean or in mean square.
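The order of a concrete sequence can be estimated numerically. The sketch below is an illustration (not from the glossary): Newton's method for the square root has a quadratically convergent error sequence, and the successive log-error ratios approach 2.

```python
import math

def newton_sqrt_errors(a=2.0, x0=2.0, n=4):
    """Errors s^k = |x^k - sqrt(a)| for Newton's method on x^2 = a."""
    root, errs, x = math.sqrt(a), [], x0
    for _ in range(n):
        x = 0.5 * (x + a / x)   # Newton step
        errs.append(abs(x - root))
    return errs

errs = newton_sqrt_errors()
# log s^{k+1} / log s^k approaches the order p (here p = 2)
orders = [math.log(errs[k + 1]) / math.log(errs[k])
          for k in range(len(errs) - 1)]
```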
Convex combination 
An affine combination of vectors whose coefficients are nonnegative, i.e., Σ_i a_i x^i, where a_i ≥ 0 for all i and Σ_i a_i = 1.
Convex cost flow problem 
The mincost network flow problem with a nonlinear convex cost function.
Convex function 
Let f: X → R, where X ⊆ R^n. Then f is convex if X is a convex set, and for x, y ∈ X and a ∈ [0, 1]:
f(ax + (1 − a)y) ≤ a f(x) + (1 − a) f(y).
Equivalently, its epigraph is convex.
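The defining inequality can be probed numerically. This sketch is illustrative only: it samples random pairs and checks f(ax + (1 − a)y) ≤ a f(x) + (1 − a) f(y), which can refute convexity but never prove it.

```python
import random

def seems_convex(f, lo, hi, trials=1000, seed=0):
    """Empirically test the convexity inequality on random sample points."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        a = rng.random()
        # a violation certifies non-convexity (up to roundoff tolerance)
        if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + 1e-12:
            return False
    return True
```

For example, `seems_convex(lambda x: x * x, -10, 10)` holds, while `seems_convex(lambda x: -x * x, -10, 10)` fails almost immediately.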
Convex hull 
(of a set). The intersection of all convex supersets (which can be limited to halfspaces). Equivalently, the set of all convex combinations of points in the set (which can be limited to convex combinations of at most n+1 points, in n dimensions, which is known as Carathéodory's Theorem).
Convex program 
A mathematical program Min f(x): g(x) ≤ 0, h(x) = 0 is convex if the objective function f is convex and the feasible region is a convex set; in particular, it is convex if f and each g_i are convex, and h is affine. (For maximization, require f concave.) See the supplement, Convex Cones, Sets, and Functions.
Here are some key properties of a convex program:
 Every local optimum is a global optimum.
 Firstorder conditions are sufficient.
 Lagrangian duality is strong.
Convex set 
If any two points are in the set, so is their line segment, i.e.,
x, y ∈ S implies ax + (1 − a)y ∈ S for any a ∈ [0, 1]. See Myths and Counter Examples in Mathematical Programming.
Convex simplex method 
This extends the general strategy of the simplex method in linear programming to maximize a concave objective over linear constraints of the form Ax = b and x ≥ 0. (A form of local convergence applies when the objective is not concave, but is smooth.) A tableau is maintained, but nonbasic variables need not be at zero level. The partition A = [B N] is used to compute a generalized reduced cost of the form d_N = ∇_N f(x) − πN. The vector π is determined by the basic portion: π = ∇_B f(x) B^{-1} (so d_B = 0).
As in the simplex method, pricing is used to determine whether there is an improving nonbasic vector. If not, the algorithm terminates (satisfying the Kuhn-Tucker conditions); otherwise, a line search is used to determine the new point, changing only the basic variables to compensate for the change in the one nonbasic level. If this causes a basic variable to reach zero, the basis is changed with the pivot operation.
Convexity cut 
A class of cutting planes derived by considering a convex set in a polyhedron, P. The form of the cut is
and it is derived as follows.
Let be an extreme point of a given polyhedron, which contains a given set, . Suppose is a convex set in whose interior contains but does not contain any point of . Let be linearly independent vectors in , and let be such that and for all (e.g., the edges emanating from in ). Define and (i.e., . Then, the cutting plane excludes without excluding any other in for which . The cut, , is equivalent to the above form, , where for some .
One special case is the intersection cut for a 0-1 integer program:
 ;
 , where are the cuts already added;
 is an extreme point of , obtained by solving the LP relaxation with a simplex method, so is a basic optimum.
 , where is the basis that transforms the original equations to ;
 is a sphere, .
Corner polyhedron problem 
Gomory's relaxed IP that drops the nonnegativity constraints of basic variables. Suppose is a basic solution to the LP, , where is the vector of basic variables. Then, the corner polyhedron problem is
(dropping ), where is the reduced cost of in the LP solution. (Note: for all such that .)
Correlation matrix 
See elliptope.
Cost of capital 
Used in computing as an objective function, it is the interest rate. The units are currency per time period (e.g., dollars per year). One must be careful in computing this. For example, the marginal cost of capital can be greater than the marginal interest rate because the latter applies to the entire borrowing, not just the last dollar borrowed. (This is the "classical" definition, which does not consider advances in option theory.)
Covering problem 
The idea is to select enough members in each of a specified collection of sets; that defines covering the sets. Subject to this, there is a cost for the elements selected, and the objective is to minimize total cost. The IP form of the usual set covering problem is
where x_j = 1 if element j is selected; else, x_j = 0. The matrix A has 0's and 1's, where the ith row corresponds to the ith set to be covered: A_{ij} = 1 means element j is in set i; else, A_{ij} = 0. The constraint Ax ≥ 1 means that at least one element of each set must be selected. This could be extended to require b_i elements of set i be selected by writing Ax ≥ b. Also see vertex cover.
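Since the IP is hard in general, a standard greedy heuristic is often used in practice. The sketch below is illustrative (hypothetical data, not the glossary's formulation): repeatedly pick the element with the smallest cost per newly covered set.

```python
# Greedy set-covering heuristic: "element j covers set i" plays the role
# of A[i][j] = 1 in the IP formulation.

def greedy_cover(sets_covered_by, costs, n_sets):
    """sets_covered_by[j] = collection of set indices element j covers."""
    uncovered = set(range(n_sets))
    chosen = []
    while uncovered:
        # smallest cost per newly covered set, among useful elements
        best = min(
            (j for j in range(len(costs))
             if uncovered & set(sets_covered_by[j])),
            key=lambda j: costs[j] / len(uncovered & set(sets_covered_by[j])),
        )
        chosen.append(best)
        uncovered -= set(sets_covered_by[best])
    return chosen

# 3 sets; element 0 covers {0,1}, element 1 covers {1,2}, element 2 covers {2}
picked = greedy_cover([[0, 1], [1, 2], [2]], costs=[1.0, 1.0, 0.4], n_sets=3)
```

The greedy choice is known to be within a logarithmic factor of the optimal cover cost, though it need not be optimal.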
Cramer rule 
To solve Ax = b, where A is nonsingular, the jth coordinate is given by: x_j = det(A_j)/det(A), where det is the determinant, and A_j is the matrix obtained if column j of A is replaced by b.
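A direct transcription for tiny systems can be sketched as follows (illustrative only; Gaussian elimination is preferred numerically, and cofactor expansion is exponential in n).

```python
# Cramer's rule: x_j = det(A_j) / det(A), with A_j = A with column j
# replaced by b. det uses cofactor expansion along the first row.

def det(M):
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def cramer(A, b):
    d = det(A)
    if d == 0:
        raise ValueError("A is singular")
    n = len(A)
    return [det([row[:j] + [b[i]] + row[j + 1:]
                 for i, row in enumerate(A)]) / d
            for j in range(n)]

x = cramer([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0])   # solves Ax = b
```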
Crash 
This is a heuristic to generate an initial point for an algorithm. It is from the early linear programming computer systems that used a variety of heuristics to generate an initial basic solution.
Crisscross method 
A method in linear programming that chooses a pivot by possibly crossing from primal to dual, and vice versa. The least-index crisscross method is a finite pivot method that chooses a pivot based on the least index of a column or row for which there is improvement in either the primal or dual infeasibility. It terminates at a basis when one of the following termination conditions is reached:
 Both the associated primal and dual solutions are feasible (implies optimality because complementary slackness holds by construction).
 There is evidence of primal and/or dual infeasibility (like the tests in the simplex method).
Critical path 
A longest path in a network, where length units are time durations. The nodes represent tasks, and the arcs represent precedence constraints. The path is critical because the associated tasks determine the total completion time of the project. Moreover, at least one of their duration times must decrease in order to decrease the total completion time.
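The critical path length on a small task DAG can be computed as follows; this is an illustrative sketch with hypothetical task data, using memoized recursion in place of an explicit topological sweep.

```python
# Longest path in a task DAG: each task's earliest finish time is its
# duration plus the latest finish time among its predecessors.

def critical_path_length(durations, preds):
    """durations: dict task -> time; preds: dict task -> predecessor list."""
    finish = {}
    def finish_time(t):
        if t not in finish:
            finish[t] = durations[t] + max(
                (finish_time(p) for p in preds.get(t, [])), default=0)
        return finish[t]
    return max(finish_time(t) for t in durations)

# A -> B -> D and A -> C -> D, with C slower than B.
dur = {'A': 2, 'B': 1, 'C': 4, 'D': 3}
pre = {'B': ['A'], 'C': ['A'], 'D': ['B', 'C']}
total = critical_path_length(dur, pre)   # critical path is A-C-D
```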
Critical point 
A critical point of a smooth function is an element of the domain for which the first partial derivatives are zero or infinite.
Crossover operation 
See genetic algorithm.
Cumulative constraint 
is a global constraint that models scheduling problems as follows. Given are a collection of tasks, such that each task i is associated with four variables:
 release time, r_i, the earliest time at which it can begin executing,
 deadline (or due date), d_i, the time by which it must complete,
 processing time, p_i, the length of time it takes to complete,
 capacity requirement, c_i, the capacity of the resource that it uses while it executes.
In addition, we are given the capacity variable C of the resource.
The constraint assigns a starting time S_i to each task such that:
 r_i ≤ S_i and S_i + p_i ≤ d_i: the task starts at or after its release date and ends at or before its deadline, and
 Σ_{i : S_i ≤ t < S_i + p_i} c_i ≤ C: at any time unit t, the capacity of the resource is not exceeded.
Cut search 
An algorithm strategy for global optimization (notably for integer programming) that consists of two alternating phases: search and cut. The search phase finds linearly independent vectors emanating from a root point to set up a probe for the cut phase. Usually, the root is an extreme point, so the search is an edge probe phase that extends edges of the cone rooted there until it intersects the candidate solution points. Then, a cutting plane is added (usually a convexity cut) to exclude the root point. (A new root point is obtained by solving the new approximating problem on the smaller polyhedron.)
Cutset 
In a weakly connected network, this is a set of arcs whose removal decomposes the network into exactly two components. Separating node from node , the value of the cutset is the sum of the capacities of its arcs from the component containing to the one containing .
Cutting plane 
A hyperplane whose halfspace cuts off a particular point, such as a noninteger extreme point solution of a linear program relaxation of an integer program. The cut is such that it does not eliminate any solution to the original problem. For example, suppose we want to solve
The only feasible point is . The LP relaxation is
and an optimal solution is at . One cutting plane is . A deeper one is . (When adding a cutting plane to the LP, a new relaxation is obtained, and the solution is closer to the IP solution.)
Cutting stock problem 
Determine a way to cut standard sheets of material into various shapes (like clothes parts) to minimize waste. This is a (linear) integer programming model: patterns are specified, each giving the amount of each stock (e.g., sheet or roll of material) used to generate each output. The decision variable is the level of each pattern used. The amount of each stock used is limited by its availability, and the amount of each output generated is required to be in some range (allowing some demand overruns or underruns). Some models seek to minimize the total waste; other models consider cost too. The most common problems are 2-dimensional (odd shapes from sheets of material); the 1-dimensional case is called the trim problem. In the latter case, the stock index is not needed. For example, consider a stock of rolls of paper with a given width, which must be slit into rolls of various widths; a pattern then specifies how much of a stock roll is used to generate rolls of each width.
Cyclic descent 
An algorithm that seeks to optimize a multivariate function by optimizing each coordinate with a univariate optimization technique (keeping the other coordinates fixed). This is repeated until a fixed point is reached. In general, it is possible for such a fixed point not to be an optimum (even locally) because a simultaneous change in variables could result in an improvement. An example is given by:
An example is given by f(x, y) = (y − x^2)(y − 3x^2). Fixing y = 0, f(x, 0) = 3x^4 has a minimum at x = 0, and fixing x = 0, f(0, y) = y^2 has a minimum at y = 0. However, f decreases with the simultaneous change, y = 2x^2. Along this parabola, f(x, 2x^2) = −x^4, which is negative for x nonzero. Thus, (0, 0) is not a local minimum of f. For further discussion about this example, see Myths and Counter Examples.
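A generic cyclic descent loop can be sketched as follows (an illustration only, with a crude step-halving search standing in for a proper univariate optimizer).

```python
# Cyclic (coordinate) descent: minimize one coordinate at a time, cycling
# until a full sweep makes no improving move.

def coordinate_descent(f, x, step=1.0, tol=1e-8, max_sweeps=200):
    x = list(x)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(x)):
            s = step
            while s > tol:          # crude univariate search: halve the step
                for cand in (x[i] - s, x[i] + s):
                    trial = x[:i] + [cand] + x[i + 1:]
                    if f(trial) < f(x) - 1e-15:
                        x = trial
                        moved = True
                s *= 0.5
        if not moved:               # fixed point of the cyclic sweeps
            break
    return x

# separable quadratic: the fixed point is the true minimum (1, -2)
x_min = coordinate_descent(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2,
                           [0.0, 0.0])
```

On the counterexample in the entry, the same loop would stall at the origin even though it is not a local minimum.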
Cycling 
In linear programming this means revisiting a basis, mainly in the context of the simplex method, so that the algorithm could cycle through the same sequence of bases, never converging to an optimal one. See the cycling supplement.
DFP method 
This is a method to solve an unconstrained nonlinear program that proceeds as follows.
 Start with any symmetric, negative definite matrix, say (e.g., ), and any point, say . Compute , and set each of the following:
 direction: .
 step size: .
 change in position: .
 new point and gradient: and .
 change in gradient: .
 Replace with and update by the DFP update to complete the iteration.
DFP update 
This is a way to update an approximation of an inverse Hessian, used for unconstrained optimization. Using the notation in the DFP method, the update is:
Note: each update term is a rank 1 matrix (the outer product of a vector with itself).
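In matrix notation the update is H+ = H + (Δx Δxᵀ)/(Δxᵀ Δg) − (H Δg Δgᵀ H)/(Δgᵀ H Δg). A pure-Python sketch (with hypothetical data) follows; a useful consequence is the secant property H+ Δg = Δx.

```python
# DFP update of the inverse-Hessian approximation H, using plain lists.

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dfp_update(H, dx, dg):
    Hdg = mat_vec(H, dg)
    a, b = dot(dx, dg), dot(dg, Hdg)     # assumed nonzero
    U, V = outer(dx, dx), outer(Hdg, Hdg)
    n = len(H)
    return [[H[i][j] + U[i][j] / a - V[i][j] / b for j in range(n)]
            for i in range(n)]

H0 = [[1.0, 0.0], [0.0, 1.0]]
dx = [0.5, -0.25]   # change in position (hypothetical)
dg = [1.0, -1.0]    # change in gradient (hypothetical)
H1 = dfp_update(H0, dx, dg)
# the secant condition H1 * dg == dx holds by construction
```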
Damped Newton method 
DantzigWolfe decomposition 
The principle applied here is that the total problem consists of:
 subprograms corresponding to its almost independent parts;
 master program that ties together the subprograms.
 [ Master constraints (all variables) ]
 [ B1                                 ]
 [      B2                            ]
 [            . . .                   ]
 [                   Bk               ]
The procedure operates on the master LP with blocks used to generate extreme points as part of the pricing.
Dead end elimination 
Same as fathoming. Used by scientists who have combinatorial optimization problems, this is more descriptive than the older term of "fathoming" a node. The idea is that a condition has been met that ensures that following some branch (i.e., setting values of variables) cannot lead to an optimal solution. This is often just a dominance property.
Decision variable 
A decision variable represents a problem entity for which a choice must be made. For instance, a decision variable might represent the position of a queen on a chessboard, for which there are 100 different possibilities (choices) on a 10x10 chessboard or the start time of an activity in a scheduling problem. Each possible choice is represented by a value, hence the set of possible choices constitutes the domain that is associated with a variable.
Decomposition principle 
The idea of decomposing a mathematical program into two or more sets of variables and associated constraints. The purpose is to separate some portion with special structure from the rest of the mathematical program.
Here are some particular ones that are in this glossary:
Decoupling principle 
Arises in sensitivity analysis, as a corollary to the Compatibility Theorem:
An equilibrium basis is compatible with the change if, and only if, it is compatible with both and .
Degeneracy 
The solution to the primaldual pair of linear programs:
is degenerate if it is not strictly complementary i.e. for some . The pair is primal degenerate if there is an optimal solution such that . Similarly, the pair is dual degenerate if there is a dual optimal solution such that .
With regards to a basic optimal solution, such a solution is (primal) degenerate when some basic variable is at one of its bound values (canonically zero). A basic optimum is dual degenerate if one of its nonbasic variables has a zero reduced cost. Geometrically, this corresponds to a degenerate polyhedron. Suppose we have P = {x : Ax ≤ b} (where A is m by n). This polyhedron is degenerate if there exists an extreme point that is an element of the intersection of more than n hyperplanes. The pyramid is degenerate because four planes meet at a point.
Algorithmically, degeneracy can cause cycling in the simplex method.
Degeneracy graph 
An undirected graph, where the nodes represent bases and edges their adjacency. The degeneracy graph is a way to probe deeply into a degenerate extreme point (represented by more than one basic feasible solution). Among other things, it reveals good pivot steps through the point enroute to an optimum, and it provides a clear, efficient approach to sensitivity analysis.
 Here are some terms for a degeneracy graph associated with a (basic) solution, :
 degeneracy power of : number of bases corresponding to .
 internal node: a node whose neighbors are all in this degeneracy graph.
 transition column: a tableau column such that a pivot exists for which that column enters the basis, and the new basis corresponds to a transition node.
 transition node: a node with at least one member outside this degeneracy graph (so a pivot moves to a different solution).
 Transition Node Pivoting (TNP) rule: a lexicographic rule using a special column to avoid internal nodes (but cycling is possible).
Degenerate polyhedron 
See Degeneracy.
Degree2 inequality 
Arises in mixed integer linear programming models for combinatorial programs on graphs, where each row of the matrix has 2 nonzeroes (usually 1's), as in the node packing problem.
Degree of difficulty 
See geometric program.
Density 
The proportion of nonzeroes; see also sparsity.
Detached coefficient form 
Given the system , we augment the objective equation: . Then, detaching the variables, the form is:
Upon multiplying by the inverse of a basis, say , where , the detached coefficient form becomes
where (conformal with the partition of the columns of ). Now if we subtract times the first rows from the last row, this becomes
Recognizing that is the vector of basic levels, and the associated objective value is , we can drop the first columns plus column (corresponding to ) and form the tableau associated with this basis (adding labels to identify the basic variables in the rows and nonbasic variables in the columns).
Dichotomous search 
This finds the maximum of a unimodal function on an interval, [a, b], by evaluating two points placed near the center, approximating the bisection method. With ε being a small positive value, let x1 = (a + b)/2 − ε/2 and x2 = (a + b)/2 + ε/2. Then, if f(x1) < f(x2), the new interval of uncertainty is [x1, b]; if f(x1) > f(x2), it is [a, x2]; if f(x1) = f(x2), it is [x1, x2]. With n function evaluations (n even, since evaluations come in pairs), the interval of uncertainty is reduced by a factor of about 2^{n/2}, ignoring the small terms in ε. See Fibonacci search for a more efficient method.
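The procedure can be sketched as follows (illustrative; an assumed unimodal example is maximized).

```python
# Dichotomous search for the maximum of a unimodal f on [a, b]: evaluate
# two points eps apart around the midpoint and discard the poorer side.

def dichotomous_max(f, a, b, eps=1e-6, tol=1e-4):
    while b - a > tol:
        mid = (a + b) / 2.0
        x1, x2 = mid - eps / 2.0, mid + eps / 2.0
        if f(x1) < f(x2):
            a = x1          # the maximum lies in [x1, b]
        else:
            b = x2          # the maximum lies in [a, x2]
    return (a + b) / 2.0

# unimodal with maximum at x = 1.3
x_star = dichotomous_max(lambda x: -(x - 1.3) ** 2, 0.0, 4.0)
```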
Diet problem 
Select levels of food consumption so as to minimize total cost subject to nutrient requirements. As a linear program, Min cx: Ax ≥ b, x ≥ 0, where x_j is the level of food j at unit cost c_j. The ith nutrient requirement (e.g., amount of vitamin C) is b_i, and A_{ij} is the amount of nutrient i in each unit of food j.
Digraph 
Directed graph.
Dijkstra algorithm 
See labeling algorithm.
Dimension 
 Of a set: the dimension of the smallest affine space that contains the set. The dimension of an affine space is k if the maximum number of affinely independent points in the set is k + 1. Of particular interest is the column space of a matrix A, {y : y = Ax}; the dimension of this is the rank of A. Another is the row space of A, {y : y = xA}; the dimension of this is also the rank of A.
 Of an expression: the units of measurement (e.g., tons of apples, meters of rope, hours of labor). In a relation, such as x + y = z, the units of x, y and z must all be the same. Dimensional analysis is concerned with determining such consistency and inferring whatever units are missing.
Diophantine equations 
A system of equations whose variables must be integer-valued. A famous one is in Fermat's Last Theorem: x^n + y^n = z^n, with x, y, z positive integers and n > 2.
Directed tree search 
A class of algorithms designed to systematically search a decision space, where nodes correspond to partial solutions (sometimes called "states"). Examples are branch and bound and implicit enumeration. Also see heuristic search.
Directional arc consistency 
Directional arc consistency is a form of arc consistency.
An ordering is associated to the variables in the constraint network and constraints are made arc consistent only in the direction of the ordering. For instance, if x precedes y in the ordering and a constraint on (x, y) exists, directional arc consistency enforces that each value in the domain of x is consistent with at least one value in the domain of y, but not necessarily the other way around.
Directional derivative 
The limit (if it exists) of the rate of change along a specified direction, say d:
f'(x; d) = lim_{t→0+} [f(x + td) − f(x)]/t.
In particular, the jth partial derivative is the directional derivative with d equal to the jth unit vector. (Note that the directional derivative, as defined above, depends on the scale: the derivative for 2d is 2 times the derivative for d. To avoid scale dependence, some authors require ||d|| = 1.) Recently, some people have called this a B-derivative (or Bouligand derivative), and functions that have directional derivatives in all feasible directions are B-differentiable. Some require the convergence to be uniform.
Discount rate 
Also a discount factor. This accounts for the time value of money and arises naturally in financial models, such as a portfolio selection problem. A discount rate of 7% means $1 earned a year from now has a present value of approximately $0.9346 (1/1.07). If $1 is earned n years from now, and the discount rate is r, the present value is $1/(1 + r)^n. In continuous-time models, there are variations, such as defining the present value of x dollars at time t to be x e^{−rt}. In infinite horizon dynamic programming, the discount factor serves to make value iteration a contraction map. In that case, the fixed point of the stationary equation,
F(s) = Max_{x ∈ X(s)} {r(s, x) + aF(T(s, x))},
is obtained by successive approximation – i.e., value iteration for an infinite horizon DP. This converges to a unique fixed point, F, if 0 ≤ a < 1. Here, the DP notation is used, where a is the discount factor.
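Both ideas can be sketched briefly; the data below are hypothetical, and a deterministic transition table stands in for the general stochastic case.

```python
# Present value, and value iteration as successive approximation of the
# fixed point of the Bellman operator (a contraction when 0 < beta < 1).

def present_value(amount, rate, years):
    return amount / (1.0 + rate) ** years

def value_iteration(rewards, transitions, beta=0.9, tol=1e-10):
    """rewards[s][a]; transitions[s][a] -> next state (deterministic)."""
    V = [0.0] * len(rewards)
    while True:
        V_new = [max(rewards[s][a] + beta * V[transitions[s][a]]
                     for a in range(len(rewards[s])))
                 for s in range(len(rewards))]
        if max(abs(u - v) for u, v in zip(V_new, V)) < tol:
            return V_new
        V = V_new

pv = present_value(1.0, 0.07, 1)     # about 0.9346

# two states, two actions each; optimal values approach [2, 4]
V = value_iteration([[1.0, 0.0], [0.0, 2.0]], [[0, 1], [0, 1]], beta=0.5)
```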
Discrete program 
A mathematical program in which the variables are restricted to a discrete set, commonly the integers. Subcases include integer programs and combinatorial programs.
Disjunctive constraint 
This is a global constraint enforcing the fact that two activities A and B requiring the same unary resource cannot overlap in time. Therefore, either A precedes B or B precedes A. In general the disjunctive constraint achieves arc consistency on the formula: end(A) ≤ start(B) or end(B) ≤ start(A).
For discrete (cumulative) resource capacity, the disjunctive constraint holds when c_A + c_B > C, where c_A is the capacity requirement of task A, c_B is the capacity requirement of task B, and C is the capacity of the resource.
See also the cumulative constraint.
Disjunctive normal form 
A logical expression that is written as a disjunction,
D_1 ∨ D_2 ∨ ... ∨ D_n,
where ∨ is the disjunction (logical inclusive or). Each D_i has the form
L_1 ∧ L_2 ∧ ... ∧ L_m,
where ∧ is the conjunction (logical and) and each L_j is a literal – a proposition or its negation.
Disjunctive program 
In its most general form, this seeks a solution to one of a finite list of mathematical programs: Min f_i(x): x ∈ X_i, for some i. Usually, they have the same objective functions, f_i = f for all i, so the mathematical program with disjunctive constraints has the following form: Min f(x): x ∈ X_1 ∪ X_2 ∪ ... ∪ X_k.
This can be reformulated as a mixedinteger program, using binary variables to represent the disjunction.
Divide and conquer 
An algorithm strategy, such as dynamic programming, that divides a problem into parts. Suppose is the time complexity for a problem of size , and that dividing the problem into subproblems, each of size , takes additional time. Then, is called the divideandconquer recurrence relation. If (a constant) and , for and if . For example, if a bisection method is used (as in searching a sorted list), the divide and conquer recurrence is for . This implies since and .
Domain 
The domain of a variable is a set of all possible values that can be assigned to the variable. When the domain contains numbers only, the variables are called numerical variables. The domain of a numerical variable may be further restricted to integers, rational numbers or real numbers. When the domain contains boolean values only, the variables are called boolean variables. When the domain contains an enumerated type of objects, the variables are called symbolic variables (e.g. a variable that represents a day of the week).
A common propagation technique in constraint programming is to dynamically remove values from the domain of a variable when it can be inferred that the value does not belong to any solution given the set of variable assignments that have already been made.
Dominance 
This is used in many contexts, but the general meaning is that something is uniformly better than something else. For example, consider two activities in a linear program, say j and k, such that:
 j has greater cost: c_j ≥ c_k;
 j produces less of each requirement: A_{ij} ≤ A_{ik} for each i such that we require A_i x ≥ b_i;
 j consumes more of each resource: A_{ij} ≥ A_{ik} for each i such that we require A_i x ≤ b_i;
 j produces or consumes at the same rate for goods to be conserved: A_{ij} = A_{ik} for each i such that we require A_i x = b_i.
Then, activity k dominates activity j.
Doubly stochastic matrix 
A nonnegative matrix such that each row and each column sums to 1.
Dual 
Another mathematical program with the property that its objective is always a bound on the original mathematical program, called the primal. Suppose the primal is Min f(x): x ∈ X and the dual is Max g(y): y ∈ Y. Then, f(x) ≥ g(y) for all feasible x in X and all y in Y. This immediately implies that if the primal is feasible, the dual cannot be unbounded, and vice versa: if the dual is feasible, the primal cannot be unbounded. It also implies that if the dual is unbounded, the primal must be infeasible (and vice versa). A dual provides a sufficiency test for optimality, for if feasible x and y can be found such that f(x) = g(y), it follows that x is optimal in the primal and y is optimal in the dual. Weak duality is when this sufficient condition is all that can be guaranteed. Strong duality is when a finite optimal value to one problem ensures the existence of an optimal solution to the other and that their optimal objective values are equal. There are particular duals of significance.
Dual degeneracy 
See Degeneracy.
Dual method 
An algorithm for which each iterate is feasible in a dual mathematical program. An example is that the class of cutting plane methods, associated with the Lagrangian dual, is dual to the column generation method of DantzigWolfe decomposition.
Dual norm 
Given the norm ||x||_p = (Σ_j |x_j|^p)^{1/p}, where p ≥ 1, its dual norm is ||x||_q, where 1/p + 1/q = 1. Note, if p = 1 then q = ∞, and if p = 2 then q = 2. This is based on Hölder's inequality.
Dual price 
A dual variable associated with a primal constraint.
Duality 
The study and use of dual mathematical programs.
Duality gap 
Numerically, it is the difference between primal and dual objective values. The phrase by itself means the presence (or existence) of a gap – i.e., that the difference is not zero.
Duality theorems 
These establish the fundamental duality relations (weak or strong). Several significant ones are given here.
Dynamic CSP 
Dynamic CSPs deal with the problems that are subject to exogenous change over time. A changing problem is viewed as a sequence of (static) CSPs in which one problem in the sequence is transformed to another through restrictions (i.e. additions of constraints) and/or relaxations (removal of constraints). Various solution definitions exist, most dealing with some notion of robustness (i.e., a solution to one static CSP is likely to remain a solution after a dynamic change in the problem) or ease of modification (i.e., a solution to one static CSP can be easily modified to become a solution for the next one).
Dynamic program 
Based on the Principle of Optimality, this was originally concerned with optimal decisions over time. For continuous time, it addresses problems in variational calculus. For discrete time, each period is sometimes called a stage, and the dynamic program (DP) is called a multistage decision process. Here is the Fundamental Recurrence Equation for an additive process:
where is the optimal total return upon reaching time in state , and is decision variable(s); , are state variables; , are time; is the decision space (usually depends only on state); is the immediate return; is the state transform.
In words, the optimal return upon reaching time t in state s equals the optimal return from an immediate decision plus the optimal return for the remainder of the process upon the transition to state s'. This new state is a function of the current time, state, and decision. For discrete time, t' = t + 1 for the forward equation, and t' = t − 1 for the backward equation. The multiplier, a, is generally a positive scalar, such as a discount factor to yield the present value of money. (For some problems a ≥ 1, in which case the infinite horizon model might be ill posed.) More generally, the process need not be additive, so that '+' could be another composition operation.
A decision rule is a function, , that specifies an optimal value of upon reaching time in state . For a completely deterministic process, backtracking is used to obtain the usual form of an optimal solution from a known initial or final state. The process could be stochastic, in which case and are expected values, and the state transform is random with probability distribution, . Thus, is replaced by .
DP is a technique that applies to static problems too. For example, consider the following recurrence equation for the knapsack problem:
In words, this says that the max total return with objects having total knapsack space equals the max choice, of how much of the th object we put into the knapsack, with immediate return , plus the max total return from the remaining objects with the remaining space, . The value of (the stage decision variable) is any nonnegative integer up to the space allowed: is the max number of objects of this type that could fit in the knapsack. There could also be explicit bounds on , such as the 01 knapsack problem, in which case the policy set, simply has only those values.
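The knapsack recurrence can be transcribed directly (the unbounded variant, with hypothetical data); collapsing the stage index over object types leaves a single sweep over capacity.

```python
# Unbounded knapsack DP: f[w] = best total return achievable with
# remaining capacity w, allowing any nonnegative count of each object type.

def knapsack(values, sizes, capacity):
    f = [0] * (capacity + 1)
    for w in range(1, capacity + 1):
        for v, s in zip(values, sizes):
            if s <= w:
                # take one object of this type, plus the best use of w - s
                f[w] = max(f[w], v + f[w - s])
    return f[capacity]

best = knapsack(values=[60, 50, 40], sizes=[5, 4, 3], capacity=10)
```

For the 0-1 variant, the stage index cannot be collapsed this way; the sweep over capacity must run downward for each object instead.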
Economic order quantity 
Abbreviated EOQ. This is the level of inventory to order that minimizes the sum of holding and ordering costs. The inventory function, I(t), is a periodic sawtooth function, where T is the time between orders, and Q is the ordering quantity:
I(t) = Q − D(t − kT) for kT ≤ t < (k + 1)T,
where D is the rate of demand (inventory units per time unit). The inventory becomes zero at t = (k + 1)T, with T = Q/D, which requires a new order of Q units. The model is thus:
Min hQ/2 + KD/Q: Q > 0,
where h is the holding cost (currency per inventory unit per time unit), so hQ/2 is the average holding cost, and K is the fixed cost of ordering, so KD/Q is the average ordering cost. The solution is Q* = sqrt(2KD/h), which yields the Economic Order Quantity (EOQ). See the more general production scheduling problem.
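A numerical sketch of the formula and the cost it minimizes (the data are hypothetical):

```python
import math

# EOQ: with demand rate D, holding cost h, and fixed ordering cost K,
# the average cost h*Q/2 + K*D/Q is minimized at Q* = sqrt(2*K*D/h).

def eoq(D, h, K):
    return math.sqrt(2.0 * K * D / h)

def average_cost(Q, D, h, K):
    return h * Q / 2.0 + K * D / Q

Q_star = eoq(D=1000.0, h=2.0, K=50.0)   # sqrt(50000), about 223.6
```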
Edgefinding 
The edgefinding rule is a constraint propagation technique that consists of deducing whether an activity can, cannot, or must be executed before or after a set of activities that all (including ) require the same resource. Two types of conclusions can then be drawn: new ordering relations (“edges” in the graph representing the possible orderings of activities) and new timebounds (i.e., tightening of the earliest and latest start and end times of activities).
An edgefinding algorithm is a procedure that performs all such deductions.
Effective domain 
The domain of a function for which its value is finite.
Efficient frontier 
The collection of efficient points, see multiple objectives.
Eigenvalue 
If a (scalar) value, , satisfies for some vector, , it is an eigenvalue of the matrix , and is an eigenvector. In mathematical programming this arises in the context of convergence analysis, where is the hessian of some merit function, such as the objective or Lagrangian.
Elastic program 
Essentially the same as Phase I; the artificial variables are called elastic, with the idea that the constraints can "stretch".
Element constraint 
This is a global constraint that is true iff a variable is the th element of an array . More specifically, it involves a 1-dimensional variable array , an integer variable , and a variable , where , typically written as "element(X,y,z)".
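A minimal checker (not a propagator) illustrating the truth condition of element(X,y,z); the 1-based indexing convention here is an assumption, as conventions vary by solver.

```python
def element(X, y, z):
    """Truth of the element constraint: z equals the y-th entry of array X.

    Uses 1-based indexing for y; a constraint solver would instead
    propagate over the domains of X, y, and z.
    """
    return 1 <= y <= len(X) and X[y - 1] == z
```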
Elementary matrix 
A matrix formed by performing a single row operation on the identity matrix. These arise in linear programming, particularly for the product form of the basis: , where each is an elementary matrix.
Elementary simplex method 
See simplex method.
Elementary vector 
Let be a subspace of . For , let denote its support set: . Then, is an elementary vector of if there does not exist such that and . This extends to the case where is not a subspace, such as the collection of nonnegative vectors, in which case is the set of coordinates with positive value. This is of particular importance in defining the optimal partition of an LP solution.
Ellipsoid 
An ellipsoid centered at is the set
where is a symmetric, positive definite matrix.
Ellipsoid method 
This algorithm seeks to solve by iterations that begin with and , where is a measure of the smallest datum that can be represented  i.e., is a solution if, and only if, it satisfies . If the current iterate, , does not satisfy this, a new pair is defined by choosing one of the violated inequalities, say . Then, let , and apply the following rules:
By treating the objective as a constraint, say and/or , the ellipsoid method can be used to solve a linear program. Its significance is that it was the first algorithm for linear programming shown to have polynomial time complexity.
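A sketch of the central-cut ellipsoid method for finding a feasible point of a system of linear inequalities A x <= b; the update formulas are the standard ones, while the starting ball, tolerance, and iteration limit are assumptions.

```python
import math

def ellipsoid_feasible(A, b, x0, R, max_iter=1000, tol=1e-9):
    """Seek x with A x <= b, starting from the ball of radius R about x0.

    The ellipsoid {y : (y - x)' P^{-1} (y - x) <= 1} is cut by a violated
    inequality at each step (requires n >= 2).  Returns a feasible point,
    or None if none is found within max_iter iterations.
    """
    n = len(x0)
    x = list(x0)
    P = [[(R * R if i == j else 0.0) for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        # find a violated inequality a.x > beta
        viol = next((a for a, bi in zip(A, b)
                     if sum(ai * xi for ai, xi in zip(a, x)) > bi + tol), None)
        if viol is None:
            return x
        a = viol
        Pa = [sum(P[i][j] * a[j] for j in range(n)) for i in range(n)]
        denom = math.sqrt(sum(a[i] * Pa[i] for i in range(n)))
        g = [v / denom for v in Pa]                    # g = P a / sqrt(a'Pa)
        x = [x[i] - g[i] / (n + 1) for i in range(n)]  # shift the center
        coef = n * n / (n * n - 1.0)
        P = [[coef * (P[i][j] - 2.0 * g[i] * g[j] / (n + 1))
              for j in range(n)] for i in range(n)]
    return None
```

The ellipsoid volume shrinks by a fixed factor each step, which is the source of the polynomial bound mentioned above.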
Elliptope 
This arises in semidefinite programming. It is the set of all real, square symmetric matrices whose diagonal entries are all equal to one and whose eigenvalues are all nonnegative. The dimension of the set is (where the matrix is ). For example, consider :
Then, is in the elliptope for if, and only if, , i.e. it is diagonally dominant (note the dimension is 1). An elliptope is also called a set of correlation matrices.
Epigraph 
This is the region on or above the graph of a function :
Equilibrium basis 
In linear programming this is a basis that is optimal in both the primal and dual. It arises in compatibility theory and degeneracy graphs.
Equivalent CSPs 
Consider two constraint satisfaction problems and and a sequence of their common variables. We say that and are equivalent w.r.t if
 for every solution to a solution to exists that coincides with on the variables in X, and
 for every solution to a solution to exists that coincides with on the variables in X.
Euclidean norm 
See norm.
Euler-Lagrange equation
This is a necessary condition for to solve the classical problem in variational calculus:
where .
Evolutionary algorithm 
This is a term that applies to any metaheuristic based on some biological metaphor that is a dynamical system whose evolution rules have a stochastic component. In particular, it includes genetic algorithms, scatter search, Particle Swarm Optimization, and some neural networks. It emerged as a unification of related computational methods, including cellular automata, which was conceived by Ulam and von Neumann in the 1940s. This unification is not universal. Some researchers cling to the GA metaphor but say that evolutionary algorithms have more operators; some say that the GA population is composed of bit strings, whereas evolutionary algorithms can have a broader range of population structures. It is the concept of evolution dynamics, which is population-based and stochastic, that distinguishes this class of algorithms from other methods of global optimization. Evolutionary computation uses the evolutionary algorithm as its foundation; typically, it is massively parallel.
Exact penalty function 
A penalty function is exact if there exists a finite parameter value such that its maximum is a solution to the original mathematical program.
Existence of solution 
This pertains to whether a mathematical program has a solution. One source of nonexistence is that the feasible region is empty. Another is that it is unbounded, and it is possible to have a sequence of feasible points whose objective value diverges. In linear programming, these are the only sources of nonexistence. In nonlinear programming, however, we can have asymptotes such that the objective is bounded but cannot achieve its infimum or supremum. See the Bolzano-Weierstrass theorem, which is fundamental. For linear programming, see theorems of the alternative.
Expanding subspace theorem 
Let be the subspace spanned by vectors , which are Q-conjugate. Let be any initial point in , and let be generated by these conjugate directions with optimal line search:
Then, minimizes on the affine set, .
Explicitly quasiconcave function 
Negative is explicitly quasiconvex.
Explicitly quasiconvex function 
A quasiconvex function that is also strictly quasiconvex. Its main property is that it rules out flat portions that cause the closure condition to fail. (Every convex function is explicitly quasiconvex.)
Exponent matrix 
Powers of variables in posynomials in a geometric program.
Extended reals 
Typically denoted by , this is the set of real numbers augmented with positive and negative infinity, i.e.
Extensional constraint 
A constraint given explicitly, by listing the combinations of values (tuples) it allows. For instance, the extensional constraint expresses the intensional constraint where and can take values .
Also called a table constraint.
Exterior penalty function 
A penalty function in which the associated algorithm generates infeasible points, approaching feasibility in the limit towards a solution. An example is to maximize over and let in order to approach the solution to
If, during the penalty iterations, for some finite , then solves the original mathematical program. Otherwise, the idea is that approaches the feasibility condition, , as gets large (though this need not happen without assumptions on and ).
Extrapolation 
The idea of estimating a value by extending information at hand outside its immediate range. In linear programming, an extrapolation estimate of the optimal objective value uses dual price as a slope: . For a sequence , an extrapolation is an estimate of a limit point.
Extreme point 
A point in the closure of a set, say , that is not the midpoint of any open line segment with end points in the closure of . Equivalently, is an extreme point of a closed set, , if there do not exist for which . When is a polyhedron in standard form, , with of full row rank, we have one of the fundamental theorems of linear programming that underlies the simplex method:
is an extreme point of the feasible region if, and only if, is a basic feasible solution.
Extreme ray 
An extreme ray of a closed set is a ray in that cannot be expressed as a (simple) sum of other rays in . For example, the axes, , are extreme rays of the nonnegative vectors in . The ray , however, is the sum of the axes since . The recession direction of an extreme ray is sometimes called an extreme direction.
Extreme value 
An extremum (pl. extrema). A minimum or maximum.
FTRAN 
This applies to the factored system,
where each is an elementary matrix. The recursion is:
In the end, is the solution. Each elementary system is solved as follows. For notational convenience, suppose the system is , and is the distinguished th column of . Then,
This is what is done in the (revised) simplex method, and each elementary solution is a pivot operation.
Face 
A convex subset, say , of a convex set, say , such that for any and in
The set is itself a face of , and most authors consider the empty set a face. The faces of zero dimension are the extreme points of . When is a polyhedron, i.e. , the faces are the subsystems with some inequalities holding with equality: , where and .
Facet 
A face of a convex set, not equal to the set itself, of maximal dimension. If the set is polyhedral, say , where the defining inequalities are irredundant, then the facets are in one-to-one correspondence with for such that the equality is not forced – i.e., there exists in for which . Here is the ith row of .
Facet-defining inequality
In integer programming, an inequality, , such that is a facet of the integer polyhedron, .
Facility location problem 
Find a location (say xy coordinates) that minimizes a weighted sum of distances to each of several locations. An example is to locate a hospital or school to serve a community defined by population centers. Originally considered by Fermat in the 17th century, there are now many variations of this problem. One variation is the location of an obnoxious facility, such as a garbage dump. In that case the objective is to maximize total (weighted) distance from the given points (there are constraints, such as cost, that require the location to be within some distance of the points).
Factorable function 
This has a recursive definition, relative to a collection of elementary operations (e.g., sum, product) on other factorable functions, starting with some set of primitive functions. For example, starting with the power functions, , for any and , and allowing only the sum operation, we can obtain all polynomials. The polynomial factors into 3 parts.
Factorable program 
All functions (objective and constraints) are factorable. An example is the separable program, provided that each of the univariate functions is factorable. Another is the geometric program, where each posynomial is factorable.
Factored form of basis 
Originally for linear programming, this pertains to writing a basis in the form, , where are factors. Two forms of interest are: elementary factors, where each is an elementary matrix, and LU decomposition: , where is lower triangular and is upper triangular. ( and might be factored, notably into elementary matrices, which are lower and upper triangular, resp.) These have inexpensive updates when changes to an adjacent basis.
Fail first principle 
The fail first principle is an idea of what a variable ordering heuristic should attempt to do in tree search. It suggests that the variable to be assigned next should be the one which is most likely to lead to a dead end. The heuristic aims at proving as soon as possible that the search is in a subtree with no feasible solutions.
One way of operationalizing this principle is to choose the next variable to be assigned as the one which is the most constrained. The extent to which a variable is constrained can be measured in different ways. One simple measure is the current size of the domain: under this measure, the variable which has the smallest domain should be assigned next. Another common measure is to select the variable that appears in the largest number of constraints. More sophisticated estimates exist.
Faithfully convex 
A convex function is faithfully convex if it is constant along some segment only if it is constant along the whole line containing this segment.
Fallacy of averages 
Imagine standing with one foot on a keg of ice and the other in a fire. Your average body temperature may be fine, but the extremes will kill you! The fallacy of averages is the fallacious results you may get when replacing data with their expected values. Formally, the fallacy is stated as – viz., the covariance is not zero. Another form of this fallacy is that (unless is linear). In particular, suppose we have
where is a vector of parameters with some uncertainty. The fallacy of averages suggests that it is a mistake to replace with its expected value for at least 2 reasons: (1) members of may be correlated, and (2) the average values of and need not equal the functional values at the mean.
Farkas lemma 
This result gives an alternative system, of which there are many. Farkas' Lemma states that exactly one of the following is consistent,
An often used variant is
Fathom 
Arises in directed tree search. A node is fathomed if it is determined that no completion from this point could produce a better solution than one already obtained. This could happen by bounding the objective, or it could happen by determining there is no feasible solution with the partial specifications corresponding to the node.
Feasibility map 
The point-to-set map that maps a right-hand side to a feasible region:
Feasible 
A point is feasible if it satisfies all constraints. The feasible region (or feasibility region) is the set of all feasible points. A mathematical program is feasible if its feasible region is not empty.
The term feasible doesn't imbue other properties such as convexity or connectedness. For example, a feasible region to a nonlinear program could be , which is the disjoint union .
Feasible direction 
Given a feasible point , a direction vector, say , is feasible if there exists such that is feasible. The method of feasible directions is designed to choose a feasible direction at each iteration, along which (as becomes positive) the objective value improves. Such directions exist for continuous mathematical programs (where the line segment is feasible for some ) unless is a local optimum. (Note: with nonlinear constraints, there is typically no feasible direction according to this (classical) definition. See the generalized reduced gradient method.)
Fermat-Weber problem
See facility location problem.
Fibonacci search 
This finds the maximum of a unimodal function on an interval, , by evaluating points placed according to a fibonacci sequence, . If there are points in the interval, only evaluations are needed. In the continuous case, we begin with some interval of uncertainty, , and we reduce its length to . The ratio, , is the key to the placements.
Here is the method for the continuous case:
 Initialization. Let and . Evaluate and and set .
 Iteration. If , reduce the interval to (i.e., set ), decrement to , and set and . If , reduce the interval to (i.e., set ), decrement to , and set and .
The fibonacci search method minimizes the maximum number of evaluations needed to reduce the interval of uncertainty to within the prescribed length. For example, it reduces the length of a unit interval to 1/10946 = 0.00009136 with only 20 evaluations. In the case of a finite set, fibonacci search finds the maximum value of a unimodal function on 10,946 points with only 20 evaluations, but this can be improved – see lattice search.
For very large , the placement ratio approaches the golden mean, and the method approaches the golden section search. Here is a comparison of interval reduction lengths for the fibonacci, golden section, and dichotomous search methods. In each case, is the number of evaluations needed to reduce the length of the interval of uncertainty to . For example, with 20 evaluations dichotomous search reduces the interval of uncertainty to of its original length (with separation value near ). The greatest reduction comes from fibonacci search, which is more than an order of magnitude better, at . Golden section is close (and gets closer as N gets larger).
Evaluations    Fibonacci    Dichotomous    Golden section
     N           1/F_N       1/2^(N/2)     1/(1.618)^(N-1)
========================================================================
     5         0.125         0.25           0.146
    10         0.0112        0.0312         0.0131
    15         0.00101       0.00781        0.00118
    20         0.0000914     0.000976       0.000107
    25         0.00000824    0.0002441      0.0000096
========================================================================
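The continuous-case steps above can be sketched as follows. This simplified version omits the small perturbation textbooks use at the very last placement, so its final interval is about 2/F_N of the original length rather than 1/F_N; the function and interval below are illustrative.

```python
def fib_search_max(f, a, b, n_evals):
    """Fibonacci search for the maximum of a unimodal function on [a, b].

    Uses n_evals function evaluations and returns the final interval of
    uncertainty.  Probes at stage k sit at fractions F_{k-2}/F_k and
    F_{k-1}/F_k of the current interval, so one probe is reused each step.
    """
    F = [1, 1]
    while len(F) <= n_evals:
        F.append(F[-1] + F[-2])
    k = n_evals
    x1 = a + F[k - 2] / F[k] * (b - a)
    x2 = a + F[k - 1] / F[k] * (b - a)
    f1, f2 = f(x1), f(x2)
    for k in range(n_evals, 2, -1):
        if f1 < f2:          # the maximum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + F[k - 2] / F[k - 1] * (b - a)
            f2 = f(x2)
        else:                # the maximum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + F[k - 3] / F[k - 1] * (b - a)
            f1 = f(x1)
    return a, b
```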
Fibonacci sequence 
The numbers satisfying: , with initial conditions . As shown in the following table, the fibonacci sequence grows rapidly after , and becomes astronomical for .
  N      F_N
==============
  0        1
  1        1
  2        2
  3        3
  4        5
  5        8
  6       13
  7       21
  8       34
  9       55
 10       89
 20    10946
 50   2.0E10
100   5.7E20
==============
Named after the 13th century mathematician who discovered it, this sequence has many interesting properties, notably for an optimal univariate optimization strategy, called fibonacci search.
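A sketch of the recurrence, indexed to match the table above (F_0 = F_1 = 1):

```python
def fib(n):
    """Return F_n, where F_0 = F_1 = 1 and F_N = F_{N-1} + F_{N-2}."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```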
Fill-in
Arises in the context of sparse matrices subjected to certain operations. In particular, a basis may be factored into a product of elementary matrices to represent Gaussian elimination. The nonzeroes that appear in positions that were not in the original matrix are called fill-in coefficients. An example of a matrix that has no fill-in for this factorization is one that is lower triangular. In that case the factors appear as:
On the other hand, a lower triangular matrix could cause fill-in for some other factorization, such as the Cholesky factorization. A completely dense matrix also has no fill-in since there were no zeroes to begin with. Here is an example of fill-in, taking the original order of rows and columns in the product form:
where is fill-in.
First-order conditions
This descends from classical optimization, using Taylor's Theorem. For unconstrained optimization, this is simply that . (In variational calculus, it is the Euler-Lagrange equation.) For constrained optimization, the first-order conditions of a regular mathematical program are given by the Lagrange Multiplier Rule.
Fixed charge 
A cost that is some value, say , regardless of the level as long as the level is positive; otherwise the fixed charge is zero. This is represented by , where is a binary variable. If , the fixed charge is 0; if , the fixed charge is . An example is whether to open a plant or not . To apply this fixed charge to the nonnegative variable , the constraint is added to the mathematical program, where is a very large value, known to exceed any feasible value of . Then, if (e.g., not opening the plant that is needed for ), is forced by the upper bound constraint. If (e.g., plant is open), is a redundant upper bound. Fixed charge problems are mathematical programs with fixed charges. (See Myths and Counterexamples in Mathematical Programming to avoid a misconception.)
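A toy illustration of the big-M construction with hypothetical data (unit cost 2, fixed charge 10, requirement x >= 3, M = 100); brute-force enumeration stands in for a MIP solver.

```python
# minimize  2*x + 10*y   subject to  x >= 3,  x <= M*y,  y in {0, 1}.
# Opening the "plant" (y = 1) incurs the fixed charge 10 but permits x > 0;
# y = 0 forces x = 0 via the upper bound constraint, violating x >= 3.

M = 100  # any bound known to exceed every feasible value of x

def cost(x, y):
    return 2 * x + 10 * y

def feasible(x, y):
    return x >= 3 and 0 <= x <= M * y and y in (0, 1)

best = min(cost(x, y) for x in range(M + 1) for y in (0, 1) if feasible(x, y))
```

Here the minimum is attained at x = 3 with y = 1, total cost 16; with y = 0 no feasible x exists.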
Fixed point 
The fixed point of a function, , is an element such that . For a point-to-set map, , we instead have that . The study of fixed points has been at the foundation of algorithms.
Fixed variable 
A variable whose value is fixed, either explicitly or implicitly. Explicit fixing arises for convenience of scenario design, particularly in linear programming, and during an algorithm's progression, particularly in integer programming. Implicit fixing occurs from a forcing substructure.
Fleet mix problem 
To determine how many of each type of aircraft should be in a fleet to meet demands and minimize total cost. Here is an integer linear program model:
where is the number of aircraft of type available; is the capacity of aircraft type for mission ; is the least number of missions of type that must be flown; is the greatest number of missions of type that can be flown. The variables are the number of aircraft of type in the fleet, and is its maintenance cost. If the aircraft must be purchased, binary variables are introduced, as , with a fixed charge, , in the objective . There could be additional constraints, such as a budget on total purchases or on total maintenance .
Floor 
This is the integer round-down of a value, :
Examples: , , . Also, see ceiling.
Flow augmenting path 
This arises in the Ford-Fulkerson labeling algorithm to find a maximum flow in a network with arc flow bounds, and . Given a flow, , from a source to a sink, a flow augmenting path, relative to , is a path from that source to the sink such that
along all forward arcs, and
along all backward arcs.
The flow, , is a maximum flow from the source to the sink if and only if there is no flow augmenting path relative to . If there is a flow augmenting path, it can be changed along the path such that total flow increases (by at least 1 if all data are integer).
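A sketch of the labeling idea using breadth-first search for augmenting paths (the Edmonds-Karp refinement of Ford-Fulkerson); lower bounds are taken as zero for simplicity, and the network below is illustrative.

```python
from collections import deque

def max_flow(cap, s, t):
    """Repeatedly find a flow augmenting path in the residual network and
    push the bottleneck flow along it; stop when no such path exists.

    cap: dict-of-dicts of arc capacities, e.g. cap['s']['a'] = 3.
    """
    # residual capacities, with reverse arcs initialized to 0
    res = {u: dict(vs) for u, vs in cap.items()}
    for u, vs in cap.items():
        for v in vs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an augmenting path s -> t with positive residual capacity
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow          # no augmenting path: the flow is maximum
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= push    # forward arc: flow increases
            res[v][u] += push    # backward arc: flow can be undone later
        flow += push
```

With integer capacities each augmentation raises the flow by at least 1, matching the termination remark above.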
Forced equality 
This is an inequality constraint that must hold with equality in every feasible solution.
Forcing substructure 
A subsystem of the constraints and the variables in them that forces some value. For example, the following system is a forcing substructure because each variable, which is required to be nonnegative, is forced to be zero.
Forward checking 
Forward checking is a propagation procedure that guarantees that at each step of the search, all the constraints between already assigned variables and not yet assigned variables are arc consistent.
Formally, let be a binary constraint network and such that for all . is forward checking consistent according to the instantiation on iff is locally consistent and for all , for all , for all , is arc consistent.
Forward substitution 
The recursion to solve a nonsingular lower triangular system, . It starts with , then
for .
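The recursion can be sketched directly in Python (dense, row-major lists):

```python
def forward_substitution(L, b):
    """Solve L x = b for a nonsingular lower triangular L:
    x_1 = b_1 / L_11, then each x_i uses the previously computed x_j."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x
```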
Forward transformation 
See FTRAN.
Forward triangularization 
An algorithm to rearrange a matrix by recursively assigning a singleton row to the front (with its only column, as its pivot). The recursion applies to the submatrix defined by omitting the assigned row and column (and reducing other row counts accordingly). This results in a lower triangular rearrangement if and only if such a rearrangement exists.
Fourier-Motzkin elimination
A procedure by which a variable is eliminated from a system of linear inequalities. The remaining inequalities, which include some new ones, have a solution if and only if the original system does. Eventually, the system reduces to one variable, or an inconsistency is detected during the elimination. The remaining variable can be assigned any value in its range; then a backtracking procedure can be executed to obtain values of the other variables, which were eliminated. Originally presented by Fourier in 1826, the method was analyzed by Motzkin in 1933 in light of new developments in the theory of linear inequalities to solve linear programs. Its computational complexity is exponential, and it gave way to the simplex method.
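A sketch of one elimination step for a system A x <= b (dense lists, no redundancy removal, which is why repeated use can blow up the number of inequalities):

```python
def fm_eliminate(A, b, k):
    """One step of Fourier-Motzkin elimination: remove variable k from
    A x <= b, returning a system in the remaining variables that is
    solvable iff the original is.

    Each inequality with a positive coefficient on x_k gives an upper
    bound on x_k; each with a negative coefficient gives a lower bound.
    Pairing every lower bound with every upper bound yields the new rows.
    """
    zero, pos, neg = [], [], []
    for a, bi in zip(A, b):
        (zero if a[k] == 0 else pos if a[k] > 0 else neg).append((a, bi))
    newA = [list(a) for a, _ in zero]
    newb = [bi for _, bi in zero]
    for ap, bp in pos:
        for an, bn in neg:
            row = [ap[j] / ap[k] - an[j] / an[k] for j in range(len(ap))]
            row[k] = 0.0
            newA.append(row)
            newb.append(bp / ap[k] - bn / an[k])
    return newA, newb
```

For instance, eliminating x0 from {x0 - x1 <= 0, x1 <= 5, -x0 <= -2} leaves {x1 <= 5, -x1 <= -2}, i.e. 2 <= x1 <= 5.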
Fractional program 
Objective and/or constraint functions are sums of ratios of the form , where typically over some domain . The case of one term is special, and the linear fractional program has affine numerator and denominator. The linear 1-term case, (where over the feasible region), is equivalent to solving a parametric linear program:
Frank-Wolfe Theorem
If a quadratic function is bounded from below on a nonempty polyhedron, it attains a minimum there. In mathematical notation, we are given that is a nonempty polyhedron. Then, if there exists such that for all , there exists such that for all .
Free variable 
A variable with no (explicit) sign restriction. (The term is generally used in linear programming because the standard form has x >= 0.)
Fritz John conditions 
These extend Carathéodory's conditions to deal with inequality constraints:
Suppose is in , where and are smooth. Then, there exist multipliers , not all zero, such that
Fuzzy CSP 
In a fuzzy constraint satisfaction problem each constraint and instantiation is assigned a degree of satisfaction , which is a value in the [0,1] real interval indicating the degree to which is satisfied by . If this value is 1, is satisfied, and if this value is 0, is violated. In the most common interpretation of fuzzy CSPs, the task is to find an instantiation for which the minimum of , with ranging over all the constraints (i.e., the smallest degree of satisfaction for the instantiation ), is maximal.
Fuzzy math program 
Some (or all) constraints are fuzzified by replacing the level set, , with the requirement that , where is a membership function that is given (or derived from primitive membership functions on the level sets of each ), and is a parameter in to be set for each instance of the model. Typically (but not necessarily), means that the constraint must hold, and the lower the value of , the more 's are admitted. (This appears similar to the chance-constraint model of stochastic programming, but it is more closely related to goal programming.) While fuzzy sets are used to represent uncertainty, they can also be used to represent preferences; e.g., a requirement could mean that it is preferred that be in at least as much as it is preferred that be in .
Fuzzy set 
Given a universe set, , and a membership function, , a fuzzy set is a collection of pairs: . Often, the membership function is subscripted by the set name, say . Generally, for all , , and . In the context of uncertainty, the value is used to model how possible it is for to be in . For this reason, is sometimes called the possibility function of . What we consider the usual set (without a membership function) is called a crisp set in fuzzy mathematics.
Fuzzy set operations, say on fuzzy sets and , with membership functions and , resp., are defined by the following:
 Union: .
 Intersection: .
 Complement: .
One must be careful when using fuzzy sets to represent uncertainty (which is not the only type of interpretation – see fuzzy mathematical program). In particular, if , its complement is also . Thus, , despite the fact that (in ordinary set theory). Similarly, , despite the fact that . This illustrates the fact that need not equal even if as crisp sets.
While the fuzzy set is fundamental for fuzzy mathematical programming, other concepts in fuzzy mathematics also apply, such as fuzzy arithmetic, fuzzy graphs and fuzzy logic.
GAC 
See generalized arc consistency.
GAMS 
GAMS is the General Algebraic Modeling System.
GRASP 
GRASP stands for Greedy Randomized Adaptive Search Procedures. It is a metaheuristic that uses restarts to search many portions of the solution space. From each start, some heuristic is employed to reach a local optimum. The choice of local search can be problem-specific, such as using an n-opt neighborhood for the TSP, or it can use another metaheuristic, like simulated annealing or tabu search.
Game theory 
In general, a (mathematical) game can be played by one player, such as a puzzle, but its main connection with mathematical programming is if there are at least two players, and they are in conflict. Each player chooses a strategy that maximizes his payoff. If there are exactly two players and one player's loss is the other's gain, the game is called zero sum. In this case, a payoff matrix, , is given where is the payoff to player 1, and the loss to player 2, if player 1 uses strategy i and player 2 uses strategy j. In this representation each row of corresponds to a strategy of player 1, and each column corresponds to a strategy of player 2. If is , this means player 1 has m strategies, and player 2 has n strategies. Here is an example of a payoff matrix:
A primary connection with mathematical programming is duality. Suppose and are strategy probability vectors chosen by the two players, i.e. in a particular play, is the probability player 1 chooses strategy i and is the probability player 2 chooses strategy j. Then, the value, of the game is the expected payoff:
where player 1 seeks to maximize and player 2 seeks to minimize . A pure strategy is a unit vector. For example, if , then player 1 is using his first strategy 100% of the time; if , player 2 is using his second strategy 100% of the time. The expected payoff is simply the cell entry, , which is 5 in the example (player 2 pays 5 units to player 1). The strategies that are not pure are called mixed. For example, means player 1 plays each of the two strategies 50% of the time (flipping a coin on each play); means player 2 plays his first strategy 25% of the time and his second the remaining 75%. The expected payoff is 5/8.
The fundamental duality theorem of linear programming (see the supplement on mathematical programming duals) provides the equivalent primal-dual formulation: player 1 solves the primal, and player 2 solves the dual.
This is the simplest 2-person game; more generally, there can be n persons, and subsets of the players can form cooperative agreements. If this is not allowed, the game is called noncooperative.
A combinatorial game is a cooperative game with restrictions on the formation of coalitions that induce a combinatorial structure, such as a matroid.
Gauge function 
A nonnegative, convex function, from to the extended reals that is positively homogeneous and . A norm is a gauge function. Here is an example that is not a norm:
Gauss-Jordan elimination
A method to solve that performs elementary row and column operations on to annihilate successive elements of in order to reduce to an identity matrix. On paper, the same operations are applied to , then the solution is obtained by solving the resulting identity system. In a computer, the matrices effecting the elementary operations are saved as elementary matrices, say for the ith operation. Then, the system is equivalent to , and forward transformation is applied to solve for . This is what is done in the (revised) simplex method, and each iteration is a pivot operation.
Gauss-Seidel method
An iterative method to solve that uses the latest values of the components of during the updates. Specifically, is split into , where is lower triangular, and the update solves . (Surprisingly, history indicates Gauss did not know about this method, and Seidel did not recommend it.) Often people refer to the method to mean the use of latest values, rather than the specific splitting. In parallel computation, for example, the latest value of is often not used during the computation of , since doing so would forgo the parallelism.
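A minimal sketch of the sweep, assuming A is such that the iteration converges (e.g., strictly diagonally dominant); the fixed iteration count is an assumption in place of a proper convergence test.

```python
def gauss_seidel(A, b, x0, iters=100):
    """Gauss-Seidel iteration for A x = b: sweep through the equations,
    solving the i-th for x_i using the latest values of the other
    components (in contrast to Jacobi, which uses only the old sweep).
    """
    n = len(b)
    x = list(x0)
    for _ in range(iters):
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
    return x
```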
Gaussian elimination 
A method to solve that performs elementary row operations on to annihilate successive elements of in order to reduce to an upper triangular matrix, . On paper, the same operations are applied to , then the solution is obtained by solving the resulting upper triangular system. In a computer, the product of the matrices effecting the elementary row operations is a lower triangular matrix, , with unit diagonal. Once this phase is completed, the system becomes . This is then solved in two steps: forward substitution solves ; then backward substitution solves . (Of course, computer implementations vary.)
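A sketch of the two phases (elimination to upper triangular form, then backward substitution), without the pivoting a practical implementation needs:

```python
def gaussian_solve(A, b):
    """Solve A x = b by Gaussian elimination (no pivoting, for clarity):
    row operations reduce A to an upper triangular U, applying the same
    operations to b, then backward substitution recovers x.
    """
    n = len(b)
    U = [list(row) for row in A]
    c = list(b)
    # elimination phase: annihilate the entries below each pivot
    for k in range(n):
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
            c[i] -= m * c[k]
    # backward substitution on the upper triangular system U x = c
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (c[i] - s) / U[i][i]
    return x
```

The multipliers m collected during elimination form the unit lower triangular factor L mentioned above.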
Generalized Lagrange multiplier method 
The Generalized Lagrange Multiplier method (GLM) solves a sequence of Lagrangian optimization (relaxation) problems, searching the multiplier space by some method to seek a minimum of the maximum Lagrangian function.
 Given with , find .
 Test if 0 is in the convex hull of . If so, stop ( solves the original mathematical program if 0 is in ; otherwise, search alternative optima; if no maximum generates 0 as a member of , then the right-hand side (0) is in a duality gap).
 Given , choose to reduce the value of .
 Column generation – maximizing the Lagrangian generates a column in the primal of the randomized program.
 Cutting plane – maximizing the Lagrangian generates a cutting plane in the dual of the randomized program.
 Dantzig-Wolfe decomposition – the master problem is the randomized program; the subproblem is maximizing the Lagrangian.
 Lagrangian duality – GLM solves the Lagrangian dual; (a duality gap occurs when there is no optimal pure strategy in the randomized program).
 Randomized program – This is what GLM solves, and a mixed strategy could have meaning in an application.
Generalized arc consistency 
An extension of arc consistency to constraints with more than two variables in their scope.
A variable is generalized arc consistent (GAC) with a constraint if every value of the variable can be extended to all the other variables of the constraint in such a way the constraint is satisfied.
Generalized arc consistency is one of the most commonly enforced forms of consistency in constraint programming.
GAC has also been called: hyper-arc consistency and domain consistency.
Generalized equation 
This has the form: 0 is in , where is a function from into and is a point-to-set map that maps into subsets of . If is absent, the condition reduces to an ordinary equation, . Usually, is assumed to satisfy the monotonicity condition:
This includes a variational inequality (and hence the complementarity problem) as a special case.
Generalized inverse 
Suppose A is any m x n matrix. A^- is a generalized inverse of A if A^- is n x m and A A^- A = A. Then, a fundamental theorem for linear equations is:
The equation, Ax = b, has a solution if and only if A A^- b = b for any generalized inverse, A^-, in which case the solutions are of the form: x = A^- b + (I - A^- A)z for any z.
Here is an Example.
The Moore-Penrose class additionally requires that A A^- and A^- A be symmetric (Hermitian, if A is complex). In particular, if A is nonsingular, A^-1 is the Moore-Penrose inverse. Note that (A^- A)(A^- A) = A^- A, so A^- A is a projection matrix. Further, I - A^- A projects into its orthogonal complement, which is the null space of A, i.e., A(I - A^- A)z = 0 for any z.
Generalized network 
A network in which the flow that reaches the destination could be different from the flow that left the source. In the incidence matrix, instead of
we have
where a > 0 is the arc multiplier. If a < 1, there is a loss; if a > 1, there is a gain.
Generalized reduced gradient method 
The Generalized reduced gradient method (GRG) is a generalization of the reduced gradient method by allowing nonlinear constraints and arbitrary bounds on the variables. The form is:
where has dimension . The method supposes we can partition such that:
 has dimension (and has dimension );
 the values of are strictly within their bounds: (this is a nondegeneracy assumption);
 is nonsingular at .
As in the linear case, for any there is a unique value, , such that (c.f., Implicit Function Theorem), which implies that . The idea is to choose the direction of the independent variables to be the reduced gradient: , where . Then, the step size is chosen and a correction procedure applied to return to the surface, .
The main steps (except the correction procedure) are the same as the reduced gradient method, changing the working set as appropriate.
Generalized upper bound 
A Generalized upper bound (GUB) is an upper bound constraint on a sum of variables: . A collection of these that do not overlap (i.e., index sets are disjoint) comprise the foundation for the generalized upper bounding technique in linear programming. This is where the summation equation is marked in the data structure, rather than represented explicitly as a row in the LP, and the basis is handled in its reduced dimension to gain computational efficiency in both time and space.
Genetic algorithm 
Genetic algorithms (GAs) are a class of algorithms inspired by the mechanisms of genetics, which have been applied to global optimization (especially for combinatorial programs). They require the specification of three operations (each is typically probabilistic) on objects, called "strings" (these could be real-valued vectors):
 Reproduction - combining strings in the population to create a new string (offspring). Example: taking the 1st character from the 1st parent plus the rest of the string from the 2nd parent.
 Mutation - spontaneous alteration of characters in a string. Example: altering just the leftmost character.
 Crossover - combining strings to exchange values, creating new strings in their place. Example: exchanging the segments after crossover location 2.
0. Initialize population.
1. Select parents for reproduction and GA operators (viz., mutation and crossover).
2. Perform operations to generate intermediate population and evaluate their fitness values.
3. Select members of the population to remain in the new generation. Repeat steps 1-3 until some stopping rule is reached.
The original GA (1973, by John Holland) used crossover and total population replacement. This means a population with 2N objects (called chromosomes) forms N pairings of parents that produce 2N offspring. The offspring comprise the new generation, and they become the total population, replacing their parents. More generally, a population of size N produces an intermediate population of N+M, from which Ñ is kept to form the new population. One way to choose which Ñ survive is by those with the greatest fitness values - survival of the fittest.
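The generational loop can be sketched on a toy problem, maximizing the number of 1-bits in a string; all parameter choices here (population size, mutation rate, selection rule) are illustrative, not prescribed by the glossary:

```python
# Minimal GA sketch: selection, crossover, mutation, elitist survival.
import random

random.seed(0)
N, LEN, GENS = 20, 12, 40

def fitness(s):
    return sum(s)                         # count of 1-bits (toy objective)

def crossover(p1, p2):
    c = random.randrange(1, LEN)          # crossover location
    return p1[:c] + p2[c:], p2[:c] + p1[c:]

def mutate(s, rate=0.02):
    return [b ^ 1 if random.random() < rate else b for b in s]

pop = [[random.randint(0, 1) for _ in range(LEN)] for _ in range(N)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:N // 2]                # fitter half reproduces
    offspring = []
    while len(offspring) < N:
        a, b = random.sample(parents, 2)
        for child in crossover(a, b):
            offspring.append(mutate(child))
    # keep the fittest N of parents + offspring (survival of the fittest)
    pop = sorted(parents + offspring, key=fitness, reverse=True)[:N]

best = max(pop, key=fitness)
print(fitness(best))
```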
Geometric convergence 
See convergence.
Geometric mean 
Given a nonnegative vector, v, and a weight vector, w, in the simplex (w >= 0 and sum_j w_j = 1), the geometric mean is the value:
prod_j v_j^(w_j).
For example, with equal weights the geometric mean of 1 and 4 is 2. A fundamental inequality that provides a foundation for geometric programming is that the geometric mean is never greater than the arithmetic mean:
prod_j v_j^(w_j) <= sum_j w_j v_j
for all nonnegative v and w in the simplex. Further, equality holds only if v_j is constant over all j with w_j > 0.
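A quick numeric check of the arithmetic-geometric mean inequality; the weights and values below are illustrative:

```python
# Weighted geometric mean never exceeds the weighted arithmetic mean.
import math

def geometric_mean(v, w):
    return math.prod(vj ** wj for vj, wj in zip(v, w))

def arithmetic_mean(v, w):
    return sum(wj * vj for vj, wj in zip(v, w))

v = [1.0, 4.0, 16.0]
w = [0.5, 0.25, 0.25]          # nonnegative weights summing to 1
gm = geometric_mean(v, w)      # 1^0.5 * 4^0.25 * 16^0.25 = 2*sqrt(2)
am = arithmetic_mean(v, w)     # 0.5 + 1.0 + 4.0 = 5.5
print(gm, am)
```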
Geometric program 
A mathematical program of the form
where each is a posynomial (k=0,...,m). Conventionally, the monomials are indexed sequentially, so that the posynomials appear as:
For example, consider:
Then, (variables), (constraints), (monomials in objective), ( monomial in constraint 1), and ( monomials in constraint 2).
The exponent matrix is the matrix whose ith row contains the exponents of the variables in the ith monomial. The example has 5 rows and 3 columns:
The degree of difficulty is the number of terms in all posynomials minus the number of independent variables, minus one (where the number of independent variables is the rank of the exponent matrix). In the above example, the rank of the exponent matrix is 3, so the degree of difficulty is 5 - 3 - 1 = 1. If the last constraint were not present, only the first three rows comprise the exponent matrix, and its rank is 2. In that case, the degree of difficulty is 3 - 2 - 1 = 0. (Using the <a href="second.php?page=duals.html#Geometric">geometric dual</a>, the solution reduces to solving a system of linear equations, which is what motivates the terminology.)
Also see superconsistent.
Since its inception, the posynomial form has been extended to signomials. In that case, the duality need not be strong, as there can be a duality gap. The general, or signomial, geometric program is of the form:
for , where and are posynomials ().
Global Cardinality constraint 
This is a global constraint that is specified on assignment variables and count variables , and enforces that each value is assigned to exactly of the assignment variables, . The global cardinality constraint is a generalization of the all different constraint which requires that every value is assigned to at most one variable.
Global constraint 
A global constraint is a family of scopeparameterized constraints (normally ), where is the size of the scope of the constraint. The relation represented by the constraint is often defined implicitly (i.e., intensionally) via a compact set of characteristics that all satisfying tuples must exhibit. For example, the alldifferent constraint is satisfied by all assignments to its variables where no pair have the same value. Such implicit definitions are then expressed and enforced by the constraint propagator corresponding to the global constraint.
Global constraints may also be explicitly (i.e., extensionally) defined by providing a set of satisfying (or non-satisfying) tuples.
The most common global constraints are:
Global convergence 
See convergence.
Global optimization 
Generally refers to mathematical programming without convexity assumptions; such problems are NP-hard. In general, there could be a local optimum that is not a global optimum. Some authors use this term to imply the stronger condition that there are multiple local optima. Some solution strategies are given as heuristic search methods (including those that guarantee global convergence, such as branch and bound). As a process associated with algorithm design, some regard this simply as attempts to ensure convergence to a global optimum (unlike a purely local optimization procedure, like steepest ascent). See Glover's linearization.
Global optimum 
A solution to a mathematical program (as distinct from a local optimum). When the mathematical program is not convex, there could be (and typically are) many local optima that are not global optima.
Glovers linearization 
Given a binary variable and a linear function in discrete and/or continuous variables in for some set , this linearization reformulates the product with a (free) continuous variable and enforces that by adding four linear inequalities:
where the values and are defined as
and is any relaxation of .
Goal program 
A goal is neither a constraint nor an objective because it is neither a firm requirement nor a function that must be at an extreme value. In English, a goal represents a relation that is desired, but it may be violated if there is adequate compensation. For example, we may have a budget of $1000, and we prefer to operate within it, but if spending just one more dollar yields a major improvement in operations, we would consider spending $1001.
A goal program is put into a standard form mathematical program in two steps:
 Add the variable, , and the constraints, and , to measure the level of violation. (If the goal were , instead of , the added constraint would be , where and are (nonnegative) levels of violation below and above the goal, respectively.)
 Add the penalty term to the objective: , where , and is strictly increasing  i.e., and for some imply .
The resulting mathematical program represents the goal program. If the goal is satisfied, ; otherwise, the penalty term, , reflects "adequate compensation" for the violation.
Golden mean 
The golden mean, or golden ratio, is the positive solution to the quadratic equation x^2 + x = 1, which is (sqrt(5) - 1)/2, or approximately 0.618. This has the proportionality property g/1 = (1 - g)/g, which defines the placements of evaluations for the golden section search.
Golden section search 
This finds an interval, , that contains a maximum of a unimodal function whose domain is , such that the length of the final interval, , satisfies: (where is specified). The golden ratio is denoted below by , which is approximately 0.618.
 Initially compute f at two interior points, x and y, and set the interval of uncertainty to the original endpoints, a and b. In general, x and y are placed in [a, b] such that the distances from the endpoints are the same: x = b - g(b - a) and y = a + g(b - a).
 Given the interval [a, b] contains evaluations at x and y, with x < y, compare: if f(x) < f(y), the maximum cannot lie in [a, x], so replace [a, b] with [x, b]; if f(x) > f(y), the maximum cannot lie in [y, b], so replace [a, b] with [a, y]. (If f(x) = f(y), this leaves the interval [x, y], in which case evaluate f at two new interior points.) By the proportionality property g^2 = 1 - g, one of the old interior points serves as an interior point of the new interval, so each iteration requires only one new function evaluation.
The golden ratio is the limit of the ratio of successive Fibonacci numbers: F(n-1)/F(n) approaches g as n grows. Thus, the golden section search approaches Fibonacci search as the number of function evaluations (N) becomes large.
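The bracketing scheme can be sketched as follows; the bracket [0, 5] and the test function are illustrative:

```python
# Golden section search for the maximum of a unimodal function on [a, b].
import math

def golden_section_max(f, a, b, tol=1e-6):
    g = (math.sqrt(5) - 1) / 2               # golden ratio, about 0.618
    x, y = b - g * (b - a), a + g * (b - a)  # symmetric interior points
    fx, fy = f(x), f(y)
    while b - a > tol:
        if fx < fy:              # maximum cannot lie in [a, x]: drop it
            a, x, fx = x, y, fy  # old y becomes the new x (point reuse)
            y = a + g * (b - a)
            fy = f(y)
        else:                    # maximum cannot lie in [y, b]: drop it
            b, y, fy = y, x, fx  # old x becomes the new y (point reuse)
            x = b - g * (b - a)
            fx = f(x)
    return (a + b) / 2

xstar = golden_section_max(lambda x: -(x - 2.0) ** 2, 0.0, 5.0)
print(xstar)   # close to 2.0
```

Note how the identity g^2 = 1 - g makes one old interior point land exactly where a new one is needed, so each iteration costs one evaluation.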
Gomory cut 
This is an early cutting plane for an integer program. Consider , where and have integer values. For any , let and (i.e., is the fractional part of the linear combination of the equations, and is the fractional part of the same linear combination of the righthand sides). Also, let (fractional part of ). Then, Gomory's cut is:
The vector is chosen such that and the current solution (with ) violates this inequality in the LP relaxation.
Gomory function 
A recursively defined class of functions on to (rational nvectors to rationals) using three operations: (1) nonnegative, rational linear combinations, (2) integer rounding, and (3) maximum. Letting be the integer roundup of , i.e. the least integer not less than , we say is a Gomory function if it is the identity function, , or if it has one of the following forms:
 for some and rational;
 ;
 f;
where each is a Chvátal function.
This arises in integer linear programming in several ways. One fundamental result is that the optimal value as a function of is a Gomory function:
Here is an example:
Note: The above definition assumes minimization; if the ILP is maximization, would be rounddown (i.e., greatest integer not exceeding the value), and condition 3 would be the pointwise minimum.
Gomory group 
The group associated with the corner polyhedron problem:
where corresponds to the nonbasic variables and is the remainder, , where is the determinant of the basis, . The idea is that the nonbasic value chosen in this group, together with the basic levels imputed by the equations , yields an integer solution (but a basic level might be negative).
Gradient 
Vector of first partial derivatives of a function (assumed to be differentiable at least once). Typically denoted ∇f(x), where f is the function and x is a point in its domain.
Gradient projection method 
A feasible direction method obtained by projecting the gradient onto the working surface, {x : Ax = b}. Suppose A has full row rank. Then, P = I - A'(AA')^(-1)A projects any vector into the null space of A: APv = 0 for all v. The form of an iteration is x' = x + sd, where the direction d is the projected gradient, P∇f(x), and the step size s is determined by line search. Since Ad = 0, Ax' = Ax, thus staying in the working surface. (This extends to nonlinear constraints by using the same correction procedure as the generalized reduced gradient method.)
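A small numeric sketch using the standard projection matrix P = I - A'(AA')^(-1)A (assumed here) for a single linear constraint x1 + x2 = const; the gradient vector is illustrative:

```python
# Project a gradient onto the null space of A = [1 1].

a = [1.0, 1.0]                     # A has the single row a
aa = sum(ai * ai for ai in a)      # A A' (a scalar for one row)
P = [[(1.0 if i == j else 0.0) - a[i] * a[j] / aa
      for j in range(2)] for i in range(2)]

grad = [3.0, 1.0]                              # gradient to project
d = [sum(P[i][j] * grad[j] for j in range(2))  # projected gradient
     for i in range(2)]
print(d)                                       # [1.0, -1.0]
# Moving along d keeps a'x constant, since a . d = 0:
print(sum(ai * di for ai, di in zip(a, d)))    # 0.0
```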
Graph 
In one context, this is the functional value and domain: , where . In another context, this is a (maybe undirected) network. In the former context, see also epigraph and hypograph. In the latter context, the notation is sometimes used to mean a graph with vertex (or node) set and edge (or link) set . We say an edge is incident with the two nodes that define it, and those two nodes are called the endpoints of the edge. The endpoints are said to be adjacent nodes.
An edge whose endpoints are the same node is called a loop. Two edges with the same endpoints are called parallel. A graph is simple if it has no loops or parallel edges. Otherwise, it is called a multigraph. The degree of a node is the number of edges incident to it (counting a loop twice). If two edges have a common endpoint, they are said to be adjacent. A path is a sequence of edges that are successively adjacent. (If the graph is simple, a path can also be represented by the associated sequence of nodes.)
When the edge is directed (or oriented), it is called an arc. When all edges are directed, the graph is called a directed graph, or digraph. With additional properties (e.g., arc weights), this is a network. A path in a digraph can traverse its arcs in either the forward or backward direction. If all arcs are forward, it is a directed path, or dipath.
Here are some special graphs of interest (not exhaustive).
 bipartite. nodes decompose into 2 sets such that every link has its endpoints in opposite sets. This applies, for example, to the standard transportation problem, where there are sources in one set and destinations in the other.
 complete. There is a link between each pair of nodes. A complete graph on n nodes is usually denoted K_n. In the case of a bipartite graph, the term bicomplete is used to mean all possible links are there: between all pairs of nodes in the two sets. A bicomplete graph on sets with m and n nodes, resp., is usually denoted K_{m,n}.
 connected. There is a path between each pair of nodes. If the graph is directed, this divides into:
 strongly connected  arcs must be traversed in forward direction;
 weakly connected  arcs can be traversed in either direction.
 cycle (denoted C_n for n nodes). The links are (1,2), (2,3), ..., (n-1,n), (n,1).
 planar. The graph can be drawn in a plane (2D) such that no two links cross (they meet only at incident nodes). This has special properties in multicommodity flows and shortest paths.
 subgraph. [V', E'] such that V' is a subset of V and E' is a subset of E.
 tree. Connected and acyclic. One problem is finding a spanning tree that is optimal in some sense.
 undirected. No link is directed.
Greedy algorithm 
Applies when the optimization problem is to decide whether or not to include some element from a given set. A greedy algorithm begins with no elements and sequentially selects an element from the feasible set of remaining elements by myopic optimization. (The elements could have been sorted by some criterion, such as associated weights.) This results in an optimal solution to the problem if and only if there is an underlying matroid structure (for example, see spanning tree). Further details are in the supplement, Greedy Algorithms.
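As an illustration of greedy selection over a matroid, here is a sketch of Kruskal's method for a minimum-weight spanning tree: edges are sorted by weight, and each is accepted iff it keeps the chosen set acyclic (the myopic step). The graph below is illustrative.

```python
# Greedy (Kruskal) minimum spanning tree via union-find.

def kruskal(n, edges):
    """n nodes labeled 0..n-1; edges is a list of (weight, u, v)."""
    parent = list(range(n))
    def find(i):                      # union-find root, with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    tree, total = [], 0
    for w, u, v in sorted(edges):     # consider edges in weight order
        ru, rv = find(u), find(v)
        if ru != rv:                  # accepting (u, v) creates no cycle
            parent[ru] = rv
            tree.append((u, v))
            total += w
    return tree, total

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
tree, total = kruskal(4, edges)
print(total)   # 1 + 2 + 3 = 6
```

The matroid structure (forests of a graph) is exactly what guarantees this greedy choice is globally optimal.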
Groebner basis 
This arises in parametric integer (linear) programming, varying the righthand side . A precise definition is involved, stemming from ideal theory in abstract algebra, and there are many other applications outside of integer programming. In the context of ILP, a Gröbner basis is a minimal set of change directions, , called a test set, such that the objective function improves along those directions. To elaborate, suppose we have
where and have integer elements. Let and (i.e., is feasible and bounded). A Gröbner basis for , holding fixed, is a set, , for which the following conditions hold:
 .
 (for minimization).
 For and , there exists such that.
Condition 1 says that if , and condition 2 says . Therefore, if is any feasible solution to either is optimal for a particular righthand side , or condition 3 ensures there is an improvement by adding some vector, .
Here is a numerical example.
Group problem 
Solving the corner polyhedron problem from the perspective of the Gomory group.
Haar condition 
See randomized program.
Hadamard inequality 
For an n x n matrix A, |det(A)| <= prod_i ||A_i||, where A_i is the ith row of A and ||.|| is the Euclidean norm.
Halfline 
A halfline is a collection of points of the form {td : t >= 0}, where d is in R^n such that d is nonzero (some state ||d|| = 1, without loss in generality). Note the halfline is rooted at the origin - see ray for translation.
Halfspace 
Consider the hyperplane, {x : ax = b} (where a is nonzero). This partitions R^n into two (closed) halfspaces: {x : ax <= b} and {x : ax >= b}. The two associated open halfspaces are {x : ax < b} and {x : ax > b}.
Hard constraint 
A constraint is hard if it has to be satisfied in any feasible solution. Compare with soft constraint.
Hausdorff metric 
This metric arises in normed spaces, giving a measure of distance between two (point) sets, A and B. Given a norm, let d be the distance function between a point and a set:
d(x, S) = inf{||x - s|| : s in S}. Define
H(A, B) = max{sup{d(a, B) : a in A}, sup{d(b, A) : b in B}}.
Then, H is the Hausdorff metric. In words, this says that for each a in A, let d(a, B) be the distance to its closest point in B, and maximize this distance among all a in A; then do the same with the roles of A and B reversed, and take the larger of the two values.
For example, let A be the interval [0, 1], and let B be the interval [0, 2]. The Hausdorff distance between these two intervals is H(A, B) = 1, attained at the point 2 in B, whose nearest point in A is 1.
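An illustrative numeric sketch, approximating two intervals by fine point grids and applying the definition of the Hausdorff metric as the larger of the two one-sided sup-distances (the grids and intervals are illustrative):

```python
# Hausdorff distance between [0, 1] and [0, 2], via point grids.

def dist(x, S):
    """Distance from point x to its closest point in the set S."""
    return min(abs(x - s) for s in S)

def hausdorff(A, B):
    return max(max(dist(a, B) for a in A),
               max(dist(b, A) for b in B))

def grid(lo, hi, n=400):
    return [lo + k * (hi - lo) / n for k in range(n + 1)]

A = grid(0.0, 1.0)   # the interval [0, 1]
B = grid(0.0, 2.0)   # the interval [0, 2]
h = hausdorff(A, B)
print(h)   # 1.0: the point 2 in B is at distance 1 from A
```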
Hessenberg matrix 
An upper Hessenberg matrix is a square matrix A = [a(i,j)] with a(i,j) = 0 for i > j+1. A lower Hessenberg matrix is one whose transpose is upper Hessenberg.
Hessian 
The matrix of second partial derivatives of a function (assumed to be twice differentiable). This is often denoted ∇²f(x) or H(x), where f is the function and x is a point in its domain.
Heuristic 
In mathematical programming, this usually means a procedure that seeks an optimal solution but does not guarantee it will find one, even if one exists. It is often used in contrast to an algorithm, so branch and bound would not be considered a heuristic in this sense. In AI, however, a heuristic is an algorithm (with some guarantees) that uses a heuristic function to estimate the "cost" of branching from a given node to a leaf of the search tree. (Also, in AI, the usual rules of node selection in branch and bound can be determined by the choice of heuristic function: best-first, breadth-first, or depth-first search.)
Heuristic function 
A heuristic function is used to guide a heuristic search, much like a human would do. In this sense, it is a rule of thumb. Formally, in AI, a heuristic function is defined on the state of a search tree and is used in the evaluation of nodes for expansion in what is sometimes called a best-first directed tree search strategy.
Heuristic search 
In mathematical programming, this is any (purposeful) search procedure to seek a solution to a global optimization problem, notably to combinatorial programs. In AI, this is a (partially) informed search (vs. blind search), using a heuristic function for guidance.
Two types of (blind) search methods are:
 Breadth-first: expanding a node of least depth;
 Depth-first: expanding a node of greatest depth (requiring backtracking).
A specific class of local heuristic search algorithms is the greedy algorithm. Another type is the 1-Opt for the TSP.
Here are heuristic search strategies that are based on some biological metaphor:
 Ant colony optimization, based on how ants solve problems;
 Genetic algorithm, based on genetics and evolution;
 Neural networks, based on how the brain functions;
 Simulated annealing, based on thermodynamics;
 Tabu search, based on memory response;
 Target analysis, based on learning.
Hirsch conjecture 
If the basis of one basic feasible solution differs from another by k columns, there exist k feasible pivots to go from the first to the second. It was posed by Warren M. Hirsch in 1957 and was eventually shown to be false for general (unbounded) polyhedra; Francisco Santos found a counterexample for bounded polytopes in 2010.
Holder inequality 
Holder's inequality states
|x'y| <= ||x||_p ||y||_q,
where ||.||_p denotes the L_p norm, 1/p + 1/q = 1, and p, q >= 1. Equality holds if and only if |y_j|^q is proportional to |x_j|^p for all j. (Note: this is the Cauchy-Schwarz inequality if p = q = 2.)
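A numeric spot-check of Holder's inequality with conjugate exponents p = 3 and q = 3/2; the vectors are illustrative:

```python
# Verify |x . y| <= ||x||_p * ||y||_q for conjugate p, q.

def norm(v, p):
    return sum(abs(vi) ** p for vi in v) ** (1.0 / p)

x = [1.0, -2.0, 3.0]
y = [0.5, 4.0, -1.0]
p, q = 3.0, 1.5                      # 1/3 + 2/3 = 1
lhs = abs(sum(xi * yi for xi, yi in zip(x, y)))
rhs = norm(x, p) * norm(y, q)
print(lhs, rhs)                      # lhs does not exceed rhs
```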
Homogeneous function 
A homogeneous function of degree p satisfies f(tx) = t^p f(x) for all t.
It is positively homogeneous if we restrict t > 0. When the degree is not specified (even by context), it is generally assumed to be 1. For example, a linear function is homogeneous (of degree 1), a quadratic form is homogeneous of degree 2, an affine function with nonzero constant term is not homogeneous, and a norm is positively homogeneous.
Homotopy 
A continuous transformation from one mapping to another. One example is the convex combination: H(x, t) = t F(x) + (1 - t) G(x). As t varies from 0 to 1, the function changes continuously from G to F.
Hungarian method 
An algorithm to solve the standard assignment problem (presented by H. Kuhn in 1955). Let C be the cost matrix.
 Step 0 (initialize). Subtract the least element of each row from that row of C. Then, do likewise for each column. The resulting matrix has a zero in every row and column. (Further, a least cost assignment for the reduced matrix is also a least cost assignment for C.) Redefine C to be the reduced matrix.
 Step 1 (cover zeros). Draw the minimum number of lines through the rows and columns to cover all zeros in C. If that minimum is n, you can make a complete assignment using cells with C(i,j) = 0: assign person i to job j for such a cell, then remove row i and column j, repeating the process to obtain an optimal assignment. Otherwise, if that minimum is less than n, continue.
 Step 2 (revise ). Select the minimum uncovered element. Subtract this from each uncovered element and add it to each twicecovered element (i.e., to those with both a horizontal and vertical line intersecting). Return to step 1.
Example:
          Job
          1  2  3  4  5
        [ 1  2  3  4  7 ]  1
        [ 7  4  2  6  5 ]  2
   C =  [ 2  4  3  1  4 ]  3   Person
        [ 2  3  1  2  3 ]  4
        [ 5  4  3  2  4 ]  5
After subtracting min row elements:
        [ 0  1  2  3  6 ]
        [ 5  2  0  4  3 ]
        [ 1  3  2  0  3 ]
        [ 1  2  0  1  2 ]
        [ 3  2  1  0  2 ]
After subtracting the minimum column elements:
        [ 0  0  2  3  4 ]
        [ 5  1  0  4  1 ]
        [ 1  2  2  0  1 ]
        [ 1  1  0  1  0 ]
        [ 3  1  1  0  0 ]
The minimum number of lines to cover all zeros is 4: a horizontal line through row 1 and vertical lines through columns 3, 4 and 5. The minimum uncovered element is 1. Subtract from each uncovered element and add to each twice covered element (cells (1,3), (1,4) and (1,5)):
        [ 0* 0  3  4  5 ]
        [ 4  0* 0  4  1 ]
        [ 0  1  2  0* 1 ]
        [ 0  0  0* 1  0 ]
        [ 2  0  1  0  0* ]
The minimum cover is now 5, by drawing a line through each column. A minimum assignment is indicated by *. For example, we assign person 1 to job 1, person 2 to job 2, person 3 to job 4, person 4 to job 3, and person 5 to job 5. (There are alternative optimal assignments.)
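Since the instance is small, the optimal cost can be confirmed by brute force over all 5! assignments (a check of the answer, not the Hungarian method itself):

```python
# Enumerate all assignments of 5 persons to 5 jobs and find the minimum cost.
from itertools import permutations

C = [[1, 2, 3, 4, 7],
     [7, 4, 2, 6, 5],
     [2, 4, 3, 1, 4],
     [2, 3, 1, 2, 3],
     [5, 4, 3, 2, 4]]

best = min(permutations(range(5)),
           key=lambda p: sum(C[i][p[i]] for i in range(5)))
best_cost = sum(C[i][best[i]] for i in range(5))
print(best_cost)   # 11, e.g. jobs 1, 2, 4, 3, 5 for persons 1..5
```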
Hypergraph 
A set of nodes (or vertices), say V, plus a set of edges, say E, such that each member of E is a subset of V. When each member of E has exactly 2 nodes, [V,E] is a graph. The hypergraph is a convenient mathematical way to describe relations that involve more than two objects (nodes).
Specific cases:
 An IIS hypergraph: each node represents an inequality and each edge represents an IIS.
 A common graphical representation of nonbinary constraint networks is as a hypergraph, where nodes represent variables and edges group variables together that belong to the same scope.
Hyperplane 
An affine set of dimension n-1 in R^n, i.e. the set {x : ax = b}, where a is a nonzero vector called the normal of the hyperplane.
Hypograph 
Abbreviated hyp(f). The hypograph of f over X is {(x, y) : x in X, y <= f(x)}, which is the region on or below the graph of f.
Implicit enumeration 
Applied to integer programming, this is a systematic evaluation of all possible solutions without explicitly evaluating all of them. For example, suppose a feasible solution is at hand with objective value 100. Now suppose we solve a relaxation, such as fixing some of the variables and solving the resulting linear program, and we obtain a value of only 90. If we seek a maximum, we can ignore all possible solutions having the fixed values in the relaxation since they must all have an objective value of at most 90. This is often said to be equivalent to branch and bound; however, the early versions of these methods were related, but different. The early branch and bound was a best-first (even breadth-first) heuristic search strategy. Early implicit enumeration schemes were depth-first.
Implicit function theorem 
Suppose h(x) = 0, where h: R^n → R^m with m < n, and h is smooth. Further, suppose we can partition the variables, x = (y, z), such that y is m-dimensional with ∇_y h(y*, z*) nonsingular. Then, there exists a neighborhood of z* for which there is an implicit function, f, on that neighborhood, such that h(f(z), z) = 0 for all z in the neighborhood. Further, f is smooth with ∇f(z) = -[∇_y h(f(z), z)]^(-1) ∇_z h(f(z), z).
Implied constraint 
An implied constraint is a redundant constraint that (typically) improves propagation if it is added to a constraint model.
Implied equality 
An inequality constraint, say g(x) <= b, such that g(x) = b for all feasible x.
Inactive constraint 
A constraint that is not active (at a point).
Incidence matrix 
A matrix, say A, that represents the incidence of edges or arcs to nodes in a graph or network. In the undirected case, A is binary valued: if edge k has endpoints i and j, then A(i,k) = A(j,k) = 1, and A(v,k) = 0 for every other node v. In the directed case, the sign of the entry indicates the tail: if arc k is directed from i to j, A(i,k) = 1 and A(j,k) = -1.
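A small sketch of constructing an incidence matrix for a digraph; the sign convention used here (+1 at the tail, -1 at the head) varies across texts, so treat it as an assumption:

```python
# Build a node-arc incidence matrix for a small directed graph.

def incidence(n, arcs):
    A = [[0] * len(arcs) for _ in range(n)]
    for k, (i, j) in enumerate(arcs):   # arc k goes from node i to node j
        A[i][k] = 1                     # tail
        A[j][k] = -1                    # head
    return A

arcs = [(0, 1), (0, 2), (1, 2)]
A = incidence(3, arcs)
# Each column has exactly one +1 and one -1, so its entries sum to zero.
print([sum(A[v][k] for v in range(3)) for k in range(3)])   # [0, 0, 0]
```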
Inconsistent 
A system of inequalities and/or equations for which there is no feasible solution.
Incumbent solution 
The current best solution found during an algorithmic search procedure. Typically in (mixed) integer programming this refers to the best known feasible solution in the branching tree.
Independent set 
Given a graph, G = [V, E], an independent set (also called a stable set) is a subgraph induced by a subset of vertices, V', plus all edges with both endpoints in V', such that no two nodes in the subgraph are adjacent. A maximal independent set is one such that adding any node causes the subgraph to violate independence - the added node is adjacent to some node already in the set. Given weights, w(v) for v in V, the weight of an independent set, V', is the sum of the weights in V'. The maximum independent set problem is to find an independent set of maximum weight. This is equivalent to finding a minimum weight vertex cover, say C, since V \ C is an independent set. (One can also refer to an independent set of arcs (or edges) as a subgraph with no two arcs (or edges) adjacent.)
Indicator function 
A function indicating membership in a set, say S. Such functions are usually denoted 1_S or χ_S and are evaluated as 1 at x if x is in S, and 0 otherwise.
Inequality 
A relation of the form g(x) <= b or g(x) >= b. Such constraints are typical of mathematical programs. With equality allowed, these are called weak inequalities; strict inequalities are g(x) < b and g(x) > b.
 Here are particular inequalities in this glossary:
 CauchySchwarz
 Hadamard
 Hölder
 Jensen
 Kantorovich
 Minkowski
 Triangle
Also see variational inequality.
Inequality of degree k 
An inequality, usually linear, with k variables. This arises in integer programming, where the variables are binary. In particular, an inequality of degree 2 can represent a precedence constraint, and it has special computational advantages in the node packing problem.
Infeasible 
Not feasible.
Inference dual 
See it in the list of particular duals.
Infimum 
(abbr. Inf). The greatest lower bound on a (real-valued) function over (a subset of) its domain. If f is unbounded from below, Inf{f} = -infinity, and if the domain is empty, Inf{f} = +infinity. Otherwise, suppose L is any lower bound: f(x) >= L for all x in X. L is a greatest lower bound if, for any e > 0, there exists x in the domain for which f(x) <= L+e. (That is, we can get arbitrarily close to L in the range of f.) Note that the infimum is always defined, and its range is in the extended reals. The infimum is the minimum, if it is attained by some point in its domain.
Infinite program 
An infinite [dimensional] program is a mathematical program with an infinite number of variables and constraints (also see semi-infinite program). Generally, this is approached as an abstract program.
Inner approximation 
This solves a mathematical program by a sequence of approximations whose feasible regions are contained in the original feasible region. One example is the use of a barrier penalty method. Another example is polyhedral annexation.
Integer equivalent aggregation 
Integer equivalent aggregation is a reduction of a system of linear algebraic equations with nonnegative integer solutions to a single equation, which is a linear combination of the equations of the system and has the same set of nonnegative integer solutions. For example, consider the system:
By simply adding the equations, we have the equivalent description:
Both sets consist of the same two points.
Integer polyhedron 
The convex hull of the feasible region of a linear integer program: conv{x : Ax <= b, x integer}.
Integer program 
Abbreviated IP. The variables are required to be integervalued. Historically, this term implied the mathematical program was otherwise linear, so one often qualifies a nonlinear integer program vs. a linear IP. Also see mixed integer program and combinatorial program.
Integer rounding 
Rounding a noninteger solution to an integer neighbor. This will generally not yield an optimal solution to an integer program  see Myths and Counterexamples in Mathematical Programming.
Interior penalty function 
Same as barrier function.
Interior point method 
A family of algorithms that stays in the strict interior of the feasible region, possibly through the use of a barrier function. The term grew from Karmarkar's algorithm to solve a linear program. In that case, the resulting solution (if it exists) is strictly complementary, which defines the (unique) optimal partition.
Interior solution 
An optimal solution to a mathematical program that is in the relative interior of its set of optimal solutions. This arises in interior point methods, and relates to the strict complementarity of the solution.
Interpolation 
An estimate of a value between two others. In LP, an interpolation estimate of the optimal objective value, say z(b), is t z(b1) + (1-t) z(b2) for b = t b1 + (1-t) b2 with t in [0, 1], where b1 and b2 are righthand side vectors.
Intersection cut 
See convexity cut.
Inventory balance equation 
The constraint that says that the amount of inventory in the next time period must equal the current inventory plus what is produced or purchased minus what is lost or sold. Let I(t) be the inventory at the beginning of period t, with I(0) given. Then, the inventory equation is: I(t+1) = a I(t) + P(t) - S(t),
where P(t) is the level of production (or somehow acquired), and S(t) is the level of sales (or somehow consumed). Typically a = 1, but if a < 1 it is called a loss factor, and if a > 1 it is called a gain factor.
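A minimal numeric sketch of the balance equation, writing I(t) for inventory, P(t) for production, S(t) for sales, and a for the loss/gain factor (names and data are illustrative):

```python
# Simulate inventory balance: I[t+1] = a*I[t] + P[t] - S[t].

def simulate(I0, P, S, a=1.0):
    I = [I0]
    for Pt, St in zip(P, S):
        I.append(a * I[-1] + Pt - St)   # next period's opening inventory
    return I

# production and sales over 4 periods, no loss or gain (a = 1)
I = simulate(I0=10, P=[5, 5, 0, 8], S=[3, 6, 4, 2])
print(I)   # [10, 12, 11, 7, 13]
```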
The language used is for the inventory control in the production scheduling problem, but this has become a standard system of equations that appears in many mathematical programs. Thus, the meaning of the variables can be substantially different. One example is the steel beam assortment problem.
Inventory control problem 
See production scheduling problem.
Inverse problem 
Given a point in the decision space, find parameter values that render it optimal. For example, suppose we have the LP: . Let be a basis from , and we ask for for which this basis is optimal. This has a simple solution. Let and partition and conformally, so and Then, the set of for which the associated basic solution is optimal is the following cone:
A more difficult inverse problem is when there is some target value for the parameters. We might, for example, fix and seek to minimize subject to , where is a target value for .
The problem can be combinatorial. We might want to minimize for some norm where 's are costs on the arcs or edges of a network. The solution at hand might be a spanning tree or a TSP tour. We might also impose constraints on directly, such as simple bounds.
The general inverse problem may thus be stated:
where is the target, is some norm, is a nonempty, closed set, which generally does not contain , and is the feasible region for , given .
There are variations, such as when only a partial solution is given. An example is a partial TSP tour or some partial scheduling of jobs, and we ask for cost values for which there is an optimal
solution that contains these subsets.
Involutionary property 
Of a transformation, it means something is its own inverse. In linear programming, the dual of the dual is the primal.
Irreducible infeasible subsystem 
Abbreviated IIS. A subsystem of inequalities and equations such that the subsystem is inconsistent and every proper subsystem is consistent.
Irredundant 
A system of constraints is irredundant if it contains no redundant constraint.
Isoperimetric problem 
Among all closed plane curves with a given perimeter, find one that maximizes the area. This is also known as Queen Dido's problem and serves as a classical problem for variational calculus. Its significance in mathematical programming is that it led to Lagrange's multiplier theorem. Restricting shapes to rectangles, the problem is:
where is the perimeter. For positive and and multiplier , Lagrange's multiplier conditions requires and , so , which means the solution is the square.
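The rectangular case can also be checked numerically; this crude grid search (a hypothetical illustration, not part of the source) confirms that the square with side p/4 maximizes the area:

```python
def best_rectangle(p, steps=100000):
    """Grid-search the rectangle of perimeter p with maximum area.

    Width w ranges over (0, p/2); the height is then h = p/2 - w."""
    best_w, best_area = None, -1.0
    for i in range(1, steps):
        w = (p / 2) * i / steps
        h = p / 2 - w
        if w * h > best_area:
            best_w, best_area = w, w * h
    return best_w, best_area

w, area = best_rectangle(4.0)
# Lagrange's conditions give w = h = p/4 = 1, with area 1.
print(round(w, 3), round(area, 3))  # 1.0 1.0
```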
Isoquant 
Same as Contour.
Isotonic function 
An order preserving function , i.e. for any and in
Jacobian 
For a transformation, , such that is differentiable, the Jacobian is the determinant, . Historically, the number of functions equals the number of variables, say , so is . The Jacobian matrix is defined as the matrix, , allowing the number of functions to be less than the number of variables.
Jamming 
Slow convergence, perhaps to a non-solution point; see zigzag phenomenon.
Jensen inequality 
Let be convex on , and let be a random variable with expected value, . Then, . For example, let and . Suppose is normally distributed with mean zero and standard deviation . Then,
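The normal-distribution example can be checked by simulation (a sketch; the sample size and seed are illustrative):

```python
import random, statistics

random.seed(0)
sigma = 2.0
# X ~ Normal(0, sigma); f(x) = x**2 is convex, so E[f(X)] >= f(E[X]).
sample = [random.gauss(0.0, sigma) for _ in range(100000)]

E_fX = statistics.fmean(x * x for x in sample)  # approximates E[X^2] = sigma**2
f_EX = statistics.fmean(sample) ** 2            # approximates f(E[X]) = 0
print(E_fX >= f_EX)  # True
```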
Jeroslow formula 
This arises in mixed integer linear programming as an extension of Gomory functions. Consider the following standard form
where ( is the number of rows). Let be the set of bases of such that there exists to satisfy the dual conditions: and . Let denote the floor (or rounddown) function, and define . Let be the set of integer linear combinations of vectors spanned by a dual feasible basis, :
Let , and define
Let G be a Gomory function on . A Jeroslow formula (or function) for given is:
Job scheduling/sequencing 
See scheduling jobs and sequencing problems.
K-consistency 
A consistency notion in constraint programming.
Let be a CSP.
 Given a set of variables with , a locally consistent instantiation on is kconsistent iff for any kth variable there exists a value such that is locally consistent.
 The CSP is kconsistent iff for any set of variables, any locally consistent instantiation on is kconsistent.
For example, 3-consistency ensures that any instantiation of any pair of variables can be extended to an instantiation involving any third variable without violating any constraint. It is equivalent to path consistency. Similarly, 2-consistency is also known as arc consistency.
K-opt 
See n-Opt.
Kantorovich inequality 
Let be a symmetric, positive definite matrix, with eigenvalues in , and let . Then,
Its significance in mathematical programming is in convergence analysis: it bounds the convergence rate of Cauchy's steepest ascent.
Karmarkar algorithm 
An interior point method for linear programming, which began a series of variations (e.g., see affine scaling). The original algorithm applies to the system , for , i.e. x is an element of the ndim. simplex, and assumes (so is a feasible solution). It further assumes the optimal objective value is known to be zero. (All of these assumptions have been relaxed in subsequent developments.)
Here are the basic steps of Karmarkar's (original) algorithm, given , and .
 Form and
 Project: Project to null space of :
 Normalize: Normalize the ascent direction. (If c* = 0, terminate; the solution is optimal.)
 Move: Move in projected space to , where is a fixed step size (= 1/4).
 Project: Project back into x-space: .
Kernel of basis 
See basis.
KleeMinty polytope 
This is an example to show that the elementary simplex method does not have polynomial time complexity. The polytope is
Maximizing
over this polytope shows the exponential complexity.
Knapsack problem 
An integer program of the form,
where . The original problem models the maximum value of a knapsack that is limited by volume or weight , where is the number of items of type put into the knapsack at unit return , that uses units per item.
The group knapsack problem has this form, except that it pertains to Gomory's corner polyhedron problem for general integer programming.
The multidimensional knapsack problem has more constraints (e.g., volume and weight), , where with and .
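A dynamic-programming sketch for the basic (unbounded, single-constraint) form with integer data; the instance is illustrative:

```python
def knapsack(values, weights, capacity):
    """Unbounded knapsack: maximize sum(v[j]*x[j]) subject to
    sum(w[j]*x[j]) <= capacity, with each x[j] a nonnegative integer.
    Dynamic program over capacities 0..capacity."""
    best = [0] * (capacity + 1)
    for b in range(1, capacity + 1):
        for v, w in zip(values, weights):
            if w <= b:
                best[b] = max(best[b], best[b - w] + v)
    return best[capacity]

# Items returning 60, 100, 120 per unit, using 1, 2, 3 units of a 5-unit knapsack.
print(knapsack([60, 100, 120], [1, 2, 3], 5))  # 300
```

Here the optimum takes five copies of the first item; a 0-1 variant would instead restrict each x[j] to {0, 1}.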
KuhnTucker conditions 
Abbreviated KT and KKT (latter credits Karush, who had the conditions in his earlier thesis).
This is the Lagrange Multiplier Rule, which Kuhn and Tucker published as an extension to allow inequality constraints. They also introduced the notion of a constraint qualification (which had been only the linear independence condition stemming from Lagrange's multiplier theorem). The most important contribution that sets KT apart from similar discoveries is the connection with saddle points that led to Lagrangian duality.
KuhnTucker point 
Abbreviated KKT point. A feasible point that satisfies the KuhnTucker conditions.
LINDO 
A solver using a simplified problem specification for Linear, Integer, Nonlinear and Dynamic optimization.
LINGO 
A modeling language companion to LINDO.
LPL 
A Linear Programming Language
LU decomposition 
A factorization of a matrix into a lower triangular matrix and an upper triangular matrix so that . The advantage is that to solve the system , we first apply forward substitution to solve , then backward substitution to solve .
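A sketch of the two-phase solve via a Doolittle factorization (no pivoting, so it assumes nonzero pivots; a production code would pivot):

```python
def lu_solve(A, b):
    """Solve A x = b via LU factorization with unit lower-triangular L
    (Doolittle). A is a list of rows; assumes nonzero pivots."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):                     # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        L[i][i] = 1.0
        for j in range(i + 1, n):                 # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    # Backward substitution: U x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))) / U[i][i]
    return x

print(lu_solve([[4.0, 3.0], [6.0, 3.0]], [10.0, 12.0]))  # [1.0, 2.0]
```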
Label correcting algorithm 
Arises in labeling algorithms for the shortest path problem. At each iteration a label is set to an estimate of the shortest path from a given node. All labels become exact values at termination.
Label setting algorithm 
Arises in labeling algorithms for the shortest path problem. At each iteration a label becomes the actual shortest path from some node. (Termination occurs when the destination node(s) are permanently labelled.)
Labeling algorithm 
A class of algorithms for pathrelated problems on graphs and networks. To illustrate, consider a network with arc values .
 The FordFulkerson max flow labeling algorithm ( is capacity).
 Input is source , destination , and capacities . Output is the max flow from to .
 Initialize all flows to be zero and assign the label to the source. All other nodes are unlabeled, and all nodes (including the source) are unscanned.
 Labeling process: Select any labeled, unscanned node, say with label . If node is an unlabeled neighbor and , assign the label to node , where . If node is an unlabeled neighbor and , assign the label to node , where . (In either case, define node as labeled, but unscanned, while node is now labeled and scanned.) Repeat this until either the sink is labeled and unscanned, or until no more labels can be assigned. In the latter case terminate; in the former case, go to the flow change step.
 Flow change: Let the sink have label . If is , replace by ; if it is , replace by . In either case, apply the flow change ( or ) along the path back to the source. The result is a new flow that is more than it was. Erase labels, except on the source, and go to the labeling process.
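The labeling process and flow change above can be sketched as follows, scanning labeled nodes breadth-first (which makes it the Edmonds-Karp variant); the network and function names are illustrative:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Ford-Fulkerson with breadth-first labeling.
    capacity: dict {(i, j): c} of arc capacities."""
    nodes = {i for arc in capacity for i in arc}
    flow = {arc: 0 for arc in capacity}
    def residual(i, j):
        return capacity.get((i, j), 0) - flow.get((i, j), 0) + flow.get((j, i), 0)
    total = 0
    while True:
        # Labeling process: scan labeled, unscanned nodes.
        label = {source: None}
        queue = deque([source])
        while queue and sink not in label:
            i = queue.popleft()
            for j in nodes:
                if j not in label and residual(i, j) > 0:
                    label[j] = i
                    queue.append(j)
        if sink not in label:
            return total          # no augmenting path: the flow is maximal
        # Flow change: push the bottleneck value along the labeled path.
        path, j = [], sink
        while label[j] is not None:
            path.append((label[j], j))
            j = label[j]
        eps = min(residual(i, j) for i, j in path)
        for i, j in path:
            back = min(flow.get((j, i), 0), eps)   # cancel reverse flow first
            if back:
                flow[(j, i)] -= back
            if eps - back:
                flow[(i, j)] = flow.get((i, j), 0) + (eps - back)
        total += eps

caps = {('s','a'): 3, ('s','b'): 2, ('a','b'): 1, ('a','t'): 2, ('b','t'): 3}
print(max_flow(caps, 's', 't'))  # 5
```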
 Dijkstra's shortest path algorithm ( is distance).
 Input is source , destinations , and distances . Output is a shortest path from to each node in .
 Initialize labels: for and . Set .
 Label: select (terminate if ). Then, update , and for : if , set
 Terminate when is a subset of (i.e., all destination nodes are labeled with their shortest distance from node ).
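Dijkstra's label-setting steps can be sketched with a priority queue (the heap replaces the "select minimum temporary label" scan); the instance is illustrative:

```python
import heapq

def dijkstra(adj, source):
    """Dijkstra's label-setting algorithm. adj: {node: [(neighbor, dist), ...]}
    with nonnegative distances. Returns shortest distance labels from source."""
    dist = {source: 0}
    done = set()                      # permanently labeled nodes
    heap = [(0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if i in done:
            continue
        done.add(i)                   # the label on i is now exact
        for j, dij in adj.get(i, []):
            if j not in dist or d + dij < dist[j]:
                dist[j] = d + dij
                heapq.heappush(heap, (dist[j], j))
    return dist

adj = {'s': [('a', 1), ('b', 4)], 'a': [('b', 2), ('t', 6)], 'b': [('t', 3)]}
print(dijkstra(adj, 's'))  # {'s': 0, 'a': 1, 'b': 3, 't': 6}
```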
 A generic shortest path labeling algorithm from all nodes to destination node
( is distance).
 Input is destination and distances . Output is a shortest path from each node to .
 Initialize labels, is the estimate of a shortest path length from node to node , with
 Label: Find an arc, say , that violates the distance equations, say . If none, terminate; otherwise, update the label: .
The labeling step is repeated until there are no violations. At termination, is the length of a shortest path from node to node .
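The generic label-correcting scheme above, here sweeping all arcs for violated distance equations (essentially the Bellman-Ford method), can be sketched as:

```python
def label_correcting(arcs, dest, nodes):
    """Repeatedly find an arc (i, j) violating the distance equations
    d[i] <= c[i][j] + d[j] and lower the label d[i].
    arcs: dict {(i, j): length}. Returns shortest-path lengths to dest."""
    INF = float('inf')
    d = {k: INF for k in nodes}
    d[dest] = 0
    changed = True
    while changed:                      # terminates if there is no negative cycle
        changed = False
        for (i, j), cij in arcs.items():
            if d[j] + cij < d[i]:       # violated: correct the label on i
                d[i] = d[j] + cij
                changed = True
    return d

arcs = {('a', 'b'): 2, ('a', 't'): 7, ('b', 't'): 3, ('b', 'a'): 2}
print(label_correcting(arcs, 't', ['a', 'b', 't']))  # {'a': 5, 'b': 3, 't': 0}
```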
Another type of labeling algorithm occurs in simplicial subdivision, used to compute a fixed point that represents an economic equilibrium.
Lagrange conditions 
The optimality equation (plus feasibility conditions) in Lagrange's multiplier theorem.
Lagrange multiplier 
Lagrange multiplier rule 
From the extension of Lagrange's multiplier theorem. Suppose
where , , and are smooth. Then, there exist multipliers for which the following conditions hold:
 ;
 ;
 .
The last condition, given , is equivalent to complementary slackness. These are considered first-order optimality conditions, though the Lagrange Multiplier Rule is not always valid; see constraint qualifications.
For extensions see the Generalized Lagrange multiplier method.
Lagrange multiplier theorem 
Let and be smooth functions and suppose has full row rank. Then, only if there exists such that:
The th component of , is called the Lagrange multiplier associated with the th constraint, . Extensions to remove the rank condition and/or allow inequality constraints were by Carathéodory, John, Karush, and Kuhn & Tucker. Also see the Lagrange Multiplier Rule.
Lagrangian 
For the mathematical program
for and . Note that the Lagrange Multiplier Rule can be written as the firstorder conditions for to be a saddle point of . In Lagrange's multiplier theorem (where and is vacuous), this is simply that , which could be any type of stationary point.
Lagrangian duality 
See it in the list of particular duals.
Lagrangian relaxation 
The removal of constraints and putting them into the objective as in the Lagrangian function. Specifically, suppose we have
The Lagrangian relaxation is
where ( is unrestricted).
This is a relaxation of the original mathematical program because the constraints have been removed (expanding the feasible region), and for feasible (i.e., satisfying these constraints), the relaxed objective satisfies:
because and
The objective is the usual Lagrangian function when is simply . It provides a foundation for Lagrangian duality and the Generalized Lagrange Multiplier Method.
Note that could retain some constraints, such as in the separable form . Now suppose and Then, the Lagrangian relaxation decomposes into independent mathematical programs for any multiplier value:
Lagrangian saddlepoint equivalence 
The tuple in , with , is a saddlepoint of the Lagrangian if, and only if, the strong duality properties hold:
 and ( is feasible)
 ( and satisfy complementary slackness)
See the supplement on Lagrangian Saddle Point Equivalence for further information.
Lattice 
A set that contains and for all and in .
Lattice program 
Minimizing a subadditive function on a lattice.
Lattice search 
This finds the maximum of a unimodal function on a finite set, , by evaluating the function at points placed by the modified fibonacci sequence: . This modification comes from the improvement obtained by dropping the inferior end point after an evaluation: if the set is in and , the new interval is dropping the point from the remaining search set.
The method proceeds the same as fibonacci search, except the placements follow the modified fibonacci sequence, which is equivalent to , reflecting the improvement that accumulates as increases, for the extra point dropped after each evaluation. The placement ratio, , also approaches the golden mean, so lattice search also approaches the golden section search.
Some refer to fibonacci search when they mean lattice search. The following table shows the difference. For example, with 20 evaluations, fibonacci search can guarantee finding the maximum on a set of 10,946 points, while lattice search can find it on a set of 17,710 points. Golden section is included for comparison as well. With 20 evaluations it can find the maximum on a set of only 9,349 points.
   n           5      10      15      20
  ======================================
  F_n          8      89     987   10946
  K_n         12     143    1596   17710
  golden       6      76     843    9349
  section
  ======================================
Lattice search minimizes the maximum number of evaluations needed to find the maximum on a given set of points. If the number in that set is , the number of evaluations is . If the number in the set is not exactly some modified fibonacci number, one can fill in dummy points at the end of the set. For example, if let the set be , where . Then, the (minimum) number of evaluations to guarantee finding the maximum of any unimodal function is .
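The table can be regenerated from the identity K_n = F_{n+1} - 1, using the table's own indexing F_1 = 1, F_2 = 2 (this identity is inferred here from the table's values, not stated in the source):

```python
def F(n):
    """Fibonacci numbers indexed as in the table: F_1 = 1, F_2 = 2."""
    a, b = 1, 2
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def K(n):
    """Modified Fibonacci count for lattice search: K_n = F_{n+1} - 1
    (identity inferred from the table)."""
    return F(n + 1) - 1

for n in (5, 10, 15, 20):
    print(n, F(n), K(n))
# 5 8 12
# 10 89 143
# 15 987 1596
# 20 10946 17710
```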
Level set 
For the function the level set is . Sometimes, the 0-level set is called simply the level set of f.
Levenberg Marquardt algorithm 
This is designed for solving nonlinear programs using the equation:
where is the Hessian. For unconstrained optimization, the solution serves as a direction vector for the iteration. This is also used in a trust region method. The parameter is set to give a balance between Newton's method and Cauchy's steepest descent . A low value of helps get through difficult landscape curvature, and a high value yields some descent.
Lexicographic Ordering constraint 
is a global constraint that allows the comparison of sequences of variables. Given two sequences and of variables, and , let denote the lexicographic ordering constraint. The constraint holds iff or or and .
Lexicographic order 
A nonzero vector is lexicographically positive if its first nonzero coordinate is positive. The vector u is lexicographically greater than the vector v if u - v is lexicographically positive, and this defines the lexicographic order. This is a total ordering in that every two vectors are either equal, or one is lexicographically greater than the other.
This was first used in mathematical programming to resolve cycling in the simplex method. It also provides a way to obtain solutions for multiple objectives with the property that is a Pareto maximum if is lexicographically greater than or equal to for all feasible .
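Python tuples compare lexicographically, so the definition can be tried directly (the vectors are illustrative):

```python
def lex_positive(u):
    """True iff the first nonzero coordinate of u is positive."""
    for ui in u:
        if ui != 0:
            return ui > 0
    return False        # the zero vector is not lexicographically positive

u, v = (0, 3, -5), (0, 1, 99)
diff = tuple(a - b for a, b in zip(u, v))
# u is lexicographically greater than v iff u - v is lexicographically positive,
# which matches Python's built-in tuple comparison.
print(lex_positive(diff), u > v)  # True True
```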
Lifting 
In integer programming, it is creating a valid inequality for the original set, given one for a lower dimensional set, by augmenting a binary variable that is absent from the original inequality. That is, suppose is a subset of and is a valid inequality for (where it does not matter what is since ). Then, under certain conditions, the inequality is valid for . The "conditions" are problem dependent, and is set as large as possible.
For example, suppose is a valid inequality for . A lifting is the strengthening, , if it can be ascertained that this is a valid inequality for . The coefficient of ( in the example) is determined by considering how large can be with . (In this case, the lifted inequality is valid if is valid for and if implies .) The motivation for doing this arises in a branch and cut algorithm strategy. At a node in the search tree we might have a valid inequality, , when the branch had , and we want to make it valid even if . The inequality is valid for all
Line search 
Optimizing the objective along a line: , where is a given point and is a given direction vector ( is a scalar). A coarse tolerance could be used, so "Opt" is not entirely accurate. (In fact, there are reasons to believe not optimizing is best.)
Here are some particular search methods in this glossary:
Line segment 
The line segment with end points and in is
which is denoted and is the closed line segment. The open line segment is
Linear combination 
The vector is a linear combination of vectors if , where is called the coefficient of .
Linear convergence 
See convergence.
Linear program 
An optimization problem in which the objective and constraints are linear. Forms include and . The standard form assumes has full row rank. Computer systems ensure this by having a logical variable augmented, so the form appears, for example, as (also allowing general bounds on the variables). The original variables are called structural. Note that each logical variable can be a slack, surplus, or artificial variable, depending on the form of the original constraint. This computer form also represents a range constraint with simple bounds on the logical variable. Some bounds can be infinite (i.e., absent), and a free variable (logical or structural) is when both of its bounds are infinite.
Linearity interval 
When a univariate function is piecewise linear, it has the form for in the interval , where is not equal to . (The phrase usually means the function is continuous.) This arises in linear programming when considering the optimal value as a function of varying righthand sides (or cost coefficients) in fixed proportions: (or ), where is an admissible change vector and is the (scalar) variable. Then, for the range of where the LP has a solution, the optimal value function has this piecewise linear (continuous) form, and the intervals that comprise the domain are called linearity intervals.
Linearization 
The strategy of linearization is to reformulate a nonlinear program as a linear program through the introduction of auxiliary variables and constraints that are used to circumvent the nonlinearities. See standard linearization, Glover's linearization, and Reformulation-Linearization Technique.
Lipschitz continuous 
The function is Lipschitz continuous if there exists a value , called the Lipschitz constant, such that for all and in . This relation is called the Lipschitz condition. It is stronger than continuity because it limits the slope to be within . The generalized Lipschitz condition is that there exists a monotonically increasing function, with the property that as such that there exists for which for all and in
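A numeric sketch: sampling pairwise slopes gives a lower bound on the Lipschitz constant; for sin, whose Lipschitz constant is 1, the sampled slope never exceeds that bound (the function and parameter names are illustrative):

```python
import math, random

def lipschitz_estimate(f, lo, hi, samples=20000, seed=1):
    """Sample pairs in [lo, hi] and return the largest observed slope
    |f(x) - f(y)| / |x - y|, a lower bound on the Lipschitz constant K."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(samples):
        x, y = rng.uniform(lo, hi), rng.uniform(lo, hi)
        if x != y:
            best = max(best, abs(f(x) - f(y)) / abs(x - y))
    return best

# sin satisfies |sin x - sin y| <= |x - y|, i.e. K = 1.
est = lipschitz_estimate(math.sin, -3, 3)
print(est)  # largest sampled slope; at most 1 for sin
```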
Local convergence 
See convergence.
Local optimum 
A feasible point that is an optimal solution to the mathematical program whose feasible region is the intersection of the original region and some neighborhood of .
Locally convex function 
There exists a subset of the domain, such as the neighborhood of a point, such that the function is convex on that subset.
Location problem 
See facility location problem.
Lockbox problem 
A firm wants a check that it receives to clear as quickly as possible (so it can use the money). To reduce the number of days to clear a check, the firm opens offices, called lockboxes, in different cities to handle the checks. There is a cost to open a lockbox, and there are average annual losses calculated from interest rates and the number of days to clear a check sent from source to lockbox city .
Logical variable 
See linear program.
Lot size problem 
This is one of the oldest mixedinteger programs in operations research, first presented by Wagner and Whitin in 1958. The problem is to minimize cost while satisfying product demands over (discrete) time. Let be the number of units produced in period , for ( is called the planning horizon), and let if a setup occurs in period ; otherwise. Let be the demand in period , and let the demand from period to period inclusive, be Then, a MILP formulation is:
(This is preferred over the MILP that defines inventory balance equations. Even though they are equivalent, the LP relaxation of the above is sharper.)
Lower semicontinuity 
Suppose . Of a function, is lower semicontinuous if
Of a pointtoset map, , let be a neighborhood of the set : for each , there exists such that for all , is a subset of . A pathology to show that the feasibility map may not be lower semicontinuous is:
where if is in , and if
Lower triangular matrix 
A square matrix, , such that for .
MAX CSP 
A constraint satisfaction problem where some constraints may be violated, and the quality of a solution is measured by the number of satisfied constraints (i.e., the optimal solution has the maximum number of satisfied constraints).
MIMI 
Manager for Interactive Modeling Interfaces  A modeling system that integrates mathematical programming, database management and artificial intelligence.
MINOS 
Modular Incore Nonlinear Optimization System  An optimization system, available from Stanford Optimization Lab.
MODLER 
Modeling by Object Driven Linear Elemental Relations
MOSEK 
A software package for solving largescale sparse linear, conic and convex optimization problems.
MPL 
Mathematical Programming Language.
Main Page 
Mathematical Programming Glossary©
This glossary contains terms specific to mathematical programming and related disciplines like economics, computer science, and mathematics. A thread of terms is constructed by following other terms that are defined within those already displayed. Terms can be removed from a thread. A new thread is initiated by following the title of one of the thread's terms.
The following are of general interest (you should read them if this is your first time here).
 General Information  A list of dictionaries, suggested methods of citation, and contribution instructions.
 Acknowledgments  We are thankful to those who have helped.
 The Nature of Mathematical Programming  See this for basic terms and a standard form of a mathematical program that is used throughout this glossary.
 Notation  Read this to clarify notation.
 Supplements  A list of supplements that are cited by entries.
 Myths and Counterexamples in Mathematical Programming, by Harvey Greenberg, ver. 6, 2010  Some common and uncommon misconceptions.
 Tours  Collections of Glossary entries for a particular subject.
 Editorial Board.
Please remember this is a glossary, not a survey, so no attempt is made to cite credits.
Manpower planning problem 
This has many variations, but typically focuses on either shortterm problems, like scheduling workers, or on longterm problems, like recruiting. There are workers of different categories, some relate to each other by level (e.g., copilot and pilot), others do not. For longterm manpower planning, inventory balance equations are typically used to represent changes in the number of employees in each category over time, including new hires and various forms of departures. For the shortterm scheduling, a simple model is the shortest path through a network. Often, more complicated combinatorial programs are required.
Marginal price 
The rate at which the optimal objective value changes with respect to a perturbation of a righthand side, like a supply, demand or capacity limit. When it exists, the marginal price is often equal to a most pessimistic dual price (e.g., consumer's marginal price is the greatest dual price, which reflects what another unit of demand would cost to satisfy).
Markov decision process 
A stochastic dynamic program, whereby for each policy the resulting state variables comprise a Markov process (a stochastic process with the property that the conditional probability of a transition to any state depends only on the current state, and not on previous states).
Matching problem 
Given a graph, a matching is a subgraph with the property that no two edges are incident to the same node. Subject to certain restrictions, the problem is to find a feasible matching that optimizes some objective, such as minimizing the total cost of the edges in the matching. A perfect matching is when each node is incident with exactly one edge.
The assignment problem is a classical perfect matching, whereby the graph is bipartite (nodes are two sets: people and jobs), and the matching must have all nodes (every person must do a job, and every job must be done). Then, the matching corresponds precisely to an assignment since each job node is incident with exactly one person node in the matching. Further, the cost of each such matching is precisely the total cost of the corresponding assignment, so this mincost matching problem is the same as the assignment problem.
Another type of matching problem, called maximum cardinality matching, is to find a matching with the greatest number of edges, and this is also solvable in polynomial time. However, minimum weight - maximum cardinality matching is NP-complete.
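A sketch of maximum-cardinality bipartite matching by augmenting paths (Kuhn's algorithm); the people/jobs instance is illustrative:

```python
def max_bipartite_matching(adj):
    """Maximum-cardinality bipartite matching by augmenting paths.
    adj: {left_node: [right_node, ...]}. Returns (size, right->left match)."""
    match = {}                               # right node -> matched left node
    def augment(u, seen):
        for v in adj.get(u, []):
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be rematched elsewhere.
            if v not in match or augment(match[v], seen):
                match[v] = u
                return True
        return False
    size = 0
    for u in adj:
        if augment(u, set()):
            size += 1
    return size, match

people_to_jobs = {'p1': ['j1', 'j2'], 'p2': ['j1'], 'p3': ['j2', 'j3']}
size, match = max_bipartite_matching(people_to_jobs)
print(size)  # 3
```

With weights on the edges, the same augmenting-path idea extends to the assignment problem, but a cost-aware method (e.g., the Hungarian method) is needed.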
Mathematical program 
Commonly an optimization problem of the form
where is a subset of and is the domain of , and which map into real spaces. The function is called the objective function, which is typically realvalued. If not, then maps into with and the problem is a multiple objective problem. The feasible region is the collection of that simultaneously satisfy in and which are the program's constraints.
Matrix norm 
See norm.
Matroid 
This is an abstraction of independence that includes vectors in linear algebra and circuits in graphs. First, we need some preliminary definitions. Let be a finite set, and let be a collection of subsets of . is an independence system if in implies every subset of is in . Elements of are called independent sets, and the remaining subsets of are called dependent sets. An independent set, say , is maximal if is not in for any . The maxcardinality independent set of any subset of , say is given by:
A matroid is an independence system in which for any subset of , say , every independent set in that is maximal in has cardinality . The definition essentially means that maximal and maximum are the same. In other words, a system is a matroid if, and only if, a greedy algorithm yields a globally optimal solution to the associated max weight problem. An example is the spanning tree.
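The spanning tree example can be sketched with the greedy (Kruskal) algorithm: scan edges heaviest-first and keep each edge that preserves independence, i.e. creates no cycle; since forests are the independent sets of the graphic matroid, greedy is optimal here. The instance is illustrative:

```python
def kruskal_max_weight(n, edges):
    """Greedy max-weight spanning forest on nodes 0..n-1.
    edges: list of (weight, u, v). Returns (total weight, chosen edges)."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i
    total, chosen = 0, []
    for w, u, v in sorted(edges, reverse=True):   # heaviest first
        ru, rv = find(u), find(v)
        if ru != rv:                 # independent: the edge closes no cycle
            parent[ru] = rv
            total += w
            chosen.append((u, v))
    return total, chosen

edges = [(4, 0, 1), (2, 1, 2), (5, 0, 2), (1, 2, 3)]
total, tree = kruskal_max_weight(4, edges)
print(total)  # 10
```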
MaxMin 
(or maximin). The objective is, itself, the value of another mathematical program:
Then, the objective is to maximize f(x) (on some feasible region).
Max flowmin cut theorem 
The maximum flow through a (single commodity) capacitated network from a specified node, called the source, to another node, called the sink, equals the value of the minimum cutset. Originally proven directly from principles of networks, this was discovered to be a special case of the Duality Theorem of Linear Programming.
Max flow problem 
In a network, there are two special nodes: a source and a sink. The problem is to maximize the flow from the source to the sink subject to conservation of flow constraints at each node and flow bounds on each arc. The max flow labeling algorithm provides a constructive proof of the Max flow-min cut theorem.
This can also be obtained from the duality of the following linear program:
for ;
for
are upper bounds (capacities) on arc flows, and each sum is for such that the indicated arc is in
Maximal 
In a partially ordered set, a maximal element is one for which no element follows it in the ordering. This is not to be confused with maximum. For example, consider the following knapsack problem:
The maximum value is , with . Define the partial ordering to be the ordinary greater-than-or-equal-to, so that means and either or . In words, means we can put at least one more item into the knapsack, in which case we would increase our return. When we cannot do so, is a maximal element. In particular, is a maximal element, and its value is , which is not the maximum.
Another definition pertains to a subset with respect to some property such that no proper superset has that property. In the 0-1 knapsack problem with variables, we define the subset of items taken (for any solution) as . Then, we define as maximal if no item can be added without violating the limit, i.e., for all not in .
Maximand 
The objective function in a mathematical program whose sense of optimization is to maximize.
Maximum 
(pl. maxima). A feasible point at which the supremum is achieved. (See Weierstrass' Theorem.) An ε-maximum is within ε of being a maximum: , where is the supremum and ε > 0.
Maximum principle 
Necessary conditions for a solution to the (constrained) problem of variational calculus, given in terms of nonlinear differential equations. Generally credited to Pontryagin (1962), it was derived as an extension of the EulerLagrange conditions for variational calculus, and later was derived from dynamic programming.
Memetic algorithm 
This is a population-based approach for heuristic search in optimization problems. It uses the crossover operation of genetic algorithms, but combines it with local search heuristics. Implementations are typically parallel with an MIMD architecture.
Metaheuristic 
This is a general framework for heuristics in solving hard problems. The idea of "meta" is that of level. An analogy is the use of a metalanguage to explain a language. For computer languages, we use symbols, like brackets, in the metalanguage to denote properties of the language being described, such as parameters that are optional. Examples of metaheuristics are:
 Ant Colony Optimization
 Genetic Algorithms
 Memetic Algorithms
 Neural Networks
 Particle Swarm Optimization
 Scatter Search
 Simulated Annealing
 Tabu Search
 Target Analysis
Besides parameters that define specific algorithms within each framework, these metaheuristics also require some concept of representation, which is problem dependent. Also see GRASP.
Method of centers 
This is an interior method defined for a mathematical program without equality constraints with a nonempty strict interior. Considering define
Observe that if, and only if, and for all . Then, the algorithm map is
Metric 
A nonnegative, real-valued function, d, on pairs from a set X such that, for each x, y, z in X: (1) d(x, y) = 0 if, and only if, x = y; (2) d(x, y) = d(y, x); (3) d(x, z) ≤ d(x, y) + d(y, z).
The function is also called a distance; the pair is a metric space. Condition (3) is called the Triangle inequality. A space is metrizable if a metric can be defined. A metric is induced by a norm if one exists: See Hausdorff metric.
Min-conflicts heuristic 
The min-conflicts heuristic is a local search or heuristic repair method (see heuristic search) for solving CSPs. This heuristic randomly selects a variable in the scope of a violated constraint and assigns it the value in its domain that minimizes the number of violated constraints. If more than one value qualifies, it chooses randomly among them.
Minimal 
In a partially ordered set, a minimal element is one that does not follow another in the ordering. This is not to be confused with minimum.
Another definition pertains to a subset with respect to some property such that no proper subset has that property. In the 0-1 knapsack problem the set of items not taken is minimal with respect to the greedy algorithm of adding the most valuable item, if it fits, until no more items fit. This does not necessarily produce a maximum objective value, but the converse is certainly true: in order that be an optimal solution, must be minimal with respect to adding items that fit. See maximal for an analogy and a numerical example.
Minimal inequality 
In integer programming, a valid inequality is minimal if it is not dominated by any valid inequality. Originally, this was limited to not being able to decrease any coefficient and remain valid. For example, suppose is a valid inequality. Then, if we decrease the first coefficient to obtain , either this is not valid or it dominates the former, rendering it nonminimal.
More generally, suppose is a valid inequality, and we consider such that and for some positive such that If is a valid inequality, it dominates the original one because
For example, dominates (use ), so if this is valid, cannot be minimal. Every facetdefining inequality is minimal, but not conversely.
Minimand 
The objective function in a mathematical program whose sense of optimization is to minimize.
Minimax 
The objective is, itself, the value of another mathematical program:
Then, the objective is to minimize (on some feasible region).
Minimax theorem 
Proven by von Neumann in 1928, this is a cornerstone of duality (and of game theory). It states that there exists and such that is a saddlepoint of the bilinear form:
This extends to the following.
Let be such that and are nonempty, convex, compact sets, is convex on for each , and is concave on for each . Then, there exists a saddlepoint, such that
Minimum 
(pl. minima). A feasible point at which the infimum is achieved. (See Weierstrass' Theorem.) An e-minimum is within e of being a minimum: f(x) <= f* + e, where f* is the infimum and e > 0.
Minkowski inequality 
The Minkowski inequality is

||x + y||_p <= ||x||_p + ||y||_p,

where ||.||_p denotes the L_p norm and p >= 1. For p > 1, equality holds if, and only if, one vector is a nonnegative scalar multiple of the other.
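A quick numeric check of the inequality (our own illustration), using a plain-Python p-norm:

```python
def p_norm(x, p):
    """L_p norm of a vector, for p >= 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

x = [1.0, -2.0, 3.0]
y = [4.0, 0.5, -1.0]
for p in (1, 1.5, 2, 3):
    lhs = p_norm([a + b for a, b in zip(x, y)], p)
    rhs = p_norm(x, p) + p_norm(y, p)
    # Minkowski: ||x + y||_p <= ||x||_p + ||y||_p
    assert lhs <= rhs + 1e-12
```

The equality case is easy to see numerically as well: with y = 2x, both sides equal 3 ||x||_p.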
Mixed-integer program 
A mixed-integer program (MIP) is a mathematical program in which some of the variables are required to be integer-valued. Historically, this term implied the mathematical program was otherwise linear, so older papers (and some recent ones) mean this, but most now refer to the following:
 MILP: Mixed integer linear program
 MINLP: Mixed integer nonlinear program
 MIQP: Mixed integer quadratic program
Modified Newton method 
The Newton method is designed to find a root of a function, say f(x) = 0, by the algorithm map x' = x - J(x)^(-1) f(x), where J is the Jacobian of f. This need not converge, so a modification is to use line search, resulting in the algorithm map:

x' = x - s J(x)^(-1) f(x),

where s is the step size chosen by the line search. More generally, we could have another step size rule; as long as it is chosen to converge, the modified algorithm is sometimes called the damped Newton method.
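A minimal sketch of the damping idea for a scalar equation, with simple backtracking in place of an exact line search (the function names and test equation are our own):

```python
def damped_newton(f, df, x0, tol=1e-10, max_iter=100):
    """Damped (modified) Newton for a scalar equation f(x) = 0: take the
    Newton direction, but halve the step until |f| decreases."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        d = -fx / df(x)                  # Newton direction
        s = 1.0
        while abs(f(x + s * d)) >= abs(fx) and s > 1e-12:
            s *= 0.5                     # backtracking ("damping")
        x += s * d
    return x

# classic example: the root of x^3 - 2x - 5 near 2.0945515
root = damped_newton(lambda x: x ** 3 - 2 * x - 5,
                     lambda x: 3 * x ** 2 - 2, 2.0)
```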
Monoid 
A set, M, defined on rational vectors, such that: (1) 0 is in M, and (2) if x and y are in M, then x + y is in M. The monoid is integral if it contains only integer vectors. One can think of a monoid as a discrete analogue of a convex cone.
Monotonic function 
A function that either never decreases or never increases. A nondecreasing, or isotonic, function satisfies f(x) <= f(y) whenever x <= y (it is strictly increasing if f(x) < f(y) for x < y). A nonincreasing, or antitonic, function satisfies f(x) >= f(y) whenever x <= y (it is strictly decreasing if f(x) > f(y) for x < y). This extends to a vector function, using the coordinate-wise partial order.
Monte Carlo optimization 
A class of algorithms that seek a maximum by sampling, using a pseudorandom number generator. One example is simulated annealing.
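A minimal sketch of the pure random-search form of Monte Carlo optimization in Python (our own illustration; simulated annealing adds an acceptance rule on top of sampling like this):

```python
import math
import random

def monte_carlo_max(f, lo, hi, n=100000, seed=0):
    """Estimate a maximizer of f on [lo, hi] by uniform random sampling
    with a pseudo-random number generator."""
    rng = random.Random(seed)
    best_x, best_f = None, -math.inf
    for _ in range(n):
        x = rng.uniform(lo, hi)
        fx = f(x)
        if fx > best_f:
            best_x, best_f = x, fx
    return best_x, best_f

# maximize sin on [0, pi]; the true maximizer is pi/2 with value 1
x_star, f_star = monte_carlo_max(math.sin, 0.0, math.pi)
```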
Moore-Penrose inverse 
See generalized inverse.
More for less paradox 
This was originally introduced in the context of linear network flows:
It is possible to send more flow from supply to demand nodes at lower cost, even if all arc costs are positive.
See Myths and Counterexamples in Mathematical Programming for a non-network example.
Mosel 
A modelling and solving environment and language, which was designed to fit with Xpress-MP.
Multicommodity flow 
A network flow problem in which more than one commodity shares arc capacities. This applies to communication networks as well as to shipments of different grains on barges of limited capacity. It is more complicated than single-commodity flow, generally NP-complete (except for some special cases). See [1] for one of the complexities.
Multistage decision process 
See dynamic programming and the recourse model in stochastic programming.
Multilevel program 
The "level" refers to sets of variables. A bilevel program has two sets:
A reason for identifying levels is to apply a decomposition principle for algorithm design. One example is the bilinear program. Another is when one set of variables is constrained to be a solution to an inner optimization problem:
where is some subset of .
Multiple objectives 
The problem has more than one objective function. Since these do not lie in a totally ordered set, a solution is often defined as an efficient point (sometimes called a Pareto optimum): is feasible, and there does not exist another feasible point, , such that and for some , where indexes the objective functions, and we assume we are maximizing. This reduces to the usual definition of an optimum when there is only one objective.
There have been two principal approaches to solving a multiple objective mathematical program (MOMP):
 weighted sum: Maximize , where
 lexicographically: and for This results in for which is lexicographically greater than (or equal to) any feasible solution. (Note: the order is fixed in advance.)
Both methods yield an efficient point (if one exists). Under certain assumptions, both methods can be used to generate the set of all efficient points, called the efficient frontier. (Care must be exercised in doing so, however; for example, see MOP Myth #2 for issues that arise when varying the weights of the objectives.)
Mutation operation 
See genetic algorithm.
Myopic optimization 
Given any partial solution, assign another value that improves the objective the most. (Algorithms that produce solutions based on myopic optimization are called greedy algorithms.)
n-Opt 
This is a heuristic for the TSP, though it can apply more generally to a problem that seeks a subgraph with all nodes. It is a local search, which considers a move by removing n edges from a trial solution, then replacing them with some other n edges. In TSP, the 2-Opt neighbors of the tour are pairwise exchanges. The set of exchanges, however, is larger than the set of 2-Opt neighbors. For example, consider a four-city tour, (1,2,3,4). The six pairwise exchanges of this permutation are:
(2,1,3,4), (3,2,1,4), (4,2,3,1), (1,3,2,4), (1,4,3,2), (1,2,4,3).
We see duplicates due to the equivalence of any cyclic permutation. Also, there are only two 2-Opt moves if the TSP is symmetric.
For n = 2 the replacement is unique; that is, once two edges are chosen for removal, there is only one replacement. This is not true for n > 2, and part of the heuristic is to decide what replacement to make.
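A sketch of 2-Opt local search in Python (our own illustration): removing edges (a,b) and (c,d) and reconnecting as (a,c) and (b,d) corresponds to reversing the tour segment between them.

```python
import math

def tour_length(tour, dist):
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def two_opt(tour, dist):
    """Repeat: remove two edges and reconnect by reversing the segment
    between them, whenever that shortens the tour."""
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue            # same pair of edges
                a, b = tour[i], tour[i + 1]
                c, d = tour[j], tour[(j + 1) % n]
                if dist[a][c] + dist[b][d] < dist[a][b] + dist[c][d] - 1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour

# demo: 8 points on a circle; the optimal tour is the circular order
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in range(8)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
tour = two_opt([0, 3, 5, 1, 7, 2, 6, 4], dist)
```

For points in convex position, a 2-Opt local optimum has no crossing edges, so this demo recovers the circular tour.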
NP-complete 
Problems are divided into two categories: those for which there exists an algorithm to solve them with polynomial time complexity, and those for which there is no such algorithm. We denote the former class of problems by P. There are problems for which no known polynomial-time algorithm exists, but there is also no proof that no such algorithm exists. Among these problems, there is a subclass known as NP-complete: those for which either all are solvable in polynomial time, or none are. Formally, a problem is in NP if there exists an algorithm with polynomial time complexity that can certify a proposed solution. For example, it is not known whether there exists a polynomial algorithm to solve a system of Diophantine equations, but we can certify any trial point in polynomial time, just by checking that it is integer-valued, then multiplying it by the coefficient matrix to compare with the right-hand side. A problem is NP-complete if it is in NP and, for any problem in NP, there exists a polynomial time algorithm to reduce it to the given problem. A fundamental member of the NP-complete class is the satisfiability problem. It is an open question whether P = NP, and most regard the NP-complete problems as having exponential time complexity. Further details are in the supplement, Introduction to Computational Complexity.
NP-hard 
An optimization problem that relies upon the solution of an NP-complete problem. In that sense, NP-hard problems are at least as hard as NP-complete problems.
 Here are some NP-hard problems:
 Bin packing
 Covering, Cutting stock
 Knapsack
 Packing, Partitioning, Pooling
 Traveling Salesman
 Vehicle routing
Near optimal 
A point that is within a small value, say e > 0, of optimality in some sense. The following are the common measures of nearness:
 in value: feasible and
 in policy:
 in resource level: where
Nearest neighbor algorithm 
A type of greedy algorithm for combinatorial programs where there is a measure of nearness between neighbors. An example is the traveling salesman problem, where the procedure is to choose the next city to be one that is nearest the current city in the sequence.
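The TSP version of this rule can be sketched in a few lines of Python (the instance below is made up for illustration):

```python
def nearest_neighbor_tour(dist, start=0):
    """Greedy TSP tour: from the current city, always move to the
    nearest unvisited city until all cities are visited."""
    n = len(dist)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        cur = tour[-1]
        nxt = min(unvisited, key=lambda j: dist[cur][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# small symmetric instance (distances chosen arbitrarily)
d = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 1],
     [3, 5, 1, 0]]
tour = nearest_neighbor_tour(d)
```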
Negative definite matrix 
A matrix, A, such that x'Ax < 0 for all nonzero x.
Negative semidefinite matrix 
A matrix, A, such that x'Ax <= 0 for all x.
Neighborhood 
For a normed space, the neighborhood of a point, is the open ball centered at where The closed ball, with is also a neighborhood. The usual notation (in this glossary) is and context dictates whether it is open or closed. This extends to the neighborhood of a set by taking the union; equivalently,
In integer programming, the neighborhood could mean integer-valued neighbors of the form x ± e_j, where e_j is a unit vector. In a combinatorial program, where variables are binary, integer-valued neighbors comprise all binary vectors. In this case the neighborhood is defined relative to some subset of binary variables reachable by some operation that depends on the problem. This is what is generally meant by a neighborhood in heuristic search in general, and simulated annealing or tabu search in particular, where a move is defined in terms of neighbors. In that context, a neighborhood could be as simple as complementing the value of one variable, as deletions or additions in a knapsack problem, or it could be more complex, like an n-Opt neighbor of a TSP tour.
Nelder-Mead simplex method 
This is a heuristic search (not guaranteed to find a solution) for unconstrained optimization. Let be points in that define a simplex. Define the extreme values: and Seeking a maximum of the idea is to replace with a point having a better objective value.
 Here is an iteration:
 Define the centroid of the simplex without this point of least value:
 reflection: compute where ( is called the "reflection coefficient").
 expansion: If (i.e., we have a better point), compute where ( is called the "expansion coefficient"). If replaces and the iteration is complete. Otherwise, replaces and the iteration is complete.
 At this step, the reflection point is not better than the worst of the simplex vertices (i.e., ). This is divided into the following cases.
 If replace with
 If define as the better of the two: (with resp.). Then, take a contraction step: where ( is called the "contraction coefficient"). If replaces and the iteration is complete. Otherwise, the last resort is to replace all with:
Network 
A collection of nodes, V, sometimes called vertices, plus a collection of arcs, A, which are directed from one node to another. The two sets form a network, denoted As such, it can be considered a directed graph (see other terms, like special graphs).
 Here are associated functions and data values:
 tail of kth arc is node we sometimes write
 head of kth arc is node we sometimes write
 indegree of node is
 outdegree of node is
 arc capacity limits the total flow across the arc at any one time.
 node capacity limits the total flow through a node at any one time.
 supply or demand at a node provides external input or an output requirement.
Network flows 
This is an assignment of arc values, called flows, say for the kth arc, that satisfy two types of constraints: (1) arc bounds, and (2) node balances, The flow out of node can be expressed as and the flow into node can be expressed as
If is a supply at node (called a supply node); if is a demand at node (called a demand node). If node is simply a transshipment node, and the balance equation says that the flow into node must equal the flow out of node Another way to express the node flow balance equations is with the nodearc incidence matrix:
Still another representation is to define flow variables that describe how much flow goes from node to node. This can be used as long as there are no parallel arcs, i.e., no two arcs have both the same tail and the same head. (In some applications, parallel arcs are needed, such as to increase capacity across a pair of arcs with an increased cost.) In this form, the capacity constraints are still of the same form, but the node equations have a different form:
The (linear) min cost network flow problem is to minimize total cost,
where is a unit cost of flow, subject to the flow bounds and balance equations.
Neural network 
(also called artificial neural network, abbr. ANN). A network where the nodes correspond to neurons and the arcs correspond to synaptic connections in the biological metaphor. The following properties are included:
 neural state. Each node has a state variable, say x. In the brain, this could be the potassium level; in computing applications, it could be anything the modeler chooses.
 arc weight. Each arc has a weight that affects the state of its neighboring nodes when firing. If the weight is positive, it is said to be excitatory; if it is negative, it is inhibitory.
 state equations. The neural states change by some differential (or difference) equation, say Typically (but not necessarily), is the gradient of an energy function (in keeping with the biological metaphor), say so that follows a path of steepest descent towards a minimum energy state.
 learning mechanism. This could be equations to change the weights: Various learning mechanisms are represented this way, including a form of supervised learning that uses a training set to provide feedback on errors. Other elements can be learned besides the arc weights, including the topology of the network.
Newsboy problem 
A newspaper is concerned with controlling the number of papers to be distributed to newsstands. The cost of a paper varies (i.e., Sunday vs. daily), and the demand for newspapers is a random variable, with a known probability function giving the probability that demand equals each level. Unsold papers are returned, with no salvage value the next day, losing the production cost. It is possible, however, for a newsstand to order more papers the same day. There are holding and shortage (penalty) costs. The problem is to determine a reorder policy so as to minimize total expected cost. This problem was used to consider a reorder policy with a 2-parameter decision rule:
 s = inventory level at which an order is placed;
 S = inventory level to which to order.
Then, the decision rule to be employed is the following policy:
 Order nothing if the inventory of papers is at least s;
 Order S - x if there are x papers on hand and x < s.
The significance of this problem is that it was used to introduce the notion (and optimality) of an (s, S) policy in inventory theory.
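The classical single-period version of this problem has a closed-form "critical ratio" solution. Here is a Python sketch (the notation is ours: a discrete demand pmf, a per-unit shortage penalty, and a per-unit holding cost):

```python
def newsvendor_order(pmf, shortage, holding):
    """Smallest order quantity Q whose cumulative demand probability
    reaches the critical ratio shortage / (shortage + holding)."""
    ratio = shortage / (shortage + holding)
    cum = 0.0
    for q, prob in enumerate(pmf):      # pmf[k] = P(demand = k)
        cum += prob
        if cum >= ratio - 1e-12:
            return q
    return len(pmf) - 1
```

With demand pmf [0.1, 0.2, 0.3, 0.25, 0.15], shortage cost 3 and holding cost 1, the critical ratio is 0.75 and the rule orders 3 papers.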
Newton method 
(also called Newton-Raphson method). This is the iterative scheme to find a zero of a function f:

x' = x - J(x)^(-1) f(x),

where J(x) is the Jacobian of f (assumed nonsingular). In mathematical programming this is used to find a stationary point of an objective, taking f to be its gradient, so J is the Hessian. Lacking global convergence, this leads to the modified Newton method, sometimes called the damped Newton method. (See the associated myth, Myth NLP-13, to avoid misconception.)
Nofreelunch theorem 
In heuristic search, this is the proposition that all methods perform the same when averaged over all possible objective functions. The idea is that a particular search algorithm, like simulated annealing, may be designed to perform especially well for some functions, compared to a genetic algorithm, but when applied to a representative sample of all possible costs, they will perform exactly the same, on the average. This implies that to do especially well on some problems, the search method must do worse on others; hence, the "nofreelunch" description.
Nogood 
A partial assignment of variables to values that does not lead to a solution. In other words, there does not exist a solution to the overall problem that satisfies all the assignments in the nogood.
In a backtracking search to find a solution, each dead-end corresponds to a nogood. However, nogoods become useful when they can be learned and added to the constraint problem as implied constraints that remove many dead-ends that would otherwise have to be searched.
In particular, nogood learning and reasoning are very important for modern techniques to solve SAT problems.
Node consistency 
A simple consistency property concerning unary constraints: a variable is node consistent if all values within its domain are consistent with all unary constraints on the variable.
Node packing problem 
See Packing problem.
Nonbasic 
A variable or column that is not basic.
Nondegeneracy 
A nondegenerate solution is one that is not degenerate. A nondegeneracy assumption is one that ensures every optimal solution is not degenerate. In general, a nondegenerate solution is strictly complementary. In linear programming, nondegeneracy is equivalent to uniqueness in both the primal and the dual.
Nonlinear program 
At least one of the functions (objective or constraint) is not affine on the domain. (Usually, it is the rule of the function that classifies it as nonlinear.) In particular, a linear integer program is not generally considered to be a nonlinear program (NLP).
Norm 
This is a function of a vector, say ||x||, that satisfies three properties:
 Homogeneous: ||ax|| = |a| ||x|| for all scalars a.
 Positive: ||x|| > 0 for x != 0. (Note: by homogeneity, ||0|| = 0, so 0 is the unique vector with zero norm.)
 Subadditive: ||x + y|| <= ||x|| + ||y||.
Norms that arise frequently in mathematical programming are:

Euclidean norm (on R^n): ||x||_2 = (sum_j x_j^2)^(1/2)
L_inf norm (on R^n): ||x||_inf = max_j |x_j|
L_p norm (on R^n, p >= 1): ||x||_p = (sum_j |x_j|^p)^(1/p)
Matrix norm (induced by a vector norm): ||A|| = max{||Ax||: ||x|| = 1}
Supremum norm (on a function space): ||f|| = sup{|f(x)|: x in X}
Normal cone 
For a convex set S, the normal cone to S at a point x in S is {y: <y, z - x> <= 0 for all z in S}, where <.,.> is an inner product.
Northwest corner rule 
This constructs a feasible shipment for the transportation problem, from the cost table (with a unit cost from each source to each destination), as follows. Start at the NW corner (source 1, destination 1), and allocate the most possible: the min of the supply and demand. If supply exceeds the demand, proceed to the next destination, and continue until all of supply 1 is allocated. Then, go to source 2 and repeat the allocation process, starting with the first (lowest index) destination whose demand has not been fulfilled. If demand exceeds the supply, proceed to the next source, and continue until all of demand 1 is allocated. Then, go to destination 2 and repeat the allocation process. Eventually, all demand must be fulfilled if the problem is feasible (i.e., total supply = total demand).
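The rule above can be sketched directly in Python (our own illustration; supplies and demands are made up and balanced):

```python
def northwest_corner(supply, demand):
    """Feasible shipment for a balanced transportation problem,
    allocating greedily from the top-left (northwest) cell."""
    s, d = list(supply), list(demand)
    m, n = len(s), len(d)
    x = [[0] * n for _ in range(m)]
    i = j = 0
    while i < m and j < n:
        q = min(s[i], d[j])
        x[i][j] = q
        s[i] -= q
        d[j] -= q
        if s[i] == 0:
            i += 1          # supply i exhausted: move to next source
        else:
            j += 1          # demand j met: move to next destination
    return x

x = northwest_corner([20, 30, 25], [10, 25, 15, 25])
```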
Null space 
The null space of a matrix A is {x: Ax = 0}. It is the orthogonal complement of the row space of A. Its dimension is n - rank(A), which complements the dimension of the subspace spanned by the rows of A.
Numeric CSP 
A constraint satisfaction problem is numeric if the constraints are numeric relations (e.g., log, sin, power, etc.) and the decision variables' domains are continuous.
Objective function 
The (realvalued) function to be optimized. In a mathematical program in standard form, this is denoted Also see multiple objectives.
Online problem 
A decision problem that is ongoing, so its data values are frequently changing with uncertainty. Here are some examples:
 Allocation of bandwidth during video conferencing on a telecommunications network;
 Investment decisions in a portfolio, selling and buying as the market changes;
 Scheduling jobs as they arrive;
 Load forecasting and dispatching for electric power.
In general, an online decision problem could be the allocation of resources that is done in real time with changes in the amount of resources, their costs and the demands for what they produce. A problem that is not online is sometimes called an offline problem.
Optimal 
For a mathematical program in standard form, (the domain) is an optimal solution if it is a maximum (or a minimum):
 is feasible;
 for all feasible (maximum value).
Some authors refer to an optimal solution when they mean a local optimum; others mean a member of the optimality region (which are global optima). In either case, the optimal value is the objective value, evaluated at an optimal solution. A solution is nearly optimal if it is feasible, but the optimality condition (2) is replaced by f(x) >= f* - e,
where f* is the optimal value and e >= 0 (e = 0 corresponds to being optimal). Typically, e is specified as a small fraction, such as a cutoff tolerance for an algorithm to terminate finitely.
Optimal partition 
Consider a primaldual pair of linear programs:
Define primal surplus variables, and dual slack variables, For any nonnegative vector, define its support set as those indexes for which the coordinate value is positive: In an optimal solution, complementary slackness must hold, so that implies and implies Thus, the intersections, and are empty. If the solution is strictly complementary, these support sets yield partitions because then and A strictly complementary solution is an interior solution (and conversely), so each interior solution yields a partition of these index sets.
A key fact is that every LP with an optimal solution must have an optimal interior solution. Further, every optimal interior solution yields the same partition, so it is called the optimal partition.
Optimal response function 
The optimal value of the objective as a function of parameters, typically the right-hand side:
Optimality gap 
Generally the difference between a best known solution, e.g. the incumbent solution in mixed integer programming, and a value that bounds the best possible solution. One such measure is the duality gap. The term is often qualified as the absolute gap, which is the magnitude of the difference between the best known solution and the best bound, or as the relative gap, which is the absolute gap divided by the best bound.
Optimality region 
Set of optimal points.
Optimization Software Library 
(OSL). A library of optimization routines, callable from Fortran or C, produced by IBM.
Optimum 
(pl. optima). Minimum or maximum.
Order of convergence 
See convergence.
Orthogonal complement 
The orthogonal complement of a subspace, S, is {y: <y, x> = 0 for all x in S}. In particular, if S is the row space of a matrix A, its orthogonal complement is the null space of A, {x: Ax = 0}.
Orthogonal matrix 
A matrix, A, such that A'A = I (equivalently, A' = A^(-1)).
Orthogonal vectors 
Two vectors in whose inner product is zero.
Out-of-kilter algorithm 
Applies to network flows, where the balance equations hold every iteration, but bounds and duality (optimality) conditions could be violated. The dual prices, often called potentials in this context, are modified along with flows so as to move closer to both primal and dual feasibility.
Outer approximation 
This solves a sequence of approximations to a mathematical program where the approximating problem contains the original feasible region. Examples are cutting plane algorithms and relaxation.
Overconstrained problem 
A term generally meaning that a mathematical program has too many constraints in that either the feasible region can be described with a subset of the constraints or that it is empty.
Overoptimize 
A term that means the model is too optimistic in expecting an optimal solution to produce the benefits in its objective. For example, a timestaged model generally presumes perfect information over the planning horizon, and can be "overoptimizing" in its allocation of resources.
P-matrix 
A square matrix with all of its principal minors positive. This includes all symmetric, positive definite matrices. Here is an example of a P-matrix that is not positive definite:
The principal minors are positive, but
The significance of this class is in the theorem:
 The linear complementarity problem defined by a matrix M has a unique solution for each q in R^n if, and only if, M is a P-matrix.
Packing problem 
The set packing problem is the question, "Does a collection of sets contain at least k mutually disjoint subsets?" (k is a positive integer not greater than the number of sets given.) This is solvable in polynomial time if no set has more than 2 members. Otherwise, it has an NP-hard equivalent integer program, where x_j = 1 if element j is selected; else, x_j = 0. The matrix A has 0's and 1's, where the ith row corresponds to the ith set to be packed: A(i,j) = 1 means element j is in set i; else, A(i,j) = 0. The constraint means that at most one element of each set can be selected. If the IP maximum is less than k, or if it equals the number of elements, the answer to the set packing problem is "No". Otherwise, consider the selected elements (reindexing, if necessary) and the sets containing them. Due to the packing constraints, no selected element can be in two sets, so these are disjoint. The remaining set, composed of all other elements, must be disjoint from the other sets, so the answer to the set packing question is "Yes" (and we can construct the disjoint subsets).
A special case is the node packing problem on a graph: find a set of nodes of maximum cardinality such that no two are incident with the same edge (or arc). In that case, every row of A has exactly two 1's, and the constraints are sometimes called degree-2 inequalities. Another special case is the bin packing problem.
The IP formulation has a natural extension in which the right-hand side of row i is b_i >= 1 and the objective has weights c: this allows up to b_i elements to be selected from the ith set, and elements can be weighted by value c_j.
Parallel algorithm 
This is an algorithm that executes more than one instruction at a time by having a computing environment with more than one processor. The types of parallel computing environments are:
MIMD:  Multiple Instruction/Multiple Data (includes massively parallel); 
MISD:  Multiple Instruction/Single Data (unusual); 
SIMD:  Single Instruction/Multiple Data (e.g., array processor). 
Parallel tangents 
(PARTAN). An algorithm developed from the zigzag phenomenon observed using Cauchy's steepest descent. It takes two gradient steps, then performs a line search on the line through the first and last points. (For a quadratic objective, PARTAN is equivalent to the conjugate gradient method.)
Parameter 
A constant in a mathematical program, not subject to choice in the decision problem, but one that could vary outside the control of the decisions. Examples are supplies, demands, loss factors, exponents and coefficients in polynomial functions (of the decision variables). Not all coefficients are parameters, as many are zero by the logic of the model. For example, the only data for a standard transportation problem are the costs, supplies and demands. These can depend upon parameters, but the LP matrix does not – it is the incidence matrix of the network. In general, parameters are datadependent constants, rather than logically fixed for all instances of the model. Some parameters are simply units of measurement, such as the amount of energy (Btu) in a ton of coal, whereas some parameters are uncertain, like demand for a product.
Parametric analysis 
The sensitivity analysis of parameters, which can be vectors in any designated region.
Parametric programming 
Solving a family of mathematical programs over a parameter space. For example, consider the family of linear programs:
where t is a (scalar) parameter and d is a specified direction vector. Starting with t = 0, the LP is solved, then an optimal basis is found (if possible) that remains optimal as t is increased. The max value of t is determined; if this is finite, the basis changes to a new optimal basis (for the max value) such that t can be further increased, if possible. This is continued until either t cannot be increased any further or a basis is found that remains optimal for all larger t; the values at which the basis changes are the break points of the optimal objective value as a function of the parameter.
Pareto optimum 
See multiple objectives.
Partial conjugate gradient method 
The conjugate gradient method is performed for some number of iterations, say k, then it is restarted. (If k = 1, this is the special case of Cauchy's steepest descent.)
Partial quasi-Newton method 
A quasi-Newton method is performed for some number of iterations, say k, then it is restarted.
Partial solution 
Some of the variables are specified. This arises in a branch and bound algorithm, where some variables have been fixed by the branching process. It also arises in bidding algorithms.
Partially ordered set 
(or poset). A set plus a binary relation on that set that is reflexive, antisymmetric and transitive. This arises in the presence of precedence constraints, and other relations that arise in combinatorial programs.
Particle Swarm Optimization 
(PSO). This is an evolutionary algorithm based on swarm behavior of animals, like bird flocking. A move from a state is influenced by directions of states in its neighborhood. A consensus function is used to average neighbors' best fitness values, and this is combined with the original state's fitness to obtain a new position for the state. It applies to continuous variables, using a velocity term to derive state increments. The fundamental equation that governs state evolution is

v(p) <- w v(p) + c1 r1 (pbest(p) - x(p)) + c2 r2 (gbest - x(p)),   x(p) <- x(p) + v(p),

where p is a particle, or state, and
 v(p) = velocity vector of particle p; 
 x(p) = position vector of particle p; 
 pbest(p) = previous position of particle p giving the best fitness value; 
 gbest = position of the globally best fitness value; 
 w = parameter, called "inertia weight"; 
 r1, r2 = pseudo-random numbers in [0, 1]; 
 c1, c2 = positive parameters, generally in [1, 2]. 
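The update rule above can be sketched in plain Python (our own illustration; the parameter values are common choices, not prescribed by the entry):

```python
import random

def pso_minimize(f, dim, lo, hi, n_particles=30, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Particle swarm sketch: each particle keeps a velocity, its own
    best position (pbest), and is pulled toward the global best (gbest)."""
    rng = random.Random(seed)
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pf = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: pf[i])
    gbest, gf = pbest[g][:], pf[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            fx = f(X[i])
            if fx < pf[i]:                 # update personal best
                pbest[i], pf[i] = X[i][:], fx
                if fx < gf:                # update global best
                    gbest, gf = X[i][:], fx
    return gbest, gf

# minimize the sphere function sum(x_j^2) on [-5, 5]^2
best, best_f = pso_minimize(lambda x: sum(v * v for v in x), 2, -5.0, 5.0)
```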
Partitioning problem 
This is the combination of packing and covering. Its IP form is Opt{cx: Ax = b, x binary}, where Opt could be Min or Max, and A is a 0-1 matrix. The equation means that exactly b_i elements must be selected from set i. In particular, a multiple choice constraint is one with b_i = 1.
Path 
As a (differentiable) curve, see path following. As a finite sequence, see graph or network.
Path consistency 
A consistency property that is similar to arc consistency but that considers pairs of variables instead of just one. A pair of variables is path-consistent with a third variable if, for every consistent value pair satisfying the binary constraint on the pair, there exists a value for the third variable such that all binary constraints between it and the pair are satisfied.
Path following 
In this context a path is a piecewise differentiable curve in space. The idea is to follow such a path (if one exists) from some initial point to a solution. One example of a path following algorithm is the barrier penalty function, with the path created by the parameter in the solution, in where is the strict interior of the feasible region. (This is called the central path, or the path of the analytic center.)
Pattern search 
This is a heuristic (in the sense of not guaranteeing convergence) for unconstrained optimization. It consists of two basic steps, where is a current point and is a vector of widths (both having been initialized).
 Exploration move. Set and do for
 If replace with
 If replace with
 Pattern move. If replace with Otherwise, reduce widths and return to exploration move.
The idea is to perform exploration moves as long as they are successful. Then, saving the last direction when it was successful, this is used to advance if it is an improvement, giving a new base point for the next series of exploration moves. Termination occurs when the widths become too small.
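A minimizing Hooke-Jeeves style sketch of these two steps in Python (our own illustration; the entry leaves the acceptance tests and width schedule unspecified, so those details here are assumptions):

```python
def pattern_search(f, x0, h=0.5, shrink=0.5, tol=1e-6):
    """Pattern search (minimizing): exploratory moves along each
    coordinate, then pattern moves along the last successful direction;
    the step width shrinks when no move improves."""
    def explore(base, fbase):
        x, fx = base[:], fbase
        for i in range(len(x)):
            for step in (h, -h):
                t = x[:]
                t[i] += step
                ft = f(t)
                if ft < fx:
                    x, fx = t, ft
                    break
        return x, fx

    x, fx = list(x0), f(x0)
    while h > tol:
        y, fy = explore(x, fx)
        if fy < fx:
            while fy < fx:                 # keep pattern-moving while improving
                x_prev, x, fx = x, y, fy
                p = [2 * b - a for a, b in zip(x_prev, x)]  # pattern point
                y, fy = explore(p, f(p))
        else:
            h *= shrink                    # reduce widths, explore again
    return x, fx

x, fx = pattern_search(lambda z: (z[0] - 1) ** 2 + (z[1] + 2) ** 2, [0.0, 0.0])
```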
Penalty function 
The traditional concept is to augment the objective with a function and one positive constant, so that the original mathematical program is replaced by solving a parametric family of the form The function, is called a penalty function if it satisfies for not feasible and if is feasible. The set depends on the type of penalty function, and there are two classical types, each requiring to be continuous: interior (or barrier) and exterior. A penalty function is exact if there exists a finite parameter value such that its maximum is a solution to the original mathematical program.
More generally, a penalty function could involve many parameters, such as the Lagrangian, or it could be stated in parameter free form. A general model is the generalized penaltyfunction/surrogate dual, see the supplement on duals. The notion of an exact penalty function leads to a strong dual.
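A small exterior-penalty sketch in Python (our own illustration: a one-dimensional convex problem, with each penalized subproblem solved by ternary search; the parameter sequence is an assumption):

```python
def ternary_min(g, lo, hi, iters=200):
    """Minimize a unimodal function on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def exterior_penalty(f, violation, lo=-10.0, hi=10.0,
                     mus=(1.0, 10.0, 100.0, 1e4)):
    """Minimize f subject to violation(x) == 0 by minimizing
    f(x) + mu * violation(x)**2 for an increasing sequence of mu."""
    x = None
    for mu in mus:
        x = ternary_min(lambda t: f(t) + mu * violation(t) ** 2, lo, hi)
    return x

# min x^2 subject to x >= 1; the unconstrained penalty minimizer
# approaches the constrained solution x = 1 as mu grows
x = exterior_penalty(lambda t: t * t, lambda t: max(0.0, 1.0 - t))
```

For this instance the penalized minimizer is mu/(1 + mu), so the iterates approach 1 from below as mu increases.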
Perturbation 
An imprecise term that means changing some parameter value (or function) and seeing what effect this has on the solution. This embodies continuity and differentiability properties of the optimal responses: objective value, policy set, and perhaps dual values too. This is what is sometimes called marginal analysis of solutions, and changes are limited to some neighborhood of the initial values. Some use perturbation function to mean the optimal objective value, even for large changes in the parameters, as in parametric programming.
Phase I and Phase II 
Phase I of a mathematical program is finding a feasible solution, and Phase II is entered with a feasible solution to find an optimal solution. The standard Phase I is:
Then, for any one could set to have a feasible solution to the Phase I mathematical program. The optimal objective value is 0 if, and only if, and in which case is feasible in the original mathematical program (to commence Phase II). The Phase I problem is sometimes called an elastic program (thinking of the artificial variables as providing "elastic" behavior in the levels of constraint violation).
In linear programming, the standard Phase I & II are:
 I. Min ev: Ax + v = b, x >= 0, v >= 0 (with signs chosen so b >= 0);
 II. Max cx: Ax = b, x >= 0.
Pivot 
This is the algebra associated with an iteration of Gauss-Jordan elimination, using the forward transformation. The tableaux for a pivot on element a(r,s), which means nonbasic variable x_s enters the basis in exchange for basic variable x_r, are as follows:
 Before pivot: tableau entries a(i,j), with pivot element a(r,s) nonzero.
 After pivot: a(r,j) becomes a(r,j)/a(r,s) for all j, and a(i,j) becomes a(i,j) - a(i,s)a(r,j)/a(r,s) for all i not equal to r.
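The update rules can be sketched directly; the following routine is a plain statement of the Gauss-Jordan pivot, and the tableau data in the example are hypothetical.

```python
# A sketch of the Gauss-Jordan pivot on a tableau T (a list of rows),
# pivoting on element T[r][s]; returns the updated tableau.
def pivot(T, r, s):
    p = T[r][s]
    assert p != 0, "pivot element must be nonzero"
    m, n = len(T), len(T[0])
    out = [row[:] for row in T]
    for j in range(n):                  # divide the pivot row by the pivot
        out[r][j] = T[r][j] / p
    for i in range(m):                  # eliminate column s from other rows
        if i != r:
            for j in range(n):
                out[i][j] = T[i][j] - T[i][s] * T[r][j] / p
    return out

T = [[2.0, 1.0, 4.0],
     [1.0, 3.0, 5.0]]
T2 = pivot(T, 0, 0)
print(T2)   # [[1.0, 0.5, 2.0], [0.0, 2.5, 3.0]]
```

After the pivot, the pivot column is a unit vector, reflecting the basis exchange.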
A pivot is primal degenerate if the associated basic solution does not change (i.e., the nonbasic variable enters the basis, but its level remains at the same bound value, in which case no basic variable changes level). Similarly, the pivot is dual degenerate if the associated dual solution (i.e., pricing vector and reduced costs) does not change. For dealing with degenerate pivots, see Bland's rule and the TNP rule.
Pivot selection 
In the simplex method, this is a tactic to select a basis exchange. The incoming column is based on its effect on the objective function, and the outgoing column is based on its effect on feasibility.
Point-to-set map 
A map whose image is a set: if F maps X into the subsets of Y, this means F(x) is a subset of Y for each x in X. An example is the feasibility map.
Pointed cone 
A convex cone, C, such that C ∩ (-C) = {0}. An example is the nonnegative orthant (so -C is the nonpositive orthant, and the two meet only at the origin).
Polar cone 
The polar cone of a set S is {y: yx <= 0 for all x in S}.
Policy iteration 
This is an algorithm for infinite horizon dynamic programs (generally stochastic) that proceeds by improving policies to satisfy the fundamental equation:
F(s) = max_x {r(s, x) + sum_s' P(s'|s, x)F(s')}, where F(s) is the maximum expected value when starting in state s, r(s, x) is the immediate expected return when in state s and decision x is taken by the policy (a decision rule), and P(s'|s, x) is the probability of a transition from state s to state s' in one time period.
The algorithm has some policy, at the beginning of an iteration. This determines an approximation of which is exact if the equation holds. If the equation is violated, the violation identifies how the policy can be improved. This changes the policy for the next iteration. Convergence depends upon the underlying Markov process (e.g., whether it is ergodic). Another approach is value iteration.
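The evaluate-improve loop can be sketched on a tiny discounted Markov decision process; all data below (rewards, transition probabilities, discount factor) are made up for illustration, and policy evaluation is done by successive approximation rather than solving the linear system exactly.

```python
# A sketch of policy iteration on a hypothetical two-state, two-action MDP.
states, actions = [0, 1], [0, 1]
r = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}   # r(s, x)
P = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],               # P(s'|s, x)
     (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}
beta = 0.9                                                  # discount factor

policy = {s: 0 for s in states}
while True:
    # evaluate the current policy by successive approximation of F
    F = {s: 0.0 for s in states}
    for _ in range(500):
        F = {s: r[s, policy[s]]
                + beta * sum(P[s, policy[s]][t] * F[t] for t in states)
             for s in states}
    # improve: pick the action attaining the max in the fundamental equation
    improved = {s: max(actions,
                       key=lambda x: r[s, x]
                       + beta * sum(P[s, x][t] * F[t] for t in states))
                for s in states}
    if improved == policy:   # equation holds: the policy is optimal
        break
    policy = improved
print(policy)
```

When the improvement step changes nothing, the fundamental equation is satisfied and the loop stops.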
Polyhedral annexation 
This is a cutting plane approach to find an optimal solution known to lie at an extreme point of a polyhedron, The general algorithm is to start at some extreme point and solve the polyhedral annexation problem. This will result in ascertaining that the extreme point is (globally) optimal, or it will generate a recession direction from which a convexity cut is used to exclude the extreme point. The approach generates a sequence of shrinking polyhedra by eliminating the cut region. Its name comes from the idea of annexing a new polyhedron to exclude from the original, homing in on the extreme point solution.
Notes:
 When applied to reverse convex programming, the convex set in the polyhedral annexation problem is the objective level set, {x: f(x) >= f(x*) - t}, where t is an optimality tolerance and x* is the best (extreme point) solution so far. The initial polyhedron is the feasible region, so this is an inner approximation strategy.
 When applied to mixed integer programming, the original polyhedron is the LP relaxation, and the convex set for the polyhedral annexation problem can vary according to the particular problem (exploiting structure, especially to find facets of the feasible region).
Polyhedral annexation problem 
Given a convex cone C, a polytope P contained in C, and a compact, convex set S with 0 in int(S), find a point in P that is not in S, or ascertain that P is contained in S.
Polyhedron 
(pl. polyhedra). A set that equals the intersection of a finite number of halfspaces. This is generally written as P = {x: Ax <= b}, where the representation (A, b) is not unique. It is often useful to separate the implied equalities: P = {x: Ex = e, Gx <= g}, so that the relative interior is {x: Ex = e, Gx < g}. The system is a prime representation if it is irredundant, and it is minimal if it is irredundant and contains no implied equality. A polyhedron is degenerate if it contains an extreme point that is the intersection of more than d halfspaces (where d is the dimension of the polyhedron). An example is the pyramid.
Polymatroid 
Let N be a finite set and let f be a submodular function on the subsets of N with f(empty set) = 0. The polymatroid associated with f is the polytope: {x >= 0: sum {x_j: j in S} <= f(S) for all subsets S of N}.
Polytope 
A bounded polyhedron.
Pooling of inventory 
When there are two or more demand points for a product, it may be possible to save money if the separate inventories for these demand points can be combined, or "pooled." There is a "square-root economy of scale" from pooling for two of the simplest inventory models: (a) the Economic Order Quantity (EOQ) model, and (b) the Newsboy (NV) model.
For the EOQ case, where we want to minimize the fixed charge of ordering plus the holding cost of inventory carried, if we have two separate inventories for two demand points, each with constant demand rate D, fixed charge of ordering K, and holding cost/unit of inventory h, then the minimum total cost is sqrt(2KhD) at each demand point, for a total cost of 2*sqrt(2KhD). If we combine the two demand streams and carry a single inventory for them, then the minimum total cost is sqrt(2Kh(2D)) = sqrt(2)*sqrt(2KhD), which decreases the total cost by a factor of 1/sqrt(2).
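The EOQ pooling economy can be checked numerically; the cost and demand data below are hypothetical, and the routine uses the standard EOQ minimum-cost formula sqrt(2KhD).

```python
from math import sqrt

# Sketch of the square-root economy of pooling under the EOQ model;
# K = fixed order cost, h = holding cost/unit, D = demand rate (all made up).
def eoq_cost(K, h, D):
    return sqrt(2 * K * h * D)   # minimum total cost per unit time

K, h, D = 100.0, 2.0, 50.0
separate = 2 * eoq_cost(K, h, D)    # two independent inventories
pooled = eoq_cost(K, h, 2 * D)      # one pooled inventory for both demands
print(pooled / separate)            # 1/sqrt(2), about 0.707
```

The ratio is independent of the particular K, h, D chosen.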
For the NV case, if we have two demands, each independently Normal distributed with mean m and standard deviation s, a specified service level p at an inventory point, and z being the number of standard deviations corresponding to a left-tail probability of p, then we must carry inventory m + zs at each inventory point, for a total inventory of 2m + 2zs. If we can combine the two demands so we carry just one inventory, then the mean and standard deviation for the combined demands are 2m and s*sqrt(2). Thus, the total inventory we must carry is 2m + zs*sqrt(2), and there is a square-root economy of pooling in the amount of safety stock we must carry. If we say that the system service level should be p, then the benefits of pooling are even greater.
There is an interesting anomaly wherein pooling can increase inventory. Specifically, for the NV problem, if instead of there being a constraint on service level, the objective is simply to minimize the expected cost of unsatisfied demand and the cost of surplus inventory, and the demand distributions at each of two demand points (with associated inventory) are right skewed, e.g., the mean is greater than the median, then pooling the two inventories may in fact result in a higher total optimal inventory. For example, if the cost/unit short is 5, the cost/unit of inventory left over is 4, the demands at each of two demand points are independent Poisson distributed with mean 3.3, then the optimal amount to stock at each location is 3. If, however, the demands are pooled and a single inventory is carried so the demand is Poisson with mean 6.6, then the optimal amount to stock is 7. So pooling increased the optimal inventory. See Myths and Counter Examples for an additional example.
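The Poisson anomaly above can be verified with the newsboy critical-fractile rule: the optimal stock is the smallest S with P(demand <= S) >= c_short/(c_short + c_over), here 5/9.

```python
from math import exp

# Check of the pooling anomaly: shortage cost 5, overage cost 4,
# Poisson demand with mean 3.3 at each point, 6.6 when pooled.
def poisson_cdf(k, mean):
    p, total = exp(-mean), 0.0
    for i in range(k + 1):
        total += p               # add P(demand = i)
        p *= mean / (i + 1)      # recurrence to P(demand = i + 1)
    return total

def optimal_stock(mean, short=5.0, over=4.0):
    frac = short / (short + over)            # critical fractile, 5/9
    S = 0
    while poisson_cdf(S, mean) < frac:
        S += 1
    return S

print(optimal_stock(3.3))   # 3 at each of the two demand points
print(optimal_stock(6.6))   # 7 for the pooled demand, exceeding 3 + 3
```

So pooling raises the optimal total stock from 6 to 7, as the entry states.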
Pooling problem 
A type of blending problem, as follows.
 Given:
 A set of sources, indexed by s. Each source supplies raw feeds in limited amounts, such as crude oil.
 A set of attributes, indexed by a. Each supply contains one or more attributes. An example of an attribute is the percent of sulfur in a crude oil supply.
 A set of products, indexed by d, for which there is demand; each product has a quality defined by its attributes. For example, if percent sulfur is an attribute, each refined petroleum product has a range of percent sulfur: 1.5% sulfur is a higher-quality product than 2.5% sulfur.
 A set of pools, indexed by p. Each pool is formed by inflows from sources, whose attributes are linearly mixed to determine the attributes of the pool, and hence its quality. Pools combine to form products (see Flow of Materials below). For example, suppose a pool receives flow values x and y from two sources whose attribute values are v1 and v2, respectively. Then, the pool's attribute value is the average: (v1*x + v2*y)/(x + y).
Flow of Materials
~~~~~~~~~~~~~~~~~
   Sources           Pools           Products
           S(s,p)           P(p,d)
SUPPLY --->( s )----------->( p )----------->( d )---> DEMAND
The objective is to minimize cost, which is the sum of source flow costs, subject to four types of constraints:
 Limited supplies: sum {S(s,p): p} <= supply(s) for each source s.
 Balances at pools: sum {S(s,p): s} = sum {P(p,d): d} for each pool p.
 Quality constraints: [sum {w(a,s)S(s,p): s}] / [sum {S(s,p): s}] must lie within the quality limits for each attribute a and pool p.
The numerator is a weighted sum, where the weights (w) are the (given) quality values of attribute a at the sources (s). The denominator is the total flow into the pool (p).
 Demand requirements: sum {P(p,d): p} >= demand(d) for each product d.
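The linear mixing of attributes at a pool can be sketched as a flow-weighted average; the flow values and sulfur percentages below are hypothetical.

```python
# Sketch of a pool's attribute value as the flow-weighted average of
# its source attributes (hypothetical data).
def pool_attribute(flows, values):
    # (sum_s w(a,s)*S(s,p)) / (sum_s S(s,p))
    return sum(f * v for f, v in zip(flows, values)) / sum(flows)

# two sources feed the pool: 60 units at 1.5% sulfur, 40 units at 2.5%
q = pool_attribute([60.0, 40.0], [1.5, 2.5])
print(q)   # 1.9 percent sulfur
```

This ratio is what makes the quality constraints nonlinear (bilinear) in the flows, the defining difficulty of the pooling problem.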
Portfolio selection problem 
In its elementary form, this is the same as the capital budgeting problem, except that the objective is to minimize the risk, rather than maximize expected return. Let x_j be the percent of capital invested in the jth opportunity (e.g., stock or bond), so x must satisfy x >= 0 and sum_j x_j = 1. Let e_j be the expected return per unit of investment in the jth opportunity, so that ex is the total rate of return per unit of capital invested. It is required to have a lower limit on this rate: ex >= b (where b is between min_j e_j and max_j e_j). Subject to this rate-of-return constraint, the objective is to minimize the quadratic form x'Vx, where V is the variance-covariance matrix associated with the investments (i.e., V(i,j) = cov(R_i, R_j) if the actual return rate is R).
Positive definite matrix 
The symmetric matrix A is positive definite if x'Ax > 0 for all nonzero x. Equivalently, the matrix is positive definite if all its eigenvalues are positive.
Also see Positive semidefinite matrix
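One direct test of the definition is Sylvester's criterion: a symmetric matrix is positive definite if and only if all its leading principal minors are positive. The sketch below uses a naive determinant (fine for small matrices) and hypothetical test matrices.

```python
# Sketch of testing positive definiteness via Sylvester's criterion.
def det(M):
    # Laplace expansion along the first row (adequate for small matrices)
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(n))

def is_positive_definite(Q):
    n = len(Q)
    # every leading principal minor must be positive
    return all(det([row[:k] for row in Q[:k]]) > 0 for k in range(1, n + 1))

pd = is_positive_definite([[2.0, -1.0], [-1.0, 2.0]])
npd = is_positive_definite([[1.0, 2.0], [2.0, 1.0]])
print(pd, npd)   # True False
```

The second matrix fails because its 2x2 minor is 1 - 4 = -3, matching the eigenvalue test (eigenvalues 3 and -1).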
Positive semidefinite matrix 
The symmetric matrix A is positive semidefinite if x'Ax >= 0 for all x.
Also see positive definite matrix.
Postman problem 
Posynomial 
A positive sum of monomials: g(t) = sum_i c_i * prod_j t_j^a(i,j), where c_i > 0 and t > 0. Each monomial is the product prod_j t_j^a(i,j),
and [a(i,j)] is called the exponent matrix. This is the fundamental function in geometric programming.
 Example: a posynomial with 2 monomial terms and 3 variables has a 2-by-3 exponent matrix, showing the exponent in each monomial (row) of each variable (column).
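Evaluation from the exponent matrix can be sketched as follows; the particular 2-term, 3-variable instance (coefficients c and matrix A) is hypothetical, chosen only to match the shape of the example.

```python
# Sketch: evaluate a posynomial g(t) = sum_i c_i * prod_j t_j**a(i,j)
# from its positive coefficients c and exponent matrix A (made-up data).
def posynomial(c, A, t):
    total = 0.0
    for ci, row in zip(c, A):
        term = ci
        for tj, aij in zip(t, row):   # monomial: prod_j t_j**a(i,j)
            term *= tj ** aij
        total += term
    return total

c = [5.0, 2.0]
A = [[2, 1, -1],     # first monomial:  5 * t1^2 * t2 * t3^-1
     [1, 0, 0.5]]    # second monomial: 2 * t1 * t3^0.5
g = posynomial(c, A, [1.0, 1.0, 4.0])
print(g)   # 5*(1/4) + 2*2 = 5.25
```

Note the exponents may be negative or fractional; only the coefficients and variables must be positive.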
Preprocessing 
Generally the same as presolve, but the purpose need not be to speed up optimization. Instead, it could be aimed at deepening one's knowledge about the mathematical program, such as what is forced by implication of the constraints versus what is determined by the economic tradeoff.
Precedence constraint 
When ordering objects, like jobs to be performed, this is a constraint that restricts the order: i must precede j. If order really means time, and if the model has decision variables s_i and s_j to denote the start times of i and j, resp., this precedence constraint can be written as s_i + p_i <= s_j, where p_i is the time job i takes. On the other hand, a precedence constraint need not correspond to real time. For example, it could mean that if project i is not selected, we cannot select project j. In that case, suppose the model has binary variables x_i and x_j, where x_i = 1 means project i is selected, and x_i = 0 means it is not selected. Then, the precedence constraint is represented as: x_j <= x_i.
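Both forms of the constraint can be sketched as simple feasibility checks; the start times, durations, and selections below are hypothetical.

```python
# Sketch of the two precedence-constraint forms (made-up data).
def time_precedence_ok(start, duration, i, j):
    # i must precede j in time: s_i + p_i <= s_j
    return start[i] + duration[i] <= start[j]

def selection_precedence_ok(x, i, j):
    # project j may be selected only if project i is: x_j <= x_i
    return x[j] <= x[i]

start = {"i": 0.0, "j": 5.0}
duration = {"i": 4.0, "j": 2.0}
ok_time = time_precedence_ok(start, duration, "i", "j")       # 0 + 4 <= 5
ok_select = selection_precedence_ok({"i": 0, "j": 1}, "i", "j")  # 1 <= 0 fails
print(ok_time, ok_select)   # True False
```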
Preconditioning 
A generalization of scaling that modifies the Hessian so as to improve convergence properties. In the case of quadratic programming, this typically means multiplying the quadratic-form matrix, Q, by a preconditioner, M, so that the condition number of MQ is as close to 1 as possible. There are many variations, including splitting a matrix, to achieve a better-conditioned matrix in the sense of its eigenvalue (and eigenspace) structure that governs convergence properties of algorithms like steepest ascent and conjugate gradient.
Predictor-corrector algorithm 
This is a class of algorithms whose origin is in ordinary differential equations. One first "predicts" a solution with either a functional approximation or an algorithm, getting "close" to a solution. Then, this is the initial point for a "corrector" method, such as using Newton's method to get "closer" to the curve. Although it could be used to characterize many methods in mathematical programming, it arises most naturally in path following methods. The term began being widely used in connection with following the central path in an interior point method. One such algorithm is using the affine scaling direction as a predictor and centering as a corrector.
Presolve 
A heuristic applied to reduce the problem in some way before starting an algorithm. In linear programming, for example, one might scan for an equation whose nonnegative variables must all be zero (such as a row of nonnegative terms that sum to zero), then simply fix those variables at zero, thus reducing the number of variables and constraints. Further details are in the supplement.
Pricing 
This is a tactic in the simplex method, by which each variable is evaluated for its potential to improve the value of the objective function. Let y = c_B B^(-1), where B is a basis, and c_B is the vector of costs associated with the basic variables. The vector y is sometimes called a dual solution, though it is not feasible in the dual before termination. y is also called a simplex multiplier or pricing vector. The price of the jth variable is c_j - yA_j. The first term is its direct cost (c_j) and the second term is an indirect cost, using the pricing vector to determine the cost of inputs and outputs in the activity's column (A_j). The net result is called the reduced cost, and its value determines whether this activity could improve the objective value.
Other pricing vectors, say v = wB^(-1), are possible to obtain information about the activity's rates of substitution without actually computing B^(-1)A_j, since vA_j = w(B^(-1)A_j). If w = c_B, then v = y, and this is the special case to get the reduced cost. Another special case is to obtain information about whether the (nonbasic) activity would need to perform a degenerate pivot. For LP in standard form, let w_i = 1 if the ith basic variable is at zero level, and w_i = 0 if it is positive. Then, vA_j sums the tableau entries of activity j over the degenerate rows. Thus, if vA_j > 0, activity j must have a positive tableau value with respect to some basic variable whose level is zero, so the pivot would have to be degenerate.
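The basic pricing computation can be sketched numerically; the 2x2 basis, basic costs, and candidate column below are hypothetical, and the basis is inverted by Cramer's rule for brevity.

```python
# Sketch of pricing: y = c_B * B^{-1}, reduced cost = c_j - y*A_j
# (hypothetical 2x2 data).
def reduced_cost(c_B, B, c_j, A_j):
    d = B[0][0] * B[1][1] - B[0][1] * B[1][0]   # det of the basis
    Binv = [[ B[1][1] / d, -B[0][1] / d],
            [-B[1][0] / d,  B[0][0] / d]]
    y = [c_B[0] * Binv[0][0] + c_B[1] * Binv[1][0],   # pricing vector
         c_B[0] * Binv[0][1] + c_B[1] * Binv[1][1]]
    return c_j - (y[0] * A_j[0] + y[1] * A_j[1])

# identity basis with basic costs (3, 2); candidate column (1, 1) costing 4
rc = reduced_cost([3.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], 4.0, [1.0, 1.0])
print(rc)   # 4 - (3 + 2) = -1.0
```

The sign of the reduced cost determines whether bringing this activity into the basis can improve the objective value.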
Primal degeneracy 
See Degeneracy
Primal method 
An algorithm for which each iterate is feasible in the (primal) mathematical program.
Primal program 
The original mathematical program, cited in the context of having its dual.
Prime representation 
A set represented by an irredundant system of inequalities. (See polyhedron.)
Principle of optimality 
The idea that an optimal path, or trajectory, is composed of optimal subpaths. Equivalently, once we reach a particular state in a sequential decision process, the remaining decisions must be optimal with respect to that state. This is the foundation of dynamic programming.
Problems 
There are various problems that have evolved that are popularly cited by name.
Product form of basis 
Product mix problem 
To determine what mix of products to make from shared resources. For example, if a company has apples and cranberries, one could make a certain amount of apple juice, cranberry juice, cranapple juice, and applecran juice, each differing in the percentages of apples and cranberries. More extensive examples abound to illustrate one of the early, fundamental applications of linear programming.
Production scheduling problem 
To determine levels of production over time. Constraints include demand requirements (possibly with backordering), capacity limits (including warehouse space for inventory), and resource limits. One basic model is as follows.
Let
 x_t = level of production in period t (before demand);
 I_t = level of inventory at the end of period t;
 P_t = production capacity in period t;
 W_t = warehouse capacity in period t;
 h = holding cost (per unit of inventory);
 c = production cost (per unit of production);
 d_t = demand at the end of period t.
Then, the mathematical program is: min sum_t (c*x_t + h*I_t): I_t = I_(t-1) + x_t - d_t, 0 <= x_t <= P_t, 0 <= I_t <= W_t, for t = 1, ..., T, where
I_0 is the given initial inventory, and T is the planning horizon.
The condition that I_t >= 0 means there is no backordering. Other tacit assumptions, which could be relaxed to broaden the scope of the model, are as follows.
 Letting quantities be multiple products – e.g., x_(j,t) = level of production of product j at time t. In such cases, competition for warehouse space and other resources results in more equations.
 There could be an ending inventory constraint, such as a lower bound on I_T.
 There could be additional decision variables to allow capacity expansion.
 Costs could be nonlinear functions.
 This could be a stochastic program; for example, with demands not known with certainty.
Also see warehouse problem.
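A candidate production plan can be checked against the balance and capacity constraints and costed; all data below (demands, capacities, unit costs) are made up for illustration.

```python
# Sketch: verify a production plan against the basic model and total its
# cost (production cost c per unit, holding cost h per unit of inventory).
def evaluate_plan(x, d, I0, prod_cap, wh_cap, h, c):
    I, cost = I0, 0.0
    for t in range(len(x)):
        assert 0.0 <= x[t] <= prod_cap[t], "production capacity violated"
        I = I + x[t] - d[t]                 # inventory balance equation
        assert I >= 0.0, "backordering not allowed"
        assert I <= wh_cap[t], "warehouse capacity violated"
        cost += c * x[t] + h * I
    return cost

cost = evaluate_plan(x=[30.0, 20.0, 25.0], d=[20.0, 25.0, 25.0], I0=0.0,
                     prod_cap=[30.0] * 3, wh_cap=[15.0] * 3, h=1.0, c=2.0)
print(cost)   # 170.0
```

Replacing the fixed plan with decision variables and minimizing the same cost expression gives the linear program stated in the entry.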
Projected gradient method 
A feasible direction is obtained for the region {x: Ax = b} by projecting the gradient of the objective function, g, into the null space of A: d = Pg, where P = I - A'(AA')^(-1)A (note AP = 0, so Py is in the null space of A for all y). Thus, x + sd is feasible for all feasible x and all step sizes s (since A(x + sd) = Ax = b). Further, if d is not zero, gd = gPg = ||Pg||^2 > 0 (since P is symmetric and idempotent), so the objective value improves each iteration until its projected gradient is null. At that point, g = A'y, where y satisfies first-order conditions.
Propagation 
Proper optimum 
A local optimum that is uniquely optimal in some feasible neighborhood.
Pseudo-boolean function 
This maps {0,1}^n into the reals.
Pseudo-boolean program 
A mathematical program in which the variables are binary (x in {0,1}^n) and all functions are pseudo-boolean.
Pseudoinverse 
(of a matrix). Same as generalized inverse.
Pseudomonotone function 
A function f is pseudomonotone over X (a subset of R^n) if, for x, y in X, f(y)(x - y) >= 0 implies f(x)(x - y) >= 0.
The gradient of a pseudoconvex function is a pseudomonotone function.
Pseudoconcave function 
A function whose negative is pseudoconvex.
Pseudoconvex function 
A function f that is in C^1 and satisfies the property: grad f(y)(x - y) >= 0 implies f(x) >= f(y). While every differentiable