All Pages

From Glossary

Jump to: navigation, search


Let LaTeX: \textstyle f : \mbox{ X } \times \mbox{ Y } \to \mathbb{R}. Then, LaTeX: \textstyle (x^*, y^*) is a saddle point of LaTeX: \textstyle f if LaTeX: \textstyle x^* minimizes LaTeX: \textstyle f(x, y^*) on LaTeX: \textstyle \mbox{ X }, and LaTeX: \textstyle y^* maximizes LaTeX: \textstyle f(x^*, y) on LaTeX: \textstyle \mbox{ Y }. Equivalently,

f(x^*, y) \le f(x^*, y^*) \le f(x, y^*) \mbox{ for all } x \mbox{ in X, } y \mbox{ in Y.}

Von Neumann (1928) proved this equivalent to:

f(x^*, y^*) & = &
\inf \{ \sup \{ f(x, y) : y \in \mbox{ Y } \} : x \in \mbox{ X } \} \\
& = & \sup \{ \inf \{ f(x, y) : x \in \mbox{ X } \} : y \in \mbox{ Y } \}.

A sufficient condition for a saddle point to exist is that LaTeX: \textstyle \mbox{ X } and LaTeX: \textstyle \mbox{ Y } are non-empty, compact, convex sets, LaTeX: f(.,y) is convex on LaTeX: \textstyle \mbox{ X } for each LaTeX: y in LaTeX: \textstyle \mbox{ Y }, and LaTeX: f(x,.) is concave on LaTeX: \textstyle \mbox{ Y } for each LaTeX: x in LaTeX: \textstyle \mbox{ X }.

Saddle point equivalence underlies duality. See also Lagrangian saddlepoint equivalence as well as the associated supplement.

Satisfiability problem

(SAT). Find a truth assignment to logical propositions such that a (given) collection of clauses is true (or ascertain that at least one clause must be false in every truth assignment). This fundamental problem in computational logic forms the foundation for NP-completeness. When SAT is in conjunctive normal form and may not be satisfiable, we define the problem of maximizing the number of true clauses as MAXSAT.


Changing the units of measurement, usually for the numerical stability of an algorithm. The variables are transformed as LaTeX: \textstyle x' = \mbox{S}x, where LaTeX: \textstyle \mbox{S} = \mbox{diag}(s_j). The diagonal elements are the scale values, which are positive: LaTeX: \textstyle s_1, \dots , s_n > 0. Constraint function values can also be scaled. For example, in an LP, the constraints LaTeX: \textstyle \mbox{A}x = b, can be scaled by LaTeX: \textstyle \mbox{RA} x = \mbox{R} b , where LaTeX: \textstyle \mbox{R} = \mbox{diag}(r_i) such that LaTeX: \textstyle r > 0. (This affects the dual values.) Some LP scaling methods simply scale each column of A by dividing by its greatest magnitude (null columns are identified and removed).

Example A column scaling A row scaling
10x + 100y = 500
–30x + .3y = .2
.3333x + y = 500
–x + .003y = .2
x + 10y = 50
–300x + 3y = 2

Another method is logarithmic scaling, which scales by the logarithm of the greatest magnitude. More sophisitcated methods are algorithmic, taking both row and column extremes into account.

Scatter search

This is a population-based metaheuristic, starting with a collection of reference points, usually obtained by the application of some heuristic. A new point is created by taking combinations of points in the population, and rounding elements that must be integer-valued. It bears some relation to a genetic algorithm, except that scatter search uses linear combinations of the population, while the GA ``crossover operation can be non-linear.


(e.g., jobs). A schedule for a sequence of jobs, say LaTeX: \textstyle j_1, \dots , j_n, is a specification of start times, say LaTeX: \textstyle t_1, \dots , t_n, such that certain constraints are met. A schedule is sought that minimizes cost and/or some measure of time, like the overall project completion time (when the last job is finished) or the tardy time (amount by which the completion time exceeds a given deadline). There are precedence constraints, such as in the construction industry, where a wall cannot be erected until the foundation is laid.

There is a variety of scheduling heuristics. Two of these for scheduling jobs on machines are list heuristics: the Shortest Processing Time (SPT), and the Longest Processing Time (LPT). These rules put jobs on the list in non-decreasing and non-increasing order of processing time, respectively.

Other scheduling problems, which might not involve sequencing jobs, arise in production planning.


The scope of a constraint is the set of directly constrained variables. For example, in the constraint LaTeX: x \neq y, the set LaTeX: \{x,y\} is the scope.


Search is a systematic way of trying out different values for variables until a solution is found that satisfies all constraints or it is proven that no solution exists.

During search, a search tree is constructed where each node represents the current state of the variables and edges represent value assignments to a particular variable. This means, with increasing depth of the search tree, more variables are assigned to specific values.

Search tree

The tree formed by a branch and bound algorithm strategy. It is a tree because at each (forward) branching step the problem is partitioned into a disjunction. A common one is to dichotomize the value of some variable, LaTeX: \textstyle x \le v \mbox{ or } x \ge v + 1. This creates two nodes from the parent:

Image:Search tree.png

Secant method

A method to find a root of a univariate function, say F. The iterate is

x^{k+1} = x^k - \frac{\mbox{F}(x^k) [x^k - x^(k-1)]}{\mbox{F}(x^k) - \mbox{F}(x^{k-1})}.

If LaTeX: \textstyle \mbox{F} \in \mbox{C}^2 \mbox{ and F}''(x) \ne 0, the order of convergence is the golden mean, say g (approx.= 1.618), and the limiting ratio is:

\left | \frac{2 \mbox{F}'(x)}{\mbox{F}''(x)} \right |^{g-1}

Second-order conditions

This descends from classical optimization, using the second-order term in Taylor's expansion. For unconstrained optimization, the second-order necessary condition (for LaTeX: \textstyle f \in C^2 ) is that the hessian is negative semi-definite (for a max). Second-order sufficient conditions are the first-order conditions plus the hessian be negative definite. For constrained optimization, the second-order conditions are similar, using projection for a regular mathematical program and the Lagrange Multiplier Rule. They are as follows (all functions are in LaTeX: \textstyle C^2c and the mathematical program is in standard form, for LaTeX: \textstyle x^* a local maximum):

Second-order necessary conditions. There exist Lagrange multipliers, (u,v), such that LaTeX: \textstyle u \ge 0 and LaTeX: \textstyle ug(x^*)=0 for which: (1) LaTeX: \textstyle \nabla x [ \mbox{L}(x^*,u,v)]=0, and (2) LaTeX: \textstyle \mbox{H}_x [\mbox{L}(x^*,u,v)] is negative semi-definite on the tangent plane.

Second-order sufficient conditions. The above necessary conditions hold but with (2) replaced by (2') H_x[L(x*,u,v)] is negative definite on the tangent plane.

Self concordance

Properties of a function that yield nice performance of Newton's method used for line search when optimizing a barrier function. Specifically, let B be a barrier function for LaTeX: \textstyle \mbox{S} = \{x \in \mbox{X}: g(x) \le 0\} with strict interior LaTeX: \textstyle \mbox{S}^0. Let x be in S and let d be a direction vector in LaTeX: \textstyle \mathbb{R}^n such that the line segment LaTeX: \textstyle [x-td, x+td] is in S for LaTeX: \textstyle t \mbox{ in } [0, t^*], where LaTeX: \textstyle t^* > 0. Then, define LaTeX: \textstyle \mbox{F}:[0,t^*] \to \mathbb{R} by:

\mbox{F}(t) = \mbox{B}(x + td)

(while noting that F depends on x and d). The function F is self-concordant if it is convex in C3 and satisfies the following for all x and d:

| \mbox{F}'''(0) | \le 2 \mbox{F}''(0)^\frac{3}{2}

One calls F k-self-concordant in an open convex domain if

| \mbox{F}'''(0) | \le 2 k \mbox{F}''(0)^{\frac{3}{2}}

The logarithmic barrier function, associated with linear programming, is self-concordant with LaTeX: k = 1. This further extends naturally to functions in LaTeX: \textstyle \mathbb{R}^n.

Semi-assignment problem

This is like the assignment problem, except only the rows or the columns (not both) of the assignment matrix is constrained to equal 1. With linear objective, the problem is:

\min \sum_{i, j} \mbox{C}(i, j) \mbox{ X}(i, j): \mbox{ X} \le 0, \sum_{i} \mbox{X}(i, j) = 1 \mbox{ for all } j.

(The sum constraint could be over j, for all i.) An example is to assign jobs to people, but allow some jobs to be assigned to more than one person and some jobs not assigned at all. A more realistic example arises in biology. We are given a rotamer library of side chain confirmations, and each amino acid residue in a protein must be assigned some rotamer. The same rotamer could be assigned to more than one site, and some rotamers need not be assigned at all.

Semi-definite program

LaTeX: \textstyle \min \{cx: S(x) \in P\}, where LaTeX: \mbox{P} is the class of positive semi-definite matrices, and LaTeX: \textstyle \mbox{S}(x) = S_0 + \sum_{j} x(j)S_j , where each LaTeX: \textstyle \mbox{S}_j, for LaTeX: \textstyle j = 0,\dots,n is a (given) symmetric matrix. This includes the linear program as a special case.

Semi-infinite program

A mathematical program with a finite number of variables or constraints, but an infinite number of constraints or variables, respectively. The randomized program is a semi-infinite program because it has an infinite number of variables when X is not finite.

Sensitivity analysis

The concern with how the solution changes if some changes are made in either the data or in some of the solution values (by fixing their value). Marginal analysis is concerned with the effects of small perturbations, maybe measurable by derivatives. Parametric analysis is concerned with larger changes in parameter values that affect the data in the mathematical program, such as a cost coefficient or resource limit.

Under suitable assumptions, the multipliers in the Lagrange Multiplier Rule provide derivatives of the optimal response function – i.e., under certain conditions, LaTeX: \textstyle (u, v) = \nabla f^* (b, c). For special approaches in LP, see compatibility theory, the 100% Rule, and the tolerance approach.

Separable program

The functions are separable:

f(x) = \sum_{j} f_j (x_j),
g(x) = \sum_{j} g_j(x_j),
\mbox{and } h(x) = \sum_{j} h_j(x_j).

The classical (LP) approaches to separable programming are called lambda-form and delta-form, both using piece-wise linear approximations.

Let LaTeX: \textstyle \{x^k\} be a specified set of points, where LaTeX: \textstyle x^k = [x(k,1), x(k,2), ..., x(k,n)], and let LaTeX: \textstyle y={y(k,j)} be decision variables that are the coefficients of convex combinations, giving the following linear program:

\max \sum_{kj} y(k, j) f_j(x(k, j)) : y \ge 0, \sum_{k} y(k, j) = 1 \mbox{ for each } j,

\sum_{kj} y(k, j) g_j(x(k, j)) \le 0, \sum_{kj} y(k, j) h_j(x(k, j)) = 0.

A restricted basis entry rule is invoked during the simplex method to yield an approximate solution. (However, this is dominated by the Generalized Lagrange Multiplier method, which can be viewed as generating the approximating breakpoints posteriori, getting successively finer near the solution.)

The delta form uses the differences: LaTeX: \textstyle u(k, j) = x(k, j) - x(k-1, j). The associated functional differences are:

\Delta f(k, j) = f_j(x(k, j)) - f_j(x(k-1, j));
\Delta g(k, j) = g_j(x(k, j)) - g_j(x(k-1, j));
\Delta h(k, j) = h_j(x(k, j)) - h_{j}(x(k-1, j)).

Then, the approximating LP is:

\max \sum_{kj} \Delta f(k, j) u(k, j)}: 0 \le u \le 1;

\sum_{kj} \Delta g(k, j) u(k, j)} \le b, \sum_{kj} \Delta h(k, j) u(k, j) = c,

where LaTeX: \textstyle b = -\sum_{j} g_j(x(0, j)) and LaTeX: \textstyle c = -\sum_{j} h_j(x(0, j)) (a similar constant was dropped from the objective). Another restricted basis rule is invoked: LaTeX: u(k,j) > 0 implies LaTeX: u(k,q)=1 for all LaTeX: q < j and all LaTeX: k.

Separating hyperplane

A hyperplane for which two (given) sets lie in opposite halfspaces. The separation is strict if the two sets are contained in their respective open halfspace.

              ____   |   ____
             |    |  |   \  /
             | S1 |  |   /S2\
             |____|  |   \__/ 
                   hyperplane (line) 

Sequencing problems

Finding an ordering, or permutation, of a finite collection of objects, like jobs, that satisfies certain conditions, such as precedence constraints.

Sequential decision process

See dynamic programming and Time-staged models.

Sequential linear programming

(SLP). Solving a nonlinear program by a sequence of linear approximations and using linear programming to solve each one. The linear approximations are usually done by using the first-order Taylor expansion.

Sequential quadratic programming

(SQP). Solving a nonlinear program by a sequence of quadratic approximations and using quadratic programming to solve each one. The approximations are usually done by using the second-order Taylor expansion.

Sequential unconstrained minimization technique

(SUMT). This is the penalty function approach.

Set constraint

A set constraint is a constraint on one or more set variables. For example, the following are common set constraints:

  • LaTeX: S \in [a, b] for ground values (e.g., integers) LaTeX: a, b
  • LaTeX: S \subseteq S_1
  • LaTeX: S = S_1 \cup S_2
  • LaTeX: S = S_1 \cap S_2
  • LaTeX: S = S_1 \setminus S_2
  • LaTeX: e \in S
  • LaTeX: e \not\in S
  • LaTeX: |S| \geq c

Set covering problem

See covering problem.

Set variable

A variable (often in constraint programming) whose value is a set, and therefore whose domain is the power-set of all elements.

For example, the set of all people who will be in the group that will play golf Sunday starting at 3 PM can be represented as a set variable.

Shadow price

An economic term to denote the rate at which the optimal value changes with respect to a change in some right-hand side that represents a resource supply or demand requirement. (This is sometimes taken as synonymous with the dual price, but this can be erroneous, as in the presence of degeneracy – see marginal price.) Also see pricing.

Sherman-Morrison formula

This is a useful identity:

\left [ A + a b^T \right ]^{-1} = A^{-1} - \frac{A^{-1} a b^T A^{-1}}{1 + b^T A^{-1} a},

where LaTeX: A is a nonsingular LaTeX: n \times n matrix and LaTeX: a and LaTeX: b are n-vectors, provided the inverse (on the left) exists.

Shortest path

In a graph or network, this is a path from one node to another whose total cost is the least among all such paths. The "cost" is usually the sum of the arc costs, but it could be another function (e.g., the product for a reliability problem, or max for a fuzzy measure of risk). There are some particular labeling algorithms given.


The difference between two posynomials. This class of function defines the general geometric program.


(pl. simplices). LaTeX: \textstyle \left \{ x \in \mathbb{R}: \sum x_j = 1 \right \}. For LaTeX: n=1, this is a point LaTeX: (x=1). For n=2, this is a line segment, joining points (1,0) and (0,1). For LaTeX: n=3, this is a triangle, joining the vertices (1,0,0), (0,1,0), and (0,0,1). This is sometimes called an n-simplex, denoted by LaTeX: S_n (note its dimension is n-1). The open simplex excludes the axes: LaTeX: \textstyle \{ x \in S_n: x > 0\}.

More generally, some authors define a simplex to be the convex hull of any LaTeX: n+1 affinely independent vectors, and refer to the special case of the unit vectors as the standard simplex.

Simplex method

An algorithm invented to solve a linear program by progressing from one extreme point of the feasible polyhedron to an adjacent one. The method is an algorithm strategy, where some of the tactics include [[Pricing|]]ricing and [[Pivot|]]ivot selection.

The elementary simplex method is the name of Dantzig's original (1947) algorithm, with the following rules applied to the standard form: LaTeX: \textstyle \min \left \{ cx: Ax=b, x \ge 0 \right \}.

    Let LaTeX: d_j = reduced cost of LaTeX: x_j \; terminate if LaTeX: d_j \ge 0 \mbox{ for all } j.
  1. Select LaTeX: d_j < 0 as one of greatest magnitude.
  2. In the associated column (LaTeX: j) of the tableau, compute the min ratio: LaTeX: x_i / a(i, j): a(i, j) > 0. (If LaTeX: a(., j) \le 0, LP is unbounded).
  3. Enter LaTeX: x_j into the basic set, in exchange for LaTeX: x_i, and update the tableau.

Among the variations are:

The revised simplex method is the use of a particular factored form of the basis: LaTeX: \textstyle B = \begin{bmatrix}
E_1 & E_2 & \dots & E_k
(after k iterations), where each LaTeX: E_i is an elementary matrix. Then, the revised simplex method uses forward transformation to pivot and backward transformation to update the pricing vector. (For this distinction, the elementary simplex method is sometimes called the tableau method.)

The simplex method draws its name from imagining a normalization constraint, LaTeX: \textstyle \sum_{j} x_j = 1, and thinking of the j-th column of LaTeX: A to be selected by the weight LaTeX: x_j in LaTeX: S_w. Then, at an iteration, an m-simplex is specified by the basic variables, and an adjacent simplex is chosen to improve the objective value. This view is in requirements space.

(For a different algorithm, used as a heuristic in NLP, see the Nelder-Mead simplex method.)

Simplex multiplier

See pricing.

Simplicial subdivision

Given a simplex, LaTeX: S, its simplicial subdivision is a collection of simplices, say LaTeX: \{T_i\} such that LaTeX: \textstyle \lor_i\{T_i\} = S and for any LaTeX: i and LaTeX: j, either the intersection LaTeX: \textstyle T_i \land T_j is empty or equals the closure of a common face. The mesh of the subdivision is the diameter of the largest sub-simplex. This arises in a fixed point approach to compute an economic equilibrium.

Simulated annealing

An algorithm for solving hard problems, notably combinatorial programs, based on the metaphor of how annealing works: reach a minimum energy state upon cooling a substance, but not too quickly in order to avoid reaching an undesirable final state. As a heuristic search, it allows a non-improving move to a neighbor with a probability that decreases over time. The rate of this decrease is determined by the cooling schedule, often just a parameter used in an exponential decay (in keeping with the thermodynamic metaphor). With some (mild) assumptions about the cooling schedule, this will converge in probability to a global optimum.

Skew symmetric matrix

LaTeX: A is square and LaTeX: A^T = -A.

Slack variable

In an inequality constraint of the form LaTeX: \textstyle g(x) \le b, the slack is LaTeX: \textstyle b - g(x), which is designated by the slack variable, LaTeX: s. Then, the original constraint is equivalent to the defining equation, LaTeX: \textstyle g(x) + s = b, \mbox{ plus } s \ge 0.

Slater condition

Originally for the purely inequality system with g convex, it means there exists LaTeX: x for which LaTeX: g(x) < 0. More generally, for a mathematical program in standard form, it means there exists LaTeX: x \in X for which LaTeX: \textstyle g(x) < 0 \mbox{ and } h(x) = 0.


A function is smooth if it has continuous first derivatives.

Soft constraint

A soft constraint can be seen as a preferential constraint whose satisfaction is not required but preferred. Originally, soft constraints were proposed to model and solve over-constrained problems, i.e., the problems where there is no solution satisfying all the constraints. By weakening some constraints into soft constraints, we can find a solution satisfying all the hard constraints and as many soft constraints as possible.

See also:

Spanning tree

A subgraph that is a tree containing all nodes. The max weight spanning tree problem is to find a spanning tree such that the sum of (given, positive) weights of the edges is a maximum.

The max spanning tree problem is solvable by the following greedy algorithm:
Input. connected graph with weights, LaTeX: w_1 \ge \dots \ge w_m.
Output. maximum weight spanning tree, T.
  • Initialization: Set LaTeX: k=1; T = graph with given nodes and no edges.
  • Iteration (until LaTeX: k=m-1): Test if the k-th edge forms a cycle with T. If not, add it to T; if so, discard the edge. In either case, increment k and iterate.

Here is a supplement, Greedy Algorithms for Minimum Spanning Tree.


The fraction of zeroes in a matrix. If LaTeX: \textstyle A is LaTeX: \textstyle m \times n, and LaTeX: \textstyle A_{(i, j)} \ne 0 for LaTeX: k of its elements, its sparsity is LaTeX: \textstyle \frac{k}{mn}. Large linear programs tend to be very sparse, increasingly so as the dimensions grow. For example, consider the standard transportation problem with s sources and d destinations. This has LaTeX: \textstyle m = (s+d) constraints and LaTeX: \textstyle n = sd variables. Each column, however, has exactly 2 nonzeroes since A is the incidence matrix of the network, so its sparsity is LaTeX: \textstyle \frac{2n}{mn} = \frac{2}{m}, which decreases as the number of sources and/or destinations grows large.

The sparsity of a simple graph (or network) is the sparsity of its adjacency matrix. More generally, the sparsity of a multigraph refers to the average degree of its nodes.

See super sparsity for extensions of this idea.

Specially ordered set

(SOS). These are sets of non-negative variables that are required to sum to 1. For computational efficiency, it is sometimes better to define these sets by some marking data structure, rather than include them along with other equality constraints. There are two types of SOSs, distinguished by what they represent. A Type 1 SOS is when each variable is binary, so the constraint is one of Multiple Choice. A Type 2 SOS is when a restricted basis entry rule is used, as in the lambda-form of separable programming.

Spectral radius

The spectral radius of a matrix, LaTeX: A. The radius of a disk that contains the spectrum:

LaTeX: \textstyle r(A) = \max \left \{\left | y \right |: y \mbox{ is an eigenvalue of } A \right \}.


The set of eigenvalues of a matrix.

Speed of convergence

See convergence.

Stability region

The set of parameter values for which an optimal solution remains optimal. This arises naturally when solving combinatorial programs, where a solution is often a subgraph, such as a tree, and the question is for what range of arc weights is this subgraph optimal (such as a spanning tree that is minimum for given weights). More generally, LaTeX: x could be a solution generated by some algorithm, LaTeX: A, from an initial value LaTeX: x_0. Then, suppose the feasibility region depends on the parameter LaTeX: p, say LaTeX: F(p), and the objective also depends on LaTeX: p, say LaTeX: f(x;p). Let LaTeX: X(p,A,x_0) denote the generated solution from algorithm A, starting at LaTeX: x_0, with parameter value LaTeX: p. Let the parameter set be LaTeX: P (which includes LaTeX: p^*). The stability region of LaTeX: x^* = X(p^*,A,x_0) is LaTeX: \textstyle \{ p \in P: x^* =& X(p,A,x_0)\}. The algorithm may be a heuristic, so LaTeX: x^* need not be optimal. For example, one could use an n-Opt heuristic for the travelling salesman problem, so x represents a tour. The parameters could be the costs, or they could be the location of each point in a Euclidean TSP. The stability region is the set of costs, or coordinates in the plane, for which the tour generated by n-Opt is the same.

Stable math program

Roughly, one whose solution does not change much under perturbation. For the inequality case, we have the following

    Stability conditions:
  1. LaTeX: \textstyle \{x \in X: g(x) \le b\} is bounded for some LaTeX: \textstyle b > 0.
  2. LaTeX: \textstyle \mbox{cl} \{x \in X: g(x) < 0\} = \{x \in X: g(x) \le 0\}.

The first stability condition pertains to upper semi-continuity and the second, called the closure condition, pertains to lower semi-continuity.

The conditions are not only sufficient to ensure the respective semi-continuity, but they are necessary when:
  1. LaTeX: \textstyle \{x \in X: g(x) \le 0\} is bounded.
  2. LaTeX: \textstyle \{x \in X: g(x) < 0 \} \ne \emptyset .

Stable set

Same as independent set

Standard linearization

The standard way to linearize the product of two binary variables LaTeX: x and LaTeX: y is to replace LaTeX: xy with a continuous variable LaTeX: w and add four linear inequalities as auxiliary constraints:

LaTeX: \begin{array}{ll}
& w \le x, \\
& w \le y, \\
& w \ge x + y - 1, \\
and & w \ge 0.

Collectively, these imply LaTeX: w = xy for all binary values of LaTeX: x and LaTeX: y. This can be generalized for the product of binary variables LaTeX: x_j for all LaTeX: j in some index set LaTeX: J by replacing LaTeX: \textstyle \prod_{j \in J} x_j with a continuous variable LaTeX: w and adding LaTeX: |J| + 2 auxiliary constraints:

LaTeX: \begin{array}{ll}
& w \le x_j \mbox{ for all } j \in J, \\
& w \ge \sum_{j \in J} x_j - (|J| - 1), \\
and & w \ge 0.

Stationary point

Usually this is used to mean a Kuhn-Tucker point, which specializes to one for which grad_f(x)=0 if the mathematical program is unconstrained. In the context of an algorithm, it is a fixed point.

Stationary policy

In a dynamic program, this is a policy that is independent of time – i.e., LaTeX: \textstyle x^*(s,t) = T(s) (some function of state, but not of time, t).

Steel beam assortment problem

A steel corporation manufactures structured beams of a standard length, but a variety of strengths. There is a known demand of each type of strength, but a stronger one may fulfill demand (or part thereof) for another beam (but not conversely). The manufacture of each type of steel beam involves a fixed charge for its setup. In addition, there is a shipping cost proportional to the difference in the demanded strength and the actual strength, and proportional to the quantity shipped.

Let LaTeX: N = number of varieties of strengths LaTeX: D(t) = demand for beam of strength LaTeX: s_t (where LaTeX: \textstyle s_1 \ge s_2 \ge \dots \ge s_N ) LaTeX: x(t) = amount of beams of strength LaTeX: s_t manufactured LaTeX: \textstyle p_t(x(t)) = manufacturing cost of LaTeX: x(t) units of beam of strength LaTeX: s_t (incl. fixed charge) LaTeX: y(t) = total excess of beams of strength LaTeX: \textstyle s_1, \dots , s_t before fulfilling demand LaTeX: \textstyle D(t+1), \dots , D(N) h_t(y(t)) = shipping cost LaTeX: \textstyle ( = c [ s_{(t+1)}- s_t] \min \{y(t), D(t)\}).

Although LaTeX: t does not index time, the mathematical program for this problem is the same form as the production scheduling problem, using the inventory balance equations to relate LaTeX: y and LaTeX: x. This is valid because LaTeX: \textstyle s_1 \ge s_2 \ge \dots \ge s_N implies LaTeX: y(t) can be used to fulfill demand LaTeX: \textstyle D(t+1) + D(t+2) + \dots + D(N). (Also, note that here LaTeX: h_t is not a "holding cost".)

Steepest ascent

(descent, if minimizing). This is a class of algorithms, where LaTeX: \textstyle x' = x + sd, such that the direction vector d is chosen by maximizing the initial "velocity" of change, and the step size LaTeX: (s) is chosen by line search. Generally used in the context of unconstrained optimization, the mathematical program is LaTeX: \textstyle \max \{f(x): x \in R^n\}, where LaTeX: \textstyle f \in C^1. (For descent, change Max to Min.) Then, LaTeX: d is chosen to maximize the first-order Taylor approximation, subject to a normalization constraint: LaTeX: \textstyle \max \{ \nabla f(x) d: ||d|| = 1 \}, where LaTeX: ||d|| denotes the norm of the direction vector, LaTeX: d. When the Euclidean norm is used, this yields the original steepest ascent algorithm by Cauchy, which moves in the direction of the gradient:

d = \nabla f(x) / ||\nabla f(x)||.

(No direction vector is sought if LaTeX: \textstyle \nabla f(x)=0; such algorithms stop when reaching a stationary point.)

Other norms, such as LaTeX: \textstyle ||d||^2 = d' Q d, where LaTeX: Q is symmetric and positive definite, lead to other directions that are steepest relative to that norm. In particular, if LaTeX: \textstyle Q = H_f(x), this yields the modified Newton method.

Steiner problem

Find a subgraph of a graph, say LaTeX: \textstyle G' = [V',E'], such that LaTeX: V' contains LaTeX: V^* (a specified subset of nodes), and LaTeX: \textstyle \sum \{c(e): e \in E'\} is minimized. It is generally assumed LaTeX: \textstyle c \ge 0. When LaTeX: \textstyle |V^*|=2, this is the shortest path problem. When LaTeX: \textstyle |V^*|=|V|, this is the (minimum) spanning tree problem.

Step size

This is a scalar LaTeX: (s) in an algorithm of the form: LaTeX: \textstyle x' = x + sd, where LaTeX: d is the direction vector. After LaTeX: d is chosen (nonzero), the step size is specified. One step size selection rule is line search; another is a fixed sequence, like LaTeX: \textstyle s_k = 1/k.

Stochastic matrix

A non-negative matrix whose row sums are each 1. (A column stochastic matrix is one whose column sums are 1.) This arises in dynamic programs whose state transition is described by a stochastic matrix containing the probabilities of each transition.

Stochastic program

It is written in the form of a mathematical program extended to a parameter space whose values are random variables (generally with known distribution function). This is converted to a standard form by forming a certainty equivalent.

  • Here are some certainty equivalents:
  • Average value. Replace all random variables with their means.
  • Chance constraint. Given a stochastic program with a random variable, p, in its constraint: LaTeX: \textstyle g(x; p) \le 0, a certainty equivalent is to replace this with the constraint, LaTeX: \textstyle P \left [g(x; p) \le 0 \right ] \ge a, where LaTeX: P[] is a (known) probability function on the range of LaTeX: g, and LaTeX: a is some acceptance level (LaTeX: a=1 means the constraint must hold for all values of LaTeX: p, except on a set of measure zero). Some models separate constraints with several levels:

P \left [g_i(x; p) \le 0 \mbox{ for all } i \in I_k] \ge a_k \mbox{ for } k=1, \dots ,K.

    The case of one chance constraint with the only random variable being the right-hand side is particularly simple. Suppose LaTeX: F is the cumulative distribution function of LaTeX: b for the chance constraint LaTeX: \textstyle P[g(x) \le b] \ge a. If LaTeX: b is a continuous random variable and LaTeX: F is continuous and strictly increasing, the chance constraint is equivalent to LaTeX: \textstyle g(x) \le F^{-1}(1-a) (where LaTeX: F^{-1} is the inverse function of LaTeX: F). In particular, if LaTeX: \textstyle g(x) = Ax, the program remains linear.
  • Recourse model. This assumes decisions are made over time where the effect of an early decision can be compensated by later decisions. The objective is to optimize the expected value. The 2-stage model has the form:

\max f_1(x_1; p_1) + f_2(x_2; p_2): x_1 \in X_1, x_2 \in X_2, g(x_1; p_1) + g(x_2; p_2) \le 0.

    (Sums could be replaced by other operators.) Once LaTeX: x_1 is implemented, LaTeX: p_1 becomes known and LaTeX: x_2 is chosen. The certainty equivalent is: LaTeX: 
\max E[f_1(x1; p1) + F_2(x_1 | p_1)]: x_1 \in X_1, where

F_2(x_1 | p_1) = \sup \{E[f_2(x_2; p_2)]: x_2 \in X_2(p_2), g(x_2; p_2) \le -g(x_1; p_1)\}

    for all p2 (except on set of measure zero). (E[] denotes the expected value.) The 'Sup' is used to define LaTeX: F_2, the second stage value for a particular value of LaTeX: x_1, because the choice of LaTeX: x_1 might be infeasible. The nature of the recourse model is pessimistic: LaTeX: x must be chosen such that the original constraints hold no matter what the values of the random variables. With a finite number of possibilities, this means a system of constraints for each possible realization of LaTeX: \textstyle p=(p_1, p_2). This extends recursively to a k-stage model. The linear 2-stage recourse model has the form:

\max E[c]x + E[F(x; p)]: Ax=b, x \ge 0,


F(x; p) = \max d(p)y: W(p)y = w(p) - T(p)x, y \ge 0.

    Here the second stage variable is denoted LaTeX: y; it is determined after LaTeX: x has been set and the random variable LaTeX: p has been realized. The LP data depend on LaTeX: p as functions, LaTeX: d(p), W(p), w(p) and LaTeX: T(p). The fixed recourse model has LaTeX: W(p)=W. The complete recourse model assumes it fixed and LaTeX: \textstyle \{Wy: y \ge 0\} is all of LaTeX: \mathbb{R}^m (where LaTeX: m = number of rows of LaTeX: W). This means that no matter what value of LaTeX: x is chosen for the first stage, there is feasible recourse LaTeX: (y). This is simple recourse if LaTeX: \textstyle W=[I -I], so we can think of y as having two parts: LaTeX: y^+ and LaTeX: y^-. The second stage LP simplifies to the following:

\max d^-(p) y^+ + d^-(p) y^- : y^+, y^- \ge 0, y^+ - y^- = w(p) - T(p)x.

Also see robust optimization. The certainty equivalent depends upon the underlying decision process. If it is adaptive, the recourse model applies (but RO might be more practical). The chance constraint model represents a notion of an allowed frequency of violations, as in environmental control models.

Strict interior

Let LaTeX: \textstyle x \in X : g(x) \le b be the level set of LaTeX: g. Then, its strict interior is LaTeX: \textstyle x \in X : g(x) < b. (This is not to be confused with the relative interior or the set interior - see Myths and Counter Examples in Mathematical Programming.)

Strict optimum

Same as proper optimum.

Strictly complementary

Each complementary pair of variables must have exactly one zero (the other positive).

Strictly concave function

Negative is strictly convex.

Strictly convex function

A convex function that also satisfies the defining inequality strictly for distinct points, say x and y:

f(ax + (1-a)y) < af(x) + (1-a)f(y) \mbox{ for all } a \in (0,1).

Strictly quasiconcave function

Negative is strictly quasiconvex.

Strictly quasiconvex function

LaTeX: X is a convex set and LaTeX: \textstyle f(ax + (1-a)y) < \max \{f(x), f(y)\} \mbox{ for all } x, y \in X for which LaTeX: \textstyle f(x) \ne f(y), and LaTeX: a is in (0, 1). (Note: LaTeX: f need not be quasiconvex - see Myths and Counter Examples in Mathematical Programming.)

Strong duality

See dual

Strongly concave function

Negative is strongly convex.

Strongly convex function

Arises for LaTeX: \textstyle f \in C^2: eigenvalues of hessian are bounded away from zero (from below): there exists LaTeX: \textstyle K > 0 such that LaTeX: \textstyle h' H_f(x) h \ge K||h||^2 for all LaTeX: \textstyle h \in \mathbb{R}^n. For example, the function LaTeX: \exp(-x) is strictly convex on LaTeX: \mathbb{R}, but its second derivative is LaTeX: \textstyle \exp(-x), which is not bounded away from zero. The minimum is not achieved because the function approaches its infimum of zero without achieving it for any (finite) LaTeX: x. Strong convexity rules out such asymptotes.

Strongly quasiconcave function

Negative is strongly quasiconvex.

Strongly quasiconvex function

(LaTeX: f on LaTeX: X). LaTeX: X is a convex set and LaTeX: \textstyle f(a \, {x} + (1-a)\, y) < \max\{f(x), f(y)\} \mbox{ for all } {x}, y \in X, with LaTeX: \textstyle x \ne y, and LaTeX: \textstyle a \in (0, 1).

Structural variable

See linear program.

Subadditive function

LaTeX: \textstyle f(x+y) \le f(x) + f(y) (where LaTeX: x, y in the domain implies LaTeX: x+y is in the domain).


(of LaTeX: f at LaTeX: x). LaTeX: \textstyle \partial f(x) = \{y: x \in \argmax\{vy - f(v): v \in X\}\} Also see conjugate function. If LaTeX: f is convex and differentiable with gradient, LaTeX: \textstyle \nabla f, \partial f(x) = \{\nabla f(x)\}.

LaTeX: \mbox{Example: } f(x) = | x |. \mbox{ Then, } \partial f(0) = \begin{bmatrix} -1 & 1 \end{bmatrix}.

The subdiffenential is built on the concept of supporting hyperplane, generally used in convex analysis. When LaTeX: f is differentiable in a deleted neighborhood of LaTeX: x (but not necessarily at LaTeX: x), the B-subdifferential is the set of limit points:

\partial_B f(x) = {d: \mbox{ there exists } \{x^k\} \to x \mbox{ and } \{\nabla f(x^k)\} \to d\}.

If LaTeX: f is continuously differentiable in a neighborhood of LaTeX: x (including LaTeX: x), LaTeX: \textstyle \partial_B f(x) = \nabla f(x). Otherwise, LaTeX: \textstyle \partial_B f(x) is generally not a convex set. For example, if LaTeX: \textstyle f(x) = | x |, \partial_B f(0) = \{-1, 1\}.

The Clarke subdifferential is the convex hull of LaTeX: \textstyle \partial_B f(x).


A member of the subdifferential.

Sublinear rate of convergence

See convergence.

Submodular function

Let LaTeX: N be a finite set and let LaTeX: f be a function on the subsets of LaTeX: N into LaTeX: \mathbb{R}. Then, LaTeX: f is submodular if it satisfies:

f(S \lor T) \le f(S) + f(T) - f(S \land T) \mbox{ for } S, T \subset N.

Example: LaTeX: \textstyle N=\{0,1\}^n (binary n-vectors) and LaTeX: \textstyle f(S) = \sum_{x \in S} \sum_{j=1}^{n} x_j. Instance: LaTeX: \textstyle n=2, LaTeX: \textstyle S=\{(0,1), (1,1)\}, LaTeX: \textstyle T=\{(1,0),(1,1)\}. Then, LaTeX: \textstyle f(S \lor T) = 4, LaTeX: \textstyle f(S)=3, LaTeX: \textstyle f(T)=3, LaTeX: \textstyle \mbox{and } f(S \land T) = 2. (The linearity of Sum makes the inequality hold with inequality.)


A subset of a vector space that is, itself, a vector space. An example is the null space of a matrix, as well as its orthogonal complement.


See forward substitution and backward substitution.

Successive approximation

The iterative scheme by which an approximation is used for the basic design of an algorithm. The sequence generated is of the form LaTeX: \textstyle x^(k+1) = x^k + A(x^k), where LaTeX: A is an algorithm map specified by its approximation to some underlying goal. Typically, this is used to find a fixed point, where LaTeX: A(x)=0 (e.g., seeking LaTeX: f(x)=x, let LaTeX: A(x) = f(x) - x, so the iterations are LaTeX: x^(k+1) = f(x^k), converging to LaTeX: x^* = f(x^*) if f satisfies certain conditions, such as a contraction map).

Sufficient matrix

Let LaTeX: A be an LaTeX: n \times n matrix. Then, LaTeX: A is column sufficient if

\left [ x_i (Ax)_i \le 0 \mbox{ for all } i \right ] \Rightarrow \left [ x_i (Ax)_i = 0 \mbox{ for all } i \right ].

LaTeX: A is row sufficient if its transpose is column sufficient. LaTeX: A is sufficient if it is both column and row sufficient. One example is when LaTeX: A is symmetric and positive semi-definite. Here is an example of a matrix that is column sufficient, but not row sufficient:

A = \begin{bmatrix}
0 & 1 \\
0 & 1
\end{bmatrix} .

This arises in linear complementarity problems.

Super sparsity

The fraction of distinct nonzero values in a matrix. If a matrix has z nonzeros -i.e. z = sparsity × m × n, then the super-sparsity is number of distinct values divided by z. Although related to the idea of sparsity, a matrix can be highly super-sparse yet not sparse. For example if LaTeX: \textstyle A_{(i, j)} = 1 for all LaTeX: i,j, then sparsity is 0 but super-sparsity is LaTeX: 1/mn.

These matrices arise naturally in models with repetitive substructures. For example, a process may be modeled as a linear program with activities that convert raw materials into finished products (like a refinery converting crude oil into petroleum products). The yields are determined by the engineering, so this model would be the same if embedded into a larger linear program that uses it in numerous regions. Then, the yield values are replicated for each region, which makes the resulting linear program matrix super-sparse. Additionally, any network problem in which the coefficient matrix is the incidence matrix is super-sparse. This leads to the special case of only zeros and ones.

Superadditive function

Negative is subadditive.

Superbasic variable

When using a method like the convex simplex method or projected gradient, the nonbasic variables are partitioned into those that are at one of their bound values and those that are not. The latter are called superbasic.


Arises in geometric programming, and in semi-infinite programming as one that satisfies the Slater interiority condition.

Superlinear rate of convergence

See convergence.

Supermodular function

Negative is submodular.

Support set

of LaTeX: \textstyle x \ge 0. \left \{j : x_j > 0 \right \}.

Supporting hyperplane

Supporting hyperplane of a set, S. A hyperplane that contains S in one of its closed halfspaces and intersects the closure of S with at least one point.

            ______________________  supporting hyperplane (line)                     /\                    /  \                   / S  \                  /______\

Suppose S is closed and convex. A key fact is that every supporting hyperplane contains an extreme point of S. If S is a polyhedron, the facets define a finite collection of supporting hyperplanes that completely determine the polyhedron (as the intersection of the associated halfspaces that contain S).


(abbr. Sup). This is the least upper bound on a (real-valued) function over (a subset of) its domain. If LaTeX: f is unbounded from above, LaTeX: \textstyle \sup \{f\} = \infty, and if the domain is empty, LaTeX: \textstyle \sup \{f \} = -\infty. Otherwise, suppose U is any upper bound: LaTeX: \textstyle f(x) \le U for all LaTeX: x \in X. LaTeX: U is a least upper bound if, for any LaTeX: e > 0, there exists LaTeX: x in the domain for which LaTeX: \textstyle f(x) \ge U-e. (That is, we can get arbitrarily close to LaTeX: U in the range of LaTeX: f.) Note that the supremum is always defined, and its range is in the extended reals. The supremum is the maximum, if it is attained by some point in its domain.

Surplus variable

In an inequality constraint of the form LaTeX: \textstyle g(x) \ge b, the level of surplus is LaTeX: g(x)-b, which is designated by the surplus variable, LaTeX: s. Then, the original constraint is equivalent to the defining equation, LaTeX: g(x) - s = b, plus the non-negativity, LaTeX: s \ge 0.

Surrogate constraint

An aggregation of the constraints, such as a (non-negative) linear sum of the (inequalities and) equations. Of special interest is the integer equivalent aggregation. (Also see the surrogate dual in the supplement on duals.)

Surrogate dual

See it in the supplement on duals.

Surrogate relaxation

Solving the surrogate dual problem, see the supplement on duals. For given multipliers LaTeX: \textstyle (u,v) (u \ge 0), the surrogate dual objective is the relaxed mathematical program:

\max \left \{f(x) : x \in X, ug(x) \le 0, vh(x) = 0 \right \}.

(More generally, the surrogate could be a nonlinear combination of the constraints.)

Symbolic CSP

A CSP is symbolic if its variables range over non-numeric domains.

Symmetric dual

See it in the supplement on duals.

Symmetry exclusion

Symmetry exclusion. Excluding symmetric solutions, usually when solving combinatorial programs. For example, suppose xij = 1 if item i is assigned to be at position j (in 2 or 3 dimensional space); otherwise, xij=0. When branching on xij=1, the alternative optimal value x_ik=1 is being considered for position k that maps from j under reflection, translation, or rotation. So, when considering the complementary branch, xij=0, the symmetry exclusion condition, xik=0, is also added. Symmetry exclusion can sometimes be done by ordering variables, or with cutting planes. It arises in most problems with a geometric meaning, including the symmetric traveling salesman problem, where every tour, (1,2,...,n,1), has the symmetric solutions (2,3,...,n,1,2) and (1,n,n-1,...,2,1). The first of these symmetries is excluded by defining city 1 as the first (or "home") city. Fixing the home does not exclude the second symmetry, which is a tour reversal. Non-geometric problems can also have symmetries, such as graph coloring; any coloring has the symmetric solution by swapping colors – e.g., every red node becomes green, and every green node becomes red.

Personal tools