# All Pages

Let $LaTeX: \textstyle f : \mbox{ X } \times \mbox{ Y } \to \mathbb{R}.$ Then, $LaTeX: \textstyle (x^*, y^*)$ is a saddle point of $LaTeX: \textstyle f$ if $LaTeX: \textstyle x^*$ minimizes $LaTeX: \textstyle f(x, y^*)$ on $LaTeX: \textstyle \mbox{ X }$, and $LaTeX: \textstyle y^*$ maximizes $LaTeX: \textstyle f(x^*, y)$ on $LaTeX: \textstyle \mbox{ Y }$. Equivalently,

$LaTeX: f(x^*, y) \le f(x^*, y^*) \le f(x, y^*) \mbox{ for all } x \mbox{ in X, } y \mbox{ in Y.}$

Von Neumann (1928) proved this equivalent to:

$LaTeX: \begin{array}{rcl} f(x^*, y^*) & = & \inf \{ \sup \{ f(x, y) : y \in \mbox{ Y } \} : x \in \mbox{ X } \} \\ & = & \sup \{ \inf \{ f(x, y) : x \in \mbox{ X } \} : y \in \mbox{ Y } \}. \end{array}$

A sufficient condition for a saddle point to exist is that $LaTeX: \textstyle \mbox{ X }$ and $LaTeX: \textstyle \mbox{ Y }$ are non-empty, compact, convex sets, $LaTeX: f(.,y)$ is convex on $LaTeX: \textstyle \mbox{ X }$ for each $LaTeX: y$ in $LaTeX: \textstyle \mbox{ Y }$, and $LaTeX: f(x,.)$ is concave on $LaTeX: \textstyle \mbox{ Y }$ for each $LaTeX: x$ in $LaTeX: \textstyle \mbox{ X }$.

# Satisfiability problem

(SAT). Find a truth assignment to logical propositions such that a (given) collection of clauses is true (or ascertain that at least one clause must be false in every truth assignment). This fundamental problem in computational logic forms the foundation for NP-completeness. When SAT is in conjunctive normal form and may not be satisfiable, we define the problem of maximizing the number of true clauses as MAXSAT.

# Scaling

Changing the units of measurement, usually for the numerical stability of an algorithm. The variables are transformed as $LaTeX: \textstyle x' = \mbox{S}x,$ where $LaTeX: \textstyle \mbox{S} = \mbox{diag}(s_j).$ The diagonal elements are the scale values, which are positive: $LaTeX: \textstyle s_1, \dots , s_n > 0.$ Constraint function values can also be scaled. For example, in an LP, the constraints $LaTeX: \textstyle \mbox{A}x = b,$ can be scaled by $LaTeX: \textstyle \mbox{RA} x = \mbox{R} b ,$ where $LaTeX: \textstyle \mbox{R} = \mbox{diag}(r_i)$ such that $LaTeX: \textstyle r > 0.$ (This affects the dual values.) Some LP scaling methods simply scale each column of A by dividing by its greatest magnitude (null columns are identified and removed).

Example A column scaling A row scaling
+ = + 10x 100y 500 –30x .3y 0.2
+ = + .3333x y 500 –x .003y 0.2
+ = + x 10y 50 –300x 3y 2

Another method is logarithmic scaling, which scales by the logarithm of the greatest magnitude. More sophisitcated methods are algorithmic, taking both row and column extremes into account.

# Scatter search

This is a population-based metaheuristic, starting with a collection of reference points, usually obtained by the application of some heuristic. A new point is created by taking combinations of points in the population, and rounding elements that must be integer-valued. It bears some relation to a genetic algorithm, except that scatter search uses linear combinations of the population, while the GA crossover operation can be non-linear.

# Scheduling

(e.g., jobs). A schedule for a sequence of jobs, say $LaTeX: \textstyle j_1, \dots , j_n,$ is a specification of start times, say $LaTeX: \textstyle t_1, \dots , t_n,$ such that certain constraints are met. A schedule is sought that minimizes cost and/or some measure of time, like the overall project completion time (when the last job is finished) or the tardy time (amount by which the completion time exceeds a given deadline). There are precedence constraints, such as in the construction industry, where a wall cannot be erected until the foundation is laid.

There is a variety of scheduling heuristics. Two of these for scheduling jobs on machines are list heuristics: the Shortest Processing Time (SPT), and the Longest Processing Time (LPT). These rules put jobs on the list in non-decreasing and non-increasing order of processing time, respectively.

Other scheduling problems, which might not involve sequencing jobs, arise in production planning.

# Scope

The scope of a constraint is the set of directly constrained variables. For example, in the constraint $LaTeX: x \neq y$, the set $LaTeX: \{x,y\}$ is the scope.

# Search

Search is a systematic way of trying out different values for variables until a solution is found that satisfies all constraints or it is proven that no solution exists.

During search, a search tree is constructed where each node represents the current state of the variables and edges represent value assignments to a particular variable. This means, with increasing depth of the search tree, more variables are assigned to specific values.

# Search tree

The tree formed by a branch and bound algorithm strategy. It is a tree because at each (forward) branching step the problem is partitioned into a disjunction. A common one is to dichotomize the value of some variable, $LaTeX: \textstyle x \le v \mbox{ or } x \ge v + 1.$ This creates two nodes from the parent:

# Secant method

A method to find a root of a univariate function, say F. The iterate is

$LaTeX: x^{k+1} = x^k - \frac{\mbox{F}(x^k) [x^k - x^(k-1)]}{\mbox{F}(x^k) - \mbox{F}(x^{k-1})}.$

If $LaTeX: \textstyle \mbox{F} \in \mbox{C}^2 \mbox{ and F}''(x) \ne 0,$ the order of convergence is the golden mean, say g (approx.= 1.618), and the limiting ratio is:

$LaTeX: \left | \frac{2 \mbox{F}'(x)}{\mbox{F}''(x)} \right |^{g-1}$

# Second-order conditions

This descends from classical optimization, using the second-order term in Taylor's expansion. For unconstrained optimization, the second-order necessary condition (for $LaTeX: \textstyle f \in C^2$ ) is that the hessian is negative semi-definite (for a max). Second-order sufficient conditions are the first-order conditions plus the hessian be negative definite. For constrained optimization, the second-order conditions are similar, using projection for a regular mathematical program and the Lagrange Multiplier Rule. They are as follows (all functions are in $LaTeX: \textstyle C^2c$ and the mathematical program is in standard form, for $LaTeX: \textstyle x^*$ a local maximum):

Second-order necessary conditions. There exist Lagrange multipliers, (u,v), such that $LaTeX: \textstyle u \ge 0$ and $LaTeX: \textstyle ug(x^*)=0$ for which: (1) $LaTeX: \textstyle \nabla x [ \mbox{L}(x^*,u,v)]=0,$ and (2) $LaTeX: \textstyle \mbox{H}_x [\mbox{L}(x^*,u,v)]$ is negative semi-definite on the tangent plane.

Second-order sufficient conditions. The above necessary conditions hold but with (2) replaced by (2') H_x[L(x*,u,v)] is negative definite on the tangent plane.

# Self concordance

Properties of a function that yield nice performance of Newton's method used for line search when optimizing a barrier function. Specifically, let B be a barrier function for $LaTeX: \textstyle \mbox{S} = \{x \in \mbox{X}: g(x) \le 0\}$ with strict interior $LaTeX: \textstyle \mbox{S}^0.$ Let x be in S and let d be a direction vector in $LaTeX: \textstyle \mathbb{R}^n$ such that the line segment $LaTeX: \textstyle [x-td, x+td]$ is in S for $LaTeX: \textstyle t \mbox{ in } [0, t^*],$ where $LaTeX: \textstyle t^* > 0.$ Then, define $LaTeX: \textstyle \mbox{F}:[0,t^*] \to \mathbb{R}$ by:

$LaTeX: \mbox{F}(t) = \mbox{B}(x + td)$

(while noting that F depends on x and d). The function F is self-concordant if it is convex in C3 and satisfies the following for all x and d:

$LaTeX: | \mbox{F}'''(0) | \le 2 \mbox{F}''(0)^\frac{3}{2}$

One calls F k-self-concordant in an open convex domain if

$LaTeX: | \mbox{F}'''(0) | \le 2 k \mbox{F}''(0)^{\frac{3}{2}}$

The logarithmic barrier function, associated with linear programming, is self-concordant with $LaTeX: k = 1.$ This further extends naturally to functions in $LaTeX: \textstyle \mathbb{R}^n.$

# Semi-assignment problem

This is like the assignment problem, except only the rows or the columns (not both) of the assignment matrix is constrained to equal 1. With linear objective, the problem is:

$LaTeX: \min \sum_{i, j} \mbox{C}(i, j) \mbox{ X}(i, j): \mbox{ X} \le 0, \sum_{i} \mbox{X}(i, j) = 1 \mbox{ for all } j.$

(The sum constraint could be over j, for all i.) An example is to assign jobs to people, but allow some jobs to be assigned to more than one person and some jobs not assigned at all. A more realistic example arises in biology. We are given a rotamer library of side chain confirmations, and each amino acid residue in a protein must be assigned some rotamer. The same rotamer could be assigned to more than one site, and some rotamers need not be assigned at all.

# Semi-definite program

$LaTeX: \textstyle \min \{cx: S(x) \in P\},$ where $LaTeX: \mbox{P}$ is the class of positive semi-definite matrices, and $LaTeX: \textstyle \mbox{S}(x) = S_0 + \sum_{j} x(j)S_j ,$ where each $LaTeX: \textstyle \mbox{S}_j,$ for $LaTeX: \textstyle j = 0,\dots,n$ is a (given) symmetric matrix. This includes the linear program as a special case.

# Semi-infinite program

A mathematical program with a finite number of variables or constraints, but an infinite number of constraints or variables, respectively. The randomized program is a semi-infinite program because it has an infinite number of variables when X is not finite.

# Sensitivity analysis

The concern with how the solution changes if some changes are made in either the data or in some of the solution values (by fixing their value). Marginal analysis is concerned with the effects of small perturbations, maybe measurable by derivatives. Parametric analysis is concerned with larger changes in parameter values that affect the data in the mathematical program, such as a cost coefficient or resource limit.

Under suitable assumptions, the multipliers in the Lagrange Multiplier Rule provide derivatives of the optimal response function – i.e., under certain conditions, $LaTeX: \textstyle (u, v) = \nabla f^* (b, c).$ For special approaches in LP, see compatibility theory, the 100% Rule, and the tolerance approach.

# Separable program

The functions are separable:

$LaTeX: f(x) = \sum_{j} f_j (x_j),$
$LaTeX: g(x) = \sum_{j} g_j(x_j),$
$LaTeX: \mbox{and } h(x) = \sum_{j} h_j(x_j).$

The classical (LP) approaches to separable programming are called lambda-form and delta-form, both using piece-wise linear approximations.

Let $LaTeX: \textstyle \{x^k\}$ be a specified set of points, where $LaTeX: \textstyle x^k = [x(k,1), x(k,2), ..., x(k,n)],$ and let $LaTeX: \textstyle y={y(k,j)}$ be decision variables that are the coefficients of convex combinations, giving the following linear program:

$LaTeX: \max \sum_{kj} y(k, j) f_j(x(k, j)) : y \ge 0, \sum_{k} y(k, j) = 1 \mbox{ for each } j,$

$LaTeX: \sum_{kj} y(k, j) g_j(x(k, j)) \le 0, \sum_{kj} y(k, j) h_j(x(k, j)) = 0.$

A restricted basis entry rule is invoked during the simplex method to yield an approximate solution. (However, this is dominated by the Generalized Lagrange Multiplier method, which can be viewed as generating the approximating breakpoints posteriori, getting successively finer near the solution.)

The delta form uses the differences: $LaTeX: \textstyle u(k, j) = x(k, j) - x(k-1, j).$ The associated functional differences are:

$LaTeX: \Delta f(k, j) = f_j(x(k, j)) - f_j(x(k-1, j));$
$LaTeX: \Delta g(k, j) = g_j(x(k, j)) - g_j(x(k-1, j));$
$LaTeX: \Delta h(k, j) = h_j(x(k, j)) - h_{j}(x(k-1, j)).$

Then, the approximating LP is:

$LaTeX: \max \sum_{kj} \Delta f(k, j) u(k, j)}: 0 \le u \le 1;$

$LaTeX: \sum_{kj} \Delta g(k, j) u(k, j)} \le b, \sum_{kj} \Delta h(k, j) u(k, j) = c,$

where $LaTeX: \textstyle b = -\sum_{j} g_j(x(0, j))$ and $LaTeX: \textstyle c = -\sum_{j} h_j(x(0, j))$ (a similar constant was dropped from the objective). Another restricted basis rule is invoked: $LaTeX: u(k,j) > 0$ implies $LaTeX: u(k,q)=1$ for all $LaTeX: q < j$ and all $LaTeX: k$.

# Separating hyperplane

A hyperplane for which two (given) sets lie in opposite halfspaces. The separation is strict if the two sets are contained in their respective open halfspace.

              ____   |   ____
|    |  |   \  /
| S1 |  |   /S2\
|____|  |   \__/
|
separating
hyperplane (line)


# Sequencing problems

Finding an ordering, or permutation, of a finite collection of objects, like jobs, that satisfies certain conditions, such as precedence constraints.

# Sequential decision process

See dynamic programming and Time-staged models.

# Sequential linear programming

(SLP). Solving a nonlinear program by a sequence of linear approximations and using linear programming to solve each one. The linear approximations are usually done by using the first-order Taylor expansion.

(SQP). Solving a nonlinear program by a sequence of quadratic approximations and using quadratic programming to solve each one. The approximations are usually done by using the second-order Taylor expansion.

# Sequential unconstrained minimization technique

(SUMT). This is the penalty function approach.

# Set constraint

A set constraint is a constraint on one or more set variables. For example, the following are common set constraints:

• $LaTeX: S \in [a, b]$ for ground values (e.g., integers) $LaTeX: a, b$
• $LaTeX: S \subseteq S_1$
• $LaTeX: S = S_1 \cup S_2$
• $LaTeX: S = S_1 \cap S_2$
• $LaTeX: S = S_1 \setminus S_2$
• $LaTeX: e \in S$
• $LaTeX: e \not\in S$
• $LaTeX: |S| \geq c$

# Set covering problem

See covering problem.

# Set variable

A variable (often in constraint programming) whose value is a set, and therefore whose domain is the power-set of all elements.

For example, the set of all people who will be in the group that will play golf Sunday starting at 3 PM can be represented as a set variable.

An economic term to denote the rate at which the optimal value changes with respect to a change in some right-hand side that represents a resource supply or demand requirement. (This is sometimes taken as synonymous with the dual price, but this can be erroneous, as in the presence of degeneracy – see marginal price.) Also see pricing.

# Sherman-Morrison formula

This is a useful identity:

$LaTeX: \left [ A + a b^T \right ]^{-1} = A^{-1} - \frac{A^{-1} a b^T A^{-1}}{1 + b^T A^{-1} a},$

where $LaTeX: A$ is a nonsingular $LaTeX: n \times n$ matrix and $LaTeX: a$ and $LaTeX: b$ are n-vectors, provided the inverse (on the left) exists.

# Shortest path

In a graph or network, this is a path from one node to another whose total cost is the least among all such paths. The "cost" is usually the sum of the arc costs, but it could be another function (e.g., the product for a reliability problem, or max for a fuzzy measure of risk). There are some particular labeling algorithms given.

# Signomial

The difference between two posynomials. This class of function defines the general geometric program.

# Simplex

(pl. simplices). $LaTeX: \textstyle \left \{ x \in \mathbb{R}: \sum x_j = 1 \right \}.$ For $LaTeX: n=1$, this is a point $LaTeX: (x=1).$ For n=2, this is a line segment, joining points (1,0) and (0,1). For $LaTeX: n=3,$ this is a triangle, joining the vertices (1,0,0), (0,1,0), and (0,0,1). This is sometimes called an n-simplex, denoted by $LaTeX: S_n$ (note its dimension is n-1). The open simplex excludes the axes: $LaTeX: \textstyle \{ x \in S_n: x > 0\}.$

More generally, some authors define a simplex to be the convex hull of any $LaTeX: n+1$ affinely independent vectors, and refer to the special case of the unit vectors as the standard simplex.

# Simplex method

An algorithm invented to solve a linear program by progressing from one extreme point of the feasible polyhedron to an adjacent one. The method is an algorithm strategy, where some of the tactics include [[Pricing|]]ricing and [[Pivot|]]ivot selection.

The elementary simplex method is the name of Dantzig's original (1947) algorithm, with the following rules applied to the standard form: $LaTeX: \textstyle \min \left \{ cx: Ax=b, x \ge 0 \right \}.$

Let $LaTeX: d_j =$ reduced cost of $LaTeX: x_j \;$ terminate if $LaTeX: d_j \ge 0 \mbox{ for all } j.$
1. Select $LaTeX: d_j < 0$ as one of greatest magnitude.
2. In the associated column ($LaTeX: j$) of the tableau, compute the min ratio: $LaTeX: x_i / a(i, j): a(i, j) > 0.$ (If $LaTeX: a(., j) \le 0,$ LP is unbounded).
3. Enter $LaTeX: x_j$ into the basic set, in exchange for $LaTeX: x_i$, and update the tableau.

Among the variations are:

The revised simplex method is the use of a particular factored form of the basis: $LaTeX: \textstyle B = \begin{bmatrix} E_1 & E_2 & \dots & E_k \end{bmatrix}$ (after k iterations), where each $LaTeX: E_i$ is an elementary matrix. Then, the revised simplex method uses forward transformation to pivot and backward transformation to update the pricing vector. (For this distinction, the elementary simplex method is sometimes called the tableau method.)

The simplex method draws its name from imagining a normalization constraint, $LaTeX: \textstyle \sum_{j} x_j = 1,$ and thinking of the j-th column of $LaTeX: A$ to be selected by the weight $LaTeX: x_j$ in $LaTeX: S_w.$ Then, at an iteration, an m-simplex is specified by the basic variables, and an adjacent simplex is chosen to improve the objective value. This view is in requirements space.

(For a different algorithm, used as a heuristic in NLP, see the Nelder-Mead simplex method.)

See pricing.

# Simplicial subdivision

Given a simplex, $LaTeX: S$, its simplicial subdivision is a collection of simplices, say $LaTeX: \{T_i\}$ such that $LaTeX: \textstyle \lor_i\{T_i\} = S$ and for any $LaTeX: i$ and $LaTeX: j$, either the intersection $LaTeX: \textstyle T_i \land T_j$ is empty or equals the closure of a common face. The mesh of the subdivision is the diameter of the largest sub-simplex. This arises in a fixed point approach to compute an economic equilibrium.

# Simulated annealing

An algorithm for solving hard problems, notably combinatorial programs, based on the metaphor of how annealing works: reach a minimum energy state upon cooling a substance, but not too quickly in order to avoid reaching an undesirable final state. As a heuristic search, it allows a non-improving move to a neighbor with a probability that decreases over time. The rate of this decrease is determined by the cooling schedule, often just a parameter used in an exponential decay (in keeping with the thermodynamic metaphor). With some (mild) assumptions about the cooling schedule, this will converge in probability to a global optimum.

# Skew symmetric matrix

$LaTeX: A$ is square and $LaTeX: A^T = -A.$

# Slack variable

In an inequality constraint of the form $LaTeX: \textstyle g(x) \le b,$ the slack is $LaTeX: \textstyle b - g(x),$ which is designated by the slack variable, $LaTeX: s$. Then, the original constraint is equivalent to the defining equation, $LaTeX: \textstyle g(x) + s = b, \mbox{ plus } s \ge 0.$

# Slater condition

Originally for the purely inequality system with g convex, it means there exists $LaTeX: x$ for which $LaTeX: g(x) < 0.$ More generally, for a mathematical program in standard form, it means there exists $LaTeX: x \in X$ for which $LaTeX: \textstyle g(x) < 0 \mbox{ and } h(x) = 0.$

# Smooth

A function is smooth if it has continuous first derivatives.

# Soft constraint

A soft constraint can be seen as a preferential constraint whose satisfaction is not required but preferred. Originally, soft constraints were proposed to model and solve over-constrained problems, i.e., the problems where there is no solution satisfying all the constraints. By weakening some constraints into soft constraints, we can find a solution satisfying all the hard constraints and as many soft constraints as possible.

# Spanning tree

A subgraph that is a tree containing all nodes. The max weight spanning tree problem is to find a spanning tree such that the sum of (given, positive) weights of the edges is a maximum.

The max spanning tree problem is solvable by the following greedy algorithm:
Input. connected graph with weights, $LaTeX: w_1 \ge \dots \ge w_m.$
Output. maximum weight spanning tree, T.
• Initialization: Set $LaTeX: k=1; T =$ graph with given nodes and no edges.
• Iteration (until $LaTeX: k=m-1$): Test if the k-th edge forms a cycle with T. If not, add it to T; if so, discard the edge. In either case, increment k and iterate.

Here is a supplement, Greedy Algorithms for Minimum Spanning Tree.

# Sparsity

The fraction of zeroes in a matrix. If $LaTeX: \textstyle A$ is $LaTeX: \textstyle m \times n,$ and $LaTeX: \textstyle A_{(i, j)} \ne 0$ for $LaTeX: k$ of its elements, its sparsity is $LaTeX: \textstyle \frac{k}{mn}.$ Large linear programs tend to be very sparse, increasingly so as the dimensions grow. For example, consider the standard transportation problem with s sources and d destinations. This has $LaTeX: \textstyle m = (s+d)$ constraints and $LaTeX: \textstyle n = sd$ variables. Each column, however, has exactly 2 nonzeroes since A is the incidence matrix of the network, so its sparsity is $LaTeX: \textstyle \frac{2n}{mn} = \frac{2}{m},$ which decreases as the number of sources and/or destinations grows large.

The sparsity of a simple graph (or network) is the sparsity of its adjacency matrix. More generally, the sparsity of a multigraph refers to the average degree of its nodes.

See super sparsity for extensions of this idea.

# Specially ordered set

(SOS). These are sets of non-negative variables that are required to sum to 1. For computational efficiency, it is sometimes better to define these sets by some marking data structure, rather than include them along with other equality constraints. There are two types of SOSs, distinguished by what they represent. A Type 1 SOS is when each variable is binary, so the constraint is one of Multiple Choice. A Type 2 SOS is when a restricted basis entry rule is used, as in the lambda-form of separable programming.

The spectral radius of a matrix, $LaTeX: A$. The radius of a disk that contains the spectrum:

$LaTeX: \textstyle r(A) = \max \left \{\left | y \right |: y \mbox{ is an eigenvalue of } A \right \}.$

# Spectrum

The set of eigenvalues of a matrix.

See convergence.

# Stability region

The set of parameter values for which an optimal solution remains optimal. This arises naturally when solving combinatorial programs, where a solution is often a subgraph, such as a tree, and the question is for what range of arc weights is this subgraph optimal (such as a spanning tree that is minimum for given weights). More generally, $LaTeX: x$ could be a solution generated by some algorithm, $LaTeX: A$, from an initial value $LaTeX: x_0.$ Then, suppose the feasibility region depends on the parameter $LaTeX: p$, say $LaTeX: F(p),$ and the objective also depends on $LaTeX: p$, say $LaTeX: f(x;p).$ Let $LaTeX: X(p,A,x_0)$ denote the generated solution from algorithm A, starting at $LaTeX: x_0,$ with parameter value $LaTeX: p.$ Let the parameter set be $LaTeX: P$ (which includes $LaTeX: p^*$). The stability region of $LaTeX: x^* = X(p^*,A,x_0)$ is $LaTeX: \textstyle \{ p \in P: x^* =& X(p,A,x_0)\}.$ The algorithm may be a heuristic, so $LaTeX: x^*$ need not be optimal. For example, one could use an n-Opt heuristic for the travelling salesman problem, so x represents a tour. The parameters could be the costs, or they could be the location of each point in a Euclidean TSP. The stability region is the set of costs, or coordinates in the plane, for which the tour generated by n-Opt is the same.

# Stable math program

Roughly, one whose solution does not change much under perturbation. For the inequality case, we have the following

Stability conditions:
1. $LaTeX: \textstyle \{x \in X: g(x) \le b\}$ is bounded for some $LaTeX: \textstyle b > 0.$
2. $LaTeX: \textstyle \mbox{cl} \{x \in X: g(x) < 0\} = \{x \in X: g(x) \le 0\}.$

The first stability condition pertains to upper semi-continuity and the second, called the closure condition, pertains to lower semi-continuity.

The conditions are not only sufficient to ensure the respective semi-continuity, but they are necessary when:
1. $LaTeX: \textstyle \{x \in X: g(x) \le 0\}$ is bounded.
2. $LaTeX: \textstyle \{x \in X: g(x) < 0 \} \ne \emptyset .$

# Stable set

Same as independent set

# Standard linearization

The standard way to linearize the product of two binary variables $LaTeX: x$ and $LaTeX: y$ is to replace $LaTeX: xy$ with a continuous variable $LaTeX: w$ and add four linear inequalities as auxiliary constraints:

$LaTeX: \begin{array}{ll} & w \le x, \\ & w \le y, \\ & w \ge x + y - 1, \\ and & w \ge 0. \end{array}$

Collectively, these imply $LaTeX: w = xy$ for all binary values of $LaTeX: x$ and $LaTeX: y$. This can be generalized for the product of binary variables $LaTeX: x_j$ for all $LaTeX: j$ in some index set $LaTeX: J$ by replacing $LaTeX: \textstyle \prod_{j \in J} x_j$ with a continuous variable $LaTeX: w$ and adding $LaTeX: |J| + 2$ auxiliary constraints:

$LaTeX: \begin{array}{ll} & w \le x_j \mbox{ for all } j \in J, \\ & w \ge \sum_{j \in J} x_j - (|J| - 1), \\ and & w \ge 0. \end{array}$

# Stationary point

Usually this is used to mean a Kuhn-Tucker point, which specializes to one for which grad_f(x)=0 if the mathematical program is unconstrained. In the context of an algorithm, it is a fixed point.

# Stationary policy

In a dynamic program, this is a policy that is independent of time – i.e., $LaTeX: \textstyle x^*(s,t) = T(s)$ (some function of state, but not of time, t).

# Steel beam assortment problem

A steel corporation manufactures structured beams of a standard length, but a variety of strengths. There is a known demand of each type of strength, but a stronger one may fulfill demand (or part thereof) for another beam (but not conversely). The manufacture of each type of steel beam involves a fixed charge for its setup. In addition, there is a shipping cost proportional to the difference in the demanded strength and the actual strength, and proportional to the quantity shipped.

Let $LaTeX: N$ = number of varieties of strengths $LaTeX: D(t)$ = demand for beam of strength $LaTeX: s_t$ (where $LaTeX: \textstyle s_1 \ge s_2 \ge \dots \ge s_N$ ) $LaTeX: x(t) =$ amount of beams of strength $LaTeX: s_t$ manufactured $LaTeX: \textstyle p_t(x(t)) =$ manufacturing cost of $LaTeX: x(t)$ units of beam of strength $LaTeX: s_t$ (incl. fixed charge) $LaTeX: y(t) =$ total excess of beams of strength $LaTeX: \textstyle s_1, \dots , s_t$ before fulfilling demand $LaTeX: \textstyle D(t+1), \dots , D(N) h_t(y(t)) =$ shipping cost $LaTeX: \textstyle ( = c [ s_{(t+1)}- s_t] \min \{y(t), D(t)\}).$

Although $LaTeX: t$ does not index time, the mathematical program for this problem is the same form as the production scheduling problem, using the inventory balance equations to relate $LaTeX: y$ and $LaTeX: x$. This is valid because $LaTeX: \textstyle s_1 \ge s_2 \ge \dots \ge s_N$ implies $LaTeX: y(t)$ can be used to fulfill demand $LaTeX: \textstyle D(t+1) + D(t+2) + \dots + D(N).$ (Also, note that here $LaTeX: h_t$ is not a "holding cost".)

# Steepest ascent

(descent, if minimizing). This is a class of algorithms, where $LaTeX: \textstyle x' = x + sd,$ such that the direction vector d is chosen by maximizing the initial "velocity" of change, and the step size $LaTeX: (s)$ is chosen by line search. Generally used in the context of unconstrained optimization, the mathematical program is $LaTeX: \textstyle \max \{f(x): x \in R^n\},$ where $LaTeX: \textstyle f \in C^1.$ (For descent, change Max to Min.) Then, $LaTeX: d$ is chosen to maximize the first-order Taylor approximation, subject to a normalization constraint: $LaTeX: \textstyle \max \{ \nabla f(x) d: ||d|| = 1 \},$ where $LaTeX: ||d||$ denotes the norm of the direction vector, $LaTeX: d$. When the Euclidean norm is used, this yields the original steepest ascent algorithm by Cauchy, which moves in the direction of the gradient:

$LaTeX: d = \nabla f(x) / ||\nabla f(x)||.$

(No direction vector is sought if $LaTeX: \textstyle \nabla f(x)=0;$ such algorithms stop when reaching a stationary point.)

Other norms, such as $LaTeX: \textstyle ||d||^2 = d' Q d,$ where $LaTeX: Q$ is symmetric and positive definite, lead to other directions that are steepest relative to that norm. In particular, if $LaTeX: \textstyle Q = H_f(x),$ this yields the modified Newton method.

# Steiner problem

Find a subgraph of a graph, say $LaTeX: \textstyle G' = [V',E'],$ such that $LaTeX: V'$ contains $LaTeX: V^*$ (a specified subset of nodes), and $LaTeX: \textstyle \sum \{c(e): e \in E'\}$ is minimized. It is generally assumed $LaTeX: \textstyle c \ge 0.$ When $LaTeX: \textstyle |V^*|=2,$ this is the shortest path problem. When $LaTeX: \textstyle |V^*|=|V|,$ this is the (minimum) spanning tree problem.

# Step size

This is a scalar $LaTeX: (s)$ in an algorithm of the form: $LaTeX: \textstyle x' = x + sd,$ where $LaTeX: d$ is the direction vector. After $LaTeX: d$ is chosen (nonzero), the step size is specified. One step size selection rule is line search; another is a fixed sequence, like $LaTeX: \textstyle s_k = 1/k.$

# Stochastic matrix

A non-negative matrix whose row sums are each 1. (A column stochastic matrix is one whose column sums are 1.) This arises in dynamic programs whose state transition is described by a stochastic matrix containing the probabilities of each transition.

# Stochastic program

It is written in the form of a mathematical program extended to a parameter space whose values are random variables (generally with known distribution function). This is converted to a standard form by forming a certainty equivalent.

• Here are some certainty equivalents:
• Average value. Replace all random variables with their means.
• Chance constraint. Given a stochastic program with a random variable, p, in its constraint: $LaTeX: \textstyle g(x; p) \le 0,$ a certainty equivalent is to replace this with the constraint, $LaTeX: \textstyle P \left [g(x; p) \le 0 \right ] \ge a,$ where $LaTeX: P[]$ is a (known) probability function on the range of $LaTeX: g$, and $LaTeX: a$ is some acceptance level ($LaTeX: a=1$ means the constraint must hold for all values of $LaTeX: p$, except on a set of measure zero). Some models separate constraints with several levels:

$LaTeX: P \left [g_i(x; p) \le 0 \mbox{ for all } i \in I_k] \ge a_k \mbox{ for } k=1, \dots ,K.$

The case of one chance constraint with the only random variable being the right-hand side is particularly simple. Suppose $LaTeX: F$ is the cumulative distribution function of $LaTeX: b$ for the chance constraint $LaTeX: \textstyle P[g(x) \le b] \ge a.$ If $LaTeX: b$ is a continuous random variable and $LaTeX: F$ is continuous and strictly increasing, the chance constraint is equivalent to $LaTeX: \textstyle g(x) \le F^{-1}(1-a)$ (where $LaTeX: F^{-1}$ is the inverse function of $LaTeX: F$). In particular, if $LaTeX: \textstyle g(x) = Ax,$ the program remains linear.
• Recourse model. This assumes decisions are made over time where the effect of an early decision can be compensated by later decisions. The objective is to optimize the expected value. The 2-stage model has the form:

$LaTeX: \max f_1(x_1; p_1) + f_2(x_2; p_2): x_1 \in X_1, x_2 \in X_2, g(x_1; p_1) + g(x_2; p_2) \le 0.$

(Sums could be replaced by other operators.) Once $LaTeX: x_1$ is implemented, $LaTeX: p_1$ becomes known and $LaTeX: x_2$ is chosen. The certainty equivalent is: $LaTeX: \max E[f_1(x1; p1) + F_2(x_1 | p_1)]: x_1 \in X_1,$ where

$LaTeX: F_2(x_1 | p_1) = \sup \{E[f_2(x_2; p_2)]: x_2 \in X_2(p_2), g(x_2; p_2) \le -g(x_1; p_1)\}$

for all p2 (except on set of measure zero). (E[] denotes the expected value.) The 'Sup' is used to define $LaTeX: F_2,$ the second stage value for a particular value of $LaTeX: x_1,$ because the choice of $LaTeX: x_1$ might be infeasible. The nature of the recourse model is pessimistic: $LaTeX: x$ must be chosen such that the original constraints hold no matter what the values of the random variables. With a finite number of possibilities, this means a system of constraints for each possible realization of $LaTeX: \textstyle p=(p_1, p_2).$ This extends recursively to a k-stage model. The linear 2-stage recourse model has the form:

$LaTeX: \max E[c]x + E[F(x; p)]: Ax=b, x \ge 0,$

where

$LaTeX: F(x; p) = \max d(p)y: W(p)y = w(p) - T(p)x, y \ge 0.$

Here the second stage variable is denoted $LaTeX: y;$ it is determined after $LaTeX: x$ has been set and the random variable $LaTeX: p$ has been realized. The LP data depend on $LaTeX: p$ as functions, $LaTeX: d(p), W(p), w(p)$ and $LaTeX: T(p).$ The fixed recourse model has $LaTeX: W(p)=W.$ The complete recourse model assumes it fixed and $LaTeX: \textstyle \{Wy: y \ge 0\}$ is all of $LaTeX: \mathbb{R}^m$ (where $LaTeX: m$ = number of rows of $LaTeX: W$). This means that no matter what value of $LaTeX: x$ is chosen for the first stage, there is feasible recourse $LaTeX: (y).$ This is simple recourse if $LaTeX: \textstyle W=[I -I],$ so we can think of y as having two parts: $LaTeX: y^+$ and $LaTeX: y^-.$ The second stage LP simplifies to the following:

$LaTeX: \max d^-(p) y^+ + d^-(p) y^- : y^+, y^- \ge 0, y^+ - y^- = w(p) - T(p)x.$

Also see robust optimization. The certainty equivalent depends upon the underlying decision process. If it is adaptive, the recourse model applies (but RO might be more practical). The chance constraint model represents a notion of an allowed frequency of violations, as in environmental control models.

# Strict interior

Let $LaTeX: \textstyle x \in X : g(x) \le b$ be the level set of $LaTeX: g.$ Then, its strict interior is $LaTeX: \textstyle x \in X : g(x) < b.$ (This is not to be confused with the relative interior or the set interior - see Myths and Counter Examples in Mathematical Programming.)

# Strict optimum

Same as proper optimum.

# Strictly complementary

Each complementary pair of variables must have exactly one zero (the other positive).

# Strictly concave function

Negative is strictly convex.

# Strictly convex function

A convex function that also satisfies the defining inequality strictly for distinct points, say x and y:

$LaTeX: f(ax + (1-a)y) < af(x) + (1-a)f(y) \mbox{ for all } a \in (0,1).$

# Strictly quasiconcave function

Negative is strictly quasiconvex.

# Strictly quasiconvex function

$LaTeX: X$ is a convex set and $LaTeX: \textstyle f(ax + (1-a)y) < \max \{f(x), f(y)\} \mbox{ for all } x, y \in X$ for which $LaTeX: \textstyle f(x) \ne f(y),$ and $LaTeX: a$ is in (0, 1). (Note: $LaTeX: f$ need not be quasiconvex - see Myths and Counter Examples in Mathematical Programming.)

See dual

# Strongly concave function

Negative is strongly convex.

# Strongly convex function

Arises for $LaTeX: \textstyle f \in C^2:$ eigenvalues of hessian are bounded away from zero (from below): there exists $LaTeX: \textstyle K > 0$ such that $LaTeX: \textstyle h' H_f(x) h \ge K||h||^2$ for all $LaTeX: \textstyle h \in \mathbb{R}^n.$ For example, the function $LaTeX: \exp(-x)$ is strictly convex on $LaTeX: \mathbb{R},$ but its second derivative is $LaTeX: \textstyle \exp(-x),$ which is not bounded away from zero. The minimum is not achieved because the function approaches its infimum of zero without achieving it for any (finite) $LaTeX: x.$ Strong convexity rules out such asymptotes.

# Strongly quasiconcave function

Negative is strongly quasiconvex.

# Strongly quasiconvex function

($LaTeX: f$ on $LaTeX: X$). $LaTeX: X$ is a convex set and $LaTeX: \textstyle f(a \, {x} + (1-a)\, y) < \max\{f(x), f(y)\} \mbox{ for all } {x}, y \in X,$ with $LaTeX: \textstyle x \ne y,$ and $LaTeX: \textstyle a \in (0, 1).$

# Structural variable

See linear program.

$LaTeX: \textstyle f(x+y) \le f(x) + f(y)$ (where $LaTeX: x, y$ in the domain implies $LaTeX: x+y$ is in the domain).

# Subdifferential

(of $LaTeX: f$ at $LaTeX: x$). $LaTeX: \textstyle \partial f(x) = \{y: x \in \argmax\{vy - f(v): v \in X\}\}$ Also see conjugate function. If $LaTeX: f$ is convex and differentiable with gradient, $LaTeX: \textstyle \nabla f, \partial f(x) = \{\nabla f(x)\}.$

$LaTeX: \mbox{Example: } f(x) = | x |. \mbox{ Then, } \partial f(0) = \begin{bmatrix} -1 & 1 \end{bmatrix}.$

The subdiffenential is built on the concept of supporting hyperplane, generally used in convex analysis. When $LaTeX: f$ is differentiable in a deleted neighborhood of $LaTeX: x$ (but not necessarily at $LaTeX: x$), the B-subdifferential is the set of limit points:

$LaTeX: \partial_B f(x) = {d: \mbox{ there exists } \{x^k\} \to x \mbox{ and } \{\nabla f(x^k)\} \to d\}.$

If $LaTeX: f$ is continuously differentiable in a neighborhood of $LaTeX: x$ (including $LaTeX: x$), $LaTeX: \textstyle \partial_B f(x) = \nabla f(x).$ Otherwise, $LaTeX: \textstyle \partial_B f(x)$ is generally not a convex set. For example, if $LaTeX: \textstyle f(x) = | x |, \partial_B f(0) = \{-1, 1\}.$

The Clarke subdifferential is the convex hull of $LaTeX: \textstyle \partial_B f(x).$

A member of the subdifferential.

See convergence.

# Submodular function

Let $LaTeX: N$ be a finite set and let $LaTeX: f$ be a function on the subsets of $LaTeX: N$ into $LaTeX: \mathbb{R}$. Then, $LaTeX: f$ is submodular if it satisfies:

$LaTeX: f(S \lor T) \le f(S) + f(T) - f(S \land T) \mbox{ for } S, T \subset N.$

Example: $LaTeX: \textstyle N=\{0,1\}^n$ (binary n-vectors) and $LaTeX: \textstyle f(S) = \sum_{x \in S} \sum_{j=1}^{n} x_j.$ Instance: $LaTeX: \textstyle n=2,$ $LaTeX: \textstyle S=\{(0,1), (1,1)\},$ $LaTeX: \textstyle T=\{(1,0),(1,1)\}.$ Then, $LaTeX: \textstyle f(S \lor T) = 4,$ $LaTeX: \textstyle f(S)=3,$ $LaTeX: \textstyle f(T)=3,$ $LaTeX: \textstyle \mbox{and } f(S \land T) = 2.$ (The linearity of Sum makes the inequality hold with inequality.)

# Subspace

A subset of a vector space that is, itself, a vector space. An example is the null space of a matrix, as well as its orthogonal complement.

# Successive approximation

The iterative scheme by which an approximation is used for the basic design of an algorithm. The sequence generated is of the form $LaTeX: \textstyle x^(k+1) = x^k + A(x^k),$ where $LaTeX: A$ is an algorithm map specified by its approximation to some underlying goal. Typically, this is used to find a fixed point, where $LaTeX: A(x)=0$ (e.g., seeking $LaTeX: f(x)=x,$ let $LaTeX: A(x) = f(x) - x,$ so the iterations are $LaTeX: x^(k+1) = f(x^k),$ converging to $LaTeX: x^* = f(x^*)$ if f satisfies certain conditions, such as a contraction map).

# Sufficient matrix

Let $LaTeX: A$ be an $LaTeX: n \times n$ matrix. Then, $LaTeX: A$ is column sufficient if

$LaTeX: \left [ x_i (Ax)_i \le 0 \mbox{ for all } i \right ] \Rightarrow \left [ x_i (Ax)_i = 0 \mbox{ for all } i \right ].$

$LaTeX: A$ is row sufficient if its transpose is column sufficient. $LaTeX: A$ is sufficient if it is both column and row sufficient. One example is when $LaTeX: A$ is symmetric and positive semi-definite. Here is an example of a matrix that is column sufficient, but not row sufficient:

$LaTeX: A = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix} .$

This arises in linear complementarity problems.

# Super sparsity

The fraction of distinct nonzero values in a matrix. If a matrix has z nonzeros -i.e. z = sparsity × m × n, then the super-sparsity is number of distinct values divided by z. Although related to the idea of sparsity, a matrix can be highly super-sparse yet not sparse. For example if $LaTeX: \textstyle A_{(i, j)} = 1$ for all $LaTeX: i,j,$ then sparsity is 0 but super-sparsity is $LaTeX: 1/mn.$

These matrices arise naturally in models with repetitive substructures. For example, a process may be modeled as a linear program with activities that convert raw materials into finished products (like a refinery converting crude oil into petroleum products). The yields are determined by the engineering, so this model would be the same if embedded into a larger linear program that uses it in numerous regions. Then, the yield values are replicated for each region, which makes the resulting linear program matrix super-sparse. Additionally, any network problem in which the coefficient matrix is the incidence matrix is super-sparse. This leads to the special case of only zeros and ones.

# Superbasic variable

When using a method like the convex simplex method or projected gradient, the nonbasic variables are partitioned into those that are at one of their bound values and those that are not. The latter are called superbasic.

# Superconsistent

Arises in geometric programming, and in semi-infinite programming as one that satisfies the Slater interiority condition.

See convergence.

# Supermodular function

Negative is submodular.

# Support set

of $LaTeX: \textstyle x \ge 0. \left \{j : x_j > 0 \right \}.$

# Supporting hyperplane

Supporting hyperplane of a set, S. A hyperplane that contains S in one of its closed halfspaces and intersects the closure of S with at least one point.

Suppose S is closed and convex. A key fact is that every supporting hyperplane contains an extreme point of S. If S is a polyhedron, the facets define a finite collection of supporting hyperplanes that completely determine the polyhedron (as the intersection of the associated halfspaces that contain S).

# Supremum

(abbr. Sup). This is the least upper bound on a (real-valued) function over (a subset of) its domain. If $LaTeX: f$ is unbounded from above, $LaTeX: \textstyle \sup \{f\} = \infty,$ and if the domain is empty, $LaTeX: \textstyle \sup \{f \} = -\infty.$ Otherwise, suppose U is any upper bound: $LaTeX: \textstyle f(x) \le U$ for all $LaTeX: x \in X.$ $LaTeX: U$ is a least upper bound if, for any $LaTeX: e > 0,$ there exists $LaTeX: x$ in the domain for which $LaTeX: \textstyle f(x) \ge U-e.$ (That is, we can get arbitrarily close to $LaTeX: U$ in the range of $LaTeX: f.$) Note that the supremum is always defined, and its range is in the extended reals. The supremum is the maximum, if it is attained by some point in its domain.

# Surplus variable

In an inequality constraint of the form $LaTeX: \textstyle g(x) \ge b,$ the level of surplus is $LaTeX: g(x)-b,$ which is designated by the surplus variable, $LaTeX: s.$ Then, the original constraint is equivalent to the defining equation, $LaTeX: g(x) - s = b,$ plus the non-negativity, $LaTeX: s \ge 0.$

# Surrogate constraint

An aggregation of the constraints, such as a (non-negative) linear sum of the (inequalities and) equations. Of special interest is the integer equivalent aggregation. (Also see the surrogate dual in the supplement on duals.)

# Surrogate dual

See it in the supplement on duals.

# Surrogate relaxation

Solving the surrogate dual problem, see the supplement on duals. For given multipliers $LaTeX: \textstyle (u,v) (u \ge 0),$ the surrogate dual objective is the relaxed mathematical program:

$LaTeX: \max \left \{f(x) : x \in X, ug(x) \le 0, vh(x) = 0 \right \}.$

(More generally, the surrogate could be a nonlinear combination of the constraints.)

# Symbolic CSP

A CSP is symbolic if its variables range over non-numeric domains.

# Symmetric dual

See it in the supplement on duals.

# Symmetry exclusion

Symmetry exclusion. Excluding symmetric solutions, usually when solving combinatorial programs. For example, suppose xij = 1 if item i is assigned to be at position j (in 2 or 3 dimensional space); otherwise, xij=0. When branching on xij=1, the alternative optimal value x_ik=1 is being considered for position k that maps from j under reflection, translation, or rotation. So, when considering the complementary branch, xij=0, the symmetry exclusion condition, xik=0, is also added. Symmetry exclusion can sometimes be done by ordering variables, or with cutting planes. It arises in most problems with a geometric meaning, including the symmetric traveling salesman problem, where every tour, (1,2,...,n,1), has the symmetric solutions (2,3,...,n,1,2) and (1,n,n-1,...,2,1). The first of these symmetries is excluded by defining city 1 as the first (or "home") city. Fixing the home does not exclude the second symmetry, which is a tour reversal. Non-geometric problems can also have symmetries, such as graph coloring; any coloring has the symmetric solution by swapping colors – e.g., every red node becomes green, and every green node becomes red.