# BFGS method

This is a method to solve an unconstrained nonlinear program that proceeds as follows.

1. Start with any symmetric, negative definite matrix, say $LaTeX: B^0$, and any initial point $LaTeX: x^0$. Set $LaTeX: k = 0$
2. Direction: Solve $LaTeX: B^k d^k = -\nabla f(x^k)$.
3. Step size: $LaTeX: s^k \in \arg\max \{ f(x^0 + td^k) : t \ge 0 \}$.
4. Change in position: $LaTeX: p^k = s^k d^k$.
5. New point: $LaTeX: x^{k+1} = x^k + p^k$.
6. Change in gradient: $LaTeX: q^k = \nabla f(x^{k+1}) - \nabla f(x^k)$.
7. Update $LaTeX: B^k$ to form $LaTeX: B^{k+1}$ with the BFGS update.
8. Let $LaTeX: k = k+1$ to complete the iteration.

Compare this with the DFP method, and note a key difference is that DFP estimates the inverse hessian, whereas BFGS estimates the hessian (and solves the linear system). There is a correspondance, and empirical evidence suggests, that the BFGS method has less of a problem with error.

# BFGS update

A way to update an approximation of the hessian, used for unconstrained optimization. The update is

$LaTeX: B' = B + \frac{(q - Bp)(q - Bp)^T}{p^T(q - Bp)},$

where $LaTeX: B'$ is the update from $LaTeX: B$ and $LaTeX: p$ and $LaTeX: q$ are defined in the BFGS method for the $LaTeX: k$-th iteration. Taking the inverse to compare with the DFP update,

$LaTeX: H' = H + \frac{1 + q^T H q}{(q^T p)^2} \left( pp^T \right) - \frac{1}{p^T q} \left( pq^T H + H qp^T \right)$

where $LaTeX: H'$ is the update from $LaTeX: H$.

# Backbone

Originally, this was a set of propositions that must be true in every solution of a satisfiability problem. If there is no solution, one seeks a set of truth values that maximizes the number of clauses that are true. Then, the backbone is the set of propositions that must be true in every optimal solution. This has been extended to various combinatorial programs, for which the backbone is

$LaTeX: \{j : x_j = 1 \mbox{ for all } x \in X^* \},$

where $LaTeX: x$ is a vector of binary variables in the IP formulation and $LaTeX: X^*$ is usually the optimality region, although it could allow some ball around the objective value. Some results suggest that the difficulty of an NP-hard problem can be further analyzed by the size of itsbackbone.

# Backjumping

A backtracking algorithm that, in contrast to classic backtracking algorithms, records partial justifications for inferences that are made during the tree search. These justifications allow the search to soundly jump back several levels in the search tree in case of an inconsistency.

# Backtracking

An algorithm applied in tree search that builds up candidate solutions by making one decision at a time (i.e., branching, see branch and bound) and checks for feasibility at each decision. In case an infeasible state is reached,the algorithm "backtracks", returning to a previous decisions and changing it.

In constraint programming a narrower definition is often used: backtracking is a recursive algorithm that traverses the search tree depth-first, such that at each step a variable is assigned a value (all possible values are assigned in turn) and then the partial assignment is checked for consistency, i.e. the algorithm checks whether all constraints whose variable are fully assigned, are satisfied. If there is an inconsistency, the algorithm prunes the sub-tree and tracks back one step.

# Backward Transformation

Backward Transformation (abbr. BTRAN). This applies to the factored system, $LaTeX: u E_1 E_2 \ldots E_n = c$, where each $LaTeX: E_i$ is an elementary matrix. The recursion is:

$LaTeX: u_n E_n = c$
$LaTeX: u_{n-1} E_{n-1} = u_n$
...
$LaTeX: u_{1} E_{1} = u_2$,
for which $LaTeX: u_1$ is a solution to the original system.

# Backward substitution

The recursion to solve a nonsingular upper triangular system, $LaTeX: Ux = b$. It starts with $LaTeX: x_n = b_n / U_{n,n}$, then

$LaTeX: x_j = \frac{1}{U_{j, j}} \left( b_j - \sum_{i > j} U_{i,j} x_i \right) \mbox{ for } j = n-1, \ldots ,1.$

# Backward triangularization

An algorithm to rearrange a matrix by recursively assigning a singleton column to the rear (with its only row as its pivot). The recursion applies to the submatrix defined by omitting the assigned column and row (and reducing remaining column counts accordingly). This results in a lower-triangular rearrangement if and only if such a rearrangement exists. Otherwise, the algorithm reaches a submatrix will all columns having more than one element, of which one is selected to be a spike that is assigned to the rear of the matrix. The process then continues on the submatrix.

Example 1: Let the nonzeros be denoted by *:

$LaTeX:

 A = \begin{bmatrix} * & & & \\ * & * & * & * \\ * & & & * \\ * & & * & * \end{bmatrix}

$

The column counts are $LaTeX: (4,1,2,3)$. We thus put column $LaTeX: 2$ at the rear, assigned to pivot on row $LaTeX: 2$, and consider the remaining $LaTeX: 3\times 3$ submatrix:

$LaTeX:

A^1 = \begin{bmatrix} * & & & \cdot & \\ * & & * & \cdot & \\ * & * & * & \cdot & \\ \cdot & \cdot & \cdot & \cdot & \cdot \\ * & * & * & \cdot & * \end{bmatrix}

$

The remaining column counts (for $LaTeX: A^1$) are $LaTeX: (3,1,2)$, so column $LaTeX: 2$ is placed at the (new) rear, pivoting on row $LaTeX: 3$:

$LaTeX:

A^2 = \left[\begin{array}{cc|cc} * & & & \\ * & * & & \\ \hline * & * & * & \\ * & * & * & * \end{array}\right]

$

The remaining column counts are $LaTeX: (2,1)$, so we reach the final rearrangement, which is triangular:

$LaTeX:

A^3 = \left[\begin{array}{cc cc} * & & & \\ * & * & & \\ * & * & * & \\ * & * & * & * \end{array}\right]

$

Example 2: Column counts are shown below the matrix.

$LaTeX:

\begin{array}{l} A = \left[\begin{array}{cccc} * & & & \\ * & * & * & * \\ * & * & & * \\ * & & * & * \end{array}\right] \\ \phantom{A = \left[\right.}~ \begin{array}{cccc} 4 & 2 & 2 & 3 \end{array} \end{array}

$

All column counts are greater than one, so we select a column to be a spike, and we assign it to the rear with a pivot row (some algorithms delay the pivot row assignment). For this example, we choose a column with minimum count and pick the pivot row with maximum count.

$LaTeX:

\begin{array}{l@{\,\Rightarrow\,}l} A^1 = \left[\begin{array}{ccc | c} * & & & \\ * & * & & \\ * & & * & * \\ \hline * & * & * & * \end{array}\right] & A^2 = \left[\begin{array}{cc |cc} * & & & \\ * & * & & * \\ \hline * & & * & \\ * & * & * & * \end{array}\right] = A^3. \end{array}

$

The final rearrangement ($LaTeX: A^3$) has one spike.

# Balance equation

A constraint of the form $LaTeX: I(x) - O(x) = 0$, where $LaTeX: I(x)$ is the input and $LaTeX: O(x)$ is the output, each as a function of the decision variables, $LaTeX: x$. Typically this is linear, conserving flows, measured in units of volume or mass. In an ordinary network in which there are no gains or losses, this appears as:

$LaTeX: I(x) - O(x) = \sum_{i} x_{i,j} - \sum_{k} x_{j,k} = 0.$

# Barrier function

$LaTeX: P$ is a barrier function on the strict interior, say $LaTeX: I$, of the set $LaTeX: S$ if $LaTeX: P$ is continuous, and $LaTeX: P(x) \rightarrow \infty$ as $LaTeX: x$ approaches any point in $LaTeX: S\backslash I$. Barrier functions arises in penalty function methods for certain classes that include convex programs of the form:

$LaTeX: \min \{ f(x) : g(x) \ge 0 \},$

where $LaTeX: I = \{ x : g(x) > 0 \}$. Two popular choices are

$LaTeX: \textstyle P(x) = \sum_i 1 / g_i(x),$

and

$LaTeX: \textstyle P(x) = -\sum_i \ln (g_i(x)).$

# Barycenter

This is the center of mass and is the point at which a mass (body) can be considered as being concentrated without altering the effect of the attraction that the earth has upon it. This is the average position of particles located at postions $LaTeX: x^i$ weighted by their masses $LaTeX: m_i$,

$LaTeX: \frac{\sum_i m_i x^i}{\sum_i m_i}.$

In mathematical programming, this is usually used to mean the simple average of the extreme points of a polytope,

$LaTeX: \textstyle \frac{1}{m} \sum_i x^i,$

where $LaTeX: \{x^1, \ldots, x^m\}$ is the set of extreme points. This could be a weighted average,

$LaTeX: \textstyle \sum_i w_i x^i,$

where $LaTeX: w \in S_m$.

# Basic

Associated with a submatrix of $LaTeX: A$, say $LaTeX: B$, whose columns comprise a basis for $LaTeX: \mathbb{R}^m$ (i.e., $LaTeX: B$ consists of $LaTeX: m$ linearly independent columns of $LaTeX: A$, which is a basis for $LaTeX: \mathbb{R}^m$).

Here are some related terms that arise in linear programming.

• Adjacent basis. One that differs in exactly one column from a given basis.
• Basic column. A column of the basis matrix.
• Basic variable. The variable, say $LaTeX: x_j$, associated with the $LaTeX: j$-th column of the basis matrix.
• Basic level. The value of a basic variable.
• Basic solution. The solution, $LaTeX: x$, obtained by setting nonbasic values to some bound value, like 0, resulting in a unique solution for the basic variables. That is, $LaTeX: Ax=b$ is equivalent to $LaTeX: Bu + Nv = b$, where $LaTeX: A=[B \; N]$ and $LaTeX: x=[u^T \; v^T]^T$. Upon fixing the value of $LaTeX: v$, the nonsingularity of $LaTeX: B$ gives the basic solution with $LaTeX: u = B^{-1}(b - Nv)$. In a standard linear program, $LaTeX: v=0$, and hence $LaTeX: u = B^{-1} b$.
• Basic feasible solution. A basic solution that is feasible --i.e., the basic values satisfy their bounds. (In a standard LP, this means $LaTeX: B^{-1}b \ge 0$.)
• Basis kernel. After performing forward triangularization, if the basis does not triangularize completely, backward triangularization is applied. The result is a (rearranged) blocking of the basis into three segments:
                |\
| \ <--- Forward triangle
|__\ ______
|   |      |
|   |      | <--- Kernel
|   |______|
|          |\
|          | \ <--- Backward triangle
|__________|__\

Each row and column in the kernel has at least 2 nonzeroes.

# Basic solution

See terms associated with being basic.

# Benders decomposition

This decomposition separates complicating variables. The (original) semi-linear model is:

$LaTeX: \max \{ c^Tx + F(y) : x \in X, \; y \in Y, \; Ax + G(y) \le b \}.$

Generalized Benders decomposition considers the following mathematical program:

$LaTeX: \max \{ f(x, y) : g(x, y) \le 0, \; h(x, y) = 0, \; x \in X, \; y \in Y \}.$

Suppose this is not a convex program, but that with each fixed $LaTeX: y \in Y$, the remaining mathematical program (with $LaTeX: x$ being the only variable) is convex. The idea is to consider

$LaTeX: v(y) = \sup \{ f(x, y) : g(x, y) \le 0, \; h(x, y) = 0, \; x \in X \},$

and solve $LaTeX: \max \{v(y) : y \in Y\}$, with evaluations of $LaTeX: v$ requiring the solution to the convex program.

# Biconcave function

A function whose negative is biconvex.

# Biconvex function

A function $LaTeX: f(x, y)$ on $LaTeX: X \times Y$ that is convex on $LaTeX: X$ for each $LaTeX: y \in Y$ and convex on $LaTeX: Y$ for each $LaTeX: x \in X$. An example is $LaTeX: f(x,y) = xy$ on $LaTeX: \mathbb{R}^2$, which is not convex in both $LaTeX: x$ and $LaTeX: y$ together.

# Biconvex program

A nonlinear program with each function biconvex or biconcave, such that if $LaTeX: x$ or $LaTeX: y$ is fixed, the resulting mathematical program in $LaTeX: y$ or $LaTeX: x$, resp., is a convex program. A special case is the bilinear program.

# Bidding algorithm

Also called an auction algorithm. This proceeds by generating a sequence of partial solutions (e.g., a partial assignment), terminating when the solution is complete. There is a pricing vector and a bidding increment whereby the solution is revised and extended by a myopic optimization scheme (which depends on the particular problem).

# Big-M method

An alternative to using Phase I, a large number $LaTeX: M$ is used as a linear penalty in the composite objective:

$LaTeX: \min \{ c^T x + M \; e^T v: Ax + v = b, \; x, v \ge 0 \},$

where $LaTeX: v$ is an artificial variable and $LaTeX: e = (1, 1, ..., 1)^T$.

# Bilevel program

A multilevel program with two levels (sets of variables).

# Bilinear function

A special quadratic function, $LaTeX: x^T Q y + c^Tx + b^Ty$, that is linear in $LaTeX: y$ for each $LaTeX: x$, and linear in $LaTeX: x$ for each $LaTeX: y$.

# Bilinear program

A nonlinear program whose objective and/or constraint functions are bilinear. An example is the pooling problem.

# Bin packing problem

Partition a set of integers, $LaTeX: \{c_1, \ldots, c_n\}$, into bins such that the sum of all integers in each bin does not exceed the bin capacity, say $LaTeX: b$, so as to minimize the number of bins required. The problem is NP-hard.

# Binary CSP

A constraint satisfaction problem is binary if all constraints are either unary or binary relations (wrt the decision variables). That is, the cardinality of the scope of the constraint is either 1 or 2. For example, $LaTeX: x > 2$ (unary) or $LaTeX: x \neq y$ (binary).

# Binary constraint

A constraint is binary if there are exactly two variables in its scope.

# Binary relation

A subset of $LaTeX: S \times T$, say $LaTeX: R$. When $LaTeX: T=S$, we refer to just the relation on the set $LaTeX: S$.

Here are 5 properties that distinguish relations on a set ($LaTeX: x$, $LaTeX: y$, $LaTeX: z$ are in $LaTeX: S$):

1. antisymmetric - $LaTeX: (x, y) \in R$ and $LaTeX: x \ne y$ imply $LaTeX: (y, x) \not\in R$.
2. irreflexive - $LaTeX: (x, x) \not\in R$.
3. reflexive - $LaTeX: (x, x) \in R$.
4. symmetric - $LaTeX: (x, y) \in R$ implies $LaTeX: (y, x) \in R$.
5. transitive - $LaTeX: (x, y)$ and $LaTeX: (y, z)$ both in $LaTeX: R$ imply $LaTeX: (x, z) \in R$.

# Binary variable

A decision variable with two feasible values, usually 0 or 1.

# Binding constraint

A constraint whose removal changes the optimality region. (Some authors require that the removal results in a strict improvement in the objective value.)

# Bisection

Taking the midpoint of endpoints known to contain a solution. If the solution is in the interval (a,b), bisection chooses x = (a+b)/2 as the next point to evaluate. The result reduces the interval to (a,x) or (x,b), so the length is cut in half with each evaluation.

# Bland rule

This is for pivot selection in the simplex method to avoid cycling:

• If more than one (nonbasic) column has a negative (for minimization) reduced cost, choose the one with lowest index.
• If more than one (basic) column has the same determining value to leave the basis, select the one with the lowest index.

Beale's example of cycling shows that Bland's Rule must apply to all candidates for entering the basis, not just those with most negative reduced cost, see the supplement on cycling. The fifth tableau is the first time Bland's Rule is needed to break the cycle. Variable $LaTeX: x_1$ is chosen to enter the basis rather than $LaTeX: x_5$.

# Blending problem

A blend is some combination of materials to make another material. Given raw materials, their blends make intermediate materials, called stock, and/or final materials, called products (there can be other blending activities that make products from stocks). The raw materials have purchase costs and the blending activities have costs of operation and maintenance. The products have either fixed demands or selling prices, and the problem is to find blends that minimize total net cost (or maximize profit). This arises in refinery operations, in the food industry, and in other process industries. The problem can sometimes be solved with linear programming, but there can be complicating relations that require nonlinear programming, such as the pooling problem.

# Block pivot

Given a detached coefficient form, a pivot is performed on a nonsingular submatrix of nonbasic variables (rather than just one element). Suppose the detached coefficient form is partitioned into 2 row blocks and 5 column blocks (with the last column corresponding to the right-hand side):

$LaTeX: A = \left[ \begin{array}{ccccc} I & 0 & P & R & e \\ 0 & I & Q & S & f \end{array} \right].$
(Note: each $LaTeX: I$ is an identity matrix of appropriate dimension.) If $LaTeX: P$ is nonsingular, the basis can change by entering all (nonbasic) variables associated with the columns of $LaTeX: P$ for all basic variables associated with the rows of $LaTeX: P$.

# Blocking polyhedron

The blocking polyhedron of a polyhedron $LaTeX: P$ is $LaTeX: \{ y \in \mathbb{R}^n_+ : y^T x \ge 1 \mbox{ for all } x \in P\}$.

# Bolzano-Weierstrass theorem

A fundamental existence theorem guaranteeing a maximum and a minimum.

A continuous function on a non-empty compact set achieves a minimum and a maximum over the set.

A useful corollary to the Bolzano-Weierstrass theorem is:

Suppose $LaTeX: f$ is continuous on its non-empty feasible region, $LaTeX: F$, and that $LaTeX: F$ is a closed set. Then, $LaTeX: f$ achieves a maximum on $LaTeX: F$ if there exists $LaTeX: x^0$ such that $LaTeX: \{ x \in F : f(x) \ge f(x^0) \}$ is bounded.

In particular, if we have a quadratic objective of the form $LaTeX: f(x) = x^T Q x + c^T x$, it is sufficient for $LaTeX: Q$ to be negative definite. Although often credited to Weierstrass alone, it was proven by Bolzano in 1817, and it was known to Cauchy near the end of the 19th century.

# Bordered hessian

The complete Hessian of the Lagrangian:

$LaTeX: \left[ \begin{array}{cc} H_x[L(x,u,v)] & G^T \\ G & 0 \end{array} \right],$
where
$LaTeX: G = \left[ \begin{array}{cc} \nabla g(x) & 0 \\ 0 & \nabla h(x) \end{array} \right].$
Here we let $LaTeX: H_x[L(x,u,v)]$ be the Hessian of the Lagrangian (L) with respect to $LaTeX: x$ only. The southwest and northeast blocks are the constraint gradients because they give the cross derivative between $LaTeX: x$ and the Lagrange multipliers $LaTeX: (u,v)$.

# Bottleneck TSP problem

The problem asks if there a traveling salesman problem tour whose value is less than some given value?

# Bottleneck assignment problem

This is the assignment problem, where the "cost", $LaTeX: c_{i, j}$, is the rate at which person $LaTeX: i$ can perform job $LaTeX: j$. The objective is to maximize the rate of production of the entire assembly line.

# Bottleneck transportation problem

This is the transportation problem with a traversal time instead of arc cost, and the objective is to minimize the time it takes to satisfy the demands:

$LaTeX: \min \left\{ \max \{ t_{ij} : x_{ij} > 0 \} : x \ge 0, \sum_j x_{ij} \le s_i, \sum_i x_{ij} \ge d_j \right\},$

where $LaTeX: x_{ij}$ is the flow from source $LaTeX: i$ to destination $LaTeX: j$, and $LaTeX: t_{ij}$ is the time it takes to ship across arc $LaTeX: (i, j)$ (independent of the quantity shipped).

# Bounds consistency

A consistency property for variables over integer domains where consistency is only defined on the minimum and maximum values of the integer domain.

# Box constraint

Simple bounds of the form $LaTeX: L \le x \le U$.

The phenomenon that in a traffic network in which travel times depend on link (arc) flows the addition of a link (arc) can increase travel time for every user if all users minimize their own travel time. Hence, the paradox is that building a new road can make traffic congestion worse. See Myths and Counter Examples in Mathematical Programming for an example.

# Branch and bound

Born in integer programming (IP), branch and bound (B&B) is more generally used in global optimization, including continuous functions. A prototype is as follows.

1. Initialize set list with feasible region ($LaTeX: X$ plus all constraints), say $LaTeX: S$, with associated bounds $LaTeX: -\infty$, $LaTeX: \infty$ (finite bounds on the objective $LaTeX: f$ can be computed, if computationally preferable, and the initial set could be a superset of the feasible region). Initialize $LaTeX: f^*$ equal to the lower bound ($LaTeX: -\infty$, unless an initial feasible solution is known).
2. Branch by selecting the $LaTeX: k$-th set from list, say $LaTeX: S_k$, and splitting into a partition, say $LaTeX: S_k = A \vee B$ such that the interior of $LaTeX: A$ does not intersect the interior of $LaTeX: B$. (Could partition $LaTeX: S_k$ into more than 2 subsets.)
3. Bound $LaTeX: f$ in $LaTeX: A$ and/or in $LaTeX: B$, say $LaTeX: L(S) \le f(x) \le U(S)$ for all $LaTeX: x \in S$, for $LaTeX: S = A$ and/or $LaTeX: B$. Define $LaTeX: U(S) = -\infty$ if it is determined that $LaTeX: S$ is empty (in which case $LaTeX: S$ will be discarded in the next step).
4. Test if $LaTeX: U(S) \le f^* + t$, for $LaTeX: S = A$ and/or $LaTeX: B$. If not, put $LaTeX: S$ on list. (If the test passes, $LaTeX: S$ is discarded by not being put on the list. Note: $LaTeX: f^*$ is best solution value to date, and $LaTeX: t$ is a tolerance that says we want to get a solution whose objective value is within $LaTeX: t$ of the optimal value.) Further, if $LaTeX: L(S) > f*$ and the associated point, $LaTeX: x(S)$, is feasible (where $LaTeX: f(x(S)) = L(S)$), replace $LaTeX: f^*$ with $LaTeX: L(S)$ and $LaTeX: x^*$ with $LaTeX: x(S)$. If the list is not empty, go to branch.

B&B terminates when the list becomes empty, and the best feasible solution obtained is $LaTeX: x^*$ with value $LaTeX: f^*$, which is within $LaTeX: t$ of optimality. (If $LaTeX: f^* = -\infty$, then there is no feasible solution.) Notes:

1. A particular bounding rule for IP is to solve an LP relaxation, whose value is an upper bound on the integer-valued maximum. The lower bounds do not change unless an integer solution is obtained from the LP, in which case the best solution to date can change.
2. Branching rules can range from breadth-first, i.e. expand all possible branches from a node in the search tree before going deeper into the tree, to depth-first, i.e. expand the deepest node first (and backtrack, as necessary). For example, suppose we branch on the value of $LaTeX: x_j$, which could be $LaTeX: 0$ or $LaTeX: 1$. The breadth-first rule would solve the LPs for each of the two cases. The depth-first rule would solve for $LaTeX: x_j=0$ and postpone the complementary case, $LaTeX: x_j=1$ to consider another variable dichotomy, going deeper into the search tree. To illustrate, the following are search progressions in $LaTeX: \{0, 1\}^2$;.

         ____.                 ____.____          ____.____
|                     |         |        |         |
(x=0)                 (x=0)     (x=1)    (x=0)     (x=1)
/
(y=0)
1                     2                  3

____.____             ____.____          ____.____
|         |           |         |        |         |
(x=0)     (x=1)       (x=0)     (x=1)    (x=0)     (x=1)
/\                    /\         /       /\         /\
(y=0)(y=1)            (y=0)(y=1) (y=0)   (y=0)(y=1) (y=0)(y=1)
4                     5                  6



Depth-first:

          ____.                ____.              ____.
|                    |                  |
(x=0)                (x=0)              (x=0)
/                  /\
(y=0)              (y=0)(y=1)
1                     2                  3

____.____            ____.____          ____.____
|         |          |         |        |         |
(x=0)     (x=1)      (x=0)     (x=1)    (x=0)     (x=1)
/\                   /\         /       /\         /\
(y=0)(y=1)           (y=0)(y=1) (y=0)   (y=0)(y=1) (y=0)(y=1)
4                     5                  6


3. A best-first search is to select the node (i.e., set on list) that seems most promising, such as one with the greatest upper bound. Typically, there are multiple criteria for branching, and these change depending on whether a feasible solution has been obtained.
4. Backtracking need not go in exactly the reverse order (i.e., need not have a LIFO rule), but care must be taken when reordering the path.
5. When the problem is not finite (not IP), branching rules need to consider the hyper-volume of each set in a partition to ensure convergence.
6. The B&B methods we consider in OR are special cases of heuristic search in Artificial Intelligence (AI).

# Branch and cut

A hybrid of Branch and bound and cutting plane methods.

# Branch and price

The price operation is the same as column generation.

# Branching heuristic

See branching rule.

# Branching rule

A branching rule (or branching heuristic) specifies how a search tree is constructed during search. More specifically, it is a procedure or set of conditions for choosing the next variable to be branched on. It is part of the search strategy.

In constraint programming a branching rule is commonly referred to as a variable ordering heuristic.

# Breakpoint

A point where some property of a function changes, such as its slope. For example, a piece-wise linear, univariate function with $LaTeX: k$ breakpoints has the form:

$LaTeX: f(x) = a_i x + b_i \mbox{ for } c_i \le x < c_{i+1}$
where $LaTeX: i=1, ...,k$ and $LaTeX: c_{k+1}$ is defined to be infinity and $LaTeX: c_1$ is the least value of $LaTeX: x$. This function is continuous if it is equal at breakpoints, but the slope changes if $LaTeX: a_i \neq a_{i+1}$ for $LaTeX: i=1, ..., k-1$.

See BFGS update.

# Broyden family

This is a family of algorithms to solve an unconstrained nonlinear program where the objective function is in $LaTeX: C^2$;. The algorithms in this family differ by how they update the approximation of the hessian inverse (assumed to exist). These are of the form:

$LaTeX: H = aH^{\mbox{BFGS}} + (1-a)H^{\mbox{DFP}},$

with the two matrices being the Broyden-Fletcher-Goldfarb-Shanno update (BFGS) and the Davidon-Fletcher-Powell update (DFP), respectively. If $LaTeX: a$ is in $LaTeX: [0, 1]$, this is called the Broyden convex family.

# Bundle method

A class of nonsmooth-optimization methods for the numerical minimization of a convex function, $LaTeX: \theta : \mathbb{R}^n \rightarrow \mathbb{R}$. At the $LaTeX: k$th iteration the technique uses the available information contained in a bundle $LaTeX: (\theta(v^1), g_1, \ldots , \theta(v^k), g_k)$ that is obtained from an oracle; here each $LaTeX: g_i$ is a subgradient of $LaTeX: \theta$ at $LaTeX: v^i$.

Important applications are Lagrangian relaxation and column generation applied to a primal problem

$LaTeX: \max \{ p(x) : x \in X, \; h(x) = 0 \}.$

In this context, for a given $LaTeX: v \in \mathbb{R}n$,

$LaTeX: \theta(u) = \max \{ p(x) - v^T h(x) : x \in X \}$

results from the maximization of the Lagrange function; each $LaTeX: g_i$ is $LaTeX: -h(x_i)$, with $LaTeX: x^i$ maximizing the Lagrangian at $LaTeX: v^i$. Minimizing $LaTeX: \theta$ is the dual problem.

If minimizing $LaTeX: \theta(v)$, a basic method to construct the next iterate $LaTeX: v^{k+1}$ uses the cutting-plane paradigm: one minimizes the piecewise-linear function

$LaTeX: \theta_k(v) = \max \{ \theta (v^i) + g_i^T (v - v^i) : i = 1, ... ,k \},$

which can be viewed as a model of $LaTeX: \theta$ (in fact $LaTeX: \theta_k \le \theta$). Minimizing $LaTeX: \theta_k$ is a linear program, which is called the restricted master program in the context of column generation. One of the deficiencies of this method is instability -i.e. the sequence $LaTeX: v^k$ usually oscillates uncontrollably. Several possibilities exist to stabilize the process; one, adopted in the bundle approach, replaces the minimization of $LaTeX: \theta_k$ by the quadratic program

$LaTeX: \min_v \{ \theta_k(v) + (1/(2t)) |v - C|^2 \}.$

Here the positive $LaTeX: t$ is a stability parameter that is usually adjusted at each iteration, and the stability center $LaTeX: C$ is

some $LaTeX: v^i$ ($LaTeX: i \le k$) taken from the bundle, typically one that gave the best $LaTeX: \theta$-value.