# Dynamic program

### From Glossary

Based on the Principle of Optimality, this was originally concerned with optimal
decisions over time. For continuous time, it addresses problems in
variational calculus. For discrete time, each period is sometimes called a
*stage*, and the dynamic program (DP) is called a *multi-stage decision process*. Here is the
*Fundamental Recurrence Equation* for an additive process:

$$F(t, s) = \operatorname{opt}\{\, r(t, s, x) + a\, F(t', s') : x \in X(s) \,\},$$

where $F(t, s)$ is the optimal total return upon reaching time $t$ in state
$s$, and $x$ is the *decision variable(s)*; $s$, $s'$ are
*state variables*; $t$, $t'$ are *time*; $X(s)$ is
the *decision space* (usually depends only on state); $r$ is the *immediate
return*; $T$ is the *state transform*, with $s' = T(t, s, x)$.

In words, the optimal return upon reaching time $t$ in state $s$
equals the optimal return from an immediate decision $x$ plus the
optimal return for the remainder of the process upon the transition
to state $s'$. This new state is a function of the current time,
state, and decision: $s' = T(t, s, x)$. For discrete time, $t' = t - 1$ for the
*forward equation*, and $t' = t + 1$ for the *backward equation*.
The multiplier, $a$, is generally a positive scalar, such as a
discount factor $a = 1/(1+i)$ to yield the present value of
money (where $i$ is the interest rate). (For some problems $a \ge 1$, in which case the infinite horizon
model might be ill posed.) More generally, the process need not be
additive, so that '+' could be another composition operation.
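The backward equation can be sketched directly as a memoized recursion. Here is a minimal illustration on an invented toy problem (the resource-consumption setup, return function, and horizon are all assumptions for the sake of the example, not part of the glossary entry):

```python
import math
from functools import lru_cache

# Toy problem (invented): over T_END periods, state s is the amount of a
# resource on hand, decision x is how much to consume now; the immediate
# return is sqrt(x), and unused resource carries over to the next period.
T_END = 3          # horizon: decisions at t = 0, 1, 2
A = 0.9            # the multiplier 'a' (discount factor)

def X(s):                      # decision space, depends only on the state
    return range(s + 1)        # consume 0..s units

def r(t, s, x):                # immediate return
    return math.sqrt(x)

def T(t, s, x):                # state transform: s' = T(t, s, x)
    return s - x

@lru_cache(maxsize=None)
def F(t, s):
    """Optimal total return upon reaching time t in state s."""
    if t == T_END:
        return 0.0             # boundary condition: nothing left to earn
    # Backward equation: opt over x of immediate return + discounted future
    return max(r(t, s, x) + A * F(t + 1, T(t, s, x)) for x in X(s))

print(round(F(0, 4), 4))
```

The memoization (`lru_cache`) is what distinguishes this from naive recursion: each pair $(t, s)$ is solved once, which is the computational payoff of the recurrence.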

A *decision rule* is a function, $x^*(t, s)$, that specifies
an optimal value of $x$ upon reaching time $t$ in state $s$.
For a completely deterministic process, *backtracking* is used to
obtain the usual form of an optimal solution from a known initial
or final state. The process could be stochastic, in which case $F$
and $r$ are expected values, and the state transform is random with
probability distribution, $P(s' \mid t, s, x)$. Thus, $F(t', s')$ is replaced by
$\sum_{s'} P(s' \mid t, s, x)\, F(t', s')$.
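Backtracking with a tabulated decision rule can be sketched as follows. The three-stage problem and its cost data are invented for illustration; we minimize, so `opt` is `min` here:

```python
# Toy multi-stage decision process (all data invented): at each of 3 stages
# we are in state s in {0, 1, 2}; decision x in {0, 1} moves us to state
# (s + x) % 3 at cost COST[t][s][x].  We tabulate F and the decision rule
# x*(t, s) backward, then backtrack forward from a known initial state.
COST = [
    [[3, 1], [2, 4], [5, 0]],   # stage 0: COST[0][s][x]
    [[1, 2], [0, 3], [2, 2]],   # stage 1
    [[4, 1], [1, 1], [3, 0]],   # stage 2
]
T_END = 3
STATES = range(3)

F = {(T_END, s): 0 for s in STATES}      # boundary values
rule = {}                                # decision rule x*(t, s)
for t in reversed(range(T_END)):         # backward equation: t' = t + 1
    for s in STATES:
        best_x = min((0, 1),
                     key=lambda x: COST[t][s][x] + F[(t + 1, (s + x) % 3)])
        rule[(t, s)] = best_x
        F[(t, s)] = COST[t][s][best_x] + F[(t + 1, (s + best_x) % 3)]

# Backtracking: from a known initial state, read off the optimal decisions.
s, path = 0, []
for t in range(T_END):
    x = rule[(t, s)]
    path.append(x)
    s = (s + x) % 3
print(F[(0, 0)], path)
```

Note that the recurrence alone yields only the optimal *value* $F(0, s)$; recovering the optimal decision sequence requires either storing $x^*(t, s)$ as above or recomputing the argmax during the forward pass.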

DP is a technique that applies to static problems too. For example, consider the following recurrence equation for the knapsack problem:

$$f(k, s) = \max\{\, c_k x + f(k-1,\, s - a_k x) : x = 0, 1, \dots, \lfloor s/a_k \rfloor \,\},$$

with boundary condition $f(0, s) = 0$. In words, this says that the max total return with $k$ objects having total knapsack space $s$ equals the max choice, $x$, of how much of the $k$-th object we put into the knapsack, with immediate return $c_k x$, plus the max total return from the remaining $k - 1$ objects with the remaining space, $s - a_k x$. The value of $x$ (the stage decision variable) is any non-negative integer up to the space allowed: $\lfloor s/a_k \rfloor$ is the max number of objects of this type that could fit in the knapsack. There could also be explicit bounds on $x$, such as $x \le 1$ for the 0-1 knapsack problem, in which case the policy set, $X$, simply has only those values.
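The knapsack recurrence translates almost line for line into code. A minimal sketch, using the symbols from the text ($c_k$ as values, $a_k$ as sizes, $s$ as capacity; the example data are invented):

```python
from functools import lru_cache

def knapsack(c, a, s):
    """Max total return f(k, s) for the (unbounded) knapsack recurrence."""
    @lru_cache(maxsize=None)
    def f(k, space):
        if k == 0:
            return 0               # boundary: f(0, s) = 0
        # Try every feasible count x of object k, keep the remaining
        # space for the first k-1 objects.
        return max(c[k - 1] * x + f(k - 1, space - a[k - 1] * x)
                   for x in range(space // a[k - 1] + 1))
    return f(len(c), s)

# Two object types: values 6 and 10, sizes 2 and 3, capacity 7.
print(knapsack([6, 10], [2, 3], 7))
```

Restricting `x` to `range(2)` in the inner `max` would give the 0-1 variant mentioned above.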