 # Dynamic program

Based on the Principle of Optimality, dynamic programming was originally concerned with optimal decisions over time. For continuous time, it addresses problems in variational calculus. For discrete time, each period is sometimes called a stage, and the dynamic program (DP) is called a multi-stage decision process. Here is the Fundamental Recurrence Equation for an additive process: $F(t, s) = \operatorname{opt} \{r(t, s, x) + aF(t', s') : x \in X(t, s) \text{ and } s' = T(t, s, x)\}$,

where $F(t, s)$ is the optimal total return upon reaching time $t$ in state $s$; $x$ is the decision variable (possibly a vector); $s$, $s'$ are state variables; $t$, $t'$ are times; $X(t, s)$ is the decision space (which usually depends only on the state); $r(t, s, x)$ is the immediate return; and $T(t, s, x)$ is the state transform.

In words, the optimal return upon reaching time $t$ in state $s$ equals the optimal return from an immediate decision plus the optimal return for the remainder of the process upon the transition to state $s'$. This new state is a function of the current time, state, and decision. For discrete time, $t' = t + 1$ for the forward equation and $t' = t - 1$ for the backward equation. The multiplier $a$ is generally a positive scalar, such as a discount factor that yields the present value of money. (For some problems $a = 1$, in which case the infinite-horizon model might be ill posed.) More generally, the process need not be additive, so '+' could be another composition operation.
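As a concrete sketch, the backward recurrence can be computed by memoized recursion. The instance below is entirely invented for illustration: three stages, a binary decision $x \in \{0, 1\}$ at each stage, the state $s$ a remaining budget, and $a = 1$ (no discounting); the names `VALUES`, `X`, and `transition` are not part of any standard API.

```python
from functools import lru_cache

# Hypothetical 3-stage instance: taking item t (x = 1) returns VALUES[t]
# and consumes one unit of budget; the state s is the budget remaining.
HORIZON = 3
VALUES = [5, 3, 4]   # r(t, s, x) = VALUES[t] * x
A = 1.0              # multiplier a (here, no discounting)

def X(t, s):
    """Decision space X(t, s): the item can be taken only if budget remains."""
    return [0, 1] if s >= 1 else [0]

def transition(t, s, x):
    """State transform T(t, s, x)."""
    return s - x

@lru_cache(maxsize=None)
def F(t, s):
    """Backward recurrence: optimal return from time t onward in state s."""
    if t == HORIZON:   # boundary condition: no stages remain
        return 0.0
    return max(VALUES[t] * x + A * F(t + 1, transition(t, s, x))
               for x in X(t, s))
```

For example, with budget $s = 2$ the recursion selects the two most valuable items (values 5 and 4), so `F(0, 2)` evaluates to `9.0`.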

A decision rule is a function, $x^*(t, s)$, that specifies an optimal value of $x$ upon reaching time $t$ in state $s$. For a completely deterministic process, backtracking is used to obtain the usual form of an optimal solution from a known initial or final state. The process could be stochastic, in which case $r$ and $F$ are expected values, and the state transform is random with probability distribution $P[T(t, s, x) = s' \mid s, x]$. Thus, $F(t', s')$ is replaced by $\sum_{s'} F(t', s') P[T(t, s, x) = s' \mid s, x]$.
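A minimal sketch of the stochastic case, with a two-state, two-decision instance whose transition probabilities and returns are all invented: `F` computes the expected-value form of the recurrence, and `decision_rule` recovers $x^*(t, s)$ as the argmax of the same expression.

```python
# Hypothetical stochastic DP: 2 time periods, states {0, 1}, decisions {0, 1}.
HORIZON = 2
DECISIONS = [0, 1]

# P[(s, x)][s2] = probability the random state transform lands in s2,
# given current state s and decision x (invented numbers).
P = {
    (0, 0): {0: 1.0},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {1: 1.0},
    (1, 1): {0: 0.5, 1: 0.5},
}
# Immediate expected return r(t, s, x); time-invariant here for simplicity.
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 1.0}

def F(t, s):
    """Expected optimal return upon reaching time t in state s."""
    if t == HORIZON:
        return 0.0
    return max(R[(s, x)] + sum(p * F(t + 1, s2)
                               for s2, p in P[(s, x)].items())
               for x in DECISIONS)

def decision_rule(t, s):
    """x*(t, s): a decision attaining the max in the recurrence."""
    return max(DECISIONS,
               key=lambda x: R[(s, x)] + sum(p * F(t + 1, s2)
                                             for s2, p in P[(s, x)].items()))
```

Note that the expectation $\sum_{s'} F(t', s')\,P[\cdot]$ appears directly as the inner `sum` over the transition distribution.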

DP is a technique that applies to static problems too. For example, consider the following recurrence equation for the knapsack problem: $F(n, s) = \max \{c(n)x + F(n-1, s - a(n)x) : x \in \{0, 1, \ldots, \lfloor s/a(n) \rfloor\}\}$, with boundary condition $F(0, s) = 0$.

In words, this says that the max total return with $n$ objects and total knapsack space $s$ equals the max over the choice, $x$, of how much of the $n$-th object we put into the knapsack, with immediate return $c(n)x$, plus the max total return from the remaining $n-1$ objects with the remaining space, $s - a(n)x$. The value of $x$ (the stage decision variable) is any non-negative integer up to the space allowed: $\lfloor s/a(n) \rfloor$ is the max number of objects of this type that could fit in the knapsack. There could also be explicit bounds on $x$, as in the 0-1 knapsack problem, in which case the policy set $X(n, s)$ contains only those values.
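The knapsack recurrence transcribes almost verbatim into memoized code. The sketch below implements the unbounded version stated above (any non-negative integer $x$ up to $\lfloor s/a(n) \rfloor$); the item values and sizes used in the example are invented.

```python
from functools import lru_cache

def knapsack(c, a, capacity):
    """Unbounded knapsack via the recurrence
    F(n, s) = max { c(n)x + F(n-1, s - a(n)x) : x in {0, ..., floor(s/a(n))} },
    with F(0, s) = 0. c and a are tuples of values and sizes."""
    @lru_cache(maxsize=None)
    def F(n, s):
        if n == 0:        # no objects left to consider
            return 0
        return max(c[n - 1] * x + F(n - 1, s - a[n - 1] * x)
                   for x in range(s // a[n - 1] + 1))   # x up to floor(s/a(n))
    return F(len(c), capacity)
```

For instance, `knapsack((60, 100, 120), (10, 20, 30), 50)` fills the knapsack with five copies of the first object, returning `300`. Restricting the `range` to $\{0, 1\}$ (when space permits) gives the 0-1 variant, exactly as the text describes restricting the policy set $X(n, s)$.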