Dynamic program

From Glossary

Jump to: navigation, search


Based on the Principle of Optimality, this was originally concerned with optimal decisions over time. For continuous time, it addresses problems in variational calculus. For discrete time, each period is sometimes called a stage, and the dynamic program (DP) is called a multi-stage decision process. Here is the Fundamental Recurrence Equation for an additive process:

LaTeX: F(t, s) = \mbox{opt} \{r(t, s, x) + aF(t', s'): x \in X(t, s) \mbox { and } s'=T(t, s, x)\},

where LaTeX: F(t, s) is the optimal total return upon reaching time LaTeX: t in state LaTeX: s, and LaTeX: x is decision variable(s); LaTeX: s, LaTeX: s' are state variables; LaTeX: t, LaTeX: t' are time; LaTeX: X(t, s) is the decision space (usually depends only on state); LaTeX: r(t, s, x) is the immediate return; LaTeX: T(t, s, x) is the state transform.

In words, the optimal return upon reaching time LaTeX: t in state LaTeX: s equals the optimal return from an immediate decision plus the optimal return for the remainder of the process upon the transition to state LaTeX: s'. This new state is a function of the current time, state, and decision. For discrete time, LaTeX: t'=t+1 for the forward equation, and LaTeX: t'=t-1 for the backward equation. The multiplier, LaTeX: a is generally a positive scalar, such as a discount factor to yield the present value of money. (For some problems LaTeX: a=1, in which case the infinite horizon model might be ill posed.) More generally, the process need not be additive, so that '+' could be another composition operation.

A decision rule is a function, LaTeX: x^*(t, s), that specifies an optimal value of LaTeX: x upon reaching time LaTeX: t in state LaTeX: s. For a completely deterministic process, backtracking is used to obtain the usual form of an optimal solution from a known initial or final state. The process could be stochastic, in which case LaTeX: r and LaTeX: F are expected values, and the state transform is random with probability distribution, LaTeX: P[T(t,s,x) = s' | s,x]. Thus, LaTeX: F(t',s') is replaced by LaTeX: \sum_{s'} F(t',s')P[T(t,s,x)=s' | s,x].

DP is a technique that applies to static problems too. For example, consider the following recurrence equation for the knapsack problem:

LaTeX: F(n, s) = \max \{c(n) x + F(n-1, s-a(n)x): x \in \{0,1, ..., |_s/a(n)_|\} \}.

In words, this says that the max total return with LaTeX: n objects having total knapsack space LaTeX: s equals the max choice, LaTeX: x of how much of the LaTeX: n-th object we put into the knapsack, with immediate return LaTeX: c(n), plus the max total return from the remaining LaTeX: n-1 objects with the remaining space, LaTeX: s - a(n)x. The value of LaTeX: x (the stage decision variable) is any non-negative integer up to the space allowed: LaTeX: \lfloor s/a(n) \rfloor is the max number of objects of this type that could fit in the knapsack. There could also be explicit bounds on LaTeX: x, such as the 0-1 knapsack problem, in which case the policy set, LaTeX: X(n,s) simply has only those values.


Views
Personal tools