Value iteration

From Glossary

Jump to: navigation, search

This is an algorithm for infinite horizon [stochastic] dynamic programs that proceeds by successive approximation to satisfy the fundamental equation:

LaTeX: \mbox{F}(s) = \mbox{Opt}\{ r_{x, s} + a \sum_{s'} \mbox{P}(x, s, s')*\mbox{F}(s')\},

where LaTeX: a is a discount rate. The successive approximation becomes the DP forward equation. If LaTeX: \textstyle 0 < a < 1, this is a fixed point, and Banach's theorem yields convergence because then `Opt' is a contraction map. Even when there is no discounting, policy iteration can apply.

Personal tools