# Value iteration

$LaTeX: \mbox{F}(s) = \mbox{Opt}_x \{ r_{x, s} + a \sum_{s'} \mbox{P}(x, s, s')\, \mbox{F}(s')\},$
where $LaTeX: a$ is a discount rate, $LaTeX: r_{x, s}$ is the immediate return from choosing decision $LaTeX: x$ in state $LaTeX: s$, and $LaTeX: \mbox{P}(x, s, s')$ is the probability of moving from state $LaTeX: s$ to state $LaTeX: s'$ under decision $LaTeX: x$. Applied by successive approximation, this becomes the DP forward equation. If $LaTeX: \textstyle 0 < a < 1,$ the right-hand side defines a contraction map, so Banach's fixed-point theorem guarantees that successive approximation converges to the unique fixed point $LaTeX: \mbox{F}$. Even when there is no discounting, policy iteration can still apply.
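The successive-approximation scheme above can be sketched in code. The following is a minimal illustration, not a reference implementation: the names `rewards`, `transitions`, `gamma`, and `tol` are assumptions, states and decisions are indexed `0..n-1`, and `Opt` is taken to be maximization.

```python
def value_iteration(rewards, transitions, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iterate F(s) <- max_x { r[x][s] + gamma * sum_{s2} P[x][s][s2] * F[s2] }.

    rewards[x][s]      -- immediate return for decision x in state s
    transitions[x][s][s2] -- probability of moving from s to s2 under x
    Returns the (approximately) converged value function F as a list.
    """
    n_actions = len(rewards)
    n_states = len(rewards[0])
    F = [0.0] * n_states
    for _ in range(max_iters):
        F_new = [
            max(
                rewards[x][s]
                + gamma * sum(transitions[x][s][s2] * F[s2]
                              for s2 in range(n_states))
                for x in range(n_actions)
            )
            for s in range(n_states)
        ]
        # With 0 < gamma < 1 the update is a contraction in the sup norm,
        # so the error shrinks by at least a factor of gamma each sweep.
        if max(abs(a - b) for a, b in zip(F_new, F)) < tol:
            return F_new
        F = F_new
    return F
```

For a one-decision problem whose states never change, the fixed point is simply $LaTeX: r/(1 - a)$ per state, which gives a quick sanity check on the iteration.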