# Policy iteration

### From Glossary

This is an algorithm for infinite horizon dynamic programs (generally stochastic) that proceeds by improving policies to satisfy the fundamental equation:

$$F(s) = \max_a \Big\{\, r(s,a) + \sum_{s'} P(s, s' \mid a)\, F(s') \Big\},$$

where $F(s)$ is the maximum expected value when starting in state $s$, $r(s,a)$ is the immediate expected return when in state $s$ and applying the decision $a$ prescribed by the policy (a decision rule), and $P(s, s' \mid a)$ is the probability of a transition from state $s$ to state $s'$ in one time period.

The algorithm begins each iteration with some policy. This policy determines an approximation of $F$, which is exact if the fundamental equation holds. If the equation is violated, the violation identifies how the policy can be improved, and the improved policy is used in the next iteration. Convergence depends upon the underlying Markov process (e.g., whether it is ergodic). An alternative approach is value iteration.
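The iteration described above can be sketched in code. The following is a minimal illustration, not the glossary's own formulation: it assumes a discounted criterion (a hypothetical discount factor `gamma < 1`, so the infinite-horizon values are finite without ergodicity conditions), evaluates each policy by successive approximation, and improves it by acting greedily. The two-state, two-action MDP at the bottom is an invented example.

```python
# Sketch of policy iteration on a tiny hypothetical MDP.
# P[a][s][t] is the probability of moving from state s to state t
# under action a; r[a][s] is the immediate expected return.
# gamma < 1 (an assumption here) keeps infinite-horizon values finite.

def policy_iteration(P, r, gamma=0.9, tol=1e-8):
    n_actions = len(P)
    n_states = len(P[0])
    policy = [0] * n_states          # start from an arbitrary policy

    while True:
        # Policy evaluation: approximate F for the current fixed policy
        # by iterating F(s) <- r(s,a) + gamma * sum_t P(s,t|a) F(t).
        F = [0.0] * n_states
        while True:
            F_new = [
                r[policy[s]][s]
                + gamma * sum(P[policy[s]][s][t] * F[t] for t in range(n_states))
                for s in range(n_states)
            ]
            done = max(abs(F_new[s] - F[s]) for s in range(n_states)) < tol
            F = F_new
            if done:
                break

        # Policy improvement: in each state, pick the action that
        # maximizes the right-hand side of the fundamental equation.
        new_policy = [
            max(
                range(n_actions),
                key=lambda a: r[a][s]
                + gamma * sum(P[a][s][t] * F[t] for t in range(n_states)),
            )
            for s in range(n_states)
        ]
        # No improvement possible: F satisfies the equation, so stop.
        if new_policy == policy:
            return policy, F
        policy = new_policy


# Invented example: state 1 is "good"; action 1 costs a little but
# pushes toward the good state.
P = [
    [[0.9, 0.1], [0.3, 0.7]],    # action 0: mostly stay put
    [[0.2, 0.8], [0.05, 0.95]],  # action 1: move toward state 1
]
r = [
    [0.0, 1.0],    # action 0: reward only in the good state
    [-0.1, 0.9],   # action 1: small cost for the effort
]
policy, F = policy_iteration(P, r)
```

Because there are finitely many policies and each iteration produces a strictly better one until the fundamental equation holds, the loop terminates in finitely many iterations in this discounted setting.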