Terminology
- $s_t$ : state
- $o_t$ : observation
- $a_t$ : action
- $\pi_\theta(a_t \mid o_t)$ : policy, the distribution $p(a_t \mid o_t)$. $\theta$ is the parameter.
- $\pi_\theta(a_t \mid s_t)$ : policy (fully observed)
Observations result from states. Sometimes we cannot observe the entire state of the system, so we need to choose actions based on the observation instead.
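
To make the difference between the two policies concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration, not part of the original notes: the state is a hypothetical (position, velocity) pair, the `observe` function hides the velocity, and the policy is a softmax over linear scores with hand-picked weights `theta_obs` and `theta_state`.

```python
import math
import random


def observe(state):
    """Hypothetical sensor: exposes the position and hides the velocity."""
    position, _velocity = state
    return (position,)


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(theta, features):
    """pi_theta(a | features): softmax over one linear score per action.

    theta holds one weight row per action (an assumed parameterization).
    """
    logits = [sum(w * x for w, x in zip(row, features)) for row in theta]
    return softmax(logits)


def sample_action(theta, features, rng):
    """Draw a_t from the distribution pi_theta(. | features)."""
    probs = policy_probs(theta, features)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


s_t = (0.5, -1.0)   # full state: (position, velocity)
o_t = observe(s_t)  # observation: (position,) only

theta_obs = [[1.0], [-1.0]]              # 2 actions, 1 observed feature
theta_state = [[1.0, 0.5], [-1.0, 0.0]]  # 2 actions, 2 state features

rng = random.Random(0)
probs_obs = policy_probs(theta_obs, o_t)      # pi_theta(a_t | o_t)
probs_state = policy_probs(theta_state, s_t)  # pi_theta(a_t | s_t)
a_t = sample_action(theta_obs, o_t, rng)
```

Both policies have the same form; the only difference is whether the input is the full state $s_t$ or the partial observation $o_t$. The observation-based policy cannot react to the velocity because it never sees it.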