Exercise 3.12 Give an equation for $v_\pi$ in terms of $q_\pi$ and $\pi$.
$$
\begin{aligned}
v_\pi(s) &= \mathbb E_\pi(G_t | S_t=s) \\
&= \sum_{g_t} \bigl[ g_t \cdot p(g_t | s) \bigr] \\
&= \sum_{g_t} \Bigl[ g_t \cdot \frac{p(g_t, s)}{p(s)} \Bigr] \\
&= \sum_{g_t} \Bigl[ g_t \cdot \frac{\sum_{a \in \mathcal A} p(g_t, s, a)}{p(s)} \Bigr] \\
&= \sum_{g_t} \Bigl\{ g_t \cdot \frac{\sum_{a \in \mathcal A} \bigl[ p(g_t | s, a) \cdot p(s, a) \bigr]}{p(s)} \Bigr\} \\
&= \sum_{g_t} \Bigl\{ g_t \cdot \frac{\sum_{a \in \mathcal A} \bigl[ p(g_t | s, a) \cdot p(a | s) \cdot p(s) \bigr]}{p(s)} \Bigr\} \\
&= \sum_{g_t} \Bigl\{ g_t \cdot \sum_{a \in \mathcal A} \bigl[ p(g_t | s, a) \cdot p(a | s) \bigr] \Bigr\} \\
&= \sum_{a \in \mathcal A} \Bigl\{ p(a | s) \sum_{g_t} \bigl[ g_t \cdot p(g_t | s, a) \bigr] \Bigr\}
\end{aligned}
$$
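The only non-trivial moves in the derivation are marginalizing over actions and then swapping the two finite sums. A minimal numerical check, assuming a single state $s$ with two hypothetical actions and made-up discrete return distributions $p(g_t | s, a)$, computes the second line of the derivation and the last line and confirms they agree. Exact `Fraction` arithmetic is used so the comparison is free of floating-point noise.

```python
from fractions import Fraction as F

# Hypothetical example: one state s, two made-up actions, and a small
# finite set of possible returns g_t for each action.
pi = {"left": F(1, 3), "right": F(2, 3)}      # pi(a | s)
p_g_given_sa = {                               # p(g_t | s, a)
    "left": {0: F(1, 2), 10: F(1, 2)},
    "right": {-2: F(1, 4), 4: F(3, 4)},
}

# Marginalize over actions: p(g_t | s) = sum_a p(g_t | s, a) * p(a | s)
p_g_given_s = {}
for a, dist in p_g_given_sa.items():
    for g, p in dist.items():
        p_g_given_s[g] = p_g_given_s.get(g, 0) + p * pi[a]

# Second line of the derivation: sum over g_t of g_t * p(g_t | s)
v_direct = sum(g * p for g, p in p_g_given_s.items())

# Last line: sum over a of p(a | s) * sum over g_t of g_t * p(g_t | s, a)
q = {a: sum(g * p for g, p in dist.items()) for a, dist in p_g_given_sa.items()}
v_via_q = sum(pi[a] * q[a] for a in pi)

print(v_direct, v_via_q)
```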
By definition, $p(a|s) = \pi(a|s)$, and $\sum_{g_t} \bigl[ g_t \cdot p(g_t | s, a) \bigr] = \mathbb E_\pi(G_t | S_t = s, A_t = a) = q_\pi(s,a)$, so:
$$
v_\pi(s) = \sum_{a \in \mathcal A} \bigl[ \pi(a|s) \cdot q_\pi(s,a) \bigr]
$$
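To see the relation on a full MDP, here is a minimal sketch assuming a hypothetical two-state, two-action MDP with made-up dynamics $p(s', r | s, a)$, a made-up policy $\pi$, and $\gamma = 0.9$. It evaluates $v_\pi$ and $q_\pi$ separately by iterating their Bellman expectation equations (iterative policy evaluation, a technique not used in the derivation itself), then compares $v_\pi(s)$ with $\sum_a \pi(a|s)\, q_\pi(s,a)$ in each state.

```python
# Hypothetical 2-state, 2-action MDP; every number here is made up for illustration.
GAMMA = 0.9
# P[s][a] lists (probability, next_state, reward) outcomes, i.e. p(s', r | s, a).
P = {
    0: {0: [(1.0, 0, 1.0)], 1: [(0.5, 0, 0.0), (0.5, 1, 2.0)]},
    1: {0: [(1.0, 0, -1.0)], 1: [(1.0, 1, 0.5)]},
}
PI = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.7, 1: 0.3}}  # pi(a | s)
STATES, ACTIONS = list(P), [0, 1]


def evaluate_v(iters=2000):
    """Iterate v(s) <- sum_a pi(a|s) sum_{s',r} p(s',r|s,a) [r + gamma v(s')]."""
    v = {s: 0.0 for s in STATES}
    for _ in range(iters):
        v = {s: sum(PI[s][a] * sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][a])
                    for a in ACTIONS)
             for s in STATES}
    return v


def evaluate_q(iters=2000):
    """Iterate q(s,a) <- sum_{s',r} p(s',r|s,a) [r + gamma sum_a' pi(a'|s') q(s',a')]."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(iters):
        q = {(s, a): sum(p * (r + GAMMA * sum(PI[s2][a2] * q[(s2, a2)] for a2 in ACTIONS))
                         for p, s2, r in P[s][a])
             for s in STATES for a in ACTIONS}
    return q


v_pi, q_pi = evaluate_v(), evaluate_q()
for s in STATES:
    rhs = sum(PI[s][a] * q_pi[(s, a)] for a in ACTIONS)  # sum_a pi(a|s) q_pi(s,a)
    print(s, round(v_pi[s], 6), round(rhs, 6))
```

The two fixed-point iterations run independently of each other, so agreement in every state is a sanity check consistent with the identity rather than a proof of it.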