2. Off Policy Evaluation And Causal Inference
Note: the output files produced by the code in this section, in particular the separated audio from the ICA part, can be found in the resources uploaded under my account.
(a). Importance Sampling
This is easy to prove: substitute $\hat{\pi}_{0}=\pi_{0}$ directly:
$$\begin{aligned} \mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a) &=\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}} \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} R(s, a) \\ &=\sum_{(s, a)} \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} R(s, a)\, p(s)\, \pi_{0}(s, a) \\ &=\sum_{(s, a)} R(s, a)\, p(s)\, \pi_{1}(s, a) \\ &=\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{1}(s, a)}} R(s, a) \end{aligned}$$
(b). Weighted Importance Sampling
As in (a), substitute $\hat{\pi}_{0}=\pi_{0}$ directly. For the denominator:
$$\begin{aligned} \mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} &=\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}} \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} \\ &=\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} \\ &=\sum_{(s, a)} p(s) \pi_{0}(s, a) \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} \\ &=\sum_{(s, a)} p(s) \pi_{1}(s, a) \\ &=1 \end{aligned}$$
Then:
$$\frac{\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}=\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{1}(s, a)}} R(s, a)$$
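The weighted variant can be checked the same way. A self-contained sketch with a hypothetical two-state setup (made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up two-state, two-action setup.
p_s = np.array([0.6, 0.4])
pi0 = np.array([[0.5, 0.5], [0.8, 0.2]])
pi1 = np.array([[0.1, 0.9], [0.3, 0.7]])
R = np.array([[1.0, 0.0], [2.0, -1.0]])

n = 200_000
s = rng.choice(2, size=n, p=p_s)
a = (rng.random(n) < pi0[s, 1]).astype(int)       # a ~ pi_0(s, .)

w = pi1[s, a] / pi0[s, a]                         # here pi_0_hat = pi_0
wis = np.sum(w * R[s, a]) / np.sum(w)             # weighted IS estimate
exact = np.sum(p_s[:, None] * pi1 * R)
print(wis, exact, np.mean(w))                     # mean weight is ~ 1
```

Since the expected weight is 1 (shown above), the weighted estimate agrees with the plain importance sampling estimate as the sample size grows.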
(c). Biased Problem
Start from the definition:
$$\frac{\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}=\frac{\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}$$
Consider the case where there is only a single sample:
$$\frac{\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}=\frac{p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}=R(s, a)$$
In other words, with a single sample the weighted importance sampling estimate of the expected reward under $\pi_{1}$ is always just that sample's observed reward, regardless of the policies. But if the true $\pi_{1} \neq \pi_{0}$, the corresponding expected rewards are in general different, i.e.:
$$\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}} R(s, a) \neq \mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{1}(s, a)}} R(s, a)$$
That is, the estimator is biased.
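To make the bias concrete: with one sample the ratio cancels, so the estimate equals the observed reward no matter what $\pi_{1}$ and $\hat{\pi}_{0}$ are (made-up numbers):

```python
ratio = 0.9 / 0.5      # pi_1(s, a) / pi_0_hat(s, a), any positive value
R_sample = 1.7         # the single observed reward R(s, a)

wis_single = (ratio * R_sample) / ratio
print(wis_single)      # 1.7: the policies drop out entirely
```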
(d). Doubly Robust
i.
Prove by direct substitution. First, note that an expectation of this form is just a weighted sum (or integral); the inner expectation already integrates out $a$, so its value no longer depends on $a$, and the outer expectation over $a \sim \pi_{0}$ leaves it unchanged:
$$\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}}\left(\mathbb{E}_{a \sim \pi_{1}(s, a)} \hat{R}(s, a)\right) =\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{1}(s, a)}} \hat{R}(s, a)$$
Then the same argument as in (b) gives:
$$\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}}\left(\frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}\left(R(s, a)-\hat{R}(s, a)\right)\right)=\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{1}(s, a)}}\left(R(s, a)-\hat{R}(s, a)\right)$$
Adding the two parts above:
$$\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}}\left(\left(\mathbb{E}_{a \sim \pi_{1}(s, a)} \hat{R}(s, a)\right)+\frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}\left(R(s, a)-\hat{R}(s, a)\right)\right)=\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{1}(s, a)}} R(s, a)$$
ii.
Again prove by direct substitution, now of $\hat{R}=R$:
$$\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}}\left(\mathbb{E}_{a \sim \pi_{1}(s, a)} \hat{R}(s, a)\right)=\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{1}(s, a)}} R(s, a)$$
$$\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}}\left(\frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}\left(R(s, a)-\hat{R}(s, a)\right)\right)=0$$
Adding the two equations gives:

$$\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{0}(s, a)}}\left(\left(\mathbb{E}_{a \sim \pi_{1}(s, a)} \hat{R}(s, a)\right)+\frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}\left(R(s, a)-\hat{R}(s, a)\right)\right)=\mathbb{E}_{\substack{s \sim p(s) \\ a \sim \pi_{1}(s, a)}} R(s, a)$$
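Symmetrically, a sketch with a correct $\hat{R}$ but a deliberately wrong $\hat{\pi}_{0}$ (again made-up numbers) also stays unbiased, which is the "doubly" in doubly robust:

```python
import numpy as np

rng = np.random.default_rng(4)
p_s = np.array([0.6, 0.4])
pi0 = np.array([[0.5, 0.5], [0.8, 0.2]])
pi0_hat = np.array([[0.3, 0.7], [0.6, 0.4]])   # deliberately wrong pi_0_hat
pi1 = np.array([[0.1, 0.9], [0.3, 0.7]])
R = np.array([[1.0, 0.0], [2.0, -1.0]])
R_hat = R                                      # reward model well-specified

n = 200_000
s = rng.choice(2, size=n, p=p_s)
a = (rng.random(n) < pi0[s, 1]).astype(int)    # a ~ pi_0(s, .)

regression = (pi1[s] * R_hat[s]).sum(axis=1)
correction = pi1[s, a] / pi0_hat[s, a] * (R[s, a] - R_hat[s, a])  # identically 0
dr = np.mean(regression + correction)
print(dr, np.sum(p_s[:, None] * pi1 * R))      # agree despite the wrong pi_0_hat
```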
(e).
i.
Here the drug is distributed randomly, so $\pi_{0}$ is easy to estimate, i.e. $\hat{\pi}_{0}$ will be accurate. The importance sampling estimator is therefore appropriate.
ii.
Here the distribution of the drug is complicated, so $\pi_{0}$ is hard to estimate, and the regression estimator is more appropriate.
The Doubly Robust method shown above is effectively a compromise between the two.
3. PCA
From the geometric meaning of the projection onto the subspace spanned by $u$, we can write:
$$f_{u}(x)=\arg \min_{v \in \mathcal{V}}\|x-v\|^{2}=\frac{u u^{T} x}{u^{T} u}=u u^{T} x$$
Then:
$$\begin{aligned} \arg \min_{u: u^{T} u=1} \sum_{i=1}^{m}\left\|x^{(i)}-f_{u}\left(x^{(i)}\right)\right\|_{2}^{2} &=\arg \min_{u: u^{T} u=1} \sum_{i=1}^{m}\left\|x^{(i)}-u u^{T} x^{(i)}\right\|_{2}^{2} \\ &=\arg \min_{u: u^{T} u=1} \sum_{i=1}^{m}\left(x^{(i)}-u u^{T} x^{(i)}\right)^{T}\left(x^{(i)}-u u^{T} x^{(i)}\right) \\ &=\arg \min_{u: u^{T} u=1} \sum_{i=1}^{m} x^{(i)^{T}} x^{(i)}-x^{(i)^{T}} u u^{T} x^{(i)} \\ &=\arg \max_{u: u^{T} u=1} \sum_{i=1}^{m} x^{(i)^{T}} u u^{T} x^{(i)} \\ &=\arg \max_{u: u^{T} u=1} \sum_{i=1}^{m} u^{T} x^{(i)} x^{(i)^{T}} u \\ &=\arg \max_{u: u^{T} u=1} u^{T}\left(\sum_{i=1}^{m} x^{(i)} x^{(i)^{T}}\right) u \end{aligned}$$

The last expression is a Rayleigh quotient under the constraint $u^{T} u=1$, so the minimizer of the reconstruction error is the principal eigenvector of $\sum_{i=1}^{m} x^{(i)} x^{(i)^{T}}$.
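As a quick check, a sketch (with made-up random data) comparing the reconstruction error of the top eigenvector of $X^{T} X$ against random unit vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data matrix with rows x^(i); one direction has much larger variance.
X = rng.normal(size=(100, 3)) @ np.diag([3.0, 1.0, 0.3])

# Principal eigenvector of sum_i x^(i) x^(i)^T = X^T X
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
u = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

def recon_error(v):
    """Total squared error of projecting rows of X onto the line spanned by v."""
    v = v / np.linalg.norm(v)
    return np.sum((X - np.outer(X @ v, v)) ** 2)

# u should beat any random direction.
print(recon_error(u))
print(min(recon_error(rng.normal(size=3)) for _ in range(10)))
```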
4. Independent Component Analysis
(a). Gaussian source
Take the gradient of the log-likelihood and set it to zero:
$$\begin{aligned} \nabla_{W} \ell(W) &=\nabla_{W} \sum_{i=1}^{n}\left(\log |W|+\sum_{j=1}^{d} \log \frac{1}{\sqrt{2 \pi}} \exp \left(-\frac{1}{2}\left(w_{j}^{T} x^{(i)}\right)^{2}\right)\right) \\ &=n\left(W^{-1}\right)^{T}-\sum_{i=1}^{n} \nabla_{W} \sum_{j=1}^{d} \frac{1}{2}\left(w_{j}^{T} x^{(i)}\right)^{2} \\ &=n\left(W^{-1}\right)^{T}-\sum_{i=1}^{n} W x^{(i)} x^{(i)^{T}} \\ &=n\left(W^{-1}\right)^{T}-W X^{T} X \\ &=0 \end{aligned}$$
Hence:
$$W^{T} W=n\left(X^{T} X\right)^{-1}$$
Now consider why $W$ has a rotational ambiguity here. Let the rotation matrix be $R$ (an orthogonal matrix with determinant 1) and set $W^{\prime}=R W$:
$$W^{\prime T} W^{\prime}=(R W)^{T} R W=W^{T} R^{T} R W=W^{T} W$$
So any such $W^{\prime}$ is just as valid a solution as $W$.
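A quick numerical check of this ambiguity, with a hypothetical random $W$ and a random rotation:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))                    # some candidate unmixing matrix

Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]                         # force det(Q) = +1: a rotation

W_prime = Q @ W
# W' satisfies the same condition as W, since W'^T W' = W^T W:
print(np.allclose(W_prime.T @ W_prime, W.T @ W))   # True
```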
(b). Laplace source
Since (c) uses stochastic gradient ascent, we derive the gradient for a single example:
$$\begin{aligned} \nabla_{W} \ell(W) &=\nabla_{W}\left(\log |W|+\sum_{j=1}^{d} \log \frac{1}{2} \exp \left(-\left|w_{j}^{T} x^{(i)}\right|\right)\right) \\ &=\left(W^{-1}\right)^{T}-\nabla_{W} \sum_{j=1}^{d}\left|w_{j}^{T} x^{(i)}\right| \\ &=\left(W^{T}\right)^{-1}-\operatorname{sign}\left(W x^{(i)}\right) x^{(i)^{T}} \end{aligned}$$
The update rule is therefore:
$$W:=W+\alpha\left(\left(W^{T}\right)^{-1}-\operatorname{sign}\left(W x^{(i)}\right) x^{(i)^{T}}\right)$$
(c). Cocktail Party Problem
```python
import numpy as np
import scipy.io.wavfile
import os


def update_W(W, x, learning_rate):
    """
    Perform a gradient ascent update on W using data element x and the
    provided learning rate. This function should return the updated W.
    Use the Laplace distribution in this problem.
    Args:
        W: The W matrix for ICA
        x: A single data element
        learning_rate: The learning rate to use
    Returns:
        The updated W
    """
    # *** START CODE HERE ***
    x = x.reshape(-1, 1)
    grad = np.linalg.inv(W.T) - np.sign(W @ x) @ x.T
    updated_W = W + learning_rate * grad
    # *** END CODE HERE ***
    return updated_W


def unmix(X, W):
    """
    Unmix an X matrix according to W using ICA.
    Args:
        X: The data matrix
        W: The W for ICA
    Returns:
        A numpy array S containing the split data
    """
    S = np.zeros(X.shape)
    # *** START CODE HERE ***
    S = X @ W.T
    # *** END CODE HERE ***
    return S


Fs = 11025


def normalize(dat):
    return 0.99 * dat / np.max(np.abs(dat))


def load_data():
    mix = np.loadtxt('./data/mix.dat')
    return mix


def save_W(W):
    np.savetxt('output/W.txt', W)


def save_sound(audio, name):
    scipy.io.wavfile.write('output/{}.wav'.format(name), Fs, audio)


def unmixer(X):
    M, N = X.shape
    W = np.eye(N)
    anneal = [0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01,
              0.005, 0.005, 0.002, 0.002, 0.001, 0.001]
    print('Separating tracks ...')
    for lr in anneal:
        print(lr)
        rand = np.random.permutation(range(M))
        for i in rand:
            x = X[i]
            W = update_W(W, x, lr)
    return W


def main():
    # Seed the randomness of the simulation so this outputs the same thing each time
    np.random.seed(0)
    X = normalize(load_data())
    print('data_set shape:', X.shape)
    # Save the original (mixed) recording from each microphone
    for i in range(X.shape[1]):
        save_sound(X[:, i], 'mixed_{}'.format(i))
    # Learning (training)
    W = unmixer(X)
    print('W:\n', W)
    save_W(W)
    # Prediction: the overall scaling of each source cannot be recovered by ICA,
    # so renormalize the output to fit the wav format
    S = normalize(unmix(X, W))
    assert S.shape[1] == 5
    for i in range(S.shape[1]):
        out_path = 'output/split_{}.wav'.format(i)
        if os.path.exists(out_path):
            os.unlink(out_path)   # remove any stale output before writing
        save_sound(S[:, i], 'split_{}'.format(i))


main()
```
```
data_set shape: (53442, 5)
Separating tracks ...
0.1
0.1
0.1
0.05
0.05
0.05
0.02
0.02
0.01
0.01
0.005
0.005
0.002
0.002
0.001
0.001
W:
 [[ 52.83512054  16.79594361  19.94115104 -10.1985153  -20.89752402]
 [ -9.92869812  -0.97683438  -4.67732108   8.04342263   1.78792785]
 [  8.31150961  -7.47676167  19.3154481   15.17437116 -14.3261898 ]
 [-14.66745004 -26.64459003   2.44058135  21.38205443  -8.42074783]
 [ -0.26912908  18.37411935   9.31234188   9.1025964   30.59424603]]
```
Listening to the output files above, the separated voices are indeed clearly distinguishable.
5. Markov Decision Processes
(a).
Proof:
$$\begin{aligned}\left\|B\left(V_{1}\right)-B\left(V_{2}\right)\right\|_{\infty} &=\gamma \max_{s \in S}\left|\max_{a \in A} \sum_{s^{\prime} \in S} P_{s a}\left(s^{\prime}\right) V_{1}\left(s^{\prime}\right)-\max_{a \in A} \sum_{s^{\prime} \in S} P_{s a}\left(s^{\prime}\right) V_{2}\left(s^{\prime}\right)\right| \\ & \leq \gamma \max_{s \in S} \max_{a \in A}\left|\sum_{s^{\prime} \in S} P_{s a}\left(s^{\prime}\right)\left[V_{1}\left(s^{\prime}\right)-V_{2}\left(s^{\prime}\right)\right]\right| \\ & \leq \gamma\left\|V_{1}-V_{2}\right\|_{\infty} \end{aligned}$$
The first inequality uses $\left|\max_{a} f(a)-\max_{a} g(a)\right| \leq \max_{a}|f(a)-g(a)|$. The last inequality holds because, for $\alpha, x \in \mathbb{R}^{n}$ with $\sum_{i} \alpha_{i}=1$ and $\alpha_{i} \geq 0$, we have $\left|\sum_{i} \alpha_{i} x_{i}\right| \leq \max_{i}\left|x_{i}\right|$.
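A numerical sketch of the contraction property on a randomly generated toy MDP (made-up sizes and $\gamma$):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)   # P[s, a] is a distribution over s'
R = rng.random(n_states)

def bellman(V):
    # B(V)(s) = R(s) + gamma * max_a sum_s' P_sa(s') V(s')
    return R + gamma * (P @ V).max(axis=1)

V1, V2 = rng.normal(size=n_states), rng.normal(size=n_states)
lhs = np.max(np.abs(bellman(V1) - bellman(V2)))
rhs = gamma * np.max(np.abs(V1 - V2))
print(lhs <= rhs)   # True: B is a gamma-contraction in the sup norm
```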
(b).
Proof by contradiction. Suppose $B$ has a fixed point $V_{1}$, and suppose there is a second, distinct fixed point $V_{2}$. Then:
$$\left\|V_{1}-V_{2}\right\|_{\infty}=\left\|B\left(V_{1}\right)-B\left(V_{2}\right)\right\|_{\infty} \leq \gamma\left\|V_{1}-V_{2}\right\|_{\infty}$$
Here the equality follows from the fixed-point property, and the inequality from the result in (a). Since $\gamma < 1$, the above can hold only if:
$$\left\|V_{1}-V_{2}\right\|_{\infty}=0$$
Hence $V_{1}=V_{2}$, contradicting the assumption that there is a second, distinct fixed point. Therefore $B$ has at most one fixed point.
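Because $B$ is a $\gamma$-contraction, iterating it from any starting point converges to this unique fixed point; a short value iteration sketch on a made-up toy MDP illustrates the geometric convergence:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 4, 2, 0.8
P = rng.random((n_states, n_actions, n_states))
P /= P.sum(axis=2, keepdims=True)   # P[s, a] is a distribution over s'
R = rng.random(n_states)

V = np.zeros(n_states)
for k in range(25):
    V_new = R + gamma * (P @ V).max(axis=1)   # apply B once
    print(k, np.max(np.abs(V_new - V)))       # sup-norm change shrinks ~ gamma^k
    V = V_new
```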