CS229 (Andrew Ng) Machine Learning Problem Set 04 (PS04) Solutions (Problems 2, 3, 4, 5; feedback welcome): Off-Policy Evaluation and Causal Inference, PCA (Principal Component Analysis), ICA (Independent Component Analysis), MDP (Markov Decision Processes)

2. Off Policy Evaluation And Causal Inference

Note: the code outputs for this section, in particular the separated audio tracks produced by the ICA method, can be found among the resources uploaded to my account.

(a). Importance Sampling

This is straightforward to show: substitute $\hat{\pi}_{0}=\pi_{0}$ directly:
$$
\begin{aligned}
\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)
&=\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)} \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} R(s, a) \\
&=\sum_{(s, a)} \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} R(s, a)\, p(s)\, \pi_{0}(s, a) \\
&=\sum_{(s, a)} R(s, a)\, p(s)\, \pi_{1}(s, a) \\
&=\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{1}(s, a)} R(s, a)
\end{aligned}
$$
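As a quick numerical sanity check (a minimal sketch of my own, not part of the problem set: a one-state, two-action bandit with fixed policies), reweighting samples drawn under $\pi_0$ recovers the expectation under $\pi_1$:

import numpy as np

np.random.seed(0)
n = 200000

pi_0 = np.array([0.8, 0.2])   # behavior policy pi_0(a)
pi_1 = np.array([0.3, 0.7])   # target policy pi_1(a)
R = np.array([1.0, 5.0])      # deterministic reward R(a)

a = np.random.choice(2, size=n, p=pi_0)          # actions sampled under pi_0
is_estimate = np.mean(pi_1[a] / pi_0[a] * R[a])  # importance sampling estimator

print(is_estimate)  # close to 3.8
print(pi_1 @ R)     # true E_{a ~ pi_1} R(a) = 0.3*1 + 0.7*5 = 3.8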

(b). Weighted Importance Sampling

This is similar to (a); again substitute $\hat{\pi}_{0}=\pi_{0}$ directly.

For the denominator:
$$
\begin{aligned}
\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}
&=\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)} \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} \\
&=\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} \\
&=\sum_{(s, a)} p(s) \pi_{0}(s, a) \frac{\pi_{1}(s, a)}{\pi_{0}(s, a)} \\
&=\sum_{(s, a)} p(s) \pi_{1}(s, a) \\
&=1
\end{aligned}
$$
Then:
$$
\frac{\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}=\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{1}(s, a)} R(s, a)
$$

(c). Bias of Weighted Importance Sampling

Start from the definition:
$$
\frac{\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)} \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}=\frac{\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}
$$
Consider the case with only one sample:
$$
\frac{\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{\sum_{(s, a)} p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}=\frac{p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)} R(s, a)}{p(s, a) \frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}}=R(s, a)
$$
That is, with a single sample the weighted importance sampling estimator always returns the observed reward $R(s, a)$, no matter what $\pi_1$ is. Since the sample $(s, a)$ is drawn under $\pi_0$, the estimator's expectation is taken with respect to $\pi_0$; but if $\pi_{1} \neq \pi_{0}$, the two expected rewards generally differ:

$$
\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)} R(s, a) \neq \mathbb{E}_{s \sim p(s) \atop a \sim \pi_{1}(s, a)} R(s, a)
$$

so the estimator is biased.
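A small simulation (same toy bandit as above, my own construction) shows the one-sample weighted estimator averaging to $\mathbb{E}_{\pi_0} R$ rather than $\mathbb{E}_{\pi_1} R$:

import numpy as np

np.random.seed(0)
pi_0 = np.array([0.8, 0.2])
pi_1 = np.array([0.3, 0.7])
R = np.array([1.0, 5.0])

# With one sample, the weighted estimator (w * R) / w collapses to R(a).
a = np.random.choice(2, size=200000, p=pi_0)
w = pi_1[a] / pi_0[a]
one_sample_estimates = (w * R[a]) / w  # elementwise: each entry is one run of the estimator

print(one_sample_estimates.mean())  # close to 1.8 = E_{a ~ pi_0} R(a): biased
print(pi_1 @ R)                     # 3.8 = E_{a ~ pi_1} R(a)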

(d). Doubly Robust

i.

Substitute directly. First, note that the inner expectation integrates out $a$, so the quantity it produces no longer depends on $a$; the outer expectation over $a \sim \pi_{0}(s, a)$ therefore has no effect on it:
$$
\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)}\left(\mathbb{E}_{a \sim \pi_{1}(s, a)} \hat{R}(s, a)\right)=\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{1}(s, a)} \hat{R}(s, a).
$$
Then, applying the same argument as in (a) to the difference $R-\hat{R}$ (using $\hat{\pi}_{0}=\pi_{0}$):
$$
\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)}\left(\frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}\left(R(s, a)-\hat{R}(s, a)\right)\right)=\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{1}(s, a)}\left(R(s, a)-\hat{R}(s, a)\right)
$$
Summing the two parts:
$$
\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)}\left(\left(\mathbb{E}_{a \sim \pi_{1}(s, a)} \hat{R}(s, a)\right)+\frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}\left(R(s, a)-\hat{R}(s, a)\right)\right)=\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{1}(s, a)} R(s, a)
$$

ii.

Again substitute directly, this time using the assumption $\hat{R}=R$:
$$
\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)}\left(\mathbb{E}_{a \sim \pi_{1}(s, a)} \hat{R}(s, a)\right)=\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{1}(s, a)} R(s, a)
$$

$$
\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)}\left(\frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}\left(R(s, a)-\hat{R}(s, a)\right)\right)=0
$$

$$
\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{0}(s, a)}\left(\left(\mathbb{E}_{a \sim \pi_{1}(s, a)} \hat{R}(s, a)\right)+\frac{\pi_{1}(s, a)}{\hat{\pi}_{0}(s, a)}\left(R(s, a)-\hat{R}(s, a)\right)\right)=\mathbb{E}_{s \sim p(s) \atop a \sim \pi_{1}(s, a)} R(s, a)
$$
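A compact sketch of the doubly robust estimator on the same toy bandit (the misspecified `pi_0_hat` and `R_hat` values are illustrative assumptions of mine): it stays correct when either the propensity estimate or the reward model is right.

import numpy as np

np.random.seed(0)
pi_0 = np.array([0.8, 0.2]); pi_1 = np.array([0.3, 0.7]); R = np.array([1.0, 5.0])
a = np.random.choice(2, size=200000, p=pi_0)

def doubly_robust(pi_0_hat, R_hat):
    # E_{a~pi_1} R_hat + (pi_1 / pi_0_hat) * (R - R_hat) on the observed actions
    return np.mean(pi_1 @ R_hat + pi_1[a] / pi_0_hat[a] * (R[a] - R_hat[a]))

print(doubly_robust(pi_0, np.zeros(2)))        # pi_0_hat correct, R_hat wrong: ~3.8
print(doubly_robust(np.array([0.5, 0.5]), R))  # R_hat correct, pi_0_hat wrong: 3.8
print(pi_1 @ R)                                # ground truth: 3.8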

(e).

i.

Here the drug is assigned randomly, so $\pi_{0}$ is easy to estimate, i.e. $\hat{\pi}_{0}$ will be accurate. The importance sampling estimator is therefore a good fit.

ii.

Here the assignment policy is complicated, so $\pi_{0}$ is hard to estimate; the regression estimator is the better choice.

The doubly robust method shown above is effectively a compromise between the two: it remains unbiased as long as either $\hat{\pi}_{0}$ or $\hat{R}$ is accurate.

3. PCA

From the geometric meaning of the orthogonal projection onto the span of $u$ (with $u^{T} u=1$), we can write:
$$
f_{u}(x)=\arg \min _{v \in \mathcal{V}}\|x-v\|^{2}=\frac{u u^{T} x}{u^{T} u}=u u^{T} x
$$
Then:
$$
\begin{aligned}
\arg \min _{u: u^{T} u=1} \sum_{i=1}^{m}\left\|x^{(i)}-f_{u}\left(x^{(i)}\right)\right\|_{2}^{2}
&=\arg \min _{u: u^{T} u=1} \sum_{i=1}^{m}\left\|x^{(i)}-u u^{T} x^{(i)}\right\|_{2}^{2} \\
&=\arg \min _{u: u^{T} u=1} \sum_{i=1}^{m}\left(x^{(i)}-u u^{T} x^{(i)}\right)^{T}\left(x^{(i)}-u u^{T} x^{(i)}\right) \\
&=\arg \min _{u: u^{T} u=1} \sum_{i=1}^{m} x^{(i)^{T}} x^{(i)}-x^{(i)^{T}} u u^{T} x^{(i)} \\
&=\arg \max _{u: u^{T} u=1} \sum_{i=1}^{m} x^{(i)^{T}} u u^{T} x^{(i)} \\
&=\arg \max _{u: u^{T} u=1} \sum_{i=1}^{m} u^{T} x^{(i)} x^{(i)^{T}} u \\
&=\arg \max _{u: u^{T} u=1} u^{T}\left(\sum_{i=1}^{m} x^{(i)} x^{(i)^{T}}\right) u
\end{aligned}
$$
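The last line is a constrained Rayleigh quotient, so the maximizer is the principal eigenvector of $\sum_{i=1}^{m} x^{(i)} x^{(i)^{T}}$, i.e., the first principal component. A quick numerical check (a sketch with synthetic data of my own making):

import numpy as np

np.random.seed(0)
# Zero-mean synthetic data, stretched along the first axis.
X = np.random.randn(1000, 2) @ np.diag([3.0, 0.5])

# u maximizing u^T (X^T X) u subject to u^T u = 1 is the top eigenvector of X^T X.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
u = eigvecs[:, -1]  # eigenvector for the largest eigenvalue

# Reconstruction error of projecting onto a unit vector w.
def recon_error(w):
    return np.sum((X - (X @ w)[:, None] * w) ** 2)

v = np.random.randn(2)
v /= np.linalg.norm(v)
print(recon_error(u) <= recon_error(v))  # True: the eigenvector minimizes the error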

4. Independent Component Analysis

(a). Gaussian source

Differentiate the log-likelihood and set the gradient to zero:
$$
\begin{aligned}
\nabla_{W} \ell(W)
&=\nabla_{W} \sum_{i=1}^{n}\left(\log |W|+\sum_{j=1}^{d} \log \frac{1}{\sqrt{2 \pi}} \exp \left(-\frac{1}{2}\left(w_{j}^{T} x^{(i)}\right)^{2}\right)\right) \\
&=n\left(W^{-1}\right)^{T}-\sum_{i=1}^{n} \nabla_{W} \sum_{j=1}^{d} \frac{1}{2}\left(w_{j}^{T} x^{(i)}\right)^{2} \\
&=n\left(W^{-1}\right)^{T}-\sum_{i=1}^{n} W x^{(i)} x^{(i)^{T}} \\
&=n\left(W^{-1}\right)^{T}-W X^{T} X \\
&=0
\end{aligned}
$$
Hence:
$$
W^{T} W=n\left(X^{T} X\right)^{-1}
$$
Now consider why this $W$ carries a rotational ambiguity. Let $R$ be a rotation matrix (an orthogonal matrix with determinant 1) and set $W^{\prime}=R W$:

$$
W^{\prime T} W^{\prime}=(R W)^{T} R W=W^{T} R^{T} R W=W^{T} W
$$

so any such $W^{\prime}$ satisfies the same condition as $W$ and is an equally valid solution: a Gaussian source cannot pin down the rotation.
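A short numerical confirmation (a sketch; the specific rotation is my own choice) that rotating $W$ leaves $W^{T} W$, and hence the Gaussian likelihood, unchanged:

import numpy as np

np.random.seed(0)
W = np.random.randn(3, 3)
theta = 0.7
# A rotation in the first two coordinates: orthogonal with determinant 1.
Rm = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
W_prime = Rm @ W

print(np.allclose(W_prime.T @ W_prime, W.T @ W))  # True: same W^T W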

(b). Laplace source

Since (c) uses stochastic gradient ascent, we compute the gradient for a single sample:
$$
\begin{aligned}
\nabla_{W} \ell(W)
&=\nabla_{W}\left(\log |W|+\sum_{j=1}^{d} \log \frac{1}{2} \exp \left(-\left|w_{j}^{T} x^{(i)}\right|\right)\right) \\
&=\left(W^{-1}\right)^{T}-\nabla_{W} \sum_{j=1}^{d}\left|w_{j}^{T} x^{(i)}\right| \\
&=\left(W^{T}\right)^{-1}-\operatorname{sign}\left(W x^{(i)}\right) x^{(i)^{T}}
\end{aligned}
$$
The update rule is then:
$$
W:=W+\alpha\left(\left(W^{T}\right)^{-1}-\operatorname{sign}\left(W x^{(i)}\right) x^{(i)^{T}}\right)
$$

(c). Cocktail Party Problem

import numpy as np
import scipy.io.wavfile
import os


def update_W(W, x, learning_rate):
    """
    Perform a gradient ascent update on W using data element x and the provided learning rate.

    This function should return the updated W.

    Use the Laplace distribution in this problem.

    Args:
        W: The W matrix for ICA
        x: A single data element
        learning_rate: The learning rate to use

    Returns:
        The updated W
    """
    
    # *** START CODE HERE ***
    x = x.reshape(-1, 1)
    # Per-sample gradient from part (b): (W^T)^{-1} - sign(W x) x^T
    grad = np.linalg.inv(W.T) - np.sign(W @ x) @ x.T
    updated_W = W + learning_rate * grad
    # *** END CODE HERE ***

    return updated_W


def unmix(X, W):
    """
    Unmix an X matrix according to W using ICA.

    Args:
        X: The data matrix
        W: The W for ICA

    Returns:
        A numpy array S containing the split data
    """

    S = np.zeros(X.shape)


    # *** START CODE HERE ***
    # Rows of X are x^(i)T and s^(i) = W x^(i), so the unmixed rows are X W^T.
    S = X @ W.T
    # *** END CODE HERE ***

    return S


Fs = 11025

def normalize(dat):
    return 0.99 * dat / np.max(np.abs(dat))

def load_data():
    mix = np.loadtxt('./data/mix.dat')
    return mix

def save_W(W):
    np.savetxt('output/W.txt',W)

def save_sound(audio, name):
    scipy.io.wavfile.write('output/{}.wav'.format(name), Fs, audio)

def unmixer(X):
    M, N = X.shape
    W = np.eye(N)

    anneal = [0.1 , 0.1, 0.1, 0.05, 0.05, 0.05, 0.02, 0.02, 0.01, 0.01, 0.005, 0.005, 0.002, 0.002, 0.001, 0.001]
    print('Separating tracks ...')
    for lr in anneal:
        print(lr)
        rand = np.random.permutation(range(M))
        for i in rand:
            x = X[i]
            W = update_W(W, x, lr)

    return W

def main():
    # Seed the randomness of the simulation so this outputs the same thing each time
    np.random.seed(0)
    X = normalize(load_data())

    print('data_set shape:', X.shape)
    # Save the original (mixed) recording from each microphone
    for i in range(X.shape[1]):
        save_sound(X[:, i], 'mixed_{}'.format(i))
    # Training: learn the unmixing matrix W
    W = unmixer(X)
    print('W:\n', W)
    save_W(W)
    # Prediction: unmix the recordings with the learned W
    S = normalize(unmix(X, W))   # ICA cannot recover each source's scale, so normalize to fit the wav format
    assert S.shape[1] == 5
    for i in range(S.shape[1]):
        if os.path.exists('output/split_{}.wav'.format(i)):
            os.unlink('output/split_{}.wav'.format(i))
        save_sound(S[:, i], 'split_{}'.format(i))
main()
data_set shape: (53442, 5)
Separating tracks ...
0.1
0.1
0.1
0.05
0.05
0.05
0.02
0.02
0.01
0.01
0.005
0.005
0.002
0.002
0.001
0.001
W:
 [[ 52.83512054  16.79594361  19.94115104 -10.1985153  -20.89752402]
 [ -9.92869812  -0.97683438  -4.67732108   8.04342263   1.78792785]
 [  8.31150961  -7.47676167  19.3154481   15.17437116 -14.3261898 ]
 [-14.66745004 -26.64459003   2.44058135  21.38205443  -8.42074783]
 [ -0.26912908  18.37411935   9.31234188   9.1025964   30.59424603]]

Listening to the output files above, the tracks are indeed clearly separated.

5. Markov Decision Processes

(a).

证明:
$$
\begin{aligned}
\left\|B\left(V_{1}\right)-B\left(V_{2}\right)\right\|_{\infty}
&=\gamma \max _{s \in S}\left|\max _{a \in A} \sum_{s^{\prime} \in S} P_{s a}\left(s^{\prime}\right) V_{1}\left(s^{\prime}\right)-\max _{a \in A} \sum_{s^{\prime} \in S} P_{s a}\left(s^{\prime}\right) V_{2}\left(s^{\prime}\right)\right| \\
&\leq \gamma \max _{s \in S} \max _{a \in A}\left|\sum_{s^{\prime} \in S} P_{s a}\left(s^{\prime}\right)\left[V_{1}\left(s^{\prime}\right)-V_{2}\left(s^{\prime}\right)\right]\right| \\
&\leq \gamma\left\|V_{1}-V_{2}\right\|_{\infty}
\end{aligned}
$$
The first inequality uses $\left|\max _{a} f(a)-\max _{a} g(a)\right| \leq \max _{a}|f(a)-g(a)|$. The last inequality holds because, for $\alpha, x \in \mathbb{R}^{n}$ with $\sum_{i} \alpha_{i}=1$ and $\alpha_{i} \geq 0$, we have $\left|\sum_{i} \alpha_{i} x_{i}\right| \leq \max _{i}\left|x_{i}\right|$.
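A numerical spot check of the contraction property on a random MDP (my own construction; the function `B` below implements the Bellman backup used in this problem):

import numpy as np

np.random.seed(0)
nS, nA, gamma = 10, 3, 0.9
P = np.random.rand(nS, nA, nS)
P /= P.sum(axis=2, keepdims=True)  # P[s, a, s'] is a valid transition distribution
R = np.random.rand(nS)

def B(V):
    # Bellman backup: B(V)(s) = R(s) + gamma * max_a sum_{s'} P_sa(s') V(s')
    return R + gamma * (P @ V).max(axis=1)

V1, V2 = np.random.randn(nS), np.random.randn(nS)
lhs = np.max(np.abs(B(V1) - B(V2)))
rhs = gamma * np.max(np.abs(V1 - V2))
print(lhs <= rhs)  # True: B is a gamma-contraction in the sup norm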

(b).

We argue by contradiction. Suppose $B$ has a fixed point $V_{1}$ and a second, different fixed point $V_{2}$. Then:
$$
\left\|V_{1}-V_{2}\right\|_{\infty}=\left\|B\left(V_{1}\right)-B\left(V_{2}\right)\right\|_{\infty} \leq \gamma\left\|V_{1}-V_{2}\right\|_{\infty}
$$
Here the equality follows from the fixed-point property and the inequality from the result in (a). Since $\gamma<1$, this can only hold if:
$$
\left\|V_{1}-V_{2}\right\|_{\infty}=0
$$
Hence $V_{1}=V_{2}$, contradicting the assumption of a second, distinct fixed point. Therefore $B$ has at most one fixed point.
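Together with the contraction property, this means value iteration converges to the unique fixed point from any starting point. A self-contained sketch (same random-MDP construction as above, my own toy example):

import numpy as np

np.random.seed(0)
nS, nA, gamma = 10, 3, 0.9
P = np.random.rand(nS, nA, nS)
P /= P.sum(axis=2, keepdims=True)
R = np.random.rand(nS)
B = lambda V: R + gamma * (P @ V).max(axis=1)

# Two very different starting points converge to the same fixed point.
Va, Vb = np.zeros(nS), 100 * np.random.randn(nS)
for _ in range(1000):
    Va, Vb = B(Va), B(Vb)

print(np.max(np.abs(Va - Vb)))     # ~0: both runs reach the same limit
print(np.max(np.abs(B(Va) - Va)))  # ~0: the limit is a fixed point of B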
