Machine Learning Notes on Gaussian Processes (III): Gaussian Process Regression from the Function-Space Perspective

Introduction

The previous article examined Gaussian process regression from the weight-space perspective. This article looks at Gaussian process regression from the function-space perspective.

Review: Handling Non-Linear Regression via a High-Dimensional Transformation

Viewing Gaussian process regression from the weight-space perspective has no direct connection to the Gaussian process itself. In essence, it solves a non-linear regression task by combining Bayesian linear regression with the kernel trick.

  • For a non-linear regression task, a non-linear transformation $\phi(\cdot)$ maps the original feature space $\mathcal X \in \mathbb R^p$ into a high-dimensional space:
    $$\mathcal X \in \mathbb R^p \to \phi(\mathcal X) \in \mathbb R^q \quad q \gg p$$

  • Because the sample feature space changes, the posterior distribution $\mathcal P(\mathcal W \mid Data)$ of the random variable $\mathcal W$ changes accordingly:
    $$\mathcal P(\mathcal W \mid Data) \sim \mathcal N(\mu_{\mathcal W},\Sigma_{\mathcal W}) \to \begin{cases} \mu_{\mathcal W} = \frac{\mathcal A^{-1}[\phi(\mathcal X)]^T\mathcal Y}{\sigma^2} \\ \Sigma_{\mathcal W} = \mathcal A^{-1} \\ \mathcal A = \frac{[\phi(\mathcal X)]^T\phi(\mathcal X)}{\sigma^2} + [\Sigma_{prior}^{-1}]_{q \times q} \end{cases}$$

  • This allows a prediction of the label $f[\phi(\hat x)]$ of a given (unseen) sample $\phi(\hat x)$ after the non-linear transformation:

    • The most involved part of the derivation is solving for $\mathcal A^{-1}$; see the previous article for the details.
    • What is predicted here is the noise-free $f[\phi(\hat x)]$ rather than $\hat y$; to predict $\hat y$, add $\sigma^2$ to the covariance.
      $$\begin{aligned} \mathcal P[f[\phi(\hat x)] \mid Data,\phi(\hat x)] & \sim \mathcal N\left([\phi(\hat x)]^T \mu_{\mathcal W},[\phi(\hat x)]^T \Sigma_{\mathcal W} \cdot \phi(\hat x)\right) \\ & = \mathcal N \left\{[\phi(\hat x)]^T \left(\frac{\mathcal A^{-1} [\phi(\mathcal X)]^T\mathcal Y}{\sigma^2}\right),[\phi(\hat x)]^T\mathcal A^{-1} \cdot \phi(\hat x)\right\} \end{aligned}$$

    The fully expanded result is as follows,
    where $[\Sigma_{prior}]_{q \times q}$ is the covariance matrix of the prior, $\mathcal I$ is the identity matrix, and $[\mathcal K(\mathcal X,\mathcal X)]_{N \times N}$ denotes $\phi(\mathcal X)\,\Sigma_{prior}\,[\phi(\mathcal X)]^T$ with $\phi(\mathcal X)$ the $N \times q$ design matrix:
    $$\mathcal P[f(\hat x) \mid Data,\hat x] \sim \mathcal N(\mu_{\hat x},\Sigma_{\hat x}) \quad \begin{cases} \mu_{\hat x} = [\phi(\hat x)]^T \Sigma_{prior} [\phi(\mathcal X)]^T [\mathcal K(\mathcal X,\mathcal X) + \sigma^2 \mathcal I]^{-1} \mathcal Y \\ \Sigma_{\hat x} = [\phi(\hat x)]^T \cdot \left\{\Sigma_{prior} - \Sigma_{prior} [\phi(\mathcal X)]^T \left[\mathcal K(\mathcal X,\mathcal X) + \sigma^2 \mathcal I\right]^{-1} \phi(\mathcal X) \Sigma_{prior}\right\} \cdot \phi(\hat x) \end{cases}$$

  • To handle the complicated inner products that appear in these formulas, the kernel trick is used. Assume a kernel function $\mathcal K(x,x')$ of the variables $x,x'$, defined as follows
    (here $[\Sigma_{prior}]_{q \times q}$ is at least positive semi-definite):
    $$\begin{aligned} \mathcal K(x,x') & = [\phi(x)]^T \Sigma_{prior} \phi(x') \\ & = \left[\sqrt{\Sigma_{prior}} \, \phi(x)\right]^T\left[\sqrt{\Sigma_{prior}} \, \phi(x')\right] \\ & = \left\langle\sqrt{\Sigma_{prior}} \, \phi(x) ,\sqrt{\Sigma_{prior}} \, \phi(x')\right\rangle \end{aligned}$$
    As with any kernel function, this sidesteps the expensive high-dimensional computation of the non-linear map $\phi(\cdot)$ and evaluates the inner product directly. A small numerical sketch of the resulting kernelized prediction follows this list.
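
The following is a minimal NumPy sketch of the kernelized weight-space prediction above. The polynomial feature map `phi`, the toy data, and the choice $\Sigma_{prior} = \mathcal I$ are illustrative assumptions, not part of the original derivation.

```python
import numpy as np

# Minimal sketch of the kernelized weight-space prediction.
# phi, the toy data, and Sigma_prior = I are illustrative assumptions.

def phi(x):
    """Hypothetical non-linear feature map R^1 -> R^3: (1, x, x^2)."""
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

def kernel(Xa, Xb, Sigma_prior):
    """K(x, x') = phi(x)^T Sigma_prior phi(x')."""
    return phi(Xa) @ Sigma_prior @ phi(Xb).T

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=20)                  # training inputs
Y = np.sin(X) + 0.1 * rng.standard_normal(20)    # noisy training targets
x_hat = np.array([0.5])                          # test input
sigma2 = 0.1 ** 2                                # noise variance
Sigma_prior = np.eye(3)                          # prior covariance of W (q = 3)

K = kernel(X, X, Sigma_prior)                    # N x N Gram matrix K(X, X)
k_hat = kernel(x_hat, X, Sigma_prior)            # 1 x N cross-covariance
inv_term = np.linalg.inv(K + sigma2 * np.eye(len(X)))

# mu = phi(x_hat)^T Sigma_prior phi(X)^T [K + sigma^2 I]^{-1} Y
mu = k_hat @ inv_term @ Y
# Equivalent to phi(x_hat)^T {Sigma_prior - Sigma_prior Phi^T (K + sigma^2 I)^{-1} Phi Sigma_prior} phi(x_hat)
Sigma = kernel(x_hat, x_hat, Sigma_prior) - k_hat @ inv_term @ k_hat.T

print("predictive mean:", mu, "predictive variance:", Sigma)
```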

Review: Gaussian Processes

A Gaussian process is, in essence, a collection of high-dimensional random variables:
$$\{\xi_{t}\}_{t \in \mathcal T} = \{\cdots,\xi_{t_1},\xi_{t_2},\cdots,\xi_{t_n},\cdots\} \quad (t_1,t_2,\cdots,t_n \in \mathcal T)$$
where $\mathcal T$ denotes a continuous domain, possibly a continuous domain of time or space. The definition of a Gaussian process can be stated as follows: if for any $\{t_1,t_2,\cdots,t_n\} \subset \mathcal T$ the corresponding subset $\xi_{t_1 \to t_n} = \{\xi_{t_1},\xi_{t_2},\cdots,\xi_{t_n}\}$ of the stochastic process $\{\xi_t\}_{t \in \mathcal T}$ follows some Gaussian distribution $\mathcal N(\mu_{t_1 \to t_n},\Sigma_{t_1 \to t_n})$, then $\{\xi_{t}\}_{t \in \mathcal T}$ is called a Gaussian process.
Since $t \in \mathcal T$ is dense (intuitively, no matter how small the time interval becomes, there is still a random variable), the process can be viewed as an 'infinite-dimensional' Gaussian distribution over the continuous domain $\mathcal T$:
$$\{\xi_t\}_{t \in \mathcal T} \sim \mathcal G\mathcal P[m(t),\mathcal K(t,s)] \quad (s,t \in \mathcal T)$$
Note that the mean function $m(t)$ and the covariance function $\mathcal K(s,t)$ are both expressed as functions. This means the mean and covariance at different times/states are not fixed values but functions of $s,t$, unlike a finite-dimensional Gaussian random vector, which is characterized by fixed parameters:
$$\mathcal X \in \mathbb R^p \to \mathcal X \sim \mathcal N(\mu_p,\Sigma_{p \times p})$$

By contrast, in a Gaussian network, once the set of random variables $\mathcal X$ is fixed, the corresponding probabilistic graphical model is a static model: the mean $\mu_p$ and the covariance matrix $\Sigma_{p \times p}$ are constant, and the dependency relations between the random variable nodes, viewed from the graphical-model perspective, are also fixed.
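
To make the 'mean function / covariance function' idea concrete, here is a minimal sketch that samples a finite-dimensional marginal of a GP prior. The zero mean function, the RBF covariance function, and its length scale are illustrative assumptions, not part of the original text.

```python
import numpy as np

# Any finite set of indices t_1..t_n drawn from the continuous domain T yields
# an ordinary multivariate Gaussian, whose mean vector and covariance matrix
# are produced by the mean/covariance *functions*.
# The zero mean function and the RBF covariance below are illustrative choices.

def m(t):
    return np.zeros_like(t)                       # mean function m(t) = 0

def K(s, t, length_scale=1.0):
    # covariance function K(s, t); an RBF kernel is used here as an example
    return np.exp(-0.5 * (s[:, None] - t[None, :]) ** 2 / length_scale ** 2)

t = np.linspace(0.0, 5.0, 100)                    # finite slice of the index set T
mean = m(t)                                       # mean vector, shape (100,)
cov = K(t, t) + 1e-9 * np.eye(len(t))             # covariance matrix + jitter

rng = np.random.default_rng(1)
samples = rng.multivariate_normal(mean, cov, size=3)  # 3 sample paths of xi_t
print(samples.shape)                              # (3, 100)
```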

Weight-Space Perspective: the Variation of the Model Parameters $\mathcal W$

Start from the linear regression model (without Gaussian noise) $f(\mathcal X) = \mathcal X^T\mathcal W$, and apply the non-linear high-dimensional transformation $\mathcal X \to \phi(\mathcal X) \in \mathbb R^q$ to the feature space $\mathcal X \in \mathbb R^p$.
Assign the model parameters $\mathcal W$ a prior distribution.
Since $\mathcal X$ has already undergone the non-linear transformation, $\mathcal W$ is now a $q$-dimensional random variable, and its covariance matrix $\Sigma_{prior}$ must accordingly be $q \times q$:
$$\mathcal W \sim \mathcal N(0,[\Sigma_{prior}]_{q \times q})$$
Consequently, the expectation $\mathbb E[f(\mathcal X)]$ of the linear model $f(\mathcal X)$ can be written as follows
(the focus here is on the variation of $\mathcal W$, so $\phi(\mathcal X)$ is treated as a constant):
$$\mathbb E[f(\mathcal X)] = \mathbb E\left\{[\phi(\mathcal X)]^T \mathcal W\right\} = [\phi(\mathcal X)]^T \mathbb E[\mathcal W] = [\phi(\mathcal X)]^T \cdot 0 = 0$$
For any $x^{(i)},x^{(j)} \in \mathbb R^p$, the covariance of the corresponding function values, $Cov \left[f(x^{(i)}),f(x^{(j)})\right]$, is:
$$\begin{aligned} Cov \left[f(x^{(i)}),f(x^{(j)})\right] & = \mathbb E \left\{\left[f(x^{(i)}) -\mathbb E[f(x^{(i)})] \right] \cdot \left[f(x^{(j)}) -\mathbb E[f(x^{(j)})] \right] \right\} \\ & = \mathbb E \left\{\left[f(x^{(i)}) -0 \right] \cdot \left[f(x^{(j)}) -0 \right] \right\} \\ & = \mathbb E \left[f(x^{(i)}) \cdot f(x^{(j)})\right] \\ & = \mathbb E \left[\phi(x^{(i)})^T\mathcal W \cdot \phi(x^{(j)})^T\mathcal W\right] \end{aligned}$$
Because $\phi(x^{(j)})^T \mathcal W$ is a scalar, its transpose $\left[\phi(x^{(j)})^T \mathcal W\right]^T = \mathcal W^T\phi(x^{(j)})$ equals $\phi(x^{(j)})^T \mathcal W$ itself. Writing $\Delta$ for the expression derived above, we therefore have:
$$\begin{aligned} \Delta & = \mathbb E \left[\phi(x^{(i)})^T\mathcal W \cdot \mathcal W^T \phi(x^{(j)})\right] \\ & = [\phi(x^{(i)})]^T \cdot \mathbb E[\mathcal W \cdot \mathcal W^T] \cdot \phi(x^{(j)}) \end{aligned}$$
Looking at $\mathbb E[\mathcal W \cdot \mathcal W^T]$, it is in fact:
$$\begin{aligned} \mathbb E[\mathcal W \cdot \mathcal W^T] & = \mathbb E \left[(\mathcal W - 0) \cdot (\mathcal W^T - 0)\right] \\ & = \mathbb E\left\{[\mathcal W - \mathbb E[\mathcal W]] \cdot [\mathcal W - \mathbb E[\mathcal W]]^T\right\} \\ & = Cov(\mathcal W,\mathcal W) \\ & = \Sigma_{prior} \end{aligned}$$
Thus the covariance $Cov \left[f(x^{(i)}),f(x^{(j)})\right]$ between $f(x^{(i)})$ and $f(x^{(j)})$ is:
$$\begin{aligned} Cov\left[f(x^{(i)}),f(x^{(j)})\right] & = [\phi(x^{(i)})]_{1 \times q}^T \cdot [\Sigma_{prior}]_{q \times q} \cdot [\phi(x^{(j)})]_{q \times 1} \\ & = \mathcal K(x^{(i)},x^{(j)}) \end{aligned}$$
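
The identity $Cov[f(x^{(i)}),f(x^{(j)})] = [\phi(x^{(i)})]^T \Sigma_{prior}\, \phi(x^{(j)})$ can also be checked numerically. The sketch below samples $\mathcal W \sim \mathcal N(0,\Sigma_{prior})$ many times and compares the empirical covariance of $f(x^{(i)}) = \phi(x^{(i)})^T\mathcal W$ and $f(x^{(j)})$ with the analytic value; the feature map, the prior covariance, and the two inputs are hypothetical choices.

```python
import numpy as np

# Monte Carlo check of Cov[f(x_i), f(x_j)] = phi(x_i)^T Sigma_prior phi(x_j).
# phi, Sigma_prior, and the two inputs are illustrative assumptions.

def phi(x):
    return np.array([1.0, x, x ** 2])             # hypothetical feature map, q = 3

rng = np.random.default_rng(2)
Sigma_prior = np.diag([1.0, 0.5, 0.25])           # prior covariance of W
x_i, x_j = 0.7, -1.2

W = rng.multivariate_normal(np.zeros(3), Sigma_prior, size=200_000)  # samples of W
f_i = W @ phi(x_i)                                # f(x_i) = phi(x_i)^T W per sample
f_j = W @ phi(x_j)

empirical = np.cov(f_i, f_j)[0, 1]                # sample covariance of f(x_i), f(x_j)
analytic = phi(x_i) @ Sigma_prior @ phi(x_j)      # phi(x_i)^T Sigma_prior phi(x_j)
print(empirical, analytic)                        # the two values should be close
```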

Aside: Proving the Necessity Condition for the Notation $\mathcal K$ to Be a Kernel Function

Expanding $Cov\left[f(x^{(i)}),f(x^{(j)})\right]$ further
(the end of the weight-space article gave the sufficiency proof for the 'notation function' $\mathcal K(\cdot,\cdot)$; the necessity proof is supplemented here; note that the expansion is over the components $\phi_k(x^{(i)})$ of the transformed features, since $\phi(x^{(i)}) \in \mathbb R^q$):
$$\begin{aligned} Cov\left[f(x^{(i)}),f(x^{(j)})\right] & = (\phi_1(x^{(i)}),\phi_2(x^{(i)}),\cdots,\phi_q(x^{(i)}))\begin{pmatrix} \Sigma_{prior}^{11},\Sigma_{prior}^{12},\cdots,\Sigma_{prior}^{1q} \\ \Sigma_{prior}^{21},\Sigma_{prior}^{22},\cdots,\Sigma_{prior}^{2q} \\ \vdots \\ \Sigma_{prior}^{q1},\Sigma_{prior}^{q2},\cdots,\Sigma_{prior}^{qq} \end{pmatrix}\begin{pmatrix} \phi_1(x^{(j)}) \\ \phi_2(x^{(j)}) \\ \vdots \\ \phi_q(x^{(j)}) \end{pmatrix} \quad \Sigma_{prior}^{kl} = Cov(w_k,w_l);\; w_k,w_l \in \mathcal W \\ & = \left[\sum_{k=1}^q \phi_k(x^{(i)})\Sigma_{prior}^{k1},\cdots,\sum_{k=1}^q \phi_k(x^{(i)})\Sigma_{prior}^{kq}\right]\begin{pmatrix} \phi_1(x^{(j)}) \\ \phi_2(x^{(j)}) \\ \vdots \\ \phi_q(x^{(j)}) \end{pmatrix} \\ & = \sum_{l=1}^q\sum_{k=1}^q \phi_k(x^{(i)}) \cdot \Sigma_{prior}^{kl} \cdot \phi_l(x^{(j)}) \end{aligned}$$
Since $\phi_k(x^{(i)})$, $\Sigma_{prior}^{kl}$, and $\phi_l(x^{(j)})$ are all scalars, and $\Sigma_{prior}^{kl} = \Sigma_{prior}^{lk}$ by the symmetry of the covariance matrix, we have:
$$\begin{aligned} & \sum_{l=1}^q\sum_{k=1}^q \phi_k(x^{(i)}) \cdot \Sigma_{prior}^{kl} \cdot \phi_l(x^{(j)}) = \sum_{l=1}^q\sum_{k=1}^q \phi_l(x^{(j)}) \cdot \Sigma_{prior}^{lk} \cdot \phi_k(x^{(i)}) \\ & \Rightarrow Cov \left[f(x^{(i)}),f(x^{(j)})\right] = Cov \left[f(x^{(j)}),f(x^{(i)})\right] \\ & \Rightarrow \mathcal K(x^{(i)},x^{(j)}) = \mathcal K(x^{(j)},x^{(i)}) \end{aligned}$$
This shows that the kernel (Gram) matrix $\mathbb K$ is a real symmetric matrix. Combined with the positive semi-definiteness of $\Sigma_{prior}$, it is also positive semi-definite: for any $c \in \mathbb R^N$, $c^T \mathbb K c = \left(\sum_{i} c_i \phi(x^{(i)})\right)^T \Sigma_{prior} \left(\sum_{j} c_j \phi(x^{(j)})\right) \geq 0$.
$$\mathbb K = \begin{bmatrix} \mathcal K(x^{(1)},x^{(1)}),\mathcal K(x^{(1)},x^{(2)}),\cdots,\mathcal K(x^{(1)},x^{(N)}) \\ \mathcal K(x^{(2)},x^{(1)}),\mathcal K(x^{(2)},x^{(2)}),\cdots,\mathcal K(x^{(2)},x^{(N)}) \\ \vdots \\ \mathcal K(x^{(N)},x^{(1)}),\mathcal K(x^{(N)},x^{(2)}),\cdots,\mathcal K(x^{(N)},x^{(N)}) \end{bmatrix}_{N \times N}$$
This establishes that the notation $\mathcal K$ is a positive-definite kernel function.
For the necessity proof of positive-definite kernel functions, see the linked reference.
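
As a quick numerical illustration of this property, the sketch below builds the Gram matrix from $\mathcal K(x,x') = [\phi(x)]^T\Sigma_{prior}\,\phi(x')$ for a handful of inputs and checks that it is symmetric with non-negative eigenvalues. The feature map, prior covariance, and inputs are again illustrative assumptions.

```python
import numpy as np

# Numerical illustration: the Gram matrix built from
# K(x, x') = phi(x)^T Sigma_prior phi(x') is symmetric positive semi-definite.
# phi, Sigma_prior, and the inputs are illustrative assumptions.

def phi(x):
    return np.array([1.0, x, x ** 2])             # hypothetical feature map, q = 3

Sigma_prior = np.diag([1.0, 0.5, 0.25])
X = np.array([-2.0, -0.5, 0.3, 1.1, 2.4])         # N = 5 inputs

Phi = np.stack([phi(x) for x in X])               # N x q design matrix
K = Phi @ Sigma_prior @ Phi.T                     # N x N Gram matrix

print(np.allclose(K, K.T))                        # True: K is symmetric
print(np.linalg.eigvalsh(K) >= -1e-10)            # all eigenvalues >= 0 (up to rounding)
```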

Back to the Main Thread

Given $Cov\left[f(x^{(i)}),f(x^{(j)})\right] = \mathcal K(x^{(i)},x^{(j)})$, if the collection $\{f(x)\}_{x \in \mathbb R^p} = \{f(x^{(1)}),f(x^{(2)}),\cdots\}$ is itself regarded as a set of random variables, then the covariances within this set of random variables can be expressed by the kernel function.

Recall the defining expression of a Gaussian process: $\{\xi_t\}_{t \in \mathcal T} \sim \mathcal G\mathcal P[m(t),\mathcal K(t,s)] \; (s,t \in \mathcal T)$, where $s,t$ are not themselves random variables; they are merely indices describing states/times in the continuous domain and bear no functional relation to the random variable $\xi$. The Gaussian process definition can therefore be written in the following two parallel forms:
$$\begin{cases} \{f(\mathcal X)\}_{\mathcal X \in \mathbb R^p} \sim \mathcal G\mathcal P[m(\mathcal X),\mathcal K(x^{(i)},x^{(j)})] \quad x^{(i)},x^{(j)} \in \mathcal X \\ \{\xi_t\}_{t \in \mathcal T} \sim \mathcal G\mathcal P[m(t),\mathcal K(t,s)] \quad (s,t \in \mathcal T) \end{cases}$$

Summary

Comparing the two expressions of a Gaussian process:

  • t t t ξ t \xi_t ξt之间不存在关联关系,只是一个下标的表示;而 X \mathcal X X f ( X ) f(\mathcal X) f(X)之间存在明确的函数关系
  • ξ t \xi_t ξt表示连续域 T \mathcal T T t t t时刻的一个高维随机变量;而 f ( X ) f(\mathcal X) f(X)表示 p p p维实数域 R p \mathbb R^p Rp中某随机变量 X \mathcal X X对应的高维随机变量
  • 均值函数、方差函数:这里以方差函数为例,它们均表示连续域中随机变量集合的核矩阵
    K ( s , t ) ⇒ [ K ( ξ t 1 , ξ t 1 ) , K ( ξ t 1 , ξ t 2 ) , ⋯   , K ( ξ t 1 , ξ t n ) K ( ξ t 2 , ξ t 1 ) , K ( ξ t 2 , ξ t 2 ) , ⋯   , K ( ξ t 2 , ξ t n ) ⋮ K ( ξ t n , ξ t 1 ) , K ( ξ t n , ξ t 2 ) , ⋯   , K ( ξ t n , ξ t n ) ] n × n s , t ∈ { t 1 , t 2 , ⋯   , t n } K ( x ( i ) , x ( j ) ) ⇒ [ K ( x ( 1 ) , x ( 1 ) ) , K ( x ( 1 ) , x ( 2 ) ) , ⋯   , K ( x ( 1 ) , x ( N ) ) K ( x ( 2 ) , x ( 1 ) ) , K ( x ( 2 ) , x ( 2 ) ) , ⋯   , K ( x ( 2 ) , x ( N ) ) ⋮ K ( x ( N ) , x ( 1 ) ) , K ( x ( N ) , x ( 2 ) ) , ⋯   , K ( x ( N ) , x ( N ) ) ] N × N x ( i ) , x ( j ) ∈ X \begin{aligned} \mathcal K(s,t) & \Rightarrow \begin{bmatrix} \mathcal K(\xi_{t_1},\xi_{t_1}),\mathcal K(\xi_{t_1},\xi_{t_2}),\cdots,\mathcal K(\xi_{t_1},\xi_{t_n}) \\ \mathcal K(\xi_{t_2},\xi_{t_1}),\mathcal K(\xi_{t_2},\xi_{t_2}),\cdots,\mathcal K(\xi_{t_2},\xi_{t_n}) \\ \vdots \\ \mathcal K(\xi_{t_n},\xi_{t_1}),\mathcal K(\xi_{t_n},\xi_{t_2}),\cdots,\mathcal K(\xi_{t_n},\xi_{t_n}) \\ \end{bmatrix}_{n \times n} \quad s,t \in \{t_1,t_2,\cdots,t_n\} \\ \mathcal K(x^{(i)},x^{(j)}) & \Rightarrow \begin{bmatrix} \mathcal K(x^{(1)},x^{(1)}),\mathcal K(x^{(1)},x^{(2)}),\cdots,\mathcal K(x^{(1)},x^{(N)}) \\ \mathcal K(x^{(2)},x^{(1)}),\mathcal K(x^{(2)},x^{(2)}),\cdots,\mathcal K(x^{(2)},x^{(N)}) \\ \vdots \\ \mathcal K(x^{(N)},x^{(1)}),\mathcal K(x^{(N)},x^{(2)}),\cdots,\mathcal K(x^{(N)},x^{(N)}) \\ \end{bmatrix}_{N \times N} \quad x^{(i)},x^{(j)} \in \mathcal X \end{aligned} K(s,t)K(x(i),x(j))K(ξt1,ξt1),K(ξt1,ξt2),,K(ξt1,ξtn)K(ξt2,ξt1),K(ξt2,ξt2),,K(ξt2,ξtn)K(ξtn,ξt1),K(ξtn,ξt2),,K(ξtn,ξtn)n×ns,t{t1,t2,,tn}K(x(1),x(1)),K(x(1),x(2)),,K(x(1),x(N))K(x(2),x(1)),K(x(2),x(2)),,K(x(2),x(N))K(x(N),x(1)),K(x(N),x(2)),,K(x(N),x(N))N×Nx(i),x(j)X

For the task of predicting the output of a given sample $\hat x$:

  • The weight-space perspective focuses on the model parameters $\mathcal W$; the prediction task is expressed as:
    $$\mathcal P(\hat y \mid \hat x,Data) = \int_{\mathcal W \mid Data} \mathcal P(\hat y \mid \mathcal W,\hat x) \cdot \mathcal P(\mathcal W \mid Data)\, d\mathcal W$$
  • The function-space perspective focuses on $f(\mathcal X)$ itself, treating $f(\mathcal X) = [\phi(\mathcal X)]^T \mathcal W$ as the random variable; the prediction task is expressed as:
    $$\mathcal P(\hat y \mid Data,\hat x) = \int_{f(\mathcal X)} \mathcal P(\hat y \mid f(\mathcal X),\hat x) \cdot \mathcal P[f(\mathcal X) \mid Data]\, df(\mathcal X)$$

The core difference between the function-space perspective and the weight-space perspective lies in how $\mathcal K(x^{(i)},x^{(j)})$ is represented.

  • The weight-space perspective first maps $x^{(i)},x^{(j)} \to \phi(x^{(i)}),\phi(x^{(j)})$, then re-specifies the prior distribution $\mathcal P(\mathcal W) \to \mathcal N(0,\Sigma_{prior})$ of $\mathcal W$ according to the dimensionality of the transformed samples, assembles the expression into the form $\mathcal K(x^{(i)},x^{(j)}) = [\phi(x^{(i)})]^T\Sigma_{prior}\,\phi(x^{(j)})$, and solves for the posterior distribution $\mathcal P(\mathcal W \mid Data)$ of $\mathcal W$.
  • The function-space perspective represents $\mathcal K(x^{(i)},x^{(j)})$ directly as $Cov[f(x^{(i)}),f(x^{(j)})]$, so there is no need to solve for $\mathcal W$ separately; it suffices to work directly with $f(x^{(i)}) = [\phi(x^{(i)})]^T\mathcal W$ and $f(x^{(j)}) = [\phi(x^{(j)})]^T\mathcal W$. In the prediction task, $[\phi(x)]^T\mathcal W$ takes the place of $\mathcal W$ itself; a sketch of this function-space prediction is given below.
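
As a concluding sketch, here is a minimal function-space implementation of GP regression: the predictive distribution of $f(\hat x)$ is obtained purely from kernel evaluations, without ever forming $\phi$ or $\mathcal W$ explicitly. The RBF kernel, the toy data, and the noise level are illustrative assumptions rather than choices made in the original text.

```python
import numpy as np

# Function-space GP regression sketch: prediction uses only kernel evaluations,
# never the feature map phi or the weights W. The RBF kernel, the toy data,
# and the noise level are illustrative assumptions.

def rbf_kernel(Xa, Xb, length_scale=1.0):
    d2 = (Xa[:, None] - Xb[None, :]) ** 2
    return np.exp(-0.5 * d2 / length_scale ** 2)

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=25)                   # training inputs
Y = np.sin(X) + 0.1 * rng.standard_normal(25)     # noisy training targets
X_star = np.linspace(-3, 3, 5)                    # test inputs
sigma2 = 0.1 ** 2                                 # noise variance

K = rbf_kernel(X, X)                              # K(X, X)
K_s = rbf_kernel(X_star, X)                       # K(X*, X)
K_ss = rbf_kernel(X_star, X_star)                 # K(X*, X*)
A = np.linalg.inv(K + sigma2 * np.eye(len(X)))    # [K(X, X) + sigma^2 I]^{-1}

mu_star = K_s @ A @ Y                             # predictive mean of f(X*)
cov_star = K_ss - K_s @ A @ K_s.T                 # predictive covariance of f(X*)
# To predict y_hat instead of the noise-free f(X*), add sigma^2 to the diagonal.

print(mu_star)
print(np.diag(cov_star))
```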

Reference:
Machine Learning, Gaussian Process Regression: From Weight-Space To Function-Space
