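Machine Learning Notes on Kernel Methods: Proof of the Necessary and Sufficient Condition for Positive Definite Kernels
Introduction
The previous section introduced the idea behind kernel methods and kernel functions. This section proves the necessary and sufficient condition for a function to be a positive definite kernel.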
Review: kernel functions and positive definite kernels
First, a kernel is a function that takes a pair of samples from the $p$-dimensional input space and maps them to a single real number:
$$\kappa: \mathcal X \times \mathcal X \to \mathbb R, \quad \kappa(x^{(i)},x^{(j)}) \in \mathbb R \quad \forall x^{(i)},x^{(j)} \in \mathcal X;\; x^{(i)},x^{(j)} \in \mathbb R^p$$
A positive definite kernel is a kernel that, in addition, equals the inner product between $\phi(x^{(i)}),\phi(x^{(j)})$, the images of the samples $x^{(i)},x^{(j)}$ in a higher-dimensional feature space reached through a (generally nonlinear) map $\phi$:
$$\begin{aligned} \kappa(x^{(i)},x^{(j)}) & = \left\langle\phi(x^{(i)}),\phi(x^{(j)})\right\rangle \\ & = \left[\phi(x^{(i)})\right]^T \phi(x^{(j)}) \quad x^{(i)},x^{(j)} \in \mathcal X \end{aligned}$$
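As a concrete illustration of this definition, the sketch below uses the homogeneous quadratic kernel $\kappa(x,z) = (x^Tz)^2$ on $\mathbb R^2$. This kernel and its feature map $\phi(x) = (x_1^2, \sqrt 2 x_1x_2, x_2^2)$ are an assumed example, not something taken from the notes. The code evaluates the kernel once in the input space and once as an inner product in the feature space.

```python
import math


def kernel(x, z):
    """Quadratic kernel kappa(x, z) = (x^T z)^2 (assumed example kernel)."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit feature map R^2 -> R^3 with <phi(x), phi(z)> = (x^T z)^2."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Standard inner product of two equally long real vectors."""
    return sum(a * b for a, b in zip(u, v))


x_i = (1.0, 2.0)
x_j = (3.0, -1.0)

lhs = kernel(x_i, x_j)         # evaluated directly in the input space
rhs = dot(phi(x_i), phi(x_j))  # evaluated as an inner product in feature space
print(lhs, rhs)                # the two values agree up to rounding
```

The point of the example is that the left-hand side never builds $\phi$ explicitly, which is what makes kernels useful when the feature space is very high-dimensional.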
Properties of positive definite kernels
- Symmetry: a positive definite kernel takes two samples as input, and the order in which they are passed does not change the result: $\kappa(x^{(i)},x^{(j)}) = \kappa(x^{(j)},x^{(i)})$
- Positive semi-definiteness: if $\kappa(\cdot,\cdot)$ is a positive definite kernel, then for any $N$ samples $x^{(1)},x^{(2)},\cdots,x^{(N)} \in \mathcal X$ drawn from the sample set $\mathcal X$, the corresponding kernel matrix $\mathcal K$ is always positive semi-definite. The kernel matrix $\mathcal K$ is:
$$\mathcal K = \begin{bmatrix} \kappa(x^{(1)},x^{(1)}) & \kappa(x^{(1)},x^{(2)}) & \cdots & \kappa(x^{(1)},x^{(N)}) \\ \kappa(x^{(2)},x^{(1)}) & \kappa(x^{(2)},x^{(2)}) & \cdots & \kappa(x^{(2)},x^{(N)}) \\ \vdots & \vdots & \ddots & \vdots \\ \kappa(x^{(N)},x^{(1)}) & \kappa(x^{(N)},x^{(2)}) & \cdots & \kappa(x^{(N)},x^{(N)}) \end{bmatrix}_{N \times N}$$
Taken together, these two properties are also the necessary and sufficient condition for a function to be a positive definite kernel.
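The two properties can be checked numerically on a small sample. The sketch below is a minimal illustration that assumes a Gaussian (RBF) kernel with a hypothetical parameter `gamma`; neither is prescribed by the notes. It builds the kernel matrix, tests symmetry, and evaluates $\alpha^T\mathcal K\alpha$ for many random $\alpha$. Random trials only give evidence of positive semi-definiteness, not a proof.

```python
import math
import random


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, used here as an assumed example kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_matrix(samples, kernel):
    """K[i][j] = kernel(x^(i), x^(j)) for every pair of samples."""
    return [[kernel(xi, xj) for xj in samples] for xi in samples]


def quadratic_form(K, alpha):
    """alpha^T K alpha written out as a double sum."""
    n = len(alpha)
    return sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))


random.seed(0)
N = 5
samples = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(N)]
K = kernel_matrix(samples, rbf_kernel)

# Property 1: symmetry of the kernel matrix.
is_symmetric = all(abs(K[i][j] - K[j][i]) < 1e-12 for i in range(N) for j in range(N))

# Property 2: alpha^T K alpha >= 0, sampled over random alpha.
min_form = min(
    quadratic_form(K, [random.gauss(0, 1) for _ in range(N)]) for _ in range(1000)
)
print(is_symmetric, min_form >= -1e-12)
```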
Proof of the necessary and sufficient condition
What must be shown (necessity direction): given that $\kappa(x^{(i)},x^{(j)})$ is a positive definite kernel, prove that the corresponding kernel matrix $\mathcal K$ is positive semi-definite and that $\kappa(x^{(i)},x^{(j)})$ is symmetric.
Proof of symmetry
By the definition of a positive definite kernel:
$$\kappa(x^{(i)},x^{(j)}) = \left\langle\phi(x^{(i)}),\phi(x^{(j)})\right\rangle = \left[\phi(x^{(i)})\right]^T\phi(x^{(j)})$$
Swapping the order of the two arguments gives:
$$\kappa(x^{(j)},x^{(i)}) = \left\langle\phi(x^{(j)}),\phi(x^{(i)})\right\rangle = \left[\phi(x^{(j)})\right]^T\phi(x^{(i)})$$
Since the inner product of real vectors is commutative, it follows that:
$$\begin{aligned} \left[\phi(x^{(j)})\right]^T\phi(x^{(i)}) & = \left[\phi(x^{(i)})\right]^T\phi(x^{(j)}) \\ \kappa(x^{(i)},x^{(j)}) & = \kappa(x^{(j)},x^{(i)}) \end{aligned}$$
Therefore the positive definite kernel $\kappa(\cdot,\cdot)$ is symmetric.
Positive semi-definiteness: proof of necessity
Recall the necessary and sufficient condition for a symmetric square matrix $\mathcal A_{N \times N}$ to be positive semi-definite: for every $N$-dimensional vector $\alpha$, $\alpha^T\mathcal A \alpha \geq 0$ holds.
Define the vector $\alpha$ as:
$$\alpha = (\alpha^{(1)},\alpha^{(2)},\cdots,\alpha^{(N)})^T$$
Now consider $\alpha^T\mathcal K\alpha$:
$$\alpha^T\mathcal K\alpha = (\alpha^{(1)},\alpha^{(2)},\cdots,\alpha^{(N)})_{1 \times N} \begin{bmatrix} \kappa(x^{(1)},x^{(1)}) & \kappa(x^{(1)},x^{(2)}) & \cdots & \kappa(x^{(1)},x^{(N)}) \\ \kappa(x^{(2)},x^{(1)}) & \kappa(x^{(2)},x^{(2)}) & \cdots & \kappa(x^{(2)},x^{(N)}) \\ \vdots & \vdots & \ddots & \vdots \\ \kappa(x^{(N)},x^{(1)}) & \kappa(x^{(N)},x^{(2)}) & \cdots & \kappa(x^{(N)},x^{(N)}) \end{bmatrix}_{N\times N} \begin{pmatrix}\alpha^{(1)} \\ \alpha^{(2)} \\ \vdots \\ \alpha^{(N)}\end{pmatrix}_{N \times 1}$$
From the shapes above ($1 \times N$ times $N \times N$ times $N \times 1$), the result of $\alpha^T\mathcal K\alpha$ is a single real number. Expanding $\alpha^T\mathcal K\alpha$ further:
$$\begin{aligned} \alpha^T\mathcal K\alpha & = \left[\alpha^{(1)}\cdot \kappa(x^{(1)},x^{(1)}) + \cdots +\alpha^{(N)} \cdot \kappa(x^{(N)},x^{(1)}),\; \cdots,\; \alpha^{(1)}\cdot \kappa(x^{(1)},x^{(N)}) + \cdots +\alpha^{(N)} \cdot \kappa(x^{(N)},x^{(N)})\right] \begin{pmatrix}\alpha^{(1)} \\ \alpha^{(2)} \\ \vdots \\ \alpha^{(N)}\end{pmatrix} \\ & = \left[\sum_{i=1}^N \alpha^{(i)} \cdot \kappa(x^{(i)},x^{(1)}),\; \cdots,\; \sum_{i=1}^N\alpha^{(i)} \cdot \kappa(x^{(i)},x^{(N)})\right]\begin{pmatrix}\alpha^{(1)} \\ \alpha^{(2)} \\ \vdots \\ \alpha^{(N)}\end{pmatrix} \\ & = \alpha^{(1)} \cdot \sum_{i=1}^N \alpha^{(i)} \cdot \kappa(x^{(i)},x^{(1)}) + \cdots + \alpha^{(N)} \cdot \sum_{i=1}^N\alpha^{(i)} \cdot \kappa(x^{(i)},x^{(N)}) \\ & = \sum_{i=1}^N\sum_{j=1}^N \alpha^{(i)}\alpha^{(j)} \kappa(x^{(i)},x^{(j)}) \end{aligned}$$
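To make the double sum concrete, here is the same expansion written out for the smallest nontrivial case, $N = 2$. It is only a worked instance of the formula above and introduces no new symbols.

```latex
\begin{aligned}
\alpha^T\mathcal K\alpha
& = \sum_{i=1}^2\sum_{j=1}^2 \alpha^{(i)}\alpha^{(j)} \kappa(x^{(i)},x^{(j)}) \\
& = \left[\alpha^{(1)}\right]^2 \kappa(x^{(1)},x^{(1)})
  + \alpha^{(1)}\alpha^{(2)} \kappa(x^{(1)},x^{(2)})
  + \alpha^{(2)}\alpha^{(1)} \kappa(x^{(2)},x^{(1)})
  + \left[\alpha^{(2)}\right]^2 \kappa(x^{(2)},x^{(2)})
\end{aligned}
```

When $\kappa$ is symmetric, the two cross terms are equal and combine into $2\alpha^{(1)}\alpha^{(2)}\kappa(x^{(1)},x^{(2)})$.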
Since $\kappa$ is a positive definite kernel, $\kappa(x^{(i)},x^{(j)}) = \left[\phi(x^{(i)})\right]^T\phi(x^{(j)})$. Substitute this into the double sum. Neither $\alpha^{(i)}$ nor $\left[\phi(x^{(i)})\right]^T$ depends on $j$, so with respect to the sum over $j$ both are constants and can be moved in front of it:
$$\begin{aligned} \alpha^T\mathcal K\alpha & = \sum_{i=1}^N\sum_{j=1}^N \alpha^{(i)}\alpha^{(j)} \left[\phi(x^{(i)})\right]^T\phi(x^{(j)}) \\ & = \sum_{i=1}^N \alpha^{(i)}\left[\phi(x^{(i)})\right]^T \sum_{j=1}^N \alpha^{(j)} \phi(x^{(j)}) \end{aligned}$$
Next, $\alpha^{(i)}$ is one element of the vector $\alpha = (\alpha^{(1)},\cdots,\alpha^{(N)})^T_{N \times 1}$, so it is a scalar. The summation only adds the vectors $\alpha^{(i)}\phi(x^{(i)})$ term by term and does not change their dimension, and the transpose is linear, so the sum of the transposes equals the transpose of the sum. The expression can therefore be written as:
$$\begin{aligned} \alpha^T\mathcal K\alpha & = \left[\sum_{i=1}^N\alpha^{(i)}\phi(x^{(i)})\right]^T \left[\sum_{j=1}^N\alpha^{(j)}\phi(x^{(j)})\right] \\ & = \left\langle\sum_{i=1}^N\alpha^{(i)}\phi(x^{(i)}),\sum_{j=1}^N\alpha^{(j)}\phi(x^{(j)})\right\rangle \end{aligned}$$
Although two different index symbols $i,j$ are used to run over $1,2,\cdots,N$, both sums add up the same terms $\alpha^{(\cdot)}\phi(x^{(\cdot)})$ (with $\cdot$ standing for $i$ or $j$), so:
$$\sum_{i=1}^N\alpha^{(i)}\phi(x^{(i)}) = \sum_{j=1}^N\alpha^{(j)}\phi(x^{(j)})$$
In other words, the vectors $\sum_{i=1}^N\alpha^{(i)}\phi(x^{(i)})$ and $\sum_{j=1}^N\alpha^{(j)}\phi(x^{(j)})$ coincide, so the angle between them is $\theta = 0$ and $\cos\theta = 1$ (if the vector is zero, the inner product is simply $0$). By the definition of the inner product:
$$\begin{aligned} & \left\langle\sum_{i=1}^N\alpha^{(i)}\phi(x^{(i)}),\sum_{j=1}^N\alpha^{(j)}\phi(x^{(j)})\right\rangle \\ & = \left\|\sum_{i=1}^N\alpha^{(i)}\phi(x^{(i)})\right\| \cdot \left\|\sum_{j=1}^N\alpha^{(j)}\phi(x^{(j)})\right\| \cos \theta \\ & = \left\|\sum_{i=1}^N\alpha^{(i)}\phi(x^{(i)})\right\|^2 \geq 0 \end{aligned}$$
Since $\alpha$ was arbitrary, this proves that $\mathcal K$ is a positive semi-definite matrix.
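The identity at the heart of this argument, $\alpha^T\mathcal K\alpha = \|\sum_i \alpha^{(i)}\phi(x^{(i)})\|^2$, can be checked numerically whenever the feature map is known explicitly. The sketch below assumes the quadratic kernel $(x^Tz)^2$ on $\mathbb R^2$ with feature map $\phi(x) = (x_1^2, \sqrt 2 x_1x_2, x_2^2)$. The samples and the vector `alpha` are arbitrary illustrative values, not data from the notes.

```python
import math


def phi(x):
    """Feature map of the quadratic kernel (x^T z)^2 on R^2 (assumed example)."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Standard inner product of two equally long real vectors."""
    return sum(a * b for a, b in zip(u, v))


samples = [(1.0, 2.0), (3.0, -1.0), (0.5, 0.5)]
alpha = [2.0, -1.0, 4.0]
n = len(samples)

# Kernel matrix built from inner products in the feature space.
K = [[dot(phi(xi), phi(xj)) for xj in samples] for xi in samples]

# Left-hand side: alpha^T K alpha as a double sum.
lhs = sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))

# Right-hand side: squared norm of u = sum_i alpha^(i) * phi(x^(i)).
features = [phi(x) for x in samples]
u = [sum(alpha[i] * features[i][d] for i in range(n)) for d in range(3)]
rhs = dot(u, u)

print(lhs, rhs)  # equal up to rounding, and never negative
```

Because the right-hand side is a sum of squares, it cannot be negative for any choice of `alpha`, which is exactly the necessity claim.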
Positive semi-definiteness: proof of sufficiency
What must be shown: given that the kernel matrix $\mathcal K$ is positive semi-definite (and $\kappa$ is symmetric), prove that $\kappa(x^{(i)},x^{(j)})$ is a positive definite kernel.
Proof:
Because $\kappa$ is symmetric, $\mathcal K$ is a real symmetric matrix, so it has $N$ orthonormal (hence linearly independent) eigenvectors and admits the eigendecomposition:
$$\mathcal K = \mathcal V\Lambda\mathcal V^T$$
Because $\mathcal K$ is positive semi-definite, every eigenvalue satisfies $\lambda_i \geq 0$. The matrices involved are defined as:
$$\mathcal V = (v_1,v_2,\cdots,v_N)_{N\times N} \quad \Lambda = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{pmatrix}_{N \times N}$$
Note: each $\lambda_i(i=1,2,\cdots,N)$ is a scalar, and each $v_i(i=1,2,\cdots,N)$ is an $N \times 1$ column vector. The matrix $\mathcal K$ can then be written as:
$$\begin{aligned} \mathcal K & = (v_1,v_2,\cdots,v_N) \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_N \end{pmatrix}\begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_N^T\end{pmatrix} \\ & = (\lambda_1 v_1,\lambda_2v_2,\cdots,\lambda_N v_N)\begin{pmatrix} v_1^T \\ v_2^T \\ \vdots \\ v_N^T\end{pmatrix}\\ & = \lambda_1v_1v_1^T + \lambda_2v_2v_2^T + \cdots +\lambda_Nv_Nv_N^T \\ & = \sum_{i=1}^N\lambda_i v_i v_i^T \end{aligned}$$
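A small worked instance may help here. The $2 \times 2$ matrix below is an assumed example, chosen because its eigenpairs are easy to verify by hand. It shows the sum of rank-one terms $\lambda_i v_i v_i^T$ reproducing $\mathcal K$.

```latex
\mathcal K = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \quad
\lambda_1 = 3,\; v_1 = \frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
\lambda_2 = 1,\; v_2 = \frac{1}{\sqrt 2}\begin{pmatrix} 1 \\ -1 \end{pmatrix}

\begin{aligned}
\lambda_1 v_1 v_1^T + \lambda_2 v_2 v_2^T
& = \frac{3}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
  + \frac{1}{2}\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} \\
& = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \mathcal K
\end{aligned}
```

Both eigenvalues are non-negative, consistent with this $\mathcal K$ being positive semi-definite.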
The eigendecomposition thus gives a description of $\mathcal K$ in terms of $\lambda$ and $v$. Before writing $\mathcal K$ out entry by entry, first describe $v_i$. The vector $v_i$ is the $i$-th eigenvector, and its $N$ components are indexed by the $N$ chosen samples: in the construction that follows, $v_i$ collects (up to a scale factor) the $i$-th feature coordinate of each of the $N$ samples. Each component $v_i^{(k)}(k=1,2,\cdots,N)$ is a single real number:
$$v_i = (v_i^{(1)},v_i^{(2)},\cdots,v_i^{(N)})^T_{N \times 1}$$
The positive semi-definite matrix $\mathcal K$ can then be written out as follows. Each $\lambda_i$ is a scalar, so it is simply multiplied into every entry of the outer product $v_iv_i^T$:
$$\begin{aligned} \mathcal K &= \sum_{i=1}^N\lambda_i v_i v_i^T \\ & = \sum_{i=1}^N\lambda_i \begin{pmatrix}v_i^{(1)}\\v_i^{(2)}\\ \vdots \\ v_i^{(N)}\end{pmatrix}(v_i^{(1)},v_i^{(2)},\cdots,v_i^{(N)}) \\ & = \begin{pmatrix} \sum_{i=1}^N \lambda_iv_i^{(1)}v_i^{(1)} & \sum_{i=1}^N \lambda_iv_i^{(1)}v_i^{(2)} & \cdots & \sum_{i=1}^N \lambda_iv_i^{(1)}v_i^{(N)} \\ \sum_{i=1}^N \lambda_iv_i^{(2)}v_i^{(1)} & \sum_{i=1}^N \lambda_iv_i^{(2)}v_i^{(2)} & \cdots & \sum_{i=1}^N \lambda_iv_i^{(2)}v_i^{(N)} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^N \lambda_iv_i^{(N)}v_i^{(1)} & \sum_{i=1}^N \lambda_iv_i^{(N)}v_i^{(2)} & \cdots & \sum_{i=1}^N \lambda_iv_i^{(N)}v_i^{(N)} \end{pmatrix}_{N \times N} \end{aligned}$$
For convenience, rewrite each entry of the matrix above. Take the entry in row $j$, column $k$, namely $\sum_{i=1}^N \lambda_iv_i^{(j)}v_i^{(k)}$, as an example.
Collect the $j$-th components of all eigenvectors into the vector $v^{(j)} = (v_1^{(j)},v_2^{(j)},\cdots,v_N^{(j)})^T$ (the $j$-th row of $\mathcal V$ written as a column), and let $\Lambda^{1/2}$ be the diagonal matrix with entries $\sqrt{\lambda_1},\cdots,\sqrt{\lambda_N}$. This is well defined because $\mathcal K$ is positive semi-definite, so every $\lambda_i \geq 0$. Each $\lambda_i$ is a scalar, so it can be split as $\sqrt{\lambda_i}\sqrt{\lambda_i}$ with one factor attached to each component:
$$\begin{aligned} \sum_{i=1}^N \lambda_iv_i^{(j)}v_i^{(k)} & = \sum_{i=1}^N \sqrt{\lambda_i}\sqrt{\lambda_i}\,v_i^{(j)}v_i^{(k)} \\ & = \sum_{i=1}^N \left(\sqrt{\lambda_i}v_i^{(j)}\right)\left(\sqrt{\lambda_i}v_i^{(k)}\right) \\ & = \left(\Lambda^{1/2}v^{(j)}\right)^T\left(\Lambda^{1/2}v^{(k)}\right) \\ & = \left[v^{(j)}\right]^T \Lambda\, v^{(k)} \end{aligned}$$
Let $\phi(x^{(j)}) = \Lambda^{1/2}v^{(j)}$ and $\phi(x^{(k)}) = \Lambda^{1/2}v^{(k)}$, that is, the $i$-th coordinate of $\phi(x^{(j)})$ is $\sqrt{\lambda_i}v_i^{(j)}$. Then:
$$\begin{aligned} \kappa(x^{(j)},x^{(k)}) & = \sum_{i=1}^N \lambda_i v_i^{(j)}v_i^{(k)} \\ & = \left[v^{(j)}\right]^T \Lambda\, v^{(k)} \\ & = \left(\Lambda^{1/2}v^{(j)}\right)^T\left(\Lambda^{1/2}v^{(k)}\right)\\ & = \left[\phi(x^{(j)})\right]^T\phi(x^{(k)}) \end{aligned}$$
This gives a feature map on the $N$ chosen samples whose inner products reproduce every entry of $\mathcal K$.
Q.E.D.
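The construction in the sufficiency proof can be replayed numerically: take a kernel matrix, eigendecompose it, set the $i$-th coordinate of $\phi(x^{(j)})$ to $\sqrt{\lambda_i}v_i^{(j)}$, and check that the inner products reproduce $\mathcal K$. The sketch below is a minimal illustration under stated assumptions. The RBF kernel, its `gamma`, and the sample points are hypothetical choices, and `jacobi_eigh` is a small hand-written helper (cyclic Jacobi rotations) so the block needs only the standard library. In practice a library routine such as `numpy.linalg.eigh` would normally be used instead.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, used here as an assumed example kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def jacobi_eigh(A, sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a real symmetric matrix,
    computed with cyclic Jacobi rotations (hypothetical helper for this sketch)."""
    n = len(A)
    a = [row[:] for row in A]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-24:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]: tan(2*theta) = 2*a_pq / diff.
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # a <- a J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # a <- J^T a (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J (accumulate eigenvectors)
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], V


samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]
n = len(samples)
K = [[rbf_kernel(xi, xj) for xj in samples] for xi in samples]
lam, V = jacobi_eigh(K)

# phi(x^(j)) has i-th coordinate sqrt(lambda_i) * v_i^(j); V[j][i] stores v_i^(j).
# Tiny negative eigenvalues caused by rounding are clamped to zero.
Phi = [[math.sqrt(max(lam[i], 0.0)) * V[j][i] for i in range(n)] for j in range(n)]

# Largest deviation between <phi(x^(j)), phi(x^(k))> and K[j][k].
max_err = max(
    abs(sum(Phi[j][i] * Phi[k][i] for i in range(n)) - K[j][k])
    for j in range(n)
    for k in range(n)
)
print(min(lam) >= -1e-12, max_err)
```

The feature vectors in `Phi` are only defined for the sampled points, which matches the scope of the finite-sample construction in the proof.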
This concludes the material on kernel functions. The next section turns to Gaussian networks in probabilistic graphical models.