Learning Strategies
Non-probabilistic approach
Search the hypothesis space $\mathcal H$ for the hypothesis $h$ that minimizes the generalization error.
Generalization error:
$$\int_{\mathcal X \times \mathcal Y} L(y, h(\boldsymbol x))\,p(\boldsymbol x, y)\,d\boldsymbol x\,dy .$$
That is, the expected value of the loss, under the assumption that a pattern and its label follow some joint probability distribution. This assumption matters because the same pattern can sometimes carry different labels, which is a matter of randomness.
The error measured on a test set converges in probability to the generalization error as the test-set size grows.
Probabilistic approach
Discriminative methods: learn $p(y \mid \boldsymbol x)$ directly.
Generative methods: learn the conditional distribution via
$$p(y \mid \boldsymbol x) = \frac{p(y, \boldsymbol x)}{p(\boldsymbol x)} = \frac{p(\boldsymbol x \mid y)p(y)}{p(\boldsymbol x)} .$$
On a given sample set:
  Maximum a posteriori (MAP): maximize $p(y \mid \boldsymbol x) \propto p(\boldsymbol x \mid y)p(y)$; this requires knowing the distribution of $y$.
  Maximum likelihood estimation (MLE): maximize $p(\boldsymbol x \mid y)$.
In practice, simply work with the $\log$-likelihood.
Naming:
$$\mathrm{posterior} = \frac{\mathrm{likelihood} \cdot \mathrm{prior}}{\mathrm{evidence}}$$
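A minimal numerical sketch of the MLE/MAP distinction (all numbers below are invented for illustration): two classes emit coin flips with different head probabilities. MLE picks the class maximizing $p(\boldsymbol x \mid y)$ alone, while MAP also weights by the prior $p(y)$, which can flip the decision when one class is rare.

```python
import numpy as np

# Hypothetical setup: class y emits i.i.d. Bernoulli(theta[y]) flips;
# class 1 is rare (prior 0.05).
theta = {0: 0.3, 1: 0.8}
prior = {0: 0.95, 1: 0.05}

def log_likelihood(x, y):
    # log p(x | y) for i.i.d. Bernoulli observations
    k, n = np.sum(x), len(x)
    return k * np.log(theta[y]) + (n - k) * np.log(1 - theta[y])

x = np.array([1, 1, 0, 1, 1])  # observed flips: 4 heads, 1 tail

# MLE: argmax_y p(x | y)
mle = max(theta, key=lambda y: log_likelihood(x, y))
# MAP: argmax_y p(x | y) p(y) -- needs the distribution of y
map_ = max(prior, key=lambda y: log_likelihood(x, y) + np.log(prior[y]))
print(mle, map_)
```

Here MLE selects class 1 (four heads are far more likely under $\theta = 0.8$), while MAP selects class 0: the prior penalty $\log 0.05$ outweighs the likelihood gap.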
Algebra
Some background facts
- $|AB| = |A||B|$

  Proof: expanding along $k$ rows by the Laplace theorem gives $\begin{vmatrix} A & \boldsymbol 0 \\ -I & B \end{vmatrix} = |A||B|$. Since addition-type row operations do not change the determinant, applying them to this determinant gives $\begin{vmatrix} A & \boldsymbol 0 \\ -I & B \end{vmatrix} = \begin{vmatrix} \boldsymbol 0 & AB \\ -I & B \end{vmatrix} = |AB|{(-1)}^{\sum_{i=1}^{2n} i}{(-1)}^n = |AB|{(-1)}^{2n^2+2n} = |AB|$, which completes the proof.
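The identity is easy to sanity-check numerically; a quick sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# |AB| = |A||B|, up to floating-point error
lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
print(np.isclose(lhs, rhs))
```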
- $rank(A) + rank(B) - n \le rank(AB) \le \min\{rank(A), rank(B)\}$

  Proof: for the second inequality, $AB = (\alpha_1, \cdots, \alpha_n)B = (\sum_i b_{i1}\alpha_i, \cdots, \sum_i b_{in}\alpha_i)$, so every column of the product is a linear combination of $(\alpha_1, \cdots, \alpha_n)$ and can be expressed by a maximal linearly independent subset of them; hence $rank(AB) \le rank(A)$. Similarly, $rank(AB) = rank(B^TA^T) \le rank(B^T) = rank(B)$. The first inequality is Sylvester's inequality, the special case $B = I_n$ of the Frobenius inequality $rank(AB) + rank(BC) - rank(B) \le rank(ABC)$. To prove it, note $rank(AB) + n = rank\begin{pmatrix} I_n & \boldsymbol 0 \\ \boldsymbol 0 & AB \end{pmatrix}$ (count the nonzero rows after reducing each block to echelon form). Applying addition, scaling, and row/column swap operations to $P = \begin{pmatrix} I_n & \boldsymbol 0 \\ \boldsymbol 0 & AB \end{pmatrix}$ yields $\begin{pmatrix} B & I_n \\ \boldsymbol 0 & A \end{pmatrix}$, and $rank\begin{pmatrix} B & I_n \\ \boldsymbol 0 & A \end{pmatrix} \ge rank(A) + rank(B)$, because this matrix contains at least one nonzero minor of order $rank(A) + rank(B)$.
- $rank(A) + \dim(null(A)) = n$
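Both bounds can be checked numerically on random low-rank matrices; a sketch (the factor shapes are chosen so that the ranks are 4 and 3 with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, 4)) @ rng.standard_normal((4, n))  # rank 4
B = rng.standard_normal((n, 3)) @ rng.standard_normal((3, n))  # rank 3

rA = np.linalg.matrix_rank(A)
rB = np.linalg.matrix_rank(B)
rAB = np.linalg.matrix_rank(A @ B)

# Sylvester lower bound and the min upper bound
print(rA + rB - n <= rAB <= min(rA, rB))
```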
- $rank(A) = rank(A^TA) = rank(AA^T)$

  Proof: a homogeneous linear system always has the zero solution, so there are only two cases: the unique zero solution or infinitely many solutions; in the latter case, solution sets still differ in the dimension of the solution space spanned by a fundamental system of solutions (also called the null space of the coefficient matrix, $null(A)$). $Ax = \boldsymbol 0$ and $A^TAx = \boldsymbol 0$ have exactly the same solution space, so the first equality holds; to show that every solution of $A^TAx = \boldsymbol 0$ also solves $Ax = \boldsymbol 0$, left-multiply both sides of $A^TAx = \boldsymbol 0$ by $x^T$ (giving $\|Ax\|^2 = 0$). Then $rank(A) = rank(A^T) = rank(AA^T)$, so the second equality holds.
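A quick numerical check of both equalities (a sketch with a random rank-3 matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 7))  # rank 3

r = np.linalg.matrix_rank
print(r(A), r(A.T @ A), r(A @ A.T))  # all three agree
```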
- If $B$ is invertible, then $rank(AB) = rank(A)$

  Proof: from the above, $rank(AB) \le rank(A)$; and since $A = ABB^{-1}$, we have $rank(A) = rank(ABB^{-1}) \le rank(AB)$. Done.
- SVD: $A = U\Sigma V^T$ (for complex matrices, $V^H$ denotes the conjugate transpose)
SVD for image compression
In this example the compression ratio is about 36%: keeping 5 singular triplets of a 28×28 image stores $5 \cdot (28 + 28 + 1) = 285$ of 784 numbers.
```python
import numpy as np
import matplotlib.pyplot as plt
import keras

mnist = keras.datasets.mnist
(x_train, y_train), (x_valid, y_valid) = mnist.load_data()
A = x_train[0]                       # a 28x28 grayscale digit
u, s, vh = np.linalg.svd(A)

# Rank-5 approximation: keep only the 5 largest singular triplets
B = np.zeros(shape=A.shape)
for i in range(5):
    B += s[i] * np.outer(u[:, i], vh[i, :])

# Full reconstruction from all singular triplets
C = u @ np.diag(s) @ vh

plt.subplot(131); plt.title('original'); plt.imshow(A)
plt.subplot(132); plt.title('partial');  plt.imshow(B)
plt.subplot(133); plt.title('complete'); plt.imshow(C)
plt.show()

# Spectral norm (matrix 2-norm) of the approximation error,
# i.e. the largest singular value of A - B
print(np.linalg.norm(A - B, 2))
```
Decision Trees
Information gain
```cpp
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>
#include <cmath>
using namespace std;

enum { color, root, sound, texture, umbilical, touch };

class Object
{
protected:
    vector<int> _feature;
    bool _label = false;
public:
    int num_feature = 0;
    const vector<int>& feature() const { return _feature; }
    bool label() const { return _label; }
};

class Cucumber : public Object
{
public:
    Cucumber(vector<int> feature, bool label)
    {
        _feature = move(feature);
        _label = label;
        num_feature = (int)_feature.size() - 1;  // slot 0 is the sample id
    }
};

class Entropy
{
public:
    // H(S) = -w1*log2(w1) - w2*log2(w2) over the two label classes
    static double calculate(const vector<Cucumber>& v)
    {
        if (v.empty()) {
            throw runtime_error("no object");
        }
        int positive = 0, negative = 0;
        for (const auto& i : v) {
            if (i.label()) positive++;
            else negative++;
        }
        int total = (int)v.size();
        double w1 = positive * 1.0 / total, w2 = negative * 1.0 / total;
        if (0 == w1) return -w2 * log2(w2);
        if (0 == w2) return -w1 * log2(w1);
        return -w1 * log2(w1) - w2 * log2(w2);
    }
};

class Classify
{
public:
    // Partition the samples by the value of one feature column
    static vector<vector<Cucumber>> do_classify(const vector<Cucumber>& v, int rank_feature)
    {
        if (v.empty()) {
            throw runtime_error("no object");
        }
        if (v[0].num_feature < rank_feature) {
            throw out_of_range("rank is out of range");
        }
        vector<vector<Cucumber>> res;
        for (const auto& i : v) {
            while ((int)res.size() <= i.feature()[rank_feature]) {
                res.emplace_back();
            }
            res[i.feature()[rank_feature]].push_back(i);
        }
        return res;
    }
};

class InformationGain
{
public:
    // Gain(S, a) = H(S) - sum_v |S_v|/|S| * H(S_v)
    static double calculate(const vector<vector<Cucumber>>& v, int total, double entropy_s)
    {
        double res = 0;
        for (const auto& i : v) {
            if (i.empty()) continue;  // skip feature values that never occur
            res += double(i.size()) * Entropy::calculate(i) / total;
        }
        return entropy_s - res;
    }
};

int main()
{
    vector<Cucumber> v;
    v.push_back(Cucumber({ 1, 0, 0, 0, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 2, 1, 0, 1, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 3, 1, 0, 0, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 4, 0, 0, 1, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 5, 2, 0, 0, 0, 0, 0 }, true));
    v.push_back(Cucumber({ 6, 0, 1, 0, 0, 1, 1 }, true));
    v.push_back(Cucumber({ 7, 1, 1, 0, 1, 1, 1 }, true));
    v.push_back(Cucumber({ 8, 1, 1, 0, 0, 1, 0 }, true));
    v.push_back(Cucumber({ 9, 1, 1, 1, 1, 1, 0 }, false));
    v.push_back(Cucumber({ 10, 0, 2, 2, 0, 2, 1 }, false));
    v.push_back(Cucumber({ 11, 2, 2, 2, 2, 2, 0 }, false));
    v.push_back(Cucumber({ 12, 2, 0, 0, 2, 2, 1 }, false));
    v.push_back(Cucumber({ 13, 0, 1, 0, 1, 0, 0 }, false));
    v.push_back(Cucumber({ 14, 2, 1, 1, 1, 0, 0 }, false));
    v.push_back(Cucumber({ 15, 1, 1, 0, 0, 1, 1 }, false));
    v.push_back(Cucumber({ 16, 2, 0, 0, 2, 2, 0 }, false));
    v.push_back(Cucumber({ 17, 0, 0, 1, 1, 1, 0 }, false));
    double entropy_s = Entropy::calculate(v);
    for (int i = 1; i < 7; i++) {
        vector<vector<Cucumber>> tmp = Classify::do_classify(v, i);
        double ig = InformationGain::calculate(tmp, 17, entropy_s);
        printf("%d : %.3f\n", i, ig);
    }
    cin.get();
}
```
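As a cross-check, the same information-gain computation can be sketched in Python (the array below mirrors the table built in main: column 0 is the sample id, columns 1-6 the features, the last column the label):

```python
import numpy as np

# Same 17 samples as in main(); last column is the label (1 = positive)
data = np.array([
    [1,0,0,0,0,0,0,1],[2,1,0,1,0,0,0,1],[3,1,0,0,0,0,0,1],
    [4,0,0,1,0,0,0,1],[5,2,0,0,0,0,0,1],[6,0,1,0,0,1,1,1],
    [7,1,1,0,1,1,1,1],[8,1,1,0,0,1,0,1],[9,1,1,1,1,1,0,0],
    [10,0,2,2,0,2,1,0],[11,2,2,2,2,2,0,0],[12,2,0,0,2,2,1,0],
    [13,0,1,0,1,0,0,0],[14,2,1,1,1,0,0,0],[15,1,1,0,0,1,1,0],
    [16,2,0,0,2,2,0,0],[17,0,0,1,1,1,0,0],
])
X, y = data[:, 1:7], data[:, 7]

def entropy(labels):
    # H(S) for a binary label vector
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def information_gain(col):
    # Gain(S, a) = H(S) - sum_v |S_v|/|S| * H(S_v)
    h = entropy(y)
    for v in np.unique(X[:, col]):
        mask = X[:, col] == v
        h -= mask.mean() * entropy(y[mask])
    return h

for col in range(6):
    print(col + 1, round(information_gain(col), 3))
```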