Preface

These are notes from the September 17 computer vision basics lecture, an introduction to machine learning, in two parts:

- Week 2 homework;
- machine learning.

1. Week 2 Homework

Generate 10 images corresponding to the digits 0-9. Extract a feature x from each image, then use a discriminator f(x) to decide the output label y.
import torch

def generate_data():
    '''
    Generate the image matrices for the ten digits 0-9.
    '''
    image_data = []
    num_0 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,1,0,0,1,0],
         [0,1,0,0,1,0],
         [0,1,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_0)
    num_1 = torch.tensor(
        [[0,0,0,1,0,0],
         [0,0,1,1,0,0],
         [0,0,0,1,0,0],
         [0,0,0,1,0,0],
         [0,0,1,1,1,0],
         [0,0,0,0,0,0]])
    image_data.append(num_1)
    num_2 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,1,0,0,1,0],
         [0,0,0,1,0,0],
         [0,0,1,0,0,0],
         [0,1,1,1,1,0],
         [0,0,0,0,0,0]])
    image_data.append(num_2)
    num_3 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,0,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_3)
    num_4 = torch.tensor(
        [[0,0,0,0,1,0],
         [0,0,0,1,1,0],
         [0,0,1,0,1,0],
         [0,1,1,1,1,1],
         [0,0,0,0,1,0],
         [0,0,0,0,0,0]])
    image_data.append(num_4)
    num_5 = torch.tensor(
        [[0,1,1,1,0,0],
         [0,1,0,0,0,0],
         [0,1,1,1,0,0],
         [0,0,0,0,1,0],
         [0,1,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_5)
    num_6 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,1,0,0,0,0],
         [0,1,1,1,0,0],
         [0,1,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_6)
    num_7 = torch.tensor(
        [[0,1,1,1,1,0],
         [0,0,0,0,1,0],
         [0,0,0,1,0,0],
         [0,0,0,1,0,0],
         [0,0,0,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_7)
    num_8 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,1,0,0,1,0],
         [0,0,1,1,0,0],
         [0,1,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_8)
    num_9 = torch.tensor(
        [[0,0,1,1,1,0],
         [0,1,0,0,1,0],
         [0,0,1,1,1,0],
         [0,0,0,0,1,0],
         [0,0,0,0,1,0],
         [0,0,0,0,0,0]])
    image_data.append(num_9)
    return image_data
import matplotlib.pyplot as plt  # only needed if you also want to plot the digits

image_data = generate_data()
print(image_data[0])

>>> tensor([[0, 0, 1, 1, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0]])
# Extract a feature: the column sums of the image
def get_feature(x):
    return torch.sum(x, 0)

print(get_feature(image_data[2]))
print(get_feature(image_data[8]))

>>> tensor([0, 2, 3, 3, 2, 0])
>>> tensor([0, 2, 3, 3, 2, 0])

Note that the digits 2 and 8 produce exactly the same column-sum feature, so this feature cannot tell them apart.
def model(x, image_data):
    y = -1
    for i in range(10):
        diff_tmp = get_feature(x) - get_feature(image_data[i])
        if torch.sum(torch.abs(diff_tmp)) == 0:
            y = i
            break
    print("{} --recognized as--> {}".format(x, y))
    return y

model(image_data[8], image_data)

>>> tensor([[0, 0, 1, 1, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0]]) --recognized as--> 2

Because 2 and 8 share the same column-sum feature, the loop matches 2 first and the image of 8 is misclassified.
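The column-sum feature discards all row information, which is why 2 and 8 collide. One possible fix, sketched below as my own variation rather than part of the homework, is to concatenate the column sums and the row sums; `get_feature_2d` is a hypothetical helper name:

```python
import torch

def get_feature_2d(x):
    # Concatenate the column-sum and row-sum projections into one 12-dim feature
    return torch.cat((torch.sum(x, 0), torch.sum(x, 1)))

num_2 = torch.tensor(
    [[0,0,1,1,0,0],
     [0,1,0,0,1,0],
     [0,0,0,1,0,0],
     [0,0,1,0,0,0],
     [0,1,1,1,1,0],
     [0,0,0,0,0,0]])
num_8 = torch.tensor(
    [[0,0,1,1,0,0],
     [0,1,0,0,1,0],
     [0,0,1,1,0,0],
     [0,1,0,0,1,0],
     [0,0,1,1,0,0],
     [0,0,0,0,0,0]])

print(get_feature_2d(num_2))
print(get_feature_2d(num_8))
# The column sums are identical, but the row sums differ
# (the bottom stroke of 2 has 4 pixels, while every stroke of 8 has 2),
# so the combined feature separates the two digits.
```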
2. Machine Learning

2.1 Linear Regression Model

- Feature: $\mathbf{V} = [v_1, v_2, \dots, v_6]^T$;
- Discriminant: $y = \mathbf{W}\mathbf{V} = [\omega_1, \omega_2, \dots, \omega_6]\,[v_1, v_2, \dots, v_6]^T$;
- Loss: $\sum_{i}(\hat{y}_i - y_i)^2$;
- The weights minimizing the loss satisfy $\frac{\partial \sum_{i}(\hat{y}_i - y_i)^2}{\partial \omega_j} = 0$ for every $\omega_j$.

The code:
import torch

def generate_data():
    '''
    Generate the image matrices for the ten digits 0-9.
    :return: image_data, image_label
    '''
    image_data = []
    num_0 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,1,0,0,1,0],
         [0,1,0,0,1,0],
         [0,1,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_0)
    num_1 = torch.tensor(
        [[0,0,0,1,0,0],
         [0,0,1,1,0,0],
         [0,0,0,1,0,0],
         [0,0,0,1,0,0],
         [0,0,1,1,1,0],
         [0,0,0,0,0,0]])
    image_data.append(num_1)
    num_2 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,1,0,0,1,0],
         [0,0,0,1,0,0],
         [0,0,1,0,0,0],
         [0,1,1,1,1,0],
         [0,0,0,0,0,0]])
    image_data.append(num_2)
    num_3 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,0,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_3)
    num_4 = torch.tensor(
        [[0,0,0,0,1,0],
         [0,0,0,1,1,0],
         [0,0,1,0,1,0],
         [0,1,1,1,1,1],
         [0,0,0,0,1,0],
         [0,0,0,0,0,0]])
    image_data.append(num_4)
    num_5 = torch.tensor(
        [[0,1,1,1,0,0],
         [0,1,0,0,0,0],
         [0,1,1,1,0,0],
         [0,0,0,0,1,0],
         [0,1,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_5)
    num_6 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,1,0,0,0,0],
         [0,1,1,1,0,0],
         [0,1,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_6)
    num_7 = torch.tensor(
        [[0,1,1,1,1,0],
         [0,0,0,0,1,0],
         [0,0,0,1,0,0],
         [0,0,0,1,0,0],
         [0,0,0,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_7)
    num_8 = torch.tensor(
        [[0,0,1,1,0,0],
         [0,1,0,0,1,0],
         [0,0,1,1,0,0],
         [0,1,0,0,1,0],
         [0,0,1,1,0,0],
         [0,0,0,0,0,0]])
    image_data.append(num_8)
    num_9 = torch.tensor(
        [[0,0,1,1,1,0],
         [0,1,0,0,1,0],
         [0,0,1,1,1,0],
         [0,1,0,0,1,0],
         [0,0,0,0,1,0],
         [0,0,0,0,0,0]])
    image_data.append(num_9)
    image_label = [0,1,2,3,4,5,6,7,8,9]
    return image_data, image_label

# Extract a feature from an image
def get_feature(x):
    def get_shadow(x, dim):
        # Project the image by summing along dimension `dim`
        feature = torch.sum(x, dim).float()
        # Normalize by the total once (dividing inside a loop while
        # recomputing the sum, as the original did, corrupts the result)
        feature = feature / feature.sum()
        return feature.view(1, 6)
    return get_shadow(x, 0)

def model(feature, weights):
    # Decide the class of `feature`: append a constant 1 as the bias term,
    # then apply the linear map y = [feature, 1] . weights
    feature = torch.cat((feature, torch.tensor(1.0).view(1, 1)), 1)
    y = feature.mm(weights)
    return y

def train_model(image_data, image_label, weights):
    lr = 0.05
    for epoch in range(3000):
        loss = 0
        for i in range(len(image_data)):
            feature = get_feature(image_data[i])
            y = model(feature, weights)
            # Squared error between the prediction y and the label image_label[i]
            # (.item() extracts the Python float from the 1x1 tensor)
            loss += 0.5 * (y.item() - image_label[i]) ** 2
            # Gradient-descent update: w = w - lr * (y - label) * x
            feature = feature.view(6)
            err = y.item() - image_label[i]
            for j in range(6):
                weights[j, 0] = weights[j, 0] - lr * err * feature[j]
            weights[6, 0] = weights[6, 0] - lr * err  # bias term
    return weights

if __name__ == '__main__':
    # Random initial weights w0: 6 feature weights plus 1 bias
    weights = torch.rand(7, 1)
    image_data, image_label = generate_data()
    # Print the image of 0
    print("The image of digit 0 is:", image_data[0])
    print("-" * 20)
    # Print the image of 8
    print("The image of digit 8 is:", image_data[8])
    print("-" * 20)
    # Train the model
    weights = train_model(image_data, image_label, weights)
    # Test: classify every image
    print("Classifying each image")
    for i in range(len(image_data)):
        x = image_data[i]
        feature = get_feature(x)
        y = model(feature, weights)
        print("Image {} is classified as {}; its feature is {}".format(i, y, feature))
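The seven scalar weight updates in `train_model` amount to one outer-product step on the augmented feature vector. A minimal sketch of a single such step, where the feature values and label are made-up numbers for illustration:

```python
import torch

torch.manual_seed(0)
lr = 0.05
weights = torch.rand(7, 1)

# A made-up normalized 6-dim feature with a bias term 1 appended (hypothetical values)
feature = torch.tensor([[0.0, 0.2, 0.3, 0.3, 0.2, 0.0, 1.0]])
label = 8.0

y = feature.mm(weights)                   # prediction, shape (1, 1)
grad = (y.item() - label) * feature.t()   # d(0.5 * (y - label)^2) / d(weights)
weights = weights - lr * grad             # one gradient-descent step

# After the step, the prediction error should have shrunk
y_new = feature.mm(weights)
print(abs(y_new.item() - label) < abs(y.item() - label))
```

With a small enough learning rate the error shrinks by a constant factor per step, which is exactly what the element-wise loop in `train_model` computes.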
2.2 Logistic Regression Model

- The output is binary: 0 or 1.
- It is a linear model passed through a squashing function; equivalently, the log-odds are linear in the input.
- Sigmoid function:

$$\sigma(a) = \frac{1}{1+e^{-a}}$$

Its derivative is $\frac{\partial \sigma(a)}{\partial a} = \sigma(a)(1-\sigma(a))$.
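The derivative identity can be checked numerically against a finite difference; a quick sketch:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

a = 0.7
eps = 1e-6
# Centered finite-difference approximation of dσ/da
numeric = (sigmoid(a + eps) - sigmoid(a - eps)) / (2 * eps)
# The closed form σ(a)(1 - σ(a))
analytic = sigmoid(a) * (1 - sigmoid(a))
print(abs(numeric - analytic) < 1e-6)  # the two agree
```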
- Given a feature $\phi(\mathbf{x})$, logistic regression models the probability that an example belongs to class 1 as

$$p(t=1 \mid \mathbf{x}; \mathbf{w}) = \sigma(\mathbf{w}^T \phi(\mathbf{x}))$$

and defines the probability that it belongs to class 0 as

$$p(t=0 \mid \mathbf{x}; \mathbf{w}) = 1 - p(t=1 \mid \mathbf{x}; \mathbf{w}) = 1 - \sigma(\mathbf{w}^T \phi(\mathbf{x}))$$
- Maximum Likelihood Estimate (MLE): choose the parameters of the probability model that maximize the probability of the observed data:

$$L(\mathbf{w}) = \prod_{i=1}^{N} p(t^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}), \qquad
p(t^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}) = p(t=1 \mid \mathbf{x}^{(i)}; \mathbf{w})^{t^{(i)}}\, p(t=0 \mid \mathbf{x}^{(i)}; \mathbf{w})^{1-t^{(i)}}$$

We seek the $\mathbf{w}$ that maximizes the likelihood $L(\mathbf{w})$, or equivalently the log-likelihood $l(\mathbf{w}) = \log L(\mathbf{w})$. To simplify the derivation, set $\phi(\mathbf{x}) = \mathbf{x}$:

$$\begin{aligned}
\arg\max\ l(\mathbf{w}) &= \arg\max\ \log \prod_{i=1}^{N} p(t^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}) \\
&= \arg\max\ \sum_{i=1}^{N} \log p(t^{(i)} \mid \mathbf{x}^{(i)}; \mathbf{w}) \\
&= \arg\max\ \sum_{i=1}^{N} \log \left[ p(t=1 \mid \mathbf{x}^{(i)}; \mathbf{w})^{t^{(i)}}\, p(t=0 \mid \mathbf{x}^{(i)}; \mathbf{w})^{1-t^{(i)}} \right] \\
&= \arg\max\ \sum_{i=1}^{N} t^{(i)} \log p(t=1 \mid \mathbf{x}^{(i)}; \mathbf{w}) + (1-t^{(i)}) \log p(t=0 \mid \mathbf{x}^{(i)}; \mathbf{w}) \\
&= \arg\max\ \sum_{i=1}^{N} t^{(i)} \log \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) + (1-t^{(i)}) \log \left[ 1 - \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) \right]
\end{aligned}$$
To find the $\mathbf{w}$ that maximizes this expression, take its gradient with respect to $\mathbf{w}$, using the derivative identity $\frac{\partial \sigma(a)}{\partial a} = \sigma(a)(1-\sigma(a))$:

$$\begin{aligned}
\nabla_{\mathbf{w}} l(\mathbf{w}) &= \nabla_{\mathbf{w}} \sum_{i=1}^{N} t^{(i)} \log \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) + (1-t^{(i)}) \log \left[ 1 - \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) \right] \\
&= \sum_{i=1}^{N} t^{(i)} \frac{1}{\sigma(\mathbf{w}^T \mathbf{x}^{(i)})}\, \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) \bigl(1 - \sigma(\mathbf{w}^T \mathbf{x}^{(i)})\bigr) \mathbf{x}^{(i)} + (1-t^{(i)}) \frac{-1}{1 - \sigma(\mathbf{w}^T \mathbf{x}^{(i)})}\, \sigma(\mathbf{w}^T \mathbf{x}^{(i)}) \bigl(1 - \sigma(\mathbf{w}^T \mathbf{x}^{(i)})\bigr) \mathbf{x}^{(i)} \\
&= \sum_{i=1}^{N} t^{(i)} \bigl(1 - \sigma(\mathbf{w}^T \mathbf{x}^{(i)})\bigr) \mathbf{x}^{(i)} - (1-t^{(i)})\, \sigma(\mathbf{w}^T \mathbf{x}^{(i)})\, \mathbf{x}^{(i)} \\
&= \sum_{i=1}^{N} \bigl(t^{(i)} - \sigma(\mathbf{w}^T \mathbf{x}^{(i)})\bigr) \mathbf{x}^{(i)}
\end{aligned}$$
which gives

$$\nabla_{\mathbf{w}} l(\mathbf{w}) = \sum_{i=1}^{N} \bigl(t^{(i)} - \sigma(\mathbf{w}^T \mathbf{x}^{(i)})\bigr)\, \mathbf{x}^{(i)}$$

Setting $\nabla_{\mathbf{w}} l(\mathbf{w}) = 0$ yields no closed-form solution for $\mathbf{w}$: each $t^{(i)}$ is exactly 0 or 1, while the sigmoid only approaches those values asymptotically. We therefore maximize the log-likelihood iteratively by gradient ascent.
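Gradient ascent with the expression above is straightforward to sketch. The toy 1-D data, learning rate, and step count below are my own assumptions chosen for illustration, not values from the lecture:

```python
import torch

torch.manual_seed(0)

# Toy 1-D binary data (hypothetical): class 1 for larger x, class 0 for smaller x
x = torch.tensor([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
t = torch.tensor([[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]])
# Append a constant 1 column as the bias feature
X = torch.cat((x, torch.ones_like(x)), 1)

w = torch.zeros(2, 1)
lr = 0.1
for step in range(2000):
    p = torch.sigmoid(X.mm(w))   # p(t=1 | x; w) for every example
    grad = X.t().mm(t - p)       # the gradient: sum of (t - sigma(w^T x)) * x
    w = w + lr * grad            # gradient *ascent* on the log-likelihood

p = torch.sigmoid(X.mm(w))
print((p.round() == t).all().item())  # all training points classified correctly
```

Note the sign: because we ascend the log-likelihood rather than descend a loss, the update adds the gradient instead of subtracting it.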