在空间中,我们定义分类的线性函数为:
g
(
x
)
=
w
T
x
+
b
g(x)=w^{T}x+b
g(x)=wTx+b
其中样本
x
=
(
x
1
,
x
2
,
.
.
.
,
x
l
)
T
x=(x_{1},x_{2},...,x_{l})^{T}
x=(x1,x2,...,xl)T,权向量
w
=
(
w
1
,
w
2
,
.
.
.
,
w
l
)
T
w=(w_{1},w_{2},...,w_{l})^{T}
w=(w1,w2,...,wl)T,偏移量是
b
b
b。
上图展示了权向量和样本的关系,由公式:
cos
<
w
,
x
i
>
=
w
⋅
x
i
∥
w
∥
∥
x
i
∥
\cos{<w,x_{i}>}=\frac{w \cdot x_{i}}{\left \| w \right \|\left \| x_{i}\right \|}
cos<w,xi>=∥w∥∥xi∥w⋅xi
可得到
x
i
x_{i}
xi到
w
w
w方向上投影为:
cos
<
w
,
x
i
>
∥
x
i
∥
=
w
⋅
x
i
∥
w
∥
\cos{<w,x_{i}>\left \| x_{i}\right \|}=\frac{w \cdot x_{i}}{\left \| w \right \|}
cos<w,xi>∥xi∥=∥w∥w⋅xi
分类的超平面(hyperplane)方程为
g
(
x
)
=
w
T
x
+
b
=
0
g(x)=\bm{w}^{T}\bm{x}+b=0
g(x)=wTx+b=0
此时所有在超平面上的样本到
w
w
w方向上的投影长度为:
∣
w
T
x
∣
∣
∣
w
∣
∣
=
∣
−
w
0
∣
∥
w
∥
\frac{|\bm{w}^{T}\bm{x}|}{||\bm{w}||}=\frac{|-\bm{w}_{0}| }{\left \| \bm{w} \right \|}
∣∣w∣∣∣wTx∣=∥w∥∣−w0∣
因为投影是垂直的,所以超平面与权向量
w
w
w垂直。
推导样本到超平面的距离:
取任意一点超平面右侧的
x
\bm{x}
x,
x
p
\bm{x}_{p}
xp为
x
\bm{x}
x在超平面上的投影,
z
z
z为投影:
w
T
x
p
+
b
=
0
z
=
w
T
(
x
−
x
p
)
∣
∣
w
∣
∣
=
w
T
x
+
b
∣
∣
w
∣
∣
\bm{w^{T}x_{p}}+b=0\\ z=\frac{\bm{w}^{T}(x-x_{p})}{||\bm{w}||}=\frac{\bm{w}^{T}\bm{x}+b}{||\bm{w}||}
wTxp+b=0z=∣∣w∣∣wT(x−xp)=∣∣w∣∣wTx+b
同理,设
x
\bm{x}
x在超平面左侧,
x
p
\bm{x}_{p}
xp为
x
\bm{x}
x在超平面上的投影,
z
z
z为投影:
z
=
w
T
(
x
p
−
x
)
∣
∣
w
∣
∣
=
−
(
w
T
x
+
b
)
∣
∣
w
∣
∣
z=\frac{\bm{w}^{T}(x_{p}-x)}{||\bm{w}||}=\frac{-(\bm{w}^{T}\bm{x}+b)}{||\bm{w}||}
z=∣∣w∣∣wT(xp−x)=∣∣w∣∣−(wTx+b)
x
\bm{x}
x到超平面的距离为:
∣
z
∣
=
∣
±
(
w
T
x
+
b
)
∣
∣
∣
w
∣
∣
=
∣
g
(
x
)
∣
∣
∣
w
∣
∣
|z|=\frac{|\pm(\bm{w}^{T}x+b)|}{||\bm{w}||}=\frac{|g(\bm{x})|}{||\bm{w}||}
∣z∣=∣∣w∣∣∣±(wTx+b)∣=∣∣w∣∣∣g(x)∣
g
(
x
)
=
w
T
x
+
b
=
∣
∣
w
∣
∣
z
g(\bm{x})=\bm{w}^{T}\bm{x}+b=||\bm{w}||z
g(x)=wTx+b=∣∣w∣∣z
线性分类函数
g
(
x
)
g(x)
g(x)刻画了样本与超平面之间的距离。
从鸢尾花数据集中挑选山鸢尾(iris-Setosa)和变色鸢尾(iris-Versicolor) 两种花的信息作为测试数据。出于可视化的原因,只考虑数据集中萼片长度(sepla length)和花瓣长度(petal length)这两个特征。
import pandas as pd
df = pd.read_csv(r'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data')
df.tail()
5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa | |
---|---|---|---|---|---|
144 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
145 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
146 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
147 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
148 | 5.9 | 3.0 | 5.1 | 1.8 | Iris-virginica |
y = df.iloc[0:99, 4].values
y
import numpy as np
import matplotlib.pyplot as plt
y = np.where(y == 'Iris-setosa', -1, 1)
x = df.iloc[0: 99, [0, 2]].values
plt.scatter(x[:49, 0], x[:49, 1], color='red', marker='o', label='setosa')
plt.scatter(x[49:99, 0], x[49: 99, 1], color='blue', marker='x', label='versicolor')
plt.xlabel('petal length')
plt.ylabel('sepal length')
plt.legend(loc='upper left')
plt.show()
这里需要定义出一条线性回归线用于分类,需要再图中定义分类的线性函数:
权值向量为
w
=
[
w
0
w
1
w
2
]
w= \begin{bmatrix} w_{0}\\ w_{1}\\ w_{2} \end{bmatrix}
w=⎣⎡w0w1w2⎦⎤,
x
=
[
1
x
1
x
2
]
x= \begin{bmatrix} 1\\ x_{1}\\ x_{2} \end{bmatrix}
x=⎣⎡1x1x2⎦⎤
z
=
w
0
+
x
1
w
1
+
x
2
w
2
=
w
T
x
z=w_{0}+x_{1}w_{1}+x_{2}w_{2}=w^{T}x
z=w0+x1w1+x2w2=wTx
标签判定函数:
ϕ
(
z
)
=
{
1
,
z
≥
0
−
1
,
z
<
0
\phi (z)=\left\{\begin{matrix} 1, z\geq 0 \\ -1, z<0 \end{matrix}\right.
ϕ(z)={1,z≥0−1,z<0
误差函数:
J
(
w
)
=
1
2
n
∑
i
(
y
i
−
z
i
)
2
J(w)=\frac{1}{2n}\sum_{i}(y^{i}-z^{i})^{2}
J(w)=2n1i∑(yi−zi)2
求解偏导数:
∂
J
(
w
)
∂
w
j
=
1
2
n
∑
i
2
(
y
i
−
z
i
)
∂
(
y
i
−
z
i
)
w
j
=
−
1
n
∑
i
(
y
i
−
z
i
)
x
j
i
\frac{\partial{J(w)}}{\partial w_{j}}=\frac{1}{2n}\sum_{i}2(y^{i}-z^{i})\frac{\partial (y^{i}-z^{i})}{w_{j}}=-\frac{1}{n}\sum_{i}(y^{i}-z^{i})x^{i}_{j}
∂wj∂J(w)=2n1i∑2(yi−zi)wj∂(yi−zi)=−n1i∑(yi−zi)xji
更新权重:
w
j
:
=
w
j
−
η
∂
J
(
w
)
∂
w
j
,
j
=
0
,
1
,
2
w_{j}:=w_{j}-\eta \frac{\partial{J(w)}}{\partial w_{j}},j=0,1,2
wj:=wj−η∂wj∂J(w),j=0,1,2
采用批量梯度下降进行优化:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
class AdalineBGD:
def __init__(self, eta=0.01, n_iter=50):
self.eta = eta
self.n_iter = n_iter
def fit(self, X, y):
self.w_ = np.zeros(X.shape[1]+1)
self.J_ = []
for _ in range(self.n_iter):
z = np.dot(X, self.w_[1:]) + self.w_[0]
self.w_[0] += self.eta * (y - z).sum()
self.w_[1:] += self.eta * np.dot(X.T, (y - z))
self.J_.append(((y-z)**2).sum()*0.5)
return self
def predict(self, X):
return np.where(np.dot(X, self.w_[1:]) + self.w_[0] >= 0.0, 1, -1)
def plot_decision_regions(X, y, classifier, ax):
# setup marker generator and color map
markers = ('o', 'x')
colors = ('blue', 'red')
cmap = ListedColormap(colors[:len(np.unique(y))])
# plot the decision region
x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, 0.02), np.arange(x2_min, x2_max, 0.02))
z = classifier.predict(np.array([xx1.flatten(), xx2.flatten()]).T)
z = z.reshape(xx1.shape)
plt.contourf(xx1, xx2, z, cmap=cmap, alpha=0.3)
plt.xlim(x1_min, x1_max)
plt.ylim(x2_min, x2_max)
# plot class samples
for idx, cl in enumerate(np.unique(y)):
ax[1].scatter(x=X[y == cl, 0], y=X[y == cl, 1], cmap=cmap(idx), marker=markers[idx], label=cl, alpha=1)
ax[1].legend(loc='best')
ax[1].set_title('Adaline-梯度下降法', fontproperties='SimHei')
ax[1].set_xlabel('经标准化处理的萼片宽度', fontproperties='SimHei')
ax[1].set_ylabel('经标准化处理的花瓣宽度', fontproperties='SimHei')
df = pd.read_csv(r'http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data')
y = df.iloc[:100, 4].values
y = np.where(y == 'Iris-setosa', -1, 1)
X = df.iloc[:100, [0, 2]].values
X_std = X.copy()
X_std[:, 0] = (X[:, 0] - X[:, 0].mean()) / X[:, 0].std()
X_std[:, 1] = (X[:, 1] - X[:, 1].mean()) / X[:, 1].std()
fig1, ax1 = plt.subplots(1, 2, figsize=(8, 4))
demo = AdalineBGD(n_iter=20, eta=0.01).fit(X_std, y)
plot_decision_regions(X_std, y, demo, ax1)
ax1[0].plot(range(1, len(demo.J_)+1), np.log10(demo.J_), marker='o')
ax1[0].set_title('学习速率0.01', fontproperties='SimHei')
ax1[0].set_xlabel('迭代次数', fontproperties='SimHei')
ax1[0].set_ylabel('log(均方根误差J)', fontproperties='SimHei')
plt.show()