In the second part of the exercise, we will improve the logistic regression algorithm by adding a regularization term. In short, regularization is a term in the cost function that biases the algorithm toward "simpler" models (in this case, models with smaller coefficients). In theory this helps reduce overfitting and improves the model's ability to generalize. Let's get started.
Imagine you are a production supervisor at a factory, and you have the results of two tests for a batch of microchips. Based on these two tests, you want to decide whether each chip should be accepted or rejected. To help you make this difficult decision, you have a dataset of test results from past chips, from which you can build a logistic regression model.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt
data2 = pd.read_csv(r'D:\yuxin\data_sets\ex2data2.txt', names=['Test 1', 'Test 2', 'Accepted'])  # raw string so the backslashes are not treated as escapes
data2.head()
| | Test 1 | Test 2 | Accepted |
|---|---|---|---|
| 0 | 0.051267 | 0.69956 | 1 |
| 1 | -0.092742 | 0.68494 | 1 |
| 2 | -0.213710 | 0.69225 | 1 |
| 3 | -0.375000 | 0.50219 | 1 |
| 4 | -0.513250 | 0.46564 | 1 |
X = data2.iloc[:,:-1]
y = data2.iloc[:,-1]
# Convert X and y to NumPy arrays before plotting
X = X.values
y = y.values
X.shape,y.shape
((118, 2), (118,))
# Plot the data to see its shape
fig, ax = plt.subplots(figsize=(8,5))
ax.scatter(X[y==1,0],X[y==1,1],c='b',marker='o',label='Accepted')
ax.scatter(X[y==0,0],X[y==0,1],c='r',marker='x',label='Rejected')
plt.grid()
plt.legend()
plt.show()
Notice that there is no linear decision boundary separating the positive and negative examples. Applying logistic regression directly to this dataset will therefore not perform well, because it can only find a linear decision boundary.
So we need a new approach.
Feature mapping
One way to fit the data better is to create more features from each data point.
We will map the features into all polynomial terms of x1 and x2 up to the sixth power.
def feature_mapping(x1, x2, power):
    data = {}
    for i in np.arange(power + 1):
        for p in np.arange(i + 1):
            data["f{}{}".format(i - p, p)] = np.power(x1, i - p) * np.power(x2, p)
    return pd.DataFrame(data)
X_new = feature_mapping(X[:,0],X[:,1],6)
X_new = X_new.values
After this mapping, the vector of two features is transformed into a 28-dimensional vector.
A logistic regression classifier trained on this higher-dimensional feature vector will have a more complex decision boundary, which appears nonlinear when drawn in our two-dimensional plot.
While the feature mapping allows us to build a more expressive classifier, it is also more prone to overfitting. In the next part of the exercise, we will implement regularized logistic regression to fit the data, and see how regularization helps combat the overfitting problem.
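The count of 28 features can be checked directly: the degree-6 mapping contains one column x1^(i-p) * x2^p for each pair 0 <= p <= i <= 6, which gives 1 + 2 + ... + 7 = 28 terms. A minimal standalone sketch (restating the `feature_mapping` function so it runs on its own, with arbitrary input values):

```python
import numpy as np
import pandas as pd

# Same mapping as above: one column x1^(i-p) * x2^p for each 0 <= p <= i <= power
def feature_mapping(x1, x2, power):
    data = {}
    for i in np.arange(power + 1):
        for p in np.arange(i + 1):
            data["f{}{}".format(i - p, p)] = np.power(x1, i - p) * np.power(x2, p)
    return pd.DataFrame(data)

# Arbitrary sample point, used only to count the generated columns
mapped = feature_mapping(np.array([0.5]), np.array([-0.5]), 6)
print(mapped.shape[1])               # 28
print(sum(i + 1 for i in range(7)))  # 28 = 1 + 2 + ... + 7
```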
def sigmod(z):
    s = 1/(1+np.exp(-z))
    return s
def costReg(theta, X, y, b):
    # b is the regularization parameter (lambda); theta[0] is not regularized
    first = np.sum((-y)*np.log(sigmod(X@theta)) - (1-y)*np.log(1-sigmod(X@theta)))
    second = (b/2)*np.sum(np.power(theta[1:], 2))
    return (1/len(X))*(first + second)
theta0 = np.zeros(X_new.shape[1])
X_new.shape,theta0.shape,y.shape
((118, 28), (28,), (118,))
costReg(theta0, X_new, y, b=1)
0.6931471805599453
def gradient(theta, X, y, b):
    first = (1/len(X))*(X.T @ (sigmod(X@theta) - y))
    second = (b/len(X))*theta
    second[0] = 0  # the intercept term is not regularized
    return first + second
gradient(theta0,X_new,y,b=1)
array([8.47457627e-03, 1.87880932e-02, 7.77711864e-05, 5.03446395e-02,
1.15013308e-02, 3.76648474e-02, 1.83559872e-02, 7.32393391e-03,
8.19244468e-03, 2.34764889e-02, 3.93486234e-02, 2.23923907e-03,
1.28600503e-02, 3.09593720e-03, 3.93028171e-02, 1.99707467e-02,
4.32983232e-03, 3.38643902e-03, 5.83822078e-03, 4.47629067e-03,
3.10079849e-02, 3.10312442e-02, 1.09740238e-03, 6.31570797e-03,
4.08503006e-04, 7.26504316e-03, 1.37646175e-03, 3.87936363e-02])
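Before handing the gradient to an optimizer, it is worth verifying it numerically. A finite-difference check (a sanity-check sketch on a small synthetic dataset, not the chip data) compares the analytic gradient against central differences of the cost:

```python
import numpy as np

def sigmod(z):
    return 1 / (1 + np.exp(-z))

def costReg(theta, X, y, b):
    h = sigmod(X @ theta)
    first = np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h))
    second = (b / 2) * np.sum(np.power(theta[1:], 2))
    return (first + second) / len(X)

def gradient(theta, X, y, b):
    first = (1 / len(X)) * (X.T @ (sigmod(X @ theta) - y))
    second = (b / len(X)) * theta
    second[0] = 0  # the intercept term is not regularized
    return first + second

# Small synthetic problem: 20 examples, intercept plus 3 random features
rng = np.random.default_rng(0)
X = np.c_[np.ones(20), rng.normal(size=(20, 3))]
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=4)

# Central differences along each coordinate direction
eps = 1e-6
num_grad = np.array([
    (costReg(theta + eps * e, X, y, 1) - costReg(theta - eps * e, X, y, 1)) / (2 * eps)
    for e in np.eye(4)
])
diff = np.max(np.abs(num_grad - gradient(theta, X, y, 1)))
print(diff)  # tiny: the two gradients agree to many decimal places
```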
# Use a built-in optimizer
b = 1
res = opt.minimize(fun=costReg, x0=theta0, args=(X_new, y, b), method='TNC', jac=gradient)
res
fun: 0.529002729964495
jac: array([-2.15061763e-06, 6.79397579e-07, -3.48884600e-07, 8.75991550e-07,
-4.08021607e-08, -9.33530867e-07, -5.14502133e-07, 1.70832730e-08,
1.54389530e-08, -9.72648414e-07, 6.96376475e-08, 3.55042094e-08,
-2.79579871e-07, 1.79653372e-07, 2.33231029e-07, 1.47166354e-07,
-2.11930372e-07, 6.16760286e-07, -9.27456537e-08, -5.27838491e-08,
-1.48156604e-06, 2.31246000e-07, 1.80345150e-07, -1.31591101e-07,
-7.17433602e-08, -4.12247296e-07, 1.65651868e-08, -7.34831726e-07])
message: 'Converged (|f_n-f_(n-1)| ~= 0)'
nfev: 32
nit: 7
status: 1
success: True
x: array([ 1.27271027, 0.62529965, 1.18111686, -2.01987399, -0.91743189,
-1.43166928, 0.12393227, -0.36553118, -0.35725404, -0.17516292,
-1.45817009, -0.05098418, -0.61558555, -0.27469166, -1.19271299,
-0.24217841, -0.206033 , -0.04466178, -0.27778949, -0.29539514,
-0.45645981, -1.04319155, 0.02779373, -0.29244867, 0.0155576 ,
-0.32742405, -0.1438915 , -0.92467487])
result2 = opt.fmin_tnc(func=costReg, x0=theta0, fprime=gradient, args=(X_new, y, 1))
result2[0]
array([ 1.27271027, 0.62529965, 1.18111686, -2.01987399, -0.91743189,
-1.43166928, 0.12393227, -0.36553118, -0.35725404, -0.17516292,
-1.45817009, -0.05098418, -0.61558555, -0.27469166, -1.19271299,
-0.24217841, -0.206033 , -0.04466178, -0.27778949, -0.29539514,
-0.45645981, -1.04319155, 0.02779373, -0.29244867, 0.0155576 ,
-0.32742405, -0.1438915 , -0.92467487])
# Predictions
theta = result2[0]
predict = sigmod(X_new @ theta)
pre = []
for i in predict:
    pre.append(1 if i > 0.5 else 0)
# Accuracy
correct = [1 if p == t else 0 for (p, t) in zip(pre, y)]
acc = sum(correct)/len(X)
acc
0.8305084745762712
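The prediction loop above can also be written in vectorized form. A minimal sketch, with hypothetical score values standing in for `X_new @ theta`:

```python
import numpy as np

# Hypothetical scores standing in for X_new @ theta (illustration only)
scores = np.array([2.0, -1.0, 0.3, -0.2])
y_true = np.array([1, 0, 0, 0])

# sigmoid(score) > 0.5 is equivalent to score > 0
pre = (1 / (1 + np.exp(-scores)) > 0.5).astype(int)
acc = (pre == y_true).mean()
print(pre, acc)  # [1 0 1 0] 0.75
```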
# Plot the decision boundary
# linspace lets us choose the number of sample points on the grid
x = np.linspace(-1, 1.5, 250)
xx, yy = np.meshgrid(x, x)
z = feature_mapping(xx.ravel(), yy.ravel(), 6)
z = z.values
z = z@theta
z = z.reshape(xx.shape)
z.shape
(250, 250)
fig, ax = plt.subplots(figsize=(10,6))
ax.scatter(X[y==1,0],X[y==1,1],c='b',marker='o',label='Accepted')
ax.scatter(X[y==0,0],X[y==0,1],c='r',marker='x',label='Rejected')
plt.grid()
plt.legend()
plt.contour(xx, yy, z, levels=[0])  # the boundary is the curve where X_new @ theta = 0
plt.show()