Implementing Andrew Ng's Machine Learning Exercise 2 (Logistic Regression) in Python: data1
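This post covers the first dataset. In this part of the exercise, you will build a logistic regression model that predicts whether a student is admitted to a university.
Suppose a university gives every applicant two entrance exams and decides on admission based on the two scores. Our task is to train on the exam scores and admission decisions of 100 past applicants and build a classifier that predicts whether a student will be admitted.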
Reference: https://blog.csdn.net/Cowry5/article/details/80247569
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
data1 = pd.read_csv('D:/Python/exercise/samples/AndrewNg/ex2/ex2data1.csv', names = ['x1','x2','y'])
len(data1)
100
1 Logistic Regression
1.1 Visualizing the data
# Scatter plot: there are two classes of points, so split the data by class and draw both on the same figure
positive = data1[data1['y'] == 1]
negative = data1[data1['y'] == 0]
positive_x1 = positive.loc[:, 'x1']
positive_x2 = positive.loc[:, 'x2']
negative_x1 = negative.loc[:, 'x1']
negative_x2 = negative.loc[:, 'x2']
plt.figure(figsize= (7,7))
plt.scatter(x = positive_x1, y = positive_x2, marker = 'x', color = 'r', label = 'positive')
plt.scatter(x = negative_x1, y = negative_x2, marker = 'o', color = 'b', label = 'negative')
plt.legend()
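data1['intercept'] = 1  # add a column x0 = 1 for the intercept term theta_0
data1 = data1.reindex(columns = ['intercept', 'x1', 'x2', 'y'])
X_ori = data1.iloc[:, :3].values  # design matrix: intercept, x1, x2
y_ori = data1.iloc[:, -1].values  # labels: 1 = admitted, 0 = not admitted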
1.2 Implementation
1.2.1 Sigmoid function
Define the hypothesis using the sigmoid function $g$:
$$h_{\theta}(x)=g(\theta^{T}x)$$
Let $z=\theta^{T}x$.
$$\therefore h_{\theta}(x)=\frac{1}{1+e^{-\theta^{T}x}}\Longrightarrow g(z)=\frac{1}{1+e^{-z}}$$
def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^(-z)), applied element-wise
    g = 1 / (1 + np.exp(-z))
    return g
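A quick sanity check of the sigmoid, as a minimal sketch: g(0) = 0.5, g is symmetric (g(-z) = 1 - g(z)), and for large |z| it saturates. In float64, 1 / (1 + e^(-z)) rounds to exactly 1.0 once z reaches about 37, which becomes relevant later when the features are large. The comparison with `scipy.special.expit` (SciPy's built-in logistic function) is an optional extra, not something the original code uses.

```python
import numpy as np
from scipy.special import expit  # SciPy's built-in logistic sigmoid

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^(-z)), applied element-wise
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])
print(sigmoid(z))                                # all values lie in (0, 1)
print(sigmoid(0.0))                              # 0.5
print(np.allclose(sigmoid(-z), 1 - sigmoid(z)))  # True: symmetry around z = 0
print(np.allclose(sigmoid(z), expit(z)))         # True: same function as scipy's expit
print(sigmoid(40.0) == 1.0)                      # True: saturates to exactly 1.0 in float64
```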
1.2.2 Cost function and gradient
Define the cost function $J(\theta)$:
$$J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\ln h_{\theta}(x^{(i)})+(1-y^{(i)})\ln\left(1-h_{\theta}(x^{(i)})\right)\right]$$
def J_func(theta, x, y):
    # mean cross-entropy cost over the m training examples
    h = sigmoid(x.dot(theta))  # predicted probabilities h_theta(x^(i))
    cost = -y * np.log(h) - (1 - y) * np.log(1 - h)
    J = cost.mean()
    return J
Initialize $\theta$ to zeros:
theta_ori = np.zeros(3)
J_func(theta_ori, X_ori, y_ori)
0.6931471805599453
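Why the cost at θ = 0 is 0.6931...: with θ = 0, every h_θ(x^(i)) = g(0) = 0.5, so each example contributes -ln 0.5 = ln 2 whatever its label, and the mean is ln 2 ≈ 0.693147. A small self-contained check on a toy design matrix (the numbers below are arbitrary, not taken from ex2data1):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def J_func(theta, x, y):
    # mean cross-entropy cost of logistic regression
    h = sigmoid(x.dot(theta))
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

# toy design matrix: intercept column plus two arbitrary feature values
x_toy = np.array([[1.0, 40.0, 70.0],
                  [1.0, 65.0, 90.0],
                  [1.0, 30.0, 45.0]])
y_toy = np.array([0.0, 1.0, 0.0])

J0 = J_func(np.zeros(3), x_toy, y_toy)
print(J0, np.log(2))  # both approximately 0.693147
```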
Define the gradient:
$$\frac{\partial}{\partial\theta_j}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x_j^{(i)}$$
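# The argument order matters: scipy.optimize calls func(theta, *args), so theta must come first
def gradient(theta, x, y):
    # vectorized gradient: (1/m) * X^T (h - y)
    gra = x.T.dot(sigmoid(x.dot(theta)) - y) / len(x)
    return gra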
gradient(theta_ori, X_ori, y_ori)
array([ -0.1 , -12.00921659, -11.26284221])
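A common way to confirm that the analytic gradient matches the cost function is a central finite-difference check. This is a sketch on small random data; the data, the step size eps = 1e-6, and the helper name `numeric_gradient` are assumptions for illustration. The two gradients should agree to many decimal places.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def J_func(theta, x, y):
    # mean cross-entropy cost of logistic regression
    h = sigmoid(x.dot(theta))
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient(theta, x, y):
    # analytic gradient: (1/m) * X^T (h - y)
    return x.T.dot(sigmoid(x.dot(theta)) - y) / len(x)

def numeric_gradient(f, theta, eps=1e-6):
    # hypothetical helper: central difference (f(t + eps*e_j) - f(t - eps*e_j)) / (2*eps) per coordinate
    grad = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
x_rand = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])  # intercept + 2 random features
y_rand = (rng.random(20) < 0.5).astype(float)                     # random 0/1 labels
theta_rand = rng.normal(size=3)

g_analytic = gradient(theta_rand, x_rand, y_rand)
g_numeric = numeric_gradient(lambda t: J_func(t, x_rand, y_rand), theta_rand)
print(np.max(np.abs(g_analytic - g_numeric)))  # expected to be far below 1e-6
```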
1.2.3 Learning parameters using fmin_tnc
import scipy.optimize as opt
result_ori = opt.fmin_tnc(func = J_func, x0 = theta_ori, fprime = gradient, args = (X_ori, y_ori))
result_ori[0]
array([-25.16131858, 0.20623159, 0.20147149])
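With the fitted parameters, the model predicts "admitted" when h_θ(x) ≥ 0.5, which is the same as θᵀx ≥ 0. A sketch that plugs in the θ printed above (copied by hand) and evaluates a hypothetical applicant with exam scores 45 and 85; the helper name `predict` is an assumption, not part of the original post.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(theta, x, threshold=0.5):
    # label 1 ("admitted") when the predicted probability reaches the threshold
    return (sigmoid(x.dot(theta)) >= threshold).astype(int)

# parameters copied from result_ori[0] above
theta_fit = np.array([-25.16131858, 0.20623159, 0.20147149])

applicant = np.array([1.0, 45.0, 85.0])  # [intercept, exam 1, exam 2]
prob = sigmoid(applicant.dot(theta_fit))
print(round(prob, 3))                    # 0.776

batch = np.array([[1.0, 45.0, 85.0],
                  [1.0, 30.0, 40.0]])
print(predict(theta_fit, batch))         # [1 0]
```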
Decision boundary: the model predicts 1 when $h_{\theta}(x)\ge 0.5$, i.e. when $\theta^{T}x\ge 0$, so the boundary is the line $\theta_0+\theta_1x_1+\theta_2x_2=0$. Solving for $x_2$:
$$x_2=\frac{-(\theta_0+\theta_1x_1)}{\theta_2}$$
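A worked check of the boundary formula, using the θ printed above: at x1 = 50 the formula gives x2 ≈ 73.71, and that point should lie exactly on θᵀx = 0, i.e. at a predicted probability of 0.5. The helper name `boundary_x2` is hypothetical.

```python
import numpy as np

theta_fit = np.array([-25.16131858, 0.20623159, 0.20147149])  # from result_ori[0]

def boundary_x2(x1, theta):
    # hypothetical helper: solve theta0 + theta1*x1 + theta2*x2 = 0 for x2
    return -(theta[0] + theta[1] * x1) / theta[2]

x1_pt = 50.0
x2_pt = boundary_x2(x1_pt, theta_fit)
z = theta_fit.dot([1.0, x1_pt, x2_pt])  # theta^T x at the boundary point
p = 1 / (1 + np.exp(-z))
print(round(x2_pt, 2), z, p)            # x2 ≈ 73.71, z ≈ 0, p ≈ 0.5
```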
x1 = np.arange(20,110,1)
x2 = - (result_ori[0][0] + result_ori[0][1] * x1) / result_ori[0][2]
plt.figure(figsize = (7, 7))
plt.plot(x1, x2, label = 'decision boundary') # decision boundary line
plt.scatter(x = positive_x1, y = positive_x2, marker = 'x', color = 'r', label = 'positive')
plt.scatter(x = negative_x1, y = negative_x2, marker = 'o', color = 'b', label = 'negative')
plt.legend()
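Experiment: add polynomial terms so the decision boundary can fit the data more closely.
Following the lecture "The Problem of Overfitting" in Andrew Ng's course, the hypothesis with second-order terms is: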
$$g(\theta_{0}+\theta_{1}x_{1}+\theta_{2}x_{2}+\theta_{3}x_{1}^2+\theta_{4}x_{2}^2+\theta_{5}x_{1}x_{2})$$
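The same idea extends to any total degree d: include every term x1^a · x2^b with a + b ≤ d, which gives (d+1)(d+2)/2 columns. A sketch of such a mapping; the function name `poly_features` is hypothetical, and its column order for d = 2 is [1, x1, x2, x1², x1·x2, x2²], which differs from the order used in the code below, so the θ entries would be permuted accordingly.

```python
import numpy as np

def poly_features(x1, x2, degree):
    # hypothetical helper: all monomials x1^(i-j) * x2^j with total degree i <= degree
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    cols = []
    for i in range(degree + 1):
        for j in range(i + 1):
            cols.append(x1 ** (i - j) * x2 ** j)
    return np.column_stack(cols)

F = poly_features([2.0, 3.0], [5.0, 7.0], degree=2)
print(F.shape)       # (2, 6)
print(F[0].tolist()) # [1.0, 2.0, 5.0, 4.0, 10.0, 25.0]
```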
data1['x1^2'] = data1['x1'] ** 2
data1['x2^2'] = data1['x2'] ** 2
data1['x1*x2'] = data1['x1'] * data1['x2']
data1 = data1.reindex(columns = ['intercept', 'x1', 'x2', 'x1^2', 'x2^2', 'x1*x2', 'y'])
data1.head()
| | intercept | x1 | x2 | x1^2 | x2^2 | x1*x2 | y |
---|---|---|---|---|---|---|---|
0 | 1 | 34.623660 | 78.024693 | 1198.797805 | 6087.852690 | 2701.500406 | 0 |
1 | 1 | 30.286711 | 43.894998 | 917.284849 | 1926.770807 | 1329.435094 | 0 |
2 | 1 | 35.847409 | 72.902198 | 1285.036716 | 5314.730478 | 2613.354893 | 0 |
3 | 1 | 60.182599 | 86.308552 | 3621.945269 | 7449.166166 | 5194.273015 | 1 |
4 | 1 | 79.032736 | 75.344376 | 6246.173368 | 5676.775061 | 5954.672216 | 1 |
X = data1.iloc[:,:6]
Y = data1.iloc[:,-1]
X = X.values
Y = Y.values
Again, initialize $\theta$ to zeros:
theta = np.zeros(6)
theta
array([0., 0., 0., 0., 0., 0.])
# With theta = 0 every prediction is 0.5, so this equals -ln(0.5) = ln 2
J_func(theta, X, Y)
0.6931471805599453
gradient(theta, X, Y)
array([-1.00000000e-01, -1.20092166e+01, -1.12628422e+01, -1.13895134e+03,
-1.06939408e+03, -1.09872219e+03])
result = opt.fmin_tnc(func = J_func, x0 = theta, fprime = gradient, args = (X, Y))
D:\ProgramData\lib\site-packages\ipykernel_launcher.py:2: RuntimeWarning: divide by zero encountered in log
D:\ProgramData\lib\site-packages\ipykernel_launcher.py:2: RuntimeWarning: invalid value encountered in multiply
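These warnings most likely come from the unscaled polynomial features: x1² and x2² are on the order of 10³ to 10⁴, so during the optimization θᵀx can become large enough that `sigmoid` returns exactly 1.0 (or 0.0) in float64. Then `np.log(1 - h)` is log(0) = -inf ("divide by zero"), and multiplying it by a zero label weight gives 0 × (-inf) = nan ("invalid value encountered in multiply"). One standard remedy, swapped in here and not used in the original code, is to write the cost directly in terms of z = θᵀx: -y ln g(z) - (1-y) ln(1-g(z)) = ln(1+e^z) - y·z, and to evaluate ln(1+e^z) with `np.logaddexp(0, z)`, which does not overflow. A sketch on a toy matrix (values chosen only to trigger the problem):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def J_naive(theta, x, y):
    # same formula as J_func in the text
    h = sigmoid(x.dot(theta))
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def J_stable(theta, x, y):
    # identical cost rewritten as log(1 + e^z) - y*z; logaddexp(0, z) = log(e^0 + e^z)
    z = x.dot(theta)
    return np.mean(np.logaddexp(0, z) - y * z)

x_toy = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y_toy = np.array([1.0, 0.0, 1.0])

theta_small = np.array([0.3, -0.7])  # moderate z: both versions agree
print(np.isclose(J_naive(theta_small, x_toy, y_toy), J_stable(theta_small, x_toy, y_toy)))  # True

theta_big = np.array([0.0, 50.0])    # z = 100 on the first row, where sigmoid rounds to 1.0
with np.errstate(divide='ignore', invalid='ignore'):
    print(J_naive(theta_big, x_toy, y_toy))  # nan
print(J_stable(theta_big, x_toy, y_toy))     # a finite value
```

The gradient needs no change: it only uses g(z) - y, which stays finite even when g saturates.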
result
(array([-1.86643292e-02, -3.75077940e-01, -3.09429907e-01, 1.06962108e-03,
3.50023874e-04, 1.03079001e-02]), 80, 1)
result = result[0]
result_2 = opt.minimize(fun = J_func, x0 = theta, args = (X,Y), method = 'TNC', jac = gradient)
D:\ProgramData\lib\site-packages\ipykernel_launcher.py:2: RuntimeWarning: divide by zero encountered in log
D:\ProgramData\lib\site-packages\ipykernel_launcher.py:2: RuntimeWarning: invalid value encountered in multiply
result_2
fun: 0.09302529598313683
jac: array([-4.49764720e-03, 1.75005873e-01, -8.49928704e-01, 4.58961567e+01,
-9.03255063e+01, -2.62541929e+01])
message: 'Converged (|f_n-f_(n-1)| ~= 0)'
nfev: 80
nit: 5
status: 1
success: True
x: array([-1.86643292e-02, -3.75077940e-01, -3.09429907e-01, 1.06962108e-03,
3.50023874e-04, 1.03079001e-02])
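The `jac` entries in this output (for example about 45.9 and -90.3) are far from zero, so "Converged (|f_n-f_(n-1)| ~= 0)" here most likely means that the cost stopped changing, not that the gradient vanished. The raw columns span very different scales (x1 between roughly 30 and 100, x1² in the thousands), which makes the problem badly conditioned. A common remedy, not used in the original post, is to standardize every non-intercept column before optimizing. The sketch below runs on synthetic stand-in data (the generator, the seed, and the helper name `standardize` are assumptions; this is not ex2data1) and uses the overflow-safe form of the cost.

```python
import numpy as np
import scipy.optimize as opt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def J_stable(theta, x, y):
    # logistic cost in the overflow-safe form log(1 + e^z) - y*z
    z = x.dot(theta)
    return np.mean(np.logaddexp(0, z) - y * z)

def gradient(theta, x, y):
    # (1/m) * X^T (h - y), unchanged by the rewrite of the cost
    return x.T.dot(sigmoid(x.dot(theta)) - y) / len(x)

def standardize(features):
    # hypothetical helper: shift each column to mean 0 and scale it to standard deviation 1
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / sigma, mu, sigma

# synthetic stand-in for two exam scores (NOT ex2data1); label noise makes the classes overlap
rng = np.random.default_rng(1)
x1_syn = rng.uniform(30, 100, 200)
x2_syn = rng.uniform(30, 100, 200)
y_syn = ((x1_syn + x2_syn + rng.normal(0, 15, 200)) > 130).astype(float)

# same five non-constant features as the post: x1, x2, x1^2, x2^2, x1*x2
raw = np.column_stack([x1_syn, x2_syn, x1_syn ** 2, x2_syn ** 2, x1_syn * x2_syn])
scaled, mu, sigma = standardize(raw)
X_syn = np.column_stack([np.ones(len(y_syn)), scaled])  # the intercept column is left at 1

res = opt.minimize(fun=J_stable, x0=np.zeros(X_syn.shape[1]), args=(X_syn, y_syn),
                   method='TNC', jac=gradient)
acc = np.mean((sigmoid(X_syn.dot(res.x)) >= 0.5) == y_syn)
print(res.message, res.fun, np.abs(res.jac).max(), acc)
```

A θ learned this way applies to the scaled features, so plotting its boundary would require applying the same `mu` and `sigma` to the grid points before evaluating θᵀx.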
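To plot a boundary given by a polynomial in two variables, build a grid from a and b with np.meshgrid and let c hold the polynomial's value at every grid point, i.e. $c=f(x_1,x_2)$. Drawing the contour of c over this continuous range of $x_1, x_2$ values at level 0 then gives the decision boundary.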
a = np.arange(20,110,1)
b = np.arange(20,110,1)
xs, ys = np.meshgrid(a,b)
c = result[0] + result[1]*xs + result[2]*ys + result[3]*(xs**2) + result[4]*(ys**2) + result[5]*(xs*ys)
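How `np.meshgrid` lays out the grid, on a tiny example (values chosen only for illustration; the `_small` names avoid clobbering the xs, ys, c used above): with the default indexing='xy', xs repeats a along every row and ys repeats b down every column, so c[i, j] is f evaluated at (a[j], b[i]). `contour` then traces where c crosses the requested level.

```python
import numpy as np

a_small = np.array([1, 2, 3])
b_small = np.array([10, 20])
xs_small, ys_small = np.meshgrid(a_small, b_small)  # shape (len(b_small), len(a_small))
print(xs_small.shape)     # (2, 3)
print(xs_small.tolist())  # [[1, 2, 3], [1, 2, 3]]
print(ys_small.tolist())  # [[10, 10, 10], [20, 20, 20]]

# evaluate a function at every grid point at once, e.g. c = x1 + x2 - 21
c_small = xs_small + ys_small - 21
print(c_small.tolist())   # [[-10, -9, -8], [0, 1, 2]]
```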
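plt.figure(figsize = (15, 15))
# Draw only the c = 0 contour, i.e. the decision boundary. An int in the levels position is
# treated as a number of automatically chosen levels, so list the level explicitly.
plt.contour(xs, ys, c, levels = [0], colors = 'g')
# use an empty line as a proxy legend entry for the contour
plt.plot([], [], color = 'g', label = 'decision boundary')
plt.scatter(x = positive_x1, y = positive_x2, marker = 'x', color = 'r', label = 'positive')
plt.scatter(x = negative_x1, y = negative_x2, marker = 'o', color = 'b', label = 'negative')
plt.legend()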
Visually, the curved decision boundary fits the training data better than the linear one.