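Assignment 2: Logistic Regression (Python Implementation)
Preface
These notes record the points in the machine learning course exercises that I did not understand, along with where the required background can be found.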
I. Machine Learning and Deep Learning Assignment Index
From this blogger: https://blog.csdn.net/Cowry5/article/details/83302646
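II. Assignment 2
1. Build a logistic regression model to predict whether a student will be admitted to a university. Suppose you are an administrator at a university and want to decide whether each applicant is admitted based on the results of two exams. You have historical data from previous applicants that can serve as a training set for logistic regression. For each training example, you have the applicant's scores on the two exams and the admission decision. To complete this task, we build a classification model that estimates the probability of admission from the two exam scores.
The code is as follows: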
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op
# Load Data
data = np.loadtxt('ex2data1.txt', delimiter=',')  # use a comma as the delimiter
X = data[:, 0:2]
Y = data[:, 2]
# ============== Part 1: Plotting ====================
print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')
# Draw the scatter plot
def plotData(x, y):
    pos = np.where(y == 1)
    neg = np.where(y == 0)
    # scatter(x, y, marker=..., s=marker size, color=...) draws a scatter plot
    p1 = plt.scatter(x[pos, 0], x[pos, 1], marker='+', s=30, color='b')
    p2 = plt.scatter(x[neg, 0], x[neg, 1], marker='o', s=30, color='y')
    plt.legend((p1, p2), ('Admitted', 'Not admitted'), loc='upper right', fontsize=8)
    plt.xlabel('Exam 1 score')
    plt.ylabel('Exam 2 score')
    plt.show()
plotData(X, Y)
_ = input('Press [Enter] to continue.')
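How np.where works
1. np.where(condition, x, y): with three arguments, the first is the condition. The result is an array that takes its value from x wherever the condition holds and from y wherever it does not.
2. np.where(condition): with a single argument, that argument is the condition. The result is the indices of the elements that satisfy it, returned as a tuple of index arrays (one array per dimension).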
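The two forms described above can be tried on a small array. This is a minimal sketch; the label vector `y` below is made up for illustration and merely stands in for the `Y` column of the data set.

```python
import numpy as np

# Made-up labels standing in for the Y column of the data set.
y = np.array([1, 0, 1, 1, 0])

# One-argument form: returns a tuple with one index array per dimension.
pos = np.where(y == 1)
print(type(pos))  # a tuple
print(pos[0])     # the indices where y == 1

# Three-argument form: element-wise choice between two values.
labels = np.where(y == 1, 'Admitted', 'Not admitted')
print(labels)
```

Because the one-argument form returns a tuple, `pos[0]` is the plain index array; the plotting code in these notes passes the tuple itself as an index.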
# ======== Part 2: Compute Cost and Gradient ============
m, n = np.shape(X)
X = np.concatenate((np.ones((m, 1)), X), axis=1)  # axis=1 joins the arrays side by side, adding columns (axis=0 would stack them, adding rows)
init_theta = np.zeros((n+1,))
# Sigmoid function
def sigmoid(z):
    g = 1/(1+np.exp(-1*z))
    return g
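As a quick sanity check, the hand-written sigmoid can be compared with `scipy.special.expit`, SciPy's built-in logistic function. This is only a sketch with made-up inputs; swapping in `expit` is a suggestion, not something the original assignment code does. For very negative inputs the hand-written version can emit an overflow warning from `np.exp`, which `expit` avoids.

```python
import numpy as np
from scipy.special import expit  # SciPy's logistic sigmoid

def sigmoid(z):
    # Same formula as in the notes: g(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

# Made-up inputs covering negative, zero, and positive values.
z = np.array([-30.0, -1.0, 0.0, 1.0, 30.0])

hand_written = sigmoid(z)
library = expit(z)
print(hand_written)
print(library)
```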
# Cost function and gradient
def costFunction(theta, x, y):
    m = np.size(y, 0)
    h = sigmoid(x.dot(theta))
    if np.sum(1-h < 1e-10) != 0:  # 1-h < 1e-10 means h > 1 - 1e-10, where log(1-h) would blow up
        return np.inf  # np.inf is positive infinity
    j = -1/m*(y.dot(np.log(h))+(1-y).dot(np.log(1-h)))
    return j
def gradFunction(theta, x, y):
    m = np.size(y, 0)
    grad = 1 / m * (x.T.dot(sigmoid(x.dot(theta)) - y))
    return grad
cost = costFunction(init_theta, X, Y)
grad = gradFunction(init_theta, X, Y)
print('Cost at initial theta (zeros): ', cost)
print('Gradient at initial theta (zeros): ', grad)
_ = input('Press [Enter] to continue.')
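One way to gain confidence in the gradient function is to compare it with a finite-difference approximation of the cost. The sketch below is self-contained and uses randomly generated data rather than ex2data1.txt; the helper `numerical_grad` and the short names `cost` and `grad` are hypothetical, introduced only for this check. It also confirms that the cost at an all-zero theta equals ln 2 (about 0.693) whatever the data, since every prediction is then 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, x, y):
    # Average cross-entropy loss, same formula as costFunction in the notes.
    h = sigmoid(x.dot(theta))
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, x, y):
    # Analytic gradient, same formula as gradFunction in the notes.
    return x.T.dot(sigmoid(x.dot(theta)) - y) / y.size

def numerical_grad(f, theta, eps=1e-6):
    # Hypothetical helper: central finite differences, one coordinate at a time.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return g

# Synthetic data (an assumption for illustration, not the assignment data).
rng = np.random.default_rng(0)
x = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y = (rng.random(20) > 0.5).astype(float)
theta = 0.1 * rng.normal(size=3)

analytic = grad(theta, x, y)
numeric = numerical_grad(lambda t: cost(t, x, y), theta)
print('max abs difference:', np.max(np.abs(analytic - numeric)))
print('cost at zero theta:', cost(np.zeros(3), x, y))
```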
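# ====== Part 3: Optimizing using fmin_bfgs =============
# fun: the objective function to minimize; x0: initial guess for the variables;
# args: extra constant arguments passed to fun and jac (a tuple);
# method: the optimization method (BFGS, a quasi-Newton method); jac: the function that computes the gradient vector.
# minimize performs a local search, so the result is a local minimum.
result = op.minimize(costFunction, x0=init_theta, method='BFGS', jac=gradFunction, args=(X, Y))
theta = result.x
print('Cost at theta found by fmin_bfgs: ', result.fun)  # result.fun is the minimized cost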
print('theta: ', theta)
Advanced optimization algorithms (scipy.optimize): https://blog.csdn.net/qq_57496048/article/details/117826383
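To see how the arguments of `op.minimize` fit together without the assignment data, here is a minimal sketch on a toy problem. The objective, its gradient, and the `target` vector are made up for illustration; only the call pattern (fun, x0, method, jac, args) mirrors the logistic regression code.

```python
import numpy as np
import scipy.optimize as op

def objective(w, target):
    # Toy objective (assumption): squared distance from w to target.
    return np.sum((w - target) ** 2)

def objective_grad(w, target):
    # Gradient of the toy objective with respect to w.
    return 2 * (w - target)

target = np.array([1.0, -2.0, 3.0])

# args must be a tuple; its items are passed to both fun and jac after w.
result = op.minimize(objective, x0=np.zeros(3), method='BFGS',
                     jac=objective_grad, args=(target,))

print(result.success)  # whether the optimizer reports convergence
print(result.x)        # the minimizer found
print(result.fun)      # the objective value at result.x
```

Note the trailing comma in `args=(target,)`: a single extra argument still has to be wrapped in a tuple.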
# Plot the data and the decision boundary
def plotDecisionBoundary(theta, x, y):
    pos = np.where(y == 1)
    neg = np.where(y == 0)
    p1 = plt.scatter(x[pos, 1], x[pos, 2], marker='+', s=60, color='r')
    p2 = plt.scatter(x[neg, 1], x[neg, 2], marker='o', s=60, color='y')
    plot_x = np.array([np.min(x[:, 1])-2, np.max(x[:, 1])+2])
    # Boundary: theta[0] + theta[1]*x1 + theta[2]*x2 = 0, solved for x2
    plot_y = -1/theta[2]*(theta[1]*plot_x+theta[0])
    plt.plot(plot_x, plot_y)
    plt.legend((p1, p2), ('Admitted', 'Not admitted'), loc='upper right', fontsize=8)
    plt.xlabel('Exam 1 score')
    plt.ylabel('Exam 2 score')
    plt.show()
plotDecisionBoundary(theta, X, Y)
_ = input('Press [Enter] to continue.')
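The expression used for plot_y follows from setting the predicted probability to 0.5. A short derivation, writing x_1 and x_2 for the two exam scores and g for the sigmoid (this notation is an assumption, chosen to match the course convention):

```latex
h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2) = 0.5
\;\Longleftrightarrow\;
\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0
\;\Longrightarrow\;
x_2 = -\frac{\theta_1 x_1 + \theta_0}{\theta_2}
```

Since the boundary is a straight line, two values of x_1 (slightly beyond the minimum and maximum of the Exam 1 scores) are enough to draw it.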
# ========= Part 4: Predict and Accuracies ==============
prob = sigmoid(np.array([1, 45, 85]).dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of: ', prob)
# Predict labels for the given inputs
def predict(theta, x):
    m = np.size(x, 0)  # number of rows in the argument x, not the global X
    p = np.zeros((m,))
    pos = np.where(x.dot(theta) >= 0)
    neg = np.where(x.dot(theta) < 0)
    p[pos] = 1
    p[neg] = 0
    return p
p = predict(theta, X)
print('Train Accuracy: ', np.sum(p == Y)/np.size(Y, 0))
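The prediction step thresholds the linear score at 0, which is the same as thresholding the sigmoid output at 0.5. A more compact way to write it might look like the sketch below; the values of `theta`, `x`, and `y` are made up for illustration and are not the fitted parameters or the assignment data.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(theta, x):
    # sigmoid(z) >= 0.5 exactly when z >= 0, so threshold the linear score.
    return (x.dot(theta) >= 0).astype(float)

# Hypothetical parameters and inputs (first column is the intercept term).
theta = np.array([-25.0, 0.2, 0.2])
x = np.array([[1.0, 45.0, 85.0],
              [1.0, 30.0, 40.0],
              [1.0, 90.0, 90.0]])
y = np.array([1.0, 0.0, 1.0])

p = predict(theta, x)
accuracy = np.mean(p == y)  # fraction of predictions that match the labels
print(p, accuracy)
```

Using `np.mean(p == y)` gives the same number as dividing the count of matches by the number of examples.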