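1. Simple prediction and model evaluation
# Model evaluation
# Predict the admission probability for a student with exam scores 45 and 85
prob = sigmoid.sigmoid(np.matmul([1,45,85],theta))
print('Admission probability for a student with scores 45 and 85:',prob)
# Evaluate the model's accuracy on the training set
p2 = sigmoid.sigmoid(np.matmul(x, theta))
c1 = p2 >= 0.5   # predicted label: 1 where the probability is at least 0.5
c2 = c1 == y     # True where the prediction matches the label
accuracy = np.mean(c2)   # fraction of correct predictions
print('Model accuracy:',accuracy)
Output:
Admission probability for a student with scores 45 and 85: [0.77629071]
Model accuracy: 0.89
This matches the expected result.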
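The thresholding step above can be sketched in a self-contained form. The helper names `sigmoid` and `predict` and the toy data are assumptions made for illustration; they are not the author's `sigmoid` module or the exercise data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    """Return 0/1 predictions: 1 where the estimated probability is >= 0.5."""
    return (sigmoid(x @ theta) >= 0.5).astype(int)

# Toy data (made up): an intercept column plus one feature.
x_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([[0], [1], [1], [1]])
theta_toy = np.array([[0.0], [1.0]])

pred = predict(theta_toy, x_toy)
# Comparing two (m, 1) arrays gives an (m, 1) boolean array; its mean is the accuracy.
accuracy_toy = np.mean(pred == y_toy)
print(accuracy_toy)
```

Keeping both `pred` and `y_toy` as column vectors matters here: comparing an (m, 1) array with an (m,) array would broadcast to an (m, m) matrix and give a meaningless mean.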
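2. The microchip exercise: visualizing the data first
In this part of the exercise we implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to make sure it works correctly.
We have some past test data, which we use to decide whether a chip will pass the quality check.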
from pylab import *
import pandas as pd
# Load the data (raw string, so the backslashes in the Windows path are kept literally)
path = r'C:\Users\Administrator\PycharmProjects\Clear\ml\ex2\picture2.txt'
df = pd.read_csv(path)
# Draw the scatter plot
# Samples labeled 1
X = df.loc[df['value']==1,'x']
Y = df.loc[df['value']==1,'y']
scatter(X,Y,marker = '*',color = 'r')
# Samples labeled 0
X2 = df.loc[df['value']==0,'x']
Y2 = df.loc[df['value']==0,'y']
scatter(X2,Y2,marker = '+',color = 'y')
show()
Output: (scatter plot of the two classes)
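3. Feature mapping
# mapFuture: feature mapping for regularized logistic regression
path2 = r'C:\Users\Administrator\PycharmProjects\Clear\ml\ex2\ex2data2.txt'
df = np.genfromtxt(path2, delimiter=',')
# Note: select a single column with a slice such as 0:1 or 2:3, which keeps the 2-D shape (m, 1);
# indexing with a plain 2 returns a 1-D array, which causes problems later
x1 = df[:, 0:1]
x2 = df[:, 1:2]
y = df[:, 2:3]
# Define mapFuture, which expands the two features into a 28-column data set
def mapFuture(x1,x2):
    m = np.size(x1,0)
    out = np.ones((m,1))   # first column: intercept term
    for i in range(1,7):
        for j in range(0,i+1):
            # append the monomial x1^(i-j) * x2^j as a new column
            out = np.hstack((out,x1**(i-j)*(x2**j)))
    return out
out = mapFuture(x1,x2)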
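The column count of 28 can be checked on a tiny input. This is a minimal sketch, assuming a hypothetical `map_feature` helper with a `degree` parameter (neither is in the original code) and made-up input values; it also shows the slice-versus-index shape difference mentioned in the comment above.

```python
import numpy as np

def map_feature(x1, x2, degree=6):
    """Stack the intercept and all monomials x1**(i-j) * x2**j, 0 <= j <= i <= degree."""
    cols = [np.ones_like(x1, dtype=float)]
    for i in range(1, degree + 1):
        for j in range(i + 1):
            cols.append(x1 ** (i - j) * x2 ** j)
    return np.hstack(cols)

# Made-up sample with two rows and two raw features.
arr = np.array([[2.0, 3.0], [0.5, -1.0]])
col_slice = arr[:, 0:1]   # slice keeps two dimensions: shape (2, 1)
col_index = arr[:, 0]     # integer index drops one: shape (2,)

mapped = map_feature(arr[:, 0:1], arr[:, 1:2])
print(col_slice.shape, col_index.shape)
print(mapped.shape)    # 1 + (2 + 3 + 4 + 5 + 6 + 7) = 28 columns
print(mapped[0, :6])   # 1, x1, x2, x1^2, x1*x2, x2^2 for the first row
```

Collecting the columns in a list and calling `np.hstack` once avoids reallocating the array on every iteration, although for 28 columns the difference is negligible.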
4. Defining the regularized cost function and evaluating it at the initial theta
# Set the initial parameters
out = mapFuture(x1,x2)
initial_theta = np.zeros((np.size(out,1),1))
l = 1
# Define the regularized cost function
def costFunctionReg(theta, out, y, l):
    m = len(y)
    h = sigmoid.sigmoid(np.matmul(out,theta))
    # Work on a copy: zeroing the intercept in place would modify the caller's theta
    L = theta.copy()
    L[0] = 0   # the intercept term is not regularized
    J = (1/m)*(np.matmul(-y.T,np.log(h))-np.matmul((1-y).T,np.log(1-h)))+l/(2*m)*np.matmul(L.T,L)
    grad = (1/m)*np.matmul(out.T,h-y)+(l/m)*L
    return J,grad
[cost, grad] = costFunctionReg(initial_theta, out, y, l)
print('Cost at initial theta:',cost)
print('Gradient at initial theta (first five values):',grad[0:5])
Output:
Cost at initial theta: [[0.69314718]]
Gradient at initial theta (first five values): [[8.47457627e-03]
[1.87880932e-02]
[7.77711864e-05]
[5.03446395e-02]
[1.15013308e-02]]
This matches the expected result.
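For reference, the quantities that costFunctionReg computes can be written out as below. The notation is an assumption added here for readability: m is the number of samples, n the number of mapped features excluding the intercept, h_theta the sigmoid hypothesis, and lambda the parameter called l in the code. The intercept theta_0 is left out of the penalty, which is what the L[0] = 0 line does.

```latex
J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\Bigl[-y^{(i)}\log h_\theta(x^{(i)}) - (1-y^{(i)})\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^{2}

\frac{\partial J}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_0^{(i)}

\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)
```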
5. Computing and displaying the cost function and gradient
test_theta = np.ones((np.size(out,1),1))
[cost, grad] = costFunctionReg(test_theta, out, y, 10)
print('Cost at test theta (with lambda = 10):',cost)
print('Gradient at test theta - first five values only:')
print(grad[0:5])
Output:
Cost at test theta (with lambda = 10): [[3.16450933]]
Gradient at test theta - first five values only:
[[0.20598218]
[0.18102403]
[0.20272661]
[0.19863458]
[0.10254902]]
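The cost agrees with the expected value (about 3.16). The gradient above does not agree with the exercise's expected values, which are approximately 0.3460, 0.1614, 0.1948, 0.2269 and 0.0922 for the first five entries. A mismatch of this kind arises when costFunctionReg zeroes theta[0] through an alias of theta rather than a copy, so that the gradient is evaluated with a modified theta.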
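One way to sanity-check a regularized gradient like this one is to compare it with central finite differences of the cost. The sketch below is self-contained and uses synthetic data; `cost_function_reg`, `numerical_gradient`, and the `*_demo` arrays are hypothetical names for illustration, not the author's code or the exercise data set.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_function_reg(theta, X, y, lam):
    """Regularized logistic-regression cost and gradient; theta is an (n, 1) column."""
    m = len(y)
    h = sigmoid(X @ theta)
    reg = theta.copy()   # copy, so the caller's theta is left untouched
    reg[0] = 0.0         # the intercept is not regularized
    J = (-(y.T @ np.log(h)) - (1 - y).T @ np.log(1 - h)) / m + lam / (2 * m) * (reg.T @ reg)
    grad = X.T @ (h - y) / m + lam / m * reg
    return J.item(), grad

def numerical_gradient(theta, X, y, lam, eps=1e-5):
    """Approximate each partial derivative with a central difference."""
    num = np.zeros_like(theta)
    for k in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[k] = eps
        j_plus, _ = cost_function_reg(theta + step, X, y, lam)
        j_minus, _ = cost_function_reg(theta - step, X, y, lam)
        num[k] = (j_plus - j_minus) / (2 * eps)
    return num

# Synthetic data: 20 samples, an intercept column and 3 random features.
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 3))])
y_demo = (rng.random((20, 1)) > 0.5).astype(float)
theta_demo = np.ones((4, 1))

_, analytic = cost_function_reg(theta_demo, X_demo, y_demo, 10.0)
numeric = numerical_gradient(theta_demo, X_demo, y_demo, 10.0)
max_diff = np.max(np.abs(analytic - numeric))
print(max_diff)
```

If the analytic gradient is consistent with the cost, `max_diff` should be very small, and `theta_demo` should still be all ones after the call. A large difference, or a changed `theta_demo`, would point to an in-place modification inside the cost function.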