Logistic Regression in Practice: Chip Testing

Logistic Regression: Predicting Whether a Chip Passes

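1. First, build a logistic regression model with a second-order decision boundary on the chip_test.csv data and evaluate its performance.

2. Solve for the boundary curve with a function.

3. Plot the complete boundary curve.

The test dataset is available at: https://pan.baidu.com/s/147IAv37uWDQysnuGCKNcLA
Extraction code: 1234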

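# Load and visualize the data, then generate the new features
# Build and train the model, then predict with it
# Compute the accuracy
# Solve the boundary function for x1 and x2
# Plot the boundary curve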
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
# Load the data
data = pd.read_csv('chip_test.csv')
data.head()
      test1    test2  pass
0  0.051267  0.69956     1
1 -0.092742  0.68494     1
2 -0.213710  0.69225     1
3 -0.375000  0.50219     1
4  0.183760  0.93348     0
# Assign variables
x = data.drop(['pass'],axis=1)
y = data.loc[:,'pass']
x1 = data.loc[:,'test1']
x2 = data.loc[:,'test2']
mask = data.loc[:,'pass']==1
# Visualize the data
fig1 = plt.figure(figsize=(8,8))
passed = plt.scatter(x1[mask],x2[mask])
failed = plt.scatter(x1[~mask],x2[~mask])
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()

[Figure: scatter plot of test1 versus test2, with passed and failed chips marked separately]

# First, fit a plain logistic regression on the two raw features
LR1 = LogisticRegression()
LR1.fit(x,y)
LogisticRegression()
y_predict = LR1.predict(x)
print(y_predict)
[0 0 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1
 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 0 0]
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y,y_predict)
print(accuracy)
0.5423728813559322
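
For reference, the value printed by accuracy_score here is the share of samples whose predicted label equals the true label. The following is a minimal sketch of that calculation; the helper name accuracy and the label lists are made up for illustration and are not taken from chip_test.csv.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

# Made-up labels for illustration only (not from chip_test.csv).
y_true_demo = [1, 1, 0, 0, 1]
y_pred_demo = [1, 0, 0, 1, 1]
demo_accuracy = accuracy(y_true_demo, y_pred_demo)  # 3 of the 5 labels match
```

With two classes, a value close to 0.5 means the model does little better than a coin flip.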

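The accuracy is only about 54%, so the error rate is very high: a straight-line boundary cannot separate these two classes. We therefore switch to a curved (second-order) boundary for the classification.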
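
As a worked step, the second-order boundary can be written out explicitly. The sketch below assumes the coefficients are numbered in the same order as the feature columns used in this post (x1, x2, x1_2, x2_2, x1_x2), with theta0 as the intercept. The boundary is the set of points where the linear score is zero, and for a fixed x1 it is a quadratic equation in x2:

```latex
\begin{aligned}
& \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2 = 0 \\
& \Rightarrow\; a x_2^2 + b x_2 + c = 0, \qquad
  a = \theta_4, \quad b = \theta_2 + \theta_5 x_1, \quad c = \theta_0 + \theta_1 x_1 + \theta_3 x_1^2 \\
& \Rightarrow\; x_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\end{aligned}
```

Each x1 therefore gives two values of x2 when b^2 - 4ac > 0 and no real value when it is negative, which is where the sqrt warnings in the later cells come from.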

# Generate the second-order features
x1_2 = x1*x1
x2_2 = x2*x2
x1_x2 =x1*x2
x_new = {'x1':x1,'x2':x2,'x1_2':x1_2,'x2_2':x2_2,'x1_x2':x1_x2}
x_new = pd.DataFrame(x_new)
x_new.head()
         x1       x2      x1_2      x2_2     x1_x2
0  0.051267  0.69956  0.002628  0.489384  0.035864
1 -0.092742  0.68494  0.008601  0.469143 -0.063523
2 -0.213710  0.69225  0.045672  0.479210 -0.147941
3 -0.375000  0.50219  0.140625  0.252195 -0.188321
4  0.183760  0.93348  0.033768  0.871385  0.171536
LR2 = LogisticRegression()
LR2.fit(x_new,y)
LogisticRegression()
y_predict1 = LR2.predict(x_new)
accuracy1 = accuracy_score(y,y_predict1)
print(accuracy1)
0.8135593220338984
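x1_new = x1.sort_values()  # sort so the curve is drawn left to right instead of zigzagging between points
theta0 = LR2.intercept_[0]  # intercept_ is a length-1 array; take the scalar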
theta1 = LR2.coef_[0][0]
theta2 = LR2.coef_[0][1]
theta3 = LR2.coef_[0][2]
theta4 = LR2.coef_[0][3]
theta5 = LR2.coef_[0][4]
def f(x):
    # For a given x1 (passed in as x), the boundary is a*x2^2 + b*x2 + c = 0; solve it for x2
    a = theta4
    b = theta2+theta5*x
    c = theta0+theta1*x+theta3*x*x
    x2_new_boundary1 = (-b+(np.sqrt(b*b-4*a*c)))/(2*a)
    x2_new_boundary2 = (-b-(np.sqrt(b*b-4*a*c)))/(2*a)  # second root, not returned yet
    return x2_new_boundary1
# Visualize the boundary
fig3 = plt.figure(figsize=(8,8))
plt.plot(x1_new,f(x1_new))
passed = plt.scatter(x1[mask],x2[mask])
failed = plt.scatter(x1[~mask],x2[~mask])
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()
D:\ProgramData\Anaconda3\envs\imooc_ai\lib\site-packages\pandas\core\arraylike.py:397: RuntimeWarning: invalid value encountered in sqrt
  result = getattr(ufunc, method)(*inputs, **kwargs)

[Figure: the scatter plot with one branch of the boundary curve drawn over it]
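The plot shows only part of the boundary, because f returns just one of the two roots. To fill in the rest, we also return the other root of the boundary equation, collect each root in its own list, and plot both.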
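
The two-root idea can also be sketched as a vectorized helper. This is only an illustration under stated assumptions: the name boundary_roots and the unit-circle coefficients are hypothetical, theta is assumed to be ordered as (intercept, x1, x2, x1^2, x2^2, x1*x2), and theta4 is assumed to be nonzero. Masking negative discriminants before the square root gives NaN for those points without triggering the RuntimeWarning.

```python
import numpy as np

def boundary_roots(x1, theta):
    """Solve a*x2**2 + b*x2 + c = 0 for x2 at each x1.

    theta = (theta0, ..., theta5), ordered as intercept, x1, x2, x1**2, x2**2, x1*x2.
    Returns (root_plus, root_minus); entries are NaN where no real root exists.
    """
    x1 = np.asarray(x1, dtype=float)
    t0, t1, t2, t3, t4, t5 = theta
    a = t4
    b = t2 + t5 * x1
    c = t0 + t1 * x1 + t3 * x1 ** 2
    disc = b * b - 4 * a * c
    # Replace negative discriminants with NaN before taking the square root.
    root = np.sqrt(np.where(disc >= 0, disc, np.nan))
    return (-b + root) / (2 * a), (-b - root) / (2 * a)

# Hypothetical coefficients describing the unit circle x1**2 + x2**2 = 1.
circle = (-1.0, 0.0, 0.0, 1.0, 1.0, 0.0)
root_plus, root_minus = boundary_roots([0.0, 0.6, 2.0], circle)
```

For the circle, x1 = 2.0 lies outside the curve, so both roots are NaN there and matplotlib simply leaves that point undrawn.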

def f(x1_new):
    a = theta4
    b = theta2+theta5*x1_new
    c = theta0+theta1*x1_new+theta3*x1_new*x1_new
    x2_new_boundary1 = (-b+(np.sqrt(b*b-4*a*c)))/(2*a)
    x2_new_boundary2 = (-b-(np.sqrt(b*b-4*a*c)))/(2*a)  # the other root of the quadratic
    return x2_new_boundary1,x2_new_boundary2
x2_new_boundary1 = []
x2_new_boundary2 = []
for x in x1_new:
    x2_new_boundary1.append(f(x)[0])
    x2_new_boundary2.append(f(x)[1])
#print(x2_new_boundary1,x2_new_boundary2)
C:\Users\61628\AppData\Local\Temp\ipykernel_96676\2471311498.py:5: RuntimeWarning: invalid value encountered in sqrt
  x2_new_boundary1 = (-b+(np.sqrt(b*b-4*a*c)))/(2*a)
C:\Users\61628\AppData\Local\Temp\ipykernel_96676\2471311498.py:6: RuntimeWarning: invalid value encountered in sqrt
  x2_new_boundary2 = (-b-(np.sqrt(b*b-4*a*c)))/(2*a)
fig3 = plt.figure(figsize=(8,8))
plt.plot(x1_new,x2_new_boundary1)
plt.plot(x1_new,x2_new_boundary2)
passed = plt.scatter(x1[mask],x2[mask])
failed = plt.scatter(x1[~mask],x2[~mask])
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()

[Figure: the scatter plot with both boundary branches, which leave gaps where they should meet]
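The curve is broken because it is evaluated only at the sample values of test1. To close the gaps, we redefine x1 on a dense, evenly spaced range and plot again.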
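
A possible variation on the dense-range idea is to keep only the x1 values that have real roots and stitch the two branches into one closed path. The sketch below assumes the real roots occupy a single interval of x1 (an ellipse-like boundary) and that theta4 is nonzero; the name closed_boundary and the unit-circle coefficients are hypothetical.

```python
import numpy as np

def closed_boundary(theta, x1_min, x1_max, n=2000):
    """Return (xs, ys) tracing the quadratic boundary as a single path."""
    t0, t1, t2, t3, t4, t5 = theta
    x1 = np.linspace(x1_min, x1_max, n)
    a = t4
    b = t2 + t5 * x1
    c = t0 + t1 * x1 + t3 * x1 ** 2
    disc = b * b - 4 * a * c
    real = disc >= 0  # keep only the x1 values that have real roots
    x1, b, root = x1[real], b[real], np.sqrt(disc[real])
    plus = (-b + root) / (2 * a)
    minus = (-b - root) / (2 * a)
    # Walk the "+" branch left to right, then the "-" branch right to left.
    xs = np.concatenate([x1, x1[::-1]])
    ys = np.concatenate([plus, minus[::-1]])
    return xs, ys

# Hypothetical coefficients describing the unit circle x1**2 + x2**2 = 1.
circle = (-1.0, 0.0, 0.0, 1.0, 1.0, 0.0)
xs, ys = closed_boundary(circle, -1.5, 1.5)
```

The result can be drawn with a single plt.plot(xs, ys) call. Any gap left at the two ends of the path shrinks as n grows.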

x1_range = [-0.9 + x/10000 for x in range(0,19000)]
x1_range = np.array(x1_range)
x2_new_boundary1 = []
x2_new_boundary2 = []
for x in x1_range:
    x2_new_boundary1.append(f(x)[0])
    x2_new_boundary2.append(f(x)[1])
C:\Users\61628\AppData\Local\Temp\ipykernel_96676\2471311498.py:5: RuntimeWarning: invalid value encountered in sqrt
  x2_new_boundary1 = (-b+(np.sqrt(b*b-4*a*c)))/(2*a)
C:\Users\61628\AppData\Local\Temp\ipykernel_96676\2471311498.py:6: RuntimeWarning: invalid value encountered in sqrt
  x2_new_boundary2 = (-b-(np.sqrt(b*b-4*a*c)))/(2*a)
fig4 = plt.figure(figsize=(8,8))
plt.plot(x1_range,x2_new_boundary1)
plt.plot(x1_range,x2_new_boundary2)
passed = plt.scatter(x1[mask],x2[mask])
failed = plt.scatter(x1[~mask],x2[~mask])
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()

[Figure: the scatter plot with the complete boundary curve]
