Logistic Regression in Practice: Chip Testing

Refrain*

Last modified 2022-04-03 21:38:47

Category: Machine Learning

First published 2022-04-03 21:36:08

Article link: https://blog.csdn.net/weixin_42660711/article/details/123943942

Logistic Regression: Predicting Whether a Chip Passes

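1. Build a logistic regression model with a second-order decision boundary on the chip_test.csv data and evaluate its performance.

2. Solve for the boundary curve as a function.

3. Plot the complete boundary curve.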
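For reference, the second-order model in step 1 can be written as below. Here x1 = test1 and x2 = test2, and θ0 to θ5 correspond to LR2.intercept_ and LR2.coef_ in the column order of the x_new table built later (x1, x2, x1², x2², x1x2). scikit-learn's LogisticRegression applies L2 regularization by default, which affects the fitted values of θ but not this form.

```latex
\begin{aligned}
z &= \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2 + \theta_5 x_1 x_2 \\
P(y = 1 \mid x_1, x_2) &= \sigma(z) = \frac{1}{1 + e^{-z}} \\
\text{decision boundary: } P(y = 1 \mid x_1, x_2) = 0.5 &\iff z = 0
\end{aligned}
```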

The dataset used here can be downloaded from: https://pan.baidu.com/s/147IAv37uWDQysnuGCKNcLA
Extraction code: 1234

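# Load and visualize the data, then generate new (second-order) features
# Build and train the model, then use it to predict
# Compute the accuracy
# Use the boundary function to solve for x2 given x1
# Plot the boundary curve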
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression

# Load the data
data = pd.read_csv('chip_test.csv')
data.head()

	test1	test2	pass
0	0.051267	0.69956	1
1	-0.092742	0.68494	1
2	-0.213710	0.69225	1
3	-0.375000	0.50219	1
4	0.183760	0.93348	0

# Assign variables: features x, label y, and a boolean mask for passed chips
x = data.drop(['pass'],axis=1)
y = data.loc[:,'pass']
x1 = data.loc[:,'test1']
x2 = data.loc[:,'test2']
mask = data.loc[:,'pass']==1

# Visualize the data
fig1 = plt.figure(figsize=(8,8))
passed = plt.scatter(x1[mask],x2[mask])
failed = plt.scatter(x1[~mask],x2[~mask])
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()


[Figure: scatter plot of test1 vs. test2, passed and failed chips in different colors]

# First, fit a plain logistic regression on test1 and test2 (linear boundary)
LR1 = LogisticRegression()
LR1.fit(x,y)

LogisticRegression()

y_predict = LR1.predict(x)
print(y_predict)

[0 0 1 1 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1
 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 0 0]

from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y,y_predict)
print(accuracy)

0.5423728813559322
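accuracy_score is simply the fraction of samples whose predicted label equals the true label. A minimal sketch with made-up labels (not the chip data) compares it with a NumPy one-liner:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy labels for illustration only (not taken from chip_test.csv)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 1])

acc_sklearn = accuracy_score(y_true, y_pred)
acc_manual = np.mean(y_true == y_pred)  # fraction of matching labels
print(acc_sklearn, acc_manual)
```

For a fitted classifier, LR1.score(x, y) returns the same mean accuracy in a single call.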

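The accuracy is only about 54%, so almost half of the chips are misclassified: a straight-line boundary cannot separate the passed and failed chips well. We therefore switch to a curved, second-order boundary by adding squared and cross-product features.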
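To illustrate why the extra features help, here is a minimal sketch on synthetic data (an assumption standing in for the chip data): points are labeled 1 inside a circle, so no straight line separates the classes. Adding x1², x2² and x1·x2, in the same column order as x_new, lets the same LogisticRegression learn a circular boundary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the chip data (assumption, not chip_test.csv):
# label 1 inside the circle x1^2 + x2^2 < 0.5, label 0 outside
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)

# Linear boundary: raw features x1, x2 only
lin = LogisticRegression().fit(X, y)
lin_acc = lin.score(X, y)

# Second-order boundary: x1, x2, x1^2, x2^2, x1*x2 (same order as x_new)
x1, x2 = X[:, 0], X[:, 1]
X_quad = np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])
quad = LogisticRegression().fit(X_quad, y)
quad_acc = quad.score(X_quad, y)

print(f"linear: {lin_acc:.3f}, quadratic: {quad_acc:.3f}")
```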

# Generate the second-order features x1^2, x2^2 and x1*x2
x1_2 = x1*x1
x2_2 = x2*x2
x1_x2 = x1*x2
x_new = {'x1':x1,'x2':x2,'x1_2':x1_2,'x2_2':x2_2,'x1_x2':x1_x2}
x_new = pd.DataFrame(x_new)
x_new.head()

	x1	x2	x1_2	x2_2	x1_x2
0	0.051267	0.69956	0.002628	0.489384	0.035864
1	-0.092742	0.68494	0.008601	0.469143	-0.063523
2	-0.213710	0.69225	0.045672	0.479210	-0.147941
3	-0.375000	0.50219	0.140625	0.252195	-0.188321
4	0.183760	0.93348	0.033768	0.871385	0.171536

LR2 = LogisticRegression()
LR2.fit(x_new,y)

LogisticRegression()

y_predict1 = LR2.predict(x_new)
accuracy1 = accuracy_score(y,y_predict1)
print(accuracy1)

0.8135593220338984
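Why do LR2.intercept_ and LR2.coef_ alone determine the boundary? For a binary LogisticRegression, predict returns class 1 exactly where z = intercept + coef · features is positive, which is the same as a predicted probability above 0.5. The sketch below checks this on synthetic circle-shaped data (an assumption, not the chip data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption): label 1 inside the unit circle
rng = np.random.default_rng(1)
X = rng.uniform(-1.5, 1.5, size=(300, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)
x1, x2 = X[:, 0], X[:, 1]
X_quad = np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

model = LogisticRegression().fit(X_quad, y)

# z = theta0 + theta1*x1 + ... + theta5*x1*x2, computed by hand
z = model.intercept_[0] + X_quad @ model.coef_[0]
same_as_sklearn = np.allclose(z, model.decision_function(X_quad))

# predict() gives class 1 exactly where z > 0, i.e. where P(y=1) > 0.5
pred_from_z = (z > 0).astype(int)
matches_predict = np.array_equal(pred_from_z, model.predict(X_quad))
print(same_as_sklearn, matches_predict)
```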

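x1_new = x1.sort_values()  # sort test1 so plt.plot connects the points left to right instead of zig-zagging
theta0 = LR2.intercept_[0]  # intercept_ is a length-1 array; take the scalar
theta1 = LR2.coef_[0][0]  # x1
theta2 = LR2.coef_[0][1]  # x2
theta3 = LR2.coef_[0][2]  # x1^2
theta4 = LR2.coef_[0][3]  # x2^2
theta5 = LR2.coef_[0][4]  # x1*x2

def f(x1_new):
    # Boundary z = 0 written as a quadratic a*x2^2 + b*x2 + c = 0 in x2
    a = theta4
    b = theta2+theta5*x1_new
    c = theta0+theta1*x1_new+theta3*x1_new*x1_new
    x2_new_boundary1 = (-b+(np.sqrt(b*b-4*a*c)))/(2*a)
    x2_new_boundary2 = (-b-(np.sqrt(b*b-4*a*c)))/(2*a)
    return x2_new_boundary1  # only the "+" root for now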
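The function f above comes from setting z = 0 and, for each fixed x1, treating the equation as a quadratic in x2. This assumes θ4 ≠ 0; if θ4 were zero the boundary would be linear in x2 and the formula would divide by zero.

```latex
\begin{aligned}
\underbrace{\theta_4}_{a}\, x_2^2
+ \underbrace{(\theta_2 + \theta_5 x_1)}_{b}\, x_2
+ \underbrace{(\theta_0 + \theta_1 x_1 + \theta_3 x_1^2)}_{c} &= 0 \\
x_2 &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\end{aligned}
```

The "+" sign gives x2_new_boundary1 and the "−" sign gives x2_new_boundary2. Real roots exist only where b² − 4ac ≥ 0; elsewhere the vertical line at that x1 does not cross the boundary, and np.sqrt returns NaN.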

# Visualize the decision boundary
fig3 = plt.figure(figsize=(8,8))
plt.plot(x1_new,f(x1_new))
passed = plt.scatter(x1[mask],x2[mask])
failed = plt.scatter(x1[~mask],x2[~mask])
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()

D:\ProgramData\Anaconda3\envs\imooc_ai\lib\site-packages\pandas\core\arraylike.py:397: RuntimeWarning: invalid value encountered in sqrt
  result = getattr(ufunc, method)(*inputs, **kwargs)




[Figure: one root of the boundary plotted over the scatter; only part of the boundary is drawn]

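Only part of the boundary is drawn. f returns just one of the two roots, and wherever the discriminant b*b-4*a*c is negative, np.sqrt returns NaN (hence the RuntimeWarning), so those x1 values get no point on the curve. To draw the rest, we redefine f so that it also returns the other root, compute both roots for every x1, and plot them together.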

def f(x1_new):
    a = theta4
    b = theta2+theta5*x1_new
    c = theta0+theta1*x1_new+theta3*x1_new*x1_new
    x2_new_boundary1 = (-b+(np.sqrt(b*b-4*a*c)))/(2*a)
    x2_new_boundary2 = (-b-(np.sqrt(b*b-4*a*c)))/(2*a)  # the other ("-") root
    return x2_new_boundary1,x2_new_boundary2

x2_new_boundary1 = []
x2_new_boundary2 = []
for x in x1_new:
    root_plus, root_minus = f(x)  # compute both roots once per x1 value
    x2_new_boundary1.append(root_plus)
    x2_new_boundary2.append(root_minus)

C:\Users\61628\AppData\Local\Temp\ipykernel_96676\2471311498.py:5: RuntimeWarning: invalid value encountered in sqrt
  x2_new_boundary1 = (-b+(np.sqrt(b*b-4*a*c)))/(2*a)
C:\Users\61628\AppData\Local\Temp\ipykernel_96676\2471311498.py:6: RuntimeWarning: invalid value encountered in sqrt
  x2_new_boundary2 = (-b-(np.sqrt(b*b-4*a*c)))/(2*a)
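These RuntimeWarnings are expected: they mark x1 values where b*b-4*a*c < 0, so no real x2 exists and np.sqrt returns NaN, which plt.plot simply leaves out of the line. A vectorized variant that handles this case explicitly is sketched below. boundary_x2 is a hypothetical helper, not part of the original code; it takes the coefficients as arguments instead of reading globals and only takes square roots where the discriminant is non-negative.

```python
import numpy as np

def boundary_x2(x1, theta0, theta):
    """Both roots x2 of z = 0 for each x1 (hypothetical helper).

    theta = (theta1, ..., theta5) in the column order x1, x2, x1^2, x2^2, x1*x2.
    Entries are NaN where the discriminant is negative, i.e. where the
    vertical line at that x1 does not cross the boundary.
    """
    t1, t2, t3, t4, t5 = theta
    x1 = np.asarray(x1, dtype=float)
    a = t4
    b = t2 + t5 * x1
    c = theta0 + t1 * x1 + t3 * x1 ** 2
    disc = np.asarray(b * b - 4 * a * c, dtype=float)
    # sqrt only where disc >= 0; other entries keep the NaN fill value
    sqrt_disc = np.full_like(disc, np.nan)
    np.sqrt(disc, out=sqrt_disc, where=disc >= 0)
    root_plus = (-b + sqrt_disc) / (2 * a)
    root_minus = (-b - sqrt_disc) / (2 * a)
    return root_plus, root_minus

# Sanity check on a known boundary, the unit circle x1^2 + x2^2 - 1 = 0
root_plus, root_minus = boundary_x2([0.0, 0.6, 2.0], theta0=-1.0,
                                    theta=(0.0, 0.0, 1.0, 1.0, 0.0))
print(root_plus, root_minus)
```

With the fitted model it could be called as boundary_x2(x1_range, LR2.intercept_[0], LR2.coef_[0]). Which root forms the upper branch depends on the sign of θ4.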

fig3 = plt.figure(figsize=(8,8))
plt.plot(x1_new,x2_new_boundary1)
plt.plot(x1_new,x2_new_boundary2)
passed = plt.scatter(x1[mask],x2[mask])
failed = plt.scatter(x1[~mask],x2[~mask])
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()


[Figure: both roots plotted over the scatter; the boundary still has gaps]

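Gaps remain where the two branches should meet. x1_new contains only the sampled test1 values, so the last valid point on each branch can fall short of the x1 where the discriminant reaches zero. To close the gaps, we evaluate the boundary on a dense grid of x1 values (step 0.0001 from -0.9 to 0.9999) and plot again.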

x1_range = np.arange(19000) / 10000 - 0.9  # dense grid: -0.9, -0.8999, ..., 0.9999
# f works element-wise on NumPy arrays, so both roots come out of a single call
x2_new_boundary1, x2_new_boundary2 = f(x1_range)

C:\Users\61628\AppData\Local\Temp\ipykernel_96676\2471311498.py:5: RuntimeWarning: invalid value encountered in sqrt
  x2_new_boundary1 = (-b+(np.sqrt(b*b-4*a*c)))/(2*a)
C:\Users\61628\AppData\Local\Temp\ipykernel_96676\2471311498.py:6: RuntimeWarning: invalid value encountered in sqrt
  x2_new_boundary2 = (-b-(np.sqrt(b*b-4*a*c)))/(2*a)

fig4 = plt.figure(figsize=(8,8))
plt.plot(x1_range,x2_new_boundary1)
plt.plot(x1_range,x2_new_boundary2)
passed = plt.scatter(x1[mask],x2[mask])
failed = plt.scatter(x1[~mask],x2[~mask])
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend((passed,failed),('passed','failed'))
plt.show()


[Figure: the complete second-order decision boundary over the passed/failed scatter]
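An alternative that avoids solving for x2 altogether (a different technique from the root-solving used above) is to evaluate z on a 2-D grid and draw its zero contour. decision_grid is a hypothetical helper; its default limits and resolution are illustrative assumptions and should be adjusted to cover the data.

```python
import numpy as np

def decision_grid(theta0, theta, x1_lim=(-1.0, 1.2), x2_lim=(-1.0, 1.2), n=300):
    """Evaluate z = theta0 + theta1*x1 + ... + theta5*x1*x2 on a regular grid.

    theta = (theta1, ..., theta5) in the column order x1, x2, x1^2, x2^2, x1*x2.
    The decision boundary is the z = 0 level set of the returned Z.
    """
    x1 = np.linspace(*x1_lim, n)
    x2 = np.linspace(*x2_lim, n)
    X1, X2 = np.meshgrid(x1, x2)  # X1[i, j] = x1[j], X2[i, j] = x2[i]
    t1, t2, t3, t4, t5 = theta
    Z = theta0 + t1 * X1 + t2 * X2 + t3 * X1 ** 2 + t4 * X2 ** 2 + t5 * X1 * X2
    return X1, X2, Z

# Sanity check on the unit circle x1^2 + x2^2 - 1 = 0
X1, X2, Z = decision_grid(-1.0, (0.0, 0.0, 1.0, 1.0, 0.0),
                          x1_lim=(-2.0, 2.0), x2_lim=(-2.0, 2.0), n=201)
print(Z.shape, Z.min(), Z.max())
```

With the fitted model, X1, X2, Z = decision_grid(LR2.intercept_[0], LR2.coef_[0]) followed by plt.contour(X1, X2, Z, levels=[0]) draws the zero level of z, which is the decision boundary, in one piece wherever it lies inside the grid, so there are no gaps to patch.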
