Regression Analysis of Data

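This post uses Excel's linear regression to study the relationship between parents' and children's heights, and discusses how to judge whether a linear regression holds. It then uses the Iris dataset to show how to create a virtual environment with Anaconda, install the required Python packages, and perform classification with LinearSVC. By adjusting the C parameter and observing how the decision boundary changes, it highlights the effect of C on the classification boundary.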

I. Linear Regression Analysis in Excel

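Here, Excel's Data Analysis feature is used to analyze the relationship between parents' heights and their children's heights.
[Screenshot]
If the Data Analysis option is missing, enable it under File → Options → Add-ins (the Analysis ToolPak add-in).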

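[Screenshot]
First click Data Analysis and choose Regression, then select the corresponding X and Y ranges, and finally choose the output you want.
[Screenshot]
Because the redundant entries were not cleaned out here, the correlation is very poor and the results look like the figures below, but a rough analysis is still possible.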

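Heights of fathers vs. children:
[Figure]
The chart shows that the children's height rises roughly linearly with the father's height (a positive correlation). According to the fitted equation, a father 75 inches tall would have a child about 69.065 inches tall.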
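Excel's regression output is an ordinary least-squares fit of a straight line. As a minimal sketch of the same computation in Python, the function below computes the slope and intercept from the usual closed-form formulas and then predicts at 75 inches. The numbers are made up for illustration; they are not the parent/child height data used above, so the prediction will not match 69.065.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # the fitted line always passes through the point (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Made-up example data in inches, NOT the heights from this post
fathers = [64, 66, 68, 70, 72]
children = [64, 65, 66, 67, 68]

slope, intercept = fit_line(fathers, children)
predicted_at_75 = intercept + slope * 75
print(f"child = {intercept:.3f} + {slope:.3f} * father")
print(f"predicted height for a 75-inch father: {predicted_at_75:.3f}")
```

np.polyfit(fathers, children, 1) returns the same pair as [slope, intercept], which is a convenient cross-check against the coefficients Excel reports.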

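Heights of mothers vs. children:
[Figure]
The chart shows that the children's height is also roughly positively correlated with the mother's height.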

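In fact, because the data were not preprocessed, the resulting regression equations are not reliable. Still, we can roughly see that the folk saying "a tall mother makes the whole brood tall, a tall father makes just one child tall" (母高高一窝,父高高一个) seems to hold.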

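II. Judging Whether a Linear Regression Holds

Here we judge whether linear regression holds for 4 examples.

[Figure: example 1]
[Figure: example 2]
[Figure: example 3]
[Figure: example 4]
It is easy to see that, except for the third example, linear regression does not hold in any of them.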
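The judgment above is made by eye from the plots. One common numeric aid, which may or may not be what the original figures relied on, is to look at R² together with the residuals: a high R² alone does not prove the relation is linear, while residuals that stay on one side of the line for long runs hint at curvature. The sketch below computes R² and counts residual sign changes (points ordered by x) as a crude stand-in for inspecting a residual plot; the two datasets are synthetic.

```python
import numpy as np

def linear_fit_diagnostics(x, y):
    """Fit y = a + b*x by least squares and return (R^2, residuals, sign changes).

    sign_changes counts how often the residual sign flips when the points are
    ordered by x. Residuals around a genuinely linear trend tend to flip often;
    long runs of one sign (few flips) hint at curvature the line cannot capture.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (intercept + slope * x)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    signs = np.sign(residuals[np.argsort(x)])
    signs = signs[signs != 0]  # ignore points lying exactly on the line
    sign_changes = int(np.sum(signs[1:] != signs[:-1]))
    return r2, residuals, sign_changes

x = np.arange(6)
curved = x ** 2  # clearly non-linear
noisy_line = 2 * x + 1 + np.array([0.1, -0.1, 0.1, -0.1, 0.1, -0.1])

for name, y in [("curved", curved), ("noisy line", noisy_line)]:
    r2, _, flips = linear_fit_diagnostics(x, y)
    print(f"{name}: R^2 = {r2:.3f}, residual sign changes = {flips}")
```

The curved data still gets an R² above 0.9, yet its residuals change sign only twice (positive at both ends, negative in the middle), which is the typical signature of a relation a straight line does not describe.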

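III. The Iris Dataset

1. Creating a Virtual Environment with Anaconda and Installing the Packages

Creating the virtual environment
1. From the command line
Open a command line window
[Screenshot]
Enter the following command: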

conda create -n sklearn python=3.6

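sklearn is the name chosen here for the new environment; the Python version after it can be chosen to suit your needs. After creating it, activate the environment with conda activate sklearn so that the packages below are installed into it.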


2. From the graphical interface
Open Anaconda Navigator
[Screenshot]
Create the environment
[Screenshot]

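Installing packages

pip install <package name>
Installing this way may fail or be very slow because of network problems.
Solution: install from the Tsinghua University PyPI mirror instead:
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package name>
The packages installed here are numpy, pandas, scikit-learn (installed under that name, imported as sklearn), and matplotlib.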
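To confirm that the packages landed in the active environment, a small check like the one below can help. It is a sketch: check_packages is a hypothetical helper, and it reads each module's __version__ attribute, reporting "unknown" if a module does not define one. Note the import names: scikit-learn is imported as sklearn.

```python
import importlib

def check_packages(import_names):
    """Return {import_name: version string, "unknown", or None if not importable}."""
    results = {}
    for name in import_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            results[name] = None
        else:
            results[name] = getattr(module, "__version__", "unknown")
    return results

if __name__ == "__main__":
    # scikit-learn is installed as "scikit-learn" but imported as "sklearn"
    for name, version in check_packages(["numpy", "pandas", "sklearn", "matplotlib"]).items():
        print(name, version if version else "NOT INSTALLED")
```

Run it with the environment's own python (after conda activate sklearn); a package reported as NOT INSTALLED was most likely installed into a different environment.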

2. Classification with LinearSVC(C)

Import the packages we need

# Import the required packages
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC


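Load the data

# Load the dataset
iris = datasets.load_iris()
# Feature matrix: one row per sample, four columns, named by iris.feature_names
X = iris.data
# Class label (target) of each row, with values in [0, 1, 2]
Y = iris.target
# iris.target.size is 150: three classes with 50 samples each, stored in the order 0, 1, 2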


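Process the data

# Keep only classes 0 and 1 (y < 2) and only the first two features
X = X[:, :2]
# Labels of classes 0 and 1
Y1 = Y[Y < 2]
y1 = len(Y1)  # 100 samples
# Labels of class 0
Y2 = Y[Y < 1]
y2 = len(Y2)  # 50 samples
# The rows are sorted by label, so the first y1 rows are exactly classes 0 and 1
X = X[:y1, :2]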
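The slicing above works only because Iris stores its rows sorted by label. A boolean mask selects the same rows without relying on that ordering. The sketch below uses a hypothetical helper, select_classes, and a small shuffled toy array rather than Iris.

```python
import numpy as np

def select_classes(X, y, classes=(0, 1), n_features=2):
    """Return rows whose label is in `classes`, keeping the first n_features columns.

    A boolean mask is used, so the result does not depend on the rows being
    sorted by label.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    mask = np.isin(y, classes)
    return X[mask, :n_features], y[mask]

def is_sorted(labels):
    """True if labels never decrease, i.e. the slicing approach is safe."""
    labels = np.asarray(labels)
    return bool(np.all(labels[:-1] <= labels[1:]))

# Small shuffled toy example (hypothetical data, not Iris)
X_toy = np.arange(24).reshape(6, 4)
y_toy = np.array([2, 0, 1, 0, 2, 1])
X_sel, y_sel = select_classes(X_toy, y_toy)
print(X_sel)
print(y_sel, is_sorted(y_toy))
```

Applied to the Iris arrays, select_classes(iris.data, iris.target) should give the same X and Y1 as the slicing above, since that data happens to be sorted.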


Plot the raw (unstandardized) data points

# Plot class 0 (rows 0 to y2-1) in red and class 1 (rows y2 to y1-1) in blue
plt.scatter(X[0:y2, 0], X[0:y2, 1], color='red')
plt.scatter(X[y2:y1, 0], X[y2:y1, 1], color='blue')
plt.show()


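[Figure: raw data points of class 0 (red) and class 1 (blue)]
Standardize the data

# Standardization
standardScaler = StandardScaler()
# Compute the mean and standard deviation of each feature from the training data
standardScaler.fit(X)
# Use the scaler's mean and standard deviation to transform X (zero mean, unit variance)
X_standard = standardScaler.transform(X)
# A very large C makes the soft-margin SVM behave almost like a hard-margin SVM
svc = LinearSVC(C=1e9)
svc.fit(X_standard, Y1)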
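What StandardScaler does here is the column-wise z-score z = (x - mean) / std. The sketch below reproduces it with NumPy; it is a hypothetical helper, not scikit-learn's code. Like StandardScaler, it uses the population standard deviation (ddof=0) and leaves zero-variance columns unscaled.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score: z = (x - mean) / std.

    Uses the population standard deviation (ddof=0), as StandardScaler does,
    and divides zero-variance columns by 1 instead of 0.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # ddof=0 by default
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std, mean, std

X_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z, mean, std = standardize(X_demo)
print(mean)            # column means
print(Z.mean(axis=0))  # approximately 0 in every column
print(Z.std(axis=0))   # 1 in every column
```

On the Iris features, np.allclose(standardize(X)[0], X_standard) should hold. In practice the scaler is preferred because it stores mean_ and scale_ for transforming new data consistently.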


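Plot the decision boundary
Notes on the functions used:

meshgrid() returns the coordinates of every point in the rectangular grid defined by two vectors; x0 holds the x coordinates and x1 the y coordinates
ravel() flattens an array into one dimension
np.c_[] stacks 1-D arrays side by side as columns
contourf() draws filled contours (here, one colored region per predicted class)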
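The three NumPy helpers listed above work together to build the grid of points that the model classifies. On a tiny grid (illustrative values only) their behavior is easy to see:

```python
import numpy as np

xs = np.array([0, 1, 2])  # 3 x values -> number of columns
ys = np.array([10, 20])   # 2 y values -> number of rows
x0, x1 = np.meshgrid(xs, ys)

# Flatten both coordinate arrays and stack them as columns:
# one (x, y) row per grid point, 2 * 3 = 6 rows in total
grid_points = np.c_[x0.ravel(), x1.ravel()]
print(x0)
print(x1)
print(grid_points)

# Any per-point result (e.g. predictions) can be reshaped back to the grid shape
print(grid_points[:, 0].reshape(x0.shape))
```

The plotting function below does the same thing with 600 values per axis, so model.predict receives 600 * 600 points and the predictions are reshaped back to (600, 600) for contourf.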

def plot_decision_boundary(model, axis):
    # axis = [x_min, x_max, y_min, y_max]; 100 grid points per unit length
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1] - axis[0]) * 100)),  # 600 points for [-3, 3]; sets the number of columns
        np.linspace(axis[2], axis[3], int((axis[3] - axis[2]) * 100)),  # 600 points for [-3, 3]; sets the number of rows
    )
    # Flatten x0 and x1 and stack them into a 360000 x 2 matrix holding every grid point
    X_new = np.c_[x0.ravel(), x1.ravel()]  # 600 * 600 rows, 2 columns
    y_predict = model.predict(X_new)  # the model predicts on 2-D points
    zz = y_predict.reshape(x0.shape)  # back to (600, 600)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A', '#FFF59D', '#90CAF9'])
    # Color each region by its predicted class
    plt.contourf(x0, x1, zz, cmap=custom_cmap)

plot_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[0:y2, 0], X_standard[0:y2, 1], color='red')
plt.scatter(X_standard[y2:y1, 0], X_standard[y2:y1, 1], color='blue')
plt.show()

[Figure: decision boundary of svc (C=1e9)]
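plot_decision_boundary colors each grid point by model.predict. For a binary linear classifier such as LinearSVC, the decision function is X @ coef_[0] + intercept_[0], and predict returns the second class (here 1) where that score is positive. The sketch below reproduces this rule with NumPy; the weights and points are hypothetical, chosen by hand.

```python
import numpy as np

def linear_decision(X, w, b):
    """Binary linear decision rule: score = X @ w + b, label 1 if score > 0 else 0.

    Assumes the two classes are labeled 0 and 1 (as Y1 is here).
    """
    scores = np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + b
    labels = (scores > 0).astype(int)
    return scores, labels

# Hypothetical weights and points, for illustration only
w_demo = np.array([1.0, -1.0])
b_demo = 0.0
points = np.array([[2.0, 1.0], [1.0, 2.0], [0.0, 0.0]])
scores, labels = linear_decision(points, w_demo, b_demo)
print(scores, labels)
```

With the fitted model, np.array_equal(linear_decision(X_standard, svc.coef_[0], svc.intercept_[0])[1], svc.predict(X_standard)) should be True, which shows that the colored regions are simply the two sides of the line w·x + b = 0.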
Instantiate a second model, svc2, that differs only in the value of C passed to LinearSVC(C)

svc2 = LinearSVC(C=0.01)
svc2.fit(X_standard, Y1)
# Learned weights w (one per feature) and intercept b of the separating line
print(svc2.coef_)
print(svc2.intercept_)
plot_decision_boundary(svc2, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[0:y2,0], X_standard[0:y2,1],color='red')
plt.scatter(X_standard[y2:y1,0], X_standard[y2:y1,1],color='blue')
plt.show()


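[Figure: decision boundary of svc2 (C=0.01)]
Comparing the two plots, the decision boundaries clearly differ, which shows that the smaller C is, the more tolerant the model is of points that fall inside the margin or on the wrong side of it.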
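The role of C can be read off the soft-margin SVM objective. In the standard formulation, with labels y_i in {-1, +1}, the model minimizes a trade-off between margin size and margin violations:

```latex
\min_{w,\,b}\quad \frac{1}{2}\lVert w \rVert^{2}
\;+\; C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\,(w^{\top}x_i + b)\bigr)
```

The margin lines are w^T x + b = +1 and w^T x + b = -1, which lie 2/||w|| apart. A small C (0.01) gives the first term more weight, so ||w|| shrinks, the margin widens, and more points may sit inside it or on the wrong side; a very large C (1e9) makes every violation so costly that the solution approaches a hard-margin SVM. LinearSVC's default loss is the squared hinge (loss='squared_hinge'), i.e. each max term is squared, but the role of C is the same.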
Add the upper and lower margin boundaries to the classification plot

def plot_svc_decision_boundary(model, axis):
    x0, x1 = np.meshgrid(
        np.linspace(axis[0], axis[1], int((axis[1] - axis[0]) * 100)),  # 600 points; sets the number of columns
        np.linspace(axis[2], axis[3], int((axis[3] - axis[2]) * 100)),  # 600 points; sets the number of rows
    )
    # Flatten x0 and x1 and stack them into a 360000 x 2 matrix holding every grid point
    X_new = np.c_[x0.ravel(), x1.ravel()]  # 600 * 600 rows, 2 columns
    y_predict = model.predict(X_new)  # the model predicts on 2-D points
    zz = y_predict.reshape(x0.shape)  # (600, 600)

    from matplotlib.colors import ListedColormap
    custom_cmap = ListedColormap(['#EF9A9A', '#FFF59D', '#90CAF9'])
    plt.contourf(x0, x1, zz, cmap=custom_cmap)

    # Learned weights and intercept of the linear model
    w = model.coef_[0]
    b = model.intercept_[0]

    # Decision function: f(x0, x1) = w[0]*x0 + w[1]*x1 + b
    # Upper margin line: w[0]*x0 + w[1]*x1 + b = 1
    # Lower margin line: w[0]*x0 + w[1]*x1 + b = -1
    # Solving each equation for x1 gives the y coordinates of the two lines
    index_x = np.linspace(axis[0], axis[1], 100)
    y_up = (1 - w[0] * index_x - b) / w[1]
    y_down = (-1 - w[0] * index_x - b) / w[1]

    # Keep only the parts of the lines that lie inside the plotting window
    x_index_up = index_x[(y_up <= axis[3]) & (y_up >= axis[2])]
    x_index_down = index_x[(y_down <= axis[3]) & (y_down >= axis[2])]
    y_up = y_up[(y_up <= axis[3]) & (y_up >= axis[2])]
    y_down = y_down[(y_down <= axis[3]) & (y_down >= axis[2])]

    plt.plot(x_index_up, y_up, color="black")
    plt.plot(x_index_down, y_down, color="black")
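The margin lines in plot_svc_decision_boundary come from solving w[0]*x + w[1]*y + b = level for y, with level +1 and -1. The sketch below isolates that algebra with hypothetical weights and also computes the distance between the two lines, 2/||w||. The line called plus_one here corresponds to y_up; it is visually the upper one only when w[1] > 0, and the formula needs w[1] != 0 (otherwise the lines are vertical).

```python
import numpy as np

def margin_lines(w, b, xs):
    """Solve w[0]*x + w[1]*y + b = level for y at each x (requires w[1] != 0).

    Returns the y values of the decision boundary (level 0) and of the two
    margin lines (levels +1 and -1).
    """
    w0, w1 = float(w[0]), float(w[1])
    xs = np.asarray(xs, dtype=float)
    boundary = (0.0 - w0 * xs - b) / w1
    plus_one = (1.0 - w0 * xs - b) / w1
    minus_one = (-1.0 - w0 * xs - b) / w1
    return boundary, plus_one, minus_one

def margin_width(w):
    """Perpendicular distance between the lines w.x + b = +1 and w.x + b = -1."""
    return 2.0 / np.linalg.norm(w)

w_demo = np.array([3.0, 4.0])  # hypothetical weights
b_demo = 1.0
xs = np.array([0.0, 1.0])
boundary, plus_one, minus_one = margin_lines(w_demo, b_demo, xs)
print(boundary, plus_one, minus_one)
print(margin_width(w_demo))
```

For the two fitted models, margin_width(svc2.coef_[0]) would be expected to exceed margin_width(svc.coef_[0]), matching the wider band visible in the C=0.01 plot.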

plot_svc_decision_boundary(svc, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[0:y2, 0], X_standard[0:y2, 1], color='red')
plt.scatter(X_standard[y2:y1, 0], X_standard[y2:y1, 1], color='blue')
plt.show()

[Figure: decision boundary and margin lines of svc (C=1e9)]
Change the value of C (svc2, C=0.01)

plot_svc_decision_boundary(svc2, axis=[-3, 3, -3, 3])
plt.scatter(X_standard[0:y2,0], X_standard[0:y2,1],color='red')
plt.scatter(X_standard[y2:y1,0], X_standard[y2:y1,1],color='blue')
plt.show()


[Figure: decision boundary and margin lines of svc2 (C=0.01)]

References

https://blog.csdn.net/qq_43279579/article/details/114950002
https://blog.csdn.net/qq_43279579/article/details/114955059
