Linear Regression Analysis of a Parent-Child Height Dataset with sklearn

Author: 在下摸鱼怪

Category: Machine Learning; Tags: machine learning, python, linear analysis

Article link: https://blog.csdn.net/xwmrqqq/article/details/115183695

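This post uses Python's sklearn library to run linear regression on father-son and mother-son height data. It first preprocesses the data, then fits a linear regression model, plots the samples together with the fitted line, and finally computes and reports the coefficient of determination (R²), which quantifies how well each parent's height linearly explains the son's height.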

I. Experiment Overview

Environment

Anaconda + Python 3.6 + Jupyter

Task

Run a separate linear regression analysis on the "father-son" and the "mother-son" height data.

Dataset and source code (Baidu Netdisk)

Link: https://pan.baidu.com/s/1qVCv03xuWp9OtIGmCGKmTA
Extraction code: b3mb

II. Data Preprocessing

The raw data is formatted as follows:

[Figure: sample rows of the raw data]

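Step 1: in Excel, filter the rows where the child's gender is M (sons only).

Step 2: remove rows whose family ID is a duplicate, so that each family contributes a single son.

The result is as follows:

[Figure: preprocessed data]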
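The two Excel steps can also be scripted. The following is a minimal sketch using only the standard `csv` module. It assumes (hypothetically) that the family ID sits in column 0 and the gender in column 3, guessed from the loading code, which reads the father's, mother's and son's heights from columns 1, 2 and 4. It keeps the first son listed for each family and runs on made-up toy rows, not the real dataset.

```python
import csv
import io


def preprocess(lines, gender_col=3, family_col=0):
    """Keep sons only, and at most one row per family (the first one seen).

    gender_col and family_col are assumed column positions (hypothetical).
    """
    reader = csv.reader(lines)
    header = next(reader)          # keep the header row unchanged
    kept, seen = [header], set()
    for row in reader:
        if row[gender_col] != "M":       # step 1: sons only
            continue
        if row[family_col] in seen:      # step 2: drop repeated family IDs
            continue
        seen.add(row[family_col])
        kept.append(row)
    return kept


# Made-up toy rows for illustration only
toy = io.StringIO(
    "Family,Father,Mother,Gender,Height\n"
    "1,70.0,64.0,M,71.0\n"
    "1,70.0,64.0,F,65.0\n"
    "2,68.0,63.0,M,69.0\n"
    "2,68.0,63.0,M,70.0\n"
    "3,72.0,66.0,F,66.0\n"
)
rows = preprocess(toy)
for row in rows:
    print(row)
```

Writing `rows` out with `csv.writer(...).writerows(rows)` to a file opened with `newline=""` would give a CSV in the shape the loading code in the next step expects.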

III. Linear Regression with sklearn

1. "Father-Son" Linear Regression

① Load the data

import csv
import numpy as np

father = []  # father's height
mother = []  # mother's height
son = []     # son's height

# Read the preprocessed data
with open("Pretreatment.csv", "r", newline="") as file:
    reader = csv.reader(file, dialect="excel")
    next(reader)  # skip the header row
    for row in reader:
        father.append(float(row[1]))
        mother.append(float(row[2]))
        son.append(float(row[4]))

# sklearn expects a 2-D feature matrix of shape (n_samples, n_features)
x_father = np.array(father).reshape(-1, 1)
x_mother = np.array(mother).reshape(-1, 1)
y_son = np.array(son).reshape(-1, 1)
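The same three arrays can be loaded more compactly with NumPy. This is a sketch, assuming the file has exactly one header row and numeric values in columns 1, 2 and 4 (the positions the `csv` loop reads); `load_heights` is a hypothetical helper name, and the demo uses made-up in-memory text instead of the real file.

```python
import io

import numpy as np


def load_heights(source):
    """Return (x_father, x_mother, y_son) from a Pretreatment.csv-style file."""
    # Only columns 1, 2 and 4 are parsed; ndmin=2 keeps a 2-D result
    # even when the file contains a single data row.
    data = np.loadtxt(source, delimiter=",", skiprows=1,
                      usecols=(1, 2, 4), ndmin=2)
    x_father = data[:, 0].reshape(-1, 1)  # (n_samples, 1) feature matrix
    x_mother = data[:, 1].reshape(-1, 1)
    y_son = data[:, 2].reshape(-1, 1)
    return x_father, x_mother, y_son


# Made-up toy rows; with the real data this would be load_heights("Pretreatment.csv")
toy = io.StringIO(
    "Family,Father,Mother,Gender,Height\n"
    "1,70.0,64.0,M,71.0\n"
    "2,68.0,63.0,M,69.0\n"
)
x_father, x_mother, y_son = load_heights(toy)
print(x_father.shape, y_son.ravel())
```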

② Linear regression with sklearn

from sklearn import linear_model

# Create a linear regression (ordinary least squares) object
linear_regressor = linear_model.LinearRegression()
# Train the model: just pass the feature matrix and the targets to fit()
linear_regressor.fit(x_father, y_son)
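With a single feature, ordinary least squares has a closed form: slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept a = ȳ - b·x̄. Since `LinearRegression` fits ordinary least squares, its `coef_` and `intercept_` should match these values up to floating-point error. A minimal sketch checking that on made-up toy heights (not the blog's data); `ols_fit` is a hypothetical helper name:

```python
import numpy as np
from sklearn import linear_model


def ols_fit(x, y):
    """Closed-form simple linear regression; returns (slope, intercept)."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    x_c, y_c = x - x.mean(), y - y.mean()   # center both variables
    slope = (x_c @ y_c) / (x_c @ x_c)       # covariance / variance of x
    intercept = y.mean() - slope * x.mean()
    return slope, intercept


# Made-up toy heights
x = np.array([65.0, 67.0, 69.0, 71.0, 73.0]).reshape(-1, 1)
y = np.array([67.0, 68.5, 69.0, 70.5, 72.0]).reshape(-1, 1)

slope, intercept = ols_fit(x, y)
model = linear_model.LinearRegression().fit(x, y)
# With a 2-D y, coef_ has shape (1, 1) and intercept_ has shape (1,)
print(slope, model.coef_[0, 0])
print(intercept, model.intercept_[0])
```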

③ Plotting

import matplotlib.pyplot as plt

# Scatter plot of the samples
plt.figure("Linear Regression", facecolor="lightgray")
plt.title("Linear Regression", fontsize=16)
plt.xlabel("Father's height")
plt.ylabel("Son's height")
plt.grid(linestyle=":")
plt.scatter(x_father, y_son, s=70, color="dodgerblue", label="samples")
# Fitted line: the model's predictions on the same inputs
prd_y = linear_regressor.predict(x_father)
plt.plot(x_father, prd_y, color="orangered", label="Predict")
plt.legend()
plt.tight_layout()
plt.show()

[Figure: father-son height scatter plot with the fitted regression line]

④ R-squared (coefficient of determination)

import sklearn.metrics as sm

print("Weight (slope):", linear_regressor.coef_)
print("Intercept:", linear_regressor.intercept_)
print("R2 score:", sm.r2_score(y_son, prd_y))

[Figure: printed weight, intercept and R2 score for the father-son model]
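The value returned by `r2_score` is R² = 1 - SS_res / SS_tot, where SS_res = Σ(y - ŷ)² is the residual sum of squares and SS_tot = Σ(y - ȳ)² is the total sum of squares. A minimal sketch that computes it by hand and cross-checks it against `r2_score` and `LinearRegression.score` (which returns R² for regressors). For a one-feature least-squares fit with an intercept, evaluated on its own training data, R² also equals the squared Pearson correlation. The data are made-up toy values and `r_squared` is a hypothetical helper name.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up toy data standing in for x_father / y_son
x = np.array([65.0, 67.0, 69.0, 71.0, 73.0]).reshape(-1, 1)
y = np.array([67.0, 68.5, 69.0, 70.5, 72.0]).reshape(-1, 1)
model = LinearRegression().fit(x, y)
pred = model.predict(x)

r2_manual = r_squared(y, pred)
r2_sklearn = r2_score(y, pred)
r2_model = model.score(x, y)
r2_pearson = np.corrcoef(x.ravel(), y.ravel())[0, 1] ** 2
print(r2_manual, r2_sklearn, r2_model, r2_pearson)
```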

For details on other commonly used evaluation metrics, see: Regression: Evaluation Metrics for Regression Prediction.
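Besides R², the usual companions are the mean absolute error (MAE), the mean squared error (MSE) and its square root (RMSE, in the same units as the heights). A short sketch using `sklearn.metrics`; `regression_metrics` is a hypothetical helper name, and the values below are made-up toy targets and predictions. With the blog's variables it would be called as `regression_metrics(y_son, prd_y)`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred):
    """Common regression metrics for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),  # mean |error|
        "MSE": mse,                                   # mean squared error
        "RMSE": float(np.sqrt(mse)),                  # same units as the targets
        "R2": r2_score(y_true, y_pred),
    }


# Made-up toy targets and predictions
y_true = np.array([67.0, 68.5, 69.0, 70.5, 72.0])
y_pred = np.array([67.0, 68.2, 69.4, 70.6, 71.8])
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```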

2. "Mother-Son" Linear Regression

The steps are the same as above; the code reuses the imports and arrays from part 1.

# Fit a separate model so the father-son model above is not overwritten
mother_regressor = linear_model.LinearRegression()
mother_regressor.fit(x_mother, y_son)

# Scatter plot of the samples
plt.figure("Linear Regression", facecolor="lightgray")
plt.title("Linear Regression", fontsize=16)
plt.xlabel("Mother's height")
plt.ylabel("Son's height")
plt.grid(linestyle=":")
plt.scatter(x_mother, y_son, s=70, color="dodgerblue", label="samples")
# Fitted line
prd_y = mother_regressor.predict(x_mother)
plt.plot(x_mother, prd_y, color="orangered", label="Predict")
plt.legend()
plt.tight_layout()
plt.show()

print("Weight (slope):", mother_regressor.coef_)
print("Intercept:", mother_regressor.intercept_)
print("R2 score:", sm.r2_score(y_son, prd_y))
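Because the father-son and mother-son analyses differ only in the input column, the fit-and-report steps can be wrapped in one function that creates a fresh estimator per parent, so both fitted models stay available for comparison. A sketch under that assumption; `fit_and_report` is a hypothetical helper name, plotting is left out for brevity, and the arrays are made-up stand-ins for `x_father`, `x_mother` and `y_son`.

```python
import numpy as np
from sklearn import linear_model
from sklearn.metrics import r2_score


def fit_and_report(x, y):
    """Fit a new LinearRegression on one parent's heights and summarize it."""
    model = linear_model.LinearRegression().fit(x, y)
    pred = model.predict(x)
    return {
        "model": model,
        "weight": float(model.coef_.ravel()[0]),
        "intercept": float(np.ravel(model.intercept_)[0]),
        "r2": r2_score(y, pred),
    }


# Made-up toy arrays, shaped (n_samples, 1) like the blog's arrays
x_father = np.array([65.0, 67.0, 69.0, 71.0, 73.0]).reshape(-1, 1)
x_mother = np.array([60.0, 62.0, 63.0, 65.0, 66.0]).reshape(-1, 1)
y_son = np.array([67.0, 68.5, 69.0, 70.5, 72.0]).reshape(-1, 1)

results = {name: fit_and_report(x, y_son)
           for name, x in [("father", x_father), ("mother", x_mother)]}
for name, res in results.items():
    print(f"{name}-son: weight={res['weight']:.3f}, "
          f"intercept={res['intercept']:.3f}, R2={res['r2']:.3f}")
```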