线性回归案例心得

最新推荐文章于 2023-03-27 19:41:13 发布

Hugo Zhong

最新推荐文章于 2023-03-27 19:41:13 发布

阅读量796

点赞数

文章标签： sklearn 机器学习 python

本文链接：https://blog.csdn.net/Hugo_Knight/article/details/120288564

版权

机器学习一般步骤

1、导入数据集
2、查看数据集信息，处理缺失值
3、特征工程
4、特征非数值类型的编码，特征缩放
5、划分训练集和测试集，或训练集、验证集和测试集
6、选择模型
7、模型评估

线性回归案例

1、数据集信息：在这里插入图片描述
2、数据预处理

缺失值为0个
#查看缺失值 data.isnull().sum()
对非数值类型数据使用labelEncoder

#这里对全部类型都使用了LabelEncoder编码，除了需预测值charges

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
firstdata = pd.DataFrame()
for col in data.columns:
    if col == 'charges':
        firstdata[col] = data[col]
        continue
    firstdata[col]=le.fit_transform(data[col])
firstdata.head()

划分数据集，使用sklearn的train_test_split()函数

from sklearn.model_selection import train_test_split

# 划分自变量, 因变量
X = firstdata.iloc[:,:-1]
y = firstdata.iloc[:,-1]
# 划分测试集和训练集
x_train, x_test, y_train, y_test = train_test_split(X,y, test_size=0.2, random_state = 42)

模型训练即评估

from sklearn.linear_model import LinearRegression
linear_model=LinearRegression()
linear_model.fit(x_train,y_train)
from sklearn.metrics import r2_score
y_pred=linear_model.predict(x_test)
print(r2_score(y_test,y_pred))

问题：
本次案例问题，之后尝试使用one_hot Encoder之后结果一致。但发现如果对目标值charges使用了labelEncoder之后，效果会提高，使得r2_score增加，不知道为何且是否合理

Hugo Zhong

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
线性回归案例心得

机器学习一般步骤1、导入数据集2、查看数据集信息，处理缺失值3、特征工程4、特征非数值类型的编码，特征缩放5、划分训练集和测试集，或训练集、验证集和测试集6、选择模型7、模型评估线性回归案例1、数据集信息：2、数据预处理缺失值为0个#查看缺失值 data.isnull().sum()对非数值类型数据使用labelEncoder#这里对全部类型都使用了LabelEncoder编码，除了需预测值chargesfrom sklearn.preprocessing import L
复制链接

扫一扫