Extracting numeric / categorical variables from a dataset in Python

Avasla

Categories: Python, BUG. Tags: data analysis, python

Article link: https://blog.csdn.net/WHYbeHERE/article/details/108553799

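Problem:

How can you quickly pull the numeric variables or the categorical variables out of a dataset? For example, extracting every column stored as float.

Solution: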

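import numpy as np
# Find the positions of the columns whose dtype is not float
# (np.float was removed in NumPy 1.24; use np.float64 instead)
categorical_features_indices = np.where(x.dtypes != np.float64)[0]
# Extract the categorical variables (select columns by position, not rows)
categorical_features = x.iloc[:, categorical_features_indices]
# Extract the numeric variables by dropping the categorical columns
numeric_variable = x.drop(x.columns[categorical_features_indices], axis=1)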

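Note: the split is based purely on the float dtype. A column that holds numbers but is stored as int is still classified as a categorical variable.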
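If integer columns should count as numeric too, pandas' `DataFrame.select_dtypes` is one alternative to comparing dtypes by hand. The following is a minimal sketch on a made-up DataFrame; the column names and values are assumptions for illustration and do not come from the Titanic data.

```python
import pandas as pd

# Hypothetical toy data: one float, one int, and one string column
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0],
    "siblings": [1, 0, 2],
    "sex": ["male", "female", "female"],
})

# Every numeric dtype, int and float alike
numeric_df = df.select_dtypes(include="number")
# Everything that is not numeric
categorical_df = df.select_dtypes(exclude="number")
# float64 columns only, which matches the float-only rule in the note above
float_df = df.select_dtypes(include="float64")

numeric_cols = list(numeric_df.columns)
categorical_cols = list(categorical_df.columns)
float_cols = list(float_df.columns)
```

Here `siblings` lands in the numeric group even though it is stored as int, which is the case the float-only comparison misses.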

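Full code:

For example, suppose you want to extract the numeric / categorical variables from the Titanic dataset without doing any dtype conversion, simply taking the data as it is for practice.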

import numpy as np
import pandas as pd

# Load the dataset
x = pd.read_csv('train.csv')
x.dropna(inplace=True)  # y is taken from x, so drop rows with missing values first
y = x.pop('Survived')

# Check the data types
x.dtypes

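# Find the positions of the columns whose dtype is not float
# (np.float was removed in NumPy 1.24; use np.float64 instead)
categorical_features_indices = np.where(x.dtypes != np.float64)[0]
# Inspect the result
categorical_features_indices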

The output gives the positions of the categorical columns: array([ 0, 1, 2, 3, 5, 6, 7, 9, 10])
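Positions alone are hard to read; indexing the column index with them gives the matching column names. A small sketch of that lookup, using a hypothetical four-column DataFrame rather than the real train.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few Titanic-style columns
df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Name": ["A", "B"],
    "Age": [22.0, 38.0],
    "Fare": [7.25, 71.28],
})

# Positions of the non-float columns
positions = np.where(df.dtypes != np.float64)[0]
# Translate positions into column names
names = list(df.columns[positions])
```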

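# Extract the categorical variables (select columns by position, not rows)
categorical_features = x.iloc[:, categorical_features_indices]
# Extract the numeric variables by dropping the categorical columns
numeric_variable = x.drop(x.columns[categorical_features_indices], axis=1)
# Inspect the result
numeric_variable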
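One detail worth checking when extracting by position: with `iloc`, a bare list of positions selects rows, while the `[:, positions]` form selects columns. The sketch below shows the difference on a made-up DataFrame (names and values are illustrative assumptions).

```python
import pandas as pd

# Hypothetical toy data with 3 rows and 3 columns
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [22.0, 38.0, 26.0],
    "Sex": ["m", "f", "f"],
})
positions = [0, 2]

# Column selection: keeps all rows, only columns 0 and 2
by_column = df.iloc[:, positions]
# Row selection: keeps rows 0 and 2, all columns
by_row = df.iloc[positions]

column_shape = by_column.shape
row_shape = by_row.shape
```

Comparing the two shapes is a quick way to confirm that the extraction picked columns and did not silently drop rows.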