Python_data pre-processing

Latest recommended article published on 2023-06-29 17:34:45

wyxjohn205

Category: 记忆面包？不存在的 | Tags: python

Article link: https://blog.csdn.net/wyxjohn205/article/details/79797668

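This article introduces the basic steps of data pre-processing in Python: checking the shape of the data, handling missing values, filling numeric missing values with pandas, encoding categorical features as numbers, and checking the correlation between variables and the skew of a binary output. These are all important steps before building a robust predictive model.

Summary generated automatically by CSDN.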

All of the techniques here are basic, general-purpose solutions. To obtain a robust predictive model, missing value imputation should be considered case by case.

If you are a beginner, like me, you can clean data in this very general way.

0. Check the shape, missing values and variable types in the dataset

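data.shape           # (number of rows, number of columns)
data.dtypes          # data type of each column
data.isnull().sum()  # number of missing values per column
data.info()          # column names, non-null counts and dtypes in one summary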
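To make the checks above concrete, here is a minimal sketch on a small made-up DataFrame. The column names `age`, `stage` and `side` and all values are assumptions for illustration only; they are not taken from a real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only.
data = pd.DataFrame({
    "age": [61.0, np.nan, 47.0, 55.0],
    "stage": ["IV", "IIB", None, "I"],
    "side": ["left", "right", "both", None],
})

# Shape: (number of rows, number of columns)
n_rows, n_cols = data.shape

# Number of missing values in each column
missing_per_column = data.isnull().sum()

# Split columns by type, since numeric and categorical
# columns are usually cleaned in different ways.
numeric_cols = data.select_dtypes(include="number").columns.tolist()
categorical_cols = data.select_dtypes(exclude="number").columns.tolist()

print(data.shape)
print(data.dtypes)
print(missing_per_column)
```

Splitting the column names by type at this point is one possible convenience: the two lists can be reused in the imputation and encoding steps.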

1. Impute missing values with pandas (a number for numeric columns, a placeholder label for categorical columns)

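# Numeric column: fill missing values with 0.0
data['column name'] = data['column name'].fillna(0.0)
# Categorical column: fill missing values with a placeholder label such as 'U1'
data['column name'] = data['column name'].fillna('U1')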
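Since imputation should be considered case by case, a constant such as 0.0 is not always a sensible fill value. The sketch below shows one possible alternative for a numeric column, the column median, next to the placeholder-label approach for a categorical column. The toy data and column names are assumptions, and the median is a swapped-in choice rather than the method used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only.
data = pd.DataFrame({
    "age": [61.0, np.nan, 47.0, 55.0],
    "stage": ["IV", "IIB", None, "I"],
})

# Numeric column: fill with the median of the observed values
# (an assumption; 0.0, the mean, or a model-based value may fit better).
age_median = data["age"].median()
data["age"] = data["age"].fillna(age_median)

# Categorical column: fill with an explicit "unknown" label.
data["stage"] = data["stage"].fillna("U1")

print(data)
```

Assigning the result back to the column, instead of calling `fillna(..., inplace=True)` on a selected column, avoids relying on the selection being a view of the original DataFrame.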

2. Encode categorical columns as numbers (string categories can otherwise cause problems in machine learning algorithms)

cleanup_nums = {
    "stage": {"IV": 0, "IIB": 1, "III": 2, "IIA": 3, "I": 4},
    "side": {"both": 2, "right": 1, "left": 0},
}
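The text does not show how the mapping is applied, so the following is only a sketch of one possible way to do it, using `Series.map` column by column. The toy DataFrame is an assumption for illustration; the `cleanup_nums` dictionary is the one defined above.

```python
import pandas as pd

cleanup_nums = {
    "stage": {"IV": 0, "IIB": 1, "III": 2, "IIA": 3, "I": 4},
    "side": {"both": 2, "right": 1, "left": 0},
}

# Hypothetical toy data, for illustration only.
data = pd.DataFrame({
    "stage": ["IV", "IIB", "I", "III"],
    "side": ["left", "right", "both", "left"],
})

# Work on a copy so the original labels stay available.
encoded = data.copy()
for column, mapping in cleanup_nums.items():
    # Series.map returns NaN for any value that is missing from the
    # mapping, which makes unexpected categories easy to spot.
    encoded[column] = encoded[column].map(mapping)

print(encoded)
```

A quick `encoded.isnull().sum()` after this step shows whether any category was left out of the mapping.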
