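1. One-hot encoding of discrete features
First work through the example code, then complete the exercise below.
Now, in a .py file, process all continuous and discrete variables in the data dataset in one pass.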
- Read the data
import pandas as pd
data = pd.read_csv('data.csv')
data.columns  # inspect the column names
# Print the name of every discrete (object-dtype) column
for discrete_features in data.columns:
    if data[discrete_features].dtype == 'object':
        print(discrete_features)
One-hot encode the discrete variables
data = pd.get_dummies(data,columns=['Home Ownership'])
data.columns
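To see what this call produces without needing data.csv, here is a minimal sketch on a toy DataFrame. Only the column name 'Home Ownership' comes from the code above; the category values and the 'Annual Income' column are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are illustrative, not taken from data.csv.
toy = pd.DataFrame({
    "Home Ownership": ["Rent", "Own Home", "Rent", "Home Mortgage"],
    "Annual Income": [50000.0, 72000.0, 61000.0, 88000.0],
})

# Each distinct value of 'Home Ownership' becomes its own indicator column,
# named '<original column>_<value>'. The original column is removed.
encoded = pd.get_dummies(toy, columns=["Home Ownership"])
print(encoded.columns.tolist())
```

Every row has exactly one indicator set among the new 'Home Ownership_...' columns, which is what "one-hot" refers to.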
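# Read the data
data = pd.read_csv('data.csv')
# Find the discrete variables
discrete_lists = []  # holds the names of the discrete variables
for discrete_features in data.columns:
    if data[discrete_features].dtype == 'object':
        discrete_lists.append(discrete_features)
# One-hot encode every discrete variable; drop_first=True drops the first category of each
data = pd.get_dummies(data, columns=discrete_lists, drop_first=True)
data.columns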
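The effect of drop_first=True is easiest to see side by side. The sketch below uses a hypothetical 'Term' column with made-up values; it is not taken from data.csv.

```python
import pandas as pd

# Hypothetical column and values, for illustration only.
toy = pd.DataFrame({"Term": ["Short Term", "Long Term", "Short Term"]})

# Without drop_first: one indicator column per category (k columns).
full = pd.get_dummies(toy, columns=["Term"])

# With drop_first=True: the first category in sorted order is dropped,
# leaving k - 1 columns. The dropped category is implied when all
# remaining indicators are 0.
reduced = pd.get_dummies(toy, columns=["Term"], drop_first=True)

print(full.columns.tolist())
print(reduced.columns.tolist())
```

Dropping one category avoids perfectly collinear indicator columns, which matters mostly for linear models; for tree-based models keeping all k columns is usually harmless.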
Convert the one-hot encoded variables to int type
data2 = pd.read_csv('data.csv')  # original data, used to compare column names
list_final = []  # columns created by one-hot encoding
for i in data.columns:
    if i not in data2.columns:
        list_final.append(i)
list_final
# Type conversion
for i in list_final:
    data[i] = data[i].astype(int)
data.head()
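As a possible shortcut, get_dummies accepts a dtype argument, so the indicator columns can be created as integers directly instead of being converted afterwards. A minimal sketch, using a hypothetical 'Purpose' column with made-up values:

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical column and values, for illustration only.
toy = pd.DataFrame({"Purpose": ["debt", "other", "debt"]})

# dtype=int makes the new indicator columns hold 0/1 integers,
# so no separate astype(int) pass is needed.
as_int = pd.get_dummies(toy, columns=["Purpose"], dtype=int)

print(as_int.dtypes)
print(all(is_integer_dtype(t) for t in as_int.dtypes))
```

The column-comparison loop above is still useful when you want to know which columns were newly created, for example to report them.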
Fill all missing values
for i in data.columns:
    if data[i].isnull().sum() > 0:
        mean_value = data[i].mean()
        # Assign the result back; fillna(inplace=True) on data[i] may not update data in newer pandas
        data[i] = data[i].fillna(mean_value)
data.isnull().sum()
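For the exercise of handling everything in one pass from a .py file, the steps above could be combined roughly as follows. This is only a sketch under stated assumptions: the function name preprocess is hypothetical, every non-numeric column is treated as discrete (a slightly broader test than dtype == 'object'), and a toy DataFrame stands in for data.csv.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode non-numeric columns, then mean-fill missing values."""
    # Assumption: every non-numeric column is a discrete feature.
    discrete_cols = [c for c in df.columns if not is_numeric_dtype(df[c])]

    # dtype=int yields 0/1 columns directly; drop_first mirrors the code above.
    out = pd.get_dummies(df, columns=discrete_cols, drop_first=True, dtype=int)

    # After encoding, every column is numeric, so a mean can be computed.
    for col in out.columns:
        if out[col].isnull().any():
            out[col] = out[col].fillna(out[col].mean())
    return out


# Hypothetical toy frame standing in for pd.read_csv('data.csv').
toy = pd.DataFrame({
    "Home Ownership": ["Rent", "Own Home", "Rent", "Own Home"],
    "Annual Income": [50000.0, None, 70000.0, 60000.0],
})

clean = preprocess(toy)
print(clean)
print(clean.isnull().sum())
```

With the real file, the call would be preprocess(pd.read_csv('data.csv')). Note that get_dummies ignores missing values in a discrete column by default, so such rows end up with all indicators set to 0 rather than being filled by the mean.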
@浙大疏锦行