● A brief introduction to dictionaries
● Label encoding
● Handling continuous features: normalization and standardization
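1. Dictionaries
Mapping
A dictionary stores key-value pairs: keys are unique, while values may repeat.
In the same way, the features of a dataset are fixed, but their values can change.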
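# Named `person` rather than `dict` so the built-in dict type is not shadowed
person = {'name': 'Eric', 'age': '22', 'city': 'Hangzhou'}
person
# Access a value in the dictionary
person['name']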
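The two properties stated above (unique keys, repeatable values) can be checked with a short sketch. The `record` dictionary and the `hometown` and `email` keys are made up for illustration and are not part of the original notes.

```python
# Hypothetical record, used only for illustration.
record = {"name": "Eric", "age": "22", "city": "Hangzhou"}

# Values may repeat: a second key can hold the same value as "city".
record["hometown"] = "Hangzhou"

# Keys are unique: assigning to an existing key overwrites its value
# instead of adding a second "age" entry.
record["age"] = "23"

# .get() returns a default instead of raising KeyError for a missing key.
email = record.get("email", "unknown")

print(len(record))    # number of keys: name, age, city, hometown
print(record["age"])  # the overwritten value
print(email)          # the default, since "email" was never set
```

Using `.get()` with a default is one common way to read optional keys; indexing with `record["email"]` would raise `KeyError` instead.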
2. Label encoding
import pandas as pd
data=pd.read_csv('data.csv')
data.head(10)
data["Home Ownership"].value_counts()
# Map each category to an integer code (entries must be separated by commas)
mapping = {
    "Own Home": 0,
    "Rent": 1,
    "Have Mortgage": 2,
    "Home Mortgage": 3,
}
data["Home Ownership"].head()
data["Home Ownership"]=data["Home Ownership"].map(mapping)
data["Home Ownership"].head()
data["Term"].value_counts()
mapping = {
    "Short Term": 1,
    "Long Term": 0,
}
data["Term"].head()
data["Term"]=data["Term"].map(mapping)
data["Term"].head()
* The two mappings can be combined and applied together.
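One possible way to combine the two mappings is sketched below. It assumes a nested dictionary keyed by column name; the name `mappings` and the small inline DataFrame (standing in for `data.csv`) are illustrative and not from the original notes.

```python
import pandas as pd

# Small stand-in for data.csv; the rows are made up for illustration.
data = pd.DataFrame({
    "Home Ownership": ["Own Home", "Rent", "Home Mortgage", "Have Mortgage"],
    "Term": ["Short Term", "Long Term", "Short Term", "Long Term"],
})

# One nested dictionary: column name -> (category -> integer code).
mappings = {
    "Home Ownership": {
        "Own Home": 0,
        "Rent": 1,
        "Have Mortgage": 2,
        "Home Mortgage": 3,
    },
    "Term": {
        "Short Term": 1,
        "Long Term": 0,
    },
}

# Apply every mapping in one loop. Series.map() turns any category
# that is missing from the mapping into NaN.
for column, mapping in mappings.items():
    data[column] = data[column].map(mapping)

print(data)
```

Because unmapped categories become NaN, it may be worth checking `data[column].isna().sum()` after mapping to catch misspelled category names.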
3. Handling continuous features
Normalization
# Define the function: min-max normalization rescales values to [0, 1]
def manual_normalize(data):
    min_val = data.min()
    max_val = data.max()
    normalized_data = (data - min_val) / (max_val - min_val)
    return normalized_data
data['Annual Income']=manual_normalize(data['Annual Income'])
data['Annual Income'].head()
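The manual function computes x' = (x - min) / (max - min), so the smallest value maps to 0 and the largest to 1. The sketch below works through that arithmetic on a plain list. The function name `min_max_normalize`, the sample incomes, and the guard for a constant feature are all assumptions added for illustration; the original function has no such guard.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    min_val = min(values)
    max_val = max(values)
    span = max_val - min_val
    if span == 0:
        # Assumption: a constant feature is mapped to all zeros
        # to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - min_val) / span for v in values]


incomes = [20000, 50000, 80000]  # made-up values
scaled = min_max_normalize(incomes)
print(scaled)  # [0.0, 0.5, 1.0]
```

Here 50000 sits halfway between the minimum and the maximum, so it maps to 0.5.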
# Normalize with sklearn
from sklearn.preprocessing import StandardScaler,MinMaxScaler
data=pd.read_csv("data.csv")
min_max_scaler=MinMaxScaler()
data['Annual Income']=min_max_scaler.fit_transform(data[['Annual Income']])
data['Annual Income'].head()
Standardization
data=pd.read_csv("data.csv")
scaler=StandardScaler()
data['Annual Income']=scaler.fit_transform(data[['Annual Income']])
data['Annual Income'].head()
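Standardization replaces each value with its z-score, z = (x - mean) / std, so the result has mean 0 and standard deviation 1. `StandardScaler` uses the population standard deviation (equivalent to `ddof=0`), whereas pandas `Series.std()` defaults to `ddof=1`, so a manual check against pandas can differ slightly. The sketch below reproduces the arithmetic with the standard library only; the sample values are made up for illustration.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

mean = statistics.fmean(values)   # arithmetic mean: 5.0
std = statistics.pstdev(values)   # population standard deviation: 2.0

# z-score of each value: distance from the mean in units of std
z_scores = [(v - mean) / std for v in values]
print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

A z-score of 2.0 for the value 9 means it lies two standard deviations above the mean.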