1. Read the data
This article uses the US adult income dataset.
import pandas as pd
from IPython.display import display
# adult_path is assumed to hold the path to the adult dataset file;
# it is not defined in this snippet.
data = pd.read_csv(
    adult_path, header=None, index_col=False,
    names=['age', 'workclass', 'fnlwgt', 'education', 'education-num',
           'marital-status', 'occupation', 'relationship', 'race', 'gender',
           'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
           'income'])
2. Check string-encoded categorical data
Use the value_counts function of a pandas Series to show each category and how many times it occurs.
print(data.gender.value_counts())
# Output
Male 21790
Female 10771
Name: gender, dtype: int64
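The same check can be run over every string column at once, which helps spot inconsistent spellings of a category before any encoding step. The following is a minimal sketch, not the article's own code: the `sample` frame, its values, and the names `string_columns` and `counts` are hypothetical stand-ins for the real `data`.

```python
import pandas as pd

# Hypothetical toy frame standing in for `data`; the values are illustrative only.
sample = pd.DataFrame({
    'age': [39, 50, 38, 53],
    'workclass': ['State-gov', 'Private', 'Private', 'private'],
    'gender': ['Male', 'Male', 'Female', 'Male'],
})

# Treat every non-numeric column as a string-valued categorical column.
string_columns = [col for col in sample.columns
                  if not pd.api.types.is_numeric_dtype(sample[col])]

# Count the occurrences of each category, column by column.
counts = {col: sample[col].value_counts() for col in string_columns}

for col in string_columns:
    print(counts[col])
    print()
```

In this toy frame, 'Private' and 'private' are counted as two separate categories, which is the kind of inconsistency this check is meant to surface.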
3. Process the data with o