Data source: https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data (file: stage_2_train.csv)
API
- groupby
- unstack
- value_counts
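CSV data
import pandas as pd
traindf = pd.read_csv("stage_2_train.csv")  # assumed local path to the downloaded file
Inspect the labels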
label = traindf.Label.values
label
array([0, 0, 0, …, 0, 0, 0])
Split ID into the image ID and the subtype
traindf = traindf.ID.str.rsplit("_", n=1, expand=True)
traindf.head()
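To see why the split uses rsplit with n=1, here is a small sketch on made-up IDs. The ID strings below are hypothetical and only assume the shape `ID_<hash>_<subtype>`. Because the image ID itself contains an underscore, splitting once from the right keeps it in one piece.

```python
import pandas as pd

# Hypothetical IDs for illustration only; not taken from stage_2_train.csv.
toy = pd.DataFrame(
    {"ID": ["ID_aaa111_epidural", "ID_aaa111_any", "ID_bbb222_subdural"]}
)

# Split once, from the right: everything before the last "_" is the image ID,
# everything after it is the subtype. expand=True returns a DataFrame whose
# columns are named 0 and 1.
parts = toy.ID.str.rsplit("_", n=1, expand=True)
print(parts)
```

A plain split("_") would instead cut the image ID into two pieces and produce three columns.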
Attach the labels
traindf.loc[:, "label"] = label
traindf.head()
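The assignment above attaches a NumPy array by position, so it relies on the rows keeping their original order. As an alternative sketch (not the approach used in this notebook), the split columns can be written back into the original frame so the labels never leave it. The toy IDs and the name `toy` are hypothetical.

```python
import pandas as pd

# Hypothetical rows in the same layout as the raw CSV (ID, Label).
toy = pd.DataFrame({
    "ID": ["ID_aaa111_epidural", "ID_aaa111_any"],
    "Label": [0, 1],
})

# Write the two split columns next to the existing Label column.
toy[["id", "subtype"]] = toy.ID.str.rsplit("_", n=1, expand=True)

# Drop the combined ID and use a lowercase label column name.
toy = toy.drop(columns="ID").rename(columns={"Label": "label"})
print(toy)
```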
Rename columns 0 and 1
traindf = traindf.rename({0: "id", 1: "subtype"}, axis=1)
traindf.head()
Count the label values for each subtype
subtype_counts = traindf.groupby("subtype").label.value_counts()
Unstack the counts into a table (one row per subtype, one column per label value)
subtype_counts.unstack()
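The groupby, value_counts and unstack steps can be sketched end to end on toy data. The rows below are hypothetical and only mimic the id / subtype / label layout built above. The `fill_value=0` argument is an addition to the original call, so that a combination that never occurs shows as 0 rather than NaN.

```python
import pandas as pd

# Toy data (hypothetical), already in the id / subtype / label layout.
toy = pd.DataFrame({
    "id": ["ID_a", "ID_a", "ID_b", "ID_b", "ID_c", "ID_c"],
    "subtype": ["any", "epidural", "any", "epidural", "any", "epidural"],
    "label": [1, 0, 0, 0, 1, 0],
})

# Series with a two-level index: (subtype, label) -> number of rows.
counts = toy.groupby("subtype").label.value_counts()

# Move the inner index level (label) into columns.
# "epidural" has no rows with label 1, so that cell is filled with 0.
table = counts.unstack(fill_value=0)
print(table)
```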