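While writing code recently, I needed to visualize some features and wanted to see how the labels are distributed over certain discrete features. The x-axis is the feature value, the y-axis is the frequency (count), and the bars use different colors to mark the different labels. Here is my code: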
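import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter/IPython magic: render figures inline (remove it when running as a plain script)
%matplotlib inline
# White background; sns.set also applies seaborn's default "deep" palette
sns.set(style="white")
# Remap the single-letter color codes ("r", "b", ...) to the "deep" palette
sns.set_color_codes("deep")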
# Initialize the matplotlib figure
fig, ax = plt.subplots(figsize=(10, 7))
plt.title("feature distribution with labels")
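# `data` is a pandas DataFrame with the columns "invest_times" and "label"
# Keep only the rows with invest_times <= 10
top15 = data[data['invest_times']<=10]
# Use the same category order in both plots so the overlaid bars line up
order = sorted(top15["invest_times"].unique())
# Plot the total count for each invest_times value (all labels)
p = sns.countplot(x="invest_times", data=top15, order=order, label="Total label", color="r")
# Overlay the count of rows with label == 1 on the same axes
g = sns.countplot(x="invest_times", data=top15[top15["label"] == 1], order=order,
                  label="label with success", color="b")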
# Add a legend and informative axis label
ax.legend(ncol=2, loc="upper right", frameon=False)
# Rotate the x tick labels so they do not overlap
ax.tick_params(axis="x", labelrotation=90)
sns.despine(left=True, bottom=True)
ax.set(ylabel="", xlabel="invest_times")
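The overlay above draws the label == 1 counts on top of the totals. A possible alternative is to let seaborn split the bars by label with the `hue` parameter of `sns.countplot`, which gives one bar per (feature value, label) pair and builds the legend automatically. The sketch below assumes a tiny made-up table named `demo` (hypothetical values, not the original data) with the same two columns:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in table: same column names as in the post, made-up values.
demo = pd.DataFrame({
    "invest_times": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4],
    "label":        [0, 1, 1, 0, 0, 1, 0, 1, 1, 0],
})

fig, ax = plt.subplots(figsize=(10, 7))
# hue="label" draws the bars for each label side by side within every invest_times value.
sns.countplot(x="invest_times", hue="label", data=demo, ax=ax)
ax.set(title="feature distribution with labels", xlabel="invest_times", ylabel="count")
```

In a notebook the figure is shown inline; in a plain script, a call to `plt.show()` at the end displays it. Side-by-side bars make the two labels directly comparable, while the overlay makes the share of label == 1 within the total easier to see.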
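I am not going to publish the data. It is tabular data that I read with pandas, and I ran the code above on it directly to produce the plot.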
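Since the data is not shared, here is a minimal sketch that builds a synthetic table with the same two columns, `invest_times` and `label`, so the plotting code has something to run on. The helper name `make_demo_data`, the value ranges, and the random seed are all assumptions for illustration and say nothing about the real dataset. The sketch also tabulates the counts with `pd.crosstab`, which yields the numbers the bars represent:

```python
import numpy as np
import pandas as pd


def make_demo_data(n_rows=500, seed=0):
    """Hypothetical stand-in for the private table: two integer columns."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "invest_times": rng.integers(0, 15, size=n_rows),  # discrete feature
        "label": rng.integers(0, 2, size=n_rows),          # binary label
    })


data = make_demo_data()
top15 = data[data["invest_times"] <= 10]

# Rows: feature values, columns: labels, cells: number of rows.
counts = pd.crosstab(top15["invest_times"], top15["label"])
# Make sure both label columns exist even if one label never occurs.
counts = counts.reindex(columns=[0, 1], fill_value=0)

# "total" matches the red bars, "label_1" matches the blue bars.
summary = pd.DataFrame({
    "total": counts.sum(axis=1),
    "label_1": counts[1],
})
print(summary)
```

Printing `summary` next to the figure is a quick way to check that the bar heights match the underlying counts.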
References
[1]. https://github.com/ibmw/Shark-Attack/blob/master/Shark%20Attack.ipynb