1. Map each age interval to its group name and show the pairs as a DataFrame:
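import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
group_names = ["Youth", "YoungAdult", "MiddleAged", "Senior"]
bins = [18, 25, 35, 60, 100]
# right=True makes each interval closed on the right: (18, 25], (25, 35], ...
cats = pd.cut(ages, bins, right=True)
cat_interval = list(cats.categories)
# print(cat_interval)
# Pair each interval with its group name, in bin order
dict_cats = dict(zip(cat_interval, group_names))
dafr = pd.DataFrame([dict_cats])
# Transpose so that each interval becomes a row
print(dafr.T)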
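If the goal is to label every age rather than only to list the interval-to-name mapping, pd.cut can attach the names directly through its labels parameter. The following is a minimal sketch that reuses the same ages, bins and group_names; the names labeled, age_table and group_counts are introduced here for illustration only.

```python
import pandas as pd

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
group_names = ["Youth", "YoungAdult", "MiddleAged", "Senior"]
bins = [18, 25, 35, 60, 100]

# labels= replaces the interval notation with one name per bin;
# it needs exactly len(bins) - 1 entries
labeled = pd.cut(ages, bins, right=True, labels=group_names)

# One row per age with its assigned group (illustrative table)
age_table = pd.DataFrame({"age": ages, "group": labeled})
print(age_table)

# Number of ages that fall into each group
group_counts = age_table["group"].value_counts(sort=False)
print(group_counts)
```

Because right=True closes each interval on the right, the age 25 falls into (18, 25] and is therefore labeled "Youth".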
2. Split a Series or array into bins at the given quantiles:
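import numpy as np

# data_5 is a NumPy array of 1000 uniform random values in [0, 1)
data_5 = np.random.rand(1000)
# quantiles are cumulative probabilities from 0 to 1 that serve as the bin edges
quantiles = [0, 0.1, 0.5, 0.9, 1.]
# data_5_cut = pd.qcut(data_5, 5, precision=2)  # alternative: 5 equal-sized bins
data_5_cut = pd.qcut(data_5, quantiles, precision=3)
# Prints, for each element in index order, the interval it falls into
print(data_5_cut)
u = pd.Series(data_5_cut).value_counts()
# Prints the number of values that fall into each interval
print(u)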
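To see how the quantile list shapes the bins, the sketch below asks pd.qcut to also return the computed bin edges (retbins=True) and compares each bin's share of the data with the gaps between consecutive quantiles. The fixed seed, the label names and the variables edges and shares are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the run is repeatable (the seed value is arbitrary)
rng = np.random.default_rng(0)
data = rng.random(1000)

quantiles = [0, 0.1, 0.5, 0.9, 1.0]
# Illustrative names, one per bin (len(quantiles) - 1 of them)
labels = ["bottom 10%", "lower 40%", "upper 40%", "top 10%"]

# retbins=True additionally returns the bin edges computed from the data
binned, edges = pd.qcut(data, quantiles, labels=labels, retbins=True)
print(edges)

# Share of the data in each bin; it should be close to the gaps
# between consecutive quantiles: 0.1, 0.4, 0.4, 0.1
shares = pd.Series(binned).value_counts(sort=False) / len(data)
print(shares)
```

The outermost edges coincide with the minimum and maximum of the data, since the quantiles 0 and 1 are included in the list.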