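1. pd.cut()
Segments data by value: each value in x is assigned to the bin (interval) that contains it, and the resulting bins are ordered.
Parameters: x, bins, right, include_lowest, labels, retbins, precision
- x: the array to be binned
- bins: the bin edges, or the number of bins
- ① When bins is an integer, it is the number of equal-width intervals that the range of x is divided into;
- ② When bins is a sequence, it gives the bin edges; values of x that fall outside those edges are output as NaN;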
# x = [1,2,3,5,3,4,1], bins = 3
[In ] pd.cut(np.array([1,2,3,5,3,4,1]),3)
[Out] [(0.996, 2.333], (0.996, 2.333], (2.333, 3.667], (3.667, 5.0], (2.333, 3.667], (3.667, 5.0], (0.996, 2.333]]
Categories (3, interval[float64]): [(0.996, 2.333] < (2.333, 3.667] < (3.667, 5.0]]
# x = [1,2,3,5,3,4,1], bins = [1,2,3]
[In ] pd.cut(np.array([1,2,3,5,3,4,1]),[1,2,3])
[Out] [NaN, (1.0, 2.0], (2.0, 3.0], NaN, (2.0, 3.0], NaN, NaN]
Categories (2, interval[int64]): [(1, 2] < (2, 3]]
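The two cases of bins can also be checked in a small script. This is a minimal sketch, not the only way to inspect the result: it uses retbins (from the parameter list above) to return the computed edges, and pd.isna to flag values that fall in no bin. The variable names edges and outside are chosen here for illustration. With an integer bins, pandas appears to lower the first edge slightly below min(x) so that the minimum is included, which would explain the 0.996 in the output above.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3, 5, 3, 4, 1])

# Integer bins: retbins=True also returns the computed bin edges.
cats, edges = pd.cut(x, 3, retbins=True)
print(edges)  # 4 edges for 3 equal-width bins; the lowest edge sits just below min(x)

# Sequence bins: values outside (1, 3] have no bin and come back as NaN.
outside = pd.isna(pd.cut(x, [1, 2, 3]))
print(outside)  # True where the value was not assigned to any bin
```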
- right: whether each interval includes its right endpoint; defaults to True;
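The effect of right can be seen by binning the same array both ways. This is a rough sketch reusing the data from the examples above; closed_right and closed_left are names made up for this illustration, and .codes is used only as a compact way to show which bin each value landed in.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3, 5, 3, 4, 1])
bins = [1, 2, 3]

closed_right = pd.cut(x, bins)              # intervals (1, 2] and (2, 3]
closed_left = pd.cut(x, bins, right=False)  # intervals [1, 2) and [2, 3)

# .codes gives the bin index of each value; -1 marks NaN (no bin).
print(list(closed_right.codes))
print(list(closed_left.codes))
```

With right=True the value 1 falls outside (1, 2] and becomes NaN, while with right=False it is the value 3 that falls outside [2, 3).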
- include_lowest: whether the first interval includes its left endpoint; defaults to