Differences between pandas cut() and qcut()
-
Similarity: both discretize continuous data by assigning values to bins.
-
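Differences:
- cut(): first splits the value range into equal-width bins, then assigns each value to the bin it falls in, so the number of values per bin varies;
- qcut(): takes the bin edges from the quantiles of the data (conceptually, sort the values and split them into equal-sized groups), so the bins differ in width but each holds roughly the same number of values.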
The code is as follows:
import numpy as np
import pandas as pd

# 20 random integers in [0, 100); no seed is set, so the values change on every run
_arr = np.random.randint(0, 100, 20)
_arr1 = np.sort(_arr)
print(_arr1)
# cut(): 5 equal-width bins; qcut(): 5 quantile-based (equal-frequency) bins
cut_factor = pd.cut(_arr, 5)
qcut_factor = pd.qcut(_arr, 5)
print("cut_rs:", "*" * 30, "\n", cut_factor.value_counts())
print("qcut_rs:", "*" * 30, "\n", qcut_factor.value_counts())
----------------------result-----------------
[ 9 13 26 30 32 49 51 55 61 63 63 66 71 77 82 87 88 92 96 96]
cut_rs: ******************************
(8.913, 26.4] 3
(26.4, 43.8] 2
(43.8, 61.2] 4
(61.2, 78.6] 5
(78.6, 96.0] 6
dtype: int64
qcut_rs: ******************************
(8.999, 31.6] 4
(31.6, 58.6] 4
(58.6, 68.0] 4
(68.0, 87.2] 4
(87.2, 96.0] 4
dtype: int64
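
The bin edges in the result above can be checked by hand. The following is a minimal sketch, assuming the sorted sample printed above is hard-coded (the names `arr`, `equal_width` and `quantile_edges` are introduced here for illustration only). It asks cut() and qcut() for their edges via `retbins=True` and compares them with evenly spaced points from `np.linspace` and with the 0%, 20%, ..., 100% quantiles from `np.quantile`.

```python
import numpy as np
import pandas as pd

# The sorted sample from the result above, hard-coded so the run is reproducible.
arr = np.array([9, 13, 26, 30, 32, 49, 51, 55, 61, 63,
                63, 66, 71, 77, 82, 87, 88, 92, 96, 96])

# retbins=True makes cut()/qcut() also return the bin edges they used.
_, cut_edges = pd.cut(arr, 5, retbins=True)
_, qcut_edges = pd.qcut(arr, 5, retbins=True)

# cut(): edges are evenly spaced between min and max, so all bins have the same width.
# pandas lowers the first edge slightly so that the minimum falls inside the first bin.
equal_width = np.linspace(arr.min(), arr.max(), 6)

# qcut(): edges are quantiles of the data, so widths differ but counts are balanced.
quantile_edges = np.quantile(arr, np.linspace(0, 1, 6))

print("cut edges:      ", cut_edges)
print("linspace edges: ", equal_width)
print("qcut edges:     ", qcut_edges)
print("quantile edges: ", quantile_edges)

# labels=False returns integer bin indices, which makes counting per bin easy.
print("cut counts: ", np.bincount(pd.cut(arr, 5, labels=False)))
print("qcut counts:", np.bincount(pd.qcut(arr, 5, labels=False)))
```

Apart from the slightly lowered first edge of cut(), the edges from pandas should line up with the hand-computed ones, and the per-bin counts should match the value_counts() output shown in the result.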