Binning data with numpy and pandas

华间一壶酒

Article link: https://blog.csdn.net/qq_24846511/article/details/109823056

Binning in numpy and computing the mean within each bin

import numpy as np

data = np.arange(100)
bins = np.linspace(0, 50, 10)
bins = np.append(bins, np.inf)  # extend the last bin to infinity
# np.digitize returns, for each value, the index of the bin it belongs to
digitized = np.digitize(data, bins)
# Method 1: select the values of each bin with a mask and take their mean
bin_means = [data[digitized == i].mean() for i in range(1, len(bins))]
# Method 2: per-bin sum (weighted histogram) divided by per-bin count (plain histogram)
bin_means1 = (np.histogram(data, bins, weights=data)[0] /
              np.histogram(data, bins)[0])
# Source: https://stackoverflow.com/questions/6163334/binning-data-in-python-with-scipy-numpy
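
As a quick sanity check, the sketch below confirms that the two methods give the same means on the data above. It also shows one possible way to guard against empty bins, where Method 2 would otherwise divide by zero. The `sparse` and `edges` arrays and the `np.divide(..., where=...)` guard are illustrative assumptions, not part of the original snippet.

```python
import numpy as np

data = np.arange(100)
bins = np.append(np.linspace(0, 50, 10), np.inf)

# Method 1: mask by bin index
digitized = np.digitize(data, bins)
means_mask = np.array([data[digitized == i].mean() for i in range(1, len(bins))])

# Method 2: per-bin sum divided by per-bin count
sums, _ = np.histogram(data, bins, weights=data)
counts, _ = np.histogram(data, bins)
means_hist = sums / counts

print(np.allclose(means_mask, means_hist))

# Hypothetical data with an empty middle bin [1, 2)
sparse = np.array([0.5, 0.7, 2.5])
edges = np.array([0.0, 1.0, 2.0, 3.0])
s, _ = np.histogram(sparse, edges, weights=sparse)
c, _ = np.histogram(sparse, edges)
# Divide only where the count is non-zero; empty bins stay NaN
safe_means = np.divide(s, c, out=np.full(len(c), np.nan), where=c > 0)
print(safe_means)
```

Leaving empty bins as NaN is only one choice; depending on the use case, dropping them or filling with 0 may be more appropriate.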

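If some values in data fall outside the range covered by bins, np.digitize(data, bins) does not raise an error. It behaves as if an extra bin were added on each side: values below the first edge get index 0, and values at or above the last edge get index len(bins). For example: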

import numpy as np

data = np.array([-1, 0.5, 1.5, 2.5, 3.5, 4.5, 5, 6])
bins = np.linspace(0, 5, 6)
print(bins)
di = np.digitize(data, bins)
dt = np.c_[data, di]  # put each value next to its bin index
print(dt)
'''
[0. 1. 2. 3. 4. 5.]
[[-1.   0. ]
 [ 0.5  1. ]
 [ 1.5  2. ]
 [ 2.5  3. ]
 [ 3.5  4. ]
 [ 4.5  5. ]
 [ 5.   6. ]
 [ 6.   6. ]]
'''
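
The out-of-range indices can be used to separate underflow and overflow values from the rest. The following is a minimal sketch on the same data; the variable names are illustrative. It also shows the `right=True` option of `np.digitize`, which makes each bin closed on its right edge, so the value 5 lands in the last regular bin instead of the overflow bin.

```python
import numpy as np

data = np.array([-1, 0.5, 1.5, 2.5, 3.5, 4.5, 5, 6])
bins = np.linspace(0, 5, 6)

idx = np.digitize(data, bins)          # default: bins[i-1] <= x < bins[i]
underflow = data[idx == 0]             # values below bins[0]
overflow = data[idx == len(bins)]      # values at or above bins[-1]
in_range = data[(idx > 0) & (idx < len(bins))]

idx_right = np.digitize(data, bins, right=True)  # bins[i-1] < x <= bins[i]

print(underflow, overflow, in_range)
print(idx_right)
```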

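To explain Method 2, here is the signature of numpy.histogram (older NumPy versions also accepted a deprecated normed argument):
numpy.histogram(a, bins=10, range=None, density=None, weights=None)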

Returns
– hist : array
The values of the histogram. See density and weights for a description of the possible semantics.
– bin_edges : array of dtype float
Return the bin edges (length(hist)+1).
Parameters
– weights : array_like, optional
An array of weights, of the same shape as a. Each value in a only contributes its associated weight towards the bin count (instead of 1).

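Here is how this computes the mean. With weights=data, each value contributes itself to the bin total, so the weighted histogram returns the sum of the values in each bin, while the unweighted histogram returns the count. Suppose one bin contains [1, 2, 3, 4]; then
$\texttt{numpy.histogram(data, bins, weights=data)[0]} \,/\, \texttt{numpy.histogram(data, bins)[0]} = (1 \cdot 1 + 2 \cdot 1 + 3 \cdot 1 + 4 \cdot 1) / 4 = 2.5$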
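
The arithmetic above can be reproduced directly. This is a small sketch in which the edges `[0, 5]` are an assumption, chosen only so that all four values fall into a single bin.

```python
import numpy as np

data = np.array([1, 2, 3, 4])
bins = np.array([0, 5])  # hypothetical edges: one bin containing every value

sums = np.histogram(data, bins, weights=data)[0]   # sum of values per bin
counts = np.histogram(data, bins)[0]               # number of values per bin
means = sums / counts

print(sums, counts, means)
```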

Binning in pandas

import numpy as np
import pandas as pd
a = pd.DataFrame(np.random.rand(10, 1), columns=['A'])
a['A_cat'] = pd.cut(a['A'], bins=np.linspace(0, 1, 5), labels=[1, 2, 3, 4])

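Note that labels must have one fewer element than bins: the 5 edges above define 4 intervals, so exactly 4 labels are needed.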
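
To mirror the numpy example, the per-bin mean can also be computed in pandas by grouping on the categorical column produced by pd.cut, which expects one label per interval. The sketch below is one possible way to do it; the seeded generator, the column names, and the use of `groupby(..., observed=False)` are choices made here for illustration. It also checks that pd.cut rejects a label list whose length does not match the number of intervals.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)          # seeded only to make the sketch repeatable
df = pd.DataFrame({"A": rng.random(10)})

edges = np.linspace(0, 1, 5)            # 5 edges -> 4 intervals
labels = [1, 2, 3, 4]                   # one label per interval
df["A_cat"] = pd.cut(df["A"], bins=edges, labels=labels)

# Mean of A within each bin; bins with no values show up as NaN
bin_means = df.groupby("A_cat", observed=False)["A"].mean()
print(bin_means)

# Passing as many labels as edges is rejected by pd.cut
try:
    pd.cut(df["A"], bins=edges, labels=[1, 2, 3, 4, 5])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```
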
References:

Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow
https://stackoverflow.com/questions/6163334/binning-data-in-python-with-scipy-numpy