Python advanced data filtering: efficiently filtering data to get unique values (Python)

Latest recommended article published 2023-06-04 04:29:09

weixin_39955149



Tags: Python advanced data filtering


I have a 2D NumPy array of (X, Y, Z, A) values, where (X, Y, Z) are Cartesian coordinates in 3D space and A is some value at that location. For example:

X  | Y | Z  | A
13 | 7 | 21 | 1.5
9  | 2 | 7  | 0.5
15 | 3 | 9  | 1.1
13 | 7 | 21 | 0.9
13 | 7 | 21 | 1.7
15 | 3 | 9  | 1.1

Is there an efficient way to find all unique (X, Y) combinations and sum their A values? For example, the total for (13, 7) would be 1.5 + 0.9 + 1.7 = 4.1.
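For comparison, NumPy 1.13 and later can group whole rows directly with `np.unique(..., axis=0, return_inverse=True)`, and a weighted `np.bincount` then gives the per-group sums. Here is a minimal sketch on the question's table. This is an alternative technique rather than one of the approaches in the answer, and `totals_xy` is a hypothetical helper name.

```python
import numpy as np

# The (X, Y, Z, A) table from the question.
data = np.array([
    [13, 7, 21, 1.5],
    [ 9, 2,  7, 0.5],
    [15, 3,  9, 1.1],
    [13, 7, 21, 0.9],
    [13, 7, 21, 1.7],
    [15, 3,  9, 1.1],
])

def totals_xy(x):
    """Sum column A for each unique (X, Y) pair (hypothetical helper)."""
    # Unique (X, Y) rows, plus the group index of every input row.
    pairs, ids = np.unique(x[:, :2], axis=0, return_inverse=True)
    ids = ids.ravel()  # keep the inverse 1-D across NumPy versions
    # Weighted bincount adds up A within each group.
    return pairs, np.bincount(ids, weights=x[:, -1])

pairs, totals = totals_xy(data)
for (px, py), t in zip(pairs, totals):
    print(f"({px:g}, {py:g}) -> {t:.1f}")
```

The pairs come back sorted lexicographically, so (13, 7) is the second group and its total is 4.1.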

Solution

Approach #1

View each row as a single scalar of a `np.void` dtype, so the 2D array of (X, Y) pairs becomes a 1D array with one element per row. Then use `np.unique` with `return_inverse` to tag each row with an integer ID from 0 to n-1, where n is the number of unique rows. Finally, use `np.bincount` with the last column as weights to sum the values that share an ID.
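The tagging and summing steps can be seen in isolation on a plain 1D array. In this small sketch the keys are hypothetical stand-ins for the per-row scalars. `np.unique(..., return_inverse=True)` maps each key to its position among the sorted distinct keys, and `np.bincount` with `weights` adds each value into the bin for its tag.

```python
import numpy as np

# Hypothetical scalar keys (one per row) and the values to sum.
keys = np.array([7, 2, 7, 5, 2, 7])
vals = np.array([1.5, 0.5, 0.9, 1.1, 0.4, 1.7])

# uniq: sorted distinct keys; ids[i]: position of keys[i] in uniq (0..len(uniq)-1).
uniq, ids = np.unique(keys, return_inverse=True)
print(uniq)  # [2 5 7]
print(ids)   # [2 0 2 1 0 2]

# bincount adds vals[i] into bin ids[i], giving one sum per distinct key.
sums = np.bincount(ids, weights=vals)
```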

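Here's the implementation:

```python
import numpy as np

def get_row_view(a):
    # Make the array C-contiguous so each row occupies one block of memory.
    a = np.ascontiguousarray(a)
    # A void dtype as wide as one full row (itemsize * number of columns).
    void_dt = np.dtype((np.void, a.dtype.itemsize * int(np.prod(a.shape[1:]))))
    # Reinterpret every row as a single np.void scalar -> 1D array of length n.
    return a.reshape(a.shape[0], -1).view(void_dt).ravel()

def groupby_cols_view(x):
    # Group on the (X, Y) columns, cast to integers (fractional parts are truncated).
    a = x[:, :2].astype(int)
    a1D = get_row_view(a)
    # indx: first occurrence of each unique row; IDs: group tag (0..n-1) per row.
    _, indx, IDs = np.unique(a1D, return_index=True, return_inverse=True)
    # Unique (X, Y) pairs next to the per-group sum of the last column.
    return np.c_[x[indx, :2], np.bincount(IDs, x[:, -1])]
```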
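As a quick check, approach #1 can be run on the question's own table. The two helpers are repeated so the snippet runs on its own. Wrapping `np.prod` in `int(...)` is a small defensive change and not part of the original answer.

```python
import numpy as np

def get_row_view(a):
    # One np.void scalar per row of a C-contiguous copy of `a`.
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * int(np.prod(a.shape[1:]))))
    return a.reshape(a.shape[0], -1).view(void_dt).ravel()

def groupby_cols_view(x):
    # Tag each (X, Y) row, then sum the last column per tag.
    a = x[:, :2].astype(int)
    _, indx, IDs = np.unique(get_row_view(a), return_index=True, return_inverse=True)
    return np.c_[x[indx, :2], np.bincount(IDs, x[:, -1])]

data = np.array([[13, 7, 21, 1.5], [9, 2, 7, 0.5], [15, 3, 9, 1.1],
                 [13, 7, 21, 0.9], [13, 7, 21, 1.7], [15, 3, 9, 1.1]])
out = groupby_cols_view(data)
print(out)
```

Each output row is (X, Y, total). The (13, 7) row carries 4.1, matching the sum worked out in the question.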

Approach #2

Same as approach #1, but instead of the void view we compute an equivalent linear index for each row, which also reduces each row to a single scalar. The rest of the workflow is the same as in the first approach.
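A linear index like this is the same row-major encoding that NumPy exposes as `np.ravel_multi_index`. The sketch below applies it to the question's (X, Y) pairs. It assumes each column is first shifted to start at 0, so negative coordinates also work. Decoding with `np.unravel_index` recovers every pair, which shows that no two different pairs share an index.

```python
import numpy as np

xy = np.array([[13, 7], [9, 2], [15, 3], [13, 7], [13, 7], [15, 3]])

# Shift each column so its smallest value is 0 (also handles negative coordinates).
shifted = xy - xy.min(axis=0)
# Grid shape covering every shifted (X, Y) pair.
dims = tuple(int(d) for d in shifted.max(axis=0) + 1)

# Row-major linear index: X_shift * dims[1] + Y_shift, one integer per pair.
lin = np.ravel_multi_index((shifted[:, 0], shifted[:, 1]), dims)
# lin == [29, 0, 37, 29, 29, 37]

# Decoding gives back the original pairs, so the encoding is one-to-one.
back = np.column_stack(np.unravel_index(lin, dims)) + xy.min(axis=0)
```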

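The implementation:

```python
import numpy as np

def groupby_cols_linearindex(x):
    a = x[:, :2].astype(int)
    # Shift both columns to start at 0, then encode each (X, Y) pair as
    # X * (max Y + 1) + Y, which is unique per pair.
    a = a - a.min(axis=0)
    a1D = a[:, 0] * (a[:, 1].max() + 1) + a[:, 1]
    _, indx, IDs = np.unique(a1D, return_index=True, return_inverse=True)
    return np.c_[x[indx, :2], np.bincount(IDs, x[:, -1])]
```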
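One way to gain confidence in the linear-index approach is to compare it against a slow but obvious dictionary-based grouping on random data that includes negative coordinates. In this sketch, `groupby_reference` is a hypothetical helper written only for the comparison. The fast function is repeated with the shifted encoding so the snippet runs on its own.

```python
import numpy as np
from collections import defaultdict

def groupby_cols_linearindex(x):
    # Shift, then encode (X, Y) as one integer per pair.
    a = x[:, :2].astype(int)
    a = a - a.min(axis=0)
    a1D = a[:, 0] * (a[:, 1].max() + 1) + a[:, 1]
    _, indx, IDs = np.unique(a1D, return_index=True, return_inverse=True)
    return np.c_[x[indx, :2], np.bincount(IDs, x[:, -1])]

def groupby_reference(x):
    # Hypothetical reference: accumulate A per (X, Y) key in a dict.
    sums = defaultdict(float)
    for X, Y, _, A in x:
        sums[(int(X), int(Y))] += A
    return sums

rng = np.random.default_rng(0)
# 200 rows: integer X, Y, Z in [-5, 5) and a random A in [0, 1).
x = np.c_[rng.integers(-5, 5, size=(200, 3)), rng.random(200)]
fast = groupby_cols_linearindex(x)
ref = groupby_reference(x)

# Same number of groups, and every group total agrees with the reference.
assert len(fast) == len(ref)
for X, Y, total in fast:
    assert np.isclose(total, ref[(int(X), int(Y))])
```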

Sample runs

```
In [80]: data
Out[80]:
array([[ 2. , 5. , 1. , 0.40756048],
       [ 3. , 4. , 6. , 0.78945661],
       [ 1. , 3. , 0. , 0.03943097],
       [ 2. , 5. , 7. , 0.43663582],
       [ 4. , 5. , 0. , 0.14919507],
       [ 1. , 3. , 3. , 0.03680583],
       [ 1. , 4. , 8. , 0.36504428],
       [ 3. , 4. , 2. , 0.8598825 ]])

In [81]: groupby_cols_view(data)
Out[81]:
array([[ 1. , 3. , 0.0762368 ],
       [ 1. , 4. , 0.36504428],
       [ 2. , 5. , 0.8441963 ],
       [ 3. , 4. , 1.64933911],
       [ 4. , 5. , 0.14919507]])
```

In [82]: groupby_cols_linearindex(data)

Out[82]: