Advanced data filtering in Python: efficient data filtering to get unique values (Python)

I have a 2D NumPy array that consists of (X,Y,Z,A) values, where (X,Y,Z) are Cartesian coordinates in 3D space, and A is some value at that location. As an example:

__X__|__Y__|__Z__|__A__
 13  |  7  | 21  | 1.5
  9  |  2  |  7  | 0.5
 15  |  3  |  9  | 1.1
 13  |  7  | 21  | 0.9
 13  |  7  | 21  | 1.7
 15  |  3  |  9  | 1.1

Is there an efficient way to find all the unique combinations of (X,Y), and add their values? For example, the total for (13,7) would be (1.5+0.9+1.7), or 4.1.

Solution

Approach #1

View each row as a single scalar, then use np.unique to tag each row with an integer ID in the range 0 to n-1, where n is the number of unique rows. Finally, use np.bincount to sum the last column according to those IDs.

Here's the implementation -

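import numpy as np

def get_row_view(a):
    # View each row of a 2D array as one np.void scalar
    void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
    a = np.ascontiguousarray(a)
    return a.reshape(a.shape[0], -1).view(void_dt).ravel()

def groupby_cols_view(x):
    # Group on the first two columns (X, Y), cast to int
    a = x[:,:2].astype(int)
    a1D = get_row_view(a)
    # indx: first occurrence of each unique row; IDs: group ID of every row
    _, indx, IDs = np.unique(a1D, return_index=True, return_inverse=True)
    # Sum the last column per group ID
    return np.c_[x[indx,:2], np.bincount(IDs, x[:,-1])]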
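The grouping step relies on how np.unique and np.bincount work together once each row has been reduced to a scalar. Here is a minimal sketch of just that step. The `keys` and `values` arrays are made-up stand-ins for the per-row scalars and the last column, not data from the post.

```python
import numpy as np

# Hypothetical per-row scalar keys; equal keys mean "same (X, Y) group".
keys = np.array([30, 10, 20, 30, 30, 20])
# Hypothetical values to be summed per group (the A column).
values = np.array([1.5, 0.5, 1.1, 0.9, 1.7, 1.1])

# uniq: sorted unique keys
# first_idx: index of the first occurrence of each unique key
# ids: for every row, the position of its key within uniq (0 .. n-1)
uniq, first_idx, ids = np.unique(keys, return_index=True, return_inverse=True)

# bincount with weights adds up the values that share the same ID.
sums = np.bincount(ids, weights=values)

print(uniq)       # unique keys
print(first_idx)  # one representative row per group
print(sums)       # per-group totals
```

Here `first_idx` picks one representative row per group, which is what `x[indx,:2]` does in the function above, and `sums` holds the per-group totals.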

Approach #2

Same as approach #1, but instead of working with a view, we compute a linear index for each row, which again reduces each row to a scalar. The rest of the workflow is the same as in the first approach.

The implementation -

def groupby_cols_linearindex(x):
    a = x[:,:2].astype(int)
    # Linear index: X + Y * (extent of X), unique for every distinct (X, Y) pair
    a1D = a[:,0] + a[:,1]*(a[:,0].max() - a[:,0].min() + 1)
    _, indx, IDs = np.unique(a1D, return_index=True, return_inverse=True)
    return np.c_[x[indx,:2], np.bincount(IDs, x[:,-1])]
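To see the linear index at work on the table from the question, here is a small self-contained sketch. The names `width`, `keys` and `totals` are chosen here for illustration only, and the multiplier is taken as the extent of the X column (max - min + 1), which is an assumption that keeps distinct (X, Y) pairs from colliding.

```python
import numpy as np

# The example rows from the question: X, Y, Z, A
data = np.array([
    [13, 7, 21, 1.5],
    [ 9, 2,  7, 0.5],
    [15, 3,  9, 1.1],
    [13, 7, 21, 0.9],
    [13, 7, 21, 1.7],
    [15, 3,  9, 1.1],
])

xy = data[:, :2].astype(int)

# Extent of the X column: 15 - 9 + 1 = 7
width = xy[:, 0].max() - xy[:, 0].min() + 1

# One integer per row; rows with the same (X, Y) get the same key.
keys = xy[:, 0] + xy[:, 1] * width

_, first_idx, ids = np.unique(keys, return_index=True, return_inverse=True)

# Columns: X, Y, summed A
totals = np.c_[data[first_idx, :2], np.bincount(ids, weights=data[:, -1])]
print(totals)
```

The row for (13, 7) carries the total 1.5 + 0.9 + 1.7 = 4.1, as asked for in the question.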

Sample runs

In [80]: data
Out[80]:
array([[ 2.        ,  5.        ,  1.        ,  0.40756048],
       [ 3.        ,  4.        ,  6.        ,  0.78945661],
       [ 1.        ,  3.        ,  0.        ,  0.03943097],
       [ 2.        ,  5.        ,  7.        ,  0.43663582],
       [ 4.        ,  5.        ,  0.        ,  0.14919507],
       [ 1.        ,  3.        ,  3.        ,  0.03680583],
       [ 1.        ,  4.        ,  8.        ,  0.36504428],
       [ 3.        ,  4.        ,  2.        ,  0.8598825 ]])

In [81]: groupby_cols_view(data)
Out[81]:
array([[ 1.        ,  3.        ,  0.0762368 ],
       [ 1.        ,  4.        ,  0.36504428],
       [ 2.        ,  5.        ,  0.8441963 ],
       [ 3.        ,  4.        ,  1.64933911],
       [ 4.        ,  5.        ,  0.14919507]])

In [82]: groupby_cols_linearindex(data)
Out[82]:
array([[ 1.        ,  3.        ,  0.0762368 ],
       [ 1.        ,  4.        ,  0.36504428],
       [ 3.        ,  4.        ,  1.64933911],
       [ 2.        ,  5.        ,  0.8441963 ],
       [ 4.        ,  5.        ,  0.14919507]])
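The two outputs contain the same groups but list them in a different order, because each approach sorts on a different scalar. With the linear index, rows are effectively ordered by Y first and then by X. A quick way to compare such results regardless of row order is sketched below. `sort_rows` is a helper name introduced here, and the two arrays are copied from the sample runs above.

```python
import numpy as np

# Results copied from the sample runs: X, Y, summed A
out_view = np.array([
    [1., 3., 0.0762368 ],
    [1., 4., 0.36504428],
    [2., 5., 0.8441963 ],
    [3., 4., 1.64933911],
    [4., 5., 0.14919507],
])
out_linear = np.array([
    [1., 3., 0.0762368 ],
    [1., 4., 0.36504428],
    [3., 4., 1.64933911],
    [2., 5., 0.8441963 ],
    [4., 5., 0.14919507],
])

def sort_rows(a):
    # np.lexsort treats the LAST key as the primary one,
    # so passing (Y, X) orders rows by X, then by Y.
    order = np.lexsort((a[:, 1], a[:, 0]))
    return a[order]

# True when both results hold the same groups and totals.
same = np.allclose(sort_rows(out_view), sort_rows(out_linear))
print(same)
```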
