python ndarray find_python – Find a set of indices that maps the rows of one NumPy ndarray to another

Latest recommended article published on 2024-04-26 00:11:55

weixin_39866741


Tags: python ndarray find

I have two structured 2D NumPy arrays that are equal in principle, meaning

A = numpy.array([[a1,b1,c1],
                 [a2,b2,c2],
                 [a3,b3,c3],
                 [a4,b4,c4]])

B = numpy.array([[a2,b2,c2],
                 [a4,b4,c4],
                 [a3,b3,c3],
                 [a1,b1,c1]])

Not in the sense of

numpy.array_equal(A,B) # False
numpy.array_equiv(A,B) # False
numpy.equal(A,B) # ndarray of True and False

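but in the sense that one array (A) is the original, while in the other (B) the data has been shuffled along one axis (either along the rows or along the columns).

What is an efficient way to sort/shuffle B so that it matches or becomes equal to A, or alternatively to sort A so that it becomes equal to B? The equality check itself is not important, as long as both arrays end up arranged to match each other. A, and therefore B, have unique rows.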

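I tried sorting both arrays using the view method:

def sort2d(A):
    A_view = np.ascontiguousarray(A).view(np.dtype((np.void,
             A.dtype.itemsize * A.shape[1])))
    A_view.sort()
    return A_view.view(A.dtype).reshape(-1,A.shape[1])

But this does not seem to work here. The operation has to run on very large arrays, so performance and scalability are critical.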
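One likely reason `sort2d` appears to do nothing: the void view of an `(n, m)` array has shape `(n, 1)`, and `ndarray.sort()` sorts along the last axis by default, which here has length 1. The sketch below sorts along axis 0 instead. The name `sort_rows` is hypothetical, and the resulting order is a consistent byte-wise order, not necessarily numeric order.

```python
import numpy as np

def sort_rows(A):
    """Return a copy of 2D array A with its rows in a consistent byte-wise order."""
    A = np.ascontiguousarray(A)
    rowtype = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))
    A_view = A.view(rowtype)             # shape (n_rows, 1): one void element per row
    A_sorted = np.sort(A_view, axis=0)   # sort down the rows, not along the length-1 axis
    return A_sorted.view(A.dtype).reshape(-1, A.shape[1])

A = np.array([[7, 11, 6], [4, 10, 8], [9, 2, 0], [1, 3, 5]])
B = A[[3, 1, 0, 2]]                      # same rows, shuffled
print(np.array_equal(sort_rows(A), sort_rows(B)))
```

This brings both arrays into the same row order, but it does not say which row of A each row of B came from.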

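Solution:

Based on your example, it seems that you have shuffled all of the columns simultaneously, so there is a vector of row indices that maps A→B. Here is a toy example: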

import numpy as np

A = np.random.permutation(12).reshape(4, 3)
idx = np.random.permutation(4)
B = A[idx]

print(repr(A))
# array([[ 7, 11,  6],
#        [ 4, 10,  8],
#        [ 9,  2,  0],
#        [ 1,  3,  5]])

print(repr(B))
# array([[ 1,  3,  5],
#        [ 4, 10,  8],
#        [ 7, 11,  6],
#        [ 9,  2,  0]])

We want to recover a set of indices, idx, such that A[idx] == B. This will be a unique mapping if and only if A and B contain no repeated rows.
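To see why repeated rows break uniqueness, here is a small sketch with a deliberately duplicated row. The helper `rows_are_unique` is a hypothetical name introduced only for this illustration; it relies on `np.unique(..., axis=0)`.

```python
import numpy as np

def rows_are_unique(A):
    """True if no row of the 2D array A occurs more than once."""
    return len(np.unique(A, axis=0)) == len(A)

A = np.array([[1, 2], [1, 2], [3, 4]])   # rows 0 and 1 are identical
B = A[[2, 0, 1]]

# Two different index vectors both reproduce B, so the mapping is not unique.
first_ok = np.array_equal(A[[2, 0, 1]], B)
second_ok = np.array_equal(A[[2, 1, 0]], B)
print(first_ok, second_ok, rows_are_unique(A))
```

A check like this could be run up front when it is not known whether the rows are distinct.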

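One efficient approach is to find the indices that would lexically sort the rows of A, and then find where each row of B would fall within the sorted version of A. A useful trick is to view A and B as 1D arrays, using an np.void dtype that treats each row as a single element: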

# integer division: np.dtype needs an int byte count (/ returns a float in Python 3)
rowtype = np.dtype((np.void, A.dtype.itemsize * A.size // A.shape[0]))

# A and B must be C-contiguous, might need to force a copy here
a = np.ascontiguousarray(A).view(rowtype).ravel()
b = np.ascontiguousarray(B).view(rowtype).ravel()

a_to_as = np.argsort(a) # indices that sort the rows of A in lexical order
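As a quick illustration of what the void view produces, the sketch below inspects the shape and item size of the viewed array and shows when `np.ascontiguousarray` has to copy. The arrays and the fixed `int64` dtype are assumptions chosen for the example.

```python
import numpy as np

A = np.arange(12, dtype=np.int64).reshape(4, 3)
rowtype = np.dtype((np.void, A.dtype.itemsize * A.shape[1]))

a = np.ascontiguousarray(A).view(rowtype).ravel()
print(a.shape, a.dtype.itemsize)         # one 24-byte element per row

# The view can be turned back into the original rows.
restored = a.view(A.dtype).reshape(A.shape)

# A is already C-contiguous, so no copy was needed and `a` shares its memory.
shares_contiguous = np.shares_memory(a, A)

# A column slice is not C-contiguous, so ascontiguousarray must copy it first.
S = A[:, :2]
s_rowtype = np.dtype((np.void, S.dtype.itemsize * S.shape[1]))
s = np.ascontiguousarray(S).view(s_rowtype).ravel()
shares_sliced = np.shares_memory(s, S)
```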

Now we can use np.searchsorted to perform a binary search for where each row of B would fall within the sorted version of A:

# using the `sorter=` argument rather than `a[a_to_as]` avoids making a copy of `a`
as_to_b = a.searchsorted(b, sorter=a_to_as)

The mapping A→B can be expressed as a composite of A→As→B:

a_to_b = a_to_as.take(as_to_b)
print(np.all(A[a_to_b] == B))
# True
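The composition is easier to follow on plain 1D integers, where each "row" is a single number. The values below are made up for illustration; the calls are the same ones used above.

```python
import numpy as np

a = np.array([30, 10, 20])
b = np.array([20, 30, 10])                    # a, permuted

a_to_as = np.argsort(a)                       # [1, 2, 0]: a[a_to_as] is [10, 20, 30]
as_to_b = a.searchsorted(b, sorter=a_to_as)   # [1, 2, 0]: position of each b value in sorted a
a_to_b = a_to_as.take(as_to_b)                # [2, 0, 1]: compose the two steps

print(a_to_b, np.array_equal(a[a_to_b], b))
```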

If A and B contain no repeated rows, the inverse mapping B→A can also be obtained using

b_to_a = np.argsort(a_to_b)
print(np.all(B[b_to_a] == A))
# True
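Since `a_to_b` is a permutation when the rows are unique, its inverse can also be built by scattering positions instead of sorting. This is an alternative sketch, not part of the original answer, and `invert_permutation` is a hypothetical helper name.

```python
import numpy as np

def invert_permutation(perm):
    """Return inv such that inv[perm[i]] == i for every i."""
    inv = np.empty_like(perm)
    inv[perm] = np.arange(len(perm), dtype=perm.dtype)
    return inv

a_to_b = np.array([2, 0, 1])
b_to_a = invert_permutation(a_to_b)
print(b_to_a, np.array_equal(b_to_a, np.argsort(a_to_b)))
```

The scatter is a single linear pass, whereas `np.argsort` sorts; both give the same result for a valid permutation.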

As a single function:

def find_row_mapping(A, B):
    """
    Given A and B, where B is a copy of A permuted over the first dimension, find
    a set of indices idx such that A[idx] == B.
    This is a unique mapping if and only if there are no repeated rows in A and B.

    Arguments:
        A, B: n-dimensional arrays with same shape and dtype
    Returns:
        idx: vector of indices into the rows of A
    """

    if not (A.shape == B.shape):
        raise ValueError('A and B must have the same shape')
    if not (A.dtype == B.dtype):
        raise TypeError('A and B must have the same dtype')

    # integer division: np.dtype needs an int byte count (/ returns a float in Python 3)
    rowtype = np.dtype((np.void, A.dtype.itemsize * A.size // A.shape[0]))
    # flatten each row first so the view also works for more than two dimensions
    a = np.ascontiguousarray(A).reshape(A.shape[0], -1).view(rowtype).ravel()
    b = np.ascontiguousarray(B).reshape(B.shape[0], -1).view(rowtype).ravel()
    a_to_as = np.argsort(a)
    as_to_b = a.searchsorted(b, sorter=a_to_as)

    return a_to_as.take(as_to_b)
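The function above assumes that every row of B really occurs in A. If that is not guaranteed, a final comparison can catch mismatches. The following is a minimal sketch of such a variant under that assumption; the name `find_row_mapping_checked` and the choice of exception are illustrative, not from the original answer.

```python
import numpy as np

def find_row_mapping_checked(A, B):
    """Like the sort-and-search approach, but verify that A[idx] really equals B."""
    A = np.ascontiguousarray(A)
    B = np.ascontiguousarray(B)
    if A.shape != B.shape or A.dtype != B.dtype:
        raise ValueError('A and B must have the same shape and dtype')

    rowtype = np.dtype((np.void, A.dtype.itemsize * A.size // A.shape[0]))
    a = A.reshape(len(A), -1).view(rowtype).ravel()
    b = B.reshape(len(B), -1).view(rowtype).ravel()

    a_to_as = np.argsort(a)
    as_to_b = a.searchsorted(b, sorter=a_to_as)
    # searchsorted returns len(a) for a row that sorts after every row of A,
    # so clamp before indexing and let the comparison below reject it.
    idx = a_to_as.take(np.minimum(as_to_b, len(a) - 1))

    if not np.array_equal(A[idx], B):
        raise ValueError('B contains a row that does not occur in A')
    return idx

A = np.arange(12).reshape(4, 3)
perm = np.array([3, 1, 0, 2])
idx = find_row_mapping_checked(A, A[perm])
print(np.array_equal(idx, perm))
```

The extra comparison costs one more pass over the data, which is small next to the sort.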

Benchmark:

In [1]: gen = np.random.RandomState(0)

In [2]: %%timeit A = gen.rand(1000000, 100); B = A.copy(); gen.shuffle(B)
   ....: find_row_mapping(A, B)
1 loop, best of 3: 2.76 s per loop

The most costly step is the quicksort over the rows, which is O(n log n) on average. I'm not sure it is possible to do any better than that.
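One alternative that avoids sorting is a hash lookup keyed on the raw bytes of each row, which takes expected linear time in the number of rows. This is only a sketch of a different technique, not the method benchmarked above; `find_row_mapping_hashed` is a hypothetical name, and because the loops run in Python it may well be slower in practice than the vectorized sort.

```python
import numpy as np

def find_row_mapping_hashed(A, B):
    """Map each row of B to the index of the identical row in A via a dict."""
    A = np.ascontiguousarray(A)
    B = np.ascontiguousarray(B)
    # Key: raw bytes of a row of A. Value: that row's index.
    index_of = {row.tobytes(): i for i, row in enumerate(A)}
    # Raises KeyError if a row of B does not occur in A.
    return np.array([index_of[row.tobytes()] for row in B], dtype=np.intp)

rng = np.random.RandomState(0)
A = rng.permutation(12).reshape(4, 3)
perm = rng.permutation(4)
B = A[perm]
idx = find_row_mapping_hashed(A, B)
print(np.array_equal(A[idx], B))
```

Like the void-view approach, this compares rows byte for byte, so it requires A and B to share a dtype.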

Tags: python, algorithm, sorting, mapping, numpy
