NumPy Part 4: Advanced Indexing and Index Tricks

RisingSunny

Article link: https://blog.csdn.net/wangwenzhi276/article/details/53436694


1. Indexing with Arrays of Indices

>>> import numpy as np
>>> a = np.arange(12)**2                       # the first 12 square numbers
>>> i = np.array( [ 1,1,3,8,5 ] )              # an array of indices
>>> a[i]                                       # the elements of a at the positions i
array([ 1,  1,  9, 64, 25])
>>>
>>> j = np.array( [ [ 3, 4], [ 9, 7 ] ] )      # a bidimensional array of indices
>>> a[j]                                       # the result has the same shape as j
array([[ 9, 16],
       [81, 49]])

When the indexed array a is multidimensional, a single array of indices refers to the first dimension of a.

>>> palette = np.array( [ [0,0,0],                # black
...                       [255,0,0],              # red
...                       [0,255,0],              # green
...                       [0,0,255],              # blue
...                       [255,255,255] ] )       # white
>>> image = np.array( [ [ 0, 1, 2, 0 ],           # each value corresponds to a color in the palette
...                     [ 0, 3, 4, 0 ]  ] )
>>> palette[image]                            # the (2,4,3) color image
array([[[  0,   0,   0],
        [255,   0,   0],
        [  0, 255,   0],
        [  0,   0,   0]],
       [[  0,   0,   0],
        [  0,   0, 255],
        [255, 255, 255],
        [  0,   0,   0]]])

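We can also give indices for more than one dimension. The index arrays for the different dimensions must have the same shape (or, more generally, be broadcastable to a common shape).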

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array( [ [0,1],                        # indices for the first axis of a
...                 [1,2] ] )
>>> j = np.array( [ [2,1],                        # indices for the second axis of a
...                 [3,3] ] )
>>>
>>> a[i,j]                                     # i and j must have the same shape
array([[ 2,  5],
       [ 7, 11]])

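Here is what happens: a[i,j] takes the values at the same position in i and in j, combines them into an index pair, and uses that pair to look up a value in a. For example, i holds 0 at position (0,0) and j holds 2 at position (0,0), so the pair is (0,2), and the corresponding value in a is 2. Under this mechanism i and j need to have the same shape, otherwise the index pairs could not be formed. Since i and j index the two axes of a position by position, the result has the same shape as i and j.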
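
The pairing described above can be spelled out with an explicit loop. This is only a minimal sketch for illustration; the helper name pairwise_index is hypothetical and not part of NumPy.

```python
import numpy as np

def pairwise_index(a, i, j):
    """Loop-based equivalent of a[i, j] for equally shaped integer arrays i and j."""
    out = np.empty(i.shape, dtype=a.dtype)
    for pos in np.ndindex(i.shape):      # visit every position of i (and j)
        out[pos] = a[i[pos], j[pos]]     # pair up the two indices found at that position
    return out

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])
j = np.array([[2, 1], [3, 3]])
manual = pairwise_index(a, i, j)
print(np.array_equal(manual, a[i, j]))   # True
```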

>>> a[i,2]
array([[ 2,  6],
       [ 6, 10]])

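Here i is the index array for the first axis of a, and the 2 in a[i,2] is the index on the second axis. Every number in i is paired with 2, giving the index pairs [ [(0,2), (1,2)], [(1,2), (2,2)] ], and the values of a are then picked according to these pairs and their layout.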
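
One way to read a[i,2] is that the scalar 2 is expanded to the shape of i before the pairing takes place. The sketch below checks that reading; np.full_like is used only to build the explicit column-index array, and the name j_explicit is introduced here for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])

j_explicit = np.full_like(i, 2)          # [[2, 2], [2, 2]]: the scalar 2 expanded to i's shape
same = np.array_equal(a[i, 2], a[i, j_explicit])
print(same)                              # True
```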

>>> a[:,j]                                  
array([[[ 2,  1],
        [ 3,  3]],
       [[ 6,  5],
        [ 7,  7]],
       [[10,  9],
        [11, 11]]])

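Here the first axis of a is sliced completely, giving the row indices (0, 1, 2). Each of these is then paired with every element of j, producing three 2x2 blocks of index pairs (one block per row), and the values of a are picked according to these pairs. The result therefore has shape (3, 2, 2).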
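
The shape of the result follows from that description: the slice keeps the first axis of a (length 3), and j contributes its own shape (2, 2). A quick sketch that checks this against a row-by-row construction:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
j = np.array([[2, 1], [3, 3]])

sliced = a[:, j]
# Build the same result one row at a time: each row of a is indexed by j.
by_rows = np.stack([row[j] for row in a])
print(sliced.shape)                      # (3, 2, 2)
print(np.array_equal(sliced, by_rows))   # True
```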

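Naturally, we can put i and j in a sequence and then index with that sequence. Recent NumPy versions require the sequence to be a tuple; a plain list such as [i, j] is converted into a single index array instead, which fails for the reason explained next.

>>> l = (i, j)
>>> a[l]                                       # equivalent to a[i, j]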
array([[ 2,  5],
       [ 7, 11]])

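However, we cannot combine i and j into one big array and index a with it. As described earlier, when a single index array is used, all of its elements are interpreted as indices into the first dimension of a.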

>>> s = np.array( [i,j] )
>>> s
array([[[0, 1],
        [1, 2]],

       [[2, 1],
        [3, 3]]])
>>> s.shape
(2, 2, 2)
>>> a[s]                                       
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 0 with size 3

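The error message states that index 3 is out of range for the first dimension of a, whose valid indices are 0 through 2, and s contains a 3. In other words, the root cause is an out-of-range index, not a problem with the a[s] syntax itself. You can verify this by experiment.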

>>> a[tuple(s)]                                # equivalent to a[i, j]
array([[ 2,  5],
       [ 7, 11]])
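
To verify that indexing with a stacked array is valid in itself and only failed above because of the out-of-range value, the sketch below builds a stacked index array whose entries all lie between 0 and 2. The name s_ok is introduced here for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
i = np.array([[0, 1], [1, 2]])

s_ok = np.array([i, i])                  # shape (2, 2, 2), every entry is a valid row index
rows = a[s_ok]                           # each entry selects a whole row of a
print(rows.shape)                        # (2, 2, 2, 4)
print(rows[0, 1, 0])                     # [4 5 6 7], i.e. row 1 of a
```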

Indexing with arrays can also be used as a target for assignment.

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])

However, when the list of indices contains repetitions, the assignment is carried out several times, and the last value wins:

>>> a = np.arange(5)
>>> a[[0,0,2]]=[1,2,3]
>>> a
array([2, 1, 3, 3, 4])

This is reasonable enough, but watch out if you want to use Python's += construct, as the result may not be what you expect:

>>> a = np.arange(5)
>>> a[[0,0,2]]+=1
>>> a
array([1, 1, 3, 3, 4])

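Even though 0 occurs twice in the list of indices, element 0 is incremented only once. This is because a[idx] += 1 is evaluated as a[idx] = a[idx] + 1: the right-hand side is computed once from the original values, and the repeated index is then simply assigned twice.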
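
If accumulating over repeated indices is what you actually want, NumPy's np.add.at performs the unbuffered in-place operation. A minimal sketch:

```python
import numpy as np

a = np.arange(5)
np.add.at(a, [0, 0, 2], 1)               # unbuffered in-place add: index 0 is incremented twice
print(a)                                 # [2 1 3 3 4]
```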

2. Indexing with Boolean Arrays

The most natural way to use boolean indexing is with a boolean array that has the same shape as the original array:

>>> a = np.arange(12).reshape(3,4)
>>> b = a > 4
>>> b                                          # b is a boolean with a's shape
array([[False, False, False, False],
       [False,  True,  True,  True],
       [ True,  True,  True,  True]])
>>> a[b]                                       # 1-D array with the selected elements
array([ 5,  6,  7,  8,  9, 10, 11])

This property is very useful in assignments:

>>> a[b] = 0                                   # All elements of 'a' higher than 4 become 0
>>> a
array([[0, 1, 2, 3],
       [4, 0, 0, 0],
       [0, 0, 0, 0]])

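The second way of indexing with booleans is more similar to integer indexing: for each dimension of the array we give a 1-D boolean array selecting the slices we want.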

>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True])             # first dim selection
>>> b2 = np.array([True,False,True,False])       # second dim selection
>>>
>>> a[b1,:]                                   # all columns of the second and third rows
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[b1]                                     # same thing
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>>
>>> a[:,b2]                                   # selecting columns
array([[ 0,  2],
       [ 4,  6],
       [ 8, 10]])
>>>
>>> a[b1,b2]                                  # a weird thing to do
array([ 4, 10])

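Note that the length of a 1-D boolean array must match the length of the dimension (or axis) you want to slice. In the example above, b1 has length 3 (the number of rows of a), and b2 has length 4, which makes it suitable for indexing the second axis (the columns) of a.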
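
The odd-looking result of a[b1,b2] can be understood by converting each boolean array to the integer positions of its True entries, after which the pairing rule from section 1 applies. The sketch below shows that reading, and also uses np.ix_ (the subject of section 3) for the case where the full cross product of selected rows and columns is wanted. The names rows, cols, paired and block are introduced here for illustration.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b1 = np.array([False, True, True])
b2 = np.array([True, False, True, False])

rows = np.nonzero(b1)[0]                 # [1 2]
cols = np.nonzero(b2)[0]                 # [0 2]
paired = a[rows, cols]                   # index pairs (1, 0) and (2, 2)
print(paired)                            # [ 4 10]

block = a[np.ix_(b1, b2)]                # every selected row crossed with every selected column
print(block.tolist())                    # [[4, 6], [8, 10]]
```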

3. The ix_() function

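The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple.
For example, suppose you want to compute a+b*c for every triplet formed by taking one element from each of the vectors a, b and c, that is, $r_{ijk} = a_i + b_j c_k$ for all i, j, k. In the example below a, b and c have lengths 4, 3 and 5, so the final result contains 60 (4*3*5) values. With little data this can be worked out by hand, but for larger inputs ix_ comes in handy.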

>>> a = np.array([2,3,4,5])
>>> b = np.array([8,5,4])
>>> c = np.array([5,4,6,8,3])
>>> ax,bx,cx = np.ix_(a,b,c)
>>> ax
array([[[2]],
       [[3]],
       [[4]],
       [[5]]])
>>> bx
array([[[8],
        [5],
        [4]]])
>>> cx
array([[[5, 4, 6, 8, 3]]])
>>> ax.shape, bx.shape, cx.shape
((4, 1, 1), (1, 3, 1), (1, 1, 5))
>>> result = ax+bx*cx
>>> result
array([[[42, 34, 50, 66, 26],
        [27, 22, 32, 42, 17],
        [22, 18, 26, 34, 14]],
       [[43, 35, 51, 67, 27],
        [28, 23, 33, 43, 18],
        [23, 19, 27, 35, 15]],
       [[44, 36, 52, 68, 28],
        [29, 24, 34, 44, 19],
        [24, 20, 28, 36, 16]],
       [[45, 37, 53, 69, 29],
        [30, 25, 35, 45, 20],
        [25, 21, 29, 37, 17]]])
>>> result[3,2,4]
17
>>> a[3]+b[2]*c[4]
17

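Clearly, the result array contains every possible value, and its positions correspond one-to-one to the positions in the original arrays; for example, a[2]+b[0]*c[4] is exactly result[2,0,4].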
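
In this example np.ix_ does nothing more than reshape each vector so that broadcasting produces every combination. The sketch below checks the result against a hand-written reshape with None (np.newaxis) and against an explicit loop over all 60 index triplets; the names by_hand and all_match are introduced here for illustration.

```python
import numpy as np
from itertools import product

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

ax, bx, cx = np.ix_(a, b, c)
result = ax + bx * cx                    # shape (4, 3, 5) through broadcasting

# The same reshaping written by hand.
by_hand = a[:, None, None] + b[None, :, None] * c[None, None, :]

# Brute-force check of all 60 combinations.
all_match = all(result[i, j, k] == a[i] + b[j] * c[k]
                for i, j, k in product(range(4), range(3), range(5)))
print(result.shape, np.array_equal(result, by_hand), all_match)   # (4, 3, 5) True True
```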

The same kind of reduction can also be implemented as follows:

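>>> def ufunc_reduce(ufct, *vectors):
...    vs = np.ix_(*vectors)      # reshape the vectors so they broadcast against one another
...    r = ufct.identity          # neutral element, e.g. 0 for np.add (None for ufuncs without one)
...    for v in vs:
...        r = ufct(r,v)          # fold in one vector per step; broadcasting grows the result
...    return r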

and then use it as:

>>> ufunc_reduce(np.add,a,b,c)
array([[[15, 14, 16, 18, 13],
        [12, 11, 13, 15, 10],
        [11, 10, 12, 14,  9]],
       [[16, 15, 17, 19, 14],
        [13, 12, 14, 16, 11],
        [12, 11, 13, 15, 10]],
       [[17, 16, 18, 20, 15],
        [14, 13, 15, 17, 12],
        [13, 12, 14, 16, 11]],
       [[18, 17, 19, 21, 16],
        [15, 14, 16, 18, 13],
        [14, 13, 15, 17, 12]]])
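
Some ufuncs, such as np.maximum, have no identity element (their identity attribute is None), so starting the fold from ufct.identity does not work for them. A possible variant, sketched here under the assumption that at least one vector is passed, folds the reshaped vectors directly with functools.reduce. The helper name ufunc_reduce_any is hypothetical.

```python
import numpy as np
from functools import reduce

def ufunc_reduce_any(ufct, *vectors):
    """Fold the ix_-reshaped vectors pairwise, without relying on ufct.identity."""
    return reduce(ufct, np.ix_(*vectors))

a = np.array([2, 3, 4, 5])
b = np.array([8, 5, 4])
c = np.array([5, 4, 6, 8, 3])

total = ufunc_reduce_any(np.add, a, b, c)        # a[i] + b[j] + c[k] for every triplet
largest = ufunc_reduce_any(np.maximum, a, b, c)  # max(a[i], b[j], c[k]) for every triplet
print(total.shape, total[0, 0, 0], largest[0, 0, 0])   # (4, 3, 5) 15 8
```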