How to use numpy's searchsorted

Latest recommended article published on 2024-11-29 23:45:41

深山里的小白羊

Category: 日用小技能 (everyday tips). Tags: numpy, searchsorted

Article link: https://blog.csdn.net/qq_33757398/article/details/89876088

1. Definition:

np.searchsorted(a, v, side='left', sorter=None)

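Finds the positions at which the elements of v would be inserted into array a so that a stays sorted (no insertion is actually performed). It returns an array of indices, where each index tells where the corresponding element of v should go in a.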
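
To make the idea of an "insertion position" concrete, here is a minimal sketch (the array values are chosen purely for illustration). It computes the positions with np.searchsorted and then performs the insertion separately with np.insert to check that the merged result is still in ascending order.

```python
import numpy as np

a = np.array([1, 3, 5, 7])      # sorted input (illustrative values)
v = np.array([4, 0, 8])         # values to locate

idx = np.searchsorted(a, v)     # positions only; a itself is not modified
print(idx.tolist())             # [2, 0, 4]

# np.insert places each value before the given index of the original array,
# so the merged array stays in ascending order.
merged = np.insert(a, idx, v)
print(merged.tolist())          # [0, 1, 3, 4, 5, 7, 8]
```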

2. Parameters

a : 1-D array_like

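Input array. When sorter is None, a must be sorted in ascending order; otherwise sorter must be given and must hold the indices of a's elements that arrange a in ascending order.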

v : array_like
The values to insert into a. This can be a single element, a list, or an array.

side : {'left', 'right'}, optional

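Search direction:

When 'left', the first suitable insertion index i is returned, i.e. the one with a[i-1] < v <= a[i];

When 'right', the last suitable insertion index i is returned, i.e. the one with a[i-1] <= v < a[i]. If there is no suitable index, 0 or N (the length of a) is returned.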
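
The difference between the two modes is easiest to see when a contains repeated values. A minimal sketch, with values chosen for illustration only:

```python
import numpy as np

a = np.array([1, 2, 2, 2, 3])   # sorted, with a run of repeated 2s

left = int(np.searchsorted(a, 2, side='left'))
right = int(np.searchsorted(a, 2, side='right'))
print(left, right)               # 1 4

# a[left:right] is exactly the run of elements equal to 2
print(a[left:right].tolist())    # [2, 2, 2]
print(right - left)              # 3, the number of occurrences of 2
```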

sorter : 1-D array_like, optional

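Holds indices into a; taking the elements of a in this index order yields an ascending sequence (this is typically the result of np.argsort(a)).

Excerpted from the blog: https://blog.csdn.net/qq_17753903/article/details/85165637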

3. Single-element tests

import numpy as np
 
a = np.array([0,1,5,9,11,18,26,33])
print(a)

result1 = np.searchsorted(a, 15)                      # result1 = 5, side defaults to 'left'
print('result1:', result1)
result2 = np.searchsorted(a, 15, side='left')         # result2 = 5
print('result2:', result2)
result3 = np.searchsorted(a, 15, side='right')        # result3 = 5
print('result3:', result3)

result4 = np.searchsorted(a, -1, side='left')         # result4 = 0
print('result4:', result4)
result5 = np.searchsorted(a, -1, side='right')        # result5 = 0
print('result5:', result5)

result6 = np.searchsorted(a, 35, side='left')         # result6 = 8
print('result6:', result6)
result7 = np.searchsorted(a, 35, side='right')        # result7 = 8
print('result7:', result7)
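## This group of tests shows:
## 1) The default side for searchsorted is 'left'
## 2) When searching for an element that does not exist in array a, side has no effect: if the element is smaller than the minimum of a, 0 is returned; if it is larger than the maximum of a, the length N of a is returned
## 3) If the missing element falls in the middle of a, the position of the next larger element is returned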


result8 = np.searchsorted(a, 11, side='left')         # result8 = 4
print('result8:', result8)
result9 = np.searchsorted(a, 11, side='right')        # result9 = 5
print('result9:', result9)

result10 = np.searchsorted(a, 0, side='left')         # result10 = 0
print('result10:', result10)
result11 = np.searchsorted(a, 0, side='right')        # result11 = 1
print('result11:', result11)

result12 = np.searchsorted(a, 33, side='left')        # result12 = 7
print('result12:', result12)
result13 = np.searchsorted(a, 33, side='right')       # result13 = 8
print('result13:', result13)
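## This group of tests shows:
## 1) If the searched element exists in array a, 'left' and 'right' return different values
## 2) With 'left', the position of the element equal to the searched value is returned
## 3) With 'right', the position right after the element equal to the searched value is returned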
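
One practical use of the left/right distinction is counting how many elements of a sorted array fall inside a closed interval. The sketch below reuses the array a from the tests; the helper name count_in_range is hypothetical and not part of NumPy.

```python
import numpy as np

a = np.array([0, 1, 5, 9, 11, 18, 26, 33])

def count_in_range(sorted_arr, lo, hi):
    """Count elements x with lo <= x <= hi (hypothetical helper)."""
    # 'left' gives the first index whose element is >= lo
    start = np.searchsorted(sorted_arr, lo, side='left')
    # 'right' gives the index just past the last element <= hi
    stop = np.searchsorted(sorted_arr, hi, side='right')
    return int(stop - start)

print(count_in_range(a, 5, 18))    # 4  (the elements 5, 9, 11, 18)
print(count_in_range(a, 12, 17))   # 0  (nothing lies between 12 and 17)
```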

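4. List tests

Passing a list gives the same results as the single-element tests, except that the return value is also an array of indices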

result14 = np.searchsorted(a, [-1, 0, 11, 15, 33, 35], side='left')
print('result14:', result14)      # result14 = [0 0 4 5 7 8]
result15 = np.searchsorted(a, [-1, 0, 11, 15, 33, 35], side='right')
print('result15:', result15)      # result15 = [0 1 5 5 8 8]
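
As a cross-check, the vectorized call should agree with searching one value at a time. The sketch below compares it against Python's standard bisect module, which is used here only as an independent reference and is not how NumPy is implemented.

```python
import bisect
import numpy as np

a = np.array([0, 1, 5, 9, 11, 18, 26, 33])
v = [-1, 0, 11, 15, 33, 35]
a_list = a.tolist()

# One vectorized call per side
left_vec = np.searchsorted(a, v, side='left').tolist()
right_vec = np.searchsorted(a, v, side='right').tolist()

# The same lookups, one element at a time, with the standard library
left_loop = [bisect.bisect_left(a_list, x) for x in v]
right_loop = [bisect.bisect_right(a_list, x) for x in v]

print(left_vec == left_loop)     # True
print(right_vec == right_loop)   # True
```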

5. When a is not in ascending order

np.random.shuffle(a)
print('a =', a)                   # a = [ 0  5 26 33 11  9 18  1]

a_sort = np.argsort(a)
print('a_sort =', a_sort)         # a_sort = [0 7 1 5 4 6 2 3]
result16 = np.searchsorted(a, [-1, 0, 11, 15, 33, 35], side='left', sorter=a_sort)
print('result16:', result16)      # result16 = [0 0 4 5 7 8]
result17 = np.searchsorted(a, [-1, 0, 11, 15, 33, 35], side='right', sorter=a_sort)
print('result17:', result17)      # result17 = [0 1 5 5 8 8]
## The results are the same as above
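
Note that the returned indices refer to the sorted order of a, not to positions in the unsorted a. A minimal sketch, using the shuffled array printed above and values assumed to be present in a, shows how indexing the sorter with the result maps back to the original positions:

```python
import numpy as np

a = np.array([0, 5, 26, 33, 11, 9, 18, 1])   # the shuffled array printed above
v = np.array([11, 33, 0])                     # values known to be in a
order = np.argsort(a)

# Passing sorter=order gives the same answer as searching the sorted copy
pos = np.searchsorted(a, v, sorter=order)
same = np.array_equal(pos, np.searchsorted(a[order], v))
print(same)                      # True

# pos indexes the sorted view; order[pos] maps back to the original array
orig_idx = order[pos]
print(orig_idx.tolist())         # [4, 3, 0]
print(a[orig_idx].tolist())      # [11, 33, 0]
```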

6. Using searchsorted to replace values in an array

import numpy as np

img = np.random.randint(0, 10, (10,10))
# Create a random integer array
print(img)
shape = img.shape
# img = np.ravel(img)
ids_img, sizes_img = np.unique(img, return_counts=True)
# Find the distinct ids in the random array and how many times each one occurs
print('ids_img =', ids_img)
print('sizes_img =', sizes_img)
print('len of ids_img:', len(ids_img))

ids_dict = dict(zip(ids_img, sizes_img))
# Build a dictionary mapping each id to its count
print('ids_dict =', ids_dict)

sizes_sort = sorted(ids_dict.items(), key=lambda item: item[1], reverse=True)
# Sort the (id, count) pairs by count, largest first
print(sizes_sort)
sizes_sort = dict(sizes_sort[:5])
# Keep the 5 ids with the largest counts, together with their counts
print(sizes_sort)

ids_need = list(sizes_sort.keys())
maxval = np.max(img)
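to_vals = np.arange(1, len(sizes_sort)+1) + maxval
# Define the new ids that these 5 ids should be changed to
# (np.arange is used because a Python 3 range cannot be added to a number)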
print('to_vals =', to_vals)
d = dict(zip(ids_need, to_vals))

d2 = dict()
for i in ids_img:
    if i in d:
        d2[i] = d[i]
    else:
        d2[i] = maxval 
# Build a dictionary whose keys are all ids in img; if an id is one of the
# five ids above, its value is the new id from d, otherwise it is the maximum value of img

from_label = d2.keys()
print('from_label =', from_label)
# All old ids
to_label = d2.values()
print('to_label =', to_label)
# All new ids

from_label = np.array(list(from_label))
to_label = np.array(list(to_label))
# Convert the dictionary views to numpy arrays
##########################################################
# Core step:
sort_idx = np.argsort(from_label)
# Get the indices that sort from_label in ascending order
print('sort_idx =', sort_idx)
idx = np.searchsorted(from_label, img, sorter = sort_idx)
# Find where each element of img would be inserted into from_label; idx has the same shape as img
print(idx)
out = to_label[sort_idx][idx]
# NumPy advanced indexing: reorder to_label by sort_idx, then use idx to
# pick the entries of to_label[sort_idx] into out
##########################################################
out = out - maxval
print('out =', out)

Output:

[[0 9 3 5 4 0 9 0 5 0]
 [5 6 2 1 4 7 9 7 0 4]
 [3 3 1 3 1 4 9 9 8 8]
 [3 4 1 9 0 4 4 2 2 7]
 [8 5 0 0 6 5 2 6 9 1]
 [4 6 7 8 0 2 4 0 7 0]
 [6 6 9 5 9 2 3 2 6 9]
 [7 4 6 1 9 1 4 1 2 9]
 [7 7 5 0 3 5 4 4 1 4]
 [9 3 0 6 4 6 2 6 9 8]]
ids_img = [0 1 2 3 4 5 6 7 8 9]
sizes_img = [13  9  9  8 15  8 11  8  5 14]
len of ids_img: 10
ids_dict = {0: 13, 1: 9, 2: 9, 3: 8, 4: 15, 5: 8, 6: 11, 7: 8, 8: 5, 9: 14}
[(4, 15), (9, 14), (0, 13), (6, 11), (1, 9), (2, 9), (3, 8), (5, 8), (7, 8), (8, 5)]
{4: 15, 9: 14, 0: 13, 6: 11, 1: 9}
to_vals = [10 11 12 13 14]
from_label = dict_keys([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
to_label = dict_values([12, 14, 9, 9, 10, 9, 13, 9, 9, 11])
sort_idx = [0 1 2 3 4 5 6 7 8 9]
[[0 9 3 5 4 0 9 0 5 0]
 [5 6 2 1 4 7 9 7 0 4]
 [3 3 1 3 1 4 9 9 8 8]
 [3 4 1 9 0 4 4 2 2 7]
 [8 5 0 0 6 5 2 6 9 1]
 [4 6 7 8 0 2 4 0 7 0]
 [6 6 9 5 9 2 3 2 6 9]
 [7 4 6 1 9 1 4 1 2 9]
 [7 7 5 0 3 5 4 4 1 4]
 [9 3 0 6 4 6 2 6 9 8]]
out = [[3 2 0 0 1 3 2 3 0 3]
 [0 4 0 5 1 0 2 0 3 1]
 [0 0 5 0 5 1 2 2 0 0]
 [0 1 5 2 3 1 1 0 0 0]
 [0 0 3 3 4 0 0 4 2 5]
 [1 4 0 0 3 0 1 3 0 3]
 [4 4 2 0 2 0 0 0 4 2]
 [0 1 4 5 2 5 1 5 0 2]
 [0 0 0 3 0 0 1 1 5 1]
 [2 0 3 4 1 4 0 4 2 0]]
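
The core step can be isolated into a small reusable function. This is only a sketch: the name remap_labels is hypothetical, and it assumes every value in the input array appears in from_label (values missing from from_label would be mapped to a neighbouring label or raise an index error, which this sketch does not guard against).

```python
import numpy as np

def remap_labels(arr, from_label, to_label):
    """Replace each value of arr found in from_label with the matching to_label.

    Hypothetical helper; assumes every value in arr appears in from_label.
    """
    from_label = np.asarray(from_label)
    to_label = np.asarray(to_label)
    sort_idx = np.argsort(from_label)                 # order that sorts from_label
    idx = np.searchsorted(from_label, arr, sorter=sort_idx)
    return to_label[sort_idx][idx]                    # look up the new label

arr = np.array([[3, 1], [2, 3]])
out = remap_labels(arr, from_label=[3, 1, 2], to_label=[30, 10, 20])
print(out.tolist())   # [[30, 10], [20, 30]]
```

Because from_label is deliberately unsorted here, the example also exercises the sorter argument, which the run above does not (its sort_idx is already 0..9).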

Reference blogs: https://blog.csdn.net/qq_17753903/article/details/85165637

https://blog.csdn.net/norsd/article/details/76602101