[Artificial Un-Intelligence Study Guide 002] 70 NumPy Interview Questions: Notes on Getting Unstuck

Latest recommended article published on 2024-07-23 08:55:29

Zackberg

Article link: https://blog.csdn.net/weixin_46177681/article/details/119176955

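This post walks through NumPy basics and common operations, including array creation, reshaping, sorting, filtering, and filling in missing values. It works through a number of interview questions with examples, such as filtering an array on multiple conditions, sorting array elements, sorting within a multi-dimensional array, and finding local maxima. It also looks at handling irregular date sequences and missing values in arrays. In addition, it covers swapping, concatenating, and converting arrays, along with exercises solved in more than one way, showing what NumPy offers for numerical computing and data processing.

(Summary generated automatically by CSDN.)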

Basics: 70 Introductory NumPy Exercises

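👇 Source of the exercises
70 NumPy Interview Questions

👇 Reference answers
70 Basic NumPy Training Exercises (with explanations)

👇 My own answers (for reference only, and kept for my own future lookup)
[Artificial Un-Intelligence Study Guide 003] 70 NumPy Interview Questions: Answers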

1. Background Knowledge

First, a rundown of the things that come up while working through the exercises.

# Basic array functions (needed from around exercise 20 onward)
numpy.array(object, dtype=None, *, copy=True, order='K', subok=False, ndmin=0, like=None)
numpy.append(arr, values, axis=None)
### when axis is given, values must have the same number of dimensions as arr

numpy.reshape(a, newshape, order='C')  # newshape: int or tuple of ints
# or a.reshape(x); x = -1 flattens the array to 1-D

numpy.repeat(a, repeats, axis=None)
# e.g. np.repeat(np.array([1,2,3]), np.array([1,2,3]))
# >>> array([1, 2, 2, 3, 3, 3])
###paras:
repeats:int or array of ints;
axis: int. The axis along which to repeat values. 
By default, use the flattened input array, 
and return a flat output array.
###
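numpy.squeeze(a, axis=None)  # any axis named here must have length 1
numpy.expand_dims(a, axis)  # axis: int or tuple of ints
NumPy array * 3 : np.array([1,2,3])*3 -> array([3,6,9])
plain Python list * 3 : [1,2,3]*3 -> [1,2,3, 1,2,3, 1,2,3]
numpy.transpose(a, axes=None)
# axes: tuple or list of ints; when given, the axes are permuted in that order
a = np.arange(16).reshape(4,4)[y0:y1, x0:x1]
# slicing: rows y0..y1-1 along axis 0, columns x0..x1-1 along axis 1
a = np.arange(16).reshape(4,4)[np.ix_([y0,y1], [x0,x1,x3])]
# rows y0 and y1 along axis 0, columns x0, x1, x3 along axis 1
# (a plain a[[y0,y1],[x0,x1]] pairs the indices up and returns only the elements (y0,x0) and (y1,x1))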
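
A small sketch of the difference between slicing, paired index lists, and np.ix_. The concrete indices below are arbitrary picks for illustration, not values taken from the exercises.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)

# Slicing keeps a rectangular block: rows 1..2, columns 0..1.
block = a[1:3, 0:2]

# Two index lists of equal length are paired element by element:
# this picks a[0, 1] and a[2, 3], not a 2x2 sub-array.
paired = a[[0, 2], [1, 3]]

# np.ix_ builds an open mesh, so every listed row meets every listed column.
mesh = a[np.ix_([0, 2], [0, 1, 3])]

print(block)   # [[4 5] [8 9]]
print(paired)  # [ 1 11]
print(mesh)    # [[0 1 3] [8 9 11]]
```

Passing two lists of different lengths without np.ix_ raises an IndexError, because the lists cannot be broadcast against each other.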

# Correlation coefficients
numpy.corrcoef(x, y=None, rowvar=True, bias=<no value>, ddof=<no value>, *, dtype=None)
# Differences between consecutive elements
numpy.diff(a, n=1, axis=-1, prepend=<no value>, append=<no value>)
# Sign
numpy.sign(x, /, out=None, *, where=True, casting='same_kind', order='K', dtype=None, subok=True[, signature, extobj]) = <ufunc 'sign'>
# returns 1 for values greater than 0, -1 for values less than 0, and 0 for 0

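# Control how arrays are printed
np.set_printoptions(precision=None, threshold=None, edgeitems=None, 
linewidth=None, suppress=None, nanstr=None, infstr=None, formatter=None)
### params:
precision  number of digits shown for floats (default: 8)
threshold  number of elements above which the output is summarized with "..." (default: 1000); pass sys.maxsize to always print in full
edgeitems  number of items shown at the start and end of each dimension when summarizing (default: 3)
suppress   if True, small floats are printed in fixed-point notation instead of scientific notation (default: False)
###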

# Apply a function
numpy.apply_along_axis(func1d, axis, arr, *args, **kwargs)
# applies func1d to each 1-D slice of arr taken along axis

# Test whether any element evaluates to True
numpy.any(a, axis=None, out=None, keepdims=<no value>, *, where=<no value>)
# Test whether all elements evaluate to True
numpy.all(...)

# Conditional selection: where True, yield x, otherwise yield y.
numpy.where(condition[, x, y])
# for 1-D inputs this is equivalent to
[xv if c else yv for c, xv, yv in zip(condition, x, y)]

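# Deduplicate / find the distinct values
numpy.unique(ar, return_index=False, return_inverse=False, return_counts=False, axis=None)
# returns a new array holding the distinct elements, sorted in ascending order
# return_index: indices in the original array of the first occurrence of each unique value
# return_inverse: indices into the unique array that rebuild the original array
# return_counts: number of times each unique value occurs
numpy.argwhere(a)
# returns the indices of the non-zero (truthy) elements of a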
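
To make the three return_* flags concrete, here is a minimal example on a made-up array x (the values are arbitrary).

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1, 3])

uniq, first_idx, inverse, counts = np.unique(
    x, return_index=True, return_inverse=True, return_counts=True
)

print(uniq)       # [1 2 3]        distinct values, ascending
print(first_idx)  # [1 2 0]        x[first_idx] == uniq
print(inverse)    # [2 0 1 2 0 2]  uniq[inverse] == x
print(counts)     # [2 1 3]

# argwhere returns one row of indices per matching element.
print(np.argwhere(x == 3))  # [[0] [3] [5]]
```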

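# Find the maximum
numpy.amax(a)  # returns the maximum value
numpy.argmax(a, axis=None, out=None) 
# returns the index of the first occurrence of the maximum
# axis: by default the array is flattened; otherwise the search runs along the given axis

# Change shape
numpy.ravel(a, order='C')
<=>  a.ravel()
array[:,None] / array[:,:,None] -> a new axis of length 1 is inserted where None appears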

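# Map each value to the index of the bin it falls into
numpy.digitize(x, bins, right=False)  # for increasing bins: right=False means bins[i-1] <= x < bins[i]; right=True means bins[i-1] < x <= bins[i]
# Type conversion
ndarray.astype(dtype)
numpy.asarray(a, dtype=None, order=None, *, like=None)  
# converts other inputs to an ndarray, e.g. lists, lists of tuples,
# tuples, tuples of tuples, tuples of lists and ndarrays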


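# Clamp to an interval: values above the upper bound are cut to it, values below the lower bound are raised to it
numpy.clip(a, a_min, a_max, out=None, **kwargs)
#Given an interval, values outside the interval are clipped to the interval edges.

# Search a sorted array
numpy.searchsorted(a, v, side='left', sorter=None)
# side: 'left' returns the insertion index before any equal entries; 'right' returns the insertion index after them

# Sorting
numpy.sort(a, axis=-1, kind=None, order=None)  # returns a sorted copy
numpy.argsort(a, axis=-1, kind=None, order=None)
# returns the indices that would sort the array
# axis: the axis to sort along; None flattens the array first
# kind: sorting algorithm {'quicksort', 'mergesort', 'heapsort', 'stable'}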

# Random arrays
a=np.random.random((3,3))
a=np.random.randint(0,10,size=[3,3])
t3 = np.random.uniform(10, 15, (3, 4))  # includes the lower bound 10, excludes the upper bound 15
# num evenly spaced values over [start, stop] (stop is included when endpoint=True)
numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
numpy.random.uniform(low=0.0, high=1.0, size=None)  # uniform distribution over [low, high)
# Random choice
numpy.random.choice(a, size=None, replace=True, p=None)
#Generates a random sample from a given 1-D array
# replace: whether to sample with replacement; p: probability of picking each element; size: output shape

# Filling
numpy.full(shape, fill_value, dtype=None, order='C', *, like=None)
# shape: int or tuple of ints; fill_value: scalar or array_like

# Vectorize a Python function
class numpy.vectorize(pyfunc, otypes=None, doc=None, excluded=None, cache=False, signature=None)

# Joining arrays:
np.r_ = <numpy.lib.index_tricks.RClass object>  joins along the first axis (vertically for 2-D arrays); column counts must match
np.c_ / numpy.hstack(tup)  joins side by side (horizontally for 2-D arrays); row counts must match  # tup: a sequence of arrays, e.g. a tuple


# Creating 2-D arrays
numpy.arange([start, ]stop, [step, ]dtype=None, *, like=None)
print("plain Python")
num = [['a' for i in range(1,5)] for j in range(1,5)]
print("NumPy")
num2 = np.array([np.arange(1,4),np.arange(1,4),np.arange(1,4)])
num3 = np.zeros((3,5),dtype=np.bool_)  # a boolean array, initialized to all False

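# Save to a text file
numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', 
newline='\n', header='', footer='', comments='# ', encoding=None)

# Value at a given percentile
np.percentile(a, q, axis=None, keepdims=False)  # other keyword arguments omitted
### params:
a : input array
q : float in the range [0, 100] (or a sequence of floats), the percentile(s) to compute
axis : the axis along which the percentiles are computed
keepdims : bool, whether the reduced axes are kept with length 1
###

np.nan: "not a number", which is not the same thing as None; np.nan==np.nan >>> False
numpy.newaxis: adds an axis; e.g. x[:, np.newaxis, np.newaxis]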
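
A quick check of the two notes above, using a made-up three-element array. Because NaN never compares equal to anything, np.isnan is the usual way to test for it.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)   # False: NaN is not equal even to itself
print(np.nan is None)     # False: NaN is a float value, not None
print(np.isnan(a))        # [False  True False]

# Each np.newaxis inserts one axis of length 1.
print(a[:, np.newaxis].shape)               # (3, 1)
print(a[:, np.newaxis, np.newaxis].shape)   # (3, 1, 1)
```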

# About the size of nested lists
c=[[]]
>>> len(c)
>>> 1
>>> len(c[0])
>>> 0

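# Arithmetic:
array +/-/* array  <=>  element-wise arithmetic
int(a) == int(b)  -> returns a bool
array(a) == int(b) -> evaluates "a[i]==b" for every element of a and returns a boolean array
np.dot(x, y) <=> x.dot(y) -> dot product (inner product for 1-D arrays, matrix product for 2-D arrays)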


'''
# log
l1 = [1,2,3,4]
l2 = [1,2,3,4,5]
print([x+y for x in l1 for y in l2])
>>> [2, 3, 4, 5, 6, 3, 4, 5, 6, 7, 4, 5, 6, 7, 8, 5, 6, 7, 8, 9]

print([[x+y for x in l1] for y in l2])
>>> [[2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7], [5, 6, 7, 8], [6, 7, 8, 9]]

'''
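# Notes on list comprehensions
# 1. They visit every element of a list, apply the given expression, and return a list
# 2. They are concise, readable, and quick to write
# 3. They usually run faster than the equivalent for loop with append

# Multiple loops:
[x+y for x in l1 for y in l2]  read left to right <=> outer loop to inner loop
<=> for x in l1:
        for y in l2:
            ...

(nested form)
[[x+y for x in l1] for y in l2]  the inner [] is rebuilt once per iteration of the outer for, so the brackets change which loop is outermost
<=> for y in l2:
        for x in l1:
            ...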
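
The two loop orders can be checked directly against hand-written loops. This is only a sketch using the same l1 and l2 as in the log above.

```python
l1 = [1, 2, 3, 4]
l2 = [1, 2, 3, 4, 5]

# Flat form: the first "for" is the outer loop.
flat = [x + y for x in l1 for y in l2]
flat_loop = []
for x in l1:
    for y in l2:
        flat_loop.append(x + y)

# Nested form: the "for" outside the inner brackets is the outer loop.
nested = [[x + y for x in l1] for y in l2]
nested_loop = []
for y in l2:
    row = []
    for x in l1:
        row.append(x + y)
    nested_loop.append(row)

print(flat == flat_loop)                        # True
print(nested == nested_loop)                    # True
print(len(flat), len(nested), len(nested[0]))   # 20 5 4
```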

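2. Exercises I Did Not Understand

(1) 034 How do you filter a NumPy array on two or more conditions?

Difficulty: L3

Problem: Filter the rows of iris_2d where petallength (3rd column) > 1.5 and sepallength (1st column) < 5.0.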

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])

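What went wrong:
#way1: looping over the indices with i in range(len(iris_2d))
# np.append(a, value) failed with an error I could not track down

How I got out:
#way2: the & operator combines the two boolean masks element-wise (True only where both are True) and works directly on arrays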

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
condition = (iris_2d[:,2] > 1.5) & (iris_2d[:,0] < 5.0)
ret = iris_2d[condition]
print(ret)
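
A sketch of both routes on a small made-up array standing in for iris_2d (the four rows below are invented for illustration). One common stumbling block with the loop version is that np.append returns a new array instead of modifying its argument, and that it flattens its inputs unless axis is given. Whether that was the error mentioned above is an assumption.

```python
import numpy as np

# Hypothetical stand-in for iris_2d; columns are
# sepallength, sepalwidth, petallength, petalwidth.
data = np.array([
    [4.8, 3.4, 1.6, 0.2],
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 2.4, 3.3, 1.0],
    [6.0, 2.2, 4.0, 1.0],
])

# Mask route: each comparison needs its own parentheses,
# because & binds more tightly than > and <.
mask = (data[:, 2] > 1.5) & (data[:, 0] < 5.0)
print(mask)        # [ True False  True False]
print(data[mask])

# Loop route: keep the result of np.append and pass axis=0
# so that whole rows are stacked.
rows = np.empty((0, data.shape[1]))
for row in data:
    if row[2] > 1.5 and row[0] < 5.0:
        rows = np.append(rows, [row], axis=0)
print(rows)
```

Inside the loop the plain `and` is fine because each comparison yields a single boolean. On whole arrays `and` raises a ValueError, which is why the mask route uses `&`.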

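(2) 054 How do you sort the items of an array with NumPy?

Difficulty: L2

Problem: Create the sort order for the given numeric array a.

Input:

np.random.seed(10)
a = np.random.randint(20, size=10)
print(a)
#> [ 9 4 15 0 17 16 17 8 9 0]

Expected output:

[4 2 6 0 8 7 9 3 5 1]

My approach
#way1: the obvious idea

b = np.argsort(a)
print(a[b])  # prints the sorted values

#way2: ????

####################### no idea what this is doing ##################
c = a.argsort().argsort()
print(c)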
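
One way to read way2, as a sketch: the first argsort answers "which index holds the k-th smallest item", and applying argsort again inverts that permutation, which answers "at which position does item i land after sorting". That second array is the rank of each element, and it matches the expected output. The array is typed in by hand from the printed input, and kind='stable' is an assumption added here so that ties are resolved in their original order.

```python
import numpy as np

a = np.array([9, 4, 15, 0, 17, 16, 17, 8, 9, 0])

order = np.argsort(a, kind='stable')      # order[k] = index of the k-th smallest item
ranks = np.argsort(order, kind='stable')  # ranks[i] = position of a[i] once sorted

print(order)  # [3 9 1 7 0 8 2 5 4 6]
print(ranks)  # [4 2 6 0 8 7 9 3 5 1]

# The same ranks without a second sort: write 0..n-1 back to the original positions.
ranks2 = np.empty_like(order)
ranks2[order] = np.arange(a.size)
print(np.array_equal(ranks, ranks2))  # True
```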

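(3) 055 How do you sort the items of a multi-dimensional array with NumPy?

Difficulty: L3

Problem: Given a numeric array a, create a sort-order array of the same shape.

Input:

np.random.seed(10)
a = np.random.randint(20, size=[2,5])
print(a)
#> [[ 9 4 15 0 17]
#> [16 17 8 9 0]]

Expected output:

#> [[4 2 6 0 8]
#> [7 9 3 5 1]]

#way1: basic idea

shape = a.shape
print(shape)
b = np.sort(a.reshape(1,-1))  # kept in b so that a stays intact for way2
b = b.reshape(shape)
print(b)  # these are the sorted values, which is not the expected output above

#way2: ??? no idea why it sorts twice

####################### no idea what this is doing ##################
print(a.ravel().argsort().argsort().reshape(a.shape))

For both this exercise and #054, I never understood what the question actually means by "sorting".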
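
Judging by the expected output, "sorting" in both exercises seems to mean ranking: each entry is replaced by the position it would take in the sorted, flattened array. A minimal sketch that contrasts the two readings, with the array typed in by hand from the printed input and kind='stable' added as an assumption to make ties deterministic:

```python
import numpy as np

a = np.array([[9, 4, 15, 0, 17],
              [16, 17, 8, 9, 0]])

# Reading 1: sorted values, poured back into the original shape.
sorted_vals = np.sort(a, axis=None).reshape(a.shape)
print(sorted_vals)
# [[ 0  0  4  8  9]
#  [ 9 15 16 17 17]]

# Reading 2: rank of every element within the flattened array.
flat_ranks = a.ravel().argsort(kind='stable').argsort(kind='stable')
ranks_2d = flat_ranks.reshape(a.shape)
print(ranks_2d)
# [[4 2 6 0 8]
#  [7 9 3 5 1]]
```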

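(4) 063 How do you find all the local maxima (peaks) in a 1-D array?

Difficulty: L4

Problem: Find all the peaks in the 1-D array a. A peak is a value that is larger than both of its neighbours.

Input:

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

Expected output:

#> array([2, 5])

I could not come up with a reasonably fast approach myself, so I went straight to the answer.

####################### an unusual approach ##########################
a = np.array([1, 3, 7, 1, 2, 6, 0, 1])
b = np.diff(np.sign(np.diff(a)))
peak = np.where(b == -2)[0] + 1
print(peak)

I do not really understand "b == -2". Could it be changed to "np.where(b == 2)[0] - 1"?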
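
Printing the intermediate arrays suggests an answer. The inner sign(diff) is +1 where the sequence rises and -1 where it falls, so the outer diff is -2 exactly where a rise is followed by a fall (a peak) and +2 where a fall is followed by a rise (a valley). The variable names slope and turn below are my own labels.

```python
import numpy as np

a = np.array([1, 3, 7, 1, 2, 6, 0, 1])

slope = np.sign(np.diff(a))  # +1 where the next value is larger, -1 where it is smaller
turn = np.diff(slope)        # -2 at rise->fall, +2 at fall->rise, 0 otherwise

# turn[i] compares the steps around a[i + 1], hence the + 1.
peaks = np.where(turn == -2)[0] + 1
valleys = np.where(turn == 2)[0] + 1

print(slope)    # [ 1  1 -1  1  1 -1  1]
print(turn)     # [ 0 -2  2  0 -2  2]
print(peaks)    # [2 5]
print(valleys)  # [3 6]

# The variant asked about above lands on indices 1 and 4, which are not peaks.
print(np.where(turn == 2)[0] - 1)  # [1 4]
```

So b == 2 would locate the local minima, not the maxima, and the offset is + 1 in both cases.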

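(5) 069 How do you fill in the missing dates in an irregular NumPy date sequence?

Difficulty: L3

Problem: Given an array of non-consecutive dates, turn it into a continuous date sequence by filling in the missing dates.

Input:

# Input
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
print(dates)
#> ['2018-02-01' '2018-02-03' '2018-02-05' '2018-02-07' '2018-02-09'
#>  '2018-02-11' '2018-02-13' '2018-02-15' '2018-02-17' '2018-02-19'
#>  '2018-02-21' '2018-02-23']

Once again I was too much of a beginner to work it out, so I went straight to the answer.

################## not clear to me: why np.hstack? ################
dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)
print(dates)
filled = np.array([np.arange(date, (date+d)) for date, d in zip(dates, np.diff(dates))]).reshape(-1)
output = np.hstack([filled,dates[-1]])
print(output)

# Question: what does np.hstack() do here?
# Answer:
Comparing the outputs, output has one entry more than filled: "2018-02-23".
Perhaps it is appended defensively in case something was missed? (And identical entries are skipped automatically?)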
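
A sketch that checks the guess above. Each np.arange(date, date + d) excludes its end point, and zip stops after len(dates) - 1 pairs, so the final date is never produced by the comprehension and has to be appended. np.hstack only concatenates and does not skip duplicates. The names chunks and direct are introduced here for illustration.

```python
import numpy as np

dates = np.arange(np.datetime64('2018-02-01'), np.datetime64('2018-02-25'), 2)

# One chunk per gap, each covering [date, date + gap).
chunks = [np.arange(date, date + gap) for date, gap in zip(dates, np.diff(dates))]
filled = np.concatenate(chunks)

output = np.hstack([filled, dates[-1]])

print(filled[-1])                # 2018-02-22
print(len(filled), len(output))  # 22 23

# hstack keeps duplicates: appending an existing date still adds an entry.
print(len(np.hstack([filled, filled[-1]])))  # 23

# The same sequence from one arange over the full span.
direct = np.arange(dates[0], dates[-1] + np.timedelta64(1, 'D'))
print(np.array_equal(output, direct))  # True
```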

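3. Some Pitfalls

(1) 016 How do you swap two columns in a 2-D NumPy array?

Difficulty: L2

Problem: Swap columns 1 and 2 in the array arr.

arr = np.arange(9).reshape(3,3)

#way1: the plain approach

a = np.arange(9).reshape(3,3)
print(a)
b = np.transpose(a)
temp = np.copy(b[0])
b[0] = b[1]
b[1] = temp
c = np.transpose(b)
print(c)

# Note: writing temp = b[0] directly makes temp a view of the same data as b[0], so it changes when b[0] is overwritten
Fix:
#way2: fancy indexing on the 2-D array with a list of column positions, a[:, [columns]]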

a = np.arange(9).reshape(3,3)
a[:,[0,1]] = a[:,[1,0]]
print(a)
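
The view-versus-copy pitfall can be seen in a few lines. This is a minimal sketch on the same 3x3 array; the names view and copy are my own.

```python
import numpy as np

b = np.arange(9).reshape(3, 3)

view = b[0]           # basic indexing: shares memory with b
copy = np.copy(b[0])  # independent data

b[0] = b[1]
print(view)  # [3 4 5]  changed together with b
print(copy)  # [0 1 2]  still holds the old row

# Fancy indexing on the right-hand side builds a new array first,
# so the swap needs no temporary variable.
a = np.arange(9).reshape(3, 3)
a[:, [0, 1]] = a[:, [1, 0]]
print(a)
# [[1 0 2]
#  [4 3 5]
#  [7 6 8]]
```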

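4. One Exercise, Several Solutions

(1) 007 How do you reshape an array?

Difficulty: L1

Problem: Convert a 1-D array into a 2-D array with two rows.

Input:

np.arange(10)
#> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Expected output

#> array([[0, 1, 2, 3, 4],
#> [5, 6, 7, 8, 9]])

Idea: both versions pad an odd number of elements up to an even number, then reshape into two rows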
#way1:

a = np.arange(10)
n = a.size
if n%2==0:
    b = a.reshape(2,n//2)
else:
    temp = a[:((n//2)+1):]
    temp = np.append(temp, a[((n//2)+1)::])
    temp = np.append(temp, 0)  # pad with 0 to make the length even
    b = temp.reshape(2,(n//2)+1)
print(b)

#way2:

if n%2==0:
    c = np.reshape(a, (2,-1))
else:
    c = np.append(a, 0)
    c = np.reshape(c, (2,-1))
print(c)

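(2) 020 How do you create a 2-D array of random floats between 5 and 10?

Difficulty: L2

Problem: Create a 2-D array of shape 5x3 containing random decimal numbers between 5 and 10.

#way1: np.random.random((a,b)) generates a random array with a rows and b columns, values in [0, 1)

a = np.random.random((5,3))
print(a*5+5)

#way2: np.random.uniform, which samples from a uniform distribution (not a normal one)

a = np.random.uniform(5, 10, (5,3))
print(a)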

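(3) 026 How do you extract a particular column from a 1-D array of tuples?

Difficulty: L2

Problem: Extract the text column species from the 1-D iris imported in the previous question.

Input:

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',' , dtype=None)

#way1: type = ndarray (sklearn makes an early appearance!)

from sklearn.datasets import load_iris
iris = load_iris()
print(iris['target'])  # integer class codes (0, 1, 2), not the species names

#way2: type = 1-D array of tuples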

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_1d = np.genfromtxt(url, delimiter=',' , dtype=None)
a = np.array([x[4] for x in iris_1d])
print(a)

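(4) 040 How do you convert a numeric column into a categorical (text) array?

Difficulty: L2

Problem: Bin the petallength (3rd column) of iris_2d to form a text array, using these rules:

Less than 3 --> 'small'
3-5 --> 'medium'
'>=5 --> 'large'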

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='float', usecols=[0,1,2,3])
names = ('sepallength','sepalwidth', 'petallength' , 'petalwidth' , 'species')

#way1: np.where() with a nested list comprehension

petallength = iris_2d[:, 2]
step1 = np.where(petallength<3, 'small', ['large' if item>=5 else 'medium' for item in petallength])
print(step1)

#way2: digitize() + .astype()

length = np.digitize(iris_2d[:,2].astype('float'), [0,3,5,10])
label = {1:'small', 2:'medium', 3:'large',4:np.nan}
length_petal = [label[i] for i in length]
print(length_petal)
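
Both routes can be tried without downloading the data. The sketch below uses five made-up petal lengths chosen to hit the boundaries. Nesting np.where inside np.where is a variation on way1, not the code above, and indexing a label array with the bin numbers is a variation on way2.

```python
import numpy as np

# Hypothetical petal lengths, including the boundary values 3.0 and 5.0.
petal = np.array([1.4, 3.0, 4.9, 5.0, 6.7])

# Nested np.where: the inner call handles everything that is not 'small'.
by_where = np.where(petal < 3, 'small',
                    np.where(petal < 5, 'medium', 'large'))

# digitize returns i such that bins[i-1] <= x < bins[i].
bin_idx = np.digitize(petal, [0, 3, 5, 10])
labels = np.array(['small', 'medium', 'large'])
by_digitize = labels[bin_idx - 1]

print(bin_idx)      # [1 2 2 3 3]
print(by_where)     # ['small' 'medium' 'medium' 'large' 'large']
print(by_digitize)  # ['small' 'medium' 'medium' 'large' 'large']
```

The label lookup assumes every value lies in [0, 10). A value of 10 or more would get bin index 4 and fall outside the three labels.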

(5) 041 How do you create a new column from the existing columns of a NumPy array?

Difficulty: L2

Problem: Create a new column for volume in iris_2d, where volume is (pi x petallength x sepal_length^2)/3.

# Input
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data'
iris_2d = np.genfromtxt(url, delimiter=',', dtype='object')
names = ('sepallength','sepalwidth', 'petallength' , 'petalwidth' , 'species')
petallength =  iris_2d[:,2].astype(float)
sepal_length = iris_2d[:,0].astype(float)
volume = (np.pi *petallength*sepal_length**2)/3

#way1: via the transpose

vol_row = np.expand_dims(volume, axis=0)  # separate name, so volume stays 1-D for way2
vol_row = np.around(vol_row, 2)
new = np.append(np.transpose(iris_2d),vol_row,axis=0)
new = np.transpose(new)
print(new)

#way2: newaxis + hstack(tuple)

volume0 = volume[:,np.newaxis]
print(volume0.shape)
print(np.hstack((iris_2d, volume0)))

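(6) 047 How do you replace all values beyond a given cutoff with that cutoff?

Difficulty: L2

Problem: In the array a, replace all values greater than 30 with 30 and all values less than 10 with 10.

Input: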

np.random.seed(100)
a = np.random.uniform(1,50, 20)

#way1: np.where() with a nested list comprehension

step1 = np.where(a<10, 10, [30 if item>30 else item for item in a])
print(step1)

Drawback: with more than three cut-offs, the nesting becomes hard to extend. (?)

#way2: the clip(a, low, high) function

print(np.clip(a,10,30))

(7) 050 How do you convert an array of arrays into a flat 1-D array?

Difficulty: L2

Problem: Convert array_of_arrays into a flat, linear 1-D array.

# Input:
arr1 = np.arange(3)
arr2 = np.arange(3,7)
arr3 = np.arange(7,10)
array_of_arrays = np.array([arr1, arr2, arr3], dtype=object)  # dtype=object is required for rows of unequal length in recent NumPy
print(array_of_arrays)
#> array([array([0, 1, 2]), array([3, 4, 5, 6]), array([7, 8, 9])], dtype=object)

Expected output:

#> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

#way1: collect the items in a list, then convert with np.array(list)

out = []
[[out.append(item) for item in row] for row in array_of_arrays]
print(np.array(out))

#way2:

arr = np.array([a for i in array_of_arrays for a in i])
print(arr)

(8) 056 How do you find the maximum of each row in a 2-D NumPy array?

Difficulty: L2

Problem: Find the maximum value of each row in the given array.

np.random.seed(100)
a = np.random.randint(1,10, [5,3])
print(a)
#> array([[9, 9, 4],
#> [8, 8, 1],
#> [5, 3, 6],
#> [3, 3, 3],
#> [2, 1, 9]])

#way1: list comprehension

print([np.amax(item) for item in a])

#way2: amax() with axis=1, letting NumPy reduce each row (axis=0 would give the column maxima)

print(np.amax(a, axis=1))
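
The axis argument is easy to get backwards, so here is a minimal check on the printed array, typed in by hand. axis names the dimension that gets collapsed: axis=1 collapses the columns and leaves one value per row.

```python
import numpy as np

a = np.array([[9, 9, 4],
              [8, 8, 1],
              [5, 3, 6],
              [3, 3, 3],
              [2, 1, 9]])

row_max = np.amax(a, axis=1)  # one maximum per row, shape (5,)
col_max = np.amax(a, axis=0)  # one maximum per column, shape (3,)

print(row_max)  # [9 8 6 3 9]
print(col_max)  # [9 9 9]

# Same result as the list comprehension in way1.
print(row_max.tolist() == [np.amax(row) for row in a])  # True
```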

(9) 061 How do you remove all missing values from a NumPy array?

Difficulty: L2

Problem: Remove all nan values from a 1-D NumPy array.

Input:

np.array([1,2,3,np.nan,5,6,7,np.nan])

Expected output:

array([ 1., 2., 3., 5., 6., 7.])

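#way1:

a = np.array([1,2,3,np.nan,5,6,7,np.nan])
out = [not np.isnan(item) for item in a]
print(a[out])

#way2: the ~ operator, which inverts a boolean array element-wise (I think this gave me a bug once?)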

print(a[~np.isnan(a)])

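(10) 064 How do you subtract a 1-D array from a 2-D array so that each item of the 1-D array is subtracted from the corresponding row?

Difficulty: L2

Problem: Subtract the 1-D array b_1d from the 2-D array a_2d so that each item of b_1d is subtracted from the corresponding row of a_2d.

Input:

a_2d = np.array([[3,3,3],[4,4,4],[5,5,5]])
b_1d = np.array([1,2,3])  # [1,2,3] is what produces the expected output below

Expected output: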

#> [[2 2 2]
#> [2 2 2]
#> [2 2 2]]

#way1: build the matching 2-D array with np.repeat(a, repeats)

c = np.repeat(b_1d, a_2d.shape[0]).reshape(a_2d.shape)
print(a_2d-c)

#way2: broadcasting: in (3,3) - (3,1), the (3,1) operand is stretched automatically to (3,3)

print(b_1d[:,None])
d = a_2d - b_1d[:,None]
print(d)
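
A short sketch of why the [:, None] matters. Broadcasting aligns shapes from the last axis, so a bare 1-D array is matched against the columns. Turning it into a (3, 1) column is what makes each item apply to a whole row. The values assume b_1d = [1, 2, 3], which is what the expected output implies.

```python
import numpy as np

a_2d = np.array([[3, 3, 3], [4, 4, 4], [5, 5, 5]])
b_1d = np.array([1, 2, 3])

col = b_1d[:, None]
print(col.shape)  # (3, 1)

per_row = a_2d - col   # item i of b_1d is subtracted from row i
per_col = a_2d - b_1d  # item j of b_1d is subtracted from column j

print(per_row)
# [[2 2 2]
#  [2 2 2]
#  [2 2 2]]
print(per_col)
# [[2 1 0]
#  [3 2 1]
#  [4 3 2]]
```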

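(11) 065 How do you find the index of the n-th repetition of an item in an array?

Difficulty: L2

Problem: Find the index of the 5th repetition of the number 1 in the array x.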

x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])

#way1: argwhere(condition) returns a 2-D array with one row of indices per match

print(np.argwhere(x==1)[4][0])

#way2: where(condition) returns a tuple holding one index array per dimension (a 1-element tuple for 1-D input)

n = 5
print(np.where(x==1)[0][n-1])

# Note:

np.argwhere(condition)
np.where(condition)

return index structures of different shapes
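
The shapes can be printed directly. A minimal sketch on the same x, where the names aw and w are my own:

```python
import numpy as np

x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2])

aw = np.argwhere(x == 1)  # 2-D array: one row per match, one column per dimension
w = np.where(x == 1)      # tuple: one 1-D index array per dimension

print(aw.shape)            # (7, 1)
print(len(w), w[0].shape)  # 1 (7,)
print(w[0])                # [ 0  2  3  7  8 10 11]

n = 5
print(aw[n - 1][0], w[0][n - 1])  # 8 8
```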

(12) 070 How do you create strides from a given 1-D array?

Difficulty: L4

Problem: Given the 1-D array arr, use strides to generate a 2-D matrix with a window length of 4 and a stride of 2, such as [[0,1,2,3], [2,3,4,5], [4,5,6,7]…].

Input:

arr = np.arange(15)
print(arr)
#> array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])

Expected output:

#> [[ 0 1 2 3]
#> [ 2 3 4 5]
#> [ 4 5 6 7]
#> [ 6 7 8 9]
#> [ 8 9 10 11]
#> [10 11 12 13]]

#way1: procedural

print(arr)
stride, width, size = 2, 4, len(arr)
# stop at size - width so that every window has the full width
out = [arr[i:i+width] for i in range(0, size - width + 1, stride)]
out = np.array(out)
print(out)

#way2: wrapped in a function

def gen_strides(arr, stride_len = 5, window_len = 5):
    n_strides = ((arr.size - window_len) // stride_len) + 1
    return np.array([arr[i:(i + window_len)]  for i in np.arange(0, n_strides * stride_len, stride_len)])
print(gen_strides(arr, stride_len=2, window_len=4))
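
For comparison, recent NumPy ships a helper that builds the windows as a view instead of copying slices. This is a sketch that assumes NumPy 1.20 or later, where numpy.lib.stride_tricks.sliding_window_view is available. It is not the approach used in the reference answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

arr = np.arange(15)

# Every window of length 4 with step 1, as a read-only view: shape (12, 4).
all_windows = sliding_window_view(arr, window_shape=4)

# Keep every second window to get a stride of 2: shape (6, 4).
windows = all_windows[::2]
print(windows)
# [[ 0  1  2  3]
#  [ 2  3  4  5]
#  [ 4  5  6  7]
#  [ 6  7  8  9]
#  [ 8  9 10 11]
#  [10 11 12 13]]
```

Because the result is a view onto arr, call .copy() on it before modifying the windows.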