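Accessing, Deleting, and Inserting Elements in ndarrays
Now that you know how to create a variety of ndarrays, you will learn how NumPy lets us manipulate the data inside them efficiently. NumPy ndarrays are mutable, meaning that the elements of an ndarray can be changed after it has been created. NumPy ndarrays can also be sliced, so they can be split up in many different ways; for example, we can take any subset we want from an ndarray. In machine learning you will often use slicing to split data, for instance to separate a dataset into training, cross-validation, and test sets.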
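Accessing or Modifying Elements of an ndarray by Index
1. We will first look at how to access or modify the elements of an ndarray through indexing. Elements are accessed by placing an index inside square brackets [ ]. In NumPy you can use both positive and negative indices. Positive indices count from the beginning of the array, and negative indices count from the end. Let's see how to access the elements of a rank 1 ndarray: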
import numpy as np
# We create a rank 1 ndarray that contains integers from 1 to 5
x = np.array([1, 2, 3, 4, 5])
# We print x
print()
print('x = ', x)
print()
# Let's access some elements with positive indices
print('This is First Element in x:', x[0])
print('This is Second Element in x:', x[1])
print('This is Fifth (Last) Element in x:', x[4])
print()
# Let's access the same elements with negative indices
print('This is First Element in x:', x[-5])
print('This is Second Element in x:', x[-4])
print('This is Fifth (Last) Element in x:', x[-1])
x = [1 2 3 4 5]
This is First Element in x: 1
This is Second Element in x: 2
This is Fifth (Last) Element in x: 5
This is First Element in x: 1
This is Second Element in x: 2
This is Fifth (Last) Element in x: 5
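Notice that the first element of an ndarray is accessed with index 0, not 1. Notice also that the same element can be reached with either a positive or a negative index: as mentioned above, positive indices count from the beginning of the array and negative indices count from the end.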
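The relationship between positive and negative indices can be checked directly. The following is a minimal sketch, assuming the same array x as above; the names n and out_of_range are illustrative only. It also shows what happens when an index falls outside the valid range.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
n = len(x)

# For every valid negative index -k, x[-k] is the same element as x[n - k]
for k in range(1, n + 1):
    assert x[-k] == x[n - k]

print(x[-1], x[n - 1])  # both refer to the last element

# An index outside the valid range raises IndexError
out_of_range = False
try:
    x[5]
except IndexError as err:
    out_of_range = True
    print('IndexError:', err)
```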
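2. Now let's see how to change the elements of a rank 1 ndarray. We do this by accessing the element we want to change and then using the = sign to assign it a new value: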
# We create a rank 1 ndarray that contains integers from 1 to 5
x = np.array([1, 2, 3, 4, 5])
# We print the original x
print()
print('Original:\n x = ', x)
print()
# We change the fourth element in x from 4 to 20
x[3] = 20
# We print x after it was modified
print('Modified:\n x = ', x)
Original: x = [1 2 3 4 5]
Modified: x = [ 1 2 3 20 5]
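3. Similarly, we can access and modify specific elements of a rank 2 ndarray. To access an element of a rank 2 ndarray we need to provide two indices in the form [row, column]. Let's look at some examples: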
# We create a 3 x 3 rank 2 ndarray that contains integers from 1 to 9
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
# We print X
print()
print('X = \n', X)
print()
# Let's access some elements in X
print('This is (0,0) Element in X:', X[0,0])
print('This is (0,1) Element in X:', X[0,1])
print('This is (2,2) Element in X:', X[2,2])
X =
[[1 2 3]
[4 5 6]
[7 8 9]]
This is (0,0) Element in X: 1
This is (0,1) Element in X: 2
This is (2,2) Element in X: 9
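Note that the index [0, 0] refers to the element in the first row and first column.
4. Elements of a rank 2 ndarray can be modified in the same way as those of a rank 1 ndarray. Let's look at an example: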
# We create a 3 x 3 rank 2 ndarray that contains integers from 1 to 9
X = np.array([[1,2,3],[4,5,6],[7,8,9]])
# We print the original X
print()
print('Original:\n X = \n', X)
print()
# We change the (0,0) element in X from 1 to 20
X[0,0] = 20
# We print X after it was modified
print('Modified:\n X = \n', X)
Original:
X =
[[1 2 3]
[4 5 6]
[7 8 9]]
Modified:
X =
[[20 2 3]
[ 4 5 6]
[ 7 8 9]]
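Adding Elements to and Deleting Elements from an ndarray
1. We can delete elements with the np.delete(ndarray, elements, axis) function. This function removes the elements at the given list of indices from the given ndarray along the specified axis and returns the result as a new ndarray. For rank 1 ndarrays the axis keyword is not required. For rank 2 ndarrays, axis = 0 selects rows and axis = 1 selects columns. Let's look at some examples: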
# We create a rank 1 ndarray
x = np.array([1, 2, 3, 4, 5])
# We create a rank 2 ndarray
Y = np.array([[1,2,3],[4,5,6],[7,8,9]])
# We print x
print()
print('Original x = ', x)
# We delete the first and last element of x
x = np.delete(x, [0,4])
# We print x with the first and last element deleted
print()
print('Modified x = ', x)
# We print Y
print()
print('Original Y = \n', Y)
# We delete the first row of y
w = np.delete(Y, 0, axis=0)
# We delete the first and last column of y
v = np.delete(Y, [0,2], axis=1)
# We print w
print()
print('w = \n', w)
# We print v
print()
print('v = \n', v)
Original x = [1 2 3 4 5]
Modified x = [2 3 4]
Original Y =
[[1 2 3]
[4 5 6]
[7 8 9]]
w =
[[4 5 6]
[7 8 9]]
v =
[[2]
[5]
[8]]
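One detail worth checking is that np.delete does not modify its input. The sketch below, which uses illustrative variable names (trimmed, flat), shows that the original array is left untouched, and that omitting axis for a rank 2 ndarray causes it to be flattened before the deletion.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# np.delete returns a new array; x itself is left untouched
trimmed = np.delete(x, [0, 4])

print('x       =', x)        # still has all five elements
print('trimmed =', trimmed)  # first and last elements removed

# Without axis, a rank 2 ndarray is flattened before the deletion
Y = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
flat = np.delete(Y, 0)
print('flat =', flat)
```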
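2. Now let's see how to append values to an ndarray. We can do this with the np.append(ndarray, elements, axis) function, which appends the given list of elements to the ndarray along the specified axis. Let's look at some examples: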
# We create a rank 1 ndarray
x = np.array([1, 2, 3, 4, 5])
# We create a rank 2 ndarray
Y = np.array([[1,2,3],[4,5,6]])
# We print x
print()
print('Original x = ', x)
# We append the integer 6 to x
x = np.append(x, 6)
# We print x
print()
print('x = ', x)
# We append the integer 7 and 8 to x
x = np.append(x, [7,8])
# We print x
print()
print('x = ', x)
# We print Y
print()
print('Original Y = \n', Y)
# We append a new row containing 7,8,9 to y
v = np.append(Y, [[7,8,9]], axis=0)
# We append a new column containing 9 and 10 to y
q = np.append(Y,[[9],[10]], axis=1)
# We print v
print()
print('v = \n', v)
# We print q
print()
print('q = \n', q)
Original x = [1 2 3 4 5]
x = [1 2 3 4 5 6]
x = [1 2 3 4 5 6 7 8]
Original Y =
[[1 2 3]
[4 5 6]]
v =
[[1 2 3]
[4 5 6]
[7 8 9]]
q =
[[ 1 2 3 9]
[ 4 5 6 10]]
Note that when appending rows or columns to a rank 2 ndarray, the rows or columns must have the correct shape, so that they match the shape of the rank 2 ndarray.
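To make the shape requirement concrete, here is a small sketch of what happens when the shapes do and do not match. The variable names (ok, mismatch_raised, flat) are illustrative assumptions, not part of the original examples.

```python
import numpy as np

Y = np.array([[1, 2, 3], [4, 5, 6]])

# A new row must have shape (1, 3) to match Y's three columns
ok = np.append(Y, [[7, 8, 9]], axis=0)
print(ok.shape)

# A row with only two entries does not match, so NumPy raises ValueError
try:
    np.append(Y, [[7, 8]], axis=0)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)

# Without axis, both inputs are flattened first, so any shape is accepted
flat = np.append(Y, [7, 8])
print(flat)
```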
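Inserting Values into an ndarray
We can insert values into an ndarray with the np.insert(ndarray, index, elements, axis) function. This function inserts the given list of elements into the ndarray, just before the given index, along the specified axis. Let's look at some examples: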
# We create a rank 1 ndarray
x = np.array([1, 2, 5, 6, 7])
# We create a rank 2 ndarray
Y = np.array([[1,2,3],[7,8,9]])
# We print x
print()
print('Original x = ', x)
# We insert the integer 3 and 4 between 2 and 5 in x.
x = np.insert(x,2,[3,4])
# We print x with the inserted elements
print()
print('x = ', x)
# We print Y
print()
print('Original Y = \n', Y)
# We insert a row between the first and last row of y
w = np.insert(Y,1,[4,5,6],axis=0)
# We insert a column full of 5s between the first and second column of y
v = np.insert(Y,1,5, axis=1)
# We print w
print()
print('w = \n', w)
# We print v
print()
print('v = \n', v)
Original x = [1 2 5 6 7]
x = [1 2 3 4 5 6 7]
Original Y =
[[1 2 3]
[7 8 9]]
w =
[[1 2 3]
[4 5 6]
[7 8 9]]
v =
[[1 5 2 3]
[7 5 8 9]]
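Because values are placed before the given index, the two boundary cases are index 0 (the front) and index len(x) (the end). A minimal sketch, with illustrative names front and back:

```python
import numpy as np

x = np.array([1, 2, 5, 6, 7])

front = np.insert(x, 0, 0)       # placed before index 0, i.e. at the start
back = np.insert(x, len(x), 8)   # index len(x) places the value at the end

print('front =', front)
print('back  =', back)
print('x     =', x)  # np.insert returns a new array; x is unchanged
```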
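Stacking ndarrays Vertically or Horizontally
We can stack ndarrays vertically with the np.vstack() function, or horizontally with the np.hstack() function. It is important to note that in order to stack ndarrays, their shapes must match. Let's look at some examples: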
# We create a rank 1 ndarray
x = np.array([1,2])
# We create a rank 2 ndarray
Y = np.array([[3,4],[5,6]])
# We print x
print()
print('x = ', x)
# We print Y
print()
print('Y = \n', Y)
# We stack x on top of Y
z = np.vstack((x,Y))
# We stack x on the right of Y. We need to reshape x in order to stack it on the right of Y.
w = np.hstack((Y,x.reshape(2,1)))
# We print z
print()
print('z = \n', z)
# We print w
print()
print('w = \n', w)
x = [1 2]
Y =
[[3 4]
[5 6]]
z =
[[1 2]
[3 4]
[5 6]]
w =
[[3 4 1]
[5 6 2]]
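The need for the reshape can be seen by trying the horizontal stack without it. The sketch below uses the same x and Y; the names raised and w2 are illustrative, and np.concatenate is shown only as one equivalent way to get the same result.

```python
import numpy as np

x = np.array([1, 2])
Y = np.array([[3, 4], [5, 6]])

# x has shape (2,) and Y has shape (2, 2), so they cannot be joined side by side directly
try:
    np.hstack((Y, x))
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)

# Reshaping x into a column of shape (2, 1) makes the shapes compatible
w = np.hstack((Y, x.reshape(2, 1)))

# The same result via np.concatenate along axis 1
w2 = np.concatenate((Y, x.reshape(2, 1)), axis=1)
print(np.array_equal(w, w2))
```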
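Slicing ndarrays
As mentioned earlier, in addition to accessing individual elements one at a time, NumPy provides a way to access subsets of an ndarray, known as slicing. Slicing is done by placing a colon : between the start and end indices inside the square brackets. In general you will come across three types of slicing: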
1. ndarray[start:end]
2. ndarray[start:]
3. ndarray[:end]
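The first form selects the elements between the start and end indices. The second form selects all elements from the start index through the last index. The third form selects all elements from the first index up to the end index. Note that in the first and third forms the end index is excluded. Note also that, since ndarrays can be multidimensional, when slicing you usually have to specify a slice for each dimension of the array.
Now we will look at some examples of how to use the forms above to select different subsets of a rank 2 ndarray.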
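As a quick illustration of the three forms on a rank 1 ndarray, here is a minimal sketch; the array and the names a, b, c are chosen only for this example.

```python
import numpy as np

x = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

a = x[2:5]  # start:end -> indices 2, 3, 4 (index 5 is excluded)
b = x[7:]   # start:    -> index 7 through the last element
c = x[:3]   # :end      -> indices 0, 1, 2 (index 3 is excluded)

print(a)
print(b)
print(c)
```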
# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)
# We print X
print()
print('X = \n', X)
print()
# We select all the elements that are in the 2nd through 4th rows and in the 3rd to 5th columns
Z = X[1:4,2:5]
# We print Z
print('Z = \n', Z)
# We can select the same elements as above using method 2
W = X[1:,2:5]
# We print W
print()
print('W = \n', W)
# We select all the elements that are in the 1st through 3rd rows and in the 3rd to 5th columns
Y = X[:3,2:5]
# We print Y
print()
print('Y = \n', Y)
# We select all the elements in the 3rd row
v = X[2,:]
# We print v
print()
print('v = ', v)
# We select all the elements in the 3rd column
q = X[:,2]
# We print q
print()
print('q = ', q)
# We select all the elements in the 3rd column but return a rank 2 ndarray
R = X[:,2:3]
# We print R
print()
print('R = \n', R)
X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
Z =
[[ 7 8 9]
[12 13 14]
[17 18 19]]
W =
[[ 7 8 9]
[12 13 14]
[17 18 19]]
Y =
[[ 2 3 4]
[ 7 8 9]
[12 13 14]]
v = [10 11 12 13 14]
q = [ 2 7 12 17]
R =
[[ 2]
[ 7]
[12]
[17]]
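Notice that when we select all the elements in the 3rd column (variable q above), the slice returns a rank 1 ndarray rather than a rank 2 ndarray. However, slicing X in a slightly different way (variable R above) does give us a rank 2 ndarray.
It is important to note that when we slice an ndarray and save the result in a new variable, as we did above, the data is not copied into the new variable. This often confuses beginners, so we will look at it in more detail.
In the examples above, when we make an assignment such as: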
Z = X[1:4,2:5]
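the slice of the original array X is not copied into the variable Z. Instead, Z refers to the same underlying data as the selected part of X: slicing only creates a view of the original array. This means that if you make changes to Z, you are also changing the elements of X. Let's look at an example: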
# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)
# We print X
print()
print('X = \n', X)
print()
# We select all the elements that are in the 2nd through 4th rows and in the 3rd to 5th columns
Z = X[1:4,2:5]
# We print Z
print()
print('Z = \n', Z)
print()
# We change the last element in Z to 555
Z[2,2] = 555
# We print X
print()
print('X = \n', X)
print()
X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
Z =
[[ 7 8 9]
[12 13 14]
[17 18 19]]
X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[ 15 16 17 18 555]]
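The example above shows clearly that making a change to Z also changes X.
The np.copy() Function
If we instead want to create a new ndarray that contains a copy of the values in the slice, we need to use the np.copy() function. The np.copy(ndarray) function creates a copy of the given ndarray. It can also be used as a method, just as we did earlier with reshape. Let's repeat the previous example, this time making copies of the array. We will use copy both as a function and as a method.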
# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)
# We print X
print()
print('X = \n', X)
print()
# create a copy of the slice using the np.copy() function
Z = np.copy(X[1:4,2:5])
# create a copy of the slice using the copy as a method
W = X[1:4,2:5].copy()
# We change the last element in Z to 555
Z[2,2] = 555
# We change the last element in W to 444
W[2,2] = 444
# We print X
print()
print('X = \n', X)
# We print Z
print()
print('Z = \n', Z)
# We print W
print()
print('W = \n', W)
X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
Z =
[[ 7 8 9]
[ 12 13 14]
[ 17 18 555]]
W =
[[ 7 8 9]
[ 12 13 14]
[ 17 18 444]]
We can see clearly that by using copy we created new ndarrays that are completely independent of each other and of X.
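One way to check whether two ndarrays share data, without modifying either of them, is np.shares_memory. The following sketch contrasts a plain slice with a copy; the variable names view and copied are illustrative.

```python
import numpy as np

X = np.arange(20).reshape(4, 5)

view = X[1:4, 2:5]           # basic slicing: a view onto X's data
copied = X[1:4, 2:5].copy()  # explicit copy: independent data

print(np.shares_memory(X, view))    # the slice shares memory with X
print(np.shares_memory(X, copied))  # the copy does not
```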
Using One ndarray to Slice Another
It is often useful to use one ndarray to slice, select, or change the elements of another ndarray. Let's look at some examples:
# We create a 4 x 5 ndarray that contains integers from 0 to 19
X = np.arange(20).reshape(4, 5)
# We create a rank 1 ndarray that will serve as indices to select elements from X
indices = np.array([1,3])
# We print X
print()
print('X = \n', X)
print()
# We print indices
print('indices = ', indices)
print()
# We use the indices ndarray to select the 2nd and 4th row of X
Y = X[indices,:]
# We use the indices ndarray to select the 2nd and 4th column of X
Z = X[:, indices]
# We print Y
print()
print('Y = \n', Y)
# We print Z
print()
print('Z = \n', Z)
X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]
indices = [1 3]
Y =
[[ 5 6 7 8 9]
[15 16 17 18 19]]
Z =
[[ 1 3]
[ 6 8]
[11 13]
[16 18]]
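Unlike the colon-based slices discussed earlier, selecting with an index array returns a copy rather than a view, while assigning through an index array does change the original. A minimal sketch, assuming the same X and indices as above (the name unchanged is illustrative):

```python
import numpy as np

X = np.arange(20).reshape(4, 5)
indices = np.array([1, 3])

# Selecting with an index array returns a copy, not a view
Y = X[indices, :]
Y[0, 0] = -100
unchanged = X[1, 0]
print(unchanged)  # X is unaffected by the change to Y

# Assigning through an index array modifies X in place
X[indices, :] = 0
print(X)
```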
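NumPy Built-in Functions
NumPy also offers built-in functions for selecting specific elements of an ndarray. For example, the np.diag(ndarray, k=N) function extracts the elements along the diagonal defined by N. The default is k=0, which refers to the main diagonal. Values of k > 0 select the elements in diagonals above the main diagonal, and values of k < 0 select the elements in diagonals below the main diagonal. Let's look at an example:
# We create a 5 x 5 ndarray that contains integers from 0 to 24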
X = np.arange(25).reshape(5, 5)
# We print X
print()
print('X = \n', X)
print()
# We print the elements in the main diagonal of X
print('z =', np.diag(X))
print()
# We print the elements above the main diagonal of X
print('y =', np.diag(X, k=1))
print()
# We print the elements below the main diagonal of X
print('w = ', np.diag(X, k=-1))
X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
z = [ 0 6 12 18 24]
y = [ 1 7 13 19]
w = [ 5 11 17 23]
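np.diag also works in the other direction: given a rank 1 ndarray, it builds a rank 2 ndarray with those values on the chosen diagonal. A short sketch with illustrative names v, D, and U:

```python
import numpy as np

v = np.array([1, 2, 3])

D = np.diag(v)       # 3 x 3 array with v on the main diagonal
U = np.diag(v, k=1)  # 4 x 4 array with v on the first diagonal above the main one

print(D)
print(U.shape)
print(np.diag(D))    # extracting the diagonal again recovers v
```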
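It is often useful to extract only the unique elements of an ndarray. We can find them with the np.unique() function: np.unique(ndarray) returns the unique elements of the given ndarray, as shown in the following example: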
# Create 3 x 3 ndarray with repeated values
X = np.array([[1,2,3],[5,2,8],[1,2,3]])
# We print X
print()
print('X = \n', X)
print()
# We print the unique elements of X
print('The unique elements in X are:',np.unique(X))
X =
[[1 2 3]
[5 2 8]
[1 2 3]]
The unique elements in X are: [1 2 3 5 8]
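If we also want to know how often each value occurs, np.unique accepts the return_counts keyword. A minimal sketch using the same X (the names values and counts are illustrative):

```python
import numpy as np

X = np.array([[1, 2, 3], [5, 2, 8], [1, 2, 3]])

# return_counts=True also returns how many times each unique value appears
values, counts = np.unique(X, return_counts=True)
for value, count in zip(values, counts):
    print(value, 'appears', count, 'time(s)')
```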
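Boolean Indexing, Set Operations, and Sorting
Boolean Indexing
So far we have seen how to slice and select elements of an ndarray using indices. These methods are useful when we know the exact indices of the elements we want. In many situations, however, we do not know them. For example, suppose we have a 10,000 x 10,000 ndarray of random integers ranging from 1 to 15,000 and we only want to select the integers that are less than 20. Boolean indexing helps in cases like this: instead of explicit indices, we select elements with a logical condition. Let's look at some examples: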
# We create a 5 x 5 ndarray that contains integers from 0 to 24
X = np.arange(25).reshape(5, 5)
# We print X
print()
print('Original X = \n', X)
print()
# We use Boolean indexing to select elements in X:
print('The elements in X that are greater than 10:', X[X > 10])
print('The elements in X that are less than or equal to 7:', X[X <= 7])
print('The elements in X that are between 10 and 17:', X[(X > 10) & (X < 17)])
# We use Boolean indexing to assign the elements that are between 10 and 17 the value of -1
X[(X > 10) & (X < 17)] = -1
# We print X
print()
print('X = \n', X)
print()
Original X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]
[20 21 22 23 24]]
The elements in X that are greater than 10: [11 12 13 14 15 16 17 18 19 20 21 22 23 24]
The elements in X that are less than or equal to 7: [0 1 2 3 4 5 6 7]
The elements in X that are between 10 and 17: [11 12 13 14 15 16]
X =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 -1 -1 -1 -1]
[-1 -1 17 18 19]
[20 21 22 23 24]]
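It can help to look at the Boolean array itself. The condition X > 10 produces an ndarray of True/False values with the same shape as X, and that array is what selects the elements. The sketch below uses illustrative names (mask, edges, not_big) and also shows the | and ~ operators alongside the & used above.

```python
import numpy as np

X = np.arange(25).reshape(5, 5)

mask = X > 10                  # Boolean ndarray with the same shape as X
print(mask.dtype, mask.shape)
print(mask.sum())              # number of elements greater than 10

# | is element-wise "or", ~ is element-wise "not"
edges = X[(X < 3) | (X > 21)]
not_big = X[~mask]
print(edges)
print(not_big)
```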
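Set Operations
In addition to Boolean indexing, NumPy also supports set operations. These are useful for comparing ndarrays, for example to find the elements that two ndarrays have in common. Let's look at some examples: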
# We create a rank 1 ndarray
x = np.array([1,2,3,4,5])
# We create a rank 1 ndarray
y = np.array([6,7,2,8,4])
# We print x
print()
print('x = ', x)
# We print y
print()
print('y = ', y)
# We use set operations to compare x and y:
print()
print('The elements that are both in x and y:', np.intersect1d(x,y))
print('The elements that are in x that are not in y:', np.setdiff1d(x,y))
print('All the elements of x and y:',np.union1d(x,y))
x = [1 2 3 4 5]
y = [6 7 2 8 4]
The elements that are both in x and y: [2 4]
The elements that are in x that are not in y: [1 3 5]
All the elements of x and y: [1 2 3 4 5 6 7 8]
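Two related functions are worth knowing, shown here as a brief sketch on the same x and y: np.isin gives a Boolean array marking which elements of x occur in y, and np.setxor1d returns the elements found in exactly one of the two arrays. The name in_y is illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([6, 7, 2, 8, 4])

in_y = np.isin(x, y)      # one Boolean per element of x
print(in_y)
print(x[in_y])            # for these arrays, the same elements as np.intersect1d(x, y)

print(np.setxor1d(x, y))  # elements in exactly one of x and y
```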
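Sorting
We can also sort ndarrays in NumPy. We will see how to use the np.sort() function to sort rank 1 and rank 2 ndarrays in different ways. Like other functions we have seen, sort can also be used as a method. For this function, however, the two forms differ in an important way in how the data is stored in memory. When np.sort() is used as a function, it sorts the ndarray out of place, that is, it returns a sorted copy and leaves the original ndarray unchanged. When sort is used as a method, ndarray.sort() sorts the ndarray in place, so the original array itself becomes the sorted array. Let's look at some examples: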
# We create an unsorted rank 1 ndarray
x = np.random.randint(1,11,size=(10,))
# We print x
print()
print('Original x = ', x)
# We sort x and print the sorted array using sort as a function.
print()
print('Sorted x (out of place):', np.sort(x))
# When we sort out of place the original array remains intact. To see this we print x again
print()
print('x after sorting:', x)
Original x = [9 6 4 4 9 4 8 4 4 7]
Sorted x (out of place): [4 4 4 4 4 6 7 8 9 9]
x after sorting: [9 6 4 4 9 4 8 4 4 7]
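Notice that np.sort() sorts the array but, if the ndarray has repeated values, keeps them in the sorted array. If desired, we can sort only the unique elements of x by combining the sort function with the unique function. Let's see how to sort the unique elements of x above: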
# We sort x but only keep the unique elements in x
print(np.sort(np.unique(x)))
[4 6 7 8 9]
Finally, let's see how to sort an ndarray in place by using sort as a method:
# We create an unsorted rank 1 ndarray
x = np.random.randint(1,11,size=(10,))
# We print x
print()
print('Original x = ', x)
# We sort x and print the sorted array using sort as a method.
x.sort()
# When we sort in place the original array is changed to the sorted array. To see this we print x again
print()
print('x after sorting:', x)
Original x = [9 9 8 1 1 4 3 7 2 8]
x after sorting: [1 1 2 3 4 7 8 8 9 9]
When sorting rank 2 ndarrays, we need to tell np.sort() whether to sort by rows or by columns. We do this with the axis keyword. Let's look at some examples:
# We create an unsorted rank 2 ndarray
X = np.random.randint(1,11,size=(5,5))
# We print X
print()
print('Original X = \n', X)
print()
# We sort the columns of X and print the sorted array
print()
print('X with sorted columns :\n', np.sort(X, axis = 0))
# We sort the rows of X and print the sorted array
print()
print('X with sorted rows :\n', np.sort(X, axis = 1))
Original X =
[[6 1 7 6 3]
[3 9 8 3 5]
[6 5 8 9 3]
[2 1 5 7 7]
[9 8 1 9 8]]
X with sorted columns :
[[2 1 1 3 3]
[3 1 5 6 3]
[6 5 7 7 5]
[6 8 8 9 7]
[9 9 8 9 8]]
X with sorted rows :
[[1 3 6 6 7]
[3 3 5 8 9]
[3 5 6 8 9]
[1 2 5 7 7]
[1 8 8 9 9]]
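Two common follow-ups to sorting are getting the values in descending order and finding the indices that would sort an array. The sketch below uses a fixed array rather than random values so the result is reproducible; the names descending and order are illustrative.

```python
import numpy as np

x = np.array([9, 6, 4, 7, 1])

descending = np.sort(x)[::-1]  # sort ascending, then reverse
order = np.argsort(x)          # indices that would sort x

print(descending)
print(order)
print(x[order])                # same values as np.sort(x)
```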