Introduction to NumPy (Part 3)

Latest recommended article published on 2024-06-07 15:42:46

ZLuby

Category: python. Tags: python, numpy

Article link: https://blog.csdn.net/weixin_38300566/article/details/80627396

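Arithmetic Operations and Broadcasting

In this section we look at how NumPy performs arithmetic on ndarrays. NumPy supports both element-wise operations and matrix operations on ndarrays; this lesson covers only element-wise operations. To carry them out, NumPy sometimes relies on broadcasting, the term for how NumPy performs element-wise arithmetic on ndarrays of different shapes. For example, broadcasting is used implicitly whenever an arithmetic operation combines a scalar with an ndarray.

Element-wise Operations

We start with element-wise addition, subtraction, multiplication, and division between ndarrays. We can use NumPy functions such as np.add(), or arithmetic symbols such as +, which read more like mathematical equations. Both forms perform the same operation; the only difference is that the function form usually exposes options that can be adjusted through keyword arguments. Note that for element-wise operations, the ndarrays involved must have the same shape or be broadcastable. Broadcasting is covered in detail later in this lesson. We begin with element-wise arithmetic on rank 1 ndarrays: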

import numpy as np

# We create two rank 1 ndarrays
x = np.array([1,2,3,4])
y = np.array([5.5,6.5,7.5,8.5])

# We print x
print()
print('x = ', x)

# We print y
print()
print('y = ', y)
print()

# We perform basic element-wise operations using arithmetic symbols and functions
print('x + y = ', x + y)
print('add(x,y) = ', np.add(x,y))
print()
print('x - y = ', x - y)
print('subtract(x,y) = ', np.subtract(x,y))
print()
print('x * y = ', x * y)
print('multiply(x,y) = ', np.multiply(x,y))
print()
print('x / y = ', x / y)
print('divide(x,y) = ', np.divide(x,y))

x = [1 2 3 4]

y = [ 5.5 6.5 7.5 8.5]

x + y = [ 6.5 8.5 10.5 12.5]
add(x,y) = [ 6.5 8.5 10.5 12.5]

x - y = [-4.5 -4.5 -4.5 -4.5]
subtract(x,y) = [-4.5 -4.5 -4.5 -4.5]

x * y = [ 5.5 13. 22.5 34. ]
multiply(x,y) = [ 5.5 13. 22.5 34. ]

x / y = [ 0.18181818 0.30769231 0.4 0.47058824]
divide(x,y) = [ 0.18181818 0.30769231 0.4 0.47058824]
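
As a hedged illustration of the keyword options mentioned above, the sketch below uses the where, out, and dtype keywords accepted by NumPy universal functions such as np.divide() and np.add(). The array values (including the zero placed in y) are made up for this example and do not come from the lesson.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([5.5, 0.0, 7.5, 8.5])   # illustrative values; note the zero

# `where` selects the positions that are computed;
# positions that are skipped keep the value already stored in `out`.
safe_ratio = np.divide(x, y, out=np.zeros_like(y), where=(y != 0))

# `dtype` forces the type in which the operation is carried out.
total = np.add(x, x, dtype=np.float64)

print('safe_ratio =', safe_ratio)
print('total =', total, total.dtype)
```

The + and / symbols cannot express these options, which is the practical difference between the two forms.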

We can perform the same element-wise arithmetic on rank 2 ndarrays. Again, the ndarrays must have the same shape or be broadcastable.

# We create two rank 2 ndarrays
X = np.array([1,2,3,4]).reshape(2,2)
Y = np.array([5.5,6.5,7.5,8.5]).reshape(2,2)

# We print X
print()
print('X = \n', X)

# We print Y
print()
print('Y = \n', Y)
print()

# We perform basic element-wise operations using arithmetic symbols and functions
print('X + Y = \n', X + Y)
print()
print('add(X,Y) = \n', np.add(X,Y))
print()
print('X - Y = \n', X - Y)
print()
print('subtract(X,Y) = \n', np.subtract(X,Y))
print()
print('X * Y = \n', X * Y)
print()
print('multiply(X,Y) = \n', np.multiply(X,Y))
print()
print('X / Y = \n', X / Y)
print()
print('divide(X,Y) = \n', np.divide(X,Y))

X =
[[1 2]
[3 4]]

Y =
[[ 5.5 6.5]
[ 7.5 8.5]]

X + Y =
[[ 6.5 8.5]
[ 10.5 12.5]]

add(X,Y) =
[[ 6.5 8.5]
[ 10.5 12.5]]

X - Y =
[[-4.5 -4.5]
[-4.5 -4.5]]

subtract(X,Y) =
[[-4.5 -4.5]
[-4.5 -4.5]]

X * Y =
[[ 5.5 13. ]
[ 22.5 34. ]]

multiply(X,Y) =
[[ 5.5 13. ]
[ 22.5 34. ]]

X / Y =
[[ 0.18181818 0.30769231]
[ 0.4 0.47058824]]

divide(X,Y) =
[[ 0.18181818 0.30769231]
[ 0.4 0.47058824]]
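
To make the shape requirement concrete, here is a minimal sketch of what happens when two ndarrays have neither the same shape nor broadcastable shapes. The arrays A and B are hypothetical and chosen only to trigger the error.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])          # shape (2, 2)
B = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)

try:
    result = A + B
except ValueError as err:
    # NumPy refuses the operation because the shapes cannot be broadcast together
    result = None
    print('A + B failed:', err)
```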

Applying Mathematical Functions

We can also apply a mathematical function, such as sqrt(x), to all elements of an ndarray at once.

# We create a rank 1 ndarray
x = np.array([1,2,3,4])

# We print x
print()
print('x = ', x)

# We apply different mathematical functions to all elements of x
print()
print('EXP(x) =', np.exp(x))
print()
print('SQRT(x) =',np.sqrt(x))
print()
print('POW(x,2) =',np.power(x,2)) # We raise all elements to the power of 2

x = [1 2 3 4]

EXP(x) = [ 2.71828183 7.3890561 20.08553692 54.59815003]

SQRT(x) = [ 1. 1.41421356 1.73205081 2. ]

POW(x,2) = [ 1 4 9 16]
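
The sketch below, which is only an illustration, checks that np.sqrt() gives the same values as applying Python's math.sqrt() to each element in a loop. It also shows why the outputs above look different: np.sqrt() returns floating-point values, while np.power(x, 2) on an integer array stays integer.

```python
import math
import numpy as np

x = np.array([1, 2, 3, 4])

vectorized = np.sqrt(x)                          # one call, all elements
looped = np.array([math.sqrt(v) for v in x])     # explicit Python loop
same = np.allclose(vectorized, looped)

squares = np.power(x, 2)
print(same, vectorized.dtype, squares.dtype)
```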

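Statistical Functions

Another important feature of NumPy is its large collection of statistical functions, which give us statistical information about the elements of an ndarray. Most of them can be called either as methods on the array or as NumPy functions (np.median() is available only as a function). Let's look at some examples: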

# We create a 2 x 2 ndarray
X = np.array([[1,2], [3,4]])

# We print X
print()
print('X = \n', X)
print()

print('Average of all elements in X:', X.mean())
print('Average of all elements in the columns of X:', X.mean(axis=0))
print('Average of all elements in the rows of X:', X.mean(axis=1))
print()
print('Sum of all elements in X:', X.sum())
print('Sum of all elements in the columns of X:', X.sum(axis=0))
print('Sum of all elements in the rows of X:', X.sum(axis=1))
print()
print('Standard Deviation of all elements in X:', X.std())
print('Standard Deviation of all elements in the columns of X:', X.std(axis=0))
print('Standard Deviation of all elements in the rows of X:', X.std(axis=1))
print()
print('Median of all elements in X:', np.median(X))
print('Median of all elements in the columns of X:', np.median(X,axis=0))
print('Median of all elements in the rows of X:', np.median(X,axis=1))
print()
print('Maximum value of all elements in X:', X.max())
print('Maximum value of all elements in the columns of X:', X.max(axis=0))
print('Maximum value of all elements in the rows of X:', X.max(axis=1))
print()
print('Minimum value of all elements in X:', X.min())
print('Minimum value of all elements in the columns of X:', X.min(axis=0))
print('Minimum value of all elements in the rows of X:', X.min(axis=1))

X =
[[1 2]
[3 4]]

Average of all elements in X: 2.5
Average of all elements in the columns of X: [ 2. 3.]
Average of all elements in the rows of X: [ 1.5 3.5]

Sum of all elements in X: 10
Sum of all elements in the columns of X: [4 6]
Sum of all elements in the rows of X: [3 7]

Standard Deviation of all elements in X: 1.11803398875
Standard Deviation of all elements in the columns of X: [ 1. 1.]
Standard Deviation of all elements in the rows of X: [ 0.5 0.5]

Median of all elements in X: 2.5
Median of all elements in the columns of X: [ 2. 3.]
Median of all elements in the rows of X: [ 1.5 3.5]

Maximum value of all elements in X: 4
Maximum value of all elements in the columns of X: [3 4]
Maximum value of all elements in the rows of X: [2 4]

Minimum value of all elements in X: 1
Minimum value of all elements in the columns of X: [1 2]
Minimum value of all elements in the rows of X: [1 3]
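
Because a 2 x 2 array makes it hard to tell the two axes apart, the following sketch repeats the idea on a hypothetical 2 x 3 array M. With axis=0 the rows are collapsed, leaving one value per column; with axis=1 the columns are collapsed, leaving one value per row.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])    # 2 rows, 3 columns

col_sums = M.sum(axis=0)     # one value per column, shape (3,)
row_sums = M.sum(axis=1)     # one value per row, shape (2,)

print(col_sums.shape, col_sums)
print(row_sums.shape, row_sums)
```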

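However, when x is a plain Python list rather than an ndarray, the method form is not available, while the function form still works. In the example here, sample_props is a list.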
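
A minimal sketch of that difference follows. The values stored in sample_props are hypothetical, since the original values are not shown in the text.

```python
import numpy as np

sample_props = [0.2, 0.4, 0.6, 0.8]   # hypothetical values in a plain Python list

has_method = hasattr(sample_props, 'mean')        # a list has no .mean() method
mean_via_function = np.mean(sample_props)         # NumPy converts the list internally
mean_via_method = np.array(sample_props).mean()   # convert first, then call the method

print(has_method, mean_via_function, mean_via_method)
```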

NumPy Broadcasting

Finally, let's see how NumPy adds a single number to every element of an ndarray without an explicit loop.

# We create a 2 x 2 ndarray
X = np.array([[1,2], [3,4]])

# We print X
print()
print('X = \n', X)
print()

print('3 * X = \n', 3 * X)
print()
print('3 + X = \n', 3 + X)
print()
print('X - 3 = \n', X - 3)
print()
print('X / 3 = \n', X / 3)

X =
[[1 2]
[3 4]]

3 * X =
[[ 3 6]
[ 9 12]]

3 + X =
[[4 5]
[6 7]]

X - 3 =
[[-2 -1]
[ 0 1]]

X / 3 =
[[ 0.33333333 0.66666667]
[ 1. 1.33333333]]

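In the examples above, NumPy broadcasts the number 3 across the ndarray behind the scenes so that both operands have the same shape. This lets us add 3 to every element of X with a single line of code.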
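
One way to picture what happens behind the scenes is sketched below. It uses np.broadcast_to() to build the stretched operand explicitly and compares the result with both the one-line form and a hand-written loop. This is only an illustration of the idea; NumPy does not actually materialize a full array of 3s.

```python
import numpy as np

X = np.array([[1, 2], [3, 4]])

threes = np.broadcast_to(3, X.shape)   # read-only 2 x 2 view in which every entry is 3
explicit = X + threes
implicit = X + 3

# The same computation written with explicit loops
looped = np.empty_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        looped[i, j] = X[i, j] + 3

print(np.array_equal(explicit, implicit), np.array_equal(looped, implicit))
```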

NumPy can perform the same operations on two ndarrays of different shapes, subject to some restrictions, as shown below.

# We create a rank 1 ndarray
x = np.array([1,2,3])

# We create a 3 x 3 ndarray
Y = np.array([[1,2,3],[4,5,6],[7,8,9]])

# We create a 3 x 1 ndarray
Z = np.array([1,2,3]).reshape(3,1)

# We print x
print()
print('x = ', x)
print()

# We print Y
print()
print('Y = \n', Y)
print()

# We print Z
print()
print('Z = \n', Z)
print()

print('x + Y = \n', x + Y)
print()
print('Z + Y = \n',Z + Y)

x = [1 2 3]

Y =
[[1 2 3]
[4 5 6]
[7 8 9]]

Z =
[[1]
[2]
[3]]

x + Y =
[[ 2 4 6]
[ 5 7 9]
[ 8 10 12]]

Z + Y =
[[ 2 3 4]
[ 6 7 8]
[10 11 12]]

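As before, NumPy adds the 1 x 3 and 3 x 1 ndarrays to the 3 x 3 ndarray by broadcasting the smaller ndarray along the larger one. (Strictly speaking, x has shape (3,), which NumPy treats as 1 x 3 here.) In general, this works when the smaller ndarray (such as the 1 x 3 ndarray in our example) can be expanded to the shape of the larger ndarray in an unambiguous way.

Be sure to read the NumPy documentation for the details of broadcasting and its rules: Broadcasting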
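
As a rough sketch of the rule, NumPy compares shapes starting from the last dimension, and two dimensions are compatible when they are equal or when one of them is 1. The code below checks the resulting shapes with np.broadcast() and shows a case that fails. The array w is a hypothetical addition that does not appear in the example above.

```python
import numpy as np

x = np.array([1, 2, 3])               # shape (3,)
Y = np.arange(1, 10).reshape(3, 3)    # shape (3, 3)
Z = x.reshape(3, 1)                   # shape (3, 1)
w = np.array([1, 2])                  # shape (2,), not compatible with (3, 3)

shape_xY = np.broadcast(x, Y).shape   # (3,) against (3, 3)
shape_ZY = np.broadcast(Z, Y).shape   # (3, 1) against (3, 3)
shape_xZ = np.broadcast(x, Z).shape   # (3,) against (3, 1): both are stretched

try:
    w + Y
    w_compatible = True
except ValueError:
    # last dimensions are 2 and 3: neither equal nor 1
    w_compatible = False

print(shape_xY, shape_ZY, shape_xZ, w_compatible)
```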

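Mean Normalization

In machine learning we train models on large amounts of data. Some machine learning algorithms require normalized data to work properly. Normalization here means feature scaling: making sure all the data is on a similar scale, that is, all values fall in a similar range. For example, a data set may have values between 0 and 5,000; after normalization, the values can be made to fall between 0 and 1.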
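
As a small illustration of rescaling values into the range 0 to 1, the sketch below applies min-max scaling to a made-up array. Min-max scaling is a different technique from the mean normalization used in the rest of this lab, and the sketch assumes that no column is constant (otherwise the division would be by zero).

```python
import numpy as np

# Made-up data: first column spans 0 to 5000, second column spans 10 to 40
data = np.array([[0.0, 10.0],
                 [2500.0, 20.0],
                 [5000.0, 40.0]])

col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Broadcasting applies each column's minimum and range to every row
scaled = (data - col_min) / (col_max - col_min)
print(scaled)
```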

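In this lab you will perform a particular form of feature scaling known as mean normalization. Mean normalization not only scales the data but also ensures that it has a mean of 0.

First, you will import NumPy and create a rank 2 ndarray of random integers between 0 and 5,000 (inclusive) with 1000 rows and 20 columns. This array simulates a data set with a wide range of values. Fill in the code below.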

# import NumPy into Python
import numpy as np

# Create a 1000 x 20 ndarray with random integers in the half-open interval [0, 5001).
X = np.random.randint(0,5001,size=(1000,20))

# print the shape of X
print(X.shape)

(1000, 20)
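
If repeatable results are wanted, one option is NumPy's Generator interface with a fixed seed, sketched below. The seed value and the name X_demo are arbitrary choices for this illustration; the lab itself uses np.random.randint() without a seed.

```python
import numpy as np

rng = np.random.default_rng(seed=0)               # arbitrary seed
X_demo = rng.integers(0, 5001, size=(1000, 20))   # upper bound is exclusive, so 5000 can occur

# Creating a new generator with the same seed reproduces the same array
X_again = np.random.default_rng(seed=0).integers(0, 5001, size=(1000, 20))

print(X_demo.shape, np.array_equal(X_demo, X_again))
```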

Now that the array is created, we will mean normalize it using the following equation:

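$$\text{Norm\_Col}_i = \frac{\text{Col}_i - \mu_i}{\sigma_i}$$

where $\text{Col}_i$ is the $i$-th column of $X$, $\mu_i$ is the mean of the $i$-th column of $X$, and $\sigma_i$ is the standard deviation of the $i$-th column of $X$. In other words, mean normalization subtracts from each value the mean of its column and then divides by the standard deviation of that column. In the space below, you first need to calculate the mean and standard deviation of each column of $X$.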

# Average of the values in each column of X
ave_cols = X.mean(axis=0)

# Standard Deviation of the values in each column of X
std_cols = X.std(axis=0)

If you completed the calculations correctly, the ave_cols and std_cols vectors should both have shape (20,), since X has 20 columns. You can verify this by filling in the code below:

# Print the shape of ave_cols
print(ave_cols.shape)
# Print the shape of std_cols
print(std_cols.shape)

(20,)
(20,)

You can now use broadcasting to compute the mean normalized version of X in a single line of code, using the equation above. Fill in the code below.

# Mean normalize X

X_norm = (X-ave_cols)/std_cols

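If you performed the mean normalization correctly, the average of all the elements in X_norm should be close to 0. You can verify this by filling in the code below: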

# Print the average of all the values of X_norm
print(X_norm.mean())
# Print the minimum value of each column of X_norm
print(X_norm.min(axis=0))
# Print the maximum value of each column of X_norm
print(X_norm.max(axis=0))

Note that because X was created from random integers, the values above will vary from run to run.
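
Although the printed values change between runs, some properties hold every time: after mean normalization each column has a mean of about 0 and a standard deviation of about 1. The sketch below checks this on a seeded stand-in array. The names X_demo and X_norm_demo and the seed are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)               # arbitrary seed
X_demo = rng.integers(0, 5001, size=(1000, 20))

ave_cols = X_demo.mean(axis=0)    # shape (20,)
std_cols = X_demo.std(axis=0)     # shape (20,)

# (1000, 20) combined with (20,): each row is matched column by column
X_norm_demo = (X_demo - ave_cols) / std_cols

col_means_ok = np.allclose(X_norm_demo.mean(axis=0), 0.0)
col_stds_ok = np.allclose(X_norm_demo.std(axis=0), 1.0)
print(col_means_ok, col_stds_ok)
```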

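Splitting the Data

After the data has been mean normalized, it is customary in machine learning to split the data set into three sets:

Training set
Cross-validation set
Test set

A common split is 60% of the data for the training set, 20% for the cross-validation set, and 20% for the test set.

In this part you will split X_norm into a training set, a cross-validation set, and a test set. Each set contains randomly chosen rows of X_norm, and no row may be chosen more than once. This guarantees that every row of X_norm is selected exactly once and is randomly distributed across the three new sets.

You will start by creating a rank 1 ndarray that contains a random permutation of the row indices of X_norm. You can do this with the np.random.permutation() function. np.random.permutation(N) returns a random permutation of the integers from 0 to N - 1. Let's see an example: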

# We create a random permutation of integers 0 to 4
np.random.permutation(5)

array([3, 1, 2, 0, 4])
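
The order shown above changes on every call. What stays the same is that each integer from 0 to N - 1 appears exactly once, which the short sketch below checks. The seed is arbitrary and only makes the run repeatable.

```python
import numpy as np

np.random.seed(42)                 # arbitrary seed for repeatability
perm = np.random.permutation(5)

# Sorting the permutation must give back 0, 1, 2, 3, 4 with no gaps or repeats
is_permutation = sorted(perm.tolist()) == [0, 1, 2, 3, 4]
print(perm, is_permutation)
```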

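In the space below, create a rank 1 ndarray that contains a random permutation of the row indices of X_norm. This takes one line of code: use the shape attribute to get the number of rows of X_norm and pass it to np.random.permutation(). Remember that for a rank 2 ndarray the shape attribute returns a tuple of two numbers in the form (rows, columns).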

# Create a rank 1 ndarray that contains a random permutation of the row indices of `X_norm`
row_indices = np.random.permutation(X_norm.shape[0])

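You can now create the three data sets by using row_indices to select the rows that go into each set. Remember that the training set contains 60% of the data, the cross-validation set 20%, and the test set 20%. Each set can be created in one line of code. Fill in the code below.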

# Make any necessary calculations.
# You can save your calculations into variables to use later.
x1=row_indices[:600]
x2=row_indices[600:800]
x3=row_indices[800:]

# Create a Training Set
X_train = X_norm[x1,:]

# Create a Cross Validation Set
X_crossVal = X_norm[x2,:]

# Create a Test Set
X_test = X_norm[x3,:]

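If you performed the steps above correctly, X_train should have 600 rows and 20 columns, X_crossVal should have 200 rows and 20 columns, and X_test should have 200 rows and 20 columns. You can verify this by filling in the code below: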

# Print the shape of X_train
print(X_train.shape)
# Print the shape of X_crossVal
print(X_crossVal.shape)
# Print the shape of X_test
print(X_test.shape)

(600, 20)
(200, 20)
(200, 20)
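
The split above hard-codes the boundaries 600 and 800, which fits only a data set of 1000 rows. As a hedged variation, the sketch below derives the boundaries from the number of rows and uses np.split() to cut the shuffled indices. The stand-in array X_norm_demo, the seed, and the variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)               # arbitrary seed
X_norm_demo = rng.standard_normal((1000, 20))     # stand-in for X_norm

n_rows = X_norm_demo.shape[0]
row_indices = rng.permutation(n_rows)             # shuffled row indices 0 .. n_rows - 1

train_end = int(round(0.6 * n_rows))   # first 60% of the shuffled indices
val_end = int(round(0.8 * n_rows))     # next 20%; the remaining 20% is the test set

train_idx, val_idx, test_idx = np.split(row_indices, [train_end, val_end])

X_train_demo = X_norm_demo[train_idx]
X_crossVal_demo = X_norm_demo[val_idx]
X_test_demo = X_norm_demo[test_idx]

# Every row index should be used exactly once across the three sets
all_idx = np.concatenate([train_idx, val_idx, test_idx])
print(X_train_demo.shape, X_crossVal_demo.shape, X_test_demo.shape)
print(len(np.unique(all_idx)) == n_rows)
```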