Getting Started with NumPy (Part 1)

ZLuby

Article link: https://blog.csdn.net/weixin_38300566/article/details/80600620

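Introduction to NumPy

NumPy, short for Numerical Python, is the fundamental package for scientific computing in Python. It provides Python with a large library of mathematical routines that let us carry out numerical computation efficiently. These lessons briefly cover the basic concepts of NumPy and introduce some of its most important features.

In the following lessons you will learn:

How to import NumPy
How to create multidimensional NumPy ndarrays using various methods
How to access and change elements in ndarrays
How to load and save ndarrays
How to use slicing to select or change subsets of an ndarray
The difference between a view and a copy of an ndarray
How to use Boolean indexing and set operations to select or change subsets of an ndarray
How to sort ndarrays
How to perform element-wise operations on ndarrays
How NumPy uses broadcasting to perform operations on ndarrays of different sizes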

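Downloading NumPy

NumPy is included with Anaconda.

NumPy versions

Like many Python packages, NumPy is updated regularly. The following lessons were created with NumPy 1.13.0. You can check your NumPy version by typing !conda list numpy in a Jupyter notebook, or conda list numpy at the Anaconda prompt. If a different version is installed on your computer, you can switch to this one by typing conda install numpy=1.13 at the Anaconda prompt. As new versions of NumPy are released, some features may be deprecated or replaced, so make sure the right version is installed before running the code; that way the examples should run as shown.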

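NumPy documentation

NumPy is a powerful mathematical library with a great many functions and features. These introductory lessons cover only some of its basics. If you want to study NumPy in depth, be sure to consult the NumPy documentation:

NumPy Manual
NumPy User Guide
NumPy Reference
SciPy Lectures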

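You may wonder why you should use NumPy when Python can already handle lists.

Although Python lists are powerful in their own right, NumPy has several key features that give it an advantage. One is speed: on large arrays, NumPy operations can be up to several hundred times faster than the equivalent operations on Python lists. This is because NumPy arrays are memory-efficient and because NumPy uses optimized algorithms for arithmetic, statistical, and linear algebra operations.

Another powerful feature of NumPy is its multidimensional array data structure, which can represent vectors and matrices. You will study vectors and matrices later in the linear algebra part of this course, and you will soon find that many machine learning algorithms depend on matrix operations. Training a neural network, for example, usually involves many matrix multiplications. NumPy is optimized for matrix operations and lets us perform linear algebra efficiently, which makes it well suited to machine learning problems.

A further advantage of NumPy over Python lists is its large collection of optimized built-in mathematical functions. These functions let you carry out a wide range of complex mathematical computations very quickly and with very little code (no complicated loops needed), which makes programs easier to read and understand.

These are just some of the key features that make NumPy an essential package for scientific computing in Python. In fact, NumPy has become so popular that many Python packages, such as Pandas, are built on top of it.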
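To make the "less code, no loops" point concrete, here is a minimal sketch that computes the same mean once with an explicit Python loop and once with a single NumPy call. The sample values and variable names are made up for illustration.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
values = [1.5, 2.0, 3.5, 4.0, 9.0]

# Plain Python: accumulate the sum in a loop, then divide
total = 0.0
for v in values:
    total += v
loop_mean = total / len(values)

# NumPy: one vectorized call on an ndarray
arr = np.array(values)
numpy_mean = arr.mean()

print('loop mean :', loop_mean)
print('NumPy mean:', numpy_mean)
```

Both approaches give the same number; the NumPy version simply pushes the loop into optimized compiled code.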

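Creating NumPy ndarrays

At the core of NumPy is the ndarray, where nd stands for n-dimensional. An ndarray is a multidimensional array in which all elements have the same type. In other words, an ndarray is a grid that can take many shapes and can hold numbers or strings. In many machine learning problems you will find yourself using ndarrays in many different ways. For example, you might store the pixel values of an image in an ndarray and then feed that image to a neural network for image classification.

Before diving into NumPy and creating ndarrays, we need to import NumPy into Python. We import packages with the import statement, and by convention NumPy is imported as np. So you can import NumPy by typing the following in a Jupyter notebook:

import numpy as np

There are several ways to create ndarrays in NumPy. We will look at two of them:

Using regular Python lists
Using built-in NumPy functions

In this section we create ndarrays by passing Python lists to NumPy's np.array() function. This can confuse beginners, so note that np.array() is not a class; it is simply a function that returns an ndarray. For clarity, all the examples use small, simple ndarrays. Let's start by creating a 1-D ndarray.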

# We import NumPy into Python
import numpy as np

# We create a 1D ndarray that contains only integers
x = np.array([1, 2, 3, 4, 5])

# Let's print the ndarray we just created using the print() command
print('x = ', x)

x = [1 2 3 4 5]

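Let's pause for a moment to introduce some useful terminology. We call a 1-D array a rank 1 array. In general, an N-dimensional array has rank N, so a 2-D array is a rank 2 array. Another important property of an array is its shape: the size of each of its dimensions. For example, the shape of a rank 2 array corresponds to its number of rows and columns. As you will see, NumPy ndarrays have attributes that let us get information about them in a very intuitive way. For example, the shape of an ndarray is available through the .shape attribute, which returns a tuple of N positive integers giving the size of each dimension. In the example below we create a rank 1 array and see how to obtain its shape, its type, and the data type (dtype) of its elements.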

# We create a 1D ndarray that contains only integers
x = np.array([1, 2, 3, 4, 5])

# We print x
print()
print('x = ', x)
print()

# We print information about x
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

x = [1 2 3 4 5]

x has dimensions: (5,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: int64

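As you can see, the shape attribute returned the tuple (5,), telling us that x has rank 1 (that is, x has only one dimension) and 5 elements. The type() function tells us that x is indeed a NumPy ndarray. Finally, the .dtype attribute tells us that the elements of x are stored in memory as signed 64-bit integers. Another important advantage of NumPy is that it offers a wider range of data types than plain Python lists do, for example integers and floats of several fixed sizes. You can see all the data types NumPy supports at the following link:

NumPy data types

As mentioned earlier, ndarrays can also hold strings. Let's create a rank 1 ndarray of strings the same way as before, by passing a Python list of strings to np.array().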

# We create a rank 1 ndarray that only contains strings
x = np.array(['Hello', 'World'])

# We print x
print()
print('x = ', x)
print()

# We print information about x
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

x = ['Hello' 'World']

x has dimensions: (2,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: U5

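As you can see, the shape attribute tells us that x now has only 2 elements, and although x now holds strings, the type() function tells us it is still an ndarray as before. The .dtype attribute, however, tells us that the elements of x are stored in memory as Unicode strings of 5 characters.

Note that the biggest difference between Python lists and ndarrays is that, unlike a Python list, all elements of an ndarray must have the same type. So while we can build a Python list from both integers and strings, we cannot mix those two types in an ndarray. If you pass np.array() a Python list containing both integers and strings, NumPy converts all elements to strings, as the example below shows: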

# We create a rank 1 ndarray from a Python list that contains integers and strings
x = np.array([1, 2, 'World'])

# We print the ndarray
print()
print('x = ', x)
print()

# We print information about x
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

x = ['1' '2' 'World']

x has dimensions: (3,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: U21

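As you can see, even though the Python list contained different data types, all elements of x have the same type: Unicode strings of up to 21 characters. For the rest of this introduction we will not use ndarrays that hold strings, but keep in mind that ndarrays can store them.

Now let's see how to create a rank 2 ndarray from nested Python lists.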

# We create a rank 2 ndarray that only contains integers
Y = np.array([[1,2,3],[4,5,6],[7,8,9], [10,11,12]])

# We print Y
print()
print('Y = \n', Y)
print()

# We print information about Y
print('Y has dimensions:', Y.shape)
print('Y has a total of', Y.size, 'elements')
print('Y is an object of type:', type(Y))
print('The elements in Y are of type:', Y.dtype)

Y =
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]

Y has dimensions: (4, 3)
Y has a total of 12 elements
Y is an object of type: <class 'numpy.ndarray'>
The elements in Y are of type: int64

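As you can see, the shape attribute now returns the tuple (4,3), telling us that Y has rank 2 with 4 rows and 3 columns. The .size attribute tells us that Y has 12 elements in total.

Note that when NumPy creates an ndarray, it automatically assigns a dtype based on the type of the elements used to create it. So far we have only created ndarrays containing integers and strings. We saw that for an ndarray of integers only, NumPy automatically assigned the dtype int64 (the default integer type depends on the platform; some setups give int32 instead). Let's see what happens when we create ndarrays containing floats, or a mix of floats and integers.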

# We create a rank 1 ndarray that contains integers
x = np.array([1,2,3])

# We create a rank 1 ndarray that contains floats
y = np.array([1.0,2.0,3.0])

# We create a rank 1 ndarray that contains integers and floats
z = np.array([1, 2.5, 4])

# We print the dtype of each ndarray
print('The elements in x are of type:', x.dtype)
print('The elements in y are of type:', y.dtype)
print('The elements in z are of type:', z.dtype)

The elements in x are of type: int64
The elements in y are of type: float64
The elements in z are of type: float64

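As you can see, when we create an ndarray of floats only, NumPy stores the elements in memory as 64-bit floating-point numbers (float64). When we create an ndarray containing both floats and integers, like z above, NumPy also assigns the dtype float64. This is called upcasting. Because all elements of an ndarray must have the same type, NumPy upcasts the integers in z to floats so that no precision is lost in numerical computations. Although NumPy chooses the dtype automatically, it also lets you specify the dtype you want for the elements of an ndarray: pass the keyword dtype to np.array() when you create it. Let's look at an example: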

# We create a rank 1 ndarray of floats but set the dtype to int64
x = np.array([1.5, 2.2, 3.7, 4.0, 5.9], dtype = np.int64)

# We print x
print()
print('x = ', x)
print()

# We print the dtype x
print('The elements in x are of type:', x.dtype)

x = [1 2 3 4 5]

The elements in x are of type: int64

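As you can see, even though we created the ndarray from floats, specifying the dtype int64 made NumPy convert them to integers by dropping the decimal part. Specifying the data type of an ndarray is useful when you do not want NumPy to pick the wrong type by accident, or when you only need a certain precision in your calculations and want to save memory.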
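The sketch below illustrates both points with made-up values: a smaller float dtype uses less memory, and converting floats to integers truncates toward zero rather than rounding. It uses the astype() method for the conversion; the variable names are illustrative only.

```python
import numpy as np

# Same (hypothetical) values stored at two different precisions
a64 = np.array([1.5, 2.2, 3.7], dtype=np.float64)
a32 = np.array([1.5, 2.2, 3.7], dtype=np.float32)

# nbytes reports the memory used by the array's elements
print('float64 bytes:', a64.nbytes)
print('float32 bytes:', a32.nbytes)

# Converting floats to integers drops the decimal part (truncation toward zero)
truncated = np.array([1.5, 2.2, 3.7, -1.9]).astype(np.int64)
print('truncated:', truncated)
```

If you need rounding instead of truncation, apply np.round() before converting.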

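Once you have created an ndarray, you may want to save it to a file so it can be read later or used by another program. NumPy provides a way to save arrays to files for later use. Let's see how.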

# We create a rank 1 ndarray
x = np.array([1, 2, 3, 4, 5])

# We save x into the current directory as 
np.save('my_array', x)

The code above saves the ndarray x to a file named my_array.npy. You can load the saved ndarray into a variable with the load() function.

# We load the saved array from our current directory into variable y
y = np.load('my_array.npy')

# We print y
print()
print('y = ', y)
print()

# We print information about the ndarray we loaded
print('y is an object of type:', type(y))
print('The elements in y are of type:', y.dtype)

y = [1 2 3 4 5]

y is an object of type: <class 'numpy.ndarray'>
The elements in y are of type: int64

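When loading an array from a file, make sure you include the file name together with its .npy extension, otherwise you will get an error.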
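A minimal round-trip sketch of saving and loading follows. It writes into a temporary directory (an assumption made here so the example cleans up after itself; the text above saves into the current directory instead) and checks that the loaded array matches the original.

```python
import os
import tempfile

import numpy as np

x = np.array([1, 2, 3, 4, 5])

# A temporary directory is used only so this sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_array')
    np.save(path, x)               # np.save appends the .npy extension
    y = np.load(path + '.npy')     # np.load needs the full file name

print('round trip preserved the data:', np.array_equal(x, y))
print('dtype after loading:', y.dtype)
```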

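Creating ndarrays with built-in functions

A great time-saving feature of NumPy is the ability to create ndarrays with built-in functions. These functions let us create certain kinds of ndarrays with a single line of code. Below are some of the most useful ones, which you will encounter when doing AI programming.

`np.zeros()` function

Let's start by creating an ndarray of a given shape filled with zeros, using np.zeros(). The function np.zeros(shape) creates an ndarray of the given shape in which every element is 0. So if, for example, you want a rank 2 array with 3 rows and 4 columns, you pass the shape to the function in the form (rows, columns), as in the following example: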

# We create a 3 x 4 ndarray full of zeros. 
X = np.zeros((3,4))

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)

X =
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]

X has dimensions: (3, 4)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: float64

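As you can see, np.zeros() creates an array with dtype float64 by default. You can change the data type with the dtype keyword.

`np.ones()` function

Similarly, we can create an ndarray of a given shape filled with ones, using np.ones(). Like np.zeros(), np.ones() takes an argument specifying the shape of the ndarray you want to create. Let's look at an example: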

# We create a 3 x 2 ndarray full of ones. 
X = np.ones((3,2))

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)

X =
[[ 1. 1.]
[ 1. 1.]
[ 1. 1.]]

X has dimensions: (3, 2)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: float64

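As you can see, np.ones() also creates an array with dtype float64 by default. You can change the data type with the dtype keyword.

`np.full()` function

We can also create an ndarray of a given shape filled with any number we choose, using np.full(). The function np.full(shape, constant value) takes two arguments: the first is the shape of the ndarray you want to create, and the second is the constant value to fill it with. Let's look at an example: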

# We create a 2 x 3 ndarray full of fives. 
X = np.full((2,3), 5) 

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)

X =
[[5 5 5]
[5 5 5]]

X has dimensions: (2, 3)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: int64

By default, np.full() creates an array whose data type matches that of the constant used to fill it. You can change the data type with the dtype keyword.
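As a quick illustration of the dtype keyword mentioned for np.zeros(), np.ones(), and np.full(), the sketch below overrides the default data type in each case. The shapes and variable names are arbitrary choices for illustration.

```python
import numpy as np

# Override the float64 default of np.zeros() and np.ones()
zeros_int = np.zeros((3, 4), dtype=np.int64)
ones_f32 = np.ones((3, 2), dtype=np.float32)

# np.full() infers the dtype from the fill value unless dtype is given
full_inferred = np.full((2, 3), 5.0)               # float fill value
full_forced = np.full((2, 3), 5, dtype=np.float64)  # integer fill, float dtype

print(zeros_int.dtype, ones_f32.dtype, full_inferred.dtype, full_forced.dtype)
```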

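`np.repeat()` function

np.repeat('red', 5)
array(['red', 'red', 'red', 'red', 'red'], dtype='<U3')

`np.eye(N)`

As you will see later, a fundamental array in linear algebra is the identity matrix: a square matrix with 1s on the main diagonal and 0s everywhere else. The function np.eye(N) creates a square N x N ndarray corresponding to the identity matrix. Because every identity matrix is square, a single integer argument is all np.eye() needs to build one. Let's look at an example: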

# We create a 5 x 5 Identity matrix. 
X = np.eye(5)

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)

X =
[[ 1. 0. 0. 0. 0.]
[ 0. 1. 0. 0. 0.]
[ 0. 0. 1. 0. 0.]
[ 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 1.]]

X has dimensions: (5, 5)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: float64

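As you can see, np.eye() also creates an array with dtype float64 by default. You can change the data type with the dtype keyword. You will learn more about identity matrices and their uses in the linear algebra part of this course.

`np.diag()`: creating a diagonal matrix

We can also create diagonal matrices with np.diag(). A diagonal matrix is a square matrix whose only nonzero values lie on its main diagonal. The np.diag() function creates an ndarray corresponding to a diagonal matrix, as in the following example: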

# Create a 4 x 4 diagonal matrix that contains the numbers 10,20,30, and 50
# on its main diagonal
X = np.diag([10,20,30,50])

# We print X
print()
print('X = \n', X)
print()

X =
[[10 0 0 0]
[ 0 20 0 0]
[ 0 0 30 0]
[ 0 0 0 50]]
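np.diag() also works in the other direction: given a 2-D array, it returns the values on the main diagonal as a rank 1 array. A minimal sketch, reusing the diagonal values from the example above:

```python
import numpy as np

# Build a 4 x 4 diagonal matrix from a list of diagonal values
X = np.diag([10, 20, 30, 50])

# Passing a 2-D array back to np.diag() extracts its main diagonal
d = np.diag(X)

print('X has shape:', X.shape)
print('diagonal of X:', d)
```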

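`np.arange()`: creating evenly spaced ndarrays

NumPy also lets you create ndarrays whose values are evenly spaced within a given interval. The np.arange() function is very flexible and can be called with one, two, or three arguments. Below we go through each case and see how to create different kinds of ndarrays.

Let's start with a single argument. With one argument, np.arange(N) creates a rank 1 ndarray containing the consecutive integers from 0 to N - 1. So note that if you want an array with the integers from 0 to 9, you need to set N to 10, not 9, as in the following example: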

# We create a rank 1 ndarray that has sequential integers from 0 to 9
x = np.arange(10)

# We print the ndarray
print()
print('x = ', x)
print()

# We print information about the ndarray
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

x = [0 1 2 3 4 5 6 7 8 9]

x has dimensions: (10,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: int64

With two arguments, np.arange(start,stop) creates a rank 1 ndarray of evenly spaced values in the half-open interval [start, stop). That is, the values include start but not stop. Let's look at an example:

# We create a rank 1 ndarray that has sequential integers from 4 to 9. 
x = np.arange(4,10)

# We print the ndarray
print()
print('x = ', x)
print()

# We print information about the ndarray
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

x = [4 5 6 7 8 9]

x has dimensions: (6,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: int64

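As you can see, np.arange(4,10) generated a sequence of integers that includes 4 but excludes 10.

Finally, with three arguments, np.arange(start,stop,step) creates a rank 1 ndarray of evenly spaced values in the half-open interval [start, stop), where step is the difference between two adjacent values. Let's look at an example: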

# We create a rank 1 ndarray that has evenly spaced integers from 1 to 13 in steps of 3.
x = np.arange(1,14,3)

# We print the ndarray
print()
print('x = ', x)
print()

# We print information about the ndarray
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

x = [ 1 4 7 10 13]

x has dimensions: (5,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: int64

As you can see, x contains a sequence of integers from 1 to 13 in which every pair of adjacent values differs by 3.

`np.linspace(start, stop, N)`

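Although np.arange() allows non-integer steps such as 0.3, the output is often inconsistent because of finite floating-point precision. So when non-integer steps are needed, np.linspace() is usually the better choice. np.linspace(start, stop, N) returns N evenly spaced numbers over the closed interval [start, stop], so both start and stop are included. Note that np.linspace() must be called with at least two arguments, np.linspace(start,stop); in that case the number of elements defaults to N = 50. np.linspace() behaves more predictably than np.arange() because we specify the number of elements we want in the interval rather than the step between values. Let's look at some examples:

# We create a rank 1 ndarray that has 10 evenly spaced values between 0 and 25.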
x = np.linspace(0,25,10)

# We print the ndarray
print()
print('x = \n', x)
print()

# We print information about the ndarray
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

x = [ 0. 2.77777778 5.55555556 8.33333333 11.11111111 13.88888889 16.66666667 19.44444444 22.22222222 25. ]

x has dimensions: (10,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: float64

As the example shows, np.linspace(0,25,10) returns an ndarray with 10 evenly spaced elements in the closed interval [0, 25], and both endpoints, 0 and 25, are included. You can, however, exclude the end of the interval (as np.arange() does) by setting the keyword endpoint to False in np.linspace(). Let's create the same x ndarray as above, this time without the endpoint:

# We create a rank 1 ndarray that has 10 evenly spaced values between 0 and 25,
# with 25 excluded.
x = np.linspace(0,25,10, endpoint = False)

# We print the ndarray
print()
print('x = ', x)
print()

# We print information about the ndarray
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

x = [ 0. 2.5 5. 7.5 10. 12.5 15. 17.5 20. 22.5]

x has dimensions: (10,)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: float64

As you can see, excluding the endpoint changes the spacing between values, because 10 evenly spaced numbers still have to fit in the given interval.
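The difference between the two functions with non-integer spacing can be sketched as follows. With np.linspace() the number of elements and both endpoints are fixed by the call; with np.arange() and a float step, the element count comes from a floating-point division and can be off by one for some inputs. The interval and step below are arbitrary illustrative choices.

```python
import numpy as np

# linspace: we ask for exactly 11 points, endpoints included
a = np.linspace(0.0, 1.0, 11)
print('linspace:', len(a), 'values, first =', a[0], ', last =', a[-1])

# arange with a float step: the length is derived from (stop - start) / step,
# so it depends on floating-point rounding
b = np.arange(0.0, 1.0, 0.1)
print('arange  :', len(b), 'values, last =', b[-1])
```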

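`np.reshape()`

So far we have only used the built-in functions np.arange() and np.linspace() to create rank 1 ndarrays. However, we can combine them with np.reshape() to create rank 2 ndarrays of any shape. The function np.reshape(ndarray, new_shape) converts the given ndarray to the specified new_shape. Note that new_shape must be compatible with the number of elements in the given ndarray. For example, you can convert a rank 1 ndarray with 6 elements into a 3 x 2 rank 2 ndarray or a 2 x 3 rank 2 ndarray, because both of those have 6 elements in total. But you cannot convert a rank 1 ndarray with 6 elements into a 3 x 3 rank 2 ndarray, because that array would contain 9 elements, more than the original ndarray has. Let's look at some examples: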

# We create a rank 1 ndarray with sequential integers from 0 to 19
x = np.arange(20)

# We print x
print()
print('Original x = ', x)
print()

# We reshape x into a 4 x 5 ndarray 
x = np.reshape(x, (4,5))

# We print the reshaped x
print()
print('Reshaped x = \n', x)
print()

# We print information about the reshaped x
print('x has dimensions:', x.shape)
print('x is an object of type:', type(x))
print('The elements in x are of type:', x.dtype)

Original x = [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]

Reshaped x =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]

x has dimensions: (4, 5)
x is an object of type: <class 'numpy.ndarray'>
The elements in x are of type: int64
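The element-count rule for np.reshape() can be checked directly. In this small sketch, a 6-element array reshapes to 3 x 2 and 2 x 3 without trouble, while asking for 3 x 3 raises a ValueError; the flag name failed is just an illustrative helper.

```python
import numpy as np

x = np.arange(6)

# Both target shapes hold exactly 6 elements, so these succeed
a = np.reshape(x, (3, 2))
b = np.reshape(x, (2, 3))
print(a.shape, b.shape)

# A 3 x 3 array would need 9 elements, so NumPy refuses
try:
    np.reshape(x, (3, 3))
    failed = False
except ValueError as err:
    failed = True
    print('reshape failed:', err)
```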

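A nice feature of NumPy is that some functions can also be used as methods. This lets us apply different functions in sequence in a single line of code. ndarray methods resemble ndarray attributes in that both use dot notation (.). Let's see how to get the same result as in the example above with just one line of code:

# We create a rank 1 ndarray with sequential integers from 0 to 19 and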
# reshape it to a 4 x 5 array 
Y = np.arange(20).reshape(4, 5)

# We print Y
print()
print('Y = \n', Y)
print()

# We print information about Y
print('Y has dimensions:', Y.shape)
print('Y is an object of type:', type(Y))
print('The elements in Y are of type:', Y.dtype)

Y =
[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]
[15 16 17 18 19]]

Y has dimensions: (4, 5)
Y is an object of type: <class 'numpy.ndarray'>
The elements in Y are of type: int64

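As you can see, we get exactly the same result as before. Note that when reshape() is used as a method, it is applied as ndarray.reshape(new_shape), which converts the ndarray to the specified new_shape. As before, new_shape must be compatible with the number of elements in the ndarray. In the example above, np.arange(20) creates an ndarray, and that ndarray is the one the reshape() method reshapes. So when reshape() is used as a method, we do not pass the ndarray as an argument; we only pass new_shape.

In the same way, we can combine reshape() with np.linspace() to create rank 2 arrays, as in the following example.

# We create a rank 1 ndarray with 10 evenly spaced values between 0 and 50,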
# with 50 excluded. We then reshape it to a 5 x 2 ndarray
X = np.linspace(0,50,10, endpoint=False).reshape(5,2)

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)

X =
[[ 0. 5.]
[ 10. 15.]
[ 20. 25.]
[ 30. 35.]
[ 40. 45.]]

X has dimensions: (5, 2)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: float64

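The last kind of ndarray we will create is the random ndarray, an array that contains random numbers. In machine learning you often need to create random arrays, for example when initializing the weights of a neural network. NumPy provides a variety of random functions that help us create random ndarrays of any shape.

`np.random` module

np.random module: https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.random.html

Let's start with np.random.random(shape), which creates an ndarray of the given shape containing random floats in the half-open interval [0.0, 1.0).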

# We create a 3 x 3 ndarray with random floats in the half-open interval [0.0, 1.0).
X = np.random.random((3,3))

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in x are of type:', X.dtype)

X =
[[ 0.12379926 0.52943854 0.3443525 ]
[ 0.11169547 0.82123909 0.52864397]
[ 0.58244133 0.21980803 0.69026858]]

X has dimensions: (3, 3)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: float64

`np.random.randint(start, stop, size = shape)`

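NumPy also lets us create ndarrays of random integers within a given interval. The function np.random.randint(start, stop, size = shape) creates an ndarray of the given shape containing random integers in the half-open interval [start, stop). Let's look at an example: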

# We create a 3 x 2 ndarray with random integers in the half-open interval [4, 15).
X = np.random.randint(4,15,size=(3,2))

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)

X =
[[ 7 11]
[ 9 11]
[ 6 7]]

X has dimensions: (3, 2)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: int64

`np.random.normal(mean, standard deviation, size=shape)`

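In some cases you may need an ndarray of random numbers that satisfy certain statistical properties. For example, you may want the random numbers in the ndarray to have a mean of 0. NumPy lets you create random ndarrays whose numbers are sampled from various probability distributions. For example, the function np.random.normal(mean, standard deviation, size=shape) creates an ndarray of the given shape containing random numbers sampled from a normal (Gaussian) distribution with the given mean and standard deviation. Let's create a 1,000 x 1,000 ndarray of random floats sampled from a normal distribution with mean 0 and standard deviation 0.1.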

# We create a 1000 x 1000 ndarray of random floats drawn from normal (Gaussian) distribution
# with a mean of zero and a standard deviation of 0.1.
X = np.random.normal(0, 0.1, size=(1000,1000))

# We print X
print()
print('X = \n', X)
print()

# We print information about X
print('X has dimensions:', X.shape)
print('X is an object of type:', type(X))
print('The elements in X are of type:', X.dtype)
print('The elements in X have a mean of:', X.mean())
print('The maximum value in X is:', X.max())
print('The minimum value in X is:', X.min())
print('X has', (X < 0).sum(), 'negative numbers')
print('X has', (X > 0).sum(), 'positive numbers')

X =
[[ 0.04218614 0.03247225 -0.02936003 ..., 0.01586796 -0.05599115 -0.03630946]
[ 0.13879995 -0.01583122 -0.16599967 ..., 0.01859617 -0.08241612 0.09684025]
[ 0.14422252 -0.11635985 -0.04550231 ..., -0.09748604 -0.09350044 0.02514799]
...,
[-0.10472516 -0.04643974 0.08856722 ..., -0.02096011 -0.02946155 0.12930844]
[-0.26596955 0.0829783 0.11032549 ..., -0.14492074 -0.00113646 -0.03566034]
[-0.12044482 0.20355356 0.13637195 ..., 0.06047196 -0.04170031 -0.04957684]]

X has dimensions: (1000, 1000)
X is an object of type: <class 'numpy.ndarray'>
The elements in X are of type: float64
The elements in X have a mean of: -0.000121576684405
The maximum value in X is: 0.476673923106
The minimum value in X is: -0.499114224706
X has 500562 negative numbers
X has 499438 positive numbers

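As you can see, the mean of the random numbers in the ndarray is close to 0, the maximum and minimum values in X are roughly symmetric about 0 (the mean), and the numbers of positive and negative values are very close.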

`np.random.choice(a, size = shape)`
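A minimal sketch of np.random.choice(): its first argument is either a 1-D sequence to sample from or an integer a, in which case samples are drawn from np.arange(a). The color list, the shapes, and the seed below are illustrative assumptions; the seed is set only so repeated runs draw the same values.

```python
import numpy as np

np.random.seed(0)  # fixed seed, used here only for repeatable output

# Sample (with replacement) from a hypothetical list of labels
colors = ['red', 'green', 'blue']
picks = np.random.choice(colors, size=(2, 3))
print('picks =\n', picks)

# Sample 5 distinct integers from np.arange(10)
ints = np.random.choice(10, size=5, replace=False)
print('ints =', ints)
```

The optional p keyword lets you give each candidate its own probability instead of sampling uniformly.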

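NumPy is the fundamental package for scientific computing in Python. Some of its features:

ndarray, a fast and efficient multidimensional array object
Functions for doing math directly on arrays and for element-wise computation on arrays
Linear algebra operations, random number generation, and Fourier transforms
Tools for integrating C, C++, and Fortran code with Python

NumPy's ndarray: a multidimensional array object (NumPy's most important feature) that serves as a fast, flexible container for large data sets.

All elements of an ndarray must be of the same type. Every array has a shape (a tuple giving the size of each dimension) and a dtype (an object describing the data type of the array).

I. Creating ndarrays

Method 1: the array function

import numpy as np
# Create an ndarray
data1=[6, 7.5, 8, 0, 1]  # a list
arr1=np.array(data1) # the array function accepts any sequence-like object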
arr1
array([6. , 7.5, 8. , 0. , 1. ])

data2=[[1, 2, 3, 4],[5, 6, 7, 8]]  # nested sequence: a 2-D list
arr2=np.array(data2)  # 2-D array
arr2 
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

arr2.shape
(2, 4)

arr2.dtype
dtype('int32')

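Method 2: array-creation functions

For example, zeros and ones create arrays of 0s or 1s of a given length or shape. To create a multidimensional array with them, just pass a tuple describing the shape: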

np.zeros(10)
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

np.zeros((3,6))
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

np.arange(15) # returns an ndarray
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

np.ones((3,6),dtype=np.float64) # np.float was removed in newer NumPy releases; use np.float64 or float
array([[1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1., 1.]])

Converting ndarray data types

arr1=np.array([1,2,3],dtype=np.float64)
arr1.dtype
dtype('float64')

arr2=np.array([1,2,3],dtype=np.int32)
arr2.dtype
dtype('int32')

You can also explicitly convert an array's dtype with the ndarray astype method:

arr=np.array([1,2,3,4,5])
arr.dtype
dtype('int32')

float_arr=arr.astype(np.float64)
float_arr.dtype
dtype('float64')

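II. Operations between arrays and scalars

Arrays let you perform batch operations on data without writing loops. This is usually called vectorization. Any arithmetic operation between arrays of equal size is applied element-wise.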

arr=np.array([[1,2,3],[4,5,6]])
arr
array([[1, 2, 3],
       [4, 5, 6]])

arr*arr
array([[ 1,  4,  9],
       [16, 25, 36]])

arr-arr
array([[0, 0, 0],
       [0, 0, 0]])

1/arr  # an operation between an array and a scalar also propagates the scalar to every element
array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

Operations between arrays of different sizes are called broadcasting.
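A short sketch of broadcasting between arrays of different shapes: a (3,) row is stretched across every row of a (2, 3) array, and a (2, 1) column is stretched across every column. The row and col arrays are made-up values for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])             # shape (3,)
col = np.array([[100], [200]])           # shape (2, 1)

# row is added to each row of arr
plus_row = arr + row
print(plus_row)

# col is added to each column of arr
plus_col = arr + col
print(plus_col)
```

Broadcasting only works when the shapes are compatible: compared from the last dimension backwards, each pair of sizes must be equal or one of them must be 1.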

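III. Basic indexing and slicing

Indexing NumPy arrays is a rich topic, because there are many ways to select a subset of the data or individual elements.

One-dimensional arrays: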

arr=np.arange(10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr[5]
5

arr[5:8]
array([5, 6, 7])

arr[5:8]=12
arr
array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

arr_slice=arr[5:8]
arr_slice[1]=12345
arr
array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])
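The last step above changes arr through arr_slice because a slice of an ndarray is a view of the original data, not a copy. A minimal sketch of the difference, using .copy() when an independent array is wanted (the names view and independent are illustrative):

```python
import numpy as np

arr = np.arange(10)

# A slice is a view: writing through it changes the original array
view = arr[5:8]
view[0] = 99
print('arr after writing through the view:', arr)

# .copy() gives an independent array: the original is left untouched
independent = arr[5:8].copy()
independent[1] = -1
print('arr after writing to the copy:     ', arr)

print('view shares memory with arr:', np.shares_memory(arr, view))
print('copy shares memory with arr:', np.shares_memory(arr, independent))
```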

For higher-dimensional arrays: in a two-dimensional array, the element at each index is no longer a scalar but a one-dimensional array:

arr2d=np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2d[2]
array([7, 8, 9])

arr2d[0][2] # access a single element by indexing recursively
3

arr2d[0,2]
3

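In a multidimensional array, if you omit the later indices, the returned object is an ndarray with fewer dimensions (containing all the data along the remaining dimensions). So in the 2 x 2 x 3 array arr3d: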

arr3d=np.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
arr3d
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

arr3d[0] # a 2 x 3 array
array([[1, 2, 3],
       [4, 5, 6]])

Both scalar values and arrays can be assigned to arr3d[0].
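A minimal sketch of both kinds of assignment to arr3d[0]: first a scalar, which fills the whole 2 x 3 block, then an array of matching shape, used here to restore the original values. The names old_values and after_scalar are illustrative.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

# Keep a copy of the first 2 x 3 block before overwriting it
old_values = arr3d[0].copy()

# Assigning a scalar fills every element of arr3d[0]
arr3d[0] = 42
after_scalar = arr3d.copy()
print(after_scalar)

# Assigning an array of matching shape replaces the block element by element
arr3d[0] = old_values
print(arr3d)
```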

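Slice indexing

For a two-dimensional array, a slice selects elements along one axis. You can pass several slices at once, just as you can pass several indices.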

arr2d=np.array([[1,2,3],[4,5,6],[7,8,9]]) 
arr2d
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

arr2d[:2] # this is a slice
array([[1, 2, 3],
       [4, 5, 6]])

arr2d[:2,1:]
array([[2, 3],
       [5, 6]])

arr2d[1,:2]
array([4, 5])

arr2d[2,:1]
array([7])

arr2d[:,:1] # a colon by itself selects the entire axis
array([[1],
       [4],
       [7]])

arr2d[:2,1:]=0 # assign to a slice
arr2d
array([[1, 0, 0],
       [4, 0, 0],
       [7, 8, 9]])

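IV. Transposing arrays and swapping axes

Transposing is a special form of reshaping that returns a view of the underlying data (no copying takes place).

1. The special T attribute (for a simple transpose, equivalent to swapping the axes)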

arr=np.arange(15).reshape((3,5))
arr
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

arr.T
array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

arr=np.random.randn(6,3)
np.dot(arr.T,arr) # np.dot computes the matrix inner product X^T X
array([[ 4.73440561, -1.80831405,  1.97638433],
       [-1.80831405,  6.76842567, -2.01967333],
       [ 1.97638433, -2.01967333,  1.69620536]])

2. transpose

For higher-dimensional arrays, transpose takes a tuple of axis numbers that gives the new order of the axes.

For example:

arr=np.arange(16).reshape((2,2,4))
arr
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

arr.transpose((1,0,2))
array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

arr.transpose((1,2,0))
array([[[ 0,  8],
        [ 1,  9],
        [ 2, 10],
        [ 3, 11]],

       [[ 4, 12],
        [ 5, 13],
        [ 6, 14],
        [ 7, 15]]])

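In arr.transpose((1,0,2)), the numbers 1, 0, 2 give the new order of the axes of the shape. The original shape is (2,2,4), that is, two 2 x 4 matrices, and its axes are shape[0], shape[1], shape[2]. After arr.transpose((1,0,2)) the shape becomes (shape[1], shape[0], shape[2]) = (2,2,4), so the shape is unchanged. After arr.transpose((1,2,0)) the shape becomes (shape[1], shape[2], shape[0]) = (2,4,2), that is, two 4 x 2 matrices. The indices of each element are permuted in the same way: the element 4, for example, is at index (0,1,0) before the transpose and at (1,0,0) after arr.transpose((1,0,2)).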
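The index bookkeeping above can be checked with a few lines. This sketch rebuilds the same array, applies both transposes, and looks up where the element 4 ends up; t1 and t2 are illustrative names.

```python
import numpy as np

arr = np.arange(16).reshape((2, 2, 4))

t1 = arr.transpose((1, 0, 2))   # swap the first two axes
t2 = arr.transpose((1, 2, 0))   # new axes are old axes 1, 2, 0

print('shapes:', arr.shape, t1.shape, t2.shape)

# The element 4 sits at (0, 1, 0) in arr; its indices are permuted like the axes
print(arr[0, 1, 0], t1[1, 0, 0], t2[1, 0, 0])

# transpose returns a view: no data is copied
print('t1 shares memory with arr:', np.shares_memory(arr, t1))
```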

3. For multidimensional arrays there is also the swapaxes method, which takes a pair of axis numbers. It, too, rearranges the axes of the shape.

arr=np.arange(16).reshape((2,2,4))
arr
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7]],

       [[ 8,  9, 10, 11],
        [12, 13, 14, 15]]])

arr.swapaxes(1,2) # axes 1 and 2 are swapped: the element 4 is at index (0,1,0) before and at (0,0,1) after swapaxes(1,2)

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
