Python Data Science Handbook: NumPy Study Notes (about 10,000 characters)

Article link: https://blog.csdn.net/weixin_74085818/article/details/141155088

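The three-part introductory series on the core Python data analysis libraries is now complete.

The other two articles are:
Matplotlib: Getting Started with Matplotlib
Pandas: Machine Learning / Data Analysis: 50 Common Basic Pandas Operations
Bookmarks, likes, and follows are welcome. Upcoming posts will cover machine learning and data analysis case studies.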

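Preface

These NumPy notes were taken while studying the Python Data Science Handbook.
The book explains concepts step by step from the perspective of Python's underlying C implementation and covers the fundamentals of NumPy in detail.
These notes took effort to put together; likes, follows, and bookmarks are welcome.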



Checking the NumPy version

import numpy as np
np.__version__

Result:

'1.26.4'

Understanding Python data types

A Python list is more than just a list

The standard mutable multi-element container in Python is the list.

L = list(range(10))
L

Result:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

type(L[0])

Result:

int

L2 = [str(c) for c in range(10)]
L2

Result:

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

type(L2[0])

Result:

str

Because Python is dynamically typed, we can even create heterogeneous lists:

L3 = [True, 1, 'wy']
L3

Result:

[True, 1, 'wy']

t1 = type(L3[0])
t2 = type(L3[1])
print(f't1: {t1}, t2: {t2}')

t1: <class 'bool'>, t2: <class 'int'>

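In a Python list, whether the elements share a type or not, each element is stored as a full Python object that carries extra information (such as its type) besides the value itself. This is why lists have a high memory overhead. NumPy instead lets us create fixed-type arrays, as follows: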

import numpy as np
np.array([1,2,3,4])

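Result:

array([1, 2, 3, 4])

Because each list element is a complete structure containing both data and type information, a list can be filled with data of any desired type. Fixed-type NumPy-style arrays lack this flexibility, but they are much more efficient for storing and manipulating data.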
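To make the memory overhead concrete, here is a minimal sketch (not from the original notes) that compares the footprint of a list of integers with that of a fixed-type array. It assumes CPython, where `sys.getsizeof` reports per-object sizes; the variable names are illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to full Python int objects, each with its own header.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
# A NumPy array stores the raw 8-byte integers in one contiguous buffer.
arr_bytes = np_arr.nbytes

print(list_bytes, arr_bytes)
```

The exact list size depends on the Python build, but the array buffer is always `n * 8` bytes for `int64`.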

Fixed-type arrays in Python

import array
L = list(range(10))
A = array.array('i', L)
A

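Result:

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

The type code 'i' indicates that the contents are integers. Python's array object provides efficient storage, but NumPy's ndarray additionally provides efficient operations on the data.

Creating arrays from Python lists

# Import NumPy
import numpy as np

API: np.array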

np.array([range(10)])

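Result:

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])

When the element types differ, NumPy automatically upcasts them according to its promotion rules; for example, integers are upcast to floating point:

A = np.array([3.14, 2, 6, 7])
type(A[2])  # the integer 6 is stored as a float

Result: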

numpy.float64

dtype: keyword argument for setting the data type explicitly

np.array([1,2,3,4,5], dtype='float')

Result:

array([1., 2., 3., 4., 5.])
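The upcasting and explicit `dtype` behavior described above can be inspected directly. This is a small sketch with illustrative variable names; `np.result_type` reports the type NumPy would promote to.

```python
import numpy as np

mixed = np.array([3.14, 2, 6, 7])               # integers are upcast to float64
forced = np.array([1, 2, 3], dtype=np.float32)  # dtype set explicitly
promoted = np.result_type(np.int32, np.float64) # promotion of int32 with float64

print(mixed.dtype, forced.dtype, promoted)
```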

NumPy arrays can also be multidimensional:

np.array([range(i, i + 3) for i in [2, 4, 6]])

Result:

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

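Creating arrays from scratch

For large arrays, it is more efficient to create them from scratch with NumPy's built-in routines. Examples:

# Create an array filled with zeros
np.zeros(10, dtype=int)

Result: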

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Create an array filled with ones
np.ones((2,3), dtype=int)

Result:

array([[1, 1, 1],
       [1, 1, 1]])

# Create an array filled with a single value
np.full((2,3), 3.14)

Result:

array([[3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14]])

# Create an array filled with a linear sequence: start at 1, stop before 20, step 2
np.arange(1,20,2)

Result:

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])

# Create an array of 5 evenly spaced values between 1 and 2 (both endpoints included)
np.linspace(1,2,5)

Result:

array([1.  , 1.25, 1.5 , 1.75, 2.  ])

# Create a 3x3 array of uniformly distributed random values in [0, 1)
np.random.random((3,3))

Result:

array([[0.91873672, 0.64912807, 0.01946589],
       [0.85384851, 0.65577145, 0.06647055],
       [0.00466619, 0.42542704, 0.2041542 ]])

# Create a 3x3 array of normally distributed random values with mean 0 and standard deviation 1
np.random.normal(0,1,(3,3))

Result:

array([[-0.48149252,  0.78678621,  1.8334589 ],
       [ 0.11196594, -1.12348086,  1.44976757],
       [-1.16345692, -0.36978232, -0.14611997]])

# Create a 3x3 array of random integers in [0, 10)
np.random.randint(0,10,(3,3))

Result:

array([[4, 0, 7],
       [0, 1, 6],
       [4, 4, 5]])

# Create a 3x3 identity matrix
np.eye(3)

Result:

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

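# Create an uninitialized array. np.empty() only reserves the memory before any elements are written,
# so the values are whatever already happened to be at that memory location
# (presumably why the output below matches the identity matrix created above).
np.empty((3,3))

Result: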

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

NumPy standard data types

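The original table of data types was an image that did not survive. As a stand-in, the sketch below lists a few common NumPy dtypes and the number of bytes each element occupies; the selection of names is my own, not the author's table.

```python
import numpy as np

names = ["bool", "int8", "int32", "int64", "uint8",
         "float32", "float64", "complex128"]

# Map each dtype name to the size of one element in bytes.
sizes = {name: np.dtype(name).itemsize for name in names}

for name, nbytes in sizes.items():
    print(f"{name:>10}: {nbytes} byte(s)")
```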

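NumPy array basics

In NumPy and Pandas, data is stored and manipulated as arrays; a two-dimensional array corresponds to a matrix in linear algebra. This section introduces several basic array operations:
1. Array attributes: determining the size, shape, memory consumption, and data type of an array
2. Array indexing: getting and setting the values of individual elements
3. Array slicing: getting subarrays
4. Array reshaping: changing the shape of an array, along with joining and splitting arrays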

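NumPy array attributes

Setup: use NumPy's random number generator to create n-dimensional arrays.
We will look at one-, two-, and three-dimensional arrays.

import numpy as np
# Seed the random number generator for reproducibility
rng = np.random.default_rng(seed=1701)

# Generate integers in [0, 10) with the given shapes
x1 = rng.integers(10, size=6) # one-dimensional array
x2 = rng.integers(10, size=(3,4)) # two-dimensional array
x3 = rng.integers(10, size=(3,4,5)) # three-dimensional array
x3

Result: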

array([[[4, 3, 5, 5, 0],
        [8, 3, 5, 2, 2],
        [1, 8, 8, 5, 3],
        [0, 0, 8, 5, 8]],

       [[5, 1, 6, 2, 3],
        [1, 2, 5, 6, 2],
        [5, 2, 7, 9, 3],
        [5, 6, 0, 2, 0]],

       [[2, 9, 4, 3, 9],
        [9, 2, 2, 4, 0],
        [0, 3, 0, 0, 2],
        [3, 2, 7, 4, 7]]], dtype=int64)

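Each array has the attributes ndim (number of dimensions), shape (size of each dimension), size (total number of elements), and dtype (data type of the elements):

# Using x3 as an example
print(f'x3 ndim: {x3.ndim}')
print(f'x3 shape: {x3.shape}')
print(f'x3 size: {x3.size}')
print(f'x3 dtype: {x3.dtype}')

Result:

x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
x3 dtype: int64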
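The list of array basics mentions memory consumption, which the attributes above do not show. A short sketch, assuming the same seed and an explicit `int64` dtype so the numbers do not depend on the platform:

```python
import numpy as np

rng = np.random.default_rng(seed=1701)
x3 = rng.integers(10, size=(3, 4, 5), dtype=np.int64)

# itemsize: bytes per element; nbytes: bytes for the whole data buffer.
print(x3.itemsize, x3.nbytes)
```

`nbytes` is always `size * itemsize`, here 60 elements of 8 bytes each.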

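Array indexing: accessing single elements

NumPy array indices start at 0.

x1

Result:

array([9, 4, 0, 3, 8, 6], dtype=int64)

# First element
x1[0]

Result:

9

To index from the end of the array, use negative indices:

x1[-1]

Result:

6

In a two-dimensional array, elements are accessed with a comma-separated tuple of indices; the same pattern extends to three or more dimensions:

x2

Result: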

array([[3, 1, 3, 7],
       [4, 0, 2, 3],
       [0, 0, 6, 9]], dtype=int64)

x2[0, 0]

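Result:

3

Values can also be modified through indexing:

x2[0,0] = 9

x2

Result:

array([[9, 1, 3, 7],
       [4, 0, 2, 3],
       [0, 0, 6, 9]], dtype=int64)

Note: NumPy arrays have a fixed type. If you assign a floating-point value to an element of the integer array x2, the value is silently truncated:

x2[1, 1] = 3.14

x2

Result: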

array([[9, 1, 3, 7],
       [4, 3, 2, 3],
       [0, 0, 6, 9]], dtype=int64)
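If the fractional part matters, one option is to convert the array to a floating-point dtype before assigning. This is a minimal sketch with a small made-up array, not part of the original notes:

```python
import numpy as np

ints = np.array([[9, 1], [4, 0]])
ints[1, 1] = 3.99             # truncated toward zero, no warning

floats = ints.astype(float)   # astype returns a new float64 array
floats[1, 1] = 3.99           # fractional part is preserved

print(ints[1, 1], floats[1, 1])
```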

Combined indexing

A plain index can be combined with a list of indices:

import numpy as np
rng = np.random.default_rng(seed = 1701)
x = rng.integers(0, 10, (3, 4))
x

Result:

array([[9, 4, 0, 3],
       [8, 6, 3, 1],
       [3, 7, 4, 0]], dtype=int64)

b = x[2, [1,2,3]]
b

Result:

array([7, 4, 0], dtype=int64)
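One property worth knowing about indexing with a list (often called fancy indexing) is that it returns a copy rather than a view. The sketch below rebuilds the array above by hand so that it runs on its own; the names `picked` and `pairs` are illustrative.

```python
import numpy as np

x = np.array([[9, 4, 0, 3],
              [8, 6, 3, 1],
              [3, 7, 4, 0]])

picked = x[2, [1, 2, 3]]     # row 2, columns 1, 2, and 3
picked[0] = -1               # modifies the copy only; x is unchanged

pairs = x[[0, 2], [1, 3]]    # the elements at (0, 1) and (2, 3)

print(x[2, 1], picked, pairs)
```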

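Array slicing: accessing subarrays

Syntax: x[start : stop : step]. The range includes start and excludes stop. If any of these is unspecified, it defaults to start=0, stop=(size of dimension), step=1.

One-dimensional subarrays

x1

Result: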

array([9, 4, 0, 3, 8, 6], dtype=int64)

# First three elements
x1[:3]

Result:

array([9, 4, 0], dtype=int64)

# Elements from index 3 onward
x1[3:]

Result:

array([3, 8, 6], dtype=int64)

# A middle subarray
x1[1 : 3]

Result:

array([4, 0], dtype=int64)

# Every second element, starting at index 1
x1[1::2]  # from index 1 to the end, step 2

Result:

array([4, 3, 6], dtype=int64)

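Note: when the step is negative, the defaults for start and stop are swapped. This gives a convenient way to reverse an array:

x1[::-1]   # defaults swapped: the whole array, reversed

Result: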

array([6, 8, 3, 0, 4, 9], dtype=int64)

x1[4::-1]

Result:

array([8, 3, 0, 4, 9], dtype=int64)

x1[4: 2 : -1]

Result:

array([8, 3], dtype=int64)

In summary, with a negative step the slice walks backward from start (included) down to stop (excluded).
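To see which defaults Python actually substitutes for an omitted start or stop, the built-in `slice.indices` method can be used. This sketch recreates `x1` by hand so that it is self-contained.

```python
import numpy as np

x1 = np.array([9, 4, 0, 3, 8, 6])

# slice.indices(n) resolves omitted start/stop values for a sequence of length n
# and returns a (start, stop, step) tuple.
full_reverse = slice(None, None, -1).indices(len(x1))
from_four = slice(4, None, -1).indices(len(x1))

print(full_reverse)
print(from_four)
print(x1[4:2:-1])   # walks from index 4 down to, but not including, index 2
```

In the resolved tuples, a stop of -1 means "one position before index 0", not the last element.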

Multidimensional subarrays

x2

Result:

array([[9, 1, 3, 7],
       [4, 3, 2, 3],
       [0, 0, 6, 9]], dtype=int64)

# First two rows, first three columns
x2[:2, :3]

Result:

array([[9, 1, 3],
       [4, 3, 2]], dtype=int64)

# All three rows, every second column
x2[:3, ::2]

Result:

array([[9, 3],
       [4, 2],
       [0, 6]], dtype=int64)

# All rows and all columns, both reversed
x2[::-1,::-1]

Result:

array([[9, 6, 0, 0],
       [3, 2, 3, 4],
       [7, 3, 1, 9]], dtype=int64)

Whole rows or columns are accessed with an empty slice (:)

# First two rows
x2[:2, :]

Result:

array([[9, 1, 3, 7],
       [4, 3, 2, 3]], dtype=int64)

# First two columns
x2[:, :2]

Result:

array([[9, 1],
       [4, 3],
       [0, 0]], dtype=int64)

When accessing a row, the empty column slice can be omitted:

x2[0]

Result:

array([9, 1, 3, 7], dtype=int64)

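Array slices are views, not copies

Unlike Python list slices, which are copies, NumPy array slices are views: modifying the subarray also modifies the original array.

print(x2)

Result:

[[9 1 3 7]
 [4 3 2 3]
 [0 0 6 9]]

# Extract a 2x3 subarray (a view of x2)
x2_view_copy = x2[:2, :3]
print(x2_view_copy)

Result:

[[9 1 3]
 [4 3 2]]

# Modify the subarray
x2_view_copy[0, 0] = 99
print(x2)
print(x2_view_copy)

Result:

[[99  1  3  7]
 [ 4  3  2  3]
 [ 0  0  6  9]]
[[99  1  3]
 [ 4  3  2]]

Advantage: when working with large datasets, pieces of them can be accessed and processed without copying the underlying data buffer.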

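Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to explicitly copy the data within an array or a subarray. This is easily done with the copy method.

x2_sub_copy = x2[:2, :3].copy()
print(x2_sub_copy)

Result: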
[[99  1  3]
 [ 4  3  2]]

x2_sub_copy[0,0] = 9
print(x2_sub_copy)
print(x2)

Result:
[[9 1 3]
 [4 3 2]]
[[99  1  3  7]
 [ 4  3  2  3]
 [ 0  0  6  9]]
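Whether two arrays share a buffer can be checked instead of inferred from side effects. A short sketch using `np.shares_memory` and the `base` attribute, with the array rebuilt by hand:

```python
import numpy as np

x2 = np.array([[9, 1, 3, 7],
               [4, 3, 2, 3],
               [0, 0, 6, 9]])

view = x2[:2, :3]           # slice: a view onto x2's buffer
copy = x2[:2, :3].copy()    # explicit copy: its own buffer

print(np.shares_memory(x2, view))   # the view shares memory with x2
print(np.shares_memory(x2, copy))   # the copy does not
print(view.base is x2)              # a view records the array it came from
```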

Reshaping arrays

Arrays can be reshaped with the reshape method. For example, to put the numbers 1 through 9 in a 3 * 3 grid:

grid = np.arange(1, 10).reshape(3,3)
print(grid)

Result:

[[1 2 3]
 [4 5 6]
 [7 8 9]]

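Note that the size of the initial array must match the size of the reshaped array, and that in most cases reshape returns a no-copy view of the initial array.

Another option is to use np.newaxis inside a slice:

x = np.array([1,2,3,4,5])  
print(x.shape)
x = x[np.newaxis, :]  # add a leading axis: shape (1, 5)
print(x)
print(x.shape)
x = x[:, np.newaxis]  # x is now (1, 5), so this inserts a middle axis: shape (1, 1, 5)
print(x)
print(x.shape)

Result: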
(5,)
[[1 2 3 4 5]]
(1, 5)
[[[1 2 3 4 5]]]
(1, 1, 5)
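In the snippet above, the second indexing step is applied to the already reshaped array, which is why the final shape is (1, 1, 5). A more common use of `np.newaxis` is to turn the same one-dimensional array into a row vector or a column vector, sketched here with illustrative names:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

row = x[np.newaxis, :]       # shape (1, 5): row vector
col = x[:, np.newaxis]       # shape (5, 1): column vector
col_alt = x.reshape(-1, 1)   # same result via reshape; -1 infers the length

print(row.shape, col.shape, col_alt.shape)
```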

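Joining and splitting arrays

Joining arrays

NumPy provides tools for combining multiple arrays into one, as well as tools for splitting one array into several.

x = np.array([1,2,3])
y = np.array([3,2,1])
np.concatenate([x,y])

Result: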

array([1, 2, 3, 3, 2, 1])

grid = np.array([[1,2,3],
                 [4,5,6]])
print(np.concatenate([grid, grid]))

Result:

[[1 2 3]
 [4 5 6]
 [1 2 3]
 [4 5 6]]

# Concatenate along the second axis (axis=1, since axes are zero-indexed)
np.concatenate([grid, grid], axis = 1)

Result:

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

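For arrays of mixed dimensions, np.vstack (vertical stack) and np.hstack (horizontal stack) can be clearer. For higher-dimensional arrays, np.dstack stacks arrays along the third axis.

np.vstack([x, grid])

Result: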

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

y = np.array([[99],
              [99]])
np.hstack([grid, y])

Result:

array([[ 1,  2,  3, 99],
       [ 4,  5,  6, 99]])
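Stacking along the third axis is done with `np.dstack`, which the examples above do not show. A minimal sketch, where the second array `grid * 10` is made up for illustration:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# dstack places the arrays "behind" each other along a new third axis.
stacked = np.dstack([grid, grid * 10])

print(stacked.shape)
print(stacked[0, 0])   # the two values stacked at position (0, 0)
```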

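Splitting arrays

The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. To each of these we can pass a list of indices giving the split points; N split points produce N + 1 subarrays:

x = [1,2,3,99,99,3,2,1]
x1, x2, x3 = np.split(x, [3, 5])  # split points at indices 3 and 5
print(x1, x2, x3)

Result: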
[1 2 3] [99 99] [3 2 1]

grid = np.arange(16).reshape((4,4))
grid

Result:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

Result:
[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]

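The text mentions np.hsplit, but only np.vsplit is demonstrated. For completeness, here is a sketch of the horizontal counterpart on the same 4 * 4 grid; the names `left` and `right` are my own.

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

# Split the columns at index 2: columns 0-1 go left, columns 2-3 go right.
left, right = np.hsplit(grid, [2])

print(left)
print(right)
```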

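Computation on NumPy arrays: universal functions

ufunc is short for universal function. A ufunc operates on each element of an ndarray rather than on the ndarray object as a whole. NumPy provides a large number of ufuncs.

Because there are so many ufuncs, it is best to look them up in the documentation when you need one.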
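As a small illustration of element-wise behavior, the sketch below compares an explicit Python loop with the equivalent ufunc call. `np.square` and `np.add` are real ufuncs; the variable names are illustrative.

```python
import numpy as np

x = np.arange(1, 6)                        # [1, 2, 3, 4, 5]

looped = np.array([v ** 2 for v in x])     # explicit Python loop
vectorized = np.square(x)                  # ufunc applied to every element

# Ufuncs also offer methods such as reduce, which folds the array with the operation.
total = np.add.reduce(x)

print(vectorized, total)
```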

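Aggregations: min, max, and everything in between

The first step in exploring any dataset is usually to compute various summary statistics. Perhaps the most common are the mean and standard deviation, which summarize the "typical" values in a dataset, but other aggregations are useful as well (sum, product, median, minimum and maximum, quantiles, and so on).

Summing the values in an array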

import numpy as np
rng = np.random.default_rng(seed=1701)

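%timeit is an IPython magic command that measures the execution time of a statement, which makes it useful for performance testing and optimization. IPython runs the statement many times and reports the mean and standard deviation of the run time, giving a reasonably accurate estimate.

L = rng.random(100)
np.sum(L)

Result: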

47.13890796604283

Comparing the speed of the NumPy function with Python's built-in function:

test = rng.random(100)
%timeit sum(test)
%timeit np.sum(test)

7.11 μs ± 163 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
3.42 μs ± 475 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

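Minimum and maximum

mi = min(test)
ma = max(test)
print(f'mi: {mi}, mx: {ma}')

Result:
mi: 0.0033390406641771175, mx: 0.9905671759242861

# NumPy has equivalent functions; a notebook cell displays only the last expression, here np.max(test)
np.min(test)
np.max(test)

Result: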

0.9905671759242861

Multidimensional aggregates

M = rng.integers(0, 10, (3,4))
M

Result:

array([[8, 3, 5, 0],
       [7, 6, 7, 5],
       [4, 3, 6, 5]], dtype=int64)

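NumPy aggregation functions also work on multidimensional arrays. By default they aggregate over the entire array:

M.sum()

Result:

59

Aggregation functions take an additional argument, axis, specifying the axis along which the aggregate is computed. For example, axis=0 gives the minimum of each column, and axis=1 gives the minimum of each row:

col_min = M.min(axis=0)  # minimum of each column
row_min = M.min(axis=1)  # minimum of each row
print(col_min)
print(row_min)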

[4 3 5 0]
[0 5 3]
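The axis argument names the dimension that is collapsed, which is easy to mix up. The sketch below repeats the idea with sums on the same matrix (rebuilt by hand) and shows `keepdims=True`, which keeps the collapsed axis with length 1.

```python
import numpy as np

M = np.array([[8, 3, 5, 0],
              [7, 6, 7, 5],
              [4, 3, 6, 5]])

col_sum = M.sum(axis=0)                      # collapses the rows: one value per column
row_sum = M.sum(axis=1)                      # collapses the columns: one value per row
row_sum_2d = M.sum(axis=1, keepdims=True)    # same values, shape (3, 1)

print(col_sum, row_sum, row_sum_2d.shape)
```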

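NumPy provides many more aggregation functions; see the documentation for details.

Computation on arrays: broadcasting

Introducing broadcasting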

import numpy as np
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
c = a + b
d = a + 5
print(c, d)

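Result:
[ 6  8 10 12] [6 7 8 9]

Broadcasting allows these kinds of binary operations to be performed on arrays of different sizes; for example, we can just as easily add a scalar to an array, as with a + 5 above.

Adding a one-dimensional array to a two-dimensional array:

M = np.ones((4, 4))
M

Result: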

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

M + a

Result:

array([[2., 3., 4., 5.],
       [2., 3., 4., 5.],
       [2., 3., 4., 5.],
       [2., 3., 4., 5.]])

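While these examples are relatively easy to understand, more complicated cases can involve broadcasting of both arrays. Consider the following example:

a = np.arange(3)
b = np.arange(3)[:, np.newaxis]
print(a)
print(b)

Result: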
[0 1 2]
[[0]
 [1]
 [2]]

a + b

Result:

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

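Rules of broadcasting

When operating on two arrays, NumPy compares their shapes dimension by dimension. It starts with the trailing (rightmost) dimension and works its way left. Two dimensions are compatible when
1. they are equal, or
2. one of them is 1.
If these conditions are not met, a ValueError: operands could not be broadcast together exception is raised, indicating that the arrays have incompatible shapes.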

Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

# Example 1:
a = np.arange(3).reshape((3,1))
b = np.arange(3)
print(a)
print(b)  # b has shape (3,); it is treated as (1, 3) and stretched to (3, 3) without copying data in memory

Result:
[[0]
 [1]
 [2]]
[0 1 2]

a + b
# a (shape (3, 1)) is stretched across the columns, and b is stretched across the rows:
# b (broadcast) =
# [[0, 1, 2],
#  [0, 1, 2],
#  [0, 1, 2]]

Result:

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])

# Example 2:
a = np.arange(4)
b = np.arange(12).reshape(4,3)
print(a)
print(b)

Result:
[0 1 2 3]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]

a + b  # error: a's shape (4,) is padded to (1, 4), and its trailing dimension 4 does not match b's trailing dimension 3

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[355], line 1
----> 1 a + b

ValueError: operands could not be broadcast together with shapes (4,) (4,3)

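Shapes are compared dimension by dimension from the right: a is treated as shape (1, 4), and its 4 matches neither the 3 in b's shape (4, 3) nor 1, so the two arrays cannot be broadcast.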
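The rules can be checked without building any data by using `np.broadcast_shapes` (available in NumPy 1.20 and later). The sketch below also shows one way to make Example 2 work, assuming the intent was to add `a` to each column of `b`; that intent is my assumption, not stated in the notes.

```python
import numpy as np

# Example 1: (3, 1) and (3,) broadcast to (3, 3).
shape_ok = np.broadcast_shapes((3, 1), (3,))
print(shape_ok)

# Example 2: (4,) and (4, 3) are incompatible.
a = np.arange(4)
b = np.arange(12).reshape(4, 3)
try:
    np.broadcast_shapes(a.shape, b.shape)
    compatible = True
except ValueError:
    compatible = False
print(compatible)

# Turning a into a column of shape (4, 1) makes the shapes compatible.
fixed = a[:, np.newaxis] + b
print(fixed.shape)
```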

Comparisons and Boolean logic

Comparison operators are implemented as ufuncs, for example:

x = np.array([1,2,3,4,5])

x < 3

Result:

array([ True,  True, False, False, False])

x <= 3

Result:

array([ True,  True,  True, False, False])

x != 3

Result:

array([ True,  True, False,  True,  True])

rng = np.random.default_rng()

a = rng.integers(0, 10, (3, 4))
a > 5

Result:

array([[ True, False, False, False],
       [ True, False, False,  True],
       [False, False, False, False]])

The supported comparison operators are ==, !=, <, <=, >, and >=.

Working with Boolean arrays

x = rng.integers(0, 10, (3, 4))
x

Result:

array([[9, 9, 6, 5],
       [5, 8, 3, 8],
       [0, 8, 9, 0]], dtype=int64)

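Counting entries that satisfy a condition

np.count_nonzero(x < 6)

Result:

5

np.sum gives the same count, since True is treated as 1 and False as 0. Like other NumPy aggregation functions, the summation can also be done along rows or columns: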

a = np.sum(x < 6)
b = np.sum(x < 3, axis = 1)
print(a)
print(b)

5
[0 0 2]

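To check quickly whether any or all of the values are True, use np.any or np.all:

a = np.any(x > 8)   # True if at least one element satisfies the condition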
b = np.all(x > 8)
c = np.any(x > 8, axis = 1)
print(a)
print(b)
print(c)

True
False
[ True False  True]
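Boolean arrays can also be used as masks to select the matching values themselves, not just count them. A minimal sketch on the same array, rebuilt by hand because the original was generated without a seed:

```python
import numpy as np

x = np.array([[9, 9, 6, 5],
              [5, 8, 3, 8],
              [0, 8, 9, 0]])

mask = x < 6        # Boolean array with the same shape as x
small = x[mask]     # one-dimensional array of the selected values, in row-major order

print(mask.sum(), small)
```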

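Boolean operators

The bitwise operators &, |, ^, and ~ have the same meaning as in C/C++; NumPy applies them element by element.

Note: and and or evaluate the truth of an entire object as a single Boolean value, whereas & and | perform multiple Boolean evaluations on the contents (the individual bits or bytes) of an object. For Boolean NumPy arrays, the latter is nearly always the desired operation.

x = np.arange(10)
(x > 4) & (x < 3)

Result: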

array([False, False, False, False, False, False, False, False, False,
       False])

(x > 4) and (x < 3)   # error

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Cell In[373], line 1
----> 1 (x > 4) and (x < 3)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
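The condition used above can never be true for any element, so here is a sketch with a satisfiable range. It also shows that `~` negates a mask; the parentheses are required because `&` and `|` bind more tightly than the comparisons.

```python
import numpy as np

x = np.arange(10)

inside = (x > 2) & (x < 7)      # element-wise AND
outside = (x <= 2) | (x >= 7)   # element-wise OR

print(x[inside])
print(np.array_equal(outside, ~inside))   # ~ is element-wise NOT
```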

Sorting

Sorting with built-in functions

x = np.array([2, 5, 7, 4, 1, 3, 9, 6])
x

Result:

array([2, 5, 7, 4, 1, 3, 9, 6])

sorted(x)

Result:

[1, 2, 3, 4, 5, 6, 7, 9]

np.sort(x)

Result:

array([1, 2, 3, 4, 5, 6, 7, 9])

Partial sorts: partitioning

x = np.array([7, 2, 3, 1, 6, 5, 4])
np.partition(x, 3)

Result:

array([2, 1, 3, 4, 6, 5, 7])
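The notes give no explanation of what the partition guarantees, so here is a brief sketch. `np.partition(x, k)` places at index `k` the value that would be there in a fully sorted array, with smaller-or-equal values to its left and larger-or-equal values to its right, in no particular order within each side.

```python
import numpy as np

x = np.array([7, 2, 3, 1, 6, 5, 4])
k = 3
part = np.partition(x, k)

# The element at index k is in its final sorted position.
kth_value = part[k]
# The k smallest values are on the left, in arbitrary order.
smallest = set(part[:k].tolist())

print(kth_value, smallest)
```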

Sorting along rows or columns

X = rng.integers(0, 10, (4, 6))
X

Result:

array([[2, 3, 0, 0, 6, 9],
       [4, 3, 5, 5, 0, 8],
       [3, 5, 2, 2, 1, 8],
       [8, 5, 3, 0, 0, 8]], dtype=int64)

a = np.sort(X, axis=0)  # sort each column independently
b = np.sort(X, axis=1)  # sort each row independently
print(a)
print(b)

[[2 3 0 0 0 8]
 [3 3 2 0 0 8]
 [4 5 3 2 1 8]
 [8 5 5 5 6 9]]
[[0 0 2 3 6 9]
 [0 3 4 5 5 8]
 [1 2 2 3 5 8]
 [0 0 3 5 8 8]]

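Structured arrays

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 86.5, 68.0, 61.5]

Nothing here shows that these three separate sequences belong together. NumPy's structured arrays, which use a user-defined compound data type, can store them as a single unit: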

data = np.zeros(4, dtype={'names':('name','age','weight'),
                          'formats':('U10', 'i4', 'f8')})
data['name'] = name
data['age'] = age
data['weight'] = weight

data[0][0]

Result:

'Alice'
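Once the data is in a structured array, fields can be accessed by name and combined with Boolean masks. A self-contained sketch that repeats the setup above; the filter on age is an illustrative choice.

```python
import numpy as np

data = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                          'formats': ('U10', 'i4', 'f8')})
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 86.5, 68.0, 61.5]

# Names of everyone younger than 30, selected with a Boolean mask on the age field.
under_30 = data[data['age'] < 30]['name']
mean_weight = data['weight'].mean()

print(under_30, mean_weight)
```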


Matrix operations

import numpy as np

Matrix multiplication with the dot function

A = np.array([[1, 2, 3], [4, 5, 6]])
B = A.T

np.dot(A,B)

Result:

array([[14, 32],
       [32, 77]])
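For two-dimensional arrays, the `@` operator computes the same matrix product as `np.dot`, while `*` multiplies element by element. A short sketch of the difference:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

matmul = A @ A.T        # matrix product, shape (2, 2)
elementwise = A * A     # element-wise product, shape (2, 3)

print(matmul)
print(elementwise)
```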

Matrix transpose

A.T

Result:

array([[1, 4],
       [2, 5],
       [3, 6]])

Trace

E = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

np.trace(E)

Result:

15

Determinant

E = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
F = np.array([[1, 2], [1, 3]])

np.linalg.det(E)
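The matrix E happens to be singular (its rows are linearly dependent), so its determinant is zero. Because `np.linalg.det` works in floating point, the returned value may be a tiny nonzero number, and a tolerance-based check is safer than comparing with `== 0`. A sketch:

```python
import numpy as np

E = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

det_E = np.linalg.det(E)
is_singular = np.isclose(det_E, 0.0)    # compare with a tolerance
rank_E = np.linalg.matrix_rank(E)       # rank below 3 confirms singularity

print(is_singular, rank_E)
```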

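Inverse and adjugate matrices

A = np.array([[1, -2, 1], [0, 2, -1], [1, 1, -2]])

A_abs = np.linalg.det(A)  # determinant of A
B = np.linalg.inv(A)      # inverse of A

# Adjugate via the formula adj(A) = det(A) * inv(A), valid when A is invertible
A_bansui = B * A_abs
A_bansui

Result: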

array([[-3., -3., -0.],
       [-1., -3.,  1.],
       [-2., -3.,  2.]])
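The adjugate can be sanity-checked with the identity A · adj(A) = det(A) · I. The sketch below repeats the computation and uses `np.allclose` because the results are floating point.

```python
import numpy as np

A = np.array([[1, -2, 1],
              [0, 2, -1],
              [1, 1, -2]], dtype=float)

det_A = np.linalg.det(A)
adj_A = det_A * np.linalg.inv(A)

# A @ adj(A) should equal det(A) times the identity matrix.
identity_holds = np.allclose(A @ adj_A, det_A * np.eye(3))

print(round(det_A, 6), identity_holds)
```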

Solving a system of linear equations

1. Write the coefficient matrix

a = np.array([[1, 2, 1], [2, -1, 3], [3, 1, 2]])

2. Write the constant terms

b = np.array([7, 7, 18])

3. Solve

np.linalg.solve(a,b)

Result:

array([ 7.,  1., -2.])
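A solution can be verified by substituting it back into the system, that is, by checking that a @ x reproduces b. A minimal sketch with the same coefficients:

```python
import numpy as np

a = np.array([[1, 2, 1],
              [2, -1, 3],
              [3, 1, 2]])
b = np.array([7, 7, 18])

x = np.linalg.solve(a, b)

# Substitute the solution back; allclose tolerates floating-point rounding.
residual_ok = np.allclose(a @ x, b)

print(x, residual_ok)
```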