Python Basics + Introduction to Data Science (10): the numpy library


小明同学的杂货铺



Article link: https://blog.csdn.net/qq_38425288/article/details/113848686


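Note: this post is based on the video course by 深度之眼. If it infringes any rights, please contact me and I will delete it. Apologies for any mistakes in this summary; corrections are welcome.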

1 Why use numpy

1.1 Slow Python for loops

Example: computing the reciprocals of 9,999 numbers (1 to 9999)

def compute_reciprocals(values):
    res = []
    for value in values:
        res.append(1/value)
    return res

values = list(range(1, 10000))
%timeit compute_reciprocals(values)          # %timeit: IPython magic that times a statement (runs it many times and reports the mean)
#1.86 ms ± 91.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

import numpy as np
values = np.arange(1, 10000)
%timeit 1/values
#Same computation, far faster than the for loop: 28.3 µs ± 165 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
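%timeit is an IPython/Jupyter magic and does not work in a plain .py script. A minimal sketch of the same comparison using the standard-library timeit module follows; the run count of 100 and the variable names are our own choices. The printed timings will vary by machine, so the only hard check is that both versions compute the same values.

```python
import timeit

import numpy as np


def compute_reciprocals(values):
    # Pure-Python loop: one division and one append per element
    res = []
    for value in values:
        res.append(1 / value)
    return res


values_list = list(range(1, 10000))
values_arr = np.arange(1, 10000)

# Time each version 100 times and report the average per call
runs = 100
loop_time = timeit.timeit(lambda: compute_reciprocals(values_list), number=runs) / runs
vec_time = timeit.timeit(lambda: 1 / values_arr, number=runs) / runs
print(f"for loop:   {loop_time * 1e6:.1f} µs per call")
print(f"vectorized: {vec_time * 1e6:.1f} µs per call")

# Both approaches produce the same numbers
print(np.allclose(compute_reciprocals(values_list), 1 / values_arr))   # True
```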

1.2 Why numpy is so efficient

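The core of numpy is written in C, which gives it three advantages over pure Python:

Compiled vs. interpreted language
C code is compiled as a whole before it runs, so it executes much faster than interpreted Python.
Contiguous, single-type storage vs. scattered, mixed-type storage
All elements of a numpy array must share one data type (for example, all floats), whereas a Python list can hold data of any type.
The elements of a numpy array are stored contiguously in memory, whereas the elements of a Python list are scattered across memory.
This layout suits efficient low-level processing, such as CPU caching and vectorized (SIMD) instructions.
Multithreading vs. the thread lock
Python runs under a global interpreter lock (GIL), so its threads cannot execute Python code truly in parallel; compiled C code can release the lock and run in parallel.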
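The storage difference can be inspected directly. In this small sketch the sample data is ours; dtype, itemsize, strides and flags are standard ndarray attributes. Every element of a float64 array takes exactly 8 bytes, laid out back to back, while a Python list holds references to separate objects that may have different types.

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0, 4.0])   # one dtype for every element
print(arr.dtype)                        # float64
print(arr.itemsize)                     # 8: bytes per element
print(arr.strides)                      # (8,): step 8 bytes to reach the next element
print(arr.flags["C_CONTIGUOUS"])        # True: one contiguous block of memory

# A Python list can mix types; each item is a separate Python object
mixed = [1, 2.0, "three", None]
print([type(item).__name__ for item in mixed])   # ['int', 'float', 'str', 'NoneType']
```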

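1.3 When to use numpy

Whenever you find yourself using a Python for loop to implement vector or matrix operations during data processing, reach for numpy first; typical examples are the dot product of two vectors and matrix multiplication.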
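A minimal sketch of the two examples just mentioned, with small hand-picked inputs of our own: each operation is written once with Python loops and once with numpy (np.dot for the dot product, the @ operator for matrix multiplication).

```python
import numpy as np

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Dot product with a Python for loop
dot_loop = 0.0
for x, y in zip(a, b):
    dot_loop += x * y

# The same dot product, vectorized
dot_np = np.dot(np.array(a), np.array(b))

# 2x2 matrix multiplication: nested loops vs the @ operator
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_loop = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
C_np = np.array(A) @ np.array(B)

print(dot_loop, dot_np)   # 32.0 32.0
print(C_loop)             # [[19, 22], [43, 50]]
```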

2 Creating numpy arrays

2.1 Creating arrays from lists

import numpy as np
x = np.array([1,2,3,4,5])
print(x)
print(type(x))
print(type(x[0]))
print(x.shape)
'''
[1 2 3 4 5]
<class 'numpy.ndarray'>
<class 'numpy.int32'>     (platform-dependent: often numpy.int64 on Linux/macOS)
(5,)
'''

Setting the data type of an array

x = np.array([1,2,3,4,5], dtype="float32")
print(x)              #[1. 2. 3. 4. 5.]
print(type(x[0]))     #<class 'numpy.float32'>
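Since every element shares one dtype, numpy chooses a common type when a list mixes integers and floats, and astype converts an existing array to another dtype, returning a new array. A brief sketch; the sample values and the target dtype int64 are our own choices:

```python
import numpy as np

ints_and_floats = np.array([1, 2.5, 3])
print(ints_and_floats.dtype)   # float64: the integers are upcast to float

# astype returns a new array; the fractional part is truncated toward zero
as_int = ints_and_floats.astype("int64")
print(as_int)                  # [1 2 3]
print(ints_and_floats.dtype)   # float64: the original array is unchanged
```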

Two-dimensional arrays

x = np.array([[1,2,3],
             [4,5,6],
             [7,8,9]])
print(x)                 
print(type(x))
'''
[[1 2 3]
 [4 5 6]
 [7 8 9]]
<class 'numpy.ndarray'>
'''
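Each inner list becomes one row. As an alternative not taken from the course, the same 3×3 array can be built by reshaping a 1-D sequence with the standard reshape method, which fills the new shape row by row:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(x.shape)                 # (3, 3): 3 rows, 3 columns

# The same matrix built from a 1-D range and reshaped row by row
y = np.arange(1, 10).reshape(3, 3)
print(np.array_equal(x, y))    # True
```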

2.2 Creating arrays from scratch

Create an array of length 5 with all values 0

np.zeros(5, dtype=int)  #array([0, 0, 0, 0, 0])

Create a 2×4 float array with all values 1

np.ones((2, 4), dtype=float)
'''
array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])
'''

Create a 3×5 array with all values 8.8

np.full((3, 5), 8.8)
'''
array([[8.8, 8.8, 8.8, 8.8, 8.8],
       [8.8, 8.8, 8.8, 8.8, 8.8],
       [8.8, 8.8, 8.8, 8.8, 8.8]])
'''

Create a 3×3 identity matrix

np.eye(3)
'''
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
'''

Create a linear sequence starting at 1 and stopping before 15 (the stop value is excluded), with step 2

np.arange(1, 15, 2)  #array([ 1,  3,  5,  7,  9, 11, 13])
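Like Python's built-in range, arange never includes its stop value. A short sketch (the extra calls are ours) showing how to include 15 and what the defaults are:

```python
import numpy as np

print(np.arange(1, 15, 2))   # [ 1  3  5  7  9 11 13]: 15 itself is excluded
print(np.arange(1, 16, 2))   # raising the stop to 16 includes 15
print(np.arange(5))          # [0 1 2 3 4]: start defaults to 0, step to 1
```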

Create an array of 4 numbers evenly spaced between 0 and 1 (both endpoints included)

np.linspace(0, 1, 4)  #array([0.        , 0.33333333, 0.66666667, 1.        ])
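Because linspace includes both endpoints by default, n points split the interval into n - 1 gaps, so the step is (stop - start) / (n - 1); with endpoint=False the stop value is left out and the step becomes (stop - start) / n. A quick check of both cases:

```python
import numpy as np

pts = np.linspace(0, 1, 4)
step = (1 - 0) / (4 - 1)                      # 1/3 between neighbours
print(np.allclose(np.diff(pts), step))        # True

# endpoint=False excludes 1, so the step becomes 1/4
half_open = np.linspace(0, 1, 4, endpoint=False)
print(half_open)                              # values 0, 0.25, 0.5, 0.75
```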

Create an array of 5 numbers forming a geometric sequence from 10^0 = 1 to 10^9

np.logspace(0, 9, 5)  #array([1.00000000e+00, 1.77827941e+02, 3.16227766e+04, 5.62341325e+06, 1.00000000e+09])
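The arguments of logspace are exponents: it spaces the exponents linearly and raises the default base 10 to them, so neighbouring elements share a constant ratio. The sketch below checks this and also shows np.geomspace, which takes the endpoint values themselves instead of their exponents:

```python
import numpy as np

a = np.logspace(0, 9, 5)
b = 10 ** np.linspace(0, 9, 5)    # exponents 0, 2.25, 4.5, 6.75, 9
print(np.allclose(a, b))          # True

# Consecutive elements share a constant ratio of 10**2.25 (about 177.8)
print(np.allclose(a[1:] / a[:-1], 10 ** 2.25))   # True

# geomspace takes the endpoints directly rather than their exponents
print(np.allclose(np.geomspace(1, 1e9, 5), a))   # True
```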

Create a 3×3 array of random numbers drawn uniformly from [0, 1)

np.random.random((3, 3))
'''
array([[0.15953838, 0.30331087, 0.05056098],
       [0.70745671, 0.69659685, 0.43136975],
       [0.12896815, 0.22934018, 0.86723514]])
'''

Create a 3×3 array of normally distributed random numbers with mean 0 and standard deviation 1

np.random.normal(0,1,(3,3))
'''
array([[ 0.49620453, -0.25830437,  0.1877723 ],
       [ 1.08625394, -0.38841006, -0.96904936],
       [ 0.42627348, -0.72023391, -0.12896755]])
'''

Create a 3×3 array of random integers in [0, 10)

np.random.randint(0,10,(3,3))
'''
array([[8, 7, 1],
       [5, 3, 8],
       [4, 7, 5]])
'''
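The outputs above are random and differ on every run. For reproducible results you can fix a seed. Newer numpy versions (1.17 and later) recommend the Generator API created by np.random.default_rng; the sketch below shows Generator equivalents of the three calls above, using an arbitrary seed of 42 that we chose for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)       # seeded generator: same numbers on every run

u = rng.random((3, 3))                # uniform in [0, 1), like np.random.random
n = rng.normal(0, 1, (3, 3))          # mean 0, std 1, like np.random.normal
k = rng.integers(0, 10, (3, 3))       # integers in [0, 10), like np.random.randint

# Re-creating the generator with the same seed reproduces the sequence
rng2 = np.random.default_rng(42)
print(np.array_equal(u, rng2.random((3, 3))))   # True
```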

Random permutation

x = np.array([10,20,30,40])
np.random.permutation(x)    #array([20, 40, 30, 10])  returns a new shuffled array; x is unchanged

np.random.shuffle(x)
print(x)             #[30 10 40 20]   shuffles x itself in place
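The difference matters in practice: permutation returns a copy, whereas shuffle modifies its argument and returns None, so writing y = np.random.shuffle(x) is a common mistake. A small sketch (the final call with an integer is an extra illustration of ours):

```python
import numpy as np

x = np.array([10, 20, 30, 40])
p = np.random.permutation(x)   # shuffled copy
print(x)                        # [10 20 30 40]: original untouched

result = np.random.shuffle(x)  # shuffles x in place
print(result)                   # None
print(sorted(x.tolist()) == [10, 20, 30, 40])   # True: same elements, new order

# Given an integer n, permutation returns a random ordering of 0..n-1
print(np.random.permutation(5))
```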

Random sampling

# Sample into a given shape
x = np.arange(10, 25, dtype = float)
x     #array([10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24.])

np.random.choice(x, size=(4, 3))   # draw elements of x at random (with replacement by default)
'''
array([[12., 22., 17.],
       [21., 16., 15.],
       [10., 24., 15.],
       [21., 13., 14.]])
'''

# Sample with given probabilities (here proportional to each element's value)
np.random.choice(x, size=(4, 3), p=x/np.sum(x))
'''
array([[18., 12., 17.],
       [11., 16., 15.],
       [14., 15., 20.],
       [24., 19., 19.]])
'''
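choice samples with replacement by default, which is why values such as 15. and 21. repeat in the first output. Passing replace=False draws distinct elements instead. The probabilities in p must be non-negative and sum to 1, which is why the example divides x by its sum. A sketch with a sample size of 5 chosen by us:

```python
import numpy as np

x = np.arange(10, 25, dtype=float)

# Without replacement: 5 distinct elements
picked = np.random.choice(x, size=5, replace=False)
print(len(set(picked.tolist())))   # 5

# Probabilities proportional to each value; they must sum to 1
p = x / np.sum(x)
print(np.isclose(p.sum(), 1.0))    # True
weighted = np.random.choice(x, size=(4, 3), p=p)
print(weighted.shape)              # (4, 3)
```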

3 Properties of numpy arrays

3.1 Array attributes

x = np.random.randint(10, size=(3,4))
x
'''
array([[4, 0, 4, 8],
       [7, 1, 9, 0],
       [3, 7, 3, 8]])
'''

Shape of the array: shape

x.shape   #(3, 4)

Number of dimensions: ndim

x.ndim   #2
y = np.arange(10)
print(y)     #[0 1 2 3 4 5 6 7 8 9]  
y.ndim       #1

Size of the array (total number of elements): size

x.size  #12

Data type of the elements: dtype

x.dtype  #dtype('int32')   platform-dependent: often dtype('int64') on Linux/macOS
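These attributes are tied together: ndim is the length of shape, size is the product of the entries of shape, and the total memory used by the data (nbytes, a standard attribute not covered above) is size times the per-element itemsize. A quick check on a random array of the same shape as above:

```python
import numpy as np

x = np.random.randint(10, size=(3, 4))

print(x.ndim == len(x.shape))              # True: ndim is the length of shape
print(x.size == np.prod(x.shape))          # True: size is the product of shape
print(x.nbytes == x.size * x.itemsize)     # True: total bytes of the data
print(x.dtype)                             # platform-dependent: int32 or int64
```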

3.2 Array indexing

Indexing a one-dimensional array