NumPy Basics

程默子弹

Last modified: 2022-09-18 20:20:19

Category: Python Data Analysis. Tags: numpy, python

First published: 2021-03-13 11:13:13

Article link: https://blog.csdn.net/li_19186535821/article/details/114730927

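I. Creating an ndarray

1. Creating from a Python list with np.array()

The argument is a list, e.g. [1, 4, 2, 5, 3]

Notes:

All elements of a NumPy ndarray share a single type.
If the list passed in mixes types, they are all converted to one common type, with priority str > float > int.

import numpy as np

l = [1, 2, 3, 4.2]
# All elements of an ndarray have the same type, so the ints are promoted to float.
n = np.array(l)
n
# out: array([1. , 2. , 3. , 4.2])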
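
The promotion priority described above can be checked directly. This is a minimal sketch; the variable names are illustrative, and the exact integer width (int32 or int64) depends on the platform, so only the dtype kind is inspected for the integer and string cases.

```python
import numpy as np

ints = np.array([1, 2, 3])               # only ints: stays an integer array
mixed_float = np.array([1, 2, 3, 4.2])   # int + float: everything becomes float
mixed_str = np.array([1, 2.5, "3"])      # a str is present: everything becomes str

print(ints.dtype.kind)         # 'i' (signed integer)
print(mixed_float.dtype)       # float64
print(mixed_str.dtype.kind)    # 'U' (Unicode string)
print(mixed_str)               # ['1' '2.5' '3']
```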

2. Creating with NumPy's array creation routines

"Routines" here means the standard built-in creation functions.

np.ones(shape, dtype=None, order='C')

# All ones
np.ones(shape=(8,8), dtype=int)
'''
out:array([[1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1]])
'''

np.zeros(shape, dtype=float, order='C')

np.zeros(shape=(5,5),dtype=int)

np.full(shape, fill_value, dtype=None, order='C')

# Create an ndarray filled with the given value
np.full(fill_value=8, shape=(8,8,8))

np.eye(N, M=None, k=0, dtype=float): ones on the diagonal, zeros everywhere else

np.eye(5,8)
'''
out:array([[1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0.]])
'''
# Square matrix: ones on the main diagonal, zeros everywhere else
np.eye(8)

np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)

np.linspace(0, 100, endpoint=False, retstep=True, num=50)
'''
out:(array([ 0.,  2.,  4.,  6.,  8., 10., 12., 14., 16., 18., 20., 22., 24.,
        26., 28., 30., 32., 34., 36., 38., 40., 42., 44., 46., 48., 50.,
        52., 54., 56., 58., 60., 62., 64., 66., 68., 70., 72., 74., 76.,
        78., 80., 82., 84., 86., 88., 90., 92., 94., 96., 98.]),
 2.0)
'''
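
To see what endpoint and retstep change, here is a small sketch with only five points (the values 0, 100 and 5 are chosen for illustration). With endpoint=True the stop value is included, so the step is (stop - start) / (num - 1); with endpoint=False it is (stop - start) / num.

```python
import numpy as np

closed, step_closed = np.linspace(0, 100, num=5, endpoint=True, retstep=True)
half_open, step_open = np.linspace(0, 100, num=5, endpoint=False, retstep=True)

print(closed, step_closed)     # [  0.  25.  50.  75. 100.] 25.0
print(half_open, step_open)    # [ 0. 20. 40. 60. 80.] 20.0
```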

np.arange([start, ]stop, [step, ]dtype=None)

# Works like Python's built-in range, but returns an ndarray
range(0, 100, 2)
np.arange(0, 100, 2)
'''
out:array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,
       34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
       68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98])
'''
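
A short sketch comparing np.arange with range. The one difference worth knowing is that np.arange also accepts a non-integer step; the step 0.25 below is an illustrative value that is exactly representable in floating point.

```python
import numpy as np

from_range = list(range(0, 10, 2))
from_arange = np.arange(0, 10, 2)
print(from_arange)                   # [0 2 4 6 8]

# Unlike range, arange accepts float steps; the stop value is still excluded.
quarters = np.arange(0, 1, 0.25)
print(quarters)                      # [0.   0.25 0.5  0.75]
```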

np.random.randint(low, high=None, size=None, dtype=int)

# Random integers in [low, high): low is included, high is excluded
np.random.randint(0, 150, size=(3, 4, 2))
'''
out: array([[[128,  77],
        [ 11,  42],
        [110, 113],
        [ 26,  40]],

       [[ 71, 128],
        [ 77, 131],
        [ 71,  72],
        [ 78,  14]],

       [[ 11,  85],
        [ 82,  13],
        [ 40,  36],
        [130,  57]]])
'''

np.random.randn(d0, d1, ..., dn): standard normal distribution

# A normal distribution with mean 0 and variance 1 is called the standard normal distribution
np.random.randn(100, 2000, 10).var()  # var: variance
# out: 1.0070201898833269

np.random.normal(loc=0.0, scale=1.0, size=None)

n = np.random.normal(loc=8, scale=6, size=(10000, 10))
n.mean()
# out: 8.02618904627719
n.std()
# out: 6.003223508678745
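
The relationship between loc, scale and the standard normal distribution can be sketched as follows. This example uses np.random.default_rng with a fixed seed, which is an assumption on my part (the article uses the older np.random functions); the seed only makes the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Standard normal samples: mean close to 0, standard deviation close to 1.
z = rng.standard_normal(size=(10000, 10))

# Shifting by loc=8 and scaling by scale=6 gives samples from N(8, 6^2).
x = 8 + 6 * z

print(x.mean())   # close to 8
print(x.std())    # close to 6
```

The sample mean and standard deviation only approximate loc and scale; the gap shrinks as the sample size grows.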

np.random.random(size=None): random floats in [0, 1), i.e. 0 is included and 1 is excluded

np.random.random(size=(3,4))
'''
out: array([[0.77282949, 0.77578251, 0.14921542, 0.65891146],
       [0.24924182, 0.73022886, 0.64788291, 0.47571044],
       [0.59239831, 0.31601596, 0.58448978, 0.42369911]])
'''

np.random.rand(d0, d1, d2, ..., dn): random floats between 0 and 1, also in [0, 1)

np.random.rand(3, 4)
'''
out: array([[0.55701668, 0.9831173 , 0.60006435, 0.67471355],
       [0.70913735, 0.09874854, 0.34569506, 0.87802436],
       [0.9548947 , 0.70426577, 0.57740935, 0.97104347]])
'''

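II. ndarray attributes

Four attributes to remember: ndim (number of dimensions), shape (the length of each dimension), size (total number of elements), dtype (element type)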

n = np.random.randint(0, 100, size=(4,5))
n
'''
out: array([[12, 95,  9, 43, 55],
       [10, 91, 72, 96, 53],
       [75, 57, 99, 25, 74],
       [28, 47,  2, 19, 99]])
'''
# ndim: number of dimensions
n.ndim
# out: 2
n.shape
# out: (4, 5)
n.size
# out: 20
n.dtype
# out: dtype('int32')  (platform-dependent; commonly int64 on Linux and macOS)
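
The four attributes are tied together: ndim is the length of shape, and size is the product of the entries of shape. A minimal sketch with an illustrative 3-D array:

```python
import numpy as np

cube = np.zeros((2, 3, 4), dtype=np.float64)

print(cube.ndim)    # 3
print(cube.shape)   # (2, 3, 4)
print(cube.size)    # 24
print(cube.dtype)   # float64
```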

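III. Basic ndarray operations

1. Indexing

For 1-D arrays, indexing works exactly as it does for lists; multi-dimensional arrays follow the same idea.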

n = np.random.randint(0, 100, size=(4,5))
n
'''
out: array([[12, 95,  9, 43, 55],
       [10, 91, 72, 96, 53],
       [75, 57, 99, 25, 74],
       [28, 47,  2, 19, 99]])
'''
n[1][2]  # works, but not recommended
n[1, 2]
# out: 72

# Modify data through an index
n[1, 2] = 88
n
'''
out: array([[12, 95,  9, 43, 55],
       [10, 91, 88, 96, 53],
       [75, 57, 99, 25, 74],
       [28, 47,  2, 19, 99]])
'''
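
Why n[1, 2] is preferred over n[1][2] can be sketched like this: the chained form first builds an intermediate row and then indexes into it, while the tuple form looks the element up in one step. The array below is illustrative.

```python
import numpy as np

n = np.arange(20).reshape(4, 5)

row = n[1]            # first step of n[1][2]: an intermediate 1-D array
chained = row[2]      # second step
direct = n[1, 2]      # a single indexing operation

print(row)                # [5 6 7 8 9]
print(chained, direct)    # 7 7
print(n[-1, -1])          # 19 (negative indices count from the end, as with lists)
```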

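2. Slicing

For 1-D arrays, slicing works exactly as it does for lists; multi-dimensional arrays follow the same idea.

# Create an ndarray from a list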
l = [1, 2, 3, 4, 5, 6]
n = np.array(l)
n # out: array([1, 2, 3, 4, 5, 6])
n[2:4]  # out: array([3, 4])

Reversing data, e.g. [1, 2, 3] -> [3, 2, 1]

# 1-D
l[::-1]  # out: [6, 5, 4, 3, 2, 1]
n[::-1]  # out: array([6, 5, 4, 3, 2, 1])
# 2-D
n = np.random.randint(1, 10, size=(4,5))
n
'''
out: array([[6, 4, 6, 9, 7],
       [5, 9, 9, 6, 3],
       [7, 6, 8, 2, 3],
       [1, 6, 2, 6, 2]])
'''
n[:, ::-1]
'''
out: array([[7, 9, 6, 4, 6],
       [3, 6, 9, 9, 5],
       [3, 2, 8, 6, 7],
       [2, 6, 2, 6, 1]])
'''
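
In two dimensions each axis takes its own slice, separated by a comma. The sketch below, on an illustrative 3x4 array, reverses the rows, the columns, and both at once, and also cuts out a sub-block.

```python
import numpy as np

n = np.arange(12).reshape(3, 4)

rows_reversed = n[::-1]          # reverse the row order
cols_reversed = n[:, ::-1]       # reverse the column order
both_reversed = n[::-1, ::-1]    # reverse both axes
block = n[0:2, 1:3]              # rows 0-1, columns 1-2

print(rows_reversed[0])   # [ 8  9 10 11]
print(cols_reversed[0])   # [3 2 1 0]
print(both_reversed[0])   # [11 10  9  8]
print(block)              # [[1 2]
                          #  [5 6]]
```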

3. Reshaping

Use reshape; the new shape can be passed as a tuple or as separate integers.

# When reshaping, the total size must stay the same
n.shape  # out: (4, 5)
n
'''
out: array([[6, 4, 6, 9, 7],
       [5, 9, 9, 6, 3],
       [7, 6, 8, 2, 3],
       [1, 6, 2, 6, 2]])
'''
n.reshape(5, 4)
'''
out: array([[6, 4, 6, 9],
       [7, 5, 9, 9],
       [6, 3, 7, 6],
       [8, 2, 3, 1],
       [6, 2, 6, 2]])
'''
n.reshape(2, 10)
'''
out: array([[6, 4, 6, 9, 7, 5, 9, 9, 6, 3],
       [7, 6, 8, 2, 3, 1, 6, 2, 6, 2]])
'''
n.reshape(-1, 1)  # same as n.reshape(20, 1)
n.reshape(1, -1)  # same as n.reshape(1, 20)
# out: array([[6, 4, 6, 9, 7, 5, 9, 9, 6, 3, 7, 6, 8, 2, 3, 1, 6, 2, 6, 2]])
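
A small sketch of the two rules above: -1 lets NumPy infer one dimension from the total size, and a target shape whose size differs from the original raises a ValueError. The flag name reshape_failed is illustrative.

```python
import numpy as np

n = np.arange(20)

column = n.reshape(-1, 1)   # -1 is inferred as 20
grid = n.reshape(4, -1)     # -1 is inferred as 5
print(column.shape, grid.shape)   # (20, 1) (4, 5)

# 3 * 7 = 21 != 20, so this reshape cannot succeed.
try:
    n.reshape(3, 7)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```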

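4. Concatenation

np.concatenate(): points to note

The arrays to join are passed as one sequence, so they must be wrapped in brackets or parentheses
The arrays must have the same number of dimensions
Their shapes must match on every axis except the one being joined
[Key point] By default, concatenation runs along the axis given by the first value of the shape tuple (axis=0)
The direction can be changed with the axis parameter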

n1 = np.random.randint(0, 100, size=(4, 6))
n2 = np.random.randint(0, 100, size=(4, 6))
display(n1, n2)
'''
out:
array([[99, 51, 70, 37, 32, 38],
       [42, 79, 82, 81, 52, 72],
       [82, 22,  3, 89, 55, 53],
       [11, 18, 97, 38, 30, 94]])
array([[25,  6, 69, 94, 83, 31],
       [65, 19, 53, 74, 44, 76],
       [44,  2, 56, 26,  7, 94],
       [19, 25, 78, 73, 81, 59]])
'''
# axis=0 by default: vertical concatenation, which adds rows
np.concatenate((n1, n2))
'''
out:
array([[99, 51, 70, 37, 32, 38],
       [42, 79, 82, 81, 52, 72],
       [82, 22,  3, 89, 55, 53],
       [11, 18, 97, 38, 30, 94],
       [25,  6, 69, 94, 83, 31],
       [65, 19, 53, 74, 44, 76],
       [44,  2, 56, 26,  7, 94],
       [19, 25, 78, 73, 81, 59]])
'''
# Change the concatenation direction by setting axis
np.concatenate((n1, n2), axis=1)
'''
out:
array([[99, 51, 70, 37, 32, 38, 25,  6, 69, 94, 83, 31],
       [42, 79, 82, 81, 52, 72, 65, 19, 53, 74, 44, 76],
       [82, 22,  3, 89, 55, 53, 44,  2, 56, 26,  7, 94],
       [11, 18, 97, 38, 30, 94, 19, 25, 78, 73, 81, 59]])
'''
# Summary: vertical concatenation requires equal column counts; horizontal concatenation requires equal row counts
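
The shape requirement in the summary can be checked with arrays of deliberately different sizes. This is a sketch with illustrative shapes; the flag concat_failed is a name introduced here.

```python
import numpy as np

a = np.zeros((4, 6))
more_rows = np.ones((2, 6))   # same column count as a
more_cols = np.ones((4, 3))   # same row count as a

print(np.concatenate((a, more_rows)).shape)           # (6, 6)
print(np.concatenate((a, more_cols), axis=1).shape)   # (4, 9)

# Vertical concatenation with mismatched column counts fails.
try:
    np.concatenate((a, more_cols))
    concat_failed = False
except ValueError as err:
    concat_failed = True
    print("concatenate failed:", err)
```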

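np.hstack and np.vstack: horizontal and vertical concatenation; they can also be applied to a single array to change its dimensions

# Horizontal concatenation (same result as np.concatenate((n1, n2), axis=1))
np.hstack((n1, n2))
'''
out:
array([[99, 51, 70, 37, 32, 38, 25,  6, 69, 94, 83, 31],
       [42, 79, 82, 81, 52, 72, 65, 19, 53, 74, 44, 76],
       [82, 22,  3, 89, 55, 53, 44,  2, 56, 26,  7, 94],
       [11, 18, 97, 38, 30, 94, 19, 25, 78, 73, 81, 59]])
'''
# Vertical concatenation (same result as np.concatenate((n1, n2), axis=0))
np.vstack((n1, n2))
'''
out:
array([[99, 51, 70, 37, 32, 38],
       [42, 79, 82, 81, 52, 72],
       [82, 22,  3, 89, 55, 53],
       [11, 18, 97, 38, 30, 94],
       [25,  6, 69, 94, 83, 31],
       [65, 19, 53, 74, 44, 76],
       [44,  2, 56, 26,  7, 94],
       [19, 25, 78, 73, 81, 59]])
'''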

5. Splitting

As with concatenation, three functions handle splitting:

np.split
np.vsplit
np.hsplit

n = np.random.randint(0, 100, size=(6, 6))
n
'''
out:
array([[27, 63, 70, 95, 10,  7],
       [44, 38, 77, 94, 73, 85],
       [19, 82, 86, 43,  4, 53],
       [ 1, 23, 88, 18, 87, 77],
       [ 7, 82, 30, 13, 44, 22],
       [16, 13, 33, 17, 57, 53]])
'''
np.split(n, [2, 4], axis=0)  # same as np.vsplit(n, [2, 4])
'''
out:
[array([[27, 63, 70, 95, 10,  7],
        [44, 38, 77, 94, 73, 85]]),
 array([[19, 82, 86, 43,  4, 53],
        [ 1, 23, 88, 18, 87, 77]]),
 array([[ 7, 82, 30, 13, 44, 22],
        [16, 13, 33, 17, 57, 53]])]
'''
np.split(n, [2, 4], axis=1)  # same as np.hsplit(n, [2, 4])
'''
out:
[array([[27, 63],
        [44, 38],
        [19, 82],
        [ 1, 23],
        [ 7, 82],
        [16, 13]]),
 array([[70, 95],
        [77, 94],
        [86, 43],
        [88, 18],
        [30, 13],
        [33, 17]]),
 array([[10,  7],
        [73, 85],
        [ 4, 53],
        [87, 77],
        [44, 22],
        [57, 53]])]
'''
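
Besides a list of cut positions such as [2, 4], np.split also accepts an integer meaning "this many equal parts". The sketch below, on an illustrative 6x6 array, shows both forms and confirms that concatenating the pieces restores the original.

```python
import numpy as np

n = np.arange(36).reshape(6, 6)

by_index = np.split(n, [2, 4], axis=0)   # cut before rows 2 and 4
by_count = np.split(n, 3, axis=0)        # three equal parts
columns = np.hsplit(n, [2, 4])           # the same cuts along the columns

print([part.shape for part in by_index])   # [(2, 6), (2, 6), (2, 6)]
print([part.shape for part in columns])    # [(6, 2), (6, 2), (6, 2)]

# Splitting and concatenating along the same axis is a round trip.
restored = np.concatenate(by_index, axis=0)
print(np.array_equal(restored, n))         # True
```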

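6. Copies

Assignment never copies the elements of an ndarray: the new name refers to the same data, so changes made through it also show up in the original.

# Use copy() to create an independent copy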
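
The difference between plain assignment, a slice, and copy() can be sketched as follows; the names alias, view and dup are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])

alias = a        # plain assignment: the same object under a second name
view = a[:2]     # a slice: a new object that shares a's data
dup = a.copy()   # an independent copy with its own data

alias[0] = 100   # changes a
view[1] = 200    # changes a as well
dup[2] = 300     # does not touch a

print(a)     # [100 200   3]
print(dup)   # [  1   2 300]
```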

IV. ndarray aggregation

1. Sum: np.sum

2. Maximum and minimum: np.max / np.min

These work the same way as np.sum.

n = np.random.randint(0, 100, size=(6,6))
n
'''
out: array([[88, 63, 70, 95, 10,  7],
       [44, 38, 77, 94, 73, 85],
       [19, 66, 86, 43,  4, 53],
       [ 1, 23, 88, 18, 87, 77],
       [ 7, 82, 30, 13, 44, 22],
       [16, 13, 33, 17, 57, 53]])
'''
# By default, aggregation collapses all dimensions.
n.sum()  # out: 1696
# You can choose the axis to aggregate over: axis=0 collapses the rows and gives the sum of each column
n.sum(axis=0)
# out: array([175, 285, 384, 280, 275, 297])
n.sum(axis=1)
# out: array([333, 411, 271, 294, 198, 189])

n = np.array([1, 2, 3, 4])
np.prod(n)  # out: 24
np.argmin(n)  # index of the smallest element, out: 0
np.argmax(n)  # index of the largest element, out: 3
np.min(n)  # out: 1
np.max(n)  # out: 4
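
The effect of the axis argument is easier to follow on a tiny array. A minimal sketch with illustrative values: axis=0 collapses the rows (one result per column), axis=1 collapses the columns (one result per row).

```python
import numpy as np

n = np.array([[1, 2, 3],
              [4, 5, 6]])

print(n.sum())         # 21
print(n.sum(axis=0))   # [5 7 9]
print(n.sum(axis=1))   # [ 6 15]
print(n.max(axis=0))   # [4 5 6]
print(n.min(axis=1))   # [1 4]
print(n.argmax())      # 5 (an index into the flattened array)
```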

3. Other aggregation functions

Function Name    NaN-safe Version    Description
np.sum           np.nansum           Compute sum of elements
np.prod          np.nanprod          Compute product of elements
np.mean          np.nanmean          Compute mean of elements
np.std           np.nanstd           Compute standard deviation
np.var           np.nanvar           Compute variance
np.min           np.nanmin           Find minimum value
np.max           np.nanmax           Find maximum value
np.argmin        np.nanargmin        Find index of minimum value
np.argmax        np.nanargmax        Find index of maximum value
np.median        np.nanmedian        Compute median of elements
np.percentile    np.nanpercentile    Compute rank-based statistics of elements
np.any           N/A                 Evaluate whether any elements are true
np.all           N/A                 Evaluate whether all elements are true

np.power: exponentiation, applied element-wise

a = np.arange(0, 100)
np.percentile(a, [0, 25, 50, 100])  # out: array([ 0.  , 24.75, 49.5 , 99.  ])
np.percentile(a, 50)  # out: 49.5

n = np.array([1, 2, 3, 4, 5, np.nan])
type(np.nan)  # out: float
# np.nan is a float; any arithmetic involving it yields nan
1 + np.nan  # out: nan
np.nansum(n)  # out: 15.0
np.sum(n)  # out: nan
## Conclusion: np.sum propagates nan, while np.nansum ignores it (nan = not a number)
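
The same pattern holds for the other NaN-safe functions in the table. A short sketch using the array from the example above; the boolean-mask line is an equivalent manual approach added for comparison.

```python
import numpy as np

n = np.array([1, 2, 3, 4, 5, np.nan])

print(np.mean(n))       # nan
print(np.nanmean(n))    # 3.0
print(np.nanmax(n))     # 5.0

# Manual equivalent of nansum: drop the NaN entries with a boolean mask first.
valid = n[~np.isnan(n)]
print(valid.sum())      # 15.0
```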

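V. ndarray matrix operations

1. Basic matrix operations

Arithmetic operators:

Addition, subtraction, multiplication, division

# A matrix combined with a single number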
n = np.random.randint(0, 10, size=(3,5))
n 
'''
out: array([[7, 5, 1, 1, 2],
       [0, 9, 4, 1, 2],
       [9, 6, 8, 7, 1]])
'''
# The operation is applied to every element separately
n + 1
'''
out: array([[ 8,  6,  2,  2,  3],
       [ 1, 10,  5,  2,  3],
       [10,  7,  9,  8,  2]])
'''

# Matrix with matrix: the operation is applied to the elements at matching positions
n1 = np.random.randint(0, 10, size=(3, 5))
n2 = np.random.randint(0, 10, size=(3, 5))
display(n1, n2)
'''
out: 
array([[0, 2, 3, 1, 9],
       [3, 8, 3, 3, 1],
       [6, 5, 4, 0, 1]])
array([[3, 4, 8, 7, 3],
       [3, 3, 3, 0, 0],
       [0, 6, 5, 6, 1]])
'''
n1 * n2
'''
out: array([[ 0,  8, 24,  7, 27],
       [ 9, 24,  9,  0,  0],
       [ 0, 30, 20,  0,  1]])
'''

# Matrices with different shapes
n1 = np.random.randint(0, 10, size=(5,4))
n2 = np.random.randint(0, 10, size=(4,1))
display(n1, n2)
'''
out:
array([[4, 0, 4, 8],
       [8, 4, 5, 9],
       [4, 7, 7, 9],
       [0, 4, 0, 2],
       [4, 0, 2, 9]])
array([[1],
       [6],
       [8],
       [9]])
'''
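# When shapes differ, NumPy first tries to broadcast them and then performs the operation; if broadcasting fails, an error is raised
n1 + n2  # raises ValueError: shapes (5,4) and (4,1) cannot be broadcast together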
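
The failure above comes from the first axis: lengths 5 and 4 differ and neither is 1. A sketch of the error and of one possible fix, transposing the (4, 1) column into a (1, 4) row so it lines up with the columns of the (5, 4) array; whether that is the intended operation depends on the data, so treat it as an illustration only.

```python
import numpy as np

n1 = np.ones((5, 4))
n2 = np.arange(4).reshape(4, 1)

try:
    n1 + n2                 # (5, 4) with (4, 1): 5 vs 4 cannot be matched
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("broadcast failed:", err)

# (5, 4) with (1, 4): the single row is repeated for each of the 5 rows.
fixed = n1 + n2.T
print(fixed.shape)   # (5, 4)
print(fixed[0])      # [1. 2. 3. 4.]
```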

Matrix product: np.dot()

n1 = np.random.randint(0, 10, size=(3, 4))
n2 = np.random.randint(0, 10, size=(4, 5))
display(n1, n2)
'''
out:
array([[3, 4, 2, 0],
       [5, 5, 4, 1],
       [1, 2, 5, 6]])
array([[7, 2, 6, 7, 7],
       [4, 6, 1, 5, 7],
       [8, 3, 9, 8, 3],
       [8, 8, 6, 3, 4]])
'''
np.dot(n1, n2)
'''
out: 
array([[ 53,  36,  40,  57,  55],
       [ 95,  60,  77,  95,  86],
       [103,  77,  89,  75,  60]])
'''
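
For 2-D arrays, np.dot computes the usual matrix product: an (m, k) array times a (k, n) array gives an (m, n) array, and each entry is the sum of products of a row with a column. The sketch below reuses the two arrays printed above and also shows the @ operator, which gives the same result for 2-D inputs.

```python
import numpy as np

n1 = np.array([[3, 4, 2, 0],
               [5, 5, 4, 1],
               [1, 2, 5, 6]])
n2 = np.array([[7, 2, 6, 7, 7],
               [4, 6, 1, 5, 7],
               [8, 3, 9, 8, 3],
               [8, 8, 6, 3, 4]])

product = np.dot(n1, n2)
print(product.shape)                        # (3, 5)
print(np.array_equal(product, n1 @ n2))     # True

# Entry [0, 0] is row 0 of n1 times column 0 of n2: 3*7 + 4*4 + 2*8 + 0*8
print((n1[0, :] * n2[:, 0]).sum())          # 53
```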

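2. Broadcasting

Purpose of broadcasting: to bring two arrays with different shapes to a common shape so that the operation can proceed

[Important] The two rules of ndarray broadcasting

Rule 1: pad the missing dimensions with length 1
Rule 2: fill in the missing elements by repeating the existing values

Example 1: m = np.ones((2, 3)), a = np.arange(3); compute m + a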

m = np.ones((2, 3))
a = np.arange(3)
display(m, a)
'''
out:array([[1., 1., 1.],
       [1., 1., 1.]])
array([0, 1, 2])
'''
m + a
# a is missing a dimension, so one is added; a is then missing data, which is filled in with the existing values.
# a ---> [[0, 1, 2], [0, 1, 2]]
'''
out: array([[1., 2., 3.],
       [1., 2., 3.]])
'''

Example 2: a = np.arange(3).reshape((3, 1)), b = np.arange(3); compute a + b

a = np.arange(3).reshape(3, 1)
b = np.arange(3)
display(a, b)
'''
out: array([[0],
       [1],
       [2]])
array([0, 1, 2])
'''
# b is missing a dimension, so one is added: b ---> [[0, 1, 2]]
# Which array is missing data? Both a and b.
# Fill in a: a ---> [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
# Fill in b: b ---> [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
a + b
'''
out: array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
'''
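
The "filled in" arrays from Example 2 can be made visible with np.broadcast_to, which expands an array to a target shape following the same two rules. This is a sketch for illustration; NumPy does not actually materialize these copies when it evaluates a + b.

```python
import numpy as np

a = np.arange(3).reshape(3, 1)
b = np.arange(3)

a_full = np.broadcast_to(a, (3, 3))   # each column repeats a
b_full = np.broadcast_to(b, (3, 3))   # each row repeats b

print(a_full)   # [[0 0 0]
                #  [1 1 1]
                #  [2 2 2]]
print(b_full)   # [[0 1 2]
                #  [0 1 2]
                #  [0 1 2]]

# Adding the expanded arrays gives the same result as the broadcast a + b.
print(np.array_equal(a_full + b_full, a + b))   # True
```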

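VI. Sorting ndarrays

Quiz: using the NumPy knowledge covered so far, implement selection sort for an ndarray.

def Sortn(x):

The shorter the code, the better.

Sorting methods: bubble, quick, selection, insertion, Shell, bucket, merge...

Bubble sort

# Sorting is defined on 1-D data (a multi-dimensional array is sorted along one axis).
# Two nested loops: the outer loop makes one pass per element.
# The inner loop compares adjacent pairs and swaps them when they are out of order.
# Time complexity: O(n^2)
# Space complexity: O(1)
# Stability: stable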
def bubble_sort(arr):
    for i in range(len(arr) - 1):
        switched = False
        for j in range(len(arr) - i - 1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                # A swap happened, so the array was not yet sorted
                switched = True
        if not switched:
            break
    return arr

arr = [10, 3, 2, 6, 9, 1]
bubble_sort(arr)

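Selection sort

# Find the smallest of the remaining elements and swap it with the current one
# Time complexity: O(n^2)
# Space complexity: O(1)
# Stability: not stable
def select_sort(arr):
    for i in range(len(arr) - 1):
        # Assume the current i-th element is the minimum.
        index = i
        for j in range(i + 1, len(arr)):
            if arr[j] < arr[index]:
                index = j
        # Swap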
        arr[i], arr[index] = arr[index], arr[i]
    return arr

arr = [10, 3, 2, 6, 9, 1]
select_sort(arr)

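Quick sort

# Idea: pick a pivot, put the elements smaller than it in one list and the larger ones in another, then recurse on both lists.
# Time complexity: O(n log n) on average, O(n^2) in the worst case
# Space complexity: O(n) extra on average for this list-building version
# Stability: quicksort is generally not stable (the usual in-place versions reorder equal elements)
def quick_sort(arr):
    # Base case of the recursion:
    # a list with at most one element is already sorted
    if len(arr) <= 1:
        return arr

    # Pivot: the last element (read without pop(), so the caller's list is not modified)
    pivot = arr[-1]
    # Two lists for the partition
    greater = []
    less = []
    for x in arr[:-1]:
        if x > pivot:
            greater.append(x)
        else:
            less.append(x)

    return quick_sort(less) + [pivot] + quick_sort(greater)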

arr = [10, 3, 2, 6, 9, 1]
quick_sort(arr)

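Insertion sort

# Treat the first element as already sorted; then take each following element in turn, compare it with the sorted part, and insert it at the right position.
# Time complexity: O(n^2)
# Space complexity: O(1)
# Stability: stable
def insert_sort(arr):
    for i in range(1, len(arr)):
        loop_index = i
        while loop_index > 0 and arr[loop_index] < arr[loop_index - 1]:
            # Swap with the left neighbour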
            arr[loop_index], arr[loop_index -1] = arr[loop_index - 1], arr[loop_index]
            loop_index -= 1
    return arr

arr = [10, 3, 2, 6, 9, 1]
insert_sort(arr)

Selection sort on an ndarray

def select_sort(arr):
    for i in range(len(arr) - 1):
        # Index of the smallest element to the right of position i
        index = np.argmin(arr[i+1:]) + i + 1
        if arr[i] > arr[index]:
            arr[i], arr[index] = arr[index], arr[i]

    return arr
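
Since the quiz asks for code that is as short as possible, here is one more compact variant, offered as a sketch rather than the reference answer. The function name select_sort_np is hypothetical; it works on a copy so the input is left untouched, and it swaps with fancy indexing.

```python
import numpy as np

def select_sort_np(values):
    """Selection sort on a copy of a 1-D array (illustrative helper)."""
    arr = np.array(values)                # copy, so the caller's data is not modified
    for i in range(len(arr) - 1):
        j = i + np.argmin(arr[i:])        # position of the smallest remaining element
        arr[[i, j]] = arr[[j, i]]         # swap positions i and j
    return arr

data = np.array([10, 3, 2, 6, 9, 1])
result = select_sort_np(data)
print(result)   # [ 1  2  3  6  9 10]
print(data)     # [10  3  2  6  9  1] (unchanged)
```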

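1. Quick sort

Both np.sort() and ndarray.sort() can be used, but they differ:

np.sort() returns a sorted copy and leaves the input unchanged
ndarray.sort() sorts in place, so it needs no extra array, but it modifies the input

arr = np.array([10, 3, 2, 6, 9, 1])
# Does not modify the original
np.sort(arr)  # out: array([ 1,  2,  3,  6,  9, 10])
arr.sort()  # out: (no output)
arr  # out: array([ 1,  2,  3,  6,  9, 10])
## Rule of thumb: a call that returns a result usually leaves the original data untouched; a call that returns nothing has modified the original in place.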
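
The difference between the two calls can be verified step by step. A minimal sketch; the descending-order line uses a reversed slice and is an addition for illustration.

```python
import numpy as np

arr = np.array([10, 3, 2, 6, 9, 1])

sorted_copy = np.sort(arr)        # new array; arr keeps its original order
print(arr)                        # [10  3  2  6  9  1]

descending = np.sort(arr)[::-1]   # descending order via a reversed slice
print(descending)                 # [10  9  6  3  2  1]

returned = arr.sort()             # in place; the method returns None
print(returned)                   # None
print(arr)                        # [ 1  2  3  6  9 10]
```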

2. Partial sorting

partition

n = np.random.randint(0, 10, size=10)
n  # array([3, 8, 1, 7, 3, 0, 5, 6, 2, 3])
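# Find the 5 largest of the 10: after partitioning they occupy the last 5 positions (in no guaranteed order).
np.partition(n, kth=-5)  # array([0, 3, 1, 2, 3, 3, 5, 6, 7, 8])
# The 5 smallest: they occupy the first 5 positions.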
np.partition(n, kth=5)  # array([0, 3, 1, 2, 3, 3, 5, 6, 7, 8])
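
To actually extract the k largest or smallest values, slice the partitioned array. The sketch below uses the array from the example; because the order inside each half is not guaranteed, the results are sorted before being printed.

```python
import numpy as np

n = np.array([3, 8, 1, 7, 3, 0, 5, 6, 2, 3])
k = 5

largest = np.partition(n, -k)[-k:]    # the k largest, in arbitrary order
smallest = np.partition(n, k)[:k]     # the k smallest, in arbitrary order

print(np.sort(largest))    # [3 5 6 7 8]
print(np.sort(smallest))   # [0 1 2 3 3]
```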

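VII. Other NumPy functions

Reading an external data table: np.genfromtxt('filename', delimiter=',', encoding='utf-8')
Removing NaN entries:

np.delete(data, np.where(np.isnan(data))[0], axis=0)  # drops rows containing NaN; use np.where(...)[1] with axis=1 to drop columns
Conditional selection: np.where()

np.where(score_sum > 60)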
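
A small end-to-end sketch of the two np.where uses above. The arrays data and score_sum are made-up sample values, and the boolean-mask line is an equivalent alternative to np.delete added for comparison.

```python
import numpy as np

data = np.array([[1.0, 2.0],
                 [np.nan, 4.0],
                 [5.0, 6.0]])

# Row indices of every NaN entry, then delete those rows.
bad_rows = np.where(np.isnan(data))[0]
clean = np.delete(data, bad_rows, axis=0)
print(clean)          # [[1. 2.]
                      #  [5. 6.]]

# Equivalent: keep only the rows that contain no NaN.
clean_by_mask = data[~np.isnan(data).any(axis=1)]

score_sum = np.array([55, 72, 60, 91])
passed_idx = np.where(score_sum > 60)                 # tuple of index arrays
labels = np.where(score_sum > 60, "pass", "fail")     # three-argument form picks values
print(passed_idx[0])  # [1 3]
print(labels)         # ['fail' 'pass' 'fail' 'pass']
```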
