Numpy

FatPuffer

Category: Data Analysis. Tags: Numpy

Article link: https://blog.csdn.net/qq_42517220/article/details/103207177

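I. Features of Numpy

1. Well suited to numerical computation
2. High computational performance
3. Supports vectorized operations
4. Free and open source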

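II. Numpy arrays

1. A Numpy array is an object of the ndarray class. The actual data and the metadata (shape, dtype and so on) are stored separately and can be manipulated independently, which improves performance.
2. All elements of a Numpy array must have the same type (homogeneity), which makes element lookup efficient.
3. Elements are accessed individually through zero-based indices. For an array of size elements, the valid indices run from 0 to size-1.
4. The dtype and shape attributes give the element type and the dimensions. shape is a tuple that lists the size of each dimension from the outermost to the innermost.
5. Ways to create an array:
numpy.arange(start, stop, step)
numpy.array(any sequence that can be interpreted as an array)
numpy.zeros(number of elements, dtype='data type')
numpy.ones(number of elements, dtype='data type')
numpy.zeros_like(array)
numpy.ones_like(array)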
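As a small supplementary sketch (not from the original post), the creation functions listed above also accept a step for arange and a tuple shape for zeros and ones. The variable names are illustrative only.

```python
import numpy as np

# arange with an explicit step: start=0, stop=10 (exclusive), step=2
evens = np.arange(0, 10, 2)

# zeros / ones accept a tuple to build multi-dimensional arrays
grid = np.zeros((2, 3), dtype='float64')
ones_grid = np.ones((2, 3), dtype='int32')

# array() converts sequences; mixed int/float input is promoted to one dtype
mixed = np.array([1, 2.5, 3])

print(evens)            # [0 2 4 6 8]
print(grid.shape)       # (2, 3)
print(ones_grid.dtype)  # int32
print(mixed.dtype)      # float64
```

The last line illustrates the homogeneity rule from point 2: the integers are stored as floats so that every element has the same type.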

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np

# Create an array with arange
a = np.arange(1, 3)
print(a)
print(a.shape)
print(a.dtype)

# [1 2]
# (2,)
# int32  (platform dependent; int64 on most 64-bit Linux/macOS builds)

# Create an array with array
b = np.array([1, 2, 3])
print(b)
print(b.shape)
print(b.dtype)

# [1 2 3]
# (3,)
# int32

# Create an array with zeros
c = np.zeros(10, dtype='int32')
print(c)
print(c.shape)
print(c.dtype)

# [0 0 0 0 0 0 0 0 0 0]
# (10,)
# int32

# Create an array with ones
d = np.ones(10, dtype='int64')
print(d)
print(d.shape)
print(d.dtype)

# [1 1 1 1 1 1 1 1 1 1]
# (10,)
# int64

# Use zeros_like and ones_like
x = np.array([[1, 2, 3], [4, 5, 6]])
print(x)
print('==============================')

print(np.zeros_like(x))
print('==============================')
print(np.ones_like(x))

"""
[[1 2 3]
 [4 5 6]]
==============================
[[0 0 0]
 [0 0 0]]
==============================
[[1 1 1]
 [1 1 1]]
"""

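6. Dimensions

When np_obj.shape returns a tuple with one element, the array is one-dimensional, and that element is the number of elements (columns) in it.

When np_obj.shape returns a tuple with two elements, the array is two-dimensional. Reading left to right, the first element is the number of rows and the second is the number of columns.

When np_obj.shape returns a tuple with three elements, the array is three-dimensional. Reading left to right, the first element is the number of pages (or blocks), the second is the number of rows, and the third is the number of columns.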

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np


# One-dimensional array
np_obj = np.array([1, 2, 3])
print(np_obj)
print(np_obj.shape)
print('============================')

# Two-dimensional array
np_obj1 = np.array([[1, 2, 3], [4, 5, 6]])
print(np_obj1)
print(np_obj1.shape)
print('============================')

# Three-dimensional array
np_obj2 = np.array([[[1, 2, 3], [4, 5, 6]], [[4, 5, 6], [7, 8, 9]]])
print(np_obj2)
print(np_obj2.shape)

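7. Element type

np_obj.dtype -> returns the element type of the array
np_obj.astype(str) -> returns a copy of the array converted to string type

8. Number of elements

np_obj.size -> returns the number of elements in the array

9. Reshaping

np_obj.reshape(x, y) -> returns the array with a new shape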

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import numpy as np


# One-dimensional array
one_shape = np.arange(16)
print(one_shape)
print(one_shape.shape)

print('================= 1-D to 2-D =================')
two_shape1 = one_shape.reshape(2, 8)
print(two_shape1)
print(two_shape1.shape)
print('-------------------------------------')
two_shape2 = one_shape.reshape(4, 4)
print(two_shape2)
print(two_shape2.shape)

print('================== 1-D to 3-D ================')
three_shape = one_shape.reshape(2, 2, 4)
print(three_shape)
print(three_shape.shape)
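Items 7 to 9 above can be tried together in a few lines. This is a minimal sketch with illustrative names; the use of -1 in reshape is an addition that is not covered in the original text.

```python
import numpy as np

arr = np.arange(6)              # [0 1 2 3 4 5]

as_float = arr.astype(float)    # astype returns a new array; arr is unchanged
as_text = arr.astype(str)       # elements become Unicode strings

print(as_float.dtype)           # float64
print(as_text.dtype.kind)       # U
print(arr.size)                 # 6

# -1 lets NumPy infer one dimension from the total size
two_rows = arr.reshape(2, -1)
print(two_rows.shape)           # (2, 3)
```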

III. Numpy built-in data types

1. Basic data types

Type	Type name	Character code
Boolean	bool_	?
Signed integer	int8 (-128 to 127)/int16/int32/int64	i1/i2/i4/i8
Unsigned integer	uint8 (0 to 255)/uint16/uint32/uint64	u1/u2/u4/u8
Floating point	float16/float32/float64	f2/f4/f8
Complex	complex64/complex128	c8/c16
String	str_ (each character stored as a 32-bit Unicode code point)	U<number of characters>
Date and time	datetime64	M8[Y]/M8[M]/M8[D]/M8[h]/M8[m]/M8[s]
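The character codes in the table are interchangeable with the type names. The following sketch checks a few of them and prints the storage size they imply; it is only an illustration of the table, not part of the original post.

```python
import numpy as np

# Character codes and type names describe the same dtype
print(np.dtype('i4') == np.dtype('int32'))    # True
print(np.dtype('f8') == np.dtype('float64'))  # True
print(np.dtype('?') == np.dtype('bool'))      # True

# itemsize reports the number of bytes per element
print(np.dtype('i2').itemsize)   # 2
print(np.dtype('c16').itemsize)  # 16
print(np.dtype('U3').itemsize)   # 12 (3 characters x 4 bytes)

# iinfo reports the representable range of an integer type
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
print(np.iinfo(np.uint8).max)                        # 255
```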

2. Compound data types

First way to set dtype

import numpy as np

data = [
    ('zs', [90, 70, 88], 15),
    ('ls', [91, 73, 80], 16),
    ('ww', [87, 80, 82], 17)
]

ary = np.array(data, dtype=object)  # NumPy 1.24+ raises ValueError for ragged rows unless dtype=object
print(ary)
print(ary.dtype)

"""
[['zs' list([90, 70, 88]) 15]
 ['ls' list([91, 73, 80]) 16]
 ['ww' list([87, 80, 82]) 17]]
object
"""

import numpy as np

data = [
    ('zs', [90, 70, 88], 15),
    ('ls', [91, 73, 80], 16),
    ('ww', [87, 80, 82], 17)
]

# First way to set dtype
# U2: a string of 2 Unicode characters, 32 bits each, 64 bits in total
# 3int32: three int32 integers, 96 bits in total
ary = np.array(data, dtype='U2, 3int32, int32')
print(ary)
print(ary.dtype)
# Get the age of Wang Wu ('ww')
print(ary[2][2])
print(ary[2]['f2'])  # fields get the default names f0, f1, f2 (f stands for field)
"""
[('zs', [90, 70, 88], 15) ('ls', [91, 73, 80], 16)
 ('ww', [87, 80, 82], 17)]

[('f0', '<U2'), ('f1', '<i4', (3,)), ('f2', '<i4')]

17
17
"""

Second way to set dtype

import numpy as np

data = [
    ('zs', [90, 70, 88], 15),
    ('ls', [91, 73, 80], 16),
    ('zs', [87, 80, 82], 17)
]

ary = np.array(data, dtype=object)  # NumPy 1.24+ raises ValueError for ragged rows unless dtype=object
print(ary)
print(ary.dtype)

"""
[['zs' list([90, 70, 88]) 15]
 ['ls' list([91, 73, 80]) 16]
 ['zs' list([87, 80, 82]) 17]]
object
"""

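# Second way to set dtype
# Convenient when there are many fields
ary1 = np.array(data,
                dtype=[
                    ('name', 'str', 2),
                    ('score', 'int32', 3),
                    ('age', 'int32'),  # scalar field, so no shape is needed
                ])
print(ary1)
print(ary1.dtype)
# Get the age of the third record (Wang Wu)
print(ary1[2][2])
print(ary1[2]['age'])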

"""
[('zs', [90, 70, 88], 15) ('ls', [91, 73, 80], 16)
 ('zs', [87, 80, 82], 17)]

[('name', '<U2'), ('score', '<i4', (3,)), ('age', '<i4')]

17
17
"""

Third way to set dtype

import numpy as np

data = [
    ('zs', [90, 70, 88], 15),
    ('ls', [91, 73, 80], 16),
    ('zs', [87, 80, 82], 17)
]

# U2: a string of 2 Unicode characters, 32 bits each, 64 bits in total
# 3int32: three int32 integers, 96 bits in total
ary = np.array(data, dtype='U2, 3int32, int32')
print(ary)
print(ary.dtype)
# Get the age of the third record (Wang Wu)
print(ary[2][2])
print(ary[2]['f2'])

"""
[('zs', [90, 70, 88], 15) ('ls', [91, 73, 80], 16)
 ('zs', [87, 80, 82], 17)]

[('f0', '<U2'), ('f1', '<i4', (3,)), ('f2', '<i4')]

17
17
"""

# Third way to set dtype
ary1 = np.array(data,
                dtype={
                    'names': ['name', 'scores', 'age'],
                    'formats': ['U2', '3int32', 'int32']  # 3int32 can also be written as 3i4
                })
print(ary1)
print(ary1.dtype)
# Get the age of the third record (Wang Wu)
print(ary1[2][2])
print(ary1[2]['age'])

"""
[('zs', [90, 70, 88], 15) ('ls', [91, 73, 80], 16)
 ('zs', [87, 80, 82], 17)]

[('name', '<U2'), ('scores', '<i4', (3,)), ('age', '<i4')]

17
17
"""

Fourth way to set dtype

import numpy as np

data = [
    ('zs', [90, 70, 88], 15),
    ('ls', [91, 73, 80], 16),
    ('zs', [87, 80, 82], 17)
]

# U2: a string of 2 Unicode characters, 32 bits each, 64 bits in total
# 3int32: three int32 integers, 96 bits in total
ary = np.array(data, dtype='U2, 3int32, int32')
print(ary)
print(ary.dtype)
# Get the age of the third record (Wang Wu)
print(ary[2][2])
print(ary[2]['f2'])

"""
[('zs', [90, 70, 88], 15) ('ls', [91, 73, 80], 16)
 ('zs', [87, 80, 82], 17)]

[('f0', '<U2'), ('f1', '<i4', (3,)), ('f2', '<i4')]

17
17
"""

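# Fourth way to set dtype
ary1 = np.array(data,
                dtype={
                    # The second item is the byte offset; 'name' starts at byte 0.
                    # Each U character takes 32 bits, so U2 takes 64 bits, that is 8 bytes.
                    'name': ('U2', 0),
                    # The second field therefore starts right after the 8 bytes of the first.
                    'scores': ('3int32', 8),
                    # Likewise 8 + 12 = 20. The offset does not have to be tight,
                    # but it must leave enough room for the previous field.
                    'age': ('int32', 20)
                })
print(ary1)
print(ary1.dtype)
# Get the age of the third record (Wang Wu)
print(ary1[2][2])
print(ary1[2]['age'])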

"""
[('zs', [90, 70, 88], 15) ('ls', [91, 73, 80], 16)
 ('zs', [87, 80, 82], 17)]

[('name', '<U2'), ('scores', '<i4', (3,)), ('age', '<i4')]

17
17
"""

The datetime64 date type

import numpy as np

data = ['2019', '2019-01-01', '2019-01-02',
        '2019-02-01']

date = np.array(data)
print(date)
print(date.dtype)

"""
['2019' '2019-01-01' '2019-01-02' '2019-02-01']
<U10  (Unicode string type)
"""

# datetime64 with day precision
# dates = date.astype('datetime64')
dates = date.astype('M8[D]')  # M8[M] would give month precision
print(dates)
print(dates.dtype)
print(dates[3] - dates[0])
print(dates[3] > dates[0])

"""
['2019-01-01' '2019-01-01' '2019-01-02' '2019-02-01']
datetime64[D]  (datetime64 type)
31 days
True
"""

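IV. Changing dimensions

1. In-place reshaping: changes the shape of the original array object directly and does not return a new array

import numpy as np

nu_obj = np.arange(1, 13).reshape(3, 4)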
"""
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
"""
new_array = nu_obj.resize(2, 2, 3)
print(new_array)
"""
None
"""
print(nu_obj)
"""
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
"""

2. View reshaping (data is shared): reshape, ravel

import numpy as np

a = np.arange(1, 9)
print(a)  # [1 2 3 4 5 6 7 8]

# View reshaping: turn it into a 2-D array
b = a.reshape(2, 4)
print(b)
"""
[[1 2 3 4]
 [5 6 7 8]]
"""
# Changing an element of b also changes a
b[0][0] = 100
print(b)
"""
[[100   2   3   4]
 [  5   6   7   8]]
"""
print(a)  # [100   2   3   4   5   6   7   8]

# View reshaping back to 1-D: ravel flattens a multi-dimensional array, left to right and top to bottom
c = b.ravel()
print(c)  # [100   2   3   4   5   6   7   8]
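Whether two arrays really share data can be checked directly. The sketch below uses np.shares_memory and the base attribute for that; the transposed case at the end is an added illustration of when ravel cannot return a view.

```python
import numpy as np

a = np.arange(1, 9)
b = a.reshape(2, 4)
c = b.ravel()

# reshape and ravel of a contiguous array share the original buffer
print(np.shares_memory(a, b))  # True
print(np.shares_memory(a, c))  # True
print(b.base is a)             # True

# ravel falls back to a copy when no view is possible,
# for example on a transposed (non-contiguous) array
t = b.T.ravel()
print(np.shares_memory(a, t))  # False
```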

3. Copy reshaping (data is independent): flatten

import numpy as np

a = np.array([
        np.arange(0, 5),
        np.arange(5,10)
])
print(a)
"""
[[0 1 2 3 4]
 [5 6 7 8 9]]
"""

# Copy reshaping: changing the original array or the new one does not affect the other
b = a.flatten()
print(b)  # [0 1 2 3 4 5 6 7 8 9]
# Change an element of b
b[0] = 100
print(b)  # [100   1   2   3   4   5   6   7   8   9]
print(a)
"""
[[0 1 2 3 4]
 [5 6 7 8 9]]
"""

V. Array slicing

1. Syntax

array[start:stop:step]

2. One-dimensional slicing examples

import numpy as np

a = np.arange(1, 10)
print(a)  # [1 2 3 4 5 6 7 8 9]
print(a[:3])  # [1 2 3]
print(a[3:6])  # [4 5 6]
print(a[6:])  # [7 8 9]
print(a[::-1])  # [9 8 7 6 5 4 3 2 1]
print(a[:-4:-1])  # [9 8 7]
print(a[-4:-7:-1])  # [6 5 4]
print(a[-7::-1])  # [3 2 1]
print(a[::])  # [1 2 3 4 5 6 7 8 9]
print(a[:])  # [1 2 3 4 5 6 7 8 9]
print(a[::3])  # [1 4 7]
print(a[1::3])  # [2 5 8]
print(a[2::3])  # [3 6 9]
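One point worth adding to the slicing examples: a basic slice is a view, so writing through it changes the original array. A minimal sketch, with illustrative variable names:

```python
import numpy as np

a = np.arange(1, 10)

view = a[3:6]                 # basic slice: a view onto a
view[0] = -1
print(a)                      # [ 1  2  3 -1  5  6  7  8  9]

independent = a[3:6].copy()   # explicit copy: detached from a
independent[0] = 100
print(a[3])                   # -1
```

Calling copy() is the usual way to keep a slice that must not affect the source array.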

3. Multi-dimensional slicing examples

import numpy as np

a = np.arange(1, 28)
a.resize(3, 3, 3)
# print(a)
"""
[[[ 1  2  3]
  [ 4  5  6]
  [ 7  8  9]]

 [[10 11 12]
  [13 14 15]
  [16 17 18]]

 [[19 20 21]
  [22 23 24]
  [25 26 27]]]
"""

# Slice out one page
print(a[1, :, :])
"""
[[10 11 12]
 [13 14 15]
 [16 17 18]]
"""

# Slice out the first two rows of one page
print(a[1, 0:2, :])
"""
[[10 11 12]
 [13 14 15]]
"""

# Slice out row 1 of every page
print(a[:, 1, :])
"""
[[ 4  5  6]
 [13 14 15]
 [22 23 24]]
"""

# Slice out column 1 of every page
print(a[:, :, 1])
"""
[[ 2  5  8]
 [11 14 17]
 [20 23 26]]
"""

VI. Mask operations on ndarray arrays

1. Description: returns the elements at the positions where the mask array is True

import numpy as np

a = np.arange(1, 11)
mask = [True, False, True, False, True, False, True, False, True, False]
print(a[mask])  # [1 3 5 7 9]

2. Example: return the numbers below 100 that are divisible by 3

import numpy as np

a = np.arange(1, 100)
print(a[a % 3 == 0])

"""
[ 3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99]
"""

# Divisible by both 3 and 7
print(a[(a % 3 == 0) & (a % 7 == 0)])  # [21 42 63 84]

3. Reordering array elements with an index mask

import numpy as np

a = np.array(['A', 'B', 'C', 'D', 'E'])
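# Here the elements of mask are indices into the ndarray; the new array is built in that index order
# Note: every index in mask must be smaller than the length of the array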
mask = [0, 3, 2, 4, 1, 2, 3, 2, 1]
print(a[mask])  # ['A' 'D' 'C' 'E' 'B' 'C' 'D' 'C' 'B']
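The mask examples above combine conditions with &. As a supplementary sketch, the same pattern works with | (or) and ~ (not), and np.nonzero and np.argsort produce index arrays that can be used like the index mask in item 3. The data here is made up for illustration.

```python
import numpy as np

a = np.arange(1, 11)

# | is element-wise OR, ~ is element-wise NOT; the parentheses are required
print(a[(a % 2 == 0) | (a > 8)])   # [ 2  4  6  8  9 10]
print(a[~(a % 3 == 0)])            # [ 1  2  4  5  7  8 10]

# np.nonzero returns the positions where the mask is True
print(np.nonzero(a % 3 == 0)[0])   # [2 5 8]

# An integer index array from argsort reorders the elements
letters = np.array(['C', 'A', 'D', 'B'])
print(letters[np.argsort(letters)])  # ['A' 'B' 'C' 'D']
```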

VII. Combining and splitting multi-dimensional arrays

1. Vertical operations

import numpy as np

a = np.arange(1, 7).reshape(2, 3)
b = np.arange(7, 13).reshape(2, 3)
# Combine vertically, producing a new array
c = np.vstack((a, b))
print(c)
"""
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
"""
# Split vertically, producing two arrays
d, e = np.vsplit(c, 2)
print(d)
"""
[[1 2 3]
 [4 5 6]]
"""
print(e)
"""
[[ 7  8  9]
 [10 11 12]]
"""

2. Horizontal operations

import numpy as np

a = np.arange(1, 7).reshape(2, 3)
b = np.arange(7, 13).reshape(2, 3)
# Combine horizontally, producing a new array
c = np.hstack((a, b))
print(c)
"""
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
"""
# Split horizontally, producing two arrays
d, e = np.hsplit(c, 2)
print(d)
"""
[[1 2 3]
 [4 5 6]]
"""
print(e)
"""
[[ 7  8  9]
 [10 11 12]]
"""

3. Depth operations

import numpy as np

a = np.arange(1, 7).reshape(2, 3)
b = np.arange(7, 13).reshape(2, 3)
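# Combine along the depth direction (third dimension), producing a new array
# The arrays are merged front to back (picture them viewed from the upper right)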
c = np.dstack((a, b))
print(c)
"""
[[[ 1  7]
  [ 2  8]
  [ 3  9]]

 [[ 4 10]
  [ 5 11]
  [ 6 12]]]
"""
# Split along the depth direction (third dimension), producing two arrays
d, e = np.dsplit(c, 2)
print(d)
"""
[[[1]
  [2]
  [3]]

 [[4]
  [5]
  [6]]]
"""
print(e)
"""
[[[ 7]
  [ 8]
  [ 9]]

 [[10]
  [11]
  [12]]]
"""

4. Arrays of unequal length

import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([1, 2, 3, 4])

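# Pad b so that its length matches a
# pad_width: tuple, here 0 elements are added before and 1 after the original data
# mode: padding mode, 'constant' pads with a constant value
# constant_values: the constant to pad with, here -1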
b = np.pad(b, pad_width=(0, 1), mode='constant', constant_values=-1)
print(b)  # [ 1  2  3  4 -1]

# Combine vertically, producing a new array
c = np.vstack((a, b))
print(c)
"""
[[ 1  2  3  4  5]
 [ 1  2  3  4 -1]]
"""

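5. Functions for combining and splitting multi-dimensional arrays

The axis keyword argument specifies the direction of the combination.
If the arrays to combine are all two-dimensional:
0 means vertical, 1 means horizontal.
If the arrays to combine are all three-dimensional:
0 means vertical, 1 means horizontal, 2 means depth.

np.concatenate((a, b), axis=0)

# Split the given array into the given number of parts along one direction; axis takes the same values as above
np.split(c, 2, axis=0)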
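The correspondence between axis values and the vstack, hstack and dstack helpers can be verified on a pair of three-dimensional arrays. This is a sketch with made-up data, not the author's example.

```python
import numpy as np

a = np.arange(1, 13).reshape(2, 2, 3)
b = np.arange(13, 25).reshape(2, 2, 3)

# For 3-D inputs, each axis value matches one of the *stack helpers
same_v = np.array_equal(np.concatenate((a, b), axis=0), np.vstack((a, b)))
same_h = np.array_equal(np.concatenate((a, b), axis=1), np.hstack((a, b)))
same_d = np.array_equal(np.concatenate((a, b), axis=2), np.dstack((a, b)))
print(same_v, same_h, same_d)   # True True True

# split reverses concatenate along the same axis
c = np.concatenate((a, b), axis=2)        # shape (2, 2, 6)
left, right = np.split(c, 2, axis=2)
print(np.array_equal(left, a), np.array_equal(right, b))   # True True
```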

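6. Simple ways to combine one-dimensional arrays

import numpy as np

a = np.arange(1, 9)
b = np.arange(9, 17)

# Stack the two arrays as rows (np.row_stack is an alias of np.vstack and is deprecated since NumPy 2.0)
c = np.vstack((a, b))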
print(c)
"""
[[ 1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16]]
"""
# Combine the two arrays as two columns
d = np.column_stack((a, b))
print(d)
"""
[[ 1  9]
 [ 2 10]
 [ 3 11]
 [ 4 12]
 [ 5 13]
 [ 6 14]
 [ 7 15]
 [ 8 16]]
"""

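VIII. Other Numpy attributes

1. shape: dimensions
2. dtype: element type
3. size: number of elements
4. ndim: number of dimensions
5. itemsize: bytes per element
6. nbytes: total bytes = size * itemsize
7. real: array of the real parts of a complex array
8. imag: array of the imaginary parts of a complex array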

import numpy as np

a = np.array([
    [1 + 1j, 2 + 4j, 3 + 7j],
    [4 + 2j, 5 + 5j, 6 + 8j],
    [7 + 3j, 8 + 6j, 9 + 9j],
])

print(a)
print('======================')
print(a.real)
print('======================')
print(a.imag)

[[1.+1.j 2.+4.j 3.+7.j]
 [4.+2.j 5.+5.j 6.+8.j]
 [7.+3.j 8.+6.j 9.+9.j]]
======================
[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]
======================
[[1. 4. 7.]
 [2. 5. 8.]
 [3. 6. 9.]]
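The size-related attributes 1 to 6 can be checked on a small array. A minimal sketch, assuming an int16 array chosen only so that the byte counts are easy to follow:

```python
import numpy as np

m = np.zeros((3, 4), dtype='int16')

print(m.shape)     # (3, 4)
print(m.ndim)      # 2
print(m.size)      # 12
print(m.itemsize)  # 2
print(m.nbytes)    # 24
print(m.nbytes == m.size * m.itemsize)  # True
```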

9. T: transposed view of the array

Returns a value that shares data with the original array
Getting the rows and columns of an array

import numpy as np

b = np.array([np.arange(1, 9), np.arange(9, 17)])
print(b)

print('==============================')
# Rows
for i in range(b.shape[0]):
    print(b[i])

print('==============================')
# Columns
for j in range(b.shape[1]):
    col = []
    for h in range(b.shape[0]):
        col.append(b[h][j])
    print(col)

[[ 1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16]]
==============================
[1 2 3 4 5 6 7 8]
[ 9 10 11 12 13 14 15 16]
==============================
[1, 9]
[2, 10]
[3, 11]
[4, 12]
[5, 13]
[6, 14]
[7, 15]
[8, 16]

Getting the rows and columns through the transposed view

import numpy as np

b = np.array([np.arange(1, 9), np.arange(9, 17)])
print(b)

print('==============================')
# Rows
for i in range(len(b)):
    print(b[i])

print('==============================')
# Columns
for j in range(len(b.T)):
    print(b.T[j])

[[ 1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16]]
==============================
[1 2 3 4 5 6 7 8]
[ 9 10 11 12 13 14 15 16]
==============================
[1 9]
[ 2 10]
[ 3 11]
[ 4 12]
[ 5 13]
[ 6 14]
[ 7 15]
[ 8 16]

10. flat: flat iterator

Similar to flatten(), but it iterates over the original array instead of returning a copy

import numpy as np

b = np.array([np.arange(1, 9), np.arange(9, 17)])
print(b)

for i in b.flat:
    print(i)

[[ 1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16]]
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
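Although flat resembles flatten(), the two differ in one respect that a short sketch makes visible: flat addresses the original array, while flatten() returns an independent copy. The array below is illustrative.

```python
import numpy as np

b = np.arange(1, 7).reshape(2, 3)

# flat indexes the array as if it were 1-D, and writes go to the original
b.flat[4] = 50
print(b)
# [[ 1  2  3]
#  [ 4 50  6]]

# flatten returns an independent 1-D copy
copy = b.flatten()
copy[0] = -1
print(b[0, 0])   # 1
```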

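IX. Array arithmetic

Broadcasting rule: two arrays are broadcast-compatible if, for their trailing dimensions (counted from the end), the axis lengths match or one of them is 1. Broadcasting is carried out along the missing dimensions and the dimensions of length 1.
Example: a (3, 3, 2) array and a (3, 2) array share the trailing dimensions (3, 2), so they are broadcast-compatible.
Example: a (3, 3, 2) array and a (1, 2) or (3, 1) array are also broadcast-compatible, because one side has length 1.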
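The rule can be tested on shapes alone, without building any arrays. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available; the incompatible pair at the end is an added counterexample.

```python
import numpy as np

# np.broadcast_shapes applies the broadcasting rule to shapes only
print(np.broadcast_shapes((3, 3, 2), (3, 2)))  # (3, 3, 2)
print(np.broadcast_shapes((3, 3, 2), (1, 2)))  # (3, 3, 2)
print(np.broadcast_shapes((3, 3, 2), (3, 1)))  # (3, 3, 2)

# Trailing lengths 2 and 3 differ and neither is 1, so this pair is rejected
try:
    np.broadcast_shapes((3, 3, 2), (3, 3))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```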

1. Arithmetic between an array and a single number

import numpy as np

new_array = np.arange(0, 24).reshape(4, 6)
"""
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
"""

# Addition
print(new_array + 2)
"""
array([[ 2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25]])
"""

# Subtraction
print(new_array - 2)
"""
array([[-2, -1,  0,  1,  2,  3],
       [ 4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21]])
"""

# Multiplication
print(new_array * 2)
"""
array([[ 0,  2,  4,  6,  8, 10],
       [12, 14, 16, 18, 20, 22],
       [24, 26, 28, 30, 32, 34],
       [36, 38, 40, 42, 44, 46]])
"""

# Division
print(new_array / 2)
"""
array([[ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5],
       [ 3. ,  3.5,  4. ,  4.5,  5. ,  5.5],
       [ 6. ,  6.5,  7. ,  7.5,  8. ,  8.5],
       [ 9. ,  9.5, 10. , 10.5, 11. , 11.5]])
"""

Special case

print(new_array / 0)
"""
__main__:1: RuntimeWarning: divide by zero encountered in true_divide
__main__:1: RuntimeWarning: invalid value encountered in true_divide
array([[nan, inf, inf, inf, inf, inf],
       [inf, inf, inf, inf, inf, inf],
       [inf, inf, inf, inf, inf, inf],
       [inf, inf, inf, inf, inf, inf]])
"""

nan: not a number
inf: infinity
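If the warnings from division by zero are expected, they can be silenced locally and the special values detected afterwards. This is a minimal sketch using np.errstate, np.isnan, np.isinf and np.isfinite on a small made-up array.

```python
import numpy as np

x = np.array([0, 1, 2])

# errstate silences the RuntimeWarning inside the with block only
with np.errstate(divide='ignore', invalid='ignore'):
    q = x / 0

print(q)               # [nan inf inf]
print(np.isnan(q))     # [ True False False]
print(np.isinf(q))     # [False  True  True]
print(np.isfinite(q))  # [False False False]
```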

2. Arithmetic between arrays of the same shape

array1 = np.arange(0, 24).reshape(4, 6)
array2 = np.arange(100, 124).reshape(4, 6)

"""
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

array([[100, 101, 102, 103, 104, 105],
       [106, 107, 108, 109, 110, 111],
       [112, 113, 114, 115, 116, 117],
       [118, 119, 120, 121, 122, 123]])
"""

print(array1 + array2)

"""
array([[100, 102, 104, 106, 108, 110],
       [112, 114, 116, 118, 120, 122],
       [124, 126, 128, 130, 132, 134],
       [136, 138, 140, 142, 144, 146]])
"""

3. Arithmetic between arrays that match along one dimension

array1 = np.arange(0, 24).reshape(4, 6)
array2 = np.arange(0, 6)
array3 = np.arange(4).reshape(4, 1)

"""
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

array([0, 1, 2, 3, 4, 5])

array([[0],
       [1],
       [2],
       [3]])
"""

# Row-wise operation
print(array1 - array2)
"""
array([[ 0,  0,  0,  0,  0,  0],
       [ 6,  6,  6,  6,  6,  6],
       [12, 12, 12, 12, 12, 12],
       [18, 18, 18, 18, 18, 18]])
"""
print(array2 - array1)
"""
array([[  0,   0,   0,   0,   0,   0],
       [ -6,  -6,  -6,  -6,  -6,  -6],
       [-12, -12, -12, -12, -12, -12],
       [-18, -18, -18, -18, -18, -18]])
"""

# Column-wise operation
print(array1 - array3)
"""
array([[ 0,  1,  2,  3,  4,  5],
       [ 5,  6,  7,  8,  9, 10],
       [10, 11, 12, 13, 14, 15],
       [15, 16, 17, 18, 19, 20]])
"""
print(array3 - array1)
"""
array([[  0,  -1,  -2,  -3,  -4,  -5],
       [ -5,  -6,  -7,  -8,  -9, -10],
       [-10, -11, -12, -13, -14, -15],
       [-15, -16, -17, -18, -19, -20]])
"""

4. Practical application

Subtract from each column of the array the mean of that column

array1 = np.arange(0, 24).reshape(4, 6)
"""
        Math  Chinese  English  Chemistry  Physics  Politics
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])
"""

# First transpose the array to iterate over its columns, and compute the mean of each
avg = list()
for i in array1.T:
	avg.append(sum(i) / len(i))
"""
avg
[9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
"""
# Build a new array from the column means, then subtract it from the original array
new_avg = np.array(avg)
"""
new_avg 
array([ 9., 10., 11., 12., 13., 14.])
"""
print(array1 - new_avg )
"""
[[-9. -9. -9. -9. -9. -9.]
 [-3. -3. -3. -3. -3. -3.]
 [ 3.  3.  3.  3.  3.  3.]
 [ 9.  9.  9.  9.  9.  9.]]
"""

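The same centering can be written without an explicit loop. The sketch below uses the mean method with an axis argument, which is a swapped-in technique rather than the approach shown in the post; the row-wise variant is an added illustration.

```python
import numpy as np

array1 = np.arange(0, 24).reshape(4, 6)

# axis=0 collapses the rows, giving one mean per column
col_mean = array1.mean(axis=0)
print(col_mean)                 # [ 9. 10. 11. 12. 13. 14.]
centered = array1 - col_mean    # (4, 6) - (6,) broadcasts across the rows

# For row means, keepdims=True keeps shape (4, 1) so broadcasting still works
row_mean = array1.mean(axis=1, keepdims=True)
row_centered = array1 - row_mean
print(row_centered[0])          # [-2.5 -1.5 -0.5  0.5  1.5  2.5]
```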
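X. Reading data

1. Function

np.loadtxt(fname, dtype=float, delimiter=None, skiprows=0, usecols=None, unpack=False)

Parameter	Description
fname	File, string (path) or generator; may be a `.gz` or `.bz2` compressed file
dtype	Data type, optional; the type used when reading the CSV strings into the array, `float` by default
delimiter	Separator string; any whitespace by default, set it to a comma for CSV
skiprows	Skip the first x rows, typically 1 to skip the header row (rows are counted from 1)
usecols	Indices of the columns to read, as a tuple
unpack	If True, each column is returned as a separate array so it can be assigned to its own variable; if False, the data is returned as a single array. Default False. (True has the effect of transposing the result.)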
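The parameters in the table can be tried without a file on disk. In this sketch an io.StringIO object stands in for the CSV file (an assumption made for the example); the header and the two data rows are taken from the data set shown later in this section.

```python
import io
import numpy as np

# In-memory stand-in for a CSV file; loadtxt accepts file-like objects
csv_text = io.StringIO(
    "click,like,dislike,comment\n"
    "4394029,320053,5931,46245\n"
    "7860119,185853,26679,0\n"
)

# skiprows=1 drops the header line; usecols keeps columns 0 and 3;
# unpack=True returns one array per selected column
clicks, comments = np.loadtxt(csv_text, delimiter=",", dtype=int,
                              skiprows=1, usecols=(0, 3), unpack=True)
print(clicks)    # [4394029 7860119]
print(comments)  # [46245     0]
```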

There are three ways to transpose a two-dimensional array in numpy

array.transpose()
array.T
array.swapaxes(1, 0)
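A quick check, on a small illustrative array, that the three forms give the same result and that the transpose is a view rather than a copy:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)

t1 = m.transpose()
t2 = m.T
t3 = m.swapaxes(1, 0)

print(t1.shape)   # (3, 2)
print(np.array_equal(t1, t2) and np.array_equal(t2, t3))   # True
print(np.shares_memory(m, t2))   # True: the transpose is a view
```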

2. Slicing arrays

Data

click,like,dislike,comment
4394029,320053,5931,46245
7860119,185853,26679,0
5845909,576597,39774,170708
2642103,24975,4542,12829
1168130,96666,568,6666
1311445,34507,544,3040
666169,9985,297,1071
1728614,74062,2180,15297
1338533,69687,678,5643
1056891,29943,878,4046
859289,34485,726,1914
452477,28050,405,2745
...
2379689,24008,4727,7665
483496,1369,1645,1115
4672,234,0,0
142463,4231,148,279
2162240,41032,1384,4737
515000,34727,195,4722

Reading the data

import numpy as np

us_file_path = "data/US_video_data_numbers.csv"

us_data = np.loadtxt(us_file_path, delimiter=",", dtype="int", skiprows=1)

[[4394029  320053    5931   46245]
 [7860119  185853   26679       0]
 [5845909  576597   39774  170708]
 ...
 [ 142463    4231     148     279]
 [2162240   41032    1384    4737]
 [ 515000   34727     195    4722]]

Select a single row

print(us_data[1])

[7860119  185853   26679       0]

Select consecutive rows (rows 2 to 5, 5 excluded)

print(us_data[2: 5])

[[5845909  576597   39774  170708]
 [2642103   24975    4542   12829]
 [1168130   96666     568    6666]]

Select non-consecutive rows (rows 3, 6 and 9)

print(us_data[[3, 6, 9]])

[[2642103   24975    4542   12829]
 [ 666169    9985     297    1071]
 [1056891   29943     878    4046]]

Select a single column

print(us_data[:, 0])

[4394029 7860119 5845909 ...  142463 2162240  515000]

Select consecutive columns (columns 0 to 3, 3 excluded)

print(us_data[:, 0: 3])

[[4394029  320053    5931]
 [7860119  185853   26679]
 [5845909  576597   39774]
 ...
 [ 142463    4231     148]
 [2162240   41032    1384]
 [ 515000   34727     195]]

Select non-consecutive columns (columns 0, 1 and 3)

print(us_data[:, [0, 1, 3]])

[[4394029  320053   46245]
 [7860119  185853       0]
 [5845909  576597  170708]
 ...
 [ 142463    4231     279]
 [2162240   41032    4737]
 [ 515000   34727    4722]]

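Select the value of a single cell (3rd row, 4th column)

# Index one row and one column together
print(us_data[2, 3])

Select several non-adjacent cells by index (row 2 column 2, row 7 column 1, row 9 column 3)

# Index several rows and columns together; the two lists are paired element by element
print(us_data[[2, 7, 9], [2, 1, 3]])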

[39774 74062  4046]

Select consecutive columns of a single row (2nd row, column indices 1 to 3)

print(us_data[1, 1: 4])

[185853  26679      0]

Select non-consecutive columns of a single row (3rd row, 2nd and 4th columns)

print(us_data[2, [1, 3]])

[576597 170708]

Select consecutive rows of a single column (4th column, row indices 1 to 3)

print(us_data[1: 4, 3])

[     0 170708  12829]

Select non-consecutive rows of a single column (3rd column, row indices 2 and 4)

print(us_data[[2, 4], 2])

[39774   568]

Select consecutive rows and columns (rows 1 to 9 and columns 1 to 3, end index excluded)

print(us_data[1: 9, 1: 3])

[[185853  26679]
 [576597  39774]
 [ 24975   4542]
 [ 96666    568]
 [ 34507    544]
 [  9985    297]
 [ 74062   2180]
 [ 69687    678]]

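Select non-consecutive rows and non-consecutive columns, i.e. a block (rows 1, 3, 5, 7 and columns 0, 2)

# Rows first, then columns
print(us_data[[1, 3, 5, 7]])
print()
print(us_data[[1, 3, 5, 7]][:, [0, 2]])
print()
# Columns first, then rows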
print(us_data[:, [0, 2]])
print()
print(us_data[:, [0, 2]][[1, 3, 5, 7]])

[[7860119  185853   26679       0]
 [2642103   24975    4542   12829]
 [1311445   34507     544    3040]
 [1728614   74062    2180   15297]]

[[7860119   26679]
 [2642103    4542]
 [1311445     544]
 [1728614    2180]]

[[4394029    5931]
 [7860119   26679]
 [5845909   39774]
 ...
 [ 142463     148]
 [2162240    1384]
 [ 515000     195]]

[[7860119   26679]
 [2642103    4542]
 [1311445     544]
 [1728614    2180]]
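The two-step selection above can also be done in one indexing operation with np.ix_. This is an alternative technique, not the one used in the post, and the array below is a small stand-in for us_data.

```python
import numpy as np

data = np.arange(40).reshape(10, 4)   # stand-in for us_data

rows = [1, 3, 5, 7]
cols = [0, 2]

# np.ix_ builds an open mesh so every listed row is paired with every listed column
block = data[np.ix_(rows, cols)]
print(block)
# [[ 4  6]
#  [12 14]
#  [20 22]
#  [28 30]]

# Same result as the two-step selection
print(np.array_equal(block, data[rows][:, cols]))   # True
```

Writing data[rows, cols] directly would not work here: the two lists have different lengths, and even with equal lengths they would be paired element by element instead of forming a block.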

Select consecutive rows and non-consecutive columns (rows 1 to 9, 9 excluded; columns 0 and 2)

print(us_data[1: 9, [0, 2]])

[[7860119   26679]
 [5845909   39774]
 [2642103    4542]
 [1168130     568]
 [1311445     544]
 [ 666169     297]
 [1728614    2180]
 [1338533     678]]

Select consecutive columns and non-consecutive rows (rows 1, 3 and 9; columns 0 to 3, 3 excluded)

print(us_data[[1, 3, 9], 0: 3])

[[7860119  185853   26679]
 [2642103   24975    4542]
 [1056891   29943     878]]

3. Modifying array values

Set columns 2 to 4 (4 excluded) to 0

us_data[:, 2: 4] = 0
print(us_data)

[[4394029  320053       0       0]
 [7860119  185853       0       0]
 [5845909  576597       0       0]
 ...
 [ 142463    4231       0       0]
 [2162240   41032       0       0]
 [ 515000   34727       0       0]]
[7860119  185853       0       0]

Set the elements whose value is less than 10 to 100

arr2 = np.arange(0, 24).reshape(4, 6)

print(arr2)
print(arr2 < 10)
print('*' * 20)
arr2[arr2 < 10] = 100
print(arr2)

[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]]
[[ True  True  True  True  True  True]
 [ True  True  True  True False False]
 [False False False False False False]
 [False False False False False False]]
********************
[[100 100 100 100 100 100]
 [100 100 100 100  10  11]
 [ 12  13  14  15  16  17]
 [ 18  19  20  21  22  23]]

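Set the elements less than 10 to 0 and the elements greater than or equal to 10 to 50 (comparison against one number: np.where())

arr2 = np.arange(0, 24).reshape(4, 6)

# Does not modify the original array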
resp = np.where(arr2 < 10, 0, 50)
print(resp)

[[ 0  0  0  0  0  0]
 [ 0  0  0  0 50 50]
 [50 50 50 50 50 50]
 [50 50 50 50 50 50]]

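Replace the elements less than 10 with 10 and the elements greater than 18 with 18. The nan is not replaced, so what is nan? (comparison against two numbers: array.clip())

arr2 = np.arange(0, 24).reshape(4, 6)

# nan is a float, so the array must be converted to float before an element can be set to nan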
arr2 = arr2.astype(float)
arr2[3, 3] = np.nan

result = arr2.clip(10, 18)
print(result)

[[10. 10. 10. 10. 10. 10.]
 [10. 10. 10. 10. 10. 11.]
 [12. 13. 14. 15. 16. 17.]
 [18. 18. 18. nan 18. 18.]]
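As a partial answer to the question above, the sketch below shows how nan behaves in comparisons and arithmetic, and one common way to replace it. Filling with the mean of the remaining values is an illustrative choice, not something the post prescribes.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0])

print(np.nan == np.nan)    # False: nan never equals anything, itself included
print(np.isnan(arr))       # [False  True False]
print(arr.sum())           # nan: nan propagates through arithmetic
print(np.nansum(arr))      # 4.0
print(np.nanmean(arr))     # 2.0

# Replace nan with the mean of the remaining values
filled = np.where(np.isnan(arr), np.nanmean(arr), arr)
print(filled)              # [1. 2. 3.]
```

Because every comparison with nan is False, clip leaves it untouched, which matches the output shown above.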
