NumPy Study Task 1: Data Types and Array Creation

Latest recommended article published on 2022-07-27 19:09:02

程序员狐小李

Tags: numpy

Original link: https://github.com/datawhalechina/team-learning-program/tree/master/IntroductionToNumpy/task01%20%E6%95%B0%E6%8D%AE%E7%B1%BB%E5%9E%8B%E5%8F%8A%E6%95%B0%E7%BB%84%E5%88%9B%E5%BB%BA


Constants (for reference)

1. numpy.nan

Represents a missing value (NaN, "not a number"). Note that numpy.nan never compares equal to itself.

import numpy as np

print(np.nan == np.nan)
print(np.nan != np.nan)

False
True

Extension: numpy.isnan(x, *args, **kwargs) tests element-wise whether a value is NaN and returns booleans.

import numpy as np

x = np.array([1, 1, 8, np.nan, 10])
print(x)

y = np.isnan(x)
print(y)

z = np.count_nonzero(y)  # number of NaN entries
print(z)

[ 1.  1.  8. nan 10.]
[False False False  True False]
1
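
As a small sketch of how the boolean mask from isnan is typically used (the variable names here are illustrative and not from the original), NaN entries can be counted, filtered out, or skipped with NumPy's NaN-aware reductions:

```python
import numpy as np

x = np.array([1, 1, 8, np.nan, 10])
mask = np.isnan(x)

n_missing = np.count_nonzero(mask)  # number of NaN entries
valid = x[~mask]                    # keep only the non-NaN values
total = np.nansum(x)                # sum that skips NaN
plain_total = np.sum(x)             # an ordinary sum propagates NaN

print(n_missing, valid, total, plain_total)
```

The ordinary sum returns nan as soon as one element is NaN, which is why nansum and the related nan-prefixed functions exist.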

2. numpy.inf

Represents positive infinity. Note that numpy.inf compares equal to itself.

print(np.inf == np.inf)
print(np.inf != np.inf)

True
False
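
A minimal sketch of testing for infinities, assuming a small hand-made array (not from the original). It also shows that some arithmetic on inf yields nan:

```python
import numpy as np

values = np.array([1.0, np.inf, -np.inf, np.nan])

is_inf = np.isinf(values)        # True for +inf and -inf
is_finite = np.isfinite(values)  # False for inf, -inf and nan

# inf - inf has no defined value, so the result is nan.
# errstate silences the RuntimeWarning NumPy may emit here.
with np.errstate(invalid='ignore'):
    diff = np.float64(np.inf) - np.float64(np.inf)

print(is_inf, is_finite, diff)
```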

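Creating data types (for reference)

NumPy's numeric types are actually instances of dtype objects.

# Simplified signature of the dtype constructor
class dtype(object):
    def __init__(self, obj, align=False, copy=False):
        pass

import numpy as np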

a = np.dtype('b1')
print(a.type)  # <class 'numpy.bool_'>
print(a.itemsize)  # 1

a = np.dtype('i1')
print(a.type)  # <class 'numpy.int8'>
print(a.itemsize)  # 1
a = np.dtype('i2')
print(a.type)  # <class 'numpy.int16'>
print(a.itemsize)  # 2
a = np.dtype('i4')
print(a.type)  # <class 'numpy.int32'>
print(a.itemsize)  # 4
a = np.dtype('i8')
print(a.type)  # <class 'numpy.int64'>
print(a.itemsize)  # 8

a = np.dtype('u1')
print(a.type)  # <class 'numpy.uint8'>
print(a.itemsize)  # 1
a = np.dtype('u2')
print(a.type)  # <class 'numpy.uint16'>
print(a.itemsize)  # 2
a = np.dtype('u4')
print(a.type)  # <class 'numpy.uint32'>
print(a.itemsize)  # 4
a = np.dtype('u8')
print(a.type)  # <class 'numpy.uint64'>
print(a.itemsize)  # 8

a = np.dtype('f2')
print(a.type)  # <class 'numpy.float16'>
print(a.itemsize)  # 2
a = np.dtype('f4')
print(a.type)  # <class 'numpy.float32'>
print(a.itemsize)  # 4
a = np.dtype('f8')
print(a.type)  # <class 'numpy.float64'>
print(a.itemsize)  # 8

a = np.dtype('S')
print(a.type)  # <class 'numpy.bytes_'>
print(a.itemsize)  # 0
a = np.dtype('S3')
print(a.type)  # <class 'numpy.bytes_'>
print(a.itemsize)  # 3

a = np.dtype('U3')
print(a.type)  # <class 'numpy.str_'>
print(a.itemsize)  # 12

<class 'numpy.bool_'>
1
<class 'numpy.int8'>
1
<class 'numpy.int16'>
2
<class 'numpy.int32'>
4
<class 'numpy.int64'>
8
<class 'numpy.uint8'>
1
<class 'numpy.uint16'>
2
<class 'numpy.uint32'>
4
<class 'numpy.uint64'>
8
<class 'numpy.float16'>
2
<class 'numpy.float32'>
4
<class 'numpy.float64'>
8
<class 'numpy.bytes_'>
0
<class 'numpy.bytes_'>
3
<class 'numpy.str_'>
12
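
The type codes above are mostly used when creating or converting arrays. A short sketch, with illustrative variable names, of passing a code as the dtype argument and converting with astype:

```python
import numpy as np

# The string code 'i4' and the scalar type np.int32 describe the same dtype.
same = np.dtype('i4') == np.dtype(np.int32)

a = np.array([1, 2, 3], dtype='f4')  # request float32 storage
b = a.astype('i2')                   # convert to int16 (returns a new array)

print(same, a.dtype, a.itemsize, b.dtype, b.itemsize)
```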

Dates, times, and time deltas (for reference)

Dates and times

import numpy as np

a = np.datetime64('2020-03-01')
print(a, a.dtype)  # 2020-03-01 datetime64[D]

a = np.datetime64('2020-03')
print(a, a.dtype)  # 2020-03 datetime64[M]

a = np.datetime64('2020-03-08 20:00:05')
print(a, a.dtype)  # 2020-03-08T20:00:05 datetime64[s]

a = np.datetime64('2020-03-08 20:00')
print(a, a.dtype)  # 2020-03-08T20:00 datetime64[m]

a = np.datetime64('2020-03-08 20')
print(a, a.dtype)  # 2020-03-08T20 datetime64[h]

2020-03-01 datetime64[D]
2020-03 datetime64[M]
2020-03-08T20:00:05 datetime64[s]
2020-03-08T20:00 datetime64[m]
2020-03-08T20 datetime64[h]

Time deltas

import numpy as np

a = np.datetime64('2020-03-08') - np.datetime64('2020-03-07')
b = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 08:00')
c = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 23:00', 'D')

print(a, a.dtype)  # 1 days timedelta64[D]
print(b, b.dtype)  # 960 minutes timedelta64[m]
print(c, c.dtype)  # 1 days timedelta64[D]

a = np.datetime64('2020-03') + np.timedelta64(20, 'D')
b = np.datetime64('2020-06-15 00:00') + np.timedelta64(12, 'h')
print(a, a.dtype)  # 2020-03-21 datetime64[D]
print(b, b.dtype)  # 2020-06-15T12:00 datetime64[m]

1 days timedelta64[D]
960 minutes timedelta64[m]
1 days timedelta64[D]
2020-03-21 datetime64[D]
2020-06-15T12:00 datetime64[m]
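
A timedelta64 carries its own unit, which is not always the one you want to report. One possible way to express a difference in another unit, sketched here with illustrative names, is to divide by a one-unit timedelta:

```python
import numpy as np

delta = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 08:00')

# Dividing by a one-unit timedelta gives a plain floating-point count.
hours = delta / np.timedelta64(1, 'h')
days = delta / np.timedelta64(1, 'D')

print(delta, hours, days)
```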

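Creating arrays

The most important data structure NumPy provides is the ndarray, an n-dimensional array that can be thought of as an extension of Python's list.

1. Creating an ndarray from existing data

Creating arrays with array()

# Create a one-dimensional array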
a = np.array([0, 1, 2, 3, 4])
b = np.array((0, 1, 2, 3, 4))
print(a, type(a))
print(b, type(b))

# Create a two-dimensional array
c = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35]])
print(c, type(c))


# Create a three-dimensional array
d = np.array([[(1.5, 2, 3), (4, 5, 6)],
              [(3, 2, 1), (4, 5, 6)]])
print(d, type(d))

[0 1 2 3 4] <class 'numpy.ndarray'>
[0 1 2 3 4] <class 'numpy.ndarray'>
[[11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]
 [26 27 28 29 30]
 [31 32 33 34 35]] <class 'numpy.ndarray'>
[[[1.5 2.  3. ]
  [4.  5.  6. ]]

 [[3.  2.  1. ]
  [4.  5.  6. ]]] <class 'numpy.ndarray'>

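Creating arrays with asarray()

Both array() and asarray() convert structured data into an ndarray. The main difference shows up when the data source is already an ndarray: array() still makes a copy that occupies new memory, whereas asarray() makes no copy as long as the dtype does not change.

# What they have in common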
import numpy as np

x = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
y = np.array(x)
z = np.asarray(x)
x[1][2] = 2
print(x,type(x))

print(y,type(y))

print(z,type(z))

[[1, 1, 1], [1, 1, 2], [1, 1, 1]] <class 'list'>
[[1 1 1]
 [1 1 1]
 [1 1 1]] <class 'numpy.ndarray'>
[[1 1 1]
 [1 1 1]
 [1 1 1]] <class 'numpy.ndarray'>

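# Where they differ
import numpy as np

x = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
y = np.array(x)
z = np.asarray(x)
w = np.asarray(x, dtype=x.dtype)  # np.int was removed in NumPy 1.24; the dtype is unchanged here
x[1][2] = 2
print(x,type(x),x.dtype)


print(y,type(y),y.dtype)  # y is a separate copy, so the change to x does not affect it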

print(z,type(z),z.dtype)

print(w,type(w),w.dtype)

[[1 1 1]
 [1 1 2]
 [1 1 1]] <class 'numpy.ndarray'> int32
[[1 1 1]
 [1 1 1]
 [1 1 1]] <class 'numpy.ndarray'> int32
[[1 1 1]
 [1 1 2]
 [1 1 1]] <class 'numpy.ndarray'> int32
[[1 1 1]
 [1 1 2]
 [1 1 1]] <class 'numpy.ndarray'> int32
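
The copy behaviour can also be checked directly instead of by mutating the source. A minimal sketch using np.shares_memory (variable names are illustrative):

```python
import numpy as np

x = np.array([[1, 1, 1], [1, 1, 1]])

copied = np.array(x)                         # a new buffer by default
aliased = np.asarray(x)                      # same dtype: no copy is made
converted = np.asarray(x, dtype=np.float64)  # dtype changes: a copy is needed

print(np.shares_memory(x, copied))     # False
print(np.shares_memory(x, aliased))    # True
print(np.shares_memory(x, converted))  # False
print(aliased is x)                    # True
```

When the input is already an ndarray of the requested dtype, asarray hands back the very same object, which is why writes through it are visible in the original.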

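Creating arrays with fromfunction()

fromfunction() builds an array by evaluating a function over the index coordinates; it can come in handy when plotting a function.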

import numpy as np

def f(x, y):
    return 10 * x + y

x = np.fromfunction(f, (5, 4), dtype=int)
print(x)


x = np.fromfunction(lambda i, j: i == j, (3, 3), dtype=int)
print(x)


x = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
print(x)

[[ 0  1  2  3]
 [10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]
 [40 41 42 43]]
[[ True False False]
 [False  True False]
 [False False  True]]
[[0 1 2]
 [1 2 3]
 [2 3 4]]
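
It may help to know that fromfunction does not call the function once per element. It passes whole index arrays in a single call, so the function has to work on arrays. A sketch that reproduces the first example above with np.indices:

```python
import numpy as np

def f(x, y):
    return 10 * x + y

# One index array per axis; f is then called a single time on both arrays.
rows, cols = np.indices((5, 4))
manual = f(rows, cols)
built_in = np.fromfunction(f, (5, 4), dtype=int)

print(np.array_equal(manual, built_in))  # True
```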

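2. Creating arrays filled with ones and zeros

A common step in machine learning tasks is parameter initialization, which requires a matrix of fixed size filled with constant or random values.

Arrays of zeros

zeros(): returns an array of zeros with the given shape and type.
zeros_like(): returns an array of zeros with the same shape and type as a given array.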

import numpy as np

x = np.zeros(5)
print(x)  

x = np.zeros([2, 3])
print(x)

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.zeros_like(x)
print(y)

[0. 0. 0. 0. 0.]
[[0. 0. 0.]
 [0. 0. 0.]]
[[0 0 0]
 [0 0 0]]

Arrays of ones

ones() and ones_like() work the same way as their zero-array counterparts:

import numpy as np

x = np.ones(5)
print(x)  

x = np.ones([2, 3])
print(x)

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.ones_like(x)
print(y)

[1. 1. 1. 1. 1.]
[[1. 1. 1.]
 [1. 1. 1.]]
[[1 1 1]
 [1 1 1]]

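Empty arrays

empty() and empty_like() take the same arguments as their zero-array counterparts, but the elements are left uninitialized: they hold whatever happened to be in memory, so the values printed below are arbitrary and will differ from run to run.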

import numpy as np

x = np.empty(5)
print(x)

x = np.empty((3, 2))
print(x)

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.empty_like(x)
print(y)

[1. 1. 1. 1. 1.]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[0 0 0]
 [0 0 0]]
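
Because the contents of an empty array are undefined, it is only safe to read it after every element has been written. A minimal sketch of that pattern, assuming a hypothetical task of storing squares:

```python
import numpy as np

n = 5
squares = np.empty(n, dtype=np.int64)  # contents are undefined at this point

# Every slot is assigned before the array is read.
for i in range(n):
    squares[i] = i * i

print(squares)  # [ 0  1  4  9 16]
```

If some elements might never be written, zeros() is the safer choice.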

Identity arrays

eye(): returns a 2-D array with ones on the diagonal and zeros elsewhere; it need not be square.
identity(): returns a square identity array.

import numpy as np

x = np.eye(4)
print(x)

x = np.eye(2, 3)
print(x)


x = np.identity(4)
print(x)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
[[1. 0. 0.]
 [0. 1. 0.]]
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
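
As a brief sketch of the extra options eye() accepts (the variable names are illustrative), the k argument shifts the diagonal and dtype sets the element type:

```python
import numpy as np

same = np.array_equal(np.eye(4), np.identity(4))  # identical for square shapes

upper = np.eye(3, k=1)             # ones on the first diagonal above the main one
int_eye = np.eye(2, 3, dtype=int)  # non-square and integer-typed

print(same)
print(upper)
print(int_eye)
```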

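Diagonal arrays

diag(): extracts a diagonal from a 2-D array, or builds a 2-D array from a 1-D sequence.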

import numpy as np

x = np.arange(9).reshape((3, 3))
print(x)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
print(np.diag(x))  # [0 4 8]
print(np.diag(x, k=1))  # [1 5]
print(np.diag(x, k=-1))  # [3 7]

v = [1, 3, 5, 7]
x = np.diag(v)
print(x)
# [[1 0 0 0]
#  [0 3 0 0]
#  [0 0 5 0]
#  [0 0 0 7]]

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[0 4 8]
[1 5]
[3 7]
[[1 0 0 0]
 [0 3 0 0]
 [0 0 5 0]
 [0 0 0 7]]
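
The two behaviours of diag() can be combined. The sketch below (illustrative names) extracts a diagonal, rebuilds a matrix from it, and uses k when constructing:

```python
import numpy as np

x = np.arange(9).reshape((3, 3))

main = np.diag(x)               # 2-D input: extract the main diagonal
only_diag = np.diag(main)       # 1-D input: build a matrix with that diagonal
shifted = np.diag([1, 2], k=1)  # place the values one diagonal above the main one

print(only_diag)
print(shifted)
```

With k=1 and two values the result is 3×3, since the shifted diagonal needs one extra row and column.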

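Constant arrays

full(): returns an array of the given shape filled with a constant value.
full_like(): returns a constant-filled array with the same shape and type as a given array.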

import numpy as np

x = np.full((2,), 7)
print(x)
# [7 7]

x = np.full(2, 8)
print(x)
# [8 8]

x = np.full((2, 5), 9)
print(x)
# [[9 9 9 9 9]
#  [9 9 9 9 9]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.full_like(x, 10)
print(y)
# [[10 10 10]
#  [10 10 10]]

[7 7]
[8 8]
[[9 9 9 9 9]
 [9 9 9 9 9]]
[[10 10 10]
 [10 10 10]]
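
One detail worth a quick sketch: full() infers the dtype from the fill value, while full_like() keeps the dtype of the template array and casts the fill value to it. The names below are illustrative:

```python
import numpy as np

ints = np.full((2, 2), 7)      # integer fill value gives an integer dtype
floats = np.full((2, 2), 0.5)  # float fill value gives float64

x = np.array([[1, 2, 3], [4, 5, 6]])
truncated = np.full_like(x, 0.5)          # cast to x's integer dtype: becomes 0
kept = np.full_like(x, 0.5, dtype=float)  # override the dtype to keep 0.5

print(ints.dtype, floats.dtype)
print(truncated)
print(kept)
```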

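3. Creating an ndarray from a numeric range

arange(): returns evenly spaced values within a given interval.
linspace(): returns a given number of evenly spaced values over an interval.
logspace(): returns numbers spaced evenly on a log scale.
numpy.random.rand(): returns an array of random numbers drawn from [0, 1). The example below uses numpy.random.random, which samples from the same interval but takes the shape as a single argument.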

import numpy as np

x = np.arange(5)
print(x)  # [0 1 2 3 4]

x = np.arange(3, 7, 2)
print(x)  # [3 5]

x = np.linspace(start=0, stop=2, num=9)
print(x)  
# [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]

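x = np.logspace(0, 1, 5)
print(np.around(x, 2))
# [ 1.    1.78  3.16  5.62 10.  ]
# np.around returns the rounded values; the precision can be specified.
# around(a, decimals=0, out=None)
#   a: input array
#   decimals: number of decimal places to round to (default 0); if negative,
#             rounding applies to positions to the left of the decimal point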


x = np.linspace(start=0, stop=1, num=5)
x = [10 ** i for i in x]
print(np.around(x, 2))
# [ 1.    1.78  3.16  5.62 10.  ]

x = np.random.random(5)
print(x)
# [0.41768753 0.16315577 0.80167915 0.99690199 0.11812291]

x = np.random.random([2, 3])
print(x)
# [[0.41151858 0.93785153 0.57031309]
#  [0.13482333 0.20583516 0.45429181]]

[0 1 2 3 4]
[3 5]
[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
[ 1.    1.78  3.16  5.62 10.  ]
[ 1.    1.78  3.16  5.62 10.  ]
[6.13548439e-04 1.79520976e-01 6.31648852e-01 8.36680726e-01
 8.43057002e-01]
[[0.2023474  0.01798275 0.76780257]
 [0.28220352 0.83947324 0.86106467]]
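
A few options of these range functions are easy to miss. The sketch below, with illustrative names, shows retstep and endpoint for linspace, the relation between logspace and linspace, and a seeded Generator (np.random.default_rng, available in NumPy 1.17 and later) as one way to make random output reproducible:

```python
import numpy as np

# linspace includes the stop value by default; retstep also returns the spacing.
pts, step = np.linspace(0, 2, num=9, retstep=True)

# endpoint=False leaves the stop value out, like arange does.
half_open = np.linspace(0, 1, num=4, endpoint=False)

# logspace(a, b, n) equals 10 ** linspace(a, b, n).
same = np.allclose(np.logspace(0, 1, 5), 10 ** np.linspace(0, 1, 5))

# A seeded Generator gives reproducible random numbers in [0, 1).
rng = np.random.default_rng(seed=0)
r = rng.random((2, 3))

print(step, half_open, same, r.shape)
```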

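4. Creating structured arrays

To build a structured array, first define the structure, then create the array with np.array(), passing the structure as the dtype argument.

(a) Defining the structure with a dictionary

[Example]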

import numpy as np

personType = np.dtype({
    'names': ['name', 'age', 'weight'],
    'formats': ['U30', 'i8', 'f8']})

a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)
print(a, type(a))
# [('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)]
# <class 'numpy.ndarray'>

(b) Defining the structure with a list of tuples

[Example]

import numpy as np

personType = np.dtype([('name', 'U30'), ('age', 'i8'), ('weight', 'f8')])
a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)
print(a, type(a))
# [('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)]
# <class 'numpy.ndarray'>

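# Elements of a structured array are accessed much like those of an ordinary array, by index:
print(a[0])
# ('Liming', 24, 63.9)

print(a[-2:])
# [('Mike', 15, 67. ) ('Jan', 34, 45.8)]

# A field name can also be used as the index to get that field's values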
print(a['name'])
# ['Liming' 'Mike' 'Jan']
print(a['age'])
# [24 15 34]
print(a['weight'])
# [63.9 67.  45.8]
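
Field access combines naturally with the usual array operations. A brief sketch, reusing the same records under the illustrative name person_type, of filtering on a field, sorting by a field, and updating a field in place:

```python
import numpy as np

person_type = np.dtype([('name', 'U30'), ('age', 'i8'), ('weight', 'f8')])
a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=person_type)

adults = a[a['age'] > 20]         # boolean mask built from one field
by_age = np.sort(a, order='age')  # sort whole records by the 'age' field
a['weight'] += 1.0                # field access is a view, so this edits a

print(adults['name'])  # ['Liming' 'Jan']
print(by_age['name'])  # ['Mike' 'Liming' 'Jan']
print(a['weight'])
```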

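Array attributes

When working with NumPy you will often want to know certain facts about an array. Conveniently, ndarray exposes several attributes that provide this information.

numpy.ndarray.ndim: the number of dimensions (axes) of the array, also called its rank; a one-dimensional array has rank 1, a two-dimensional array has rank 2, and so on.
numpy.ndarray.shape: the dimensions of the array, returned as a tuple whose length equals the number of dimensions, that is, the ndim attribute.
numpy.ndarray.size: the total number of elements in the array, equal to the product of the entries of shape; for a matrix, this is rows times columns.
numpy.ndarray.dtype: the element type of the ndarray.
numpy.ndarray.itemsize: the size of each element in bytes.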

import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a.shape)  # (5,)
print(a.dtype)  # int32
print(a.size)  # 5
print(a.ndim)  # 1
print(a.itemsize)  # 4

b = np.array([[1, 2, 3], [4, 5, 6.0]])
print(b.shape)  # (2, 3)
print(b.dtype)  # float64
print(b.size)  # 6
print(b.ndim)  # 2
print(b.itemsize)  # 8

(5,)
int32
5
1
4
(2, 3)
float64
6
2
8
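
These attributes are related to one another. A short sketch (illustrative names) that checks the relations and shows which attributes change under reshape:

```python
import numpy as np

b = np.array([[1, 2, 3], [4, 5, 6.0]])

# size is the product of the shape; nbytes is size * itemsize.
size_matches = b.size == b.shape[0] * b.shape[1]
nbytes_matches = b.nbytes == b.size * b.itemsize

c = b.reshape(3, 2, 1)  # same data, different shape

print(size_matches, nbytes_matches)
print(c.shape, c.ndim, c.size)  # (3, 2, 1) 3 6
```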

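Exercises

What is NumPy?

A: NumPy is an extension library for the Python language. It supports large multidimensional arrays and matrix operations, and provides a large library of mathematical functions that operate on arrays.

How do you install NumPy?

A: pip install numpy

What is an n-dimensional array object?

A: An n-dimensional array (ndarray) object is a collection of items of the same type that supports indexing, slicing, and iteration. In NumPy, arrays can be created with the array function.

How do you tell one-dimensional, two-dimensional, and multidimensional arrays apart?

A: The ndim attribute of an n-dimensional array (ndarray) object gives the number of dimensions (axes), also called the rank; a one-dimensional array has rank 1, a two-dimensional array has rank 2, and so on.

What do the following expressions evaluate to?

(Hint: NaN = not a number, inf = infinity)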

0 * np.nan

np.nan == np.nan

np.inf > np.nan

np.nan - np.nan

0.3 == 3 * 0.1

print(np.nan * 0)
print(np.nan == np.nan)
print(np.inf > np.nan)
print(np.nan - np.nan)
print(0.3 == 3 * 0.1)

nan
False
False
nan
False
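
The last result is False because 3 * 0.1 is not exactly 0.3 in binary floating point. A minimal sketch of tolerance-based comparison, which also handles NaN when asked to (array names are illustrative):

```python
import numpy as np

exact = 0.3 == 3 * 0.1            # False: 3 * 0.1 is 0.30000000000000004
close = np.isclose(0.3, 3 * 0.1)  # True: equal within a small tolerance

# NaN never equals itself, so element-wise == fails even on identical data.
a = np.array([1.0, np.nan])
b = a.copy()
elementwise = (a == b).all()                  # False
nan_aware = np.allclose(a, b, equal_nan=True)  # True

print(exact, close, elementwise, nan_aware)
```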

Convert a NumPy datetime64 object to a datetime.datetime object.

dt64 = np.datetime64('2020-02-25 22:10:10')

[Topic: dates, times, and time deltas]

How do you convert a NumPy datetime64 object to a datetime.datetime object?

import numpy as np
import datetime

dt64 = np.datetime64('2020-02-25 22:10:10')
dt = dt64.astype(datetime.datetime)
print(dt,type(dt))

2020-02-25 22:10:10 <class 'datetime.datetime'>
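
One caveat, sketched below with illustrative names: the conversion depends on the unit of the datetime64. With nanosecond precision the result is an integer rather than a datetime, so casting down to microseconds first is one possible workaround:

```python
import datetime
import numpy as np

dt64 = np.datetime64('2020-02-25 22:10:10')  # second precision
via_item = dt64.item()                       # also yields a datetime.datetime

# With nanosecond precision the conversion yields an int (nanoseconds since
# the epoch), because datetime.datetime only resolves microseconds.
ns = np.datetime64('2020-02-25 22:10:10', 'ns')
raw = ns.item()

# Casting down to microseconds first gives a datetime again.
fixed = ns.astype('datetime64[us]').item()

print(via_item, type(raw), fixed)
```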

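Given a sequence of non-consecutive dates, fill in the missing dates so that the sequence becomes consecutive.

dates = np.arange('2020-02-01', '2020-02-10', 2, np.datetime64)

[Topic: dates, times, and time deltas; mathematical functions]

How do you fill in the missing dates in an irregular series of NumPy dates?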

dates = np.arange('2020-02-01', '2020-02-10', 1, np.datetime64)
print(dates)

['2020-02-01' '2020-02-02' '2020-02-03' '2020-02-04' '2020-02-05'
 '2020-02-06' '2020-02-07' '2020-02-08' '2020-02-09']
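
The answer above regenerates the range with a step of one day. When the input is an existing, irregular array, a more general approach is to span its minimum and maximum. This is a sketch under the assumption of a hypothetical input array, not the original author's solution:

```python
import numpy as np

# Hypothetical irregular input: some days are missing.
dates = np.array(['2020-02-01', '2020-02-03', '2020-02-04', '2020-02-08'],
                 dtype='datetime64[D]')

# arange excludes the stop value, so add one day to include the last date.
full = np.arange(dates.min(), dates.max() + np.timedelta64(1, 'D'),
                 dtype='datetime64[D]')

missing = np.setdiff1d(full, dates)  # dates in full that are absent from dates

print(full)
print(missing)
```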

How do you get yesterday's, today's, and tomorrow's dates?

[Topic: dates and times]

(Hint: np.datetime64, np.timedelta64)

yesterday = np.datetime64('today', 'D') - np.timedelta64(1,'D')
today = np.datetime64('today', 'D')
tomorrow = np.datetime64('today', 'D') + np.timedelta64(1,'D')
print(yesterday)
print(today)
print(tomorrow)

2021-03-30
2021-03-31
2021-04-01

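Create a one-dimensional array of the numbers from 0 to 9.

[Topic: creating arrays]

How do you create a one-dimensional array?

a = np.arange(10)
print(a)

[0 1 2 3 4 5 6 7 8 9]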

Create a 3×3 array whose elements are all `True`.

[Topic: creating arrays]

How do you create a boolean array?

x = np.full((3,3), True, dtype=bool)
print(x)

[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]

Create a zero vector of length 10 in which only the fifth value is 1.

[Topic: creating arrays]

(Hint: array[4])

arr = np.zeros(10)  # np.empty would leave the other entries uninitialized
arr[4] = 1
print(arr)

[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

Create a vector with values ranging from 10 to 49.

[Topic: creating arrays]

(Hint: np.arange)

arr = np.arange(10,50,1)
print(arr)

[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]

Create a 3x3x3 array of random values.

[Topic: creating arrays]

(Hint: np.random.random)

arr = np.random.random((3,3,3))
print(arr)

[[[0.18351654 0.4906874  0.59926037]
  [0.81717206 0.34834451 0.06653195]
  [0.54986058 0.19039203 0.75184303]]

 [[0.24577338 0.01284756 0.74547441]
  [0.86072884 0.96877457 0.50612486]
  [0.20145443 0.58272803 0.61607391]]

 [[0.75030721 0.28277926 0.95061938]
  [0.98361801 0.13900866 0.14310479]
  [0.00801274 0.7266891  0.29833818]]]

Create a two-dimensional array with 1 on the border and 0 inside.

[Topic: creating two-dimensional arrays]

(Hint: array[1:-1, 1:-1])

Z = np.ones((10,10))
Z[1:-1, 1:-1] = 0
print(Z)

[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
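
An alternative to slicing, sketched here as one possible approach, is to start from the interior and let np.pad add the border:

```python
import numpy as np

Z = np.ones((10, 10))
Z[1:-1, 1:-1] = 0

# Same result: surround an 8x8 block of zeros with a one-cell border of ones.
padded = np.pad(np.zeros((8, 8)), pad_width=1, mode='constant', constant_values=1)

print(np.array_equal(Z, padded))  # True
```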

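Create a NumPy array of length 10 that starts at 5 with a step of 3 between consecutive numbers.

[Topic: creating arrays and array attributes]

How do you create a NumPy array sequence given a starting point, a length, and a step?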

arr = np.arange(5, 35,3)
print(arr)

[ 5  8 11 14 17 20 23 26 29 32]
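
The stop value 35 in the answer had to be worked out by hand. A small sketch that derives the sequence from the three inputs instead; seq is a hypothetical helper, not a NumPy function:

```python
import numpy as np

def seq(start, length, step):
    """Return `length` values beginning at `start`, spaced by `step`."""
    # Hypothetical helper: scale 0..length-1 by the step, then shift by start.
    return start + step * np.arange(length)

print(seq(5, 10, 3))  # [ 5  8 11 14 17 20 23 26 29 32]
```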

Load a local image and convert it to a NumPy array.

[Topic: creating arrays and array attributes]

How do you convert an image to a NumPy array?

import numpy as np
from PIL import Image

img1 = Image.open('test.jpg')
a = np.array(img1)

print(a.shape, a.dtype)

(959, 959, 3) uint8
