Chapter 1: NumPy Data Types and Array Creation, with Exercises

꧁ᝰ苏苏ᝰ꧂

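I. Constants

1. numpy.nan

Represents a missing value (NaN, "not a number").

nan = NaN = NAN
(The aliases np.NaN and np.NAN were removed in NumPy 2.0; np.nan works in every version.)

# Import the numpy library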
import numpy as np

# [Example] Two `numpy.nan` values never compare equal.
print(np.nan == np.nan)  # False
print(np.nan != np.nan)  # True

False
True

numpy.isnan(x, *args, **kwargs) tests each element for NaN and returns the result as a boolean array.

x = np.array([1, 1, 8, np.nan, 10])
print(x)

[ 1.  1.  8. nan 10.]

y = np.isnan(x)
print(y)

[False False False  True False]

np.isnan() works only on numeric data.

Note: numpy's isnan only supports numeric input, so passing string data raises an error.
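
A common use of the boolean mask returned by np.isnan is to drop or skip missing values. The sketch below reuses the array x from the example above; the names mask and clean are chosen here for illustration.

```python
import numpy as np

x = np.array([1, 1, 8, np.nan, 10])

# Boolean mask: True where the element is NaN
mask = np.isnan(x)

# Keep only the elements that are not NaN
clean = x[~mask]
print(clean)          # [ 1.  1.  8. 10.]

# np.mean propagates NaN, while np.nanmean skips it
print(np.mean(x))     # nan
print(np.nanmean(x))  # 5.0
```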

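z = np.count_nonzero(y)
print(z)  # 1

numpy.count_nonzero counts the non-zero elements of an array (True counts as non-zero).

Full signature: numpy.count_nonzero(a, axis=None, *, keepdims=False)

a: the array to count over

axis: the axis, or tuple of axes, along which to count. With axis=0 the count is taken down each column; with axis=1 it is taken across each row. The default (None) counts over the whole array.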
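
The effect of the axis argument is easiest to see on a small two-dimensional array. This is a minimal sketch; the array a and the result names are made up for the illustration.

```python
import numpy as np

a = np.array([[0, 1, 7, 0],
              [3, 0, 2, 19]])

total = np.count_nonzero(a)            # whole array
per_col = np.count_nonzero(a, axis=0)  # one count per column
per_row = np.count_nonzero(a, axis=1)  # one count per row

print(total)    # 5
print(per_col)  # [1 1 2 1]
print(per_row)  # [2 3]
```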

2. numpy.inf

Represents positive infinity.

Inf = inf = infty = Infinity = PINF
(Only np.inf remains in NumPy 2.0; the other aliases were removed.)

3. numpy.pi

Represents pi, the ratio of a circle's circumference to its diameter.

pi = 3.1415926535897932384626433
print(pi)

3.141592653589793

4. numpy.e

Represents Euler's number, the base of the natural logarithm.

e = 2.71828182845904523536028747135266249775724709369995
print(e)

2.718281828459045

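II. Data types

1. Common data types

Python has relatively few native data types: bool, int, float, str and so on. That is convenient for applications that do not need to care how data is represented in the machine, but scientific computing usually needs more control. To set its own types apart from the Python built-ins of the same name, numpy appends "_" to those type names.

The table below lists the commonly used numpy basic types. The aliases reflect a 64-bit build of NumPy 1.x; several of them (bool8, int0, uint0, float_, unicode_, str0, unicode) have been removed in recent releases, and the widths of intc, int_, long and intp depend on the platform.

Type	Size	Description
bool_ = bool8	8-bit	Boolean
int8 = byte	8-bit	Signed integer
int16 = short	16-bit	Signed integer
int32 = intc	32-bit	Signed integer
int_ = int64 = long = int0 = intp	64-bit	Signed integer
uint8 = ubyte	8-bit	Unsigned integer
uint16 = ushort	16-bit	Unsigned integer
uint32 = uintc	32-bit	Unsigned integer
uint64 = uintp = uint0 = uint	64-bit	Unsigned integer
float16 = half	16-bit	Floating point
float32 = single	32-bit	Floating point
float_ = float64 = double	64-bit	Floating point
str_ = unicode_ = str0 = unicode		Unicode string
datetime64		Date and time
timedelta64		Interval between two times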

2. Creating data types

numpy's numeric types are in fact instances of dtype objects.

class dtype(object):
    def __init__(self, obj, align=False, copy=False):
        pass

Every built-in type has a character code that uniquely identifies it:

Code	Type	Note
b	boolean	'b1' (a bare 'b' means int8; '?' is the one-character code for boolean)
i	signed integer	'i1', 'i2', 'i4', 'i8'
u	unsigned integer	'u1', 'u2', 'u4', 'u8'
f	floating-point	'f2', 'f4', 'f8'
c	complex floating-point
m	timedelta64	Interval between two times
M	datetime64	Date and time
O	object
S	(byte-)string	S3 is a byte string of length 3
U	Unicode	Unicode string
V	void

### ---------------[Example]--------------

import numpy as np

# a.itemsize gives the size of one element of this dtype, in bytes

a = np.dtype('b1')
print(a.type)  # <class 'numpy.bool_'>
print(a.itemsize)  # 1

a = np.dtype('i1')
print(a.type)  # <class 'numpy.int8'>
print(a.itemsize)  # 1
a = np.dtype('i2')
print(a.type)  # <class 'numpy.int16'>
print(a.itemsize)  # 2
a = np.dtype('i4')
print(a.type)  # <class 'numpy.int32'>
print(a.itemsize)  # 4
a = np.dtype('i8')
print(a.type)  # <class 'numpy.int64'>
print(a.itemsize)  # 8

a = np.dtype('u1')
print(a.type)  # <class 'numpy.uint8'>
print(a.itemsize)  # 1
a = np.dtype('u2')
print(a.type)  # <class 'numpy.uint16'>
print(a.itemsize)  # 2
a = np.dtype('u4')
print(a.type)  # <class 'numpy.uint32'>
print(a.itemsize)  # 4
a = np.dtype('u8')
print(a.type)  # <class 'numpy.uint64'>
print(a.itemsize)  # 8

a = np.dtype('f2')
print(a.type)  # <class 'numpy.float16'>
print(a.itemsize)  # 2
a = np.dtype('f4')
print(a.type)  # <class 'numpy.float32'>
print(a.itemsize)  # 4
a = np.dtype('f8')
print(a.type)  # <class 'numpy.float64'>
print(a.itemsize)  # 8

a = np.dtype('S')
print(a.type)  # <class 'numpy.bytes_'>
print(a.itemsize)  # 0
a = np.dtype('S3')
print(a.type)  # <class 'numpy.bytes_'>
print(a.itemsize)  # 3

a = np.dtype('U3')
print(a.type)  # <class 'numpy.str_'>
print(a.itemsize)  # 12

<class 'numpy.bool_'>
1
<class 'numpy.int8'>
1
<class 'numpy.int16'>
2
<class 'numpy.int32'>
4
<class 'numpy.int64'>
8
<class 'numpy.uint8'>
1
<class 'numpy.uint16'>
2
<class 'numpy.uint32'>
4
<class 'numpy.uint64'>
8
<class 'numpy.float16'>
2
<class 'numpy.float32'>
4
<class 'numpy.float64'>
8
<class 'numpy.bytes_'>
0
<class 'numpy.bytes_'>
3
<class 'numpy.str_'>
12

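3. Data type information

A Python float is normally a 64-bit floating-point number, nearly equivalent to np.float64.

NumPy and Python integer types behave very differently on overflow. Unlike NumPy's fixed-width integers, Python's int is flexible: it grows to hold any integer and never overflows.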
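
The difference can be seen directly. The sketch below assumes an int32 array holding the largest int32 value; the variable names are illustrative only.

```python
import numpy as np

# A Python int simply grows when it needs more bits
py_value = 2**31 - 1
py_next = py_value + 1
print(py_next)                 # 2147483648

# A fixed-width int32 array wraps around instead
arr = np.array([2**31 - 1], dtype=np.int32)
wrapped = arr + 1
print(wrapped, wrapped.dtype)  # [-2147483648] int32
```

When a result might exceed the range reported by np.iinfo, choosing a wider dtype such as np.int64 up front avoids the silent wrap-around.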

Machine limits for integer types.

class iinfo(object):
    def __init__(self, int_type):
        pass
    def min(self):
        pass
    def max(self):
        pass

import numpy as np

ii16 = np.iinfo(np.int16)
print(ii16.min)  # -32768
print(ii16.max)  # 32767

ii32 = np.iinfo(np.int32)
print(ii32.min)  # -2147483648
print(ii32.max)  # 2147483647

-32768
32767
-2147483648
2147483647

Machine limits for floating-point types.

class finfo(object):
    def __init__(self, dtype):
        pass

import numpy as np

ff16 = np.finfo(np.float16)
print(ff16.bits)  # 16
print(ff16.min)  # -65500.0
print(ff16.max)  # 65500.0
print(ff16.eps)  # 0.000977

ff32 = np.finfo(np.float32)
print(ff32.bits)  # 32
print(ff32.min)  # -3.4028235e+38
print(ff32.max)  # 3.4028235e+38
print(ff32.eps)  # 1.1920929e-07

16
-65500.0
65500.0
0.000977
32
-3.4028235e+38
3.4028235e+38
1.1920929e-07
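
The eps field is the gap between 1.0 and the next representable number of that type. A small sketch of what that means in practice for float32 (the names one and eps are chosen here):

```python
import numpy as np

one = np.float32(1.0)
eps = np.finfo(np.float32).eps  # 2**-23 for float32

# Adding eps produces a different number ...
bigger = one + eps > one
# ... but adding half of eps is rounded away
lost = one + eps / np.float32(2) == one

print(bigger)  # True
print(lost)    # True
```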

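III. Dates, times and time deltas

1. datetime64 basics

numpy makes it easy to convert a string into its date-time type, datetime64 (the name datetime was already taken by the date-time library that ships with Python).

datetime64 is a date-time type that carries a unit. The available units are:

Date unit	Meaning	Time unit	Meaning
Y	year	h	hour
M	month	m	minute
W	week	s	second
D	day	ms	millisecond
-	-	us	microsecond
-	-	ns	nanosecond
-	-	ps	picosecond
-	-	fs	femtosecond
-	-	as	attosecond

Note:

1 second = 1000 milliseconds
1 millisecond = 1000 microseconds

[Example] When a datetime64 is created from a string, numpy by default picks the unit that matches the string.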

import numpy as np

a = np.datetime64('2020-03-01')
print(a, a.dtype)  # 2020-03-01 datetime64[D]

a = np.datetime64('2020-03')
print(a, a.dtype)  # 2020-03 datetime64[M]

a = np.datetime64('2020-03-08 20:00:05')
print(a, a.dtype)  # 2020-03-08T20:00:05 datetime64[s]

a = np.datetime64('2020-03-08 20:00')
print(a, a.dtype)  # 2020-03-08T20:00 datetime64[m]

a = np.datetime64('2020-03-08 20')
print(a, a.dtype)  # 2020-03-08T20 datetime64[h]

2020-03-01 datetime64[D]
2020-03 datetime64[M]
2020-03-08T20:00:05 datetime64[s]
2020-03-08T20:00 datetime64[m]
2020-03-08T20 datetime64[h]

[Example] When a datetime64 is created from a string, the unit can also be specified explicitly.

import numpy as np

a = np.datetime64('2020-03', 'D')
print(a, a.dtype)  # 2020-03-01 datetime64[D]

a = np.datetime64('2020-03', 'Y')
print(a, a.dtype)  # 2020 datetime64[Y]

print(np.datetime64('2020-03') == np.datetime64('2020-03-01'))  # True
print(np.datetime64('2020-03') == np.datetime64('2020-03-02'))  # False

2020-03-01 datetime64[D]
2020 datetime64[Y]
True
False

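As the example shows, 2020-03 and 2020-03-01 denote the same moment.
In fact, two datetime64 objects with different units may still represent the same instant, and converting from a coarser unit (such as months) to a finer one (such as days) is safe.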
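
Converting in the other direction, from a finer unit to a coarser one, drops information. A minimal sketch using astype (the variable names are illustrative):

```python
import numpy as np

month = np.datetime64('2020-03')
day = np.datetime64('2020-03-15')

# Month -> day: lands on the first day of the month
as_day = month.astype('datetime64[D]')
print(as_day)    # 2020-03-01

# Day -> month: the day of the month is discarded
as_month = day.astype('datetime64[M]')
print(as_month)  # 2020-03
```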

[Example] When a datetime64 array is created from strings with mixed units, every element is converted to the finest unit among them.

import numpy as np

a = np.array(['2020-03', '2020-03-08', '2020-03-08 20:00'], dtype='datetime64')
print(a, a.dtype)
# ['2020-03-01T00:00' '2020-03-08T00:00' '2020-03-08T20:00'] datetime64[m]

['2020-03-01T00:00' '2020-03-08T00:00' '2020-03-08T20:00'] datetime64[m]

[Example] Use arange() to create a datetime64 array that covers a range of dates.

import numpy as np

a = np.arange('2020-08-01', '2020-08-10', dtype=np.datetime64)
print(a)
# ['2020-08-01' '2020-08-02' '2020-08-03' '2020-08-04' '2020-08-05'
#  '2020-08-06' '2020-08-07' '2020-08-08' '2020-08-09']
print(a.dtype)  # datetime64[D]

a = np.arange('2020-08-01 20:00', '2020-08-10', dtype=np.datetime64)
print(a)
# ['2020-08-01T20:00' '2020-08-01T20:01' '2020-08-01T20:02' ...
#  '2020-08-09T23:57' '2020-08-09T23:58' '2020-08-09T23:59']
print(a.dtype)  # datetime64[m]

a = np.arange('2020-05', '2020-12', dtype=np.datetime64)
print(a)
# ['2020-05' '2020-06' '2020-07' '2020-08' '2020-09' '2020-10' '2020-11']
print(a.dtype)  # datetime64[M]

['2020-08-01' '2020-08-02' '2020-08-03' '2020-08-04' '2020-08-05'
 '2020-08-06' '2020-08-07' '2020-08-08' '2020-08-09']
datetime64[D]
['2020-08-01T20:00' '2020-08-01T20:01' '2020-08-01T20:02' ...
 '2020-08-09T23:57' '2020-08-09T23:58' '2020-08-09T23:59']
datetime64[m]
['2020-05' '2020-06' '2020-07' '2020-08' '2020-09' '2020-10' '2020-11']
datetime64[M]

2. datetime64 and timedelta64 arithmetic

[Example] A timedelta64 is the difference between two datetime64 values. It also carries a unit, which matches the finer of the two units involved in the subtraction.

import numpy as np

a = np.datetime64('2020-03-08') - np.datetime64('2020-03-07')
b = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 08:00')
c = np.datetime64('2020-03-08') - np.datetime64('2020-03-07 23:00', 'D')

print(a, a.dtype)  # 1 days timedelta64[D]
print(b, b.dtype)  # 960 minutes timedelta64[m]
print(c, c.dtype)  # 1 days timedelta64[D]

a = np.datetime64('2020-03') + np.timedelta64(20, 'D')
b = np.datetime64('2020-06-15 00:00') + np.timedelta64(12, 'h')
print(a, a.dtype)  # 2020-03-21 datetime64[D]
print(b, b.dtype)  # 2020-06-15T12:00 datetime64[m]

1 days timedelta64[D]
960 minutes timedelta64[m]
1 days timedelta64[D]
2020-03-21 datetime64[D]
2020-06-15T12:00 datetime64[m]

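[Example] When building a timedelta64, note that the year ('Y') and month ('M') units cannot be mixed with the other units (how many days are in a year? how many hours in a month? neither is fixed).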

import numpy as np

a = np.timedelta64(1, 'Y')
b = np.timedelta64(a, 'M')
print(a)  # 1 years
print(b)  # 12 months

c = np.timedelta64(1, 'h')
d = np.timedelta64(c, 'm')
print(c)  # 1 hours
print(d)  # 60 minutes

print(np.timedelta64(a, 'D'))
# TypeError: Cannot cast NumPy timedelta64 scalar from metadata [Y] to [D] according to the rule 'same_kind'

print(np.timedelta64(b, 'D'))
# TypeError: Cannot cast NumPy timedelta64 scalar from metadata [M] to [D] according to the rule 'same_kind'

1 years
12 months
1 hours
60 minutes



---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-1-1f458319478e> in <module>
     11 print(d)  # 60 minutes
     12 
---> 13 print(np.timedelta64(a, 'D'))
     14 # TypeError: Cannot cast NumPy timedelta64 scalar from metadata [Y] to [D] according to the rule 'same_kind'
     15 


TypeError: Cannot cast NumPy timedelta64 scalar from metadata [Y] to [D] according to the rule 'same_kind'

[Example] Arithmetic on timedelta64 values.

import numpy as np

a = np.timedelta64(1, 'Y')
b = np.timedelta64(6, 'M')
c = np.timedelta64(1, 'W')
d = np.timedelta64(1, 'D')
e = np.timedelta64(10, 'D')

print(a)  # 1 years
print(b)  # 6 months
print(a + b)  # 18 months
print(a - b)  # 6 months
print(2 * a)  # 2 years
print(a / b)  # 2.0
print(c / d)  # 7.0
print(c % e)  # 7 days

1 years
6 months
18 months
6 months
2 years
2.0
7.0
7 days

[Example] Converting between numpy.datetime64 and datetime.datetime

import numpy as np
import datetime

dt = datetime.datetime(year=2020, month=6, day=1, hour=20, minute=5, second=30)
dt64 = np.datetime64(dt, 's')
print(dt64, dt64.dtype)
# 2020-06-01T20:05:30 datetime64[s]

dt2 = dt64.astype(datetime.datetime)
print(dt2, type(dt2))
# 2020-06-01 20:05:30 <class 'datetime.datetime'>

2020-06-01T20:05:30 datetime64[s]
2020-06-01 20:05:30 <class 'datetime.datetime'>
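
One caveat with this conversion: datetime.datetime only resolves microseconds, so a datetime64 in nanoseconds does not come back as a datetime object. The sketch below shows the effect and one possible workaround (casting to microseconds first); the variable names are illustrative.

```python
import datetime
import numpy as np

ns = np.datetime64('2020-06-01T20:05:30', 'ns')

# With nanosecond precision the result is a plain integer
# (nanoseconds since the epoch), not a datetime object
raw = ns.astype(datetime.datetime)
print(type(raw))

# Casting to microseconds first gives a real datetime.datetime
us = ns.astype('datetime64[us]')
fixed = us.astype(datetime.datetime)
print(fixed)  # 2020-06-01 20:05:30
```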

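3. Applications of datetime64

To allow date-times to be used in contexts where only some days of the week are valid, NumPy includes a set of "busday" (business day) functions.

numpy.busday_offset(dates, offsets, roll='raise', weekmask='1111100', holidays=None, busdaycal=None, out=None) first adjusts the date to a valid day according to the roll rule, then applies the offset, counted in valid days.

Parameter roll: {'raise', 'nat', 'forward', 'following', 'backward', 'preceding', 'modifiedfollowing', 'modifiedpreceding'}

'raise' means to raise an exception for an invalid day.
'nat' means to return a NaT (not-a-time) for an invalid day.
'forward' and 'following' mean to take the first valid day later in time.
'backward' and 'preceding' mean to take the first valid day earlier in time.

[Example] Apply an offset, in days ('D'), counted over business days. Computing the next business day raises an error by default when the start date is not a business day. Passing roll='forward' or roll='backward' avoids the error (forward takes the first valid day later in time, backward the first valid day earlier in time).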

import numpy as np

# 2020-07-10 is a Friday
a = np.busday_offset('2020-07-10', offsets=1)
print(a)  # 2020-07-13

a = np.busday_offset('2020-07-11', offsets=1)
print(a)
# ValueError: Non-business day date in busday_offset

a = np.busday_offset('2020-07-11', offsets=0, roll='forward')
b = np.busday_offset('2020-07-11', offsets=0, roll='backward')
print(a)  # 2020-07-13
print(b)  # 2020-07-10

a = np.busday_offset('2020-07-11', offsets=1, roll='forward')
b = np.busday_offset('2020-07-11', offsets=1, roll='backward')
print(a)  # 2020-07-14
print(b)  # 2020-07-13

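An offset of 0 returns the nearest business day later or earlier than the given date; if the date is itself a business day, it is returned unchanged.

numpy.is_busday(dates, weekmask='1111100', holidays=None, busdaycal=None, out=None) determines which of the given dates are valid days and which are not.

[Example] Check whether a given date is a business day.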

import numpy as np

# 2020-07-10 is a Friday
a = np.is_busday('2020-07-10')
b = np.is_busday('2020-07-11')
print(a)  # True
print(b)  # False

True
False

[Example] Count the business days in a datetime64[D] array.

import numpy as np

# 2021-03-10 is a Wednesday
begindates = np.datetime64('2021-03-10')
enddates = np.datetime64('2021-03-20')
a = np.arange(begindates, enddates, dtype='datetime64')
b = np.count_nonzero(np.is_busday(a))
print(a)
print(b)

['2021-03-10' '2021-03-11' '2021-03-12' '2021-03-13' '2021-03-14'
 '2021-03-15' '2021-03-16' '2021-03-17' '2021-03-18' '2021-03-19']
8

[Example] Use a custom week mask to specify which days of the week are business days.

import numpy as np

# 2020-07-10 is a Friday
a = np.is_busday('2020-07-10', weekmask=[1, 1, 1, 1, 1, 0, 0])
b = np.is_busday('2020-07-10', weekmask=[1, 1, 1, 1, 0, 0, 1])
print(a)  # True
print(b)  # False

True
False

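numpy.busday_count(begindates, enddates, weekmask='1111100', holidays=[], busdaycal=None, out=None) counts the valid days between the start date and the end date, excluding the end date itself.

weekmask: str or array_like of bool, optional

A seven-element array indicating which of Monday through Sunday are valid days. It may be given as a length-seven list or array such as [1,1,1,1,1,0,0]; as a length-seven string such as '1111100'; or as a string such as "Mon Tue Wed Thu Fri", made up of three-character weekday abbreviations optionally separated by whitespace. The valid abbreviations are: Mon Tue Wed Thu Fri Sat Sun

holidays: array_like of datetime64[D], optional

An array of dates to treat as invalid regardless of which weekday they fall on. Holiday dates may be given in any order, and NaT (not-a-time) dates are ignored. The list is stored in a normalized form suited to fast calculation of valid days.

[Example] Return the number of business days between two dates.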

import numpy as np

# 2021-03-10 is a Wednesday
begindates = np.datetime64('2021-03-10')
enddates = np.datetime64('2021-03-20')
a = np.busday_count(begindates, enddates)
b = np.busday_count(enddates, begindates)
print(a)  # 8
print(b)  # -8

8
-8
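
The weekmask and holidays parameters described above change the count. A minimal sketch over the same date range; the holiday date and the six-day week are hypothetical values picked for the illustration.

```python
import numpy as np

begin = np.datetime64('2021-03-10')
end = np.datetime64('2021-03-20')

# Treat Monday 2021-03-15 as a holiday (hypothetical date)
with_holiday = np.busday_count(begin, end, holidays=['2021-03-15'])
print(with_holiday)  # 7

# Six-day week: Saturday also counts as a working day
six_day = np.busday_count(begin, end, weekmask='Mon Tue Wed Thu Fri Sat')
print(six_day)       # 9
```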

References

https://www.jianshu.com/p/336cd77d9914
https://www.cnblogs.com/gl1573/p/10549547.html#h2datetime64
https://www.numpy.org.cn/reference/arrays/datetime.html#%E6%97%A5%E6%9C%9F%E6%97%B6%E9%97%B4%E5%8D%95%E4%BD%8D

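IV. Creating arrays

Import numpy.

import numpy as np

The most important data structure numpy provides is ndarray, an n-dimensional counterpart to Python's list.

1. Creating an ndarray from existing data

(a) Using the array() function.

def array(p_object, dtype=None, copy=True, order='K', subok=False, ndmin=0):

array(): builds an array.

object : array_like (any object exposing the array interface, or an object whose __array__ method returns an array)

dtype : data-type, optional. The desired data type. If not given, the type is chosen as the minimum type needed to hold the objects in the sequence.

copy : bool, optional. If True (the default), the object is copied. Otherwise a copy is made only if __array__ returns a copy, if the object is a nested sequence, or if a copy is needed to satisfy another requirement (dtype, order, etc.).

order : {'K', 'A', 'C', 'F'}, optional. Memory layout of the result; see the source for details.

subok : bool, optional. If True, subclasses are passed through; otherwise the returned array is forced to be a base-class array (the default).

ndmin : int, optional. The minimum number of dimensions the resulting array should have. Ones are prepended to the shape as needed to meet this requirement.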

import numpy as np

# Create a one-dimensional array
a = np.array([0, 1, 2, 3, 4])
b = np.array((0, 1, 2, 3, 4))
print(a, type(a))
# [0 1 2 3 4] <class 'numpy.ndarray'>
print(b, type(b))
# [0 1 2 3 4] <class 'numpy.ndarray'>

# Create a two-dimensional array
c = np.array([[11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25],
              [26, 27, 28, 29, 30],
              [31, 32, 33, 34, 35]])
print(c, type(c))
# [[11 12 13 14 15]
#  [16 17 18 19 20]
#  [21 22 23 24 25]
#  [26 27 28 29 30]
#  [31 32 33 34 35]] <class 'numpy.ndarray'>

# Create a three-dimensional array
d = np.array([[(1.5, 2, 3), (4, 5, 6)],
              [(3, 2, 1), (4, 5, 6)]])
print(d, type(d))
# [[[1.5 2.  3. ]
#   [4.  5.  6. ]]

#  [[3.  2.  1. ]
#   [4.  5.  6. ]]] <class 'numpy.ndarray'>

[0 1 2 3 4] <class 'numpy.ndarray'>
[0 1 2 3 4] <class 'numpy.ndarray'>
[[11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]
 [26 27 28 29 30]
 [31 32 33 34 35]] <class 'numpy.ndarray'>
[[[1.5 2.  3. ]
  [4.  5.  6. ]]

 [[3.  2.  1. ]
  [4.  5.  6. ]]] <class 'numpy.ndarray'>

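(b) Using the asarray() function

Both array() and asarray() convert structured data into an ndarray. The main difference shows when the source is already an ndarray: array() still makes a copy that occupies new memory, whereas asarray() does not copy as long as the dtype is unchanged.

def asarray(a, dtype=None, order=None):
    return array(a, dtype, copy=False, order=order)

[Example] Both array() and asarray() convert structured data into an ndarray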

import numpy as np

x = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
y = np.array(x)
z = np.asarray(x)
w = np.asarray(x, dtype=int)  # np.int was removed in NumPy 1.24; use the built-in int
x[1][2] = 2
print(x,type(x),x.dtype)
# [[1 1 1]
#  [1 1 2]
#  [1 1 1]] <class 'numpy.ndarray'> int32

print(y,type(y),y.dtype)
# [[1 1 1]
#  [1 1 1]
#  [1 1 1]] <class 'numpy.ndarray'> int32

print(z,type(z),z.dtype)
# [[1 1 1]
#  [1 1 2]
#  [1 1 1]] <class 'numpy.ndarray'> int32

print(w,type(w),w.dtype)
# [[1 1 1]
#  [1 1 2]
#  [1 1 1]] <class 'numpy.ndarray'> int32

[[1 1 1]
 [1 1 2]
 [1 1 1]] <class 'numpy.ndarray'> int32
[[1 1 1]
 [1 1 1]
 [1 1 1]] <class 'numpy.ndarray'> int32
[[1 1 1]
 [1 1 2]
 [1 1 1]] <class 'numpy.ndarray'> int32
[[1 1 1]
 [1 1 2]
 [1 1 1]] <class 'numpy.ndarray'> int32
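
Whether a copy was made can also be checked directly instead of inferring it from the printed values. The sketch below uses np.shares_memory for that; the float64 conversion is an extra case added here to show a dtype change forcing a copy.

```python
import numpy as np

x = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
y = np.array(x)                      # always a fresh copy
z = np.asarray(x)                    # same dtype: no copy
w = np.asarray(x, dtype=np.float64)  # dtype change forces a copy

print(np.shares_memory(x, y))  # False
print(np.shares_memory(x, z))  # True
print(np.shares_memory(x, w))  # False
print(z is x)                  # True
```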

[Example] When the dtype is changed in place to a larger one, its size must be a divisor of the total size, in bytes, of the array's last axis

import numpy as np

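# int32 is requested explicitly so the example behaves the same on every platform
x = np.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=np.int32)
print(x, x.dtype)
# [[1 1 1]
#  [1 1 1]
#  [1 1 1]] int32
x.dtype = np.float64  # np.float was removed in NumPy 1.24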

# ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

[[1 1 1]
 [1 1 1]
 [1 1 1]] int32

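(c) Using the fromfunction() function

fromfunction() builds an array from a function, which can be handy when plotting a function.

def fromfunction(function, shape, **kwargs):

[Example] Build an array by evaluating a function at each coordinate.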

import numpy as np

def f(x, y):
    return 10 * x + y

x = np.fromfunction(f, (5, 4), dtype=int)
print(x)
# [[ 0  1  2  3]
#  [10 11 12 13]
#  [20 21 22 23]
#  [30 31 32 33]
#  [40 41 42 43]]

x = np.fromfunction(lambda i, j: i == j, (3, 3), dtype=int)
print(x)
# [[ True False False]
#  [False  True False]
#  [False False  True]]

x = np.fromfunction(lambda i, j: i + j, (3, 3), dtype=int)
print(x)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

[[ 0  1  2  3]
 [10 11 12 13]
 [20 21 22 23]
 [30 31 32 33]
 [40 41 42 43]]
[[ True False False]
 [False  True False]
 [False False  True]]
[[0 1 2]
 [1 2 3]
 [2 3 4]]

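2. Filling with ones and zeros

A frequent step in machine learning tasks is parameter initialization, which needs a matrix of fixed size filled with a constant or with random values.

(a) Arrays of zeros

zeros(): returns an array of zeros with the given shape and type.
zeros_like(): returns an array of zeros with the same shape and type as a given array.

def zeros(shape, dtype=None, order='C'):
def zeros_like(a, dtype=None, order='K', subok=True, shape=None):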

import numpy as np

x = np.zeros(5)
print(x)  # [0. 0. 0. 0. 0.]
x = np.zeros([2, 3])
print(x)
# [[0. 0. 0.]
#  [0. 0. 0.]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.zeros_like(x)
print(y)
# [[0 0 0]
#  [0 0 0]]

[0. 0. 0. 0. 0.]
[[0. 0. 0.]
 [0. 0. 0.]]
[[0 0 0]
 [0 0 0]]

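(b) Arrays of ones

ones(): returns an array of ones with the given shape and type.
ones_like(): returns an array of ones with the same shape and type as a given array.

def ones(shape, dtype=None, order='C'):
def ones_like(a, dtype=None, order='K', subok=True, shape=None):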

import numpy as np

x = np.ones(5)
print(x)  # [1. 1. 1. 1. 1.]
x = np.ones([2, 3])
print(x)
# [[1. 1. 1.]
#  [1. 1. 1.]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.ones_like(x)
print(y)
# [[1 1 1]
#  [1 1 1]]

[1. 1. 1. 1. 1.]
[[1. 1. 1.]
 [1. 1. 1.]]
[[1 1 1]
 [1 1 1]]

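(c) Empty arrays

empty(): returns a new array without initializing its entries. The values are whatever happened to be in that memory (arbitrary, not random numbers), so the printed values differ from run to run.
empty_like(): returns a new uninitialized array with the same shape and type as a given array.

def empty(shape, dtype=None, order='C'):
def empty_like(prototype, dtype=None, order='K', subok=True, shape=None):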

import numpy as np

x = np.empty(5)
print(x)
# [1.95821574e-306 1.60219035e-306 1.37961506e-306 
#  9.34609790e-307 1.24610383e-306]

x = np.empty((3, 2))
print(x)
# [[1.60220393e-306 9.34587382e-307]
#  [8.45599367e-307 7.56598449e-307]
#  [1.33509389e-306 3.59412896e-317]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.empty_like(x)
print(y)
# [[  7209029   6422625   6619244]
#  [      100 707539280       504]]

[1. 1. 1. 1. 1.]
[[1. 1.]
 [1. 1.]
 [1. 1.]]
[[0 0 0]
 [0 0 0]]

(d) Identity arrays

eye(): returns a 2-D array with ones on the diagonal and zeros elsewhere.
identity(): returns a square identity array.

def eye(N, M=None, k=0, dtype=float, order='C'):
def identity(n, dtype=None):

import numpy as np

x = np.eye(4)
print(x)
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]

x = np.eye(2, 3)
print(x)
# [[1. 0. 0.]
#  [0. 1. 0.]]

x = np.identity(4)
print(x)
# [[1. 0. 0. 0.]
#  [0. 1. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 0. 0. 1.]]

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
[[1. 0. 0.]
 [0. 1. 0.]]
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]

(e) Diagonal arrays

diag(): extracts a diagonal or constructs a diagonal array.

def diag(v, k=0):

import numpy as np

x = np.arange(9).reshape((3, 3))
print(x)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]
print(np.diag(x))  # [0 4 8]
print(np.diag(x, k=1))  # [1 5]
print(np.diag(x, k=-1))  # [3 7]

v = [1, 3, 5, 7]
x = np.diag(v)
print(x)
# [[1 0 0 0]
#  [0 3 0 0]
#  [0 0 5 0]
#  [0 0 0 7]]

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[0 4 8]
[1 5]
[3 7]
[[1 0 0 0]
 [0 3 0 0]
 [0 0 5 0]
 [0 0 0 7]]

(f) Constant arrays

full(): returns an array filled with a constant.
full_like(): returns a constant-filled array with the same shape and type as a given array.

def full(shape, fill_value, dtype=None, order='C'):
def full_like(a, fill_value, dtype=None, order='K', subok=True, shape=None):

import numpy as np

x = np.full((2,), 7)
print(x)
# [7 7]

x = np.full(2, 7)
print(x)
# [7 7]

x = np.full((2, 7), 7)
print(x)
# [[7 7 7 7 7 7 7]
#  [7 7 7 7 7 7 7]]

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.full_like(x, 7)
print(y)
# [[7 7 7]
#  [7 7 7]]

[7 7]
[7 7]
[[7 7 7 7 7 7 7]
 [7 7 7 7 7 7 7]]
[[7 7 7]
 [7 7 7]]

3. Creating an ndarray from a numeric range

arange(): returns evenly spaced values within a given interval.
linspace(): returns a given number of evenly spaced values over an interval.
logspace(): returns numbers spaced evenly on a log scale.
numpy.random.rand(): returns an array of random numbers drawn from [0, 1).

def arange([start,] stop[, step,], dtype=None):
def linspace(start, stop, num=50, endpoint=True, retstep=False,
             dtype=None, axis=0):
def logspace(start, stop, num=50, endpoint=True, base=10.0,
             dtype=None, axis=0):
def rand(d0, d1, ..., dn):

import numpy as np

x = np.arange(5)
print(x)  # [0 1 2 3 4]

x = np.arange(3, 7, 2)
print(x)  # [3 5]

x = np.linspace(start=0, stop=2, num=9)
print(x)  
# [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]

x = np.logspace(0, 1, 5)
print(np.around(x, 2))
# [ 1.    1.78  3.16  5.62 10.  ]            
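# np.around returns the values rounded to the given precision.
# around(a, decimals=0, out=None)
# a: input array
# decimals: number of decimal places to round to. Default is 0. If negative,
#           the integer part is rounded to that many places left of the decimal point.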


x = np.linspace(start=0, stop=1, num=5)
x = [10 ** i for i in x]
print(np.around(x, 2))
# [ 1.    1.78  3.16  5.62 10.  ]

x = np.random.random(5)
print(x)
# [0.41768753 0.16315577 0.80167915 0.99690199 0.11812291]

x = np.random.random([2, 3])
print(x)
# [[0.41151858 0.93785153 0.57031309]
#  [0.13482333 0.20583516 0.45429181]]

[0 1 2 3 4]
[3 5]
[0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ]
[ 1.    1.78  3.16  5.62 10.  ]
[ 1.    1.78  3.16  5.62 10.  ]
[0.21002365 0.43774345 0.34136319 0.46849059 0.63624847]
[[0.55080614 0.37969211 0.18192981]
 [0.8517472  0.39575293 0.9517301 ]]
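
linspace takes the number of points while arange takes the step, and the two treat the stop value differently. A short sketch of the endpoint and retstep options from the signature above (the variable names are illustrative):

```python
import numpy as np

# endpoint=False leaves the stop value out
x = np.linspace(0, 1, num=5, endpoint=False)
print(x)  # [0.  0.2 0.4 0.6 0.8]

# retstep=True also returns the spacing between samples
y, step = np.linspace(0, 2, num=9, retstep=True)
print(step)  # 0.25

# arange with the same step never includes the stop value
z = np.arange(0, 2, 0.25)
print(len(y), len(z))  # 9 8
```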

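4. Creating structured arrays

A structured array is built by first defining the structure and then calling np.array() with that structure as its dtype argument.

(a) Defining the structure with a dictionary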

import numpy as np

personType = np.dtype({
    'names': ['name', 'age', 'weight'],
    'formats': ['U30', 'i8', 'f8']})

a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)
print(a, type(a))
# [('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)]
# <class 'numpy.ndarray'>

[('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)] <class 'numpy.ndarray'>

(b) Defining the structure with a list of tuples

import numpy as np

personType = np.dtype([('name', 'U30'), ('age', 'i8'), ('weight', 'f8')])
a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)
print(a, type(a))
# [('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)]
# <class 'numpy.ndarray'>

# Structured arrays are indexed much like ordinary arrays; elements can be fetched by position:
print(a[0])
# ('Liming', 24, 63.9)

print(a[-2:])
# [('Mike', 15, 67. ) ('Jan', 34, 45.8)]

# A field name can also be used as an index to fetch the matching values
print(a['name'])
# ['Liming' 'Mike' 'Jan']
print(a['age'])
# [24 15 34]
print(a['weight'])
# [63.9 67.  45.8]

[('Liming', 24, 63.9) ('Mike', 15, 67. ) ('Jan', 34, 45.8)] <class 'numpy.ndarray'>
('Liming', 24, 63.9)
[('Mike', 15, 67. ) ('Jan', 34, 45.8)]
['Liming' 'Mike' 'Jan']
[24 15 34]
[63.9 67.  45.8]
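
Because each field behaves like an ordinary array, fields can drive filtering and sorting of whole records. A minimal sketch on the same data (the names adults and by_age are chosen here):

```python
import numpy as np

personType = np.dtype([('name', 'U30'), ('age', 'i8'), ('weight', 'f8')])
a = np.array([('Liming', 24, 63.9), ('Mike', 15, 67.), ('Jan', 34, 45.8)],
             dtype=personType)

# Boolean mask built from one field selects whole records
adults = a[a['age'] > 20]
print(adults['name'])  # ['Liming' 'Jan']

# Sort the records by the 'age' field
by_age = np.sort(a, order='age')
print(by_age['name'])  # ['Mike' 'Liming' 'Jan']
```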

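V. Array attributes

When working with numpy you will often want to know certain facts about an array. Fortunately the package offers convenient attributes that give you this information.

numpy.ndarray.ndim returns the number of dimensions (axes) of the array, also called its rank: a one-dimensional array has rank 1, a two-dimensional array has rank 2, and so on.
numpy.ndarray.shape gives the size of the array along each dimension as a tuple; the length of the tuple is the number of dimensions, i.e. ndim.
numpy.ndarray.size is the total number of elements, equal to the product of the entries of shape; for a matrix that is rows times columns.
numpy.ndarray.dtype is the element type of the ndarray.
numpy.ndarray.itemsize is the size of each element in bytes.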

class ndarray(object):
    shape = property(lambda self: object(), lambda self, v: None, lambda self: None)
    dtype = property(lambda self: object(), lambda self, v: None, lambda self: None)
    size = property(lambda self: object(), lambda self, v: None, lambda self: None)
    ndim = property(lambda self: object(), lambda self, v: None, lambda self: None)
    itemsize = property(lambda self: object(), lambda self, v: None, lambda self: None)

import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a.shape)  # (5,)
print(a.dtype)  # int32
print(a.size)  # 5
print(a.ndim)  # 1
print(a.itemsize)  # 4

b = np.array([[1, 2, 3], [4, 5, 6.0]])
print(b.shape)  # (2, 3)
print(b.dtype)  # float64
print(b.size)  # 6
print(b.ndim)  # 2
print(b.itemsize)  # 8

(5,)
int32
5
1
4
(2, 3)
float64
6
2
8

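All elements of an ndarray must share one type; mixed inputs are automatically converted to a common type that can hold them all, along the chain int -> float -> str.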

import numpy as np

a = np.array([1, 2, 3, 4, 5])
print(a)  # [1 2 3 4 5]
b = np.array([1, 2, 3, 4, '5'])
print(b)  # ['1' '2' '3' '4' '5']
c = np.array([1, 2, 3, 4, 5.0])
print(c)  # [1. 2. 3. 4. 5.]

[1 2 3 4 5]
['1' '2' '3' '4' '5']
[1. 2. 3. 4. 5.]
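
The printed values only hint at the conversion; inspecting dtype makes it explicit. A small sketch (the variable names are illustrative; dtype.kind is used for the integer and string cases because the exact width depends on the platform and the data):

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1, 2, 5.0])
strs = np.array([1, 2.5, '5'])

print(ints.dtype.kind)  # i  (a signed integer type)
print(floats.dtype)     # float64
print(strs.dtype.kind)  # U  (a Unicode string type)

# result_type reports the common type of two dtypes without building an array
common = np.result_type(np.int32, np.float64)
print(common)           # float64
```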

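VI. Exercises

What is numpy?

numpy is Python's scientific computing library built around an array object.
Picking out the key points, numpy has three main features:
It provides an n-dimensional array object;
It supports broadcasting (covered later);
It offers a wide range of scientific computing APIs, ready to call;

How do you install numpy?

numpy is a Python library, so it can be installed with either of the Python package managers pip or conda. After installing Python, open a command prompt and run: pip install numpy

What is an n-dimensional array object?

An n-dimensional array (ndarray) object is a collection of items of the same type that supports indexing, slicing and iteration. In numpy, arrays can be created with the array function: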

import numpy as np
np.array([1,2,3])

array([1, 2, 3])

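How do you tell one-, two- and higher-dimensional arrays apart?

The dimensionality of an array is determined by how many axes it has.

One axis means a one-dimensional array, two axes a two-dimensional array, and so on.

Each axis corresponds to one level of nesting.

For example, each element along the first axis of a two-dimensional array is itself a one-dimensional array, which runs along the second axis.

A one-dimensional array has one axis: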

[1,2,3]

[1, 2, 3]

A two-dimensional array has two axes:

[[0, 1, 2],
 [3, 4, 5]]

[[0, 1, 2], [3, 4, 5]]

A three-dimensional array has three axes:

[[[ 0,  1,  2],
  [ 3,  4,  5]],

 [[ 6,  7,  8],
  [ 9, 10, 11]]]

[[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]

The same pattern extends to n-dimensional arrays.

What does each of the following expressions evaluate to?

(Hint: NaN = not a number, inf = infinity)

0 * np.nan

np.nan == np.nan

np.inf > np.nan

np.nan - np.nan

0.3 == 3 * 0.1

print(0 * np.nan)
print(np.nan == np.nan)
print(np.inf > np.nan)
print(np.nan - np.nan)
print(0.3 == 3 * 0.1)

nan
False
False
nan
False

Convert a numpy datetime64 object into a datetime object from the datetime module.

dt64 = np.datetime64('2020-02-25 22:10:10')

[Topic: dates, times and time deltas]

How do you convert a numpy datetime64 object into a datetime.datetime object?

import numpy as np
import datetime

dt64 = np.datetime64('2020-02-25 22:10:10')
dt = dt64.astype(datetime.datetime)
print(dt, type(dt))
# 2020-02-25 22:10:10 <class 'datetime.datetime'>

2020-02-25 22:10:10 <class 'datetime.datetime'>

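Given a sequence of non-consecutive dates, fill in the missing dates so that the sequence becomes continuous.

dates = np.arange('2020-02-01', '2020-02-10', 2, np.datetime64)

[Topic: dates, times and time deltas; mathematical functions]

How do you fill in the missing dates in an irregular series of numpy dates?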

import numpy as np

dates = np.arange('2020-02-01', '2020-02-10', 2, np.datetime64)
print(dates)
# ['2020-02-01' '2020-02-03' '2020-02-05' '2020-02-07' '2020-02-09']

out = []
for date, d in zip(dates, np.diff(dates)):
    out.extend(np.arange(date, date + d))
fillin = np.array(out)
output = np.hstack([fillin, dates[-1]])
print(output)
# ['2020-02-01' '2020-02-02' '2020-02-03' '2020-02-04' '2020-02-05'
#  '2020-02-06' '2020-02-07' '2020-02-08' '2020-02-09']

['2020-02-01' '2020-02-03' '2020-02-05' '2020-02-07' '2020-02-09']
['2020-02-01' '2020-02-02' '2020-02-03' '2020-02-04' '2020-02-05'
 '2020-02-06' '2020-02-07' '2020-02-08' '2020-02-09']

How do you get yesterday's, today's and tomorrow's dates?

[Topic: dates and times]

(Hint: np.datetime64, np.timedelta64)

yesterday = np.datetime64('today', 'D') - np.timedelta64(1, 'D')
today     = np.datetime64('today', 'D')
tomorrow  = np.datetime64('today', 'D') + np.timedelta64(1, 'D')
print ("Yesterday is " + str(yesterday))
print ("Today is " + str(today))
print ("Tomorrow is "+ str(tomorrow))

Yesterday is 2021-04-20
Today is 2021-04-21
Tomorrow is 2021-04-22

Create a one-dimensional array of the numbers 0 to 9.

[Topic: creating arrays]

How do you create a one-dimensional array?

# [Answer]

import numpy as np

arr = np.arange(10)
print(arr)
# [0 1 2 3 4 5 6 7 8 9]

[0 1 2 3 4 5 6 7 8 9]

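Create a 3×3 array whose elements are all True.

[Topic: creating arrays]

How do you create a boolean array?

# Answer
import numpy as np

arr = np.full([3, 3], True, dtype=bool)  # np.bool was removed in NumPy 1.24; the built-in bool works everywhere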
print(arr)

[[ True  True  True]
 [ True  True  True]
 [ True  True  True]]

Create a zero vector of length 10 whose fifth value is 1

[Topic: creating arrays]

(Hint: array[4])

Z = np.zeros(10)
Z[4] = 1
print(Z)

[0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]

Create a vector with values ranging from 10 to 49

[Topic: creating arrays]

(Hint: np.arange)

Z = np.arange(10,50)
print(Z)

[10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49]

Create a 3x3x3 array of random values

[Topic: creating arrays]

(Hint: np.random.random)

Z = np.random.random((3,3,3))
print(Z)

[[[0.72695053 0.80421381 0.02320482]
  [0.35509264 0.18069697 0.41695397]
  [0.35143171 0.31527891 0.32568238]]

 [[0.85771974 0.79373475 0.2713051 ]
  [0.15674402 0.83542799 0.6767048 ]
  [0.27875941 0.00588554 0.62802242]]

 [[0.73334509 0.56780821 0.07970181]
  [0.75296088 0.47631618 0.24653112]
  [0.15197489 0.70235184 0.9510641 ]]]

Create a two-dimensional array with 1 on the border and 0 everywhere else

[Topic: creating two-dimensional arrays]

(Hint: array[1:-1, 1:-1])

Z = np.ones((10,10))
Z[1:-1,1:-1] = 0
print(Z)

[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]

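Create a numpy array of length 10 that starts at 5 and increases in steps of 3.

[Topic: creating arrays and array attributes]

How do you create a numpy array sequence from a given start, length and step?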

import numpy as np

start = 5
step = 3
length = 10
a = np.arange(start, start + step * length, step)
print(a)

[ 5  8 11 14 17 20 23 26 29 32]

Load a local image and convert it to a numpy array.

[Topic: creating arrays and array attributes]

How do you convert an image to a numpy array?

import numpy as np
from PIL import Image

img1 = Image.open('test.jpg')
a = np.array(img1)

print(a.shape, a.dtype)

(959, 959, 3) uint8
