A Thorough Look at NumPy Data Types

烧煤的快感

Article link: https://blog.csdn.net/gg_18826075157/article/details/78609532

1. Types and type conversion

In NumPy, many array-creation functions use float64 as the default data type:

>>> import numpy as np
>>> a = np.ones((3, 3))
>>> a.dtype
dtype('float64')

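When an array is built from a list, however, the dtype is inferred from the elements. For a list of Python integers the result is NumPy's default integer type, which is platform-dependent:

>>> a = np.array([1, 2, 3])
>>> a.dtype     # int32 on 32-bit platforms and on Windows with NumPy older than 2.0
dtype('int64')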
>>> a = np.array([1., 2., 3.])
>>> a.dtype
dtype('float64')
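
The inference rule above can be checked directly. The following is a minimal sketch; the variable names are illustrative only, and the integer case is tested by kind rather than by exact width because the default integer size depends on the platform.

```python
import numpy as np

# Integers only: NumPy picks its default integer type (platform-dependent width).
ints = np.array([1, 2, 3])

# A single float in the list is enough to make the whole array float64.
mixed = np.array([1, 2.5, 3])

# An explicit dtype overrides the inference.
small = np.array([1, 2, 3], dtype=np.float32)

print(ints.dtype.kind, mixed.dtype, small.dtype)
```

Passing dtype explicitly is the simplest way to get the same result on every platform.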

Arithmetic upcasts as needed: the operation returns a new array whose dtype can hold the result.

>>> a = np.array([1, 2, 3])
>>> a = a + 1.5
>>> a.dtype
dtype('float64')
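
As a small sketch of the upcasting rule, np.result_type reports the dtype that combining two dtypes produces. The dtypes chosen here are only examples.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)
b = a + 1.5   # builds a new float64 array; `a` itself is untouched

# np.result_type applies NumPy's promotion rules to the given dtypes.
int_plus_float = np.result_type(np.int64, np.float64)
i8_plus_u8 = np.result_type(np.int8, np.uint8)

print(a.dtype, b.dtype, int_plus_float, i8_plus_u8)
```

The int8/uint8 case promotes to int16 because neither 8-bit type can represent every value of the other.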

Assigning to an element, however, never changes the array's dtype:

>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')
>>> a[0] = 1.9     # <-- the float 1.9 is cast to int64, which truncates it to 1
>>> a
array([1, 2, 3])
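
The same restriction applies to in-place operators. The sketch below assumes a reasonably recent NumPy, where an in-place operation that would require a float result on an integer array raises a TypeError instead of silently truncating; the flag name inplace_failed is illustrative.

```python
import numpy as np

a = np.array([1, 2, 3])
a[0] = 1.9     # stored as 1: the fractional part is dropped
a[1] = -1.9    # stored as -1: truncation is toward zero

try:
    a += 1.5   # the float64 result cannot be written back into an integer array
    inplace_failed = False
except TypeError:
    inplace_failed = True

print(a, a.dtype.kind, inplace_failed)
```

Writing a = a + 1.5 instead rebinds the name to a new float64 array, which is why that form works.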

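Explicit conversion with astype:

>>> a = np.array([1.7, 1.2, 1.6])
>>> b = a.astype(int)   # <-- explicit cast; the fractional part is truncated
>>> b.dtype             # platform-dependent default integer, as above
dtype('int64')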
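
A short sketch of what the cast does to the values: astype(int) truncates toward zero, which differs from rounding down for negative numbers, and it returns a new array rather than modifying the original.

```python
import numpy as np

a = np.array([1.7, -1.7, 2.0])

truncated = a.astype(int)           # drops the fractional part (toward zero)
floored = np.floor(a).astype(int)   # rounds down first, then casts

print(truncated, floored, a.dtype)
```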

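Rounding to the nearest integer. Note that np.around rounds exact halves to the nearest even value, so 2.5 becomes 2.0 while 3.5 becomes 4.0:

>>> a = np.array([1.2, 1.5, 1.6, 2.5, 3.5, 4.5])
>>> b = np.around(a)
>>> b.dtype     # the rounded array is still floating point
dtype('float64')
>>> c = np.around(a).astype(int)
>>> c.dtype     # platform-dependent default integer, as above
dtype('int64')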
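
To make the tie-breaking behaviour concrete, here is a minimal comparison of np.around with a round-half-up alternative. The floor(x + 0.5) form is one common way to get half-up rounding and is shown as an assumption about what a reader may want, not as something NumPy provides under that name.

```python
import numpy as np

a = np.array([0.5, 1.5, 2.5, 3.5, 4.5])

half_even = np.around(a)       # ties go to the nearest even value
half_up = np.floor(a + 0.5)    # ties always go up

print(half_even)
print(half_up)
```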

2. Size of each data type

2.1. Signed integers:

Data type	Size
int8	8 bits
int16	16 bits
int32	32 bits
int64	64 bits

>>> np.array([1], dtype=int).dtype
dtype('int64')
>>> np.iinfo(np.int32).max, 2**31 - 1
(2147483647, 2147483647)

2.2. Unsigned integers:

Data type	Size
uint8	8 bits
uint16	16 bits
uint32	32 bits
uint64	64 bits

>>> np.iinfo(np.uint32).max, 2**32 - 1
(4294967295, 4294967295)
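
These limits matter because array arithmetic on fixed-width integers wraps around silently when a result does not fit. A minimal sketch using the 8-bit types, with illustrative variable names:

```python
import numpy as np

info = np.iinfo(np.int8)
top = np.array([info.max], dtype=np.int8)
one = np.array([1], dtype=np.int8)
wrapped = top + one   # 127 + 1 does not fit in int8 and wraps to the minimum

utop = np.array([np.iinfo(np.uint8).max], dtype=np.uint8)
uwrapped = utop + np.array([1], dtype=np.uint8)   # 255 + 1 wraps to 0

print(info.min, info.max, wrapped, uwrapped)
```

Choosing a wider dtype before the operation, for example with astype(np.int16), avoids the wraparound.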

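2.3. Floating-point numbers:

Data type	Size
float16	16 bits
float32	32 bits
float64	64 bits
float96	96 bits (platform-dependent, same as np.longdouble)
float128	128 bits (platform-dependent, same as np.longdouble)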

>>> np.finfo(np.float32).eps
1.1920929e-07
>>> np.finfo(np.float64).eps
2.2204460492503131e-16

>>> np.float32(1e-8) + np.float32(1) == 1
True
>>> np.float64(1e-8) + np.float64(1) == 1
False
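
The eps value explains the comparison above: it is the gap between 1 and the next representable number, so an addend well below it is lost. A small sketch, where check_eps is a hypothetical helper written for this illustration:

```python
import numpy as np

def check_eps(dtype):
    """Return (1 + eps differs from 1, 1 + eps/2 equals 1) for a float type."""
    one = dtype(1)
    eps = np.finfo(dtype).eps
    half = dtype(eps / 2)   # keep the operand in the same precision
    return bool(one + eps != one), bool(one + half == one)

for dtype in (np.float32, np.float64):
    print(dtype.__name__, check_eps(dtype))
```

Since 1e-8 is smaller than half of the float32 eps but larger than the float64 eps, it vanishes in float32 and survives in float64.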

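2.4. Complex floating-point numbers:

Data type	Size
complex64	two 32-bit floats
complex128	two 64-bit floats
complex192	two 96-bit floats (platform-dependent)
complex256	two 128-bit floats (platform-dependent)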
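
The sizes in the tables of this section can be verified from the itemsize attribute of each dtype. The sketch below checks only the portable types; the extended-precision type is printed rather than asserted, because its width varies by platform.

```python
import numpy as np

names = ["int8", "int16", "int32", "int64",
         "uint8", "uint16", "uint32", "uint64",
         "float16", "float32", "float64",
         "complex64", "complex128"]

# itemsize is in bytes; multiply by 8 to get bits.
bits = {name: np.dtype(name).itemsize * 8 for name in names}

# np.longdouble is whatever extended type the platform offers.
extended = np.dtype(np.longdouble)

print(bits)
print(extended.name, extended.itemsize * 8)
```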

3. Structured data types

A structured data type resembles a struct in C/C++. The example below defines one with three fields:

Field name	Data type
sensor_code	string of length 4 (4 bytes)
position	float
value	float

>>> samples = np.zeros((6,), dtype=[('sensor_code', 'S4'),
...                                 ('position', float), ('value', float)])
>>> samples.ndim
1
>>> samples.shape
(6,)
>>> samples.dtype.names
('sensor_code', 'position', 'value')

>>> samples[:] = [('ALFA',   1, 0.37), ('BETA', 1, 0.11), ('TAU', 1,   0.13),
...               ('ALFA', 1.5, 0.37), ('ALFA', 3, 0.11), ('TAU', 1.2, 0.13)]
>>> samples     
array([('ALFA', 1.0, 0.37), ('BETA', 1.0, 0.11), ('TAU', 1.0, 0.13),
       ('ALFA', 1.5, 0.37), ('ALFA', 3.0, 0.11), ('TAU', 1.2, 0.13)],
      dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])

Fields and records are accessed by indexing, with a field name or a position:

>>> samples['sensor_code']    
array(['ALFA', 'BETA', 'TAU', 'ALFA', 'ALFA', 'TAU'],
      dtype='|S4')
>>> samples['value']
array([ 0.37,  0.11,  0.13,  0.37,  0.11,  0.13])
>>> samples[0]    
('ALFA', 1.0, 0.37)

>>> samples[0]['sensor_code'] = 'TAU'
>>> samples[0]    
('TAU', 1.0, 0.37)

Several fields can be selected at once:

>>> samples[['position', 'value']]
array([(1.0, 0.37), (1.0, 0.11), (1.0, 0.13), (1.5, 0.37), (3.0, 0.11),
       (1.2, 0.13)],
      dtype=[('position', '<f8'), ('value', '<f8')])

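Records can also be filtered with a boolean condition. The 'S4' field stores bytes, so on Python 3 the comparison needs a bytes literal (and the strings in the outputs of this section are displayed with a b prefix):

>>> samples[samples['sensor_code'] == b'ALFA']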
array([('ALFA', 1.5, 0.37), ('ALFA', 3.0, 0.11)],
      dtype=[('sensor_code', 'S4'), ('position', '<f8'), ('value', '<f8')])
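
Putting the pieces together, here is a self-contained sketch of the structured-array workflow on Python 3. It uses bytes literals throughout to match the 'S4' field, and the records are a shortened, illustrative subset of the sample data.

```python
import numpy as np

# Three fields: a 4-byte string and two float64 values, packed without padding.
dt = np.dtype([("sensor_code", "S4"), ("position", float), ("value", float)])

samples = np.array(
    [(b"ALFA", 1.0, 0.37), (b"BETA", 1.0, 0.11),
     (b"TAU", 1.2, 0.13), (b"ALFA", 3.0, 0.11)],
    dtype=dt,
)

is_alfa = samples["sensor_code"] == b"ALFA"    # boolean mask over the records
alfa_values = samples["value"][is_alfa]        # one field of the matching records
mean_position = samples["position"].mean()     # a field behaves like a normal array

print(dt.itemsize, is_alfa, alfa_values, mean_position)
```

Each field view shares memory with the parent array, so assigning through samples["value"] modifies the records in place.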

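4. Handling missing or invalid values with masked arrays

Passing a mask argument to np.ma.array marks the corresponding entries as missing or invalid. A true (or 1) mask value means the entry is masked: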

>>> x = np.ma.array([1, 2, 3, 4], mask=[0, 1, 0, 1])
>>> x
masked_array(data = [1 -- 3 --],
             mask = [False  True False  True],
       fill_value = 999999)


>>> y = np.ma.array([1, 2, 3, 4], mask=[0, 1, 1, 1])
>>> x + y
masked_array(data = [2 -- -- --],
             mask = [False  True  True  True],
       fill_value = 999999)

More commonly, masked entries arise from mathematical operations applied outside their valid domain:

>>> np.ma.sqrt([1, -1, 2, -2]) 
masked_array(data = [1.0 -- 1.41421356237... --],
             mask = [False  True False  True],
       fill_value = 1e+20)
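
Once values are masked, reductions skip them. The following minimal sketch uses np.ma.masked_invalid to mask NaN and infinite entries in made-up data and then computes statistics over the remaining values.

```python
import numpy as np

raw = np.array([1.0, np.nan, 3.0, np.inf])

x = np.ma.masked_invalid(raw)   # masks NaN and infinite entries
mean_valid = x.mean()           # masked entries are ignored
n_valid = x.count()             # number of unmasked entries
filled = x.filled(0.0)          # plain ndarray with masked entries replaced

print(x.mask, mean_valid, n_valid, filled)
```

The fill value passed to filled is a choice made for this example; pick whatever sentinel suits the downstream code.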