Python Reading Notes (2): Built-in Types

chimianli4037

Original article: https://my.oschina.net/acutesun/blog/919928

Data types:

• Null value: None

• Numbers: bool, int, long (Python 2 only), float, complex

• Sequences: str, unicode (Python 2 only), list, tuple

• Dictionaries: dict

• Sets: set, frozenset

2.1 Numbers

bool

None, 0, the empty string, and container objects with no elements are all treated as False; everything else is True.

>>> m = map(bool, [None, 0, '',u'',list(),[], tuple(),dict(),set(), frozenset()])
>>> list(m)
[False, False, False, False, False, False, False, False, False, False]

True and False can be used as numbers:

>>> int(False)
0
>>> int(True)
1
>>> range(10)[5>3]
1
>>> range(10)[5<3]
0
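
Because bool is a subclass of int, booleans can take part in arithmetic directly. A small sketch of that idea, where the list `scores` and the threshold 60 are made up for illustration, counts matching items by summing comparison results:

```python
# bool is a subclass of int, so True counts as 1 and False as 0.
scores = [55, 72, 91, 38, 64]  # illustrative data, not from the original notes

# Each comparison yields a bool; sum() adds them up as 1s and 0s.
passed = sum(score >= 60 for score in scores)

# A bool can also index a two-element sequence.
labels = ["fail", "pass"]
results = [labels[score >= 60] for score in scores]

print(passed)    # 3
print(results)   # ['fail', 'pass', 'pass', 'fail', 'pass']
```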

int

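On a 64-bit platform, the Python 2 int type is a 64-bit integer, which is clearly enough for the vast majority of cases (in Python 3, int has arbitrary precision). Integers get special treatment from the virtual machine; the details below describe CPython 2:

• Integer objects are stored in cache regions called PyIntBlock, which are allocated from the heap on demand.

• Small integers in the range [-5, 257) are cached in a fixed array, so a pointer can be obtained just by computing an index.

• PyIntBlock memory is not returned to the operating system until the process exits.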

The difference between small and large integers:

>>> a = 15
>>> b = 15
>>> a is b
True
>>> import sys
>>> sys.getrefcount(a)
39
>>> x =456
>>> y = 456
>>> x is y
False
>>> sys.getrefcount(x)
2
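
Since this caching is an implementation detail, identity checks on integers are unreliable, and values are better compared with ==. A minimal sketch follows; the integers are built at run time with int() so that the compiler cannot merge equal literals, and the results of `is` are printed without an expected value because they depend on the interpreter:

```python
# Build the integers at run time so that equal literals are not
# merged into one constant by the compiler.
small_a, small_b = int("15"), int("15")
large_a, large_b = int("456"), int("456")

# Identity depends on the interpreter's caching and may differ
# between implementations and versions.
print("small ints share an object:", small_a is small_b)
print("large ints share an object:", large_a is large_b)

# Equality compares values and is always reliable.
print(small_a == small_b, large_a == large_b)   # True True
```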

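Because PyIntBlock memory is only reused and never reclaimed, holding a large number of integer objects at the same time makes memory usage surge, and the memory is not released after those objects are collected, which is effectively a memory leak.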

long

Python 3 has no long type, only int.

float

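Rounding with round() is not always accurate, because most decimal fractions have no exact binary floating-point representation: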

>>> round(1.2675, 2)
1.27
>>> round(2.675, 2)
2.67

Use Decimal instead; it gives precise control over arithmetic precision, significant digits, and the result of rounding.
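
As a minimal sketch of that suggestion, the helper below rounds half-up with Decimal.quantize(). The function name `round_half_up` and its `places` parameter are illustrative choices, not part of the original notes:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places="0.01"):
    """Round a decimal string half-up to the precision given by `places`."""
    # Build the Decimal from a string: Decimal(2.675) would inherit the
    # binary float's representation error.
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)

print(round(2.675, 2))         # 2.67, the float result from the session above
print(round_half_up("2.675"))  # 2.68
```

Passing the number as a string is the important design choice here, since it keeps the exact decimal digits.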

For memory management, float also uses a block scheme (PyFloatBlock), but there is no special cache for "small floats".

2.2 Strings

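Strings raise many questions, such as interning and encoding. A string is an immutable type that stores a character sequence or binary data. The implementation notes below refer to Python 2, where str and unicode are separate types.
• Short strings are stored in the arena region; single-character str and unicode objects are cached permanently.
• str has no cache mechanism, while unicode keeps 1024 reusable objects whose wide-character length is less than 9.
• A string object holds its hash value internally; str also has a flag that indicates whether it has been interned.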
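
Interning can also be requested explicitly. The sketch below uses sys.intern() from Python 3 (in Python 2 the equivalent is the built-in intern()); the sample strings are made up for illustration:

```python
import sys

# Build two equal strings at run time; whether they start out as
# separate objects is up to the interpreter.
first = "".join(["built", "-", "in", " ", "types"])
second = "".join(["built", "-", "in", " ", "types"])

# sys.intern() returns the canonical copy from the intern table.
canonical_first = sys.intern(first)
canonical_second = sys.intern(second)

print(first == second)                       # True: equal values
print(canonical_first is canonical_second)   # True: same interned object
```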

>>> import sys, locale
>>> sys.getdefaultencoding()               # Python's default encoding; ASCII in Python 2
'utf-8'
>>> c = locale.getdefaultlocale()          # locale and encoding of the current system
>>> c
('zh_CN', 'UTF-8')

The standard library also provides the codecs module for more complex encoding conversions, such as byte order and the BOM.

>>> from codecs import BOM_UTF32_LE
>>> BOM_UTF32_LE
b'\xff\xfe\x00\x00'
>>> s = '中国人'
>>> s.encode('utf-32')
b'\xff\xfe\x00\x00-N\x00\x00\xfdV\x00\x00\xbaN\x00\x00'
>>> s.encode('utf-32').decode('utf-32')
'中国人'
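
The same byte-order behavior can be seen with UTF-16. The following sketch, with a made-up sample string, shows that the endian-specific codecs write no BOM, while the generic "utf-16" decoder reads one to pick the byte order:

```python
import codecs

text = "abc"

# The endian-specific codecs write no BOM.
little = text.encode("utf-16-le")
big = text.encode("utf-16-be")
print(little)  # b'a\x00b\x00c\x00'
print(big)     # b'\x00a\x00b\x00c'

# Prefix the BOM by hand; the generic "utf-16" decoder reads it to pick
# the byte order and strips it from the result.
from_little = (codecs.BOM_UTF16_LE + little).decode("utf-16")
from_big = (codecs.BOM_UTF16_BE + big).decode("utf-16")
print(from_little, from_big)  # abc abc
```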

2.3 Lists

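In some cases, consider using an array instead of a list. Unlike a list, which stores pointers to objects, an array embeds the data directly, which saves the memory overhead of creating objects and also improves read and write performance.

>>> import array
>>> a = array.array('l', range(10))   # initialize the array from another sequence type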
>>> a
array('l', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a.tolist()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> arr = array.array('b')            # create an array of a specific type
>>> arr
array('b')
>>> arr.frombytes(b'abc')             # fromstring() was removed in Python 3.9
>>> arr
array('b', [97, 98, 99])
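
To make the storage difference concrete, here is a rough sketch that compares a list with an array holding the same values. The sizes are printed without expected values because they depend on the platform and interpreter version:

```python
import array
import sys

values = list(range(1000))
packed = array.array("l", values)

# Every item occupies exactly `itemsize` bytes inside one buffer.
raw_bytes = len(packed.tobytes())
print(packed.itemsize, raw_bytes)

# A list stores one pointer per item, and each int is a separate object
# on top of that; getsizeof() counts only the container itself.
print(sys.getsizeof(values), sys.getsizeof(packed))

# The array round-trips to the same values.
round_trip = packed.tolist()
```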

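2.4 Tuples

A tuple looks like a read-only version of a list, but its underlying implementation differs in many ways.

• It is a read-only object; the memory for the tuple and its array of element pointers is allocated in one contiguous block.

• The virtual machine caches a number of reusable tuple objects whose element count is less than 20.

A namedtuple is not a plain tuple, but a custom type (a tuple subclass) created dynamically from a template.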

>>> from collections import namedtuple
>>> User = namedtuple('User', 'name age')
>>> user = User('a',18)
>>> user.age, user.name
(18, 'a')
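
A short sketch of what the generated type offers beyond attribute access, reusing the User definition from the session above:

```python
from collections import namedtuple

User = namedtuple("User", "name age")
user = User("a", 18)

# The generated class is a subclass of tuple, so tuple features still work.
is_tuple = isinstance(user, tuple)
name, age = user        # unpacking
second = user[1]        # indexing

# Instances are immutable; _replace() returns a modified copy.
older = user._replace(age=19)
print(older)                  # User(name='a', age=19)
print(dict(user._asdict()))   # {'name': 'a', 'age': 18}
```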

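2.5 Dictionaries

A dictionary (dict) is implemented as a hash table with open addressing.

• It comes with a built-in smalltable that holds 8 entries; only when that is exceeded is extra entry-table memory allocated on the heap.

• The virtual machine caches 80 reusable dict objects, but entry-table memory allocated on the heap is released.

• Capacity is adjusted dynamically on demand. Both growing and shrinking reallocate memory and rehash.

• Deleting entries does not shrink the memory immediately.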

>>> dict.fromkeys('abc','')                 # use a sequence as keys and supply a default value
{'c': '', 'b': '', 'a': ''}

>>> {k:v for k, v in zip("abc", range(3))}    # build a dict with a dict comprehension
{'a': 0, 'c': 2, 'b': 1}

>>> d = {'x':1}
>>> d.update({'y':4})                       # merge
>>> d
{'y': 4, 'x': 1}

>>> d.pop('x')                             # pop
1
>>> d
{'y': 4}

>>> d = {'a':1, 'b':2, 'c':3, 'x': 5, 'y':2}; d
{'c': 3, 'b': 2, 'y': 2, 'x': 5, 'a': 1}
>>> d.popitem()
('c', 3)
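
The key order in the outputs above comes from an older interpreter, where dict order was arbitrary. Since Python 3.7 a dict preserves insertion order, and popitem() removes the most recently inserted pair. A short sketch of the same operations under that assumption (Python 3.7 or later):

```python
# Assumes Python 3.7+, where dicts preserve insertion order.
d = {'a': 1, 'b': 2, 'c': 3, 'x': 5, 'y': 2}

print(list(d))        # ['a', 'b', 'c', 'x', 'y']
last = d.popitem()
print(last)           # ('y', 2): last inserted, first removed

merged = dict.fromkeys('abc', '')
merged.update({'z': 9})
print(merged)         # {'a': '', 'b': '', 'c': '', 'z': 9}
```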

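For a large dictionary in Python 2, calling keys(), values(), or items() builds an equally large list, so iterators are recommended to reduce memory overhead. In Python 3 these methods return lightweight view objects instead of lists, and iter() turns a view into an iterator: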

>>> iter(d.keys())
<dict_keyiterator object at 0x7f25ee6856d8>
>>> iter(d.values())
<dict_valueiterator object at 0x7f25edfdb728>
>>> iter(d.items())
<dict_itemiterator object at 0x7f25ee6856d8>
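
In Python 3 the objects returned by keys(), values(), and items() are live views of the dictionary. The sketch below, using a made-up `inventory` dict, shows a view tracking later changes and how list() takes a stable snapshot:

```python
inventory = {"apple": 3, "pear": 5}   # illustrative data

keys = inventory.keys()       # a view, not a copy
items = inventory.items()

inventory["plum"] = 7         # mutate after taking the views

print(sorted(keys))           # ['apple', 'pear', 'plum']
print(("plum", 7) in items)   # True

# Take a snapshot with list() when a stable copy is needed.
snapshot = list(inventory)
del inventory["pear"]
print(len(snapshot), len(keys))   # 3 2
```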

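When a missing key is accessed, defaultdict automatically calls the factory object to create the required key-value pair. The factory can be any function that takes no arguments, or any other callable object.

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['l'].append(22)                       # key "l" does not exist, so list() is called to create an empty list as the value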
>>> d['l'].append(23)
>>> d['l']
[22, 23]
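
Two common uses of this behavior are grouping and counting. A small sketch with made-up sample words:

```python
from collections import defaultdict

words = ["apple", "avocado", "banana", "blueberry", "cherry"]  # sample data

# Group words by first letter; list() supplies the empty list on first access.
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

# Count letters; int() supplies the initial 0.
letter_counts = defaultdict(int)
for word in words:
    for letter in word:
        letter_counts[letter] += 1

print(dict(by_letter))
print(letter_counts["a"])   # 6
```

Note that merely reading a missing key inserts it, so membership tests with `in` are the safer way to check whether a key exists.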

To find the differences between dictionaries, use dict.items(), which returns a view that supports set-like operations:

>>> d1 = {'a':1, 'b':2}
>>> d2 = {'b':2, 'c':3}

>>> d1.items() & d2.items()                # intersection
{('b', 2)}
>>> d1.items() | d2.items()                # union
{('c', 3), ('a', 1), ('b', 2)}

>>> d1.items() - d2.items()                # difference
{('a', 1)}
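
The same view operations can be wrapped into a small comparison helper. This is a sketch only: the name `diff_dicts` and the sample configuration dicts are hypothetical. It works on keys() rather than items(), so it also handles unhashable values:

```python
def diff_dicts(old, new):
    """Return (added, removed, changed) key sets between two dicts."""
    added = new.keys() - old.keys()
    removed = old.keys() - new.keys()
    # Keys present in both dicts whose values differ.
    changed = {key for key in old.keys() & new.keys() if old[key] != new[key]}
    return added, removed, changed

old_config = {"host": "localhost", "port": 8080, "debug": True}
new_config = {"host": "localhost", "port": 9090, "timeout": 30}

added, removed, changed = diff_dicts(old_config, new_config)
print(added, removed, changed)   # {'timeout'} {'debug'} {'port'}
```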

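2.6 Sets

A set stores unordered, non-duplicate objects. Non-duplicate means not only that two elements cannot be the same object, but also that their "values" cannot be equal. A set can only store hashable objects, and it also has a read-only version, frozenset.

A set is not a sequence type: it cannot be indexed by position like a list, and it does not support slicing.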

>>> s = set('abc'); s
{'b', 'a', 'c'}
>>> {v for v in '123'}               # set comprehension
{'1', '3', '2'}
>>> 'b' in s
True
>>> s.add('d'); s
{'b', 'd', 'a', 'c'}

>>> s.remove('a');s                # remove; raises KeyError if missing
{'b', 'd', 'c'}

>>> s.discard('a');s               # remove only if present
{'b', 'd', 'c'}

>>> s.update(set([1,2,3]));s       # merge
{'b', 1, 2, 3, 'd', 'c'}
>>> s.pop()                        # pop an arbitrary element
'b'

Set operations

>>> set('abc') is set('abc')
False

>>> set('abc') == set('abc')                     # equality test
True

>>> set('abcd') >= set('abc')                    # superset test (issuperset)
True

>>> set('abcd') |  set('123')                    # union
{'b', 'a', '3', '2', '1', 'd', 'c'}

>>> set('abcd') & set('abc')                     # intersection
{'b', 'a', 'c'}

>>> set('abcd') - set('abc')                     # difference: elements only in the left operand
{'d'}

>>> set('abcd').isdisjoint('abc')                # test whether the two have no elements in common
False

>>> ss = set('xyz')
>>> ss |= set('789');ss                          # in-place union; intersection and difference work the same way
{'x', '7', 'y', 'z', '9', '8'}
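
The other in-place operators follow the same pattern as |=. A brief sketch with made-up sets, using sorted() so the printed order is stable:

```python
base = set("abcd")

common = set(base)
common &= set("bdx")       # in-place intersection
print(sorted(common))      # ['b', 'd']

rest = set(base)
rest -= set("bdx")         # in-place difference
print(sorted(rest))        # ['a', 'c']

either = set(base)
either ^= set("bdx")       # in-place symmetric difference
print(sorted(either))      # ['a', 'c', 'x']

# The method forms accept any iterable, not only sets.
combined = base.union("xy")
print(sorted(combined))    # ['a', 'b', 'c', 'd', 'x', 'y']
```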

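Set elements and dictionary keys must both be hashable objects, but the commonly used list, dict, set, defaultdict, and OrderedDict are all unhashable; among the containers, only tuple (with hashable elements) and frozenset can be used.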

>>> hash([])
TypeError: unhashable type: 'list'

>>> hash(tuple()), hash(frozenset())
(3527539, 133156838395276)

To put a custom type into a set, both the hash and the equality results must match for duplicates to be removed.

Duplicate test: (a is b) or (hash(a) == hash(b) and eq(a, b))

>>> class User(object):
...     def __init__(self, name):
...         self.name = name
...     def __hash__(self):
...         return hash(self.name)
...     def __eq__(self, o):
...         if not isinstance(o, User): return False
...         return self.name == o.name
... 
>>> s = set()                      # start from an empty set
>>> s.add(User('tom'))
>>> s.add(User('tom'))
>>> s.add(User('jim'))
>>> s
{<__main__.User object at 0x7f25edf73cf8>, <__main__.User object at 0x7f25edf73c88>}
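
The rule above can be checked with a self-contained script. In this sketch the `EqOnly` class is a hypothetical counter-example added to show what happens in Python 3 when __eq__ is defined without __hash__:

```python
class User:
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return hash(self.name)

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return self.name == other.name


class EqOnly:
    """Defines __eq__ but not __hash__ (hypothetical counter-example)."""
    def __eq__(self, other):
        return isinstance(other, EqOnly)


users = {User("tom"), User("tom"), User("jim")}
print(len(users))                        # 2
print(sorted(u.name for u in users))     # ['jim', 'tom']

# In Python 3, defining __eq__ without __hash__ sets __hash__ to None,
# so instances cannot be stored in a set.
try:
    {EqOnly()}
    hashable = True
except TypeError:
    hashable = False
print(hashable)                          # False
```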
