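Preface
An earlier article of mine walked through the source of OrderedDict in the collections module. Today, while reading other parts of that module, I suddenly realized that the source I had read back then was the Python 2 version. I was stunned, though it is my own fault for not checking GitHub.
So here is a new article, which also serves as a record of both versions.
Source code: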
https://github.com/python/cpython/blob/master/Lib/collections/__init__.py
Main text
class _Link(object):
    __slots__ = 'prev', 'next', 'key', '__weakref__'

class OrderedDict(dict):
    'Dictionary that remembers insertion order'
    # An inherited dict maps keys to values.
    # The inherited dict provides __getitem__, __len__, __contains__, and get.
    # The remaining methods are order-aware.
    # Big-O running times for all methods are the same as regular dictionaries.

    # The internal self.__map dict maps keys to links in a doubly linked list.
    # The circular doubly linked list starts and ends with a sentinel element.
    # The sentinel element never gets deleted (this simplifies the algorithm).
    # The sentinel is in self.__hardroot with a weakref proxy in self.__root.
    # The prev links are weakref proxies (to prevent circular references).
    # Individual links are kept alive by the hard reference in self.__map.
    # Those hard references disappear when a key is deleted from an OrderedDict.

    def __init__(self, other=(), /, **kwds):
        '''Initialize an ordered dictionary.  The signature is the same as
        regular dictionaries.  Keyword argument order is preserved.
        '''
        try:
            self.__root
        except AttributeError:
            self.__hardroot = _Link()
            self.__root = root = _proxy(self.__hardroot)
            root.prev = root.next = root
            self.__map = {}
        self.__update(other, **kwds)

    def __setitem__(self, key, value,
                    dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
        'od.__setitem__(i, y) <==> od[i]=y'
        # Setting a new item creates a new link at the end of the linked list,
        # and the inherited dictionary is updated with the new key/value pair.
        if key not in self:
            self.__map[key] = link = Link()
            root = self.__root
            last = root.prev
            link.prev, link.next, link.key = last, root, key
            last.next = link
            root.prev = proxy(link)
        dict_setitem(self, key, value)

    def __delitem__(self, key, dict_delitem=dict.__delitem__):
        'od.__delitem__(y) <==> del od[y]'
        # Deleting an existing item uses self.__map to find the link which gets
        # removed by updating the links in the predecessor and successor nodes.
        dict_delitem(self, key)
        link = self.__map.pop(key)
        link_prev = link.prev
        link_next = link.next
        link_prev.next = link_next
        link_next.prev = link_prev
        link.prev = None
        link.next = None
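First, the _Link class is defined as the node type. Its prev and next attributes hold the predecessor and successor nodes, key stores the key, and __weakref__ enables weak references (see the related explanation here). Because the class declares __slots__, its instances can only be weakly referenced if __weakref__ is listed among the slots.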
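As a small illustration of why __weakref__ appears in __slots__, here is a minimal sketch. The class names SlotsOnly and SlotsWithWeakref are made up for this example and are not part of the collections module.

```python
import weakref

class SlotsOnly:
    # Hypothetical class: __slots__ without '__weakref__'.
    __slots__ = ("key",)

class SlotsWithWeakref:
    # Hypothetical class: same idea as _Link, '__weakref__' is listed.
    __slots__ = ("key", "__weakref__")

# Without the '__weakref__' slot, creating a weak reference fails.
try:
    weakref.ref(SlotsOnly())
    slots_only_ok = True
except TypeError:
    slots_only_ok = False

# With the slot, both weakref.ref() and weakref.proxy() work.
node = SlotsWithWeakref()
node.key = "a"
p = weakref.proxy(node)
proxy_key = p.key  # attribute access is forwarded to the real object
```

The proxy behaves like the node for attribute access, which is what lets the linked-list code treat a prev proxy as if it were the node itself.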
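Next comes the OrderedDict class. Its explanatory comment block (the help() part) differs from the Python 2 version from the fourth-to-last line onward: it says the sentinel node is self.__hardroot, and self.__root is a weak reference (a weakref proxy) to it. The node implementation has changed from a list to the _Link class, while __map still maps keys to nodes.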
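Every node's prev pointer is assigned a weak reference in order to prevent circular references. To see why, you need to know about reference counting in Python's garbage collection. As everyone knows, everything in Python is an object; each object is given its own memory according to its type and contents, and the address of that memory is what we call a reference. Below is a simple test of reference counts using the getrefcount function from the sys module: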
import sys

class Person:
    pass

p1 = Person()
# getrefcount reports one extra reference: its own argument.
print(sys.getrefcount(p1)) # output 2
p2 = p1
print(sys.getrefcount(p1)) # output 3
del p2
print(sys.getrefcount(p1)) # output 2
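Reference counting records the number of references to a given object and collects the object when that number drops to zero. Because it can only reclaim one object at a time, reference counting cannot reclaim objects that reference each other in a cycle. Under reference counting alone, a group of mutually referencing objects that is not directly referenced by anything else, and is therefore unreachable, stays alive forever. An application that keeps producing such unreachable groups will leak memory. (CPython also has a separate cycle collector in the gc module that eventually cleans these up, but not immediately.) Using weak references, that is, references that are not counted in the reference count, inside the group can sometimes avoid forming the cycle in the first place, so weak references are a way to solve the circular reference problem.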
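The difference can be observed directly. The following is a minimal sketch for CPython; Node and make_pair are hypothetical names invented for this demonstration, and the cycle collector is switched off temporarily so that only reference counting is at work.

```python
import gc
import weakref

class Node:
    """Hypothetical node type used only for this demonstration."""
    def __init__(self):
        self.prev = None
        self.next = None

def make_pair(weak_prev):
    """Link two nodes both ways and return a weak reference to the first."""
    a, b = Node(), Node()
    a.next = b
    # Either a strong back-reference (forms a cycle) or a weak proxy.
    b.prev = weakref.proxy(a) if weak_prev else a
    return weakref.ref(a)

gc.disable()  # keep the cycle collector out of the way for now
try:
    strong = make_pair(weak_prev=False)
    # The a <-> b cycle keeps both nodes alive although no name refers to them.
    strong_alive_before_gc = strong() is not None
    gc.collect()  # an explicit collection still runs while gc is disabled
    strong_alive_after_gc = strong() is not None

    weak = make_pair(weak_prev=True)
    # No cycle: the nodes are freed as soon as the function returns.
    weak_alive = weak() is not None
finally:
    gc.enable()
```

With a strong prev the pair survives until the cycle collector runs; with a weak prev it is released by reference counting alone.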
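The doubly linked list plus sentinel element in OrderedDict would form exactly such a reference cycle, so every prev pointer is held as a weak reference created with the proxy() function of the weakref module. In addition, when __delitem__ deletes a key/value pair, it sets the prev and next of the corresponding node to None, removing that node's references to other nodes.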
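To make the structure concrete, here is a stripped-down sketch of the same idea that tracks key order only. TinyOrder and its method names are my own simplification for illustration, not the CPython class, and the final check relies on CPython's immediate reference-count-based deallocation.

```python
import weakref

class Link:
    __slots__ = ("prev", "next", "key", "__weakref__")

class TinyOrder:
    """Hypothetical, simplified key-order tracker (not the CPython class)."""

    def __init__(self):
        self._hardroot = Link()                    # strong ref keeps the sentinel alive
        self._root = root = weakref.proxy(self._hardroot)
        root.prev = root.next = root               # empty circular list
        self._map = {}                             # key -> Link, the only strong refs

    def append(self, key):
        if key in self._map:
            return
        self._map[key] = link = Link()
        root = self._root
        last = root.prev                           # already a proxy
        link.prev, link.next, link.key = last, root, key
        last.next = link                           # next pointers are strong
        root.prev = weakref.proxy(link)            # prev pointers are weak

    def remove(self, key):
        link = self._map.pop(key)
        link.prev.next = link.next
        link.next.prev = link.prev
        link.prev = link.next = None               # drop references to neighbours

    def keys(self):
        root = self._root
        curr = root.next
        out = []
        while curr is not root:                    # stop when back at the sentinel
            out.append(curr.key)
            curr = curr.next
        return out

order = TinyOrder()
for k in ("a", "b", "c"):
    order.append(k)
watcher = weakref.ref(order._map["b"])
order.remove("b")
keys_after = order.keys()
link_freed = watcher() is None  # on CPython the removed link is freed at once
```

Since the only strong references to a link are the entry in the map and the next pointer of its predecessor, removing both is enough for the link to be reclaimed without waiting for the cycle collector.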
In addition, the source has two more functions:
def popitem(self, last=True):
    '''Remove and return a (key, value) pair from the dictionary.

    Pairs are returned in LIFO order if last is true or FIFO order if false.
    '''
    if not self:
        raise KeyError('dictionary is empty')
    root = self.__root
    if last:
        link = root.prev
        link_prev = link.prev
        link_prev.next = root
        root.prev = link_prev
    else:
        link = root.next
        link_next = link.next
        root.next = link_next
        link_next.prev = root
    key = link.key
    del self.__map[key]
    value = dict.pop(self, key)
    return key, value

def move_to_end(self, key, last=True):
    '''Move an existing element to the end (or beginning if last is false).

    Raise KeyError if the element does not exist.
    '''
    link = self.__map[key]
    link_prev = link.prev
    link_next = link.next
    soft_link = link_next.prev
    link_prev.next = link_next
    link_next.prev = link_prev
    root = self.__root
    if last:
        last = root.prev
        link.prev = last
        link.next = root
        root.prev = soft_link
        last.next = link
    else:
        first = root.next
        link.prev = root
        link.next = first
        first.prev = soft_link
        root.next = link
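The popitem function is simple: it removes and returns one key/value pair. If the last argument is true it works in LIFO order, returning the last pair; if false it works in FIFO order, returning the first pair.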
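A quick usage example with the standard collections.OrderedDict shows both orders; the keys and values are arbitrary sample data.

```python
from collections import OrderedDict

od = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

newest = od.popitem()            # last=True by default: LIFO, the last pair
oldest = od.popitem(last=False)  # FIFO, the first pair
remaining = list(od.items())

# Popping from an empty OrderedDict raises KeyError.
od.popitem()
try:
    od.popitem()
    raised = False
except KeyError:
    raised = True
```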
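I also looked at move_to_end. It is fairly simple too and there is not much to analyze, since it is all basic linked-list manipulation. But one variable in it, soft_link, left me puzzled: wouldn't using link directly work just as well?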
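My reading of the soft_link question, offered as a likely explanation rather than a confirmed one: link_next.prev is the existing weakref proxy to link, so saving it lets the function put a weak reference back into a prev field. Assigning link itself would store a strong reference in prev and reintroduce the cycle that the comments say prev links are meant to avoid. The sketch below first shows the public behavior of move_to_end, then uses a hypothetical two-node chain (the Link class and the first and second names are made up here) to show what soft_link actually holds.

```python
import weakref
from collections import OrderedDict

# Public behaviour of move_to_end on the standard OrderedDict.
od = OrderedDict.fromkeys("abc")
od.move_to_end("a")               # "a" moves to the end
order_after_end = list(od)
od.move_to_end("c", last=False)   # "c" moves to the front
order_after_front = list(od)

class Link:
    __slots__ = ("prev", "next", "key", "__weakref__")

# Hypothetical two-node chain: first <-> second, with prev held weakly.
first, second = Link(), Link()
first.key, second.key = "first", "second"
first.next = second
second.prev = weakref.proxy(first)

# This mirrors `soft_link = link_next.prev` with link = first.
soft_link = second.prev
is_proxy = type(soft_link) is weakref.ProxyType  # a proxy, not the node itself
same_key = soft_link.key == "first"              # but it forwards to the node
is_same_object = soft_link is first              # identity differs from `first`
```

Reusing the saved proxy also avoids asking weakref for a new proxy object on every move.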