Revisiting the OrderedDict Source Code

Preface

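An earlier post of mine covered the source of OrderedDict in the collections module. Today, while reading other parts of the module, I suddenly realized that the source I had read back then was the Python 2 version, which left me stunned. It was my own fault for not looking it up on GitHub.
So I'm writing a new post here, which also serves as a record of both versions.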

Source:
https://github.com/python/cpython/blob/master/Lib/collections/__init__.py


Main Text

from _weakref import proxy as _proxy  # module-level import in collections/__init__.py

class _Link(object):
    __slots__ = 'prev', 'next', 'key', '__weakref__'
    
class OrderedDict(dict):
    'Dictionary that remembers insertion order'
    # An inherited dict maps keys to values.
    # The inherited dict provides __getitem__, __len__, __contains__, and get.
    # The remaining methods are order-aware.
    # Big-O running times for all methods are the same as regular dictionaries.

    # The internal self.__map dict maps keys to links in a doubly linked list.
    # The circular doubly linked list starts and ends with a sentinel element.
    # The sentinel element never gets deleted (this simplifies the algorithm).
    # The sentinel is in self.__hardroot with a weakref proxy in self.__root.
    # The prev links are weakref proxies (to prevent circular references).
    # Individual links are kept alive by the hard reference in self.__map.
    # Those hard references disappear when a key is deleted from an OrderedDict.

    def __init__(self, other=(), /, **kwds):
        '''Initialize an ordered dictionary.  The signature is the same as
        regular dictionaries.  Keyword argument order is preserved.
        '''
        try:
            self.__root
        except AttributeError:
            self.__hardroot = _Link()
            self.__root = root = _proxy(self.__hardroot)
            root.prev = root.next = root
            self.__map = {}
        self.__update(other, **kwds)

    def __setitem__(self, key, value,
                    dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link):
        'od.__setitem__(i, y) <==> od[i]=y'
        # Setting a new item creates a new link at the end of the linked list,
        # and the inherited dictionary is updated with the new key/value pair.
        if key not in self:
            self.__map[key] = link = Link()
            root = self.__root
            last = root.prev
            link.prev, link.next, link.key = last, root, key
            last.next = link
            root.prev = proxy(link)
        dict_setitem(self, key, value)

    def __delitem__(self, key, dict_delitem=dict.__delitem__):
        'od.__delitem__(y) <==> del od[y]'
        # Deleting an existing item uses self.__map to find the link which gets
        # removed by updating the links in the predecessor and successor nodes.
        dict_delitem(self, key)
        link = self.__map.pop(key)
        link_prev = link.prev
        link_next = link.next
        link_prev.next = link_next
        link_next.prev = link_prev
        link.prev = None
        link.next = None

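First, the _Link class is defined as the list node. Its __slots__ declare prev and next (the predecessor and successor nodes), key (which stores the key), and __weakref__ (which allows weak references to _Link instances).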
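A quick way to see why _Link lists __weakref__ in its __slots__: a class that declares __slots__ without a __weakref__ slot cannot be weakly referenced, and OrderedDict has to call _proxy() on its nodes. The class names and the can_weakref helper below are made up for this experiment.

```python
import weakref

class NoWeakSlot:
    # __slots__ without '__weakref__': instances have no room for weak references
    __slots__ = ('prev', 'next', 'key')

class WithWeakSlot:
    # same slot layout as _Link in the OrderedDict source
    __slots__ = ('prev', 'next', 'key', '__weakref__')

def can_weakref(obj):
    """Return True if weakref.proxy() accepts obj, False on TypeError."""
    try:
        weakref.proxy(obj)
    except TypeError:
        return False
    return True

print(can_weakref(NoWeakSlot()))    # False
print(can_weakref(WithWeakSlot()))  # True
```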

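Next comes the OrderedDict class. Compared with the Python 2 version, its explanatory comment block differs starting from the fourth-to-last line: here the sentinel node is self.__hardroot, and self.__root is a weak reference proxy to it. Nodes are now _Link instances instead of lists, and __map still maps each key to its node.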
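The structure just described can be sketched in a few dozen lines. This is a hypothetical, stripped-down MiniOrderedDict, not CPython's implementation (CPython normally replaces the pure-Python class with a C version anyway). It keeps only the sentinel setup, __setitem__ and the two iterators, and uses single-underscore names _hardroot, _root and _map in place of the name-mangled attributes.

```python
import weakref

class Link:
    # same slots as _Link in the OrderedDict source
    __slots__ = ('prev', 'next', 'key', '__weakref__')

class MiniOrderedDict(dict):
    """Hypothetical sketch of OrderedDict's linked-list bookkeeping."""

    def __init__(self):
        super().__init__()
        self._hardroot = Link()                            # sentinel, owned strongly
        self._root = root = weakref.proxy(self._hardroot)  # weak proxy to the sentinel
        root.prev = root.next = root                       # empty circular list
        self._map = {}                                     # key -> Link, keeps links alive

    def __setitem__(self, key, value):
        if key not in self:                      # only new keys get a node
            self._map[key] = link = Link()
            root = self._root
            last = root.prev
            link.prev, link.next, link.key = last, root, key
            last.next = link                     # strong forward reference
            root.prev = weakref.proxy(link)      # weak backward reference
        dict.__setitem__(self, key, value)

    def __iter__(self):
        # walk forward from the sentinel until we arrive back at it
        root = self._root
        curr = root.next
        while curr is not root:
            yield curr.key
            curr = curr.next

    def __reversed__(self):
        # walk backward through the weak prev links
        root = self._root
        curr = root.prev
        while curr is not root:
            yield curr.key
            curr = curr.prev

d = MiniOrderedDict()
for i, k in enumerate('bac'):
    d[k] = i
d['b'] = 99               # existing key: the value changes, the position does not
print(list(d))            # ['b', 'a', 'c']
print(list(reversed(d)))  # ['c', 'a', 'b']
```

Iteration stops with an identity check against the same proxy object stored in self._root, which is why the last node's next and the first node's prev are set to root itself.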

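Every node's predecessor (prev) is assigned as a weak reference, in order to prevent circular references. To see why, we need to look at reference counting, part of Python's memory management. In Python everything is an object: each object is stored in memory allocated according to its type and contents, and a reference is what points to that memory (in CPython, the object's address). Below is a quick test of reference counts using the getrefcount function from the sys module: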

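import sys

class Person:
    pass

p1 = Person()
print(sys.getrefcount(p1))  # typically 2: p1 plus the temporary reference passed to getrefcount()
p2 = p1
print(sys.getrefcount(p1))  # typically 3: p2 now refers to the same object
del p2
print(sys.getrefcount(p1))  # typically 2 again: deleting p2 drops one reference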

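Reference counting records how many references point to an object and reclaims the object once that count drops to zero. It cannot, however, reclaim objects that reference each other in a cycle: every member of the cycle is still referenced by another member, so its count never reaches zero. Under reference counting alone, a group of mutually referencing objects that nothing else refers to, and that is therefore unreachable, would live forever; an application that keeps producing such unreachable groups leaks memory. (CPython backs reference counting with a cyclic garbage collector, the gc module, for exactly this case.) Using weak references inside the group, i.e. references that are not counted, can sometimes avoid forming the cycle in the first place, which is why weak references help with circular references.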
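The difference between a strong and a weak back-reference can be observed directly in CPython by switching off the cyclic garbage collector with gc.disable(), so that only reference counting frees objects. Node and survives_refcounting are illustrative names; the result relies on CPython freeing objects as soon as their count hits zero and may differ on other implementations such as PyPy.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name
        self.other = None   # reference to another Node (strong or weak)

def survives_refcounting(weak_back_ref):
    """Build a <-> b, drop both names, and report whether a is still alive
    while only reference counting runs (cyclic GC disabled)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node('a'), Node('b')
        a.other = b
        # a strong back-reference closes a real cycle; a weak proxy does not
        b.other = weakref.proxy(a) if weak_back_ref else a
        probe = weakref.ref(a)   # observe a without keeping it alive
        del a, b
        return probe() is not None
    finally:
        gc.collect()             # let the cyclic GC clean up any leftover cycle
        if was_enabled:
            gc.enable()

print(survives_refcounting(weak_back_ref=False))  # True: the cycle keeps both alive
print(survives_refcounting(weak_back_ref=True))   # False: freed once the names are gone
```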

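If every link were a strong reference, the circular doubly linked list in OrderedDict, together with its sentinel, would form exactly such a cycle. That is why every prev link is created with the weak proxy() function (weakref.proxy, which the source imports from _weakref as _proxy), and why the sentinel itself is reached through the proxy self.__root, so that the last node's next also points back weakly. In addition, when __delitem__ removes a key-value pair, it sets the removed node's prev and next to None, dropping its references to other nodes.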
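Applying the same kind of test to an OrderedDict-style list shows why both the proxied prev links and the proxied sentinel matter. build_list below is a hypothetical helper that repeats the linking steps of __setitem__; with weak=False it uses plain references instead of proxies. With the cyclic GC disabled (CPython-specific behaviour again), only the weak layout is freed as soon as the outside references are dropped.

```python
import gc
import weakref

class Link:
    __slots__ = ('prev', 'next', 'key', '__weakref__')

def build_list(keys, weak=True):
    """Hypothetical helper: build an OrderedDict-style circular list.
    weak=True mirrors the source (sentinel reached through a proxy, prev links
    are proxies); weak=False uses strong references everywhere."""
    hardroot = Link()
    root = weakref.proxy(hardroot) if weak else hardroot
    soft = weakref.proxy if weak else (lambda obj: obj)
    root.prev = root.next = root
    links = {}
    for key in keys:
        links[key] = link = Link()
        last = root.prev
        link.prev, link.next, link.key = last, root, key
        last.next = link          # forward links are always strong
        root.prev = soft(link)
    return hardroot, root, links

def freed_by_refcount_alone(weak):
    """Drop all outside references and check, with the cyclic GC disabled,
    whether every node (sentinel included) was freed immediately."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        hardroot, root, links = build_list('abc', weak)
        probes = [weakref.ref(hardroot)]
        probes += [weakref.ref(node) for node in links.values()]
        del hardroot, root, links
        return all(p() is None for p in probes)
    finally:
        gc.collect()
        if was_enabled:
            gc.enable()

print(freed_by_refcount_alone(weak=True))   # True
print(freed_by_refcount_alone(weak=False))  # False: strong prev/next links form cycles
```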

Besides these, the source has two more methods:

    def popitem(self, last=True):
        '''Remove and return a (key, value) pair from the dictionary.
        Pairs are returned in LIFO order if last is true or FIFO order if false.
        '''
        if not self:
            raise KeyError('dictionary is empty')
        root = self.__root
        if last:
            link = root.prev
            link_prev = link.prev
            link_prev.next = root
            root.prev = link_prev
        else:
            link = root.next
            link_next = link.next
            root.next = link_next
            link_next.prev = root
        key = link.key
        del self.__map[key]
        value = dict.pop(self, key)
        return key, value

    def move_to_end(self, key, last=True):
        '''Move an existing element to the end (or beginning if last is false).
        Raise KeyError if the element does not exist.
        '''
        link = self.__map[key]
        link_prev = link.prev
        link_next = link.next
        soft_link = link_next.prev
        link_prev.next = link_next
        link_next.prev = link_prev
        root = self.__root
        if last:
            last = root.prev
            link.prev = last
            link.next = root
            root.prev = soft_link
            last.next = link
        else:
            first = root.next
            link.prev = root
            link.next = first
            first.prev = soft_link
            root.next = link

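popitem is simple: it removes and returns one key-value pair. If the last argument is true it works LIFO, returning the most recently inserted pair; if it is false it works FIFO, returning the first pair.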
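The two modes of popitem can be checked against the standard collections.OrderedDict. On an empty dictionary it raises KeyError, matching the first check in the source above.

```python
from collections import OrderedDict

od = OrderedDict.fromkeys('abcde')   # keys 'a'..'e', all values None
newest = od.popitem()                # last=True  -> LIFO: most recent pair
oldest = od.popitem(last=False)      # last=False -> FIFO: first pair
print(newest, oldest)                # ('e', None) ('a', None)
print(list(od))                      # ['b', 'c', 'd']

try:
    OrderedDict().popitem()          # empty dictionary
except KeyError as exc:
    print('KeyError:', exc)
```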

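move_to_end is also fairly simple; there is not much to analyze, since it is all basic linked-list manipulation. One variable puzzled me, though: soft_link. Wouldn't it work to just use link directly?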
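One possible reading of soft_link, based on the source comments: every prev link is a weak proxy, so before the node is detached, link_next.prev is the existing proxy pointing at link, and it has to be saved before link_next.prev is overwritten. When link is reattached, the node that follows it (root when moving to the end, first when moving to the front) needs a weak prev reference to link; storing link itself would still keep the order correct, but it would create a strong back-reference and bring back the cycle the comments warn about. Reusing the saved proxy also avoids another _proxy() call. The sketch below uses a stand-in Link class and compares CPython reference counts only as differences, then shows the public behaviour of move_to_end.

```python
import sys
import weakref
from collections import OrderedDict

class Link:
    # stand-in for _Link
    __slots__ = ('prev', 'next', 'key', '__weakref__')

link, successor = Link(), Link()
soft_link = weakref.proxy(link)    # plays the role of link_next.prev in the source
base = sys.getrefcount(link)

successor.prev = soft_link         # what move_to_end stores in root.prev / first.prev
weak_delta = sys.getrefcount(link) - base

successor.prev = link              # the "just use link" alternative
strong_delta = sys.getrefcount(link) - base

print(weak_delta, strong_delta)    # 0 1

# public behaviour of move_to_end
od = OrderedDict.fromkeys('abc')
od.move_to_end('a')                # 'a' goes to the end:   b, c, a
od.move_to_end('c', last=False)    # 'c' goes to the front: c, b, a
print(list(od))                    # ['c', 'b', 'a']
```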


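Referenced blog posts:
Using weak references in Python (Python 弱引用的使用)
Memory management: reference counting / garbage collection / circular references / weak references (内存管理机制-引用计数/垃圾回收/循环引用/弱引用)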
