python中的散列函数_在字典中覆盖Python的散列函数

I am trying to create a custom hash function for some object that I'll be hashing into a dictionary. The hashing function is unique (not the standard Python one). This is very important to me: to use the unique function. Each key's value is a list.

Assuming I override __hash__ and end up coming up with the right hash number for an object. Would:

dict = {}

dict[number_here] = value

hash the value into the position number number_here, or would it still be at the position that Python's hash table would compute for that number?

Printing dict only shows the items and not which position they're that. However, when I do hash(4), the result is 4. So I'm assuming this means integers are hashed to their respective locations?

Could someone please verify my findings or explain to me if I'm wrong?

解决方案

The python dict implementation uses the hash value to both sparsely store values based on the key and to avoid collisions in that storage. It uses the result of hash() as a starting point, it is not the definitive position.

Thus, although hash(4) returns 4, the exact 'position' in the underlying C structure is also based on what other keys are already there, and how large the current table is. The python hash table is resized as needed (items added), for example.

Since a dict has no ordering, this is not something you need to worry about, or can hope to influence. If you need ordering in a dict, use the collections.OrderedDict() implementation instead, which keeps track of ordering separately.

The details of the python hash table implementation

You may want to read up on how hash tables work on Wikipedia; Python uses open addressing for it's implementation.

When selecting a slot in the table, the modulus of the hash value (an integer) and the current table size is taken, thus on a table of size 32, so the key 45, hash value 45 would initially be stored in slot 14.

If there is a collision (there already is something else stored in slot 14 and it's not the integer 45), then the slot value is perturbed until an empty slot is found or the same key is found instead. Perturbing is done with the formula:

perturb = slot = hash

while slot_is_full and item_in_slot_is_not_equal_to_key:

slot = (5*slot) + 1 + perturb

perturb >>= 5

So, when there is a collision, another slot is picked at progressively smaller steps until it scans the whole table. Note that the table will already have been resized to make space if necessary.

In order for this to work correctly, custom types need both a __hash__() method and need to implement __eq__() to determine if two instances represent the same key. Matching hash values is not enough. For the dict implementation to consider two instances to represent the exact same key, both their hash values must match, and they must return True for the == equality operator. Such objects are considered hashable.

(For Python 2.x, implementing the __cmp__() hook would do instead of implementing __eq__(); support for this has been removed in Python 3).

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值