哈希值 哈希码_什么是哈希? 哈希码如何工作-带有示例

哈希值 哈希码

哈希简介 (Introduction to hashing)

Hashing is designed to solve the problem of needing to efficiently find or store an item in a collection.

哈希设计用于解决需要在集合中有效查找或存储项目的问题。

For example, if we have a list of 10,000 words of English and we want to check if a given word is in the list, it would be inefficient to successively compare the word with all 10,000 items until we find a match. Even if the list of words are lexicographically sorted, like in a dictionary, you will still need some time to find the word you are looking for.

例如,如果我们有一个10,000个英语单词的列表,并且想要检查列表中是否有给定的单词,那么将这个单词与所有10,000个项目相继进行比较直到找到匹配项都是无效的。 即使单词列表按字典顺序排序,就像在字典中一样,您仍然需要一些时间才能找到所需的单词。

Hashing is a technique to make things more efficient by effectively narrowing down the search at the outset.

散列是一种通过一开始就有效缩小搜索范围来提高效率的技术。

什么是哈希? (What is hashing?)

Hashing means using some function or algorithm to map object data to some representative integer value.

散列表示使用某种函数或算法将对象数据映射到某个代表整数值。

This so-called hash code (or simply hash) can then be used as a way to narrow down our search when looking for the item in the map.

然后,可以使用这种所谓的哈希码(或简称为哈希)来缩小我们在地图中查找项目时的搜索范围。

Generally, these hash codes are used to generate an index, at which the value is stored.

通常,这些哈希码用于生成索引,在该索引中存储值。

哈希如何工作 (How hashing works)

In hash tables, you store data in forms of key and value pairs. The key, which is used to identify the data, is given as an input to the hashing function. The hash code, which is an integer, is then mapped to the fixed size we have.

在哈希表中,您以键和值对的形式存储数据。 用来标识数据的密钥作为哈希函数的输入提供。 然后将哈希码(是整数)映射到我们拥有的固定大小。

Hash tables have to support 3 functions.

哈希表必须支持3个功能。

  • insert (key, value)

    插入(键,值)
  • get (key)

    获取(密钥)
  • delete (key)

    删除(键)

Purely as an example to help us grasp the concept, let us suppose that we want to map a list of string keys to string values (for example, map a list of countries to their capital cities).

纯粹以帮助我们理解该概念为例,让我们假设我们想将字符串键列表映射到字符串值(例如,将国家列表映射到其首都)。

So let’s say we want to store the data in Table in the map.

假设我们要将数据存储在地图的Table中。

And let us suppose that our hash function is to simply take the length of the string.

让我们假设我们的哈希函数只是获取字符串的长度。

For simplicity, we will have two arrays: one for our keys and one for the values.So to put an item in the hash table, we compute its hash code (in this case, simply count the number of characters), then put the key and value in the arrays at the corresponding index.

为了简单起见,我们将有两个数组:一个用于我们的键,另一个用于值。因此,将一个项放入哈希表中,我们计算其哈希码(在这种情况下,只需计算字符数),然后将键和值在数组中的对应索引处。

For example, Cuba has a hash code (length) of 4. So we store Cuba in the 4th position in the keys array, and Havana in the 4th index of the values array etc. And we end up with the following:

例如,古巴的哈希码(长度)为4。因此,我们将古巴存储在键数组的第4个位置,将哈瓦那存储在values数组的第4个索引中,等等。最后得到以下内容:

Now, in this specific example things work quite well. Our array needs to be big enough to accommodate the longest string, but in this case that’s only 11 slots.We do waste a bit of space because, for example, there are no 1-letter keys in our data, nor keys between 8 and 10 letters.

现在,在此特定示例中,一切工作正常。 我们的数组必须足够大以容纳最长的字符串,但是在这种情况下只有11个插槽。我们确实浪费了一些空间,因为例如我们的数据中没有1个字母的键,也没有8到8之间的键。 10个字母。

But in this case, the wasted space isn’t so bad either. Taking the length of a string is nice and fast, and so is the process of finding the value associated with a given key (certainly faster than doing up to five string comparisons).

但是在这种情况下,浪费的空间也不是那么糟糕。 取得字符串的长度既好又快速,找到与给定键关联的值的过程也是如此(肯定比进行最多五个字符串比较要快)。

But, what do we do if our dataset has a string which has more than 11 characters?What if we have one another word with 5 characters, “India”, and try assigning it to an index using our hash function. Since the index 5 is already occupied, we have to make a call on what to do with it. This is called a collision.

但是,如果我们的数据集包含一个包含11个以上字符的字符串,我们该怎么办?如果我们有另一个5个字符的单词“印度”,并尝试使用我们的哈希函数将其分配给索引,该怎么办。 由于索引5已被占用,因此我们必须调用如何处理它。 这称为碰撞。

If our dataset had a string with thousand characters, and you make an array of thousand indices to store the data, it would result in a wastage of space. If our keys were random words from English, where there are so many words with same length, using length as a hashing function would be fairly useless.

如果我们的数据集包含一个包含数千个字符的字符串,并且您创建了一个包含数千个索引的数组来存储数据,则将导致空间的浪费。 如果我们的关键字是来自英语的随机单词,其中有那么多个长度相同的单词,那么将length用作哈希函数将毫无用处。

碰撞处理 (Collision Handling)

Two basic methods are used to handle collisions.

使用两种基本方法来处理冲突。

  1. Separate Chaining

    单独链接
  2. Open Addressing

    开放式寻址

单独链接 (Separate Chaining)

Hash collision handling by separate chaining, uses an additional data structure, preferrably linked list for dynamic allocation, into buckets. In our example, when we add India to the dataset, it is appended to the linked list stored at the index 5, then our table would look like this.

通过单独的链进行哈希冲突处理,将其他数据结构(最好是用于动态分配的链表)使用到存储桶中。 在我们的示例中,当将印度添加到数据集时,它将印度附加到存储在索引5的链表中,那么我们的表将如下所示。

To find an item we first go to the bucket and then compare keys. This is a popular method, and if a list of links is used the hash never fills up. The cost for get(k) is on average O(n) where n is the number of keys in the bucket, total number of keys be N.

要查找物品,我们首先进入存储桶,然后比较键。 这是一种流行的方法,如果使用链接列表,则哈希永远不会填满。 get(k)的成本平均为O(n) ,其中n是存储桶中的密钥数,密钥总数为N。

The problem with separate chaining is that the data structure can grow with out bounds.

单独链接的问题在于数据结构可以无限制地增长。

开放式寻址 (Open Addressing)

Open addressing does not introduce any new data structure. If a collision occurs then we look for availability in the next spot generated by an algorithm. Open Addressing is generally used where storage space is a restricted, i.e. embedded processors. Open addressing not necessarily faster then separate chaining.

开放式寻址不会引入任何新的数据结构。 如果发生冲突,那么我们会在算法生成的下一个位置中寻找可用性。 开放式寻址通常用于存储空间有限(即嵌入式处理器)的地方。 开放式寻址不一定比单独链接要快。

Methods for Open Addressing

开放式寻址方法

如何在代码中使用哈希。 (How to use hashing in your code.)

Python (Python)
# Few languages like Python, Ruby come with an in-built hashing support.
   # Declaration
    my_hash_table = {}
    my_hash_table = dict()

   # Insertion
    my_hash_table[key] = value

   # Look up
    value = my_hash_table.get(key) # returns None if the key is not present || Deferred in python 3, available in python 2
    value = my_hash_table[key] # throws a ValueError exception if the key is not present

    # Deletion
    del my_hash_table[key] # throws a ValueError exception if the key is not present

    # Getting all keys and values stored in the dictionary
    keys = my_hash_table.keys()
    values = my_hash_table.values()

Run Code

运行代码

Java (Java)
// Java doesn't include hashing by default, you have to import it from java.util library
    // Importing hashmaps
    import java.util.HashMap;

   // Declaration
    HashMap<Integer, Integer> myHashTable = new HashMap<Integer, Integer>(); // declares an empty map.

   // Insertion
    myHashTable.put(key, value);

   // Deletion
    myHashtable.remove(key);

    // Look up
    myHashTable.get(key); // returns null if the key K is not present
    myHashTable.containsKey(key); // returns a boolean value, indicating the presence of a key

    // Number of key, value pairs in the hash table
    myHashTable.size();

Run Code

运行代码

有关哈希的更多信息 (More info on hashing)

翻译自: https://www.freecodecamp.org/news/what-is-hashing/

哈希值 哈希码

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值