Hashing Data Structure

cumt30111

Original article: https://www.includehelp.com/data-structure-tutorial/hashing-data-structure.aspx

What is Hashing?

Hashing is a technique for storing and retrieving information quickly. It makes key-based searching efficient and is used in databases, cryptography, symbol tables, and more.

Why is Hashing Needed?

Hashing is needed to perform search, insert, and delete operations in constant time on average. In other data structures such as arrays and linked lists, these operations generally take linear time, O(n). The best alternative is a self-balancing tree such as an AVL tree, where the time complexity is O(log n). Hashing lets us perform these operations in constant time, O(1), on average.

Components of Hashing

Hash table
Hash functions
Collisions
Collision resolution techniques

Hash Table ADT

As an ADT, the common operations of a hash table are:

Search for a key in the hash table
Insert a key into the hash table
Delete a key from the hash table

To understand what a hash table is and how it works, let's take a programming example. Say the problem is to find the unique characters in a string (lowercase only):

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string str;

    cout << "input your string:\n";
    cin >> str;

    // create the hash table: arr[i] counts occurrences of the letter 'a' + i
    int arr[26] = { 0 };
    for (char c : str) {
        if (c >= 'a' && c <= 'z') // skip anything that is not a lowercase letter
            arr[c - 'a']++;
    }

    // scan the 26 slots; a count of 1 means the character is unique
    cout << "unique characters are: ";
    for (int i = 0; i < 26; i++) {
        if (arr[i] == 1)
            cout << char(i + 'a') << " ";
    }

    return 0;
}

Output:

input your string:
includehelp
unique characters are: c d h i n p u

Now, how does this solution use a hash table and hashing?

Step 1 (Create the hash table):

We will use an array as the hash table. To create it:

Declare an array of size 26, arr[26], and initialize it to 0 (size = 26 since there are only 26 lowercase letters).

Then, for each character of the string, we map it as below:

for each character c in string str:
arr[c-'a']++

This creates a hash table that represents the string.

Step 2 (Search the unique character):

Now, if any arr[i] has the value 1, then char(i + 'a') is a unique character.
This is a standard example of hashing: after constructing the hash table in one pass over the string, we find the unique characters by scanning a fixed 26 slots, which takes O(26) = O(1) time.

So hashing is an excellent solution for key-based searching if we can construct the hash table.

Okay, so the question is: why are we calling this hashing when we only used an array? We could have termed it just another use of an array.

To address this question, consider what we would do if the problem involved numbers instead of a string of lowercase characters. What would the array size be then? We could no longer use 26, since the numbers can range from -INF to +INF (or, if you prefer, from LLONG_MIN to LLONG_MAX). An array that large is not feasible to allocate, and scanning it would no longer be O(1). So what we need to do is hash all the values into some limited range, say [0, k], so that the table stays small and the operations stay O(1). This is known as hashing: we map the keys (the original values) to locations based on a predefined algorithm (the hash function).

For example,

Say we have keys,
13, 1112, 111119, 112345111118, 11111111111115
And we have the locations (set of limited range) as [0,10]
So after hashing we may find
13 -> 3
1112 -> 2
111119 -> 9
112345111118 -> 8
11111111111115 -> 5

Can you guess the hashing function here?

This is a kind of direct addressing, where we can find a key just by looking at its location. For example, if we look at location 3, we find 13. But with a large number of keys it rarely works out this neatly, because several keys can land on the same location, and that's why we need well-chosen hash functions.

Hash Functions

A hash function converts a key into an index (location) in the hash table. Ideally, a hash function would generate a unique location for every key, but that is difficult to achieve since the number of indexes is much smaller than the number of possible keys. With a hash function that is not perfect, two different keys can map to the same index; this is called a collision.

There are several collision resolution techniques, as listed below:

Direct chaining
1. Separate chaining
Open addressing
1. Linear probing
2. Quadratic probing
3. Double hashing

For more detail on hash functions: Hash functions and its characteristics

Coding problems using hashing:

More coding problems using hashing
