The Codeless Guide to Hashing and Hash Tables

Original article: https://www.freecodecamp.org/news/the-codeless-guide-to-hash/

If you have programmed before, you are sure to have come across hashing and hash tables. Many developers have used hash tables in one form or another, and beginner developers must learn this fundamental data structure. There is just one problem:

All the tutorials you come across are sure to discuss hashing and hash tables in JavaScript, Python, or some other programming language.

What this means is that you may understand a little about how hashing works and how to use a hash table in [insert language here], but may miss the principles of how it works.

Wouldn't it be great if we could learn about hashing without knowing any particular language? If you know how hashing works, and what a hash table is, the language shouldn't matter.

That is the codeless approach, and in this post I will teach you all about hashing and hash tables regardless of which programming language you are currently using. Whether you're a junior or senior dev, everyone will learn something from this post.

So What's a Hash Function Anyway?

Before we get into all the fancy stuff, let me tell you what hashing is. To make it easy let's imagine we have a black box:

This black box is special. It is called a function box. We'll call it a function box because this box will map an independent variable on the input to a dependent variable on the output (it sounds mathy but bear with me).

Our function box works like this: if we put a letter into the box, we get a number out. Since our box is a function box, there can only be one output for every input into the box.

Our function box takes a letter from A to J as input and outputs the corresponding number from 0 to 9. So if we input C, we get 2 as the output.
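
For readers who would like to see the function box in code, here is a minimal Python sketch. The name `function_box` and the choice to reject anything outside A to J are assumptions made for illustration only.

```python
import string

# The ten letters the box accepts: "ABCDEFGHIJ".
LETTERS = string.ascii_uppercase[:10]


def function_box(letter: str) -> int:
    """Map a single letter from A to J to its position, 0 to 9."""
    if len(letter) != 1 or letter not in LETTERS:
        raise ValueError(f"expected one letter from A to J, got {letter!r}")
    return LETTERS.index(letter)


print(function_box("C"))  # prints 2
```

Each input has exactly one output, which is what makes the box a function.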

This is the basic idea behind a hash function. A hash function, however, takes it a step further: it maps input data of any kind to a numeric value on the output, usually written as a hexadecimal sequence.

So essentially all hashing does is use a function to map data to a representative numeric or alphanumeric value. For a given hash function, the output always has the same size, regardless of the size of the input, and the same input always produces the same output.
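
As a quick check of the fixed-size property, the sketch below uses SHA-256 from Python's standard `hashlib` module. SHA-256 is only one example of a hash function, and the helper name `hex_digest` is made up for this illustration.

```python
import hashlib


def hex_digest(data: bytes) -> str:
    """Return the SHA-256 hash of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


short = hex_digest(b"A")              # a 1-byte input
long = hex_digest(b"A" * 1_000_000)   # a 1,000,000-byte input

# Both digests are 64 hexadecimal characters (256 bits) long.
print(len(short), len(long))  # prints 64 64
```

The two digests differ in content, but never in length.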

What about Hash Tables?

So at this point you may be wondering what a hash table is. Hash tables utilize hashing to form a data structure.

Hash tables store data associatively, using what is known as a key-value lookup system. All that means is that, in a hash table, each key is mapped to exactly one value.

Organizing data this way makes lookups very fast. Because each key is mapped to exactly one value, once we know a key we can compute where its value is stored and go straight to it, without searching through the rest of the data.

Hash tables are extremely fast: looking up, inserting, or deleting an entry takes O(1) time on average.
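
Most languages ship a ready-made hash table. As one example, Python's built-in `dict` is implemented as a hash table, so the key-value lookup described above looks like this (the names and ages are made-up sample data):

```python
# A dict maps each key to exactly one value.
ages = {"alice": 31, "bob": 27}

ages["carol"] = 45        # insert: O(1) on average
print(ages["bob"])        # lookup by key: prints 27
print("dave" in ages)     # membership test: prints False

ages["bob"] = 28          # assigning to an existing key replaces its value
print(ages["bob"])        # prints 28
```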

Confused? Take a look at this diagram, where we have multiple function boxes generating hash values.

In this scenario, each character on the input (each is a key) has a hash function applied to it, and the hash function in the function box generates the hash value. This resulting value is then mapped to an index in the underlying linked list or array used to implement the hash table.

The resulting structure will look like this:
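
For readers who would like to see it in code, a structure like this can be sketched in Python. This is only an illustration under stated assumptions: `toy_hash` and `build_table` are made-up names, the hash simply sums character codes, the table has 10 slots, and collisions are ignored for now.

```python
def toy_hash(key: str) -> int:
    """Toy hash for illustration only: the sum of the character codes."""
    return sum(ord(ch) for ch in key)


def build_table(keys, size=10):
    """Place each key in an array at index toy_hash(key) % size."""
    table = [None] * size
    for key in keys:
        index = toy_hash(key) % size
        table[index] = key  # a later key with the same index would overwrite this one
    return table


print(build_table(["A", "B", "C"]))
# prints [None, None, None, None, None, 'A', 'B', 'C', None, None]
```

With these assumptions, A, B, and C have hash values 65, 66, and 67, so they land at indexes 5, 6, and 7.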

Hash Collisions

This is a good time to talk about collision in hash functions and hash tables.

An ideal mapping would be one-to-one: every distinct element in the input would be mapped to its own distinct element in the output.

In a hash function, however, it is not always like this. Because the output has a fixed size while the possible inputs are unlimited, different inputs will sometimes produce the same hash value. When this occurs you get what is known as a hash collision.

Hash collisions are not very common in most use cases, as a small change in the input can produce a dramatically differing output. But the more data you have to input to the hash function, the more likely a collision is to occur.
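
A collision is easy to provoke with a deliberately weak hash. The Python sketch below is an assumption-laden toy: `toy_hash` (a sum of character codes) and `find_collisions` are made-up names, and real hash functions are designed to make collisions far rarer than this.

```python
def toy_hash(key: str) -> int:
    """Toy hash for illustration only: the sum of the character codes."""
    return sum(ord(ch) for ch in key)


def find_collisions(keys, size):
    """Return (first_key, second_key, index) for keys that share a table index."""
    seen = {}          # index -> first key that landed there
    collisions = []
    for key in keys:
        index = toy_hash(key) % size
        if index in seen:
            collisions.append((seen[index], key, index))
        else:
            seen[index] = key
    return collisions


print(find_collisions(["ab", "ba", "cat"], 10))
# prints [('ab', 'ba', 5)]
```

"ab" and "ba" contain the same characters, so this toy hash gives both the value 195 and both land at index 5.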

In the hash table example we provided earlier, we assumed that an array was used to implement the hash table, with a single value at each index. While this is fine for simple hash tables, it cannot handle collisions: two keys that hash to the same index would compete for the same slot.

As such, a method known as chaining is used. In chaining, if the hash function returns the same hash value for multiple keys, we simply "chain" those elements together at the same index in the hash table.

This way, instead of each index in the array holding a single value, each index holds a list (typically a linked list) of all the elements that hashed to it.
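
Chaining can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the class name `ChainedHashTable` is made up, Python lists stand in for linked lists, Python's built-in `hash()` is used as the hash function, and the table never resizes.

```python
class ChainedHashTable:
    """Hash table where each slot holds a chain of (key, value) pairs."""

    def __init__(self, size=8):
        # One empty chain per slot.
        self._buckets = [[] for _ in range(size)]

    def _index(self, key):
        # Reduce the hash value to a valid slot index.
        return hash(key) % len(self._buckets)

    def put(self, key, value):
        bucket = self._buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: replace its value
                return
        bucket.append((key, value))       # new key: extend the chain

    def get(self, key):
        # Only the one chain for this key's index is searched.
        for existing_key, value in self._buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


# With a single slot, every key collides, yet nothing is lost.
table = ChainedHashTable(size=1)
table.put("A", 0)
table.put("B", 1)
print(table.get("A"), table.get("B"))  # prints 0 1
```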

But as the length of a chain increases, lookups get slower: in the worst case, when every key lands in the same chain, finding an element takes O(n) time. A method known as open addressing is also used. In it, when a collision occurs, an alternate free location is found in the underlying data structure implementing the hash table. Just keep in mind that this method also becomes less efficient as the table fills up, because more locations must be checked before an element or a free slot is found.
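
One common form of open addressing is linear probing: if a slot is taken, try the next one, wrapping around at the end. The Python sketch below assumes that strategy; the class name `LinearProbingTable` is made up, and deletion and resizing are left out to keep it short.

```python
class LinearProbingTable:
    """Open-addressing hash table using linear probing (no deletion, no resizing)."""

    def __init__(self, size=8):
        self._slots = [None] * size  # each slot is None or a (key, value) pair

    def _probe(self, key):
        # Visit every slot once, starting at the key's home index.
        start = hash(key) % len(self._slots)
        for step in range(len(self._slots)):
            yield (start + step) % len(self._slots)

    def put(self, key, value):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None or slot[0] == key:
                self._slots[i] = (key, value)
                return
        raise RuntimeError("table is full")

    def get(self, key):
        for i in self._probe(key):
            slot = self._slots[i]
            if slot is None:
                break  # an empty slot means the key was never stored
            if slot[0] == key:
                return slot[1]
        raise KeyError(key)


table = LinearProbingTable(size=4)
table.put(0, "zero")  # home index 0
table.put(4, "four")  # also home index 0, so it is placed at index 1
print(table.get(4))   # prints four
```

The more slots are occupied, the longer the probe sequences become, which is the efficiency cost mentioned above.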

Is Hashing the Same as Encryption or Encoding?

Many people tend to associate hashing with encryption or encoding. So is hashing encryption? Is it the same as encoding?

You see, in encryption we muddle data so that only someone with the key needed to decrypt the data will have access to it. When we utilize an encryption cipher, we not only make the data encrypted, but we also want to decrypt the data at some point. In encryption we want to recover the original data.

Hashing, on the other hand, takes data and produces an output for the purpose of confirming the integrity of data. In hashing we have no intention of recovering the original data.

Encoding differs from encryption and hashing in that the goal of encoding is not to obscure data for any security purpose, but merely to convert the data into a format that another system can use.
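
The three can be contrasted in a short Python sketch using only the standard library. The assumptions are loud here: Base64 stands in for encoding, SHA-256 for hashing, and `xor_cipher` is a made-up toy that is NOT secure and only illustrates that encryption is reversible with a key.

```python
import base64
import hashlib

message = b"hello"

# Encoding: reversible by anyone, no key involved.
encoded = base64.b64encode(message)
print(encoded)                            # prints b'aGVsbG8='
print(base64.b64decode(encoded))          # prints b'hello'


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher for illustration only (not secure): XOR each byte with the key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Encryption: reversible, but only with the key.
key = b"k3y"
ciphertext = xor_cipher(message, key)
print(xor_cipher(ciphertext, key))        # prints b'hello'

# Hashing: one way. We never recover the message; we only compare digests
# to confirm that the data has not changed.
digest = hashlib.sha256(message).hexdigest()
print(hashlib.sha256(b"hello").hexdigest() == digest)   # prints True
print(hashlib.sha256(b"hellp").hexdigest() == digest)   # prints False
```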

What Can I Do with Hashing?

Hashes and hash tables have numerous uses! These include:

Cryptosystems
Cyclic Redundancy Checks
Search Engines
Databases
Compilers

Or any system that has a complex lookup process.

Wrapping Up

In this post we've covered the basics of hashing, all without writing a single line of code! This was easy, right? The codeless approach is a much easier way of learning about these fundamental topics.

We learned that hash functions can be used to convert objects into a fixed-length alphanumeric output. We also learned that hash tables are key-value lookup systems and that, while they work well, they are not perfect and sometimes suffer from collisions.

By the end of this post you should know the difference between hashing, encryption, and encoding, and also have an idea of where hashes can be used.

Did you like the codeless approach? Want to go further?

Learn about hash tables and other data structures and algorithms in the book "Codeless Data Structures and Algorithms". It expands on what was covered in this post and covers many more topics, all without writing a single line of code!