Finding the balance between time and memory complexity: an illustrated example

cumi6497

Original article: https://www.freecodecamp.org/news/finding-the-balance-between-time-and-memory-complexity-an-illustrated-example-4845ab7afadd/

by Anmol Uppal

As programmers, we often have to trade off time complexity against memory complexity: improving one usually means compromising on the other, and it is hard to find the right sweet spot between them.

The problem becomes even more pressing for Android and iOS devices, which have relatively limited resources.

Let’s use an example to see how we can find this “sweet spot.” Our example is a query that checks whether a given word is present in the English dictionary.

The use case is very specific to text input applications (such as a mobile phone keyboard), but the concepts used in this article can be used in other areas as well.

Data structures often applied to this problem include:

Set
Trie
HashMap
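A generic set can answer the membership query directly, which gives a baseline to compare against. Below is a minimal Java sketch of that baseline. It assumes the whole word list fits in memory, and `SetDictionary` is a hypothetical name that does not come from the original article:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical baseline: dictionary lookup backed by a generic HashSet.
class SetDictionary {
    private final Set<String> words = new HashSet<>();

    SetDictionary(List<String> wordList) {
        for (String w : wordList) {
            words.add(w.toLowerCase(Locale.ROOT)); // normalise case once, at load time
        }
    }

    // Expected O(1) hash-table probe (plus hashing the query word),
    // but every stored word is a separate String object in memory.
    boolean contains(String word) {
        return words.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

For example, `new SetDictionary(List.of("hello", "world", "he", "win")).contains("hell")` returns `false`. Lookups are fast, but the cost shows up in memory, because each word is stored in full.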

Among these data structures, Trie is specially tailored for spell-checking. The other data structures are generic, and can be applied to many data types. Let’s visualise how Trie works.

Understanding Trie

Say the dictionary contains only four words: “hello,” “world,” “he,” and “win.”

We can visualise a Trie for this dictionary as:

The red circles mark termination of a valid word. The Trie structure maintains a hierarchy of valid parent and child relationships. Each node contains at least these three fields:

import java.util.ArrayList;
import java.util.List;

class TrieNode {
    char character;                               // a, b, c, d, ... y, z
    boolean isTerminal;                           // true if a valid word ends at this node
    List<TrieNode> children = new ArrayList<>();  // list of child nodes
}

A detailed discussion of how to construct a Trie is beyond the scope of this article, but we will briefly look at how a search is performed in a Trie:

Let’s say that in the above Trie (with just four valid words), we need to check whether “hell” is a valid word.

We start from the root node, take “h” (the first character of “hell”), and iterate over the root node’s children. If “h” is found, we then iterate over the children of “h” looking for the next character, “e,” and ignore the other children of the root node.

This process continues until we reach the last character of the search word. For the last character, “l,” we also check that node’s isTerminal field. Here it is false, because the only valid words in this dictionary are “he,” “hello,” “win,” and “world.”
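The insertion and search walkthrough above can be sketched in Java as follows. This is an illustrative sketch built on the three TrieNode fields listed earlier, not the article's code. The `Trie` class, the `findChild` helper, and the linear scan over a `List` of children are all assumptions:

```java
import java.util.ArrayList;
import java.util.List;

// Node with the three fields described above.
class TrieNode {
    char character;
    boolean isTerminal;
    List<TrieNode> children = new ArrayList<>();

    TrieNode(char character) {
        this.character = character;
    }

    // Linear scan over the children, as in the walkthrough.
    TrieNode findChild(char c) {
        for (TrieNode child : children) {
            if (child.character == c) {
                return child;
            }
        }
        return null;
    }
}

class Trie {
    private final TrieNode root = new TrieNode('\0'); // the root holds no character

    void insert(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            TrieNode next = node.findChild(c);
            if (next == null) {
                next = new TrieNode(c);
                node.children.add(next);
            }
            node = next;
        }
        node.isTerminal = true; // mark the end of a valid word
    }

    boolean contains(String word) {
        TrieNode node = root;
        for (char c : word.toCharArray()) {
            node = node.findChild(c); // descend; siblings are ignored from here on
            if (node == null) {
                return false;         // no path for this prefix
            }
        }
        return node.isTerminal;       // path exists; is it a complete word?
    }
}
```

Suppose “hello,” “world,” “he,” and “win” have been inserted. `contains("hell")` then follows the path h, e, l, l but returns `false`, because the second “l” node is not terminal. `contains("he")` returns `true`.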

Introducing MagicDict

MagicDict takes advantage of language-specific properties which Trie has overlooked:

All characters (a-z) can be imagined as a contiguous integer range (0–25).
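The letters a to z have consecutive character codes, so the mapping is a single subtraction. Below is a small Java sketch that assumes lowercase ASCII input only; the `Alphabet` helper is hypothetical:

```java
// Hypothetical helper: maps 'a'..'z' to 0..25 and back.
class Alphabet {
    static final int SIZE = 26;

    static int indexOf(char c) {
        if (c < 'a' || c > 'z') {
            throw new IllegalArgumentException("expected a lowercase letter: " + c);
        }
        return c - 'a'; // 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
    }

    static char letterAt(int index) {
        return (char) ('a' + index);
    }
}
```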

All possible child characters (a-z) of a given node map to a contiguous integer range. We can therefore represent the children with an array of Boolean values indexed by character.

We will use a Boolean array of size 26, with every element initialised to false. We also need a second array of 26 Boolean values to represent isTerminal.

Placing the two arrays side by side gives a row of 52 flags for one parent character. With one such row for each of the 26 possible parent characters, the English alphabet needs a 2D array of size 26 x 52. This single 2D array represents only one level of parent-child relationships.
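One such layer can be sketched in Java as below, assuming lowercase a-z input. The layout is our assumption: isChildren flags go in columns 0-25 of each row and isTerminal flags in columns 26-51. `BooleanLayer` is a hypothetical name:

```java
// Sketch of one 26 x 52 layer: row = parent letter,
// columns 0..25 = isChildren flags, columns 26..51 = isTerminal flags.
class BooleanLayer {
    static final int LETTERS = 26;

    final boolean[][] flags = new boolean[LETTERS][2 * LETTERS]; // all false initially

    void setChild(char parent, char child) {
        flags[parent - 'a'][child - 'a'] = true;
    }

    void setTerminal(char parent, char child) {
        flags[parent - 'a'][LETTERS + (child - 'a')] = true;
    }

    boolean isChild(char parent, char child) {
        return flags[parent - 'a'][child - 'a'];
    }

    boolean isTerminal(char parent, char child) {
        return flags[parent - 'a'][LETTERS + (child - 'a')];
    }
}
```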

We can stack these 2D arrays on top of one another: the children in the 1st layer become the parents in the 2nd layer, the children in the 2nd layer become the parents in the 3rd, and so on. This chained structure is the basic building block of MagicDict.

Insertion in MagicDict

We construct a stack of 2D layers. The number of layers required is longest_word_length - 1, because a word of length L contains L - 1 consecutive character pairs.

For the previous set of words (“he,” “hello,” “win,” and “world”), longest_word_length equals 5, so we need a stack of 4 layers.

Say we want to insert “hello” into this data structure. We start with the first pair {“h”, “e”} and turn on the corresponding isChildren boolean flag in layer 1.

Then we take the next pair {“e”, “l”} and similarly turn on the corresponding isChildren boolean flag in layer 2. This process is repeated until we reach the terminating pair {“l”, “o”}, where we also turn on the corresponding isTerminal boolean flag.
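Below is a minimal Java sketch of this insertion walk over stacked boolean layers. It assumes lowercase a-z words of at least two letters, and each row holds 26 isChildren flags followed by 26 isTerminal flags. `BooleanMagicDict` and its methods are hypothetical and not taken from MagicDict's source. Layers are 0-indexed in code, so the article's layer 1 is `layers[0]`:

```java
// Sketch of MagicDict-style insertion: one 26 x 52 boolean layer per character-pair position.
class BooleanMagicDict {
    private static final int LETTERS = 26;
    private final boolean[][][] layers; // [pair position][parent letter][flag column]

    BooleanMagicDict(int longestWordLength) {
        // A word of length L has L - 1 consecutive pairs, so L - 1 layers are needed.
        layers = new boolean[longestWordLength - 1][LETTERS][2 * LETTERS];
    }

    int layerCount() {
        return layers.length;
    }

    void insert(String word) {
        if (word.length() < 2 || word.length() - 1 > layers.length
                || !word.chars().allMatch(c -> c >= 'a' && c <= 'z')) {
            throw new IllegalArgumentException("unsupported word: " + word);
        }
        int last = word.length() - 2; // index of the terminating pair
        for (int i = 0; i <= last; i++) {
            int parent = word.charAt(i) - 'a';
            int child = word.charAt(i + 1) - 'a';
            layers[i][parent][child] = true; // isChildren flag (layer i + 1 in the article)
        }
        int parent = word.charAt(last) - 'a';
        int child = word.charAt(last + 1) - 'a';
        layers[last][parent][LETTERS + child] = true; // isTerminal flag for the final pair
    }

    boolean isChildSet(int layer, char parent, char child) {
        return layers[layer][parent - 'a'][child - 'a'];
    }

    boolean isTerminalSet(int layer, char parent, char child) {
        return layers[layer][parent - 'a'][LETTERS + (child - 'a')];
    }
}
```

Inserting “hello” into a `BooleanMagicDict(5)` sets the isChildren flags for {h, e} in `layers[0]` through {l, o} in `layers[3]`. It also sets the isTerminal flag for {l, o} in `layers[3]`.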

The whole process can be visualized as shown:

Searching in MagicDict

Searching follows the same flow as insertion. The difference is that insertion changes the bit values, whereas searching only reads them. A search checks that each character pair of the query word is valid in its layer and that the final pair is marked as terminal.
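Below is a self-contained Java sketch of the read-only search. It uses the same assumptions as the insertion sketch: lowercase a-z, at least two letters, and an assumed column layout. `MagicDictSearch` is again a hypothetical name:

```java
// Sketch of MagicDict-style search over boolean layers: the same pair-by-pair
// walk as insertion, except that contains() only reads flags.
class MagicDictSearch {
    private static final int LETTERS = 26;
    private final boolean[][][] layers; // [pair position][parent letter][52 flag columns]

    MagicDictSearch(int longestWordLength) {
        layers = new boolean[longestWordLength - 1][LETTERS][2 * LETTERS];
    }

    private boolean supported(String word) {
        return word.length() >= 2
                && word.length() - 1 <= layers.length
                && word.chars().allMatch(c -> c >= 'a' && c <= 'z');
    }

    void insert(String word) {
        if (!supported(word)) {
            throw new IllegalArgumentException("unsupported word: " + word);
        }
        int last = word.length() - 2; // index of the terminating pair
        for (int i = 0; i <= last; i++) {
            layers[i][word.charAt(i) - 'a'][word.charAt(i + 1) - 'a'] = true; // isChildren
        }
        layers[last][word.charAt(last) - 'a'][LETTERS + (word.charAt(last + 1) - 'a')] = true; // isTerminal
    }

    boolean contains(String word) {
        if (!supported(word)) {
            return false; // no layer can hold this word
        }
        int last = word.length() - 2;
        for (int i = 0; i <= last; i++) {
            if (!layers[i][word.charAt(i) - 'a'][word.charAt(i + 1) - 'a']) {
                return false; // this pair never occurred at this position
            }
        }
        return layers[last][word.charAt(last) - 'a'][LETTERS + (word.charAt(last + 1) - 'a')];
    }
}
```

Each layer records pairs independently of the rest of the word, so this sketch can accept words that were never inserted. For example, after inserting “abc” and “xbd,” `contains("abd")` returns `true`: {a, b} is set in layer 1, and {b, d} is set and terminal in layer 2.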

Final Magic Touch

Now we have a data structure that stores n 2D arrays of size 26 x 52, where n = longest_word_length - 1. Each 2D array stores 26 x 52 = 1,352 Boolean (true/false) values.

We also know that a Boolean value takes at least 1 byte of memory, because the smallest addressable unit of memory is a byte. Spending a whole byte on a single Boolean flag is far from ideal.

What if we could find a data type large enough to hold the boolean flags of the 2D array as a contiguous bit pattern?

It turns out that there isn’t one! Primitive data types have 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit representations, but no primitive data type is large enough to store 1,352 contiguous bits.

The closest available contiguous bit pattern is 64 bits, known as long in some languages. Each row of the 2D array holds only 52 flags (26 isChildren plus 26 isTerminal), which fit in 64 bits. We can therefore replace each row with a single long value.
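The bit-packed variant can be sketched in Java by giving each layer 26 `long` values, one per parent letter. This is a sketch under assumptions, not MagicDict's actual layout. Bits 0-25 hold isChildren flags, bits 26-51 hold isTerminal flags, and `PackedMagicDict` is a hypothetical name:

```java
// Sketch: every 52-flag row of a layer is packed into one 64-bit long.
// Bits 0..25 = isChildren flags, bits 26..51 = isTerminal flags, bits 52..63 unused.
class PackedMagicDict {
    private static final int LETTERS = 26;
    private final long[][] layers; // [pair position][parent letter]

    PackedMagicDict(int longestWordLength) {
        layers = new long[longestWordLength - 1][LETTERS];
    }

    private static long childBit(char c) {
        return 1L << (c - 'a'); // 1L, not 1: an int shift cannot reach bits 32..63
    }

    private static long terminalBit(char c) {
        return 1L << (LETTERS + (c - 'a'));
    }

    private boolean supported(String word) {
        return word.length() >= 2
                && word.length() - 1 <= layers.length
                && word.chars().allMatch(c -> c >= 'a' && c <= 'z');
    }

    void insert(String word) {
        if (!supported(word)) {
            throw new IllegalArgumentException("unsupported word: " + word);
        }
        int last = word.length() - 2;
        for (int i = 0; i <= last; i++) {
            layers[i][word.charAt(i) - 'a'] |= childBit(word.charAt(i + 1));
        }
        layers[last][word.charAt(last) - 'a'] |= terminalBit(word.charAt(last + 1));
    }

    boolean contains(String word) {
        if (!supported(word)) {
            return false;
        }
        int last = word.length() - 2;
        for (int i = 0; i <= last; i++) {
            if ((layers[i][word.charAt(i) - 'a'] & childBit(word.charAt(i + 1))) == 0) {
                return false;
            }
        }
        return (layers[last][word.charAt(last) - 'a'] & terminalBit(word.charAt(last + 1))) != 0;
    }

    // Payload only: layers x 26 longs x 8 bytes, ignoring JVM array headers.
    long sizeInBytes() {
        return (long) layers.length * LETTERS * Long.BYTES;
    }
}
```

Note the `1L` literal. Java uses only the low 5 bits of the shift distance for an `int`, so an `int` shift of 32 or more wraps around instead of reaching the upper bits.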

For a dictionary with a maximum word length of 21, and each layer consuming 26 x 8 bytes, the total size of the data structure would be 4,160 bytes.
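The 4,160-byte figure follows from the layer count of longest_word_length - 1. For comparison, storing the same 20 layers as 26 x 52 Boolean arrays would cost at least 27,040 bytes, at the minimum of one byte per flag:

```latex
\begin{aligned}
\text{layers} &= \text{longest\_word\_length} - 1 = 21 - 1 = 20 \\
\text{packed: } 20 \times 26 \times 8\ \text{bytes} &= 4{,}160\ \text{bytes} \\
\text{one byte per flag: } 20 \times 26 \times 52\ \text{bytes} &= 27{,}040\ \text{bytes}
\end{aligned}
```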

Benchmarking

We analysed 370,000 English words from this GitHub repo and recorded the time taken for:

Insertion of 370,000 words
Deletion of 100,000 words
Querying of 100,000 words (50,000 existing words, 50,000 non-existing words)

We also looked at the estimated memory consumption of the various data structures.

Final Thoughts

This initial model has only limited features. It could, however, be extended to languages other than English by using larger data types to hold the extra flags that a larger alphabet needs.

The main takeaway is that this data structure uses space efficiently while also improving runtime performance, which brings it closer to a “best-of-both-worlds” scenario.

For details of how this data structure is implemented, you can check out the source code of MagicDict.