Square: Shortest Unique Prefix

This article, based on a Medium post, shows how to find the shortest unique prefix of each string in a list.


Problem:

Given a list of words, return the shortest unique prefix of each word. For example, given the list:


dog 
cat
apple
apricot
fist

Return the list:


d
c
app
apr
f
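
To pin down what "shortest unique prefix" means before looking at faster solutions, here is a minimal brute-force sketch. It is not from the original article: the names starts_with and shortest_prefixes_brute_force are hypothetical, and it assumes that a word with no unique prefix (for example, a word that is itself a prefix of another word) maps to the empty string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if `prefix` is a prefix of `word`.
bool starts_with(const std::string& word, const std::string& prefix) {
  return word.compare(0, prefix.size(), prefix) == 0;
}

// Brute-force reference (hypothetical helper): for each word, try prefixes of
// increasing length and keep the first one that no other word starts with.
// A word without any unique prefix keeps the empty string.
std::vector<std::string> shortest_prefixes_brute_force(
    const std::vector<std::string>& words) {
  std::vector<std::string> result(words.size(), "");
  for (std::size_t i = 0; i < words.size(); i++) {
    for (std::size_t len = 1; len <= words[i].size(); len++) {
      const std::string prefix = words[i].substr(0, len);
      bool unique = true;
      for (std::size_t j = 0; j < words.size(); j++) {
        if (j != i && starts_with(words[j], prefix)) {
          unique = false;
          break;
        }
      }
      if (unique) {
        result[i] = prefix;
        break;
      }
    }
  }
  return result;
}
```

This version compares every candidate prefix against every other word, so it is only useful as a correctness reference for small inputs.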

Simple Hash-based Solution:

It’s easy to develop a simple hash-based solution. Define a std::unordered_map from string to a list of integers (each integer is the index of a word in the original input). Add every prefix of every word to the map, then iterate over the map to find all keys whose list contains a single index.

#include <string>
#include <unordered_map>
#include <vector>

std::vector<std::string> shortest_prefixes(const std::vector<std::string>& words) {
  if (words.empty()) return {};
  // Maps every prefix to the indices of the words that start with it.
  std::unordered_map<std::string, std::vector<int> > prefix_indices;
  for (int i = 0; i < static_cast<int>(words.size()); i++) {
    std::string s = "";
    for (const char c : words[i]) {
      s += c;
      // operator[] creates an empty list the first time a prefix is seen.
      prefix_indices[s].push_back(i);
    }
  }


  std::vector<std::string> result(words.size(), "");
  for (const auto& kv : prefix_indices) {
    if (kv.second.size() != 1) continue;
    const auto& prefix = kv.first;
    const int index = (kv.second)[0];
    if (result[index].empty() || result[index].size() > prefix.size()) {
      result[index] = prefix;
    }
  }
  return result;
}

What’s the complexity of this algorithm? Let’s assume that the average string length is M and that there are N strings. Every prefix of every string is stored in the unordered_map and queried later. Since every string has O(M) prefixes, the algorithm performs O(MN) map operations, so its total complexity is O(MN) if each insert and search in the unordered_map is counted as constant time. Strictly speaking, hashing and copying a prefix takes time proportional to its length, so counted in characters the work is closer to O(M²N).

How about the space complexity? The total character space taken by all prefixes of a string is O(M²). Since there are N strings, the total space complexity is O(M²N).
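
The O(M²) figure comes from summing the prefix lengths of one word: 1 + 2 + ... + M = M(M+1)/2. A small sketch of that count follows; the function name total_prefix_characters is hypothetical and not part of the original article. It treats every prefix as a separately stored key, so it is an upper bound: prefixes shared between words (such as "a" and "ap" in the example) appear only once in the map.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Upper bound on the number of characters stored as map keys when every
// prefix of every word is inserted as its own string.
std::size_t total_prefix_characters(const std::vector<std::string>& words) {
  std::size_t total = 0;
  for (const auto& word : words) {
    const std::size_t m = word.size();
    total += m * (m + 1) / 2;  // 1 + 2 + ... + m
  }
  return total;
}
```

For the five example words (lengths 3, 3, 5, 7 and 4) this gives 6 + 6 + 15 + 28 + 10 = 65 characters, compared with 22 characters in the words themselves.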


We also run the risk of hash collisions, in which case unordered_map search and insert no longer run in constant time. Can we improve upon this algorithm?

Your Own Data-structure-based Solution:

In this section, we will define a new data structure, the Prefix Tree, which will

  1. Save space by cleverly storing prefixes of all strings.
  2. Save time by stopping early in the search for prefixes, and also by avoiding hash collisions.

Here is an outline of what the tree looks like:

  1. Every character in a string forms a node of the tree.
  2. The next character to the right in the string forms its child node.
  3. Thus, every path in the tree from the root to a node spells a prefix of some word.
  4. Every node stores the index of the word (in the input array) that it belongs to.
  5. If a node belongs to multiple words, we keep the index of the first of them and set an additional boolean flag, repeated, to true.

The prefix tree for the two-word example [apple, apricot] looks like this:

                 root
                  |
          'a' (repeated=true)
                  |
          'p' (repeated=true)
                  |
        ---------------------------------
        |                               |
'p' (repeated=false)           'r' (repeated=false)
        |                               |
'l' (repeated=false)           'i' (repeated=false)
        |                               |
'e' (repeated=false)           'c' (repeated=false)
                                        |
                               'o' (repeated=false)
                                        |
                               't' (repeated=false)

Here is the struct definition of the Prefix Tree:


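struct PrefixTree {
  char c = '\0';                           // character of this node (unused for the root)
  int word_index = 0;                      // index of the first word passing through this node
  bool repeated = false;                   // true if more than one word passes through this node
  PrefixTree* children[26] = { nullptr };  // one slot per lowercase letter
  PrefixTree* parent = nullptr;            // nullptr for the root
};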

Here is how we can construct the PrefixTree from a vector of strings (Let’s assume that the characters in the words only come from lowercase English letters {a, b, ..., z}):


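#include <memory>
#include <string>
#include <vector>

// Builds the prefix tree. The caller owns the returned root and, through the
// raw child pointers, every node below it.
PrefixTree* construct_tree(const std::vector<std::string>& words) {
  std::unique_ptr<PrefixTree> root = std::make_unique<PrefixTree>();
  for (int word_idx = 0; word_idx < static_cast<int>(words.size()); word_idx++) {
    PrefixTree* curr = root.get();
    for (const char c : words[word_idx]) {
      const int offset = c - 'a';
      if (curr->children[offset] != nullptr) {
        // An earlier word already passes through this node.
        curr->children[offset]->repeated = true;
      } else {
        std::unique_ptr<PrefixTree> new_tree = std::make_unique<PrefixTree>();
        new_tree->c = c;
        new_tree->word_index = word_idx;
        new_tree->parent = curr;
        // release(), not get(): the tree takes over ownership. With get(),
        // the node would be destroyed at the end of this scope.
        curr->children[offset] = new_tree.release();
      }
      curr = curr->children[offset];
    }
  }
  return root.release();
}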

Finally, here is a BFS-like algorithm to search through the prefix tree, and populate the output shortest-prefix array:


  1. Insert the root of the tree in a BFS queue.
  2. At every iteration, pop the front of the queue (call it curr) and iterate over all the children of curr. If a child node belongs to a unique word (repeated=false), construct the unique prefix corresponding to the path from the root to that node, and save it at position word_index of the output.
  3. Otherwise, push the child node onto the back of the queue.

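// Helper function to construct a prefix from a PrefixTree node
std::string construct_prefix(const PrefixTree* node) {
  std::string result = "";
  const PrefixTree* curr = node;
  // Walk up to the root, which is the only node without a parent. The root
  // itself carries no character, so it is not added to the prefix.
  while (curr != nullptr && curr->parent != nullptr) {
    result = curr->c + result;
    curr = curr->parent;
  }
  return result;
}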


#include <queue>

std::vector<std::string> shortest_prefixes(const std::vector<std::string>& words) {
  if (words.empty()) return {};
  std::vector<std::string> result(words.size(), "");
  
  // Construct the prefix tree out of the words.
  // Note: the nodes are never freed here; a production version should
  // delete the tree once the prefixes have been collected.
  PrefixTree* root = construct_tree(words);
  
  // Main BFS loop to construct the prefixes
  std::queue<PrefixTree*> q;
  q.push(root);
  while (!q.empty()) {
    // std::queue::pop() returns void, so read the front before popping.
    PrefixTree* curr = q.front();
    q.pop();
    for (int i = 0; i < 26; i++) {
      if (curr->children[i] == nullptr) continue;
      PrefixTree* new_tree = curr->children[i];
      if (!new_tree->repeated) {
        // First node on this path used by a single word: stop descending.
        result[new_tree->word_index] = construct_prefix(new_tree);
        continue;
      }
      q.push(new_tree);
    }
  }
  return result;
}

What’s the complexity of this algorithm? The algorithm has two components: (1) prefix tree creation, and (2) prefix tree traversal. In both components, we visit every character of every word at most once. For every character, we perform a constant amount of work (updating a few variables, or iterating over a constant-size array). As a result, the time complexity of this algorithm is O(MN). And here is the kicker: we store a constant amount of data per character per word (the PrefixTree struct has O(1) size). As a result, the space complexity of this algorithm is also O(MN).
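
As an illustration of the same O(MN) bounds, here is a compact variant sketch. It is not the author's implementation: the names CountingNode and shortest_prefixes_counting are hypothetical, the repeated flag is replaced by a per-node count of the words passing through the node, and the children are held in std::unique_ptr so the tree is freed automatically. It keeps the article's assumption that words contain only lowercase English letters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical node type for this sketch: each parent owns its children, and
// `count` is the number of input words that pass through the node.
struct CountingNode {
  int count = 0;
  std::array<std::unique_ptr<CountingNode>, 26> children;
};

std::vector<std::string> shortest_prefixes_counting(
    const std::vector<std::string>& words) {
  CountingNode root;
  // Pass 1: insert every word, counting how many words reach each node.
  for (const auto& word : words) {
    CountingNode* curr = &root;
    for (const char c : word) {
      assert(c >= 'a' && c <= 'z');  // alphabet assumption from the article
      auto& child = curr->children[c - 'a'];
      if (!child) child = std::make_unique<CountingNode>();
      child->count++;
      curr = child.get();
    }
  }
  // Pass 2: walk each word again and stop at the first node reached by
  // exactly one word; the characters consumed so far are the answer.
  std::vector<std::string> result(words.size(), "");
  for (std::size_t i = 0; i < words.size(); i++) {
    const CountingNode* curr = &root;
    for (std::size_t len = 1; len <= words[i].size(); len++) {
      curr = curr->children[words[i][len - 1] - 'a'].get();
      if (curr->count == 1) {
        result[i] = words[i].substr(0, len);
        break;
      }
    }
  }
  return result;
}
```

Both passes touch each character of each word once, and each node has constant size, so the time and space bounds match the analysis above. A word that is a prefix of another word never reaches a node with count 1 and keeps the empty string.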


Testing:

Here are a few test cases in increasing order of difficulty:


  1. Empty word list
  2. Singleton list
  3. A list where every word is a prefix of the next word
  4. A random list, e.g. the one given in the problem statement

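#include <gmock/gmock.h>
#include <gtest/gtest.h>

using ::testing::ElementsAre;

TEST(ShortestPrefixesTest, EmptyList) {
  EXPECT_TRUE(shortest_prefixes({}).empty());
}

TEST(ShortestPrefixesTest, SingletonList) {
  EXPECT_THAT(shortest_prefixes({"abcd"}), ElementsAre("a"));
}

// A word that is a prefix of another word has no unique prefix, so its
// entry stays empty.
TEST(ShortestPrefixesTest, PrefixRelationships) {
  EXPECT_THAT(shortest_prefixes({"a", "aa", "aaa"}), ElementsAre("", "", "aaa"));
}

TEST(ShortestPrefixesTest, RandomStrings) {
  EXPECT_THAT(shortest_prefixes({"dog", "cat", "apple", "apricot", "fist"}), ElementsAre("d", "c", "app", "apr", "f"));
}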

Originally published at https://cppcodingzen.com on September 6, 2020.


Translated from: https://medium.com/@cppcodingzen/square-shortest-unique-prefix-6bededf9cfe7
