[C++&Rust]LeetCode No.692 前K个高频单词(每日一题)

本文链接：https://blog.csdn.net/SGCTSGCT/article/details/117061751

原贴地址：http://blog.leanote.com/post/dawnmagnet/lc692

题目

给一非空的单词列表，返回前 k 个出现次数最多的单词。

返回的答案应该按单词出现频率由高到低排序。如果不同的单词有相同出现频率，按字母顺序排序。

示例 1：

输入: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
输出: ["i", "love"]
解析: "i" 和 "love" 为出现次数最多的两个单词，均为2次。
    注意，按字母顺序 "i" 在 "love" 之前。

示例 2：

输入: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
输出: ["the", "is", "sunny", "day"]
解析: "the", "is", "sunny" 和 "day" 是出现次数最多的四个单词，
    出现次数依次为 4, 3, 2 和 1 次。

注意：

假定 k 总为有效值， 1 ≤ k ≤ 集合元素数。
输入的单词均由小写字母组成。

来源：力扣（LeetCode）
链接：https://leetcode-cn.com/problems/top-k-frequent-words
著作权归领扣网络所有。商业转载请联系官方授权，非商业转载请注明出处。

思路分析

这道题就思路而言还是非常简单的，首先我们得遍历一遍单词列表，获得最基本的判据，也就是每个单词出现的次数，这个过程我们只能用哈希表来实现。
在做完上述工作之后，我们获得了每个单词以及它出现的次数。有了这个次数，我们就可以对这些键值对进行排序。而且排序还是需要我们自己定义的。因为没有任何语言提供对于键值对的大小比较，就算有也不一定符合我们的要求，所以我们要自定义比较函数。

可是我们先不急着比较，我们思考一个问题，我们排序了n个键值对，但其实我们只需要k个，所以这里就需要引入一个数据结构来解决我们的问题了，这个数据结构就是优先队列/最大堆/最小堆。这仨其实是一个东西，其实底层都是堆实现的。就是不同的语言叫法不同而已，基本上是个语言就会提供这些工具（C除外，它不配）。
比如C++中的priority_queue和set，python中的heapq， rust中的BinaryHeap，java中的PriorityQueue等，都大同小异，没什么很大的区别，提供的方法也就是那么几个，push/pop/top(或者叫peek)，我们维护一个长度小于k的优先队列，因为k之后的元素我们是不需要的，所以我们就可以直接舍弃，这样我们的复杂度就从O(nlgn)降低到了O(nlgk)，因为每次插入优先队列只需要和k个数进行比较即可。
所以难度转移到了如何向优先队列表示我们需要指定排序的顺序。这就使用到了lambda函数，甚至还需要自定义对象。这些都比较要求我们对一个语言的使用有着更深入的掌握。具体语言的lambda函数我们在这里就不详细展开了，有兴趣的可以自行csdn学习。这个主要是取决于你自己惯用的语言，你总得比较面面俱到地掌握它，才能发挥出更大的价值

C++代码

auto cmp = [](const pair<string, int>& a, const pair<string, int>& b) {
            return a.second == b.second ? a.first < b.first : a.second > b.second;
        };
class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        unordered_map<string, int> m;
        for (auto & word : words) m[word]++;
        priority_queue<pair<string, int>, vector<pair<string, int>>, decltype(cmp)> que(cmp);
        for (auto& it : m) {
            que.emplace(it);
            if (que.size() > k) {
                que.pop();
            }
        }
        vector<string> res;
        while (que.size()) {
            res.push_back(que.top().first);
            que.pop();
        }
        reverse(res.begin(), res.end());
        return res;
    }
};

Rust代码

use std::collections::*;
use std::cmp::Ordering;
#[derive(Eq, Debug)]
struct Pair {
    pub word: String,
    freq: i32,
}
impl Pair {
    fn new(w: &str, f: i32) -> Self {
        Pair {
            word: w.to_string(),
            freq: f,
        }
    }
}
impl Ord for Pair {
    fn cmp(&self, other: &Self) -> Ordering {
        if self.freq == other.freq {
            self.word.cmp(&other.word)
        } else {
            other.freq.cmp(&self.freq)
        }
    }
}
impl PartialOrd for Pair {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Pair {
    fn eq(&self, other: &Self) -> bool {
        self.freq == other.freq
    }
}
impl Solution {
    pub fn top_k_frequent(words: Vec<String>, k: i32) -> Vec<String> {
        let mut map = HashMap::new();
        for word in words {
            *map.entry(word).or_insert(0) += 1;
        }
        let mut heap = BinaryHeap::new();
        for (key, val) in map.iter() { 
            heap.push(Pair::new(key, *val));
            if heap.len() > k as usize {
                heap.pop();
            }
        }
        // println!("{:?}", heap);
        let mut res = vec![];
        while heap.len() > 0 {
            if let Some(pair) = heap.pop() {
                res.push(pair.word);
            }
        }
        res.reverse();
        res
    }
}