Algorithm Problem: Top K Frequent Words

子阅哥哥

Category: Algorithm Problems. Tags: algorithms

Article link: https://blog.csdn.net/ZiYueD/article/details/117125796

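Daily LeetCode practice: 692. Top K Frequent Words

Given a non-empty list of words, return the k most frequent words.
The answer should be sorted by frequency from highest to lowest. If different words have the same frequency, sort them in alphabetical order.

Example 1:

Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]
Explanation: "i" and "love" are the two most frequent words, each appearing 2 times.
    Note that "i" comes before "love" in alphabetical order.

Example 2:

Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]
Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
    appearing 4, 3, 2 and 1 times respectively.

Notes:

Assume k is always valid, 1 ≤ k ≤ number of distinct elements.
All input words consist of lowercase letters only.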
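
The ordering rule in the problem statement (higher count first, ties broken alphabetically) can be captured in a single comparator. The following is a minimal sketch under that reading of the rule; the class name `OrderingSketch`, the constant `BY_COUNT_THEN_WORD` and the helper `sortedWordsOfExample1` are illustrative names, not part of the problem, and the counts are written out by hand from Example 1.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names here are assumptions.
class OrderingSketch {
    // Higher count first; equal counts fall back to alphabetical order.
    static final Comparator<Map.Entry<String, Integer>> BY_COUNT_THEN_WORD = (a, b) -> {
        int byCount = Integer.compare(b.getValue(), a.getValue());
        return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
    };

    // Sorts the hand-counted (word, count) pairs of Example 1 and returns the words in order.
    static List<String> sortedWordsOfExample1() {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        entries.add(new AbstractMap.SimpleEntry<String, Integer>("love", 2));
        entries.add(new AbstractMap.SimpleEntry<String, Integer>("leetcode", 1));
        entries.add(new AbstractMap.SimpleEntry<String, Integer>("i", 2));
        entries.add(new AbstractMap.SimpleEntry<String, Integer>("coding", 1));
        entries.sort(BY_COUNT_THEN_WORD);

        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            words.add(e.getKey());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(sortedWordsOfExample1()); // [i, love, coding, leetcode]
    }
}
```

Taking the first k = 2 words of that order gives ["i", "love"], which matches Example 1.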

First attempt

package com.algorithm;

import java.util.*;

/*
 * Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
 * Output: ["the", "is", "sunny", "day"]
 * Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
 * appearing 4, 3, 2 and 1 times respectively.
 */
public class day01 {

    public static List<String> topKFrequent(String[] words, int k) {
        // TreeMap keeps its keys in alphabetical order.
        Map<String,Integer> map = new TreeMap<String,Integer>();
        for (String w:words) { // loop over the array
            if (map.containsKey(w)){ // already present: increment the count
                map.put(w,map.get(w)+1);
            }else { // first occurrence
                map.put(w,1);
            }
        }
        // Copy the map entries into a list and sort by value (count), descending.
        // Collections.sort is stable, so words with equal counts keep the
        // alphabetical order they had in the TreeMap.
        List<Map.Entry<String,Integer>> list = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
        Collections.sort(list, new Comparator<Map.Entry<String,Integer>>() {
            @Override
            public int compare(Map.Entry<String,Integer> o1, Map.Entry<String,Integer> o2) {
                return (o2.getValue()).compareTo(o1.getValue());
            }
        });

        List<String> a = new LinkedList<String>();
        // Store the first k words in the result list.
        for (int i = 0; i < k; i++) {
            a.add(list.get(i).getKey());
        }
        return  a;
    }

    public static void main(String[] args){
        String[] word = new String[]{"the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"};
        System.out.println(topKFrequent(word,4).toString());
    }
}

Result of the first run (pretty poor)
[Image: run result; not available]
Second attempt
Approach 2

package com.algorithm;

import java.util.*;

/*
 * Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
 * Output: ["the", "is", "sunny", "day"]
 * Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
 * appearing 4, 3, 2 and 1 times respectively.
 */
public class day01_2 {
    public static List<String> topKFrequent(String[] words, int k) {
        // TreeMap keeps its keys in alphabetical order.
        Map<String,Integer> map = new TreeMap<String,Integer>();
        for (String w:words) { // loop over the array
            map.put(w,map.getOrDefault(w,0)+1);
        }
        // Copy the map's keys (already alphabetical) into a list.
        List<String> list = new ArrayList<String>();
        for (Map.Entry<String,Integer> entry:map.entrySet()) {
            list.add(entry.getKey());
        }
        // Sort the words by count, descending, looking each count up by key.
        // The sort is stable, so equal counts stay in alphabetical order.
        Collections.sort(list, new Comparator<String>() {
            @Override
            public int compare(String o1, String o2) {
                return (map.get(o2)).compareTo(map.get(o1));
            }
        });

        return  list.subList(0,k);
    }

    public static void main(String[] args){
        String[] word = new String[]{"the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"};
        System.out.println(topKFrequent(word,4).toString());
    }
}

[Image: run result; not available]
It actually got slower!

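Official solution

Method 1: Hash table + sorting

Idea and algorithm

We can precompute the frequency of each word, sort the words in descending order of frequency, and return the first k strings.

Specifically, we use a hash table to record the frequency of each string and then sort all the strings in the hash table. During sorting, if two strings have the same frequency, the one that is smaller in lexicographic order comes first; otherwise the one with the higher frequency comes first. Finally, we keep only the first k strings of the sequence.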

import java.util.*;

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        // Count how often each word occurs.
        Map<String, Integer> cnt = new HashMap<String, Integer>();
        for (String word : words) {
            cnt.put(word, cnt.getOrDefault(word, 0) + 1);
        }
        // Collect the distinct words.
        List<String> rec = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : cnt.entrySet()) {
            rec.add(entry.getKey());
        }
        // Sort by count descending, then alphabetically.
        Collections.sort(rec, new Comparator<String>() {
            public int compare(String word1, String word2) {
                // Unbox first: == on Integer objects compares references, not values.
                int c1 = cnt.get(word1), c2 = cnt.get(word2);
                return c1 == c2 ? word1.compareTo(word2) : c2 - c1;
            }
        });
        return rec.subList(0, k);
    }
}

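Complexity analysis

Time complexity: O(l × n + l × m log m), where n is the length of the given string sequence, l is the average string length, and m is the number of distinct strings. We need l × n time to insert the strings into the hash table, and l × m log m time for the string comparisons during sorting (in the worst case all strings have the same frequency, so every comparison has to compare the two strings themselves).
Space complexity: O(l × m), where l is the average string length and m is the number of distinct strings. The hash table and the generated sorted array each occupy O(l × m) space.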
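
One detail is worth checking whenever a comparator works with boxed counts taken from a `Map<String, Integer>`: `==` on two `Integer` objects compares references, not values. Java caches boxed values from -128 to 127, so small counts appear to compare correctly, while larger ones may not. The following is a minimal sketch of value-safe alternatives; `BoxedCountSketch`, `sameCount` and `compareCountDesc` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: all names here are assumptions.
class BoxedCountSketch {
    // Value-based equality check for two boxed counts.
    static boolean sameCount(Map<String, Integer> cnt, String w1, String w2) {
        return cnt.get(w1).equals(cnt.get(w2));
    }

    // Descending comparison by count: negative when w1 has the higher count.
    static int compareCountDesc(Map<String, Integer> cnt, String w1, String w2) {
        return Integer.compare(cnt.get(w2), cnt.get(w1));
    }

    public static void main(String[] args) {
        Map<String, Integer> cnt = new HashMap<>();
        cnt.put("a", 500); // counts above 127 fall outside the guaranteed Integer cache
        cnt.put("b", 500);
        cnt.put("c", 3);
        System.out.println(sameCount(cnt, "a", "b"));        // true
        System.out.println(compareCountDesc(cnt, "a", "c")); // -1, so "a" sorts before "c"
    }
}
```

Unboxing into `int` locals before comparing is an equally valid choice; the point is only that the comparison should be on values.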
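Method 2: Priority queue
Idea and algorithm

For problems that ask for the k largest or k smallest items, there is a general technique: a priority queue. A priority queue supports inserting or removing an element in O(log n) time (where n is the size of the priority queue) and querying its top element in O(1) time.
In this problem, we can create a min-heap priority queue (as the name suggests, one whose top element is the smallest). Here "smallest" means lowest frequency, and among words with equal frequency, the one that is later in alphabetical order. We insert each string into the priority queue, and whenever its size exceeds k we pop the top element. The k elements left in the priority queue at the end are the k most frequent words.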

import java.util.*;

class Solution {
    public List<String> topKFrequent(String[] words, int k) {
        // Count how often each word occurs.
        Map<String, Integer> cnt = new HashMap<String, Integer>();
        for (String word : words) {
            cnt.put(word, cnt.getOrDefault(word, 0) + 1);
        }
        // Min-heap: lowest count on top; among equal counts, the alphabetically later word on top.
        PriorityQueue<Map.Entry<String, Integer>> pq = new PriorityQueue<Map.Entry<String, Integer>>(new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> entry1, Map.Entry<String, Integer> entry2) {
                // Unbox first: == on Integer objects compares references, not values.
                int c1 = entry1.getValue(), c2 = entry2.getValue();
                return c1 == c2 ? entry2.getKey().compareTo(entry1.getKey()) : c1 - c2;
            }
        });
        // Keep at most k entries in the heap, evicting the current minimum.
        for (Map.Entry<String, Integer> entry : cnt.entrySet()) {
            pq.offer(entry);
            if (pq.size() > k) {
                pq.poll();
            }
        }
        // Polling yields the entries from least to most preferred, so reverse at the end.
        List<String> ret = new ArrayList<String>();
        while (!pq.isEmpty()) {
            ret.add(pq.poll().getKey());
        }
        Collections.reverse(ret);
        return ret;
    }
}

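Complexity analysis

Time complexity: O(l × n + m × l log k), where n is the length of the given string sequence, m is the number of distinct strings, and l is the average string length. We need l × n time to insert the strings into the hash table, and each insertion into the priority queue takes l log k time, with m insertions in total.
Space complexity: O(l × (m + k)), where l is the average string length and m is the number of distinct strings. The hash table occupies O(l × m) space and the priority queue occupies O(l × k) space.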
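
The bounded min-heap pattern used in Method 2 is not specific to words. As a minimal sketch of the same idea on plain integers (the class `TopKNumbersSketch` and the method `topK` are illustrative names, not taken from the official solution, and the code assumes 1 ≤ k ≤ nums.length):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only: all names here are assumptions.
class TopKNumbersSketch {
    // Returns the k largest values in descending order.
    static List<Integer> topK(int[] nums, int k) {
        // Natural ordering makes this a min-heap: the smallest value sits on top.
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int x : nums) {
            minHeap.offer(x);
            if (minHeap.size() > k) {
                minHeap.poll(); // evict the smallest, so only the k largest remain
            }
        }
        // Polling returns values in ascending order; reverse for descending.
        List<Integer> result = new ArrayList<>();
        while (!minHeap.isEmpty()) {
            result.add(minHeap.poll());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topK(new int[]{5, 1, 9, 3, 7, 9}, 3)); // [9, 9, 7]
    }
}
```

Because the heap never holds more than k + 1 elements, each insertion costs O(log k) rather than O(log n), which is where the log k factor in the analysis comes from.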
