Leetcode347. Top K Frequent Elements

magic_jiayu

Published 2020-02-24 16:02:30

Article link: https://blog.csdn.net/magic_jiayu/article/details/104477788

Given a non-empty array of integers, return the k most frequent elements.

Example 1:

Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]

Example 2:

Input: nums = [1], k = 1
Output: [1]

Note:

You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
Your algorithm’s time complexity must be better than O(n log n), where n is the array’s size.

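Solution 1: Min-heap

Use a hash map to build the mapping from each number to its occurrence count; one pass over the array collects the frequencies.
Maintain a min-heap that holds at most k elements.
Compare each new element with the heap top (the least frequent element in the heap).
If the new element's frequency is greater than that of the heap top, pop the top and add the new element to the heap.
At the end, the k elements left in the heap are the k most frequent elements.

Time complexity: O(n log k)
Space complexity: O(n)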

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class Solution {
    public List<Integer> topKFrequent(int[] nums, int k) {
        // Count occurrences with a map: key = element, value = number of times it appears
        Map<Integer, Integer> map = new HashMap<>();
        for (int num : nums) {
            map.put(num, map.getOrDefault(num, 0) + 1);
        }
        // Walk the map, keeping the k most frequent elements in a min-heap ordered by frequency
        PriorityQueue<Integer> pq =
                new PriorityQueue<>((a, b) -> Integer.compare(map.get(a), map.get(b)));
        for (Integer key : map.keySet()) {
            if (pq.size() < k) {
                pq.add(key);
            } else if (map.get(key) > map.get(pq.peek())) {
                pq.remove();
                pq.add(key);
            }
        }
        // Drain the min-heap; elements come out from least to most frequent
        List<Integer> res = new ArrayList<>();
        while (!pq.isEmpty()) {
            res.add(pq.remove());
        }
        return res;
    }
}
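
The steps of Solution 1 can also be sketched in Python. This is a minimal sketch rather than the author's code: it assumes the standard-library `heapq` module as the min-heap and `collections.Counter` for the frequency map, and the function name `top_k_frequent_heap` is made up for illustration. Unlike the Java version, it sorts the final k entries so the most frequent value comes first.

```python
import heapq
from collections import Counter
from typing import List, Tuple


def top_k_frequent_heap(nums: List[int], k: int) -> List[int]:
    """Return the k most frequent values using a min-heap of at most k entries."""
    freq = Counter(nums)  # value -> occurrence count
    heap: List[Tuple[int, int]] = []  # (count, value); smallest count sits at heap[0]
    for value, count in freq.items():
        if len(heap) < k:
            heapq.heappush(heap, (count, value))
        elif count > heap[0][0]:
            # Pop the least frequent entry and push the new one in a single O(log k) step.
            heapq.heapreplace(heap, (count, value))
    # Sort the k survivors by count, highest first.
    return [value for count, value in sorted(heap, reverse=True)]


print(top_k_frequent_heap([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```

Storing `(count, value)` tuples lets `heapq` order entries by count without a custom comparator; ties in count fall back to comparing the values themselves.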

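Solution 2: Bucket sort

Count the frequencies with a hash map. Then create an array indexed by frequency and store each number in the bucket at the index equal to its occurrence count, so numbers with different frequencies land in different buckets.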

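// Solving "Top K Frequent Elements" with bucket sort
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Solution {
    public List<Integer> topKFrequent(int[] nums, int k) {
        List<Integer> res = new ArrayList<>();
        // Count occurrences with a map: key = element, value = number of times it appears
        Map<Integer, Integer> map = new HashMap<>();
        for (int num : nums) {
            map.put(num, map.getOrDefault(num, 0) + 1);
        }

        // Bucket sort:
        // use the frequency as the array index and put each number into the bucket for its frequency
        @SuppressWarnings("unchecked")
        List<Integer>[] list = new List[nums.length + 1];
        for (int key : map.keySet()) {
            // The occurrence count is the bucket index
            int i = map.get(key);
            if (list[i] == null) {
                list[i] = new ArrayList<>();
            }
            list[i].add(key);
        }

        // Walk the buckets from the highest frequency down.
        // A whole bucket is added at once, so exactly k elements come back only when there is no tie at the cut-off.
        for (int i = list.length - 1; i >= 0 && res.size() < k; i--) {
            if (list[i] == null) continue;
            res.addAll(list[i]);
        }
        return res;
    }
}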

Time complexity: O(n)
Space complexity: O(n)
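
For comparison, here is a rough Python sketch of the same bucket idea. It is an illustration under assumptions, not the author's code: the name `top_k_frequent_buckets` is hypothetical, and, unlike the Java version above, it stops as soon as exactly k values have been collected instead of appending a whole bucket.

```python
from collections import Counter
from typing import List


def top_k_frequent_buckets(nums: List[int], k: int) -> List[int]:
    """Return k most frequent values by bucketing values on their count."""
    freq = Counter(nums)  # value -> occurrence count
    # buckets[c] holds every value that occurs exactly c times (1 <= c <= len(nums)).
    buckets: List[List[int]] = [[] for _ in range(len(nums) + 1)]
    for value, count in freq.items():
        buckets[count].append(value)

    result: List[int] = []
    # Scan from the highest possible count down to 1.
    for count in range(len(nums), 0, -1):
        for value in buckets[count]:
            result.append(value)
            if len(result) == k:
                return result
    return result


print(top_k_frequent_buckets([1, 1, 1, 2, 2, 3], 2))  # [1, 2]
```

No count can exceed `len(nums)`, which is why `len(nums) + 1` buckets are enough and the whole pass stays linear.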

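Implementing a heap / priority queue in Python

Heaps are well suited to top-K and quantile queries over massive data sets, and priority queues apply wherever elements must be ordered by priority. Compared with the O(n log n) cost of comparison-based sorting, a heap or priority queue brings the cost down to O(n log k). When the total data size n is large and the maintained size k is small, the saving is significant.

A heap (and a priority queue built on one) is essentially a complete binary tree, with the following important properties:

The parent of the node at index i is at index (i-1) // 2
Its left child is at index 2*i + 1
Its right child is at index 2*i + 2
In a max-heap every parent is greater than its children; in a min-heap every parent is smaller than its children
A priority queue uses the priority as the heap's ordering key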
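
The index properties above can be checked with a few lines of Python. The helper names (`parent`, `left`, `right`, `is_max_heap`, `is_min_heap`) are made up for this sketch, and it assumes the tree is stored level by level in a plain list. The checks use `>=` and `<=` so that duplicate values are allowed.

```python
def parent(i: int) -> int:
    """Index of the parent of node i (valid for i > 0)."""
    return (i - 1) // 2


def left(i: int) -> int:
    """Index of the left child of node i."""
    return 2 * i + 1


def right(i: int) -> int:
    """Index of the right child of node i."""
    return 2 * i + 2


def is_max_heap(arr: list) -> bool:
    """True if every parent is at least as large as each of its children."""
    return all(arr[parent(i)] >= arr[i] for i in range(1, len(arr)))


def is_min_heap(arr: list) -> bool:
    """True if every parent is at most as large as each of its children."""
    return all(arr[parent(i)] <= arr[i] for i in range(1, len(arr)))


print(is_max_heap([9, 7, 8, 3, 5, 6]))  # True
print(is_min_heap([1, 3, 2, 7, 4]))     # True
print(is_max_heap([1, 3, 2]))           # False
```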

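A heap or priority queue has two key operations, each with time complexity O(log k). Taking a max-heap as the example:

Sift up: when a new element is added to the heap (heap size + 1), compare it with its parent repeatedly while moving upward, swapping whenever it is greater than the parent.
Sift down: used when an element is taken out of the heap (heap size - 1, as in heapsort) or when an element in the heap is replaced (as in this problem). The element moves downward, swapping with its larger child, until every parent is again greater than its children. To build a heap from an unordered array, apply sift down to each index in reverse order, from (k-1) // 2 down to 0.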
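
The two operations can be sketched for a max-heap of plain numbers as follows. This is a simplified illustration using swaps; the helper names `push`, `pop_max`, and `heapify` are assumptions added here and are not part of the original post.

```python
def sift_up(heap: list, i: int) -> None:
    """Move heap[i] up while it is larger than its parent (max-heap)."""
    while i > 0:
        p = (i - 1) // 2
        if heap[p] >= heap[i]:
            break
        heap[p], heap[i] = heap[i], heap[p]
        i = p


def sift_down(heap: list, i: int, size: int) -> None:
    """Move heap[i] down within heap[:size] while a child is larger (max-heap)."""
    while 2 * i + 1 < size:
        child = 2 * i + 1
        # Pick the larger of the two children.
        if child + 1 < size and heap[child + 1] > heap[child]:
            child += 1
        if heap[i] >= heap[child]:
            break
        heap[i], heap[child] = heap[child], heap[i]
        i = child


def push(heap: list, value) -> None:
    """Append at the end, then restore the heap property by sifting up."""
    heap.append(value)
    sift_up(heap, len(heap) - 1)


def pop_max(heap: list):
    """Remove and return the largest element."""
    top = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        sift_down(heap, 0, len(heap))
    return top


def heapify(arr: list) -> None:
    """Build a max-heap in place by sifting down from (k-1)//2 to 0."""
    for i in range((len(arr) - 1) // 2, -1, -1):
        sift_down(arr, i, len(arr))


h = []
for x in [3, 1, 4, 1, 5]:
    push(h, x)
print([pop_max(h) for _ in range(5)])  # [5, 4, 3, 1, 1]
```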

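For top-k problems: use a max-heap to find the k smallest, and a min-heap to find the k largest.

k smallest: build a max-heap of k numbers; when an incoming number is smaller than the root, replace the root and restore the max-heap.
k largest: build a min-heap of k numbers; when an incoming number is larger than the root, replace the root and restore the min-heap.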
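
Both rules can be tried out with Python's standard `heapq` module. This is a sketch under assumptions: `heapq` only provides a min-heap, so the max-heap for the k smallest is simulated here by storing negated values, and the function names `k_largest` and `k_smallest` are hypothetical.

```python
import heapq
from typing import Iterable, List


def k_largest(stream: Iterable[int], k: int) -> List[int]:
    """Keep the k largest values seen so far in a min-heap of size k."""
    heap: List[int] = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest kept value: replace the root.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)


def k_smallest(stream: Iterable[int], k: int) -> List[int]:
    """Keep the k smallest values using a max-heap simulated with negated values."""
    heap: List[int] = []  # stores -x, so -heap[0] is the largest kept value
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, -x)
        elif x < -heap[0]:
            # x is smaller than the largest kept value: replace the root.
            heapq.heapreplace(heap, -x)
    return sorted(-v for v in heap)


data = [5, 1, 9, 3, 7, 2, 8]
print(k_largest(data, 3))   # [9, 8, 7]
print(k_smallest(data, 3))  # [1, 2, 3]
```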

from typing import List


class Solution:
    def topKFrequent(self, nums: List[int], k: int) -> List[int]:
        # Count frequencies with a hash map
        freq_count = {}
        for num in nums:
            if num in freq_count:
                freq_count[num] += 1
            else:
                freq_count[num] = 1

        def sift_up(arr, k):
            """Sift the last element up. O(log k), where k is the heap size."""
            new_index, new_val = k-1, arr[k-1]
            while (new_index > 0 and arr[(new_index-1)//2][1] > new_val[1]):
                arr[new_index] = arr[(new_index-1)//2]
                new_index = (new_index-1)//2
            arr[new_index] = new_val  # shift-and-assign, as in insertion sort, instead of swapping

        def sift_down(arr, root, k):
            """O(log k). Left child 2*root+1, right child 2*root+2, parent (child-1)//2."""
            root_val = arr[root]
            while (2*root+1 < k):
                # Start with the left child
                child = 2 * root + 1
                # Min-heap uses >, max-heap uses <
                if child+1 < k and arr[child][1] > arr[child+1][1]:
                    child += 1
                if root_val[1] > arr[child][1]:
                    arr[root] = arr[child]
                    root = child  # keep checking further down
                else:
                    break  # order already holds here, no need to check the nodes below
            arr[root] = root_val

        # Build a heap of size k by inserting one item at a time.
        # Each insert costs O(log k), so this phase is at most O(k log k).
        freq_list = list(freq_count.items())
        min_heap = []
        for i in range(k):
            min_heap.append(freq_list[i])
            sift_up(min_heap, i+1)

        # Walk the remaining items: one whose frequency exceeds the heap top
        # replaces the top, then sifts down to restore the min-heap
        for item in freq_list[k:]:
            priority = item[1]
            if priority > min_heap[0][1]:
                min_heap[0] = item
                sift_down(min_heap, 0, k)

        return [item[0] for item in min_heap]

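Heapsort:

def heapSort(arr):
    def sift_down(arr, root, k):
        root_val = arr[root]  # shift-and-assign, as in insertion sort
        # Keep going until the moved value no longer disturbs the nodes below
        while (2*root+1 < k):
            # Compare the root with its left and right children
            child = 2 * root + 1  # left = 2 * i + 1, right = 2 * i + 2
            if child+1 < k and arr[child] < arr[child+1]:  # right child exists and is greater than the left
                child += 1
            if root_val < arr[child]:
                arr[root] = arr[child]
                root = child
            else: break  # already in order, no need to check the nodes below
        arr[root] = root_val

    k = len(arr)  # k is the heap size
    # Build the max-heap: starting from the last parent nodes and moving back to the root, sift each element down
    for i in range((k-1)//2, -1, -1):
        sift_down(arr, i, k)
    # From the tail, swap each position with the root and rebuild the max-heap with size - 1,
    # which moves the current maximum to the end each time (sorts arr in place, ascending)
    for i in range(k - 1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i]  # swap
        sift_down(arr, 0, i)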