LeetCode 295. Find Median from Data Stream

Problem

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, so the median is the mean of the two middle values.

For example,

[2,3,4], the median is 3

[2,3], the median is (2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

  • void addNum(int num) - Add an integer number from the data stream to the data structure.
  • double findMedian() - Return the median of all elements so far.

 

Example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3) 
findMedian() -> 2

 

Follow up:

  1. If all integer numbers from the stream are between 0 and 100, how would you optimize it?
  2. If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?

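This is a classic problem. I ran into it back when I was preparing for interviews in China, yet I am only solving it now :) The task is to find the median of a data stream. For anything that depends on relative order, the first idea that comes to mind is sorting!

Sorting costs O(nlogn) on every findMedian call and is easy to write, but there is one nasty pitfall: if the stream is stored as int, then averaging the two middle numbers by dividing by 2 directly still yields an int! Multiplying by 0.5 instead converts the result to double. The code is below; unfortunately it timed out: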

class MedianFinder {
vector<int> nums;
public:
    /** initialize your data structure here. */
    MedianFinder() {
    }
    
    void addNum(int num) {
        nums.push_back(num);
    }
    
    double findMedian() {
        sort(nums.begin(), nums.end());
        int n = nums.size();
        return n % 2 == 0 ? (nums[n / 2 - 1] + nums[n / 2]) * 0.5 : nums[n / 2];
    }
};

/**
 * Your MedianFinder object will be instantiated and called as such:
 * MedianFinder* obj = new MedianFinder();
 * obj->addNum(num);
 * double param_2 = obj->findMedian();
 */
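
To make the int-versus-double pitfall concrete, here is a minimal sketch. The two helper names are hypothetical and exist only for this illustration; both assume the vector is already sorted and has an even, non-zero size.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: averages the two middle elements with integer division.
// The division happens in int, so the fractional part is lost before the
// result is converted to double.
double middleAverageTruncated(const std::vector<int>& sorted) {
    const int n = static_cast<int>(sorted.size());
    assert(n >= 2 && n % 2 == 0);  // precondition assumed by this sketch
    return (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
}

// Hypothetical helper: multiplying by 0.5 promotes the int sum to double,
// so the fractional part survives.
double middleAverage(const std::vector<int>& sorted) {
    const int n = static_cast<int>(sorted.size());
    assert(n >= 2 && n % 2 == 0);  // precondition assumed by this sketch
    return (sorted[n / 2 - 1] + sorted[n / 2]) * 0.5;
}
```

For the list [2,3], the first helper returns 2 and the second returns 2.5.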

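Another approach borrows the idea of insertion sort: put each element into its sorted position at insertion time, which costs roughly O(n) per insertion, so the median can then be read in O(1). The implementation is fairly simple, but I forgot that vector has an insert operation and hand-wrote code that finds the element's position and shifts everything after it back by one slot. That took a while and had plenty of bugs, and only after reading the solution did I see that insert does the job directly. For the record, here is how insert is used: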

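v.insert(v.begin(), 8);     // insert a new element at the front
v.insert(v.begin() + 2, 1); // insert a new element before the element at index 2 (the third element)
v.insert(v.end(), 3);       // append a new element at the end of the vector
v.insert(v.end(), 4, 1);    // insert four 1s at the end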

(reference: https://blog.csdn.net/u010002184/article/details/77676638)

For this problem that becomes: nums.insert(lower_bound(nums.begin(), nums.end(), num), num)
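
Put into a complete class, that one-liner might look like the sketch below. The class name SortedMedianFinder is made up for this illustration, and findMedian assumes at least one number has been added.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Sketch: keep the vector sorted at all times so the median is a direct lookup.
class SortedMedianFinder {
    std::vector<int> nums;  // always in ascending order

public:
    void addNum(int num) {
        // lower_bound does a binary search for the first element >= num;
        // insert then shifts the tail by one slot, which is the O(n) part.
        nums.insert(std::lower_bound(nums.begin(), nums.end(), num), num);
    }

    double findMedian() const {
        assert(!nums.empty());  // assumption: called only after at least one addNum
        const auto n = nums.size();
        return n % 2 == 0 ? (nums[n / 2 - 1] + nums[n / 2]) * 0.5 : nums[n / 2];
    }
};
```

This does the same work as a hand-written shift loop, just through the standard library.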

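My first version was quite inelegant: 652 ms (5.08%), 42.4 MB (100.00%). It is kept in the comment below for your amusement. After rethinking the logic I found a more concise way to write it, yet the concise version did even worse: 700 ms (5.08%), 42.5 MB (95.65%):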

class MedianFinder {
vector<int> nums;
public:
    /** initialize your data structure here. */
    MedianFinder() {
    }
    
    void addNum(int num) {
        nums.push_back(num);
        if (nums.size()) {
            for (int i = 0; i < nums.size(); i++) {
                if (nums[i] > num) {
                    for (int j = nums.size() - 1; j > i; j--) {
                        nums[j] = nums[j - 1];
                    }
                    nums[i] = num;
                    break;
                }
            }
        }
        /*
        if (nums.size() == 0) {
            nums.push_back(num);
        }
        else {
            bool flag = true;
            for (int i = 0; i < nums.size(); i++) {
                if (nums[i] > num) {
                    nums.push_back(1);
                    for (int j = nums.size() - 1; j > i; j--) {
                        nums[j] = nums[j - 1];
                    }
                    nums[i] = num;
                    flag = false;
                    break;
                }
            }
            if (flag)
                nums.push_back(num);
        }
        */
    }
    
    double findMedian() {
        int n = nums.size();
        return n % 2 == 0 ? (nums[n / 2 - 1] + nums[n / 2]) * 0.5 : nums[n / 2];
    }
};

/**
 * Your MedianFinder object will be instantiated and called as such:
 * MedianFinder* obj = new MedianFinder();
 * obj->addNum(num);
 * double param_2 = obj->findMedian();
 */

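The third method is the best one: it makes clever use of a max heap and a min heap. Since we want the median, imagine the numbers sorted in ascending order and split them into two heaps. The smaller half goes into a max heap, called small_heap, so its top is the largest number of the smaller half. The larger half goes into a min heap, called big_heap, so its top is the smallest number of the larger half. As long as small_heap holds either the same number of elements as big_heap or exactly one more, we know the answer: when the total count is odd, return the top of small_heap; when it is even, return the average of the two heap tops.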

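The idea is simple, but my first attempt at the code had a bug. My plan was to push everything into small blindly and, whenever small's size exceeded big's by 2, pop the largest element of small into big. The problem: if a number larger than big's top is added to small and small's size then ends up exactly 1 larger than big's, no rebalancing is triggered, so the number stays in small although it belongs in big. To prevent this I rewrote the part that places the number.

Later I saw a more concise method: push the number into small first, then pop small's top into big. At that point small has the same number of elements as before the insertion and big has one more. Since the rule is that small and big are the same size or small has one more, big may now hold one more element than small (this happens when the two sizes were equal before the insertion). In that case pop big's top into small, which restores the state where small has one more than big.

Because heaps are used, insertion is O(logn) and lookup is still O(1). Runtime 160 ms (70.12%), memory 42.5 MB (82.61%). With the concise method: 172 ms (40.73%), 42.3 MB (100%). I would not have expected the times to differ this much... not sure why.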

class MedianFinder {
priority_queue<int, vector<int>, greater<int>> big_heap;  // min heap to store big numbers
priority_queue<int> small_heap; // max heap to store small numbers
public:
    /** initialize your data structure here. */
    MedianFinder() {
    }
    
    void addNum(int num) {
        // if the num is larger than the smallest one in big_heap
        // push it to big_heap and pop the top of big_heap to small_heap
        if (!big_heap.empty() && num > big_heap.top()) {
            int big_top = big_heap.top();
            big_heap.pop();
            small_heap.push(big_top);
            big_heap.push(num);
        }
        // else push to small_heap
        else {
            small_heap.push(num);
        }
        // adjust the size to meet the requirement
        if (small_heap.size() == big_heap.size() + 2) {  // written as an addition so the unsigned sizes cannot wrap around
            int small_top = small_heap.top();
            small_heap.pop();
            big_heap.push(small_top);
        }
    }
    
    double findMedian() {
        int n = big_heap.size() + small_heap.size();
        return n % 2 == 0 ? (big_heap.top() + small_heap.top()) * 0.5 : small_heap.top();
    }
};

/**
 * Your MedianFinder object will be instantiated and called as such:
 * MedianFinder* obj = new MedianFinder();
 * obj->addNum(num);
 * double param_2 = obj->findMedian();
 */
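
The more concise insertion order described above (push into small, move small's top to big, then rebalance if big got too large) can be sketched as follows. This is my own reconstruction of that idea, not the exact code from Solutions, and the class name TwoHeapMedianFinder is invented for the example.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

class TwoHeapMedianFinder {
    std::priority_queue<int> small_heap;                                     // max heap, smaller half
    std::priority_queue<int, std::vector<int>, std::greater<int>> big_heap;  // min heap, larger half

public:
    void addNum(int num) {
        // Step 1: route the number through small_heap, so that whatever
        // moves on to big_heap is the largest value of the smaller half.
        small_heap.push(num);
        big_heap.push(small_heap.top());
        small_heap.pop();
        // Step 2: small_heap must hold as many elements as big_heap or one more.
        if (big_heap.size() > small_heap.size()) {
            small_heap.push(big_heap.top());
            big_heap.pop();
        }
    }

    double findMedian() const {
        assert(!small_heap.empty());  // assumption: at least one number was added
        if (small_heap.size() > big_heap.size()) {
            return small_heap.top();  // odd count: the middle element
        }
        return (small_heap.top() + big_heap.top()) * 0.5;  // even count
    }
};
```

Because every number passes through small_heap before anything reaches big_heap, a value larger than big's top can never get stuck in small, which is exactly the bug from my first attempt.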

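There is yet another approach, based on a self-balancing binary search tree such as an AVL tree. Solutions motivates it by noting that in a balanced tree the median sits at or near the root, although the code below does not depend on that. In C++ the ready-made container is multiset (usually implemented as a red-black tree rather than an AVL tree), which I had heard of but hardly used, so it finally came in handy. The code keeps two iterators, lo_median and hi_median, since the count may be even, and moves them on every insertion. Because I am not familiar with multiset, and the mainstream solution to this problem still seems to be the max/min heap pair, I have not written this one myself for now; here is the code pasted from Solutions: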

class MedianFinder {
    multiset<int> data;
    multiset<int>::iterator lo_median, hi_median;

public:
    MedianFinder()
        : lo_median(data.end())
        , hi_median(data.end())
    {
    }

    void addNum(int num)
    {
        const size_t n = data.size();   // store previous size

        data.insert(num);               // insert into multiset

        if (!n) {
            // no elements before, one element now
            lo_median = hi_median = data.begin();
        }
        else if (n & 1) {
            // odd size before (i.e. lo == hi), even size now (i.e. hi = lo + 1)

            if (num < *lo_median)       // num < lo
                lo_median--;
            else                        // num >= hi
                hi_median++;            // insertion at end of equal range
        }
        else {
            // even size before (i.e. hi = lo + 1), odd size now (i.e. lo == hi)

            if (num > *lo_median && num < *hi_median) {
                lo_median++;                    // num in between lo and hi
                hi_median--;
            }
            else if (num >= *hi_median)         // num inserted after hi
                lo_median++;
            else                                // num <= lo < hi
                lo_median = --hi_median;        // insertion at end of equal range spoils lo
        }
    }

    double findMedian()
    {
        return (*lo_median + *hi_median) * 0.5;
    }
};

Solutions also discusses the follow-ups further down. I skimmed them but did not really understand them, so I made a strategic retreat.
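
For follow-up 1, one common idea is to count occurrences instead of storing the numbers. The sketch below is not taken from Solutions; it assumes every number is an integer in [0, 100], and the class name BucketMedianFinder and the helper kth are hypothetical.

```cpp
#include <array>
#include <cassert>

class BucketMedianFinder {
    std::array<int, 101> count{};  // count[v] = how many times value v was added
    int total = 0;                 // number of values added so far

    // Value of the k-th smallest element (0-based), found by scanning the buckets.
    int kth(int k) const {
        int seen = 0;
        for (int v = 0; v <= 100; ++v) {
            seen += count[v];
            if (seen > k) return v;
        }
        return 100;  // not reached when 0 <= k < total
    }

public:
    void addNum(int num) {
        assert(num >= 0 && num <= 100);  // assumption from follow-up 1
        ++count[num];
        ++total;
    }

    double findMedian() const {
        assert(total > 0);  // assumption: at least one number was added
        if (total % 2 == 1) return kth(total / 2);
        return (kth(total / 2 - 1) + kth(total / 2)) * 0.5;
    }
};
```

Both operations touch at most 101 buckets, so their cost does not grow with the length of the stream.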

Solutions link: https://leetcode.com/problems/find-median-from-data-stream/solution/


2020.5.7: adding a Java version, after finishing the last lab of 683 last week. I nearly failed to finish that lab, sigh.

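First, the sorted-list method: binary search is meant to find the insertion position in O(logn) comparisons (the insertion itself is still O(n) in an array-backed list) and lookup is O(1). Writing the binary insertion was also a chance to review binary search, see: https://blog.csdn.net/qq_37333947/article/details/86662855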

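Surprisingly, it timed out after submitting, even though the C++ insertion version above did not, orz. A likely reason is that data is a LinkedList, whose get(index) has to walk the list, so every step of the binary search costs O(n) instead of O(1). The code: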

class MedianFinder {
    private List<Integer> data;

    /** initialize your data structure here. */
    public MedianFinder() {
        data = new LinkedList<>();
    }
    
    public void addNum(int num) {
        int index = findIndex(num);
        data.add(index, num);
    }
    
    public double findMedian() {
        if (data.size() % 2 == 0) {
            int index1 = data.size() / 2 - 1;
            int index2 = data.size() / 2;
            return 0.5 * (data.get(index1) + data.get(index2));
        } else {
            return 1.0 * data.get(data.size() / 2);
        }
    }
    
    private int findIndex(int num) {
        int low = 0;
        int high = data.size() - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;
            if (data.get(mid) >= num) {
                high = mid - 1;
            } else {
                low = mid + 1;
            }
        }
        return low;
    }
}

/**
 * Your MedianFinder object will be instantiated and called as such:
 * MedianFinder obj = new MedianFinder();
 * obj.addNum(num);
 * double param_2 = obj.findMedian();
 */

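Below is the heap version. My approach this time differs from the one I saw in Solutions last time: here I first decide which heap the new number belongs to and then check whether the sizes need adjusting, whereas last time the number was pushed into small unconditionally, small's max was moved to large, and then the sizes were adjusted if needed. This time's version feels easier to understand, but last time's is more concise: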

class MedianFinder {
    private PriorityQueue<Integer> small;
    private PriorityQueue<Integer> large;

    /** initialize your data structure here. */
    public MedianFinder() {
        small = new PriorityQueue<>(Collections.reverseOrder());
        large = new PriorityQueue<>();
    }
    
    public void addNum(int num) {
        if (small.size() == 0) {
            small.add(num);
            return;
        }
        if (num < small.peek()) {
            small.add(num);
            if (small.size() - large.size() > 1) {
                large.add(small.poll());
            }
        } else {
            large.add(num);
            if (large.size() > small.size()) {
                small.add(large.poll());
            }
        }
    }
    
    public double findMedian() {
        if (small.size() == large.size()) {
            return 0.5 * (small.peek() + large.peek());
        } else {
            return small.peek();
        }
    }
}

/**
 * Your MedianFinder object will be instantiated and called as such:
 * MedianFinder obj = new MedianFinder();
 * obj.addNum(num);
 * double param_2 = obj.findMedian();
 */

Runtime: 45 ms, faster than 76.80% of Java online submissions for Find Median from Data Stream.

Memory Usage: 50.8 MB, less than 100.00% of Java online submissions for Find Median from Data Stream.
