The median is the middle value in an ordered integer list. If the size of the list is even, there is no single middle value, so the median is the mean of the two middle values.
For example,
[2,3,4], the median is 3
[2,3], the median is (2 + 3) / 2 = 2.5
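The definition above can be sketched for a plain list as follows. This only illustrates the definition; it is not the streaming data structure the problem asks for, and the helper name `median_of` is made up for this example.

```python
def median_of(nums):
    """Return the median of a non-empty list of integers."""
    ordered = sorted(nums)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd size: a single middle value exists.
        return float(ordered[mid])
    # Even size: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([2, 3, 4]))  # 3.0
print(median_of([2, 3]))     # 2.5
```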
Design a data structure that supports the following two operations:
- void addNum(int num) - Add an integer number from the data stream to the data structure.
- double findMedian() - Return the median of all elements so far.
Example:
addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3)
findMedian() -> 2
Follow up:
- If all integer numbers from the stream are between 0 and 100, how would you optimize it?
- If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?
--------------------------------------------------------------------------
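Using heaps is an easy idea to come up with. The trouble is that the two heap tops and the new number can be ordered in 6 ways, and deciding which heap the number belongs in adds two more cases, so writing it out directly produces a pile of if/else.
So instead, push the number into one heap first and then decide whether an adjustment is needed; the code becomes much more concise: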
import heapq

class MedianFinder:

    def __init__(self):
        """
        initialize your data structure here.
        """
        self.mi_heap = []  # min-heap holding the larger half
        self.ma_heap = []  # max-heap holding the smaller half, stored as negated values

    def addNum(self, num: int) -> None:
        # Push into the max-heap first, then move its largest element to the min-heap.
        heapq.heappush(self.ma_heap, -num)
        top = -heapq.heappop(self.ma_heap)
        heapq.heappush(self.mi_heap, top)
        # Rebalance so that ma_heap never holds fewer elements than mi_heap.
        if len(self.mi_heap) > len(self.ma_heap):
            mi_top = heapq.heappop(self.mi_heap)
            heapq.heappush(self.ma_heap, -mi_top)

    def findMedian(self) -> float:
        l1, l2 = len(self.ma_heap), len(self.mi_heap)
        if l1 == 0:
            return None  # no numbers added yet
        if l1 == l2:
            # ma_heap[0] is negated, so subtracting it adds the two middle values.
            return (self.mi_heap[0] - self.ma_heap[0]) / 2
        return -self.ma_heap[0]

# Your MedianFinder object will be instantiated and called as such:
# obj = MedianFinder()
# obj.addNum(num)
# param_2 = obj.findMedian()
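To see the "push first, then rebalance" idea at work, here is a small trace of the example sequence 1, 2, 3. It is a sketch that restates the same strategy as two free functions so that it runs on its own; the names `add_num`, `median`, `lower` and `upper` are chosen for this illustration and are not part of the solution above.

```python
import heapq


def add_num(lower, upper, num):
    """Insert num so that len(lower) is len(upper) or len(upper) + 1.

    lower is a max-heap stored as negated values; upper is a min-heap.
    """
    heapq.heappush(lower, -num)
    # Move the largest element of the lower half to the upper half.
    heapq.heappush(upper, -heapq.heappop(lower))
    # If the upper half became larger, move its smallest element back.
    if len(upper) > len(lower):
        heapq.heappush(lower, -heapq.heappop(upper))


def median(lower, upper):
    """Median of a non-empty pair of heaps built by add_num."""
    if len(lower) == len(upper):
        return (upper[0] - lower[0]) / 2
    return float(-lower[0])


lower, upper = [], []
for num in (1, 2, 3):
    add_num(lower, upper, num)
    # Show the lower half with signs restored, the upper half, and the median.
    print(num, [-x for x in lower], upper, median(lower, upper))
```

The medians printed after each insertion are 1.0, 1.5 and 2.0, matching the example in the problem statement.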
Copied from the discussion about the follow-ups:
Followup #1 - If all integer numbers from the stream are between 0 and 100, how would you optimize it?
- Create 101 buckets (one per value from 0 to 100 inclusive) using an array of size 101.
- Store the numbers into these buckets.
- Find median by looping through this array.
- Time Complexity
  - addNum() is O(1)
  - findMedian() is O(1) since array has fixed size.
- Space Complexity: O(1) since array has fixed size.
- Sample problem: https://leetcode.com/problems/statistics-from-a-large-sample/description/
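A minimal sketch of the bucket approach, assuming every value lies in the inclusive range 0 to 100 (so 101 counters are used). The class name `BucketMedianFinder` and the helper `_kth` are hypothetical names introduced for this example.

```python
class BucketMedianFinder:
    """Counting-array median for values known to lie in [lo, hi]."""

    def __init__(self, lo=0, hi=100):
        self.lo = lo
        self.hi = hi
        self.counts = [0] * (hi - lo + 1)  # one bucket per possible value
        self.total = 0

    def addNum(self, num):
        if not self.lo <= num <= self.hi:
            raise ValueError("value outside the assumed range")
        self.counts[num - self.lo] += 1
        self.total += 1

    def _kth(self, k):
        """Return the k-th smallest value (0-based) by scanning the buckets."""
        seen = 0
        for offset, count in enumerate(self.counts):
            seen += count
            if seen > k:
                return self.lo + offset
        raise IndexError("k out of range")

    def findMedian(self):
        if self.total == 0:
            return None
        mid = self.total // 2
        if self.total % 2 == 1:
            return float(self._kth(mid))
        # Even count: average the two middle values.
        return (self._kth(mid - 1) + self._kth(mid)) / 2


finder = BucketMedianFinder()
finder.addNum(1)
finder.addNum(2)
print(finder.findMedian())  # 1.5
finder.addNum(3)
print(finder.findMedian())  # 2.0
```

Each findMedian() call scans at most 101 counters, which is why it counts as O(1) regardless of how many numbers have been added.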
Followup #2 - If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?
- Divide problem into 3 subproblems. Here are the groupings:
  - Numbers < 0: You have 2 options:
    - Use 2-heap solution (that we coded in original solution), or
    - Use 1 array, which represents 1 bucket
  - 0 <= Numbers <= 100: Use 101 buckets using an array of size 101
  - 100 < Numbers: You have 2 options:
    - Use 2-heap solution (that we coded in original solution), or
    - Use 1 array, which represents 1 bucket
- For each number we get in the stream, insert it into 1 of the 3 groupings, keeping track of the count of numbers in each of these 3 groupings
- To find the median, see which grouping the median must fall into and find it there.
For Numbers < 0 and 100 < Numbers, using 2 arrays/buckets (one per group) is the more practical solution, since it is very unlikely that the median will fall into either of them. This makes findMedian() O(1) in the average case. In the worst case, all numbers fall into 1 array, and we would have to use either Quickselect (O(n) average case, O(n^2) worst case) or sorting (O(n log n)) to find the median.
If you use 2 heaps instead, you will get findMedian() of O(1) average case, O(log n) worst case.
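The three-group idea with plain arrays for the outliers could look roughly like this. It is a sketch under the assumptions stated in the follow-up (most values fall in 0 to 100 inclusive); it sorts an outlier array only when the median actually lands in it, which is the O(n log n) fallback mentioned above rather than Quickselect. The class name `MostlyBoundedMedianFinder` and the helper `_kth` are hypothetical.

```python
class MostlyBoundedMedianFinder:
    """Buckets for [lo, hi], plus unsorted arrays for values outside it."""

    def __init__(self, lo=0, hi=100):
        self.lo = lo
        self.hi = hi
        self.below = []                    # values < lo
        self.counts = [0] * (hi - lo + 1)  # one bucket per in-range value
        self.in_range = 0                  # how many values are in the buckets
        self.above = []                    # values > hi

    def addNum(self, num):
        if num < self.lo:
            self.below.append(num)
        elif num > self.hi:
            self.above.append(num)
        else:
            self.counts[num - self.lo] += 1
            self.in_range += 1

    def _kth(self, k):
        """Return the k-th smallest value (0-based) across the 3 groups."""
        if k < len(self.below):
            # Rare case: the median falls among the small outliers.
            return sorted(self.below)[k]
        k -= len(self.below)
        if k < self.in_range:
            # Common case: scan the fixed-size bucket array.
            seen = 0
            for offset, count in enumerate(self.counts):
                seen += count
                if seen > k:
                    return self.lo + offset
        k -= self.in_range
        # Rare case: the median falls among the large outliers.
        return sorted(self.above)[k]

    def findMedian(self):
        total = len(self.below) + self.in_range + len(self.above)
        if total == 0:
            return None
        mid = total // 2
        if total % 2 == 1:
            return float(self._kth(mid))
        return (self._kth(mid - 1) + self._kth(mid)) / 2


finder = MostlyBoundedMedianFinder()
for num in (-5, 1, 2, 3, 500):
    finder.addNum(num)
print(finder.findMedian())  # 2.0
```

Keeping only the group sizes in play for the common case means the outlier arrays are never touched unless the median leaves the 0 to 100 range.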