https://leetcode.com/problems/find-median-from-data-stream/
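Use a minHeap and a maxHeap. Split an ordered list of numbers into two halves A1 and A2, either of equal length or with one half longer than the other by 1. A1 is kept in a maxHeap, whose top is its largest value, i.e. the value nearest the middle; A2 is kept in a minHeap, whose top is its smallest value, which is likewise the value nearest the middle.
Reference: http://bookshadow.com/weblog/2015/10/19/leetcode-find-median-data-stream/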
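Maintain a max-heap (MaxHeap) plus a min-heap (MinHeap).
The following constraints must hold:
Every element stored in the max-heap is no greater than any element in the min-heap
MaxHeap.size() == MinHeap.size(), or MaxHeap.size() == MinHeap.size() + 1
It then follows that:
When MaxHeap.size() == MinHeap.size() + 1, the median is the top element of MaxHeap
When MaxHeap.size() == MinHeap.size(), the median is the mean of the top element of MaxHeap and the top element of MinHeap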
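The two median rules can be seen without any heap by cutting an already sorted list at the middle. The sketch below is only an illustration of the split; the helper name median_from_sorted is made up for this note, and it assumes a non-empty sorted input with the extra element going to the lower half.

```python
def median_from_sorted(nums):
    """Median of an already sorted, non-empty list via the two-half split."""
    half = (len(nums) + 1) // 2          # lower half gets the extra element when n is odd
    lower, upper = nums[:half], nums[half:]
    if len(lower) == len(upper) + 1:
        return float(lower[-1])          # largest value of the lower half
    return (lower[-1] + upper[0]) / 2.0  # mean of the two values next to the cut


print(median_from_sorted([1, 2, 3, 4, 5]))  # 3.0
print(median_from_sorted([1, 2, 3, 4]))     # 2.5
```

The heaps only need to expose lower[-1] and upper[0], which is why a max-heap for the lower half and a min-heap for the upper half are enough.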
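Python's built-in heap module heapq gives a min-heap directly; a max-heap can be emulated by multiplying each value by -1 when it goes in and again when it comes out.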
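A minimal sketch of the negation trick, using only heapq.heappush and heapq.heappop; the variable names are illustrative.

```python
import heapq

max_heap = []                      # holds negated values
for value in [3, 10, 7]:
    heapq.heappush(max_heap, -value)

largest = -max_heap[0]             # peek: negate again to recover the value
popped = -heapq.heappop(max_heap)  # pop works the same way
print(largest, popped)             # 10 10
print(-max_heap[0])                # 7
```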
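The shorter code below also works. heapq's heappushpop pushes an item and then pops the smallest one in a single call, replacing a separate heappush and heappop and simplifying the code. This version is the easier one to understand.
Note that this version uses the opposite convention: the minHeap holds one more element than the maxHeap. When the two heaps are the same size, the new number is still pushed first onto the heap designated to be shorter, and that heap's top is then popped and given to the longer heap.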
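For reference, a small sketch of what heapq.heappushpop returns on its own, before it is combined with negation in the listing that follows. The sample values are arbitrary.

```python
import heapq

heap = [2, 5, 9]
heapq.heapify(heap)

# Pushes 4, then pops and returns the smallest item, which is 2.
smallest = heapq.heappushpop(heap, 4)
print(smallest, sorted(heap))  # 2 [4, 5, 9]

# When the pushed item is itself the smallest, it comes straight back.
echoed = heapq.heappushpop(heap, 1)
print(echoed, sorted(heap))    # 1 [4, 5, 9]
```

The heap's size is unchanged after each call, so routing a number through one heap and pushing the returned value onto the other grows only the second heap.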
import heapq

class MedianFinder:
    def __init__(self):
        self.small = []  # the smaller half of the list: a max-heap, emulated by a min-heap of negated values
        self.large = []  # the larger half of the list: a plain min-heap

    def addNum(self, num):
        # Equal sizes: route num through small (the max-heap) and move small's
        # largest value into large, so large (the min-heap) ends up one longer.
        # Otherwise large is one longer: route num through large and move its
        # smallest value into small, restoring len(small) == len(large).
        if len(self.small) == len(self.large):
            heapq.heappush(self.large, -heapq.heappushpop(self.small, -num))
        else:
            heapq.heappush(self.small, -heapq.heappushpop(self.large, num))

    def findMedian(self):
        if len(self.small) == len(self.large):
            return float(self.large[0] - self.small[0]) / 2.0
        else:
            return float(self.large[0])
The version below is harder to follow.
from heapq import heappush, heappop

class MedianFinder:
    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.minHeap = []  # larger half
        self.maxHeap = []  # smaller half, stored as negated values

    def addNum(self, num):
        """
        Adds a num into the data structure.
        :type num: int
        :rtype: void
        """
        # Every new number goes into maxHeap first.
        heappush(self.maxHeap, -num)
        # If maxHeap's top is now greater than minHeap's top, the ordering
        # between the halves is broken; if maxHeap is more than one longer than
        # minHeap, the size rule is broken. Either way, move maxHeap's top to
        # minHeap. This keeps the halves ordered and
        # len(maxHeap) <= len(minHeap) + 1. An empty minHeap imposes no
        # ordering constraint, so it is checked for emptiness first.
        if (self.minHeap and self.minHeap[0] < -self.maxHeap[0]) or len(self.minHeap) + 1 < len(self.maxHeap):
            heappush(self.minHeap, -heappop(self.maxHeap))
        # Keep len(maxHeap) >= len(minHeap). Example: with maxHeap = {1} and
        # minHeap = {2}, adding 3 sends 3 on to minHeap, which is then the
        # longer heap, so its top (2) moves back to maxHeap. Without this step
        # every larger element would pile up in minHeap. A number therefore
        # enters maxHeap, may move to minHeap, and may come back to maxHeap.
        if len(self.maxHeap) < len(self.minHeap):
            heappush(self.maxHeap, -heappop(self.minHeap))

    def findMedian(self):
        """
        Returns the median of current data stream
        :rtype: float
        """
        if len(self.minHeap) < len(self.maxHeap):
            return -1.0 * self.maxHeap[0]
        else:
            return (self.minHeap[0] - self.maxHeap[0]) / 2.0
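One way to gain confidence in either version is to compare every running median against the standard library. The helper below is a hypothetical test harness, not part of the original solutions: check_median_finder is a made-up name, and it assumes only that the class passed in exposes addNum and findMedian as in the listings above.

```python
import random
import statistics


def check_median_finder(finder_cls, nums):
    """Feed nums one by one and compare each running median with statistics.median."""
    finder = finder_cls()
    seen = []
    for num in nums:
        finder.addNum(num)
        seen.append(num)
        expected = statistics.median(seen)
        got = finder.findMedian()
        assert got == expected, (seen, got, expected)
    return True


def random_stream(length, seed=0):
    """Reproducible list of small integers, with duplicates and negatives."""
    rng = random.Random(seed)
    return [rng.randint(-50, 50) for _ in range(length)]
```

A call such as check_median_finder(MedianFinder, random_stream(200)) raises AssertionError with the offending prefix if any running median is wrong, and returns True otherwise.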