// min-heap
      10
     /  \
   20    100
  /
30
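Heap
- Definition and basic properties
- A binary tree stored in an array, so it uses no parent/child pointers;
- The "heap property" fixes the ordering, and therefore the array position, of each element;
- Heap property: max-heap (every parent >= its children) or min-heap (every parent <= its children);
- A heap is a complete binary tree;
- Complete binary tree: every level except the last is full, and the last level is filled from the left;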
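As a minimal sketch of the min-heap property on an array-backed tree, the check below compares every element with its parent. The function name isMinHeap is hypothetical and not part of the implementation in this post.

```cpp
#include <cstddef>
#include <vector>

// Returns true if every element is >= its parent, i.e. the array
// satisfies the min-heap property. An empty array counts as a heap.
bool isMinHeap(const std::vector<int>& arr) {
    for (std::size_t i = 1; i < arr.size(); ++i) {
        // the parent of index i lives at (i - 1) / 2
        if (arr[(i - 1) / 2] > arr[i]) {
            return false;
        }
    }
    return true;
}
```

The tree drawn at the top, stored level by level as {10, 20, 100, 30}, passes this check; replacing 30 with 5 would break it, because 5 is smaller than its parent 20.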
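- Mathematical properties
Let i be an index into arr (0 .. n-1).
- parent(i): arr[floor((i-1)/2)]; note that C/C++ integer division truncates toward zero, so (0-1)/2 evaluates to 0 rather than -1: compute the floor explicitly or treat the root separately;
- left child node: arr[(2 * i) + 1]
- right child node: arr[(2 * i) + 2]
- height of the heap, h(n), versus its number of levels:
- levels = h + 1;
- h = floor(log2(n)): each level of a binary tree is twice as wide as the one above, so the base-2 logarithm of the node count gives the number of edges on the path from the root to the deepest node;
e.g. a leaf level holding indices 7-14 covers the 1-based positions 8-15, all within [2^3, 2^4), hence the floor();
- level of an arbitrary index i: floor(log2(i + 1)), i in 0 .. n-1
- given h, if the leaf level is full:
- number of leaves: 2^h, e.g. h = 3 gives 2^3 leaves
- number of nodes above the leaf level: 2^h - 1;
- total number of nodes: 2^(h+1) - 1 = 2^h + (2^h - 1)
- leaf nodes occupy indices floor(n/2) .. n-1; nodes with at least one child occupy indices 0 .. floor(n/2) - 1;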
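The index formulas above can be sketched with integer arithmetic only, which avoids the floating-point log2. All helper names here are hypothetical; indices are 0-based, and the sketch assumes n >= 1 wherever a node count is passed.

```cpp
// Parent of index i, or -1 for the root (plain (i - 1) / 2 would give 0).
int heapParent(int i) { return i == 0 ? -1 : (i - 1) / 2; }
int heapLeftChild(int i) { return 2 * i + 1; }
int heapRightChild(int i) { return 2 * i + 2; }

// Level of index i, i.e. floor(log2(i + 1)), found by repeated halving.
int heapLevel(int i) {
    int level = 0;
    for (int pos = i + 1; pos > 1; pos >>= 1) {
        ++level;
    }
    return level;
}

// Height of a heap with n >= 1 nodes: floor(log2(n)),
// which is the level of the last index n - 1.
int heapHeight(int n) { return heapLevel(n - 1); }

// Index of the first leaf in a heap with n nodes: floor(n / 2).
int firstLeaf(int n) { return n / 2; }
```

For a full heap of 15 nodes this gives a height of 3, a first leaf at index 7, and level 3 for every index from 7 to 14.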
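- Applications
- Building priority queues: a heap is the most common priority-queue implementation, and priority queues are used widely, e.g. in A* and Dijkstra's algorithm;
- Heap sort
- Finding the maximum/minimum value quickly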
- Operations:
- O(log n): shiftUp, shiftDown, insert, remove, removeAtIndex, replace
- other:
- peek(): O(1)
- search(value): O(n)
- buildHeap(array): O(n log n) by repeated insert, or O(n) with Floyd's algorithm
- heap sort: O(n log n)
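To see the O(n) build, the O(1) peek and the O(n log n) heap sort together, here is a small sketch that uses the standard library heap algorithms rather than the MinHeap class from this post. The function names heapMin and heapSortDescending are hypothetical.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Builds a min-heap in O(n) and returns its root (the peek() operation).
// Assumes values is not empty.
int heapMin(std::vector<int> values) {
    // std::greater turns the default max-heap into a min-heap
    std::make_heap(values.begin(), values.end(), std::greater<int>());
    return values.front();
}

// Heap sort in O(n log n): build the heap, then repeatedly move the root
// to the end of the shrinking heap range. With a min-heap this leaves
// the values in descending order.
std::vector<int> heapSortDescending(std::vector<int> values) {
    std::make_heap(values.begin(), values.end(), std::greater<int>());
    std::sort_heap(values.begin(), values.end(), std::greater<int>());
    return values;
}
```

Sorting with a min-heap yields descending order because each extracted minimum is parked at the back of the array; using the default comparator (a max-heap) would give ascending order instead.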
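- Searching the heap
- O(n)
- Optimization: once every node on a level has been tested and the search value turns out to be smaller than all of them (min-heap) or larger than all of them (max-heap), no descendant of that level can match either (e.g. in a min-heap the children are at least as large as their parents), so the search can stop;
- This requires the index of the right-most node on each level; see the comments in the code:
h = floor(log2(i + 1))  // level of an arbitrary index i
right_most_idx = (2^0 + 2^1 + ... + 2^h) - 1 = 2^(h+1) - 2  // 0, 2, 6, 14, ...
- Even with this optimization the worst-case time complexity is still O(n);
- Trading space for time, a dictionary that maps each node value to its index brings lookup down to O(1) on average, which makes the heap practical for priority-queue use;
e.g. during buildHeap, store each value's index in a hash map so that search becomes O(1); note that shifting, remove and replace must then keep the value-index pairs up to date. A heap with O(1) lookup is left for a follow-up post;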
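A rough sketch of the hash-map idea, under the assumption that all stored values are distinct: every swap updates the value-to-index map, so a lookup is a single hash probe. The class IndexedMinHeap and its member names are hypothetical and are not the follow-up implementation the post refers to.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

class IndexedMinHeap {
    std::vector<int> data_;
    std::unordered_map<int, int> index_;  // value -> position in data_

    // Every swap must also refresh the two affected map entries.
    void swapNodes(int i, int j) {
        std::swap(data_[i], data_[j]);
        index_[data_[i]] = i;
        index_[data_[j]] = j;
    }
    void shiftUp(int i) {
        while (i > 0 && data_[(i - 1) / 2] > data_[i]) {
            swapNodes(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }
    void shiftDown(int i) {
        const int n = static_cast<int>(data_.size());
        for (;;) {
            int smallest = i;
            const int l = 2 * i + 1;
            const int r = 2 * i + 2;
            if (l < n && data_[l] < data_[smallest]) smallest = l;
            if (r < n && data_[r] < data_[smallest]) smallest = r;
            if (smallest == i) return;
            swapNodes(i, smallest);
            i = smallest;
        }
    }

public:
    // Assumes value is not already stored (duplicates are not supported).
    void insert(int value) {
        data_.push_back(value);
        index_[value] = size() - 1;
        shiftUp(size() - 1);
    }
    // Average O(1): array index of value, or -1 if it is absent.
    int indexOf(int value) const {
        auto it = index_.find(value);
        return it == index_.end() ? -1 : it->second;
    }
    // Removes and returns the minimum; the heap must not be empty.
    int removeMin() {
        const int top = data_.front();
        swapNodes(0, size() - 1);
        data_.pop_back();
        index_.erase(top);
        if (!data_.empty()) shiftDown(0);
        return top;
    }
    int peek() const { return data_.front(); }
    int at(int i) const { return data_[i]; }
    int size() const { return static_cast<int>(data_.size()); }
};
```

The cost is the extra map update on every swap and the memory for one map entry per element; supporting duplicate values would need a different mapping, for example from a unique key per element to its index.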
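Priority queue
A priority queue is an abstract data type; a heap is its usual implementation, so the two terms are often used interchangeably;
- Operations
- Enqueue, Dequeue, Find Minimum, Find Maximum, Change Priority
- Implementations:
- sorted array: a sorted array satisfies the heap property, but a heap is not necessarily sorted;
- binary search tree
- heap: the common choice;
- Use cases for priority queues
- Event queue: a priority queue keyed on event timestamps, so events come out in time order
- Dijkstra's algorithm: shortest-path graph search driven by a priority queue
- Huffman coding: builds up the compression tree;
- A* pathfinding: heuristic pathfinding;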
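The event-queue use case can be sketched with std::priority_queue, which is a heap-backed priority queue from the standard library. The Event type, the LaterFirst comparator and the drainInOrder function are hypothetical names chosen for this illustration.

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical event type: a smaller timestamp means a higher priority.
struct Event {
    long timestamp;
    std::string name;
};

// std::priority_queue keeps the "largest" element on top, so ordering
// later events as "smaller" puts the earliest timestamp on top.
struct LaterFirst {
    bool operator()(const Event& a, const Event& b) const {
        return a.timestamp > b.timestamp;
    }
};

using EventQueue = std::priority_queue<Event, std::vector<Event>, LaterFirst>;

// Enqueues all events, then dequeues them in timestamp order and
// returns their names in that order.
std::vector<std::string> drainInOrder(const std::vector<Event>& events) {
    EventQueue queue;
    for (const Event& e : events) {
        queue.push(e);                      // Enqueue: O(log n)
    }
    std::vector<std::string> order;
    while (!queue.empty()) {
        order.push_back(queue.top().name);  // Find Minimum: O(1)
        queue.pop();                        // Dequeue: O(log n)
    }
    return order;
}
```

std::priority_queue offers no Change Priority operation; a common workaround is to push a fresh entry and skip stale ones when they are popped.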
Implementation
Three files: heap.h, copyable.h, main.cpp;
- heap.h
#include <iostream>
#include <cmath>
#include <stdexcept>
#include "copyable.h"
///
template <typename T>
class MinHeap {
T* data_;
int capacity_;
int heap_size_;
void swap_(int i, int j);
template <typename G>
friend std::ostream& operator<<(std::ostream&, const MinHeap<G>&);
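public:
explicit MinHeap(int capacity);
// data_ is owned by the heap: release it, and forbid copies that would delete it twice
~MinHeap() { delete[] data_; }
MinHeap(const MinHeap&) = delete;
MinHeap& operator=(const MinHeap&) = delete;
// index of the parent of i, or -1 for the root (plain (i - 1) / 2 would truncate to 0)
int parent(int i) {
return i == 0 ? -1 : (i - 1) / 2;
}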
int leftChild(int i) {
return 2 * i + 1;
}
int rightChild(int i) {
return 2 * i + 2;
}
T& peek() {
if (heap_size_ == 0) throw std::out_of_range("peek on an empty heap");
return data_[0];
}
int shiftUp(int i);
int shiftDown(int i);
void remove();
void removeAtIndex(int i);
void insert(T node);
int search(T value);
void buildHeap(T*, int);
void floydBuilidHeap(T*, int);
};
template <typename T>
MinHeap<T>::MinHeap(int cap):
data_(new T[cap]),
capacity_(cap),
heap_size_(0) {}
template <typename T>
void MinHeap<T>::swap_(int i, int j) {
T temp = data_[i];
data_[i] = data_[j];
data_[j] = temp;
}
template<typename T>
int MinHeap<T>::shiftUp(int i) {
int j = parent(i);
while (j >= 0 && data_[j] > data_[i]) {
swap_(i, j);
i = j;
j = parent(i);
}
return i;
}
template <typename T>
int MinHeap<T>::shiftDown(int j) {
    int k = leftChild(j);
    // A node with a right child always has a left child, so testing k is enough.
    while (k < heap_size_) {
        int h = rightChild(j);
        // idx is the smaller child; the left child is used when it is the only one.
        int idx = k;
        if (h < heap_size_ && !(data_[k] < data_[h])) {
            idx = h;
        }
        if (!(data_[j] > data_[idx])) {
            break;  // the heap property already holds at j
        }
        swap_(j, idx);
        j = idx;
        k = leftChild(j);
    }
    return j;
}
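template <typename T>
void MinHeap<T>::remove() {
    if (heap_size_ == 0) {
        throw std::out_of_range("remove on an empty heap");
    }
    // Move the minimum to the end, shrink the heap, then repair from the root.
    swap_(0, heap_size_ - 1);
    --heap_size_;
    shiftDown(0);
}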
template <typename T>
void MinHeap<T>::removeAtIndex(int i) {
    if (i < 0 || i >= heap_size_) {
        throw std::out_of_range("index is outside the heap");
    }
    swap_(i, heap_size_ - 1);
    --heap_size_;
    if (i == heap_size_) {
        return;  // the removed element was the last one; nothing to repair
    }
    // The element moved into slot i may violate the heap property in either
    // direction; at most one of the two calls actually moves it.
    shiftDown(shiftUp(i));
}
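template <typename T>
int MinHeap<T>::search(T value) {
    int right_most_idx = 0;    // index of the last node on the current level
    bool below_level = true;   // value < every node seen so far on this level
    for (int i = 0; i < heap_size_; ++i) {
        if (value == data_[i]) {
            return i;
        }
        // Optimization:
        // once a whole level has been tested and value is smaller than every
        // node on it, value is also smaller than all of their descendants
        // (this is a min-heap), so the search can stop.
        //
        // The right-most index of level h (h = 0, 1, ...) is
        // (2^0 + ... + 2^h) - 1, i.e. 0, 2, 6, 14, ...;
        // each one equals 2 * previous + 2.
        if (!(value < data_[i])) {
            below_level = false;
        }
        if (i == right_most_idx) {
            if (below_level) {
                return -1;
            }
            right_most_idx = 2 * right_most_idx + 2;
            below_level = true;
        }
    }
    return -1;
}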
template <typename T>
void MinHeap<T>::insert(T node) {
    if (heap_size_ >= capacity_) {
        // or we could enlarge the storage instead
        throw std::range_error("reached the maximum capacity");
    }
    data_[heap_size_++] = node;
    shiftUp(heap_size_ - 1);
}
// repeated insert - O(N log N)
template <typename T>
void MinHeap<T>::buildHeap(T* source, int n) {
    if (n > capacity_ - heap_size_) {
        throw std::range_error("source array does not fit in the remaining capacity");
    }
    for (int i = 0; i < n; ++i) {
        insert(*source++);
    }
}
// Floyd's build-heap algorithm - O(N)
template <typename T>
void MinHeap<T>::floydBuilidHeap(T* source, int n) {
    if (n > capacity_) {
        throw std::range_error("source array size is bigger than capacity");
    }
    for (int j = 0; j < n; ++j) {
        data_[j] = *source++;
    }
    heap_size_ = n;
    // Leaves are already valid heaps; sift down every internal node, last one first.
    for (int i = n / 2 - 1; i >= 0; --i) {
        shiftDown(i);
    }
}
template <typename G>
std::ostream& operator<<(std::ostream& os, const MinHeap<G>& min_heap) {
os << "min_heap's heap_size=" << min_heap.heap_size_ << std::endl;
os << "min_heap's capacity=" << min_heap.capacity_ << std::endl;
for (int j=0; j < min_heap.heap_size_; ++j) {
if (j != min_heap.heap_size_ - 1)
os << min_heap.data_[j] << ", ";
else
os << min_heap.data_[j] << std::endl;
}
return os;
}
- copyable.h
#ifndef COPYABLE_H_
#define COPYABLE_H_
#include <iostream>
class Copyable {
public:
int value;
Copyable() = default;
Copyable(int val): value(val) {}
Copyable(const Copyable& c) {
value = c.value;
}
Copyable& operator=(const Copyable& c) {
value = c.value;
return *this;
}
std::ostream& operator<<(std::ostream& os) const {
os << this->value;
return os;
}
bool operator<(const Copyable& c) const {
return value < c.value;
}
// strictly greater; !(a < b) would also be true for equal values
bool operator>(const Copyable& c) const {
return c.value < value;
}
bool operator==(const Copyable& c) const {
return value == c.value;
}
};
// inline, because this function is defined in a header
inline std::ostream& operator<<(std::ostream& os, const Copyable& cc) {
return cc.operator<<(os);
}
#endif // COPYABLE_H_
- main.cpp
#include <iostream>
#include "heap.h"

int main() {
int source[] = {10, 5, 9, 7, 8, 2, 1};
Copyable cc[] = {5, 8, 10, 6, 30, 3};
MinHeap<int> heap0(100);
MinHeap<int> heap1(50);
MinHeap<Copyable> heap2(100);
heap0.buildHeap(source, (sizeof(source) / sizeof(source[0])));
std::cout << "heap0=" << heap0;
heap1.floydBuilidHeap(source, sizeof(source) / sizeof(source[0]));
std::cout << "heap1=" << heap1;
heap1.remove();
heap1.insert(12);
heap1.insert(15);
heap1.insert(20);
heap1.removeAtIndex(3);
std::cout << "heap1=" << heap1;
heap2.floydBuilidHeap(cc, sizeof(cc)/sizeof(cc[0]));
heap2.insert(12);
heap2.insert(15);
heap2.insert(20);
std::cout << "heap2=" << heap2;
}
Ref
- heap: https://github.com/raywenderlich/swift-algorithm-club/tree/master/Heap
- heap: https://en.wikipedia.org/wiki/Heap_%28data_structure%29
- priority queue: https://github.com/raywenderlich/swift-algorithm-club/tree/master/Priority%20Queue
- heap, GeeksforGeeks: https://www.geeksforgeeks.org/heap-data-structure/
- a detailed explanation of Floyd's algorithm