Fastest Table Sort in the West - Redesigning DuckDB’s Sort - Laurens Kuiper
Bottlenecks:
- Random memory access
- Branch mispredictions
Mitigations:
- Row-wise comparison
- Key Normalization
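A minimal Python sketch of key normalization, for illustration only. Each ORDER BY column is encoded into fixed-width bytes so that a plain bytewise comparison of the concatenated key (what `memcmp` does in C, and what Python's `bytes` comparison does) gives the SQL sort order. This is also what makes row-wise comparison cheap: one comparison over one contiguous key replaces a per-column loop with type dispatch. The function names, the 8-byte string prefix (`STRING_PREFIX`), the validity-byte layout for NULLs and the NULLS FIRST choice are assumptions of this sketch, not DuckDB's exact format.

```python
import struct
from typing import Optional, Tuple

STRING_PREFIX = 8  # hypothetical: only the first 8 UTF-8 bytes of a string are encoded


def _apply_order(payload: bytes, descending: bool) -> bytes:
    # Inverting every byte reverses the bytewise order of fixed-width payloads.
    return bytes(b ^ 0xFF for b in payload) if descending else payload


def encode_int32(value: Optional[int], descending: bool = False) -> bytes:
    if value is None:
        # NULLS FIRST (assumption): validity byte 0x00 sorts before 0x01.
        return b"\x00" + bytes(4)
    # Flip the sign bit so negative numbers sort first, then store big-endian
    # so that byte order matches numeric order.
    payload = struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)
    return b"\x01" + _apply_order(payload, descending)


def encode_str(value: Optional[str], descending: bool = False) -> bytes:
    if value is None:
        return b"\x00" + bytes(STRING_PREFIX)
    # UTF-8 bytes compare in code-point order. Truncating to a fixed prefix
    # keeps the key fixed-width; strings that tie on the prefix would need a
    # fallback comparison of the full values, which this sketch omits.
    # Zero padding assumes the strings contain no NUL characters.
    payload = value.encode("utf-8")[:STRING_PREFIX].ljust(STRING_PREFIX, b"\x00")
    return b"\x01" + _apply_order(payload, descending)


def normalize_row(row: Tuple[Optional[int], Optional[str]]) -> bytes:
    # ORDER BY col0 ASC NULLS FIRST, col1 DESC NULLS FIRST, as one byte string.
    return encode_int32(row[0]) + encode_str(row[1], descending=True)


rows = [(3, "pear"), (-5, "apple"), (3, "apple"), (None, "kiwi")]
# Python compares bytes lexicographically, like memcmp over the whole key.
sorted_rows = sorted(rows, key=normalize_row)
# → [(None, 'kiwi'), (-5, 'apple'), (3, 'pear'), (3, 'apple')]
```

Once the keys are normalized, the sort itself no longer needs to know the column types, directions or NULL rules.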
Key implementation points:
- Key Normalization
- Row Compare vs. Column Compare
- Choice of algorithm:
  - RadixSort
  - QuickSort
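Because normalized keys are fixed-width byte strings, they can be sorted one byte at a time with a radix sort, without ever comparing two keys against each other. Below is a minimal LSD radix sort sketch in Python, assuming all keys have the same length; `radix_sort_keys` is a hypothetical name. The notes weigh this against a comparison sort such as quicksort; the sketch does not attempt to reproduce how DuckDB chooses between the two.

```python
from typing import List


def radix_sort_keys(keys: List[bytes]) -> List[bytes]:
    """LSD radix sort for equal-length byte keys: one stable counting pass per byte."""
    if not keys:
        return []
    width = len(keys[0])
    if any(len(k) != width for k in keys):
        raise ValueError("this sketch assumes fixed-width keys")
    for pos in range(width - 1, -1, -1):  # least significant byte first
        # Histogram of the byte at `pos`, shifted by one slot for the prefix sum.
        counts = [0] * 257
        for k in keys:
            counts[k[pos] + 1] += 1
        # Prefix sum: counts[v] becomes the first output slot for byte value v.
        for v in range(256):
            counts[v + 1] += counts[v]
        out: List[bytes] = [b""] * len(keys)
        for k in keys:  # stable scatter preserves the order from earlier passes
            out[counts[k[pos]]] = k
            counts[k[pos]] += 1
        keys = out
    return keys


keys = [b"\x02\x01", b"\x01\xff", b"\x01\x00", b"\x00\x05"]
print(radix_sort_keys(keys))  # → [b'\x00\x05', b'\x01\x00', b'\x01\xff', b'\x02\x01']
```

Each pass is a sequential scan plus a scatter, so the cost grows with key width; that is one reason a comparison sort can win when keys are long.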
Further topics:
- The sort is parallelized using Merge Path
- Pointer swizzling: https://en.wikipedia.org/wiki/Pointer_swizzling
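Merge Path splits the merge of two sorted runs into independent pieces. For an output position `d`, a binary search along the diagonal `i + j = d` finds how many elements come from each input, so each worker can merge its own slice without coordinating with the others. A minimal Python sketch follows; the function names and the thread-pool driver are illustrative assumptions, and because of Python's GIL the threads only demonstrate the partitioning, not a real speedup. Ties go to the first input, and the split and the sequential merge use the same rule so that partitions line up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def merge_path_split(a: Sequence[int], b: Sequence[int], d: int) -> Tuple[int, int]:
    """Return (i, j) with i + j == d such that the first d merged outputs are a[:i] + b[:j]."""
    lo, hi = max(0, d - len(b)), min(d, len(a))
    while lo < hi:  # binary search along the diagonal i + j == d
        i = (lo + hi) // 2
        if a[i] <= b[d - i - 1]:  # a[i] precedes b[j-1] (ties go to a), so take more of a
            lo = i + 1
        else:
            hi = i
    return lo, d - lo


def merge(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Sequential two-way merge; ties take from a first, matching merge_path_split."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def parallel_merge(a: Sequence[int], b: Sequence[int], workers: int = 4) -> List[int]:
    total = len(a) + len(b)
    # Evenly spaced diagonals give every worker about total / workers outputs.
    cuts = [merge_path_split(a, b, total * w // workers) for w in range(workers + 1)]

    def merge_partition(w: int) -> List[int]:
        (i0, j0), (i1, j1) = cuts[w], cuts[w + 1]
        return merge(a[i0:i1], b[j0:j1])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(merge_partition, range(workers)))
    return [x for part in parts for x in part]
```

For example, with `a = [1, 3, 5, 7, 9]` and `b = [2, 2, 3, 8]`, `merge_path_split(a, b, 4)` returns `(2, 2)`: the first four outputs are `1, 2, 2, 3`.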
Push-Based Execution in DuckDB - Mark Raasveldt
Drawbacks of the pull-based pipeline model:
- Load imbalance
- Plan explosion
- Added materialization costs
Morsel-Driven Parallelism:
- Operators are parallelism-aware, so their degree of parallelism can be controlled;
- A query can be split into pipelines
- Pipelines can be executed in parallel
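A minimal sketch of morsel-driven execution of a single pipeline, assuming the input is a Python list. Worker threads repeatedly claim the next fixed-size morsel from a shared cursor, run the pipeline on it and keep the output locally, so a thread that finishes early simply claims more morsels instead of sitting idle. `MORSEL_SIZE`, `run_pipeline_morsel_driven`, `filter_project` and the order-restoring combine step are hypothetical; this is not DuckDB's scheduler.

```python
import threading
from typing import Callable, List, Sequence, Tuple

MORSEL_SIZE = 1024  # hypothetical morsel size; real systems tune this


def run_pipeline_morsel_driven(
    data: Sequence[int],
    pipeline: Callable[[Sequence[int]], List[int]],
    num_threads: int = 4,
) -> List[int]:
    cursor = 0  # start offset of the next unclaimed morsel
    lock = threading.Lock()
    # One local output buffer per thread, so workers never contend on a sink.
    local: List[List[Tuple[int, List[int]]]] = [[] for _ in range(num_threads)]

    def worker(tid: int) -> None:
        nonlocal cursor
        while True:
            with lock:  # claim the next morsel; fast threads simply claim more
                start = cursor
                cursor += MORSEL_SIZE
            if start >= len(data):
                return
            local[tid].append((start, pipeline(data[start:start + MORSEL_SIZE])))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Combine step: restore input order by morsel offset (a choice of this sketch).
    morsels = sorted((m for buf in local for m in buf), key=lambda m: m[0])
    return [x for _, out in morsels for x in out]


def filter_project(chunk: Sequence[int]) -> List[int]:
    # Stand-in for a pipeline of FILTER (x % 3 == 0) followed by PROJECT (x * 2).
    return [x * 2 for x in chunk if x % 3 == 0]


result = run_pipeline_morsel_driven(list(range(10_000)), filter_project)
```

The unit of work is the morsel rather than a fixed per-thread partition, which is what addresses the load-imbalance problem listed above.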
Reworking the pipeline model for push-based execution:
- How to implement UNION better
- Right/Full outer join
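A minimal single-threaded sketch of push-based operators touching both bullets above. In push-based execution a source drives the loop and pushes chunks into the operator chain. Here `UNION ALL` is modeled as two sources pushing into the same downstream operators, and a RIGHT OUTER hash join records which build-side rows matched and emits the unmatched ones in a separate step after all probe input has been pushed. The class names and this decomposition are assumptions for illustration, not DuckDB's actual operator interfaces.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple
Chunk = List[Row]


class Operator:
    """Push-based operator: consumes a chunk and pushes its output downstream."""

    def __init__(self, downstream: Optional["Operator"] = None) -> None:
        self.downstream = downstream

    def push(self, chunk: Chunk) -> None:
        raise NotImplementedError

    def emit(self, chunk: Chunk) -> None:
        if chunk and self.downstream is not None:
            self.downstream.push(chunk)


class Filter(Operator):
    def __init__(self, pred: Callable[[Row], bool], downstream: Operator) -> None:
        super().__init__(downstream)
        self.pred = pred

    def push(self, chunk: Chunk) -> None:
        self.emit([r for r in chunk if self.pred(r)])


class CollectSink(Operator):
    def __init__(self) -> None:
        super().__init__(None)
        self.rows: Chunk = []

    def push(self, chunk: Chunk) -> None:
        self.rows.extend(chunk)


class RightOuterHashJoin(Operator):
    """Probe rows (left input) are pushed in; the hash table is built on the right input."""

    def __init__(self, build_rows: Chunk, key_left: Callable[[Row], object],
                 key_right: Callable[[Row], object], downstream: Operator) -> None:
        super().__init__(downstream)
        self.build_rows = build_rows
        self.key_left = key_left
        self.matched = [False] * len(build_rows)  # one flag per build-side row
        self.table: Dict[object, List[int]] = {}
        for idx, r in enumerate(build_rows):
            self.table.setdefault(key_right(r), []).append(idx)

    def push(self, chunk: Chunk) -> None:
        out: Chunk = []
        for left in chunk:
            for idx in self.table.get(self.key_left(left), []):
                self.matched[idx] = True
                out.append(left + self.build_rows[idx])
        self.emit(out)

    def scan_unmatched(self, left_width: int) -> None:
        # Separate step that must run after ALL probe chunks have been pushed:
        # build rows that never matched are emitted padded with NULLs (None).
        nulls = (None,) * left_width
        self.emit([nulls + r for i, r in enumerate(self.build_rows) if not self.matched[i]])


def run_source(chunks: List[Chunk], op: Operator) -> None:
    # The source drives execution by pushing each chunk into the pipeline.
    for chunk in chunks:
        op.push(chunk)


# UNION ALL: two sources push into the same Filter -> sink chain.
union_sink = CollectSink()
shared = Filter(lambda r: r[0] > 1, union_sink)
run_source([[(1,), (2,)], [(3,)]], shared)
run_source([[(5,), (0,)]], shared)
# union_sink.rows → [(2,), (3,), (5,)]

# RIGHT OUTER JOIN on the first column.
join_sink = CollectSink()
join = RightOuterHashJoin([(1, "a"), (4, "d")], lambda r: r[0], lambda r: r[0], join_sink)
run_source([[(1, "x"), (2, "y")]], join)
join.scan_unmatched(left_width=2)
# join_sink.rows → [(1, 'x', 1, 'a'), (None, None, 4, 'd')]
```

With multiple probe threads, `scan_unmatched` can only start once every thread has finished probing, which is why it is modeled here as its own step rather than as part of `push`.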
Future Work:
- Hybrid Async IO
- Hybrid Early/Late Materialization