number of passes: $1 + \lceil \log_2 (N/M) \rceil$
seek time: O(number of passes)
a k-way merge
number of passes: $1 + \lceil \log_k (N/M) \rceil$
requires 2k tapes
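The pass counts above can be checked with a small sketch. It assumes N and M are given in the same unit (records or bytes) and counts one run-building pass plus the merge passes; the names `number_of_passes`, `n_records`, and `memory_records` are illustrative, not from the notes. Integer arithmetic is used instead of a floating-point logarithm to avoid rounding surprises.

```python
def number_of_passes(n_records: int, memory_records: int, k: int = 2) -> int:
    """Return 1 + ceil(log_k(N/M)) for a k-way merge (k >= 2)."""
    # The run-building pass produces ceil(N / M) sorted runs.
    runs = -(-n_records // memory_records)
    merge_passes = 0
    # Each merge pass combines up to k runs into one.
    while runs > 1:
        runs = -(-runs // k)
        merge_passes += 1
    return 1 + merge_passes
```

For example, with 200 initial runs a 2-way merge needs 9 passes in total, while a 4-way merge needs 5.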
polyphase merge
requires k+1 tapes
Huffman tree
Total merge time = O(the weighted external path length)
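A minimal sketch of the Huffman-style merge order for 2-way merging: always merge the two shortest runs, and add up the lengths of the merged runs. That sum equals the weighted external path length of the resulting tree. The function name `optimal_merge_cost` is an assumption for illustration, and the cost model counts only records moved, not seeks.

```python
import heapq

def optimal_merge_cost(run_lengths):
    """Total records moved when the two shortest runs are merged at each step."""
    heap = list(run_lengths)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a = heapq.heappop(heap)  # shortest run
        b = heapq.heappop(heap)  # second-shortest run
        total += a + b           # merging them moves a + b records
        heapq.heappush(heap, a + b)
    return total
```

With runs of length 2, 4, 5, and 15, the merges cost 6, 11, and 26, for a total of 43, which matches 2·3 + 4·3 + 5·2 + 15·1.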
If the number of runs is a Fibonacci number $F_N$, then the best way to distribute them is to split them into $F_{N-1}$ and $F_{N-2}$.
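To see why the Fibonacci split works, here is a rough simulation of a 3-tape polyphase merge that tracks only run counts, not data. The function name `polyphase_phases` and the convention of returning `None` when all runs end up on a single tape (so an extra redistribution would be needed) are assumptions made for this sketch.

```python
def polyphase_phases(a: int, b: int):
    """Count merge phases starting with a and b runs on two of three tapes."""
    tapes = [a, b, 0]
    phases = 0
    while sum(tapes) > 1:
        nonempty = [i for i, runs in enumerate(tapes) if runs > 0]
        if len(nonempty) < 2:
            return None  # all runs sit on one tape: cannot merge directly
        i, j = nonempty
        out = 3 - i - j                # index of the empty tape
        m = min(tapes[i], tapes[j])    # merge until one input tape runs dry
        tapes[i] -= m
        tapes[j] -= m
        tapes[out] += m
        phases += 1
    return phases
```

With 34 runs split as 21 and 13, the simulation finishes in 7 phases; an even split of 17 and 17 leaves 17 runs on one tape after the first phase and gets stuck.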
For a k-way merge, $F_N^{(k)} = F_{N-1}^{(k)} + F_{N-2}^{(k)} + \cdots + F_{N-k}^{(k)}$, where $F_N^{(k)} = 0 \; (0 \leq N \leq k-2)$ and $F_{k-1}^{(k)} = 1$.
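A short sketch for computing these numbers, assuming the standard k-th order Fibonacci definition in which each term is the sum of the preceding k terms, with the initial conditions stated above. The name `kth_order_fibonacci` is illustrative.

```python
def kth_order_fibonacci(k: int, n: int) -> int:
    """Return F_n^(k): 0 for 0 <= n <= k-2, 1 for n = k-1, else sum of previous k terms."""
    seq = [0] * (k - 1) + [1]          # F_0 .. F_{k-1}
    while len(seq) <= n:
        seq.append(sum(seq[-k:]))      # add the k most recent terms
    return seq[n]
```

For k = 2 this gives the ordinary Fibonacci numbers (0, 1, 1, 2, 3, 5, ...); for k = 3 the sequence starts 0, 0, 1, 1, 2, 4, 7, 13.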
Exercises
hardware
The main cost of external sorting is I/O.
$\lceil 1 + \log_2(100{,}000{,}000 \times 256 \div 128 \div 10^6) \rceil = 9$
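The arithmetic can be unpacked step by step. This reading assumes the figures mean 100,000,000 records of 256 bytes each and 128 × 10^6 bytes of memory; the notes do not state the units.

```latex
\frac{N}{M} = \frac{100{,}000{,}000 \times 256}{128 \times 10^{6}} = 200,
\qquad \log_2 200 \approx 7.64,
\qquad \lceil 1 + 7.64 \rceil = 9
```

So there are 200 initial runs, which take 8 two-way merge passes after the run-building pass.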
Huffman tree: at each step, pick the two shortest runs and merge them.