Hw3 Counting Sort & Hash
1 Counting Sort
1
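// A[1..n]: n integers in the range 0 to k
// B[0..k]: count array
// Preprocessing
PRE-PROCESS(A, B, k)
    for i = 0 to k
        B[i] = 0
    for j = 1 to A.length
        B[A[j]] = B[A[j]] + 1
    // B[i] now holds the number of elements equal to i
    for i = 1 to k
        B[i] = B[i] + B[i-1]
    // B[i] now holds the number of elements less than or equal to i
// Count the elements in the closed range [a, b], where 0 <= a <= b <= k
RANGE-CNT(B, a, b)
    if a == 0
        return B[b]
    return B[b] - B[a-1]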
Preprocessing time: $\Theta(n+k)$

Query time: $O(1)$
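The preprocessing and query steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `preprocess` and `range_count` are chosen here, arrays are 0-indexed, and the sketch assumes every input value is an integer in `0..k` and that queries satisfy `0 <= a` and `b <= k`.

```python
def preprocess(values, k):
    """Return prefix counts c, where c[i] is the number of elements <= i."""
    counts = [0] * (k + 1)
    for v in values:
        counts[v] += 1            # counts[i] = number of elements equal to i
    for i in range(1, k + 1):
        counts[i] += counts[i - 1]  # counts[i] = number of elements <= i
    return counts


def range_count(counts, a, b):
    """Number of elements x with a <= x <= b, answered in O(1) time."""
    if a > b:
        return 0
    below_a = counts[a - 1] if a > 0 else 0  # guard the a == 0 case
    return counts[b] - below_a


data = [2, 5, 3, 0, 2, 3, 0, 3]
prefix = preprocess(data, 5)
print(range_count(prefix, 2, 3))
```

The guard for `a == 0` matters because there is no entry before index 0 to subtract.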
2
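Use radix sort.
First find the largest element of the input in linear time, which fixes the most significant digit position for all elements. Then sort digit by digit, starting from the least significant digit, with a stable sort on each digit.
Each digit pass takes $\Theta(n+k)$ time, so all passes together sort the array in $\Theta(10n+10k) = \Theta(n+k)$ time.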
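A minimal Python sketch of this least-significant-digit radix sort is shown below. It assumes non-negative integers and base 10, and uses a stable counting sort for each digit pass. The function names are illustrative and not taken from the original.

```python
def counting_sort_by_digit(values, exp, base=10):
    """Stable counting sort of values by the digit at place value exp."""
    counts = [0] * base
    for v in values:
        counts[(v // exp) % base] += 1
    for d in range(1, base):
        counts[d] += counts[d - 1]      # counts[d] = number of digits <= d
    output = [0] * len(values)
    for v in reversed(values):          # reverse scan keeps the sort stable
        d = (v // exp) % base
        counts[d] -= 1
        output[counts[d]] = v
    return output


def radix_sort(values, base=10):
    """Sort non-negative integers, one digit per pass, lowest digit first."""
    if not values:
        return []
    largest = max(values)               # linear scan fixes the number of passes
    result = list(values)
    exp = 1
    while largest // exp > 0:
        result = counting_sort_by_digit(result, exp, base)
        exp *= base
    return result


print(radix_sort([329, 457, 657, 839, 436, 720, 355]))
```

Stability of each pass is what lets later passes preserve the order established by earlier, less significant digits.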
2 Hash table
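Since $|U| > nm$ and the hash table has $m$ slots, the pigeonhole principle guarantees a subset of $n$ keys that all hash to the same slot. If exactly these $n$ keys are stored, they form a single chain, and a search under chaining must traverse the whole chain in the worst case, which costs $\Theta(n)$ time.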
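The pigeonhole argument can be illustrated with a small Python sketch. Everything concrete here is an assumption made for the example: the hash function `key % m`, the sizes `m = 4` and `n = 5`, and the helper names `worst_case_keys` and `chained_search`.

```python
from collections import defaultdict


def worst_case_keys(universe, m, n, h):
    """Pick n keys from the universe that all hash to the same slot."""
    by_slot = defaultdict(list)
    for key in universe:
        by_slot[h(key)].append(key)
    # Pigeonhole: with more than n*m keys, some slot receives more than n.
    crowded = max(by_slot.values(), key=len)
    return crowded[:n]


def chained_search(table, h, key):
    """Return (found, comparisons) for a search in a chained hash table."""
    comparisons = 0
    for stored in table[h(key)]:
        comparisons += 1
        if stored == key:
            return True, comparisons
    return False, comparisons


m, n = 4, 5
h = lambda key: key % m          # hypothetical hash function for the demo
universe = range(n * m + 1)      # n*m + 1 keys, so |U| > n*m
keys = worst_case_keys(universe, m, n, h)

table = [[] for _ in range(m)]
for key in keys:
    table[h(key)].append(key)

# Searching for the last key in the chain touches all n stored keys.
found, comparisons = chained_search(table, h, keys[-1])
print(found, comparisons)
```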
3 Hash Function
Let $A = \langle a_0, a_1, \ldots, a_{n-1} \rangle$ and $B = \langle b_0, b_1, \ldots, b_{n-1} \rangle$ be two distinct $n$-tuples.
Then there is at least one index $k$ with $a_k \neq b_k$; let $i$ denote the first such index.
The hash function gives:

$$h_b(A) = h_b(\langle a_0, a_1, \ldots, a_{n-1} \rangle) = \left(\sum_{j=0}^{n-1} a_j b^j\right) \bmod p$$

$$h_b(B) = h_b(\langle b_0, b_1, \ldots, b_{n-1} \rangle) = \left(\sum_{j=0}^{n-1} b_j b^j\right) \bmod p$$

Here $b$ without a subscript is the parameter of the hash function, while $b_j$ is the $j$-th component of $B$.
Taking the difference modulo $p$, and using $a_j = b_j$ for all $j < i$:

$$\begin{aligned}
h_b(A) - h_b(B) &\equiv \sum_{j=0}^{n-1} (a_j - b_j)\, b^j \pmod p \\
&= \sum_{j=0}^{i-1} (a_j - b_j)\, b^j + \sum_{j=i}^{n-1} (a_j - b_j)\, b^j \\
&= (a_i - b_i)\, b^i + \sum_{j=i+1}^{n-1} (a_j - b_j)\, b^j
\end{aligned}$$

Since $a_i \neq b_i$ (both taken from $\{0, 1, \ldots, p-1\}$), the coefficient of $b^i$ is nonzero modulo $p$, so the right-hand side is a nonzero polynomial in $b$ of degree at most $n-1$.
Both hash values lie in $\{0, 1, \ldots, p-1\}$, so:

$$|h_b(A) - h_b(B)| \leq p-1$$

Hence $h_b(A) = h_b(B)$ exactly when $h_b(A) - h_b(B) \equiv 0 \pmod p$.
The parameter ranges over:

$$b \in \{0, 1, \ldots, p-1\}$$

With $p$ prime, $\mathbb{Z}_p$ is a field, and the nonzero polynomial $\sum_{j=0}^{n-1} (a_j - b_j)\, b^j$ of degree at most $n-1$ has at most $n-1$ roots in it. So at most $n-1$ of the $p$ possible values of $b$ give $h_b(A) = h_b(B)$.

Therefore, for $b$ chosen uniformly at random:

$$\Pr\{h_b(A) = h_b(B)\} \leq \frac{n-1}{p}$$
This proves that $\mathcal{H}$ is $((n-1)/p)$-universal.
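The bound can be checked by brute force for small parameters. The Python sketch below is an illustration only: the function names are made up for this example, tuples are assumed to have components in `0..p-1`, and `p` is assumed prime. It evaluates $h_b$ with Horner's rule and lists the values of $b$ on which two tuples collide.

```python
from itertools import combinations, product


def h(b, tup, p):
    """Evaluate sum(a_j * b**j) mod p using Horner's rule."""
    acc = 0
    for a in reversed(tup):
        acc = (acc * b + a) % p
    return acc


def colliding_parameters(x, y, p):
    """All b in 0..p-1 for which h_b(x) == h_b(y)."""
    return [b for b in range(p) if h(b, x, p) == h(b, y, p)]


def max_collisions(p, n):
    """Largest number of colliding b over all pairs of distinct n-tuples."""
    tuples = list(product(range(p), repeat=n))
    return max(len(colliding_parameters(x, y, p))
               for x, y in combinations(tuples, 2))


p = 7
x = (1, 2, 3)
y = (1, 5, 0)
# The difference polynomial is 3b^2 - 3b = 3b(b - 1), with roots b = 0 and b = 1.
print(colliding_parameters(x, y, p))
```

For `n = 3` no pair of distinct tuples should collide on more than `n - 1 = 2` values of `b`, which `max_collisions(5, 3)` checks exhaustively for `p = 5`.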
4 Longest-probe bound for hashing
1
Let $X_i$ be the number of probes needed by the $i$-th insertion. At most $n$ slots are occupied, each factor below is at most $n/m$, and $n/m \leq 1/2$:

$$\begin{aligned}
\Pr\{X_i > k\} &\leq \frac{n}{m} \times \frac{n-1}{m-1} \times \frac{n-2}{m-2} \times \cdots \times \frac{n-(k-1)}{m-(k-1)} \\
&\leq \left(\frac{n}{m}\right)^{k} \\
&\leq \left(\frac{1}{2}\right)^{k} \\
&= 2^{-k}
\end{aligned}$$
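As a quick numeric sanity check, the product on the first line can be evaluated exactly and compared with $2^{-k}$. The sketch below uses Python's `fractions.Fraction`; the function name and the sample sizes `m = 16`, `n = 8` are assumptions for illustration.

```python
from fractions import Fraction


def prob_more_than_k_probes(n, m, k):
    """Exact value of n/m * (n-1)/(m-1) * ... * (n-k+1)/(m-k+1)."""
    prob = Fraction(1)
    for j in range(k):
        if n - j <= 0:
            return Fraction(0)   # no occupied slots left to collide with
        prob *= Fraction(n - j, m - j)
    return prob


m, n = 16, 8                     # load factor n/m = 1/2
for k in range(1, 6):
    bound = Fraction(1, 2 ** k)
    print(k, prob_more_than_k_probes(n, m, k), bound)
```

Each printed product should be no larger than the bound next to it, with equality only at `k = 1` for this load factor.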
2
Take $k = 2\lg n = \lg n^2$ in the bound $2^{-k}$. Then:

$$\Pr\{X_i > 2\lg n\} \leq 2^{-\lg n^2} = n^{-2} = O\left(\frac{1}{n^2}\right)$$
4
Let $X = \max_{1 \leq i \leq n} X_i$. By the union bound:

$$\begin{aligned}
\Pr\{X > 2\lg n\} &= \Pr\left(\{X_1 > 2\lg n\} \cup \{X_2 > 2\lg n\} \cup \cdots \cup \{X_n > 2\lg n\}\right) \\
&\leq \Pr\{X_1 > 2\lg n\} + \Pr\{X_2 > 2\lg n\} + \cdots + \Pr\{X_n > 2\lg n\} \\
&= n \times O\left(\frac{1}{n^2}\right) \\
&= O\left(\frac{1}{n}\right)
\end{aligned}$$
5
Split the expectation at $2\lg n$ and use $\Pr\{X > 2\lg n\} = O(1/n)$ together with $X \leq n$:

$$\begin{aligned}
E[X] &= \sum_{k=1}^{n} k \Pr\{X = k\} \\
&= \sum_{k=1}^{\lfloor 2\lg n \rfloor} k \Pr\{X = k\} + \sum_{k=\lfloor 2\lg n \rfloor + 1}^{n} k \Pr\{X = k\} \\
&\leq 2\lg n \times \Pr\{X \leq 2\lg n\} + n \times \Pr\{X > 2\lg n\} \\
&\leq 2\lg n + n \times O\left(\frac{1}{n}\right) \\
&= 2\lg n + O(1) \\
&= O(\lg n)
\end{aligned}$$
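The $O(\lg n)$ behaviour can also be explored empirically. The Python sketch below simulates open addressing under the uniform hashing assumption by giving every insertion a fresh random permutation of the slots as its probe sequence. The function names, the sizes, and the fixed seed are all choices made for this illustration, and a simulation only suggests the bound, it does not prove it.

```python
import math
import random


def longest_insert_probe(m, n, rng):
    """Insert n keys into m slots; return the longest probe count seen."""
    occupied = [False] * m
    longest = 0
    for _ in range(n):
        probes = 0
        # Uniform hashing: the probe sequence is a random permutation of slots.
        for slot in rng.sample(range(m), m):
            probes += 1
            if not occupied[slot]:
                occupied[slot] = True
                break
        longest = max(longest, probes)
    return longest


def average_longest_probe(m, n, trials, seed=0):
    """Average the longest probe length over several independent trials."""
    rng = random.Random(seed)
    total = sum(longest_insert_probe(m, n, rng) for _ in range(trials))
    return total / trials


m, n = 128, 64                   # n <= m/2, as the analysis assumes
print(average_longest_probe(m, n, trials=100), 2 * math.log2(n))
```

Comparing the printed average with $2\lg n$ for a few values of `n` gives a feel for how loose the bound is in practice.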