Exercises
11.2-3 Professor Marley hypothesizes that substantial performance gains can be obtained if we modify the chaining scheme so that each list is kept in sorted order. How does the professor's modification affect the running time for successful searches, unsuccessful searches, insertions, and deletions?
In short: sorting can improve some constant factors, but it does not improve any asymptotic bound. Array-based slots allow an asymptotically faster search within a slot, but the gain is of little practical use.
The expected time of a successful search is unchanged, Θ(1+n/m), by the same analysis as for the ordinary hash table.
If each hash slot is implemented as a linked list, an unsuccessful search can stop as soon as it passes the position where the key would be. If we assume that the key is equally likely to fall between any two consecutive elements in the slot, the expected running time of an unsuccessful search is about 50% of the original time. That is still Θ(1+n/m).
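Insertion now has to find the correct position in the list, so it costs a search: Θ(1+n/m) expected, compared with O(1) for inserting at the head of an unsorted list when no duplicate check is needed. Deleting an element given a pointer to it in a doubly linked list is O(1) either way; if the deletion must first search for the key, it costs the same as a successful search. So sorting each slot does not improve the time complexity of any of these operations.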
If each hash slot is implemented as an array, the search within a slot can be done by binary search in O(log n_i) time, where n_i is the number of elements in slot i. This looks like an improvement, but the memory for the array must be allocated before it is used, and handling overflow is a nuisance. And although the search takes O(log n_i) time in theory, it rarely beats an ordinary O(n_i) linear scan when n_i is very small, say n_i <= 10.
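The array-based variant can be sketched as follows. This is only a minimal illustration, not a tuned implementation: the class name `SortedChainTable` and its methods are hypothetical, keys are hashed with Python's built-in `hash`, and Python lists grow on demand, which hides the preallocation and overflow concerns mentioned above.

```python
import bisect


class SortedChainTable:
    """Chained hash table whose slots are sorted Python lists (illustrative only)."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _slot(self, key):
        return self.slots[hash(key) % self.m]

    def search(self, key):
        slot = self._slot(key)
        i = bisect.bisect_left(slot, key)   # binary search: O(log n_i) comparisons
        return i < len(slot) and slot[i] == key

    def insert(self, key):
        slot = self._slot(key)
        i = bisect.bisect_left(slot, key)
        if i == len(slot) or slot[i] != key:
            slot.insert(i, key)             # shifting elements still costs O(n_i)

    def delete(self, key):
        slot = self._slot(key)
        i = bisect.bisect_left(slot, key)
        if i < len(slot) and slot[i] == key:
            slot.pop(i)                     # shifting again, O(n_i)
```

Note that only the search within a slot becomes logarithmic; insertion and deletion still move elements, so they remain linear in the slot size.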
11.3-5 Define a family H of hash functions from a finite set U to a finite set B to be ε-universal if for all pairs of distinct elements k and l in U,
Pr{h(k) = h(l)} <= ε,
where the probability is taken over the drawing of hash function h at random from the family H. Show that an ε-universal family of hash functions must have ε >= 1/|B| - 1/|U|.
Our goal is to prove max(Pr{h(k) = h(l)}) >= 1/|B| - 1/|U|, where the maximum is taken over all pairs of distinct k, l in U.
We count the total number of collisions in two ways.
Let m = |B|, n = |U|.
First, a slot that holds x elements contributes C(x,2) = x*(x-1)/2 collisions.
For a function h that distributes the n elements into the m slots as (d1,d2,...,dm),
the number of collisions is sigma(i=1~m,C(di,2)) = sigma(i=1~m,(di^2 - di)/2) >= (n^2/m - n)/2 = n(n-m)/(2m),
since sigma(i=1~m,di^2) >= sigma(i=1~m,di)^2 / m and sigma(i=1~m,di) = n.
Summing over all h in H, the total number of collisions is at least |H|*n(n-m)/(2m).
Second, |H|*Pr{h(k) = h(l)} is the number of functions in H under which the pair (k,l) collides.
If we sum over all the (k,l) pairs, we get the same total number of collisions, i.e. |H|*sigma(k,l,Pr{h(k) = h(l)}) >= |H|*n(n-m)/(2m).
Since there are C(n,2) different pairs (k,l), and C(n,2)*max(Pr{h(k) = h(l)}) >= sigma(k,l,Pr{h(k) = h(l)}), we have:
|H|*C(n,2)*max(Pr{h(k) = h(l)}) >= |H|*n(n-m)/(2m)
max(Pr{h(k) = h(l)}) >= (n-m)/((n-1)m) >= (n-m)/(nm) = 1/m - 1/n = 1/|B| - 1/|U|
(The second inequality uses n >= m. If n < m, the bound 1/|B| - 1/|U| is negative and holds trivially.)
Since Pr{h(k) = h(l)} <= ε for every pair (k,l),
ε >= max(Pr{h(k) = h(l)}) >= 1/|B| - 1/|U|.
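As a numerical sanity check on this bound, here is a small sketch that builds arbitrary families of functions and measures their worst pairwise collision probability. The helper names `max_collision_probability` and `random_family`, and the choice of representing each function as a tuple of slot numbers, are assumptions made for illustration only.

```python
import itertools
import random
from fractions import Fraction


def max_collision_probability(family, universe):
    """Largest Pr{h(k) = h(l)} over distinct pairs, with h drawn uniformly from family."""
    worst = Fraction(0)
    for k, l in itertools.combinations(universe, 2):
        hits = sum(1 for h in family if h[k] == h[l])
        worst = max(worst, Fraction(hits, len(family)))
    return worst


def random_family(n, m, size, rng):
    """size arbitrary functions from {0..n-1} to {0..m-1}; h[k] is the slot of key k."""
    return [tuple(rng.randrange(m) for _ in range(n)) for _ in range(size)]


rng = random.Random(0)
n, m = 12, 4
bound = Fraction(1, m) - Fraction(1, n)
for _ in range(50):
    family = random_family(n, m, size=8, rng=rng)
    # The bound holds for every family, not only for well-designed ones.
    assert max_collision_probability(family, range(n)) >= bound
```

Exact fractions are used so that the comparison with 1/m - 1/n is not affected by rounding.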
11.3-6 Let U be the set of n-tuples of values drawn from Z[p], and let B = Z[p], where p is prime. Define the hash function h[b]:U->B for b∈Z[p] on an input n-tuple <a0,a1,...,a(n-1)> from U as h[b](<a0,a1,...,a(n-1)>) = sigma(j=0~n-1,aj*b^j) mod p
and let H = {h[b]:b∈Z[p]}. Argue that H is ((n-1)/p)-universal.
Fix two distinct n-tuples x = <x0,x1,...,x(n-1)> and y = <y0,y1,...,y(n-1)> in U.
They collide under h[b] exactly when sigma(j=0~n-1,(xj-yj)*b^j) = 0 (mod p).
Viewed as a function of b, the left-hand side is a polynomial of degree at most n-1 over Z[p]. It is not the zero polynomial, because x != y means that some coefficient xj-yj is nonzero.
Since p is prime, Z[p] is a field, so a nonzero polynomial of degree at most n-1 has at most n-1 roots in Z[p].
Thus at most n-1 of the p possible values of b make x and y collide, and with b drawn uniformly from Z[p] the probability that x collides with y is at most (n-1)/p.
This holds for every pair of distinct tuples, so H is ((n-1)/p)-universal. The argument is similar to the analysis for Theorem 11.5, which bounds the collision probability of a pair in the same way.
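The (n-1)/p bound can be checked by brute force for small parameters. The sketch below is only an illustration under assumed names (`poly_hash`, `max_colliding_b`): it enumerates every pair of distinct n-tuples and counts the values of b on which they hash to the same slot.

```python
import itertools


def poly_hash(b, a, p):
    """Sum of a_j * b^j (mod p) for the tuple a, evaluated with Horner's rule."""
    acc = 0
    for coeff in reversed(a):
        acc = (acc * b + coeff) % p
    return acc


def max_colliding_b(n, p):
    """Largest number of b in Z_p on which two distinct n-tuples collide."""
    tuples = list(itertools.product(range(p), repeat=n))
    worst = 0
    for x, y in itertools.combinations(tuples, 2):
        hits = sum(1 for b in range(p) if poly_hash(b, x, p) == poly_hash(b, y, p))
        worst = max(worst, hits)
    return worst


p, n = 5, 3
# At most n-1 of the p hash functions collide on any pair of distinct tuples.
assert max_colliding_b(n, p) <= n - 1
```

The enumeration grows as p^(2n), so this is only feasible for tiny p and n.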
11.5-1 Suppose that we insert n keys into a hash table of size m using open addressing and uniform hashing. Let p(n,m) be the probability that no collisions occur. Show that p(n,m) <= e^(-n(n-1)/2m). Argue that when n exceeds sqrt(m), the probability of avoiding collisions goes rapidly to zero.
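When n keys are placed into the hash table at random, there are m^n equally likely cases. However, only P(m,n) = m*(m-1)*...*(m-n+1) of them have no collisions. Thus the probability is p(n,m) = m*(m-1)*...*(m-n+1)/m^n.
By the AM-GM inequality, (m-i)*(m-n+i) <= (m-n/2)^2 for every real number i. Pairing the factor m-i with the factor m-(n-i) for i = 1,...,n-1 gives (m-1)*(m-2)*...*(m-n+1) <= (m-n/2)^(n-1).
Thus p(n,m) <= m*(m-n/2)^(n-1) / m^n = (1-n/(2m))^(n-1) <= (e^(-n/(2m)))^(n-1) = e^(-n(n-1)/(2m)), using 1-x <= e^(-x).
When n exceeds sqrt(m), the exponent n(n-1)/(2m) grows quadratically in n/sqrt(m): for n = c*sqrt(m) the bound is roughly e^(-c^2/2), which goes rapidly to zero as c grows.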
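To see how quickly the probability falls once n passes sqrt(m), the exact product can be compared with the exponential bound numerically. This is a minimal sketch; the function names and the choice m = 10,000 (so sqrt(m) = 100) are assumptions made for illustration.

```python
import math


def no_collision_probability(n, m):
    """Exact p(n, m) = m(m-1)...(m-n+1) / m^n, computed factor by factor."""
    prob = 1.0
    for i in range(n):
        prob *= (m - i) / m
    return prob


def exponential_bound(n, m):
    """The upper bound e^(-n(n-1)/(2m))."""
    return math.exp(-n * (n - 1) / (2 * m))


m = 10_000
for n in (10, 50, 100, 200, 400):
    p, bound = no_collision_probability(n, m), exponential_bound(n, m)
    assert p <= bound
    print(f"n={n:4d}  p(n,m)={p:.6f}  bound={bound:.6f}")
```

Multiplying the factors (m-i)/m one at a time avoids forming the huge integers m^n and P(m,n).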
Problems
11-1 Longest-probe bound for hashing
The ultimate task is to prove an O(log n) bound on the expected length of the longest probe sequence for a hash table using open addressing. I won't analyze it in detail because it follows a model similar to the one in Section 5.4.3, Streaks.
Reposted from: https://www.cnblogs.com/FancyMouse/articles/1069646.html