Introduction to Algorithms: Review Notes

Hash Table

Dynamic Sets

  • Elements have a key and satellite data
  • Supported queries: Search(S,k), Maximum(S), Successor(S,x), Predecessor(S,x)
  • Supported modifying operations: Insert(S,x), Delete(S,x)

Hash Tables

  1. Direct-access tables
  • Keys are distinct and drawn from a universe $U \subseteq \{0,1,\dots,m-1\}$; set up an array $T[0..m-1]$:
    $$T[k]=\begin{cases}x & \text{if } x\in K \text{ and } key[x]=k,\\ \text{NIL} & \text{otherwise}\end{cases}$$
    Each operation takes $\Theta(1)$ time.
  • Limitation: the range of keys can be very large, while the set $K$ of keys actually stored may be much smaller than $U$, so most of the space allocated for $T$ is wasted.
  2. Resolving collisions by chaining
  • Link records that hash to the same slot into a list
  • Worst case: $\Theta(n)$ (all keys land in one slot)
  • Average case:
    Let $n$ = #keys, $m$ = #slots, and define the load factor $\alpha = n/m$ = average #keys per slot.
    Search cost: an unsuccessful search takes expected time $\Theta(1+\alpha)$,
    so the expected search time is $\Theta(1)$ if $\alpha = O(1)$, i.e., $n = O(m)$. A minimal sketch of chaining follows.
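
A minimal sketch of chaining; the class and method names are illustrative (not from the notes), and Python's built-in `hash` stands in for the hash function:

```python
class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain (list) per slot

    def _h(self, key):
        return hash(key) % self.m             # any function mapping keys to 0..m-1

    def insert(self, key, value):
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:                      # key already present: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def search(self, key):                    # expected Theta(1 + alpha) under uniform hashing
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None                           # NIL
```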
  3. Choosing hash functions
  • Distribute the keys uniformly into the slots of the table
  • Regularity in the key distribution should not affect this uniformity
    (1) Division method: $h(k) = k \bmod m$
    Deficiency: don't pick an $m$ with a small divisor $d$: a preponderance of keys that are congruent modulo $d$ can adversely affect uniformity. If $m = 2^r$, the hash doesn't even depend on all the bits of $k$.
    A proper $m$ is a prime not too close to a power of 2 or 10 and not otherwise used prominently in the computing environment, e.g., $m = 701$ for $n = 2000$.
    (2) Multiplication method: with $m = 2^r$ and word size $w$, $h(k) = (A\cdot k \bmod 2^w)\ \mathrm{rsh}\ (w-r)$, where $2^{w-1} < A < 2^w$.
    Don't pick $A$ too close to $2^{w-1}$ or $2^w$.
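
A sketch of the two methods above; the word size w = 32 and the constant A (a common choice near $2^{32}$ times the golden-ratio conjugate) are illustrative, not prescribed by the notes:

```python
w, r = 32, 10                 # 32-bit words, table of m = 2**r slots
A = 2654435769                # 2**(w-1) < A < 2**w

def h_division(k, m=701):     # m: a prime far from powers of 2 and 10
    return k % m

def h_multiplication(k):      # (A*k mod 2**w) rsh (w - r)
    return ((A * k) & (2**w - 1)) >> (w - r)
```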
  4. Open addressing
    No storage is used outside of the hash table itself.
    Insertion systematically probes the table until an empty slot is found.
    Probing strategies (see the sketch at the end of this section):
    (1) Linear probing: $h(k,i) = (h'(k) + i) \bmod m$; suffers from primary clustering (runs of occupied slots tend to get longer).
    (2) Quadratic probing: $h(k,i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$; suffers from secondary clustering.
    (3) Double hashing: $h(k,i) = (h_1(k) + i\,h_2(k)) \bmod m$; $h_2(k)$ must be relatively prime to $m$. One way is to make $m$ a power of 2 and have $h_2(k)$ always produce odd numbers.
    Analysis:
    Assuming uniform hashing, we get the following theorem: given an open-addressed hash table with $\alpha = n/m < 1$, the expected number of probes in an unsuccessful search is at most $1/(1-\alpha)$.
    Proof:
    $$\alpha = \frac{n}{m} < 1 \;\Rightarrow\; \frac{n-1}{m-1} < \frac{n}{m}, \qquad E = 1 + \frac{n}{m}\left(1 + \frac{n-1}{m-1}\left(1 + \cdots\right)\right)$$
    $$E \le 1 + \alpha(1 + \alpha(1 + \alpha(\cdots))) \le 1 + \alpha + \alpha^2 + \cdots = \frac{1}{1-\alpha}$$
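
To make the probing strategies above concrete, here is a sketch of open addressing with double hashing; the table size m and the functions h1, h2 are illustrative choices satisfying the stated constraints (m a power of 2, h2 always odd):

```python
m = 16
table = [None] * m

def h1(k): return k % m
def h2(k): return 1 + 2 * (k % (m // 2))    # always odd, hence coprime with m

def probe(k, i):                            # h(k, i) = (h1(k) + i*h2(k)) mod m
    return (h1(k) + i * h2(k)) % m

def insert(k):
    for i in range(m):                      # probe until an empty slot is found
        j = probe(k, i)
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError("hash table full")  # alpha = 1: no empty slot left

def search(k):
    for i in range(m):
        j = probe(k, i)
        if table[j] is None:                # empty slot: k cannot be present
            return None
        if table[j] == k:
            return j
    return None
```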

Binary Search Trees

  • Properties: $key[left(x)] \le key[x] \le key[right(x)]$

  • Walks:
    (1) Inorder tree walk: left, root, right; outputs the keys in sorted (increasing) order
    (2) Preorder tree walk: root, left, right
    (3) Postorder tree walk: left, right, root

  • Operations:
    (1) Search or insert: $O(h)$, which is $O(n)$ in the worst case
    (2) Successor: if x has a right subtree, it is the minimum node in that subtree; otherwise, it is the lowest ancestor of x whose left child is also an ancestor of x (see the sketch after this list).
    (3) Delete:

    • x has no children: remove x
    • x has only one child: splice out x
    • x has two children: swap x with its successor, then apply case 1 or 2 to delete
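
A sketch of the successor rule above, assuming CLRS-style nodes with key/left/right/p fields (the class is illustrative):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.p = None

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:                  # case 1: minimum of the right subtree
        return tree_minimum(x.right)
    y = x.p                                  # case 2: climb while x is a right child
    while y is not None and x is y.right:
        x, y = y, y.p
    return y    # lowest ancestor whose left child is also an ancestor of x
```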
  • Sorting with BST

    • Worst case: $O(n^2)$; best case: $O(n\lg n)$; average case: $O(n\lg n)$
    • Comparison to quicksort: inserted nodes play the same role as partition pivots, but the comparisons happen in a different order, since a BST does not partition immediately after picking a node.
      Quicksort is better in practice: it sorts in place and does not need to build a data structure.
  • Randomly Built BST

    • (1) Theorem: The average height of a randomly built binary search tree on n distinct keys is $O(\lg n)$
    • (2)Proof:
    • Outline:
    1. Prove Jensen's inequality: $f(E[X]) \le E[f(X)]$ where $f$ is convex
    2. Analyze the exponential height of RBST
    3. Put together
    • Convex function: $\alpha,\beta \ge 0$ and $\alpha+\beta=1 \Rightarrow f(\alpha x + \beta y) \le \alpha f(x) + \beta f(y)$
    • Prove Jensen’s inequality:
      For $n=1$ we have $\alpha_1 = 1$, hence $f(\alpha_1 x_1) \le \alpha_1 f(x_1)$.
      Suppose it is true for $n$; then prove it also holds for $n+1$:
      $$f\left(\sum_{k=1}^{n+1}\alpha_k x_k\right) = f\left(\alpha_{n+1}x_{n+1} + (1-\alpha_{n+1})\sum_{k=1}^{n}\frac{\alpha_k}{1-\alpha_{n+1}}x_k\right) \le \alpha_{n+1}f(x_{n+1}) + (1-\alpha_{n+1})\,f\left(\sum_{k=1}^{n}\frac{\alpha_k}{1-\alpha_{n+1}}x_k\right) \le \alpha_{n+1}f(x_{n+1}) + (1-\alpha_{n+1})\sum_{k=1}^{n}\frac{\alpha_k}{1-\alpha_{n+1}}f(x_k) = \sum_{k=1}^{n+1}\alpha_k f(x_k)$$
      Now let $f$ be a convex function and $X$ a random variable taking value $x_k$ with probability $\alpha_k = \Pr\{X = x_k\}$; the inequality above gives $f(E[X]) \le E[f(X)]$.
    • Analysis of BST Height:
      Let $X_n$ be the random variable denoting the height of a randomly built BST on $n$ nodes, and let $Y_n = 2^{X_n}$ be its exponential height.
      If the root has rank $k$, then $X_n = 1 + \max\{X_{k-1}, X_{n-k}\}$, since each subtree is itself a randomly built BST. Hence $Y_n = 2\cdot\max\{Y_{k-1}, Y_{n-k}\}$.
      Define the indicator random variable
      $$Z_{nk} = \begin{cases}1 & \text{if the root has rank } k,\\ 0 & \text{otherwise.}\end{cases}$$
      Thus $\Pr\{Z_{nk}=1\} = E[Z_{nk}] = \frac{1}{n}$, and
      $$Y_n = \sum_{k=1}^{n} Z_{nk}\left(2\cdot\max\{Y_{k-1}, Y_{n-k}\}\right), \qquad E[Y_n] = \sum_{k=1}^{n} E\!\left[Z_{nk}\left(2\cdot\max\{Y_{k-1}, Y_{n-k}\}\right)\right]$$
      Because the rank of the root is independent of the heights of the subtrees,
      $$E[Y_n] = 2\sum_{k=1}^{n} E[Z_{nk}]\,E\!\left[\max\{Y_{k-1}, Y_{n-k}\}\right] \le \frac{2}{n}\sum_{k=1}^{n} E[Y_{k-1} + Y_{n-k}] = \frac{4}{n}\sum_{k=0}^{n-1} E[Y_k]$$
      Assume inductively that $E[Y_k] \le ck^3$ for all $k < n$; then
      $$E[Y_n] = \frac{4}{n}\sum_{k=0}^{n-1} E[Y_k] \le \frac{4}{n}\sum_{k=0}^{n-1} ck^3 \le \frac{4c}{n}\int_0^n x^3\,dx = cn^3$$
    • Putting it all together
      With $f(x) = 2^x$: $2^{E[X_n]} \le E[2^{X_n}] = E[Y_n] \le cn^3 \Rightarrow E[X_n] \le 3\lg n + O(1)$

Red-Black Trees

  • Red-Black Tree Properties:
    • Every node is either red or black
    • Every leaf (NULL pointer) is black
    • If a node is red, both children are black
    • Every path from a node down to a descendant leaf contains the same number of black nodes
    • The root is always black
  • Black-height
    • A node of height h has black-height $\ge h/2$ (by the third rule)
    • Theorem: A red-black tree with n internal nodes has height $h \le 2\lg(n+1)$
      Proof:
      Claim: the subtree rooted at any node x contains at least $2^{bh(x)}-1$ internal nodes.
      This holds for $bh(x) = 0$; inductively, each child subtree has black-height at least $bh(x)-1$, so the count is
      $$(2^{bh(x)-1}-1) + (2^{bh(x)-1}-1) + 1 = 2^{bh(x)}-1$$
      Thus $n \ge 2^{bh(root)}-1 \ge 2^{h/2}-1 \Rightarrow h \le 2\lg(n+1)$
    • Corollary:
      Minimum(), Maximum(), Successor(), Predecessor(), and Search() take $O(\lg n)$ time
  • Insert Operation
    If x's parent is the left child of its own parent, there are 3 cases: if x's uncle is red, recoloring suffices (case 1). Otherwise, if x is the right child of its parent, left-rotate to reduce to case 3 (case 2). In case 3, right-rotate x's grandparent and recolor. The other direction is symmetric; a sketch follows.
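
A compressed sketch of this fix-up, folding the symmetric left/right halves into one loop. The node fields and NIL sentinel follow CLRS conventions; `rb_insert` (ordinary BST insert, color the new node red, then call the fix-up) is assumed, and the unified `rotate` helper stands in for LEFT-ROTATE/RIGHT-ROTATE:

```python
class RBNode:
    def __init__(self, key, nil=None):
        self.key, self.color = key, "RED"
        self.left = self.right = self.p = nil

class RBTree:
    def __init__(self):
        self.nil = RBNode(None)                  # shared black sentinel
        self.nil.color = "BLACK"
        self.root = self.nil

def rotate(T, x, left=True):
    """LEFT-ROTATE when left=True, RIGHT-ROTATE when left=False."""
    a, b = ("right", "left") if left else ("left", "right")
    y = getattr(x, a)                            # the child that moves up
    setattr(x, a, getattr(y, b))
    if getattr(y, b) is not T.nil:
        getattr(y, b).p = x
    y.p = x.p
    if x.p is T.nil:
        T.root = y
    elif x is x.p.left:
        x.p.left = y
    else:
        x.p.right = y
    setattr(y, b, x)
    x.p = y

def rb_insert_fixup(T, z):
    while z.p.color == "RED":
        left_side = z.p is z.p.p.left            # is the parent a left child?
        uncle = z.p.p.right if left_side else z.p.p.left
        if uncle.color == "RED":                 # case 1: recolor and move up
            z.p.color = uncle.color = "BLACK"
            z.p.p.color = "RED"
            z = z.p.p
        elif z is (z.p.right if left_side else z.p.left):
            z = z.p                              # case 2: rotate into case 3
            rotate(T, z, left=left_side)
        else:
            z.p.color = "BLACK"                  # case 3: recolor, rotate grandparent
            z.p.p.color = "RED"
            rotate(T, z.p.p, left=not left_side)
    T.root.color = "BLACK"
```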
  • Expansion
    • The longest simple path from a node x in a red-black tree to a descendant leaf has length at most twice that of the shortest such path. (Proof)
    • For a node with $bh(x) = k$, the largest possible number of internal nodes in its subtree is $2^{2k}-1$ and the smallest is $2^k-1$
    • AVL tree:
      (1)Prove that n nodes has height O(lgn)
      Least number of nodes at height h: $F_0 = 0,\ F_1 = 1,\ F_2 = 2,\ F_h = F_{h-1} + F_{h-2} + 1$
      $F_h$ grows exponentially in h, so an AVL tree on n nodes has height $O(\lg n)$
      (2)Insert operation:rotation

B-tree

  • Definition
    • Every node x stores n[x], the number of keys currently in x; the n[x] keys themselves, in nondecreasing order; and leaf[x], true iff x is a leaf
    • Each internal node x also contains n[x]+1 child pointers $c_1[x],\dots,c_{n[x]+1}[x]$. Leaf nodes have no children, so their $c_i$ fields are undefined
    • The keys separate the subtrees: if $k_i$ is any key stored in the subtree rooted at $c_i[x]$, then $k_1 \le key_1[x] \le k_2 \le key_2[x] \le \cdots \le key_{n[x]}[x] \le k_{n[x]+1}$
    • All leaves have the same depth, which is the tree's height h
    • There are lower and upper bounds on the number of keys a node can contain: every node other than the root must have at least t-1 keys; if the tree is nonempty, the root must have at least one key. Every node can contain at most 2t-1 keys, so an internal node has at most 2t children.
  • B-tree height: $h \le \log_t \frac{n+1}{2}$
  • Basic Operations
    • Searching a B-tree: CPU time $O(t\log_t n)$, disk operations $\Theta(\log_t n)$ (see the sketch after this list)
    • Splitting a node: CPU time $\Theta(t)$, disk operations $O(1)$
    • Inserting a key: split each full node encountered on the way down, so insertion finishes in a single downward pass
    • Deleting a key: ensure a node has at least t keys before descending into it; if k is in a leaf, remove it directly; otherwise replace k by its predecessor or successor and recursively delete that key; when two adjacent children both have fewer than t keys, merge them
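
A sketch of B-tree search; the node layout is illustrative, and the disk reads that dominate on a disk-resident tree are only marked as a comment:

```python
class BTreeNode:
    def __init__(self, leaf=True):
        self.keys = []             # n[x] keys in nondecreasing order
        self.c = []                # n[x] + 1 children when internal
        self.leaf = leaf

def btree_search(x, k):
    """Return (node, index) holding key k, or None; O(t) CPU per node."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1                     # linear scan within the node
    if i < len(x.keys) and k == x.keys[i]:
        return (x, i)
    if x.leaf:
        return None
    # DISK-READ(x.c[i]) would happen here: one read per level of the tree
    return btree_search(x.c[i], k)
```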

Augmenting Data Structures

  • Dynamic Order Statistics
    • OS-tree: find the i-th smallest element of a dynamic set in O(lg n) time
    • An OS-tree augments a red-black tree:
      size[x] = size[left[x]] + size[right[x]] + 1; core idea: the rank of x within its own subtree is size[left[x]] + 1 (see the sketch below)
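
A sketch of OS-Select built on the rank idea above; the node class is illustrative and assumes sizes are maintained by insert/delete:

```python
class OSNode:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def os_select(x, i):
    """Return the node holding the i-th smallest key in x's subtree."""
    r = (x.left.size if x.left else 0) + 1   # rank of x within its own subtree
    if i == r:
        return x
    if i < r:
        return os_select(x.left, i)
    return os_select(x.right, i - r)         # skip the left subtree and x itself

root = OSNode(3, OSNode(1, right=OSNode(2)), OSNode(5, left=OSNode(4)))
assert os_select(root, 3).key == 3
```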
  • Interval Trees
    Each node additionally records max[x], the largest right endpoint of any interval stored in its subtree.
    Correctness of the search: if it goes left and finds no overlap, then no overlap exists anywhere in the tree; if it goes right, there is no overlap in the left subtree. A sketch follows.
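
A sketch of the interval search; intervals are closed [lo, hi] pairs, and the max field is assumed to be maintained on update (here it is only set at construction):

```python
class IntervalNode:
    def __init__(self, lo, hi, left=None, right=None):
        self.interval = (lo, hi)
        self.left, self.right = left, right
        self.max = hi                         # largest right endpoint in subtree
        for c in (left, right):
            if c is not None and c.max > self.max:
                self.max = c.max

def interval_search(x, lo, hi):
    """Return a node whose interval overlaps [lo, hi], or None."""
    while x is not None and not (x.interval[0] <= hi and lo <= x.interval[1]):
        if x.left is not None and x.left.max >= lo:
            x = x.left        # any overlap, if one exists, must be on the left
        else:
            x = x.right       # the left subtree cannot contain an overlap
    return x
```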

Skip List

  • Introduction:
    A simple randomized dynamic search structure.
    Maintains a dynamic set of n elements in $O(\lg n)$ time per operation, in expectation and with high probability

  • Search Cost
    Two sorted lists: $|L_1| + \dfrac{|L_2|}{|L_1|} \Rightarrow 2\sqrt{n}$ when $|L_1| = \sqrt{n}$
    k sorted lists $\Rightarrow k\cdot\sqrt[k]{n}$
    $\lg n$ sorted lists $\Rightarrow \lg n \cdot \sqrt[\lg n]{n} = 2\lg n$

  • Insert Operation:
    Insert x into the bottom list; flip a fair coin and, while it comes up heads, promote x to the next level up

  • Delete Operation:
    remove x from all lists containing it
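
A minimal skip-list sketch covering search and coin-flip insertion (deletion just unlinks x from every level it appears on); the names and the max-level cap are illustrative:

```python
import random

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.next = [None] * (level + 1)       # next[i] = successor on level i

class SkipList:
    def __init__(self, max_level=32):
        self.head = SkipNode(float("-inf"), max_level)
        self.level = 0                         # highest non-empty level

    def search(self, key):
        x = self.head
        for i in range(self.level, -1, -1):    # top level down: go right, then drop
            while x.next[i] and x.next[i].key < key:
                x = x.next[i]
        x = x.next[0]
        return x if x and x.key == key else None

    def insert(self, key):
        update, x = [], self.head
        for i in range(self.level, -1, -1):
            while x.next[i] and x.next[i].key < key:
                x = x.next[i]
            update.append(x)
        update.reverse()                       # update[i] = last node < key on level i
        lvl = 0
        while random.random() < 0.5 and lvl < len(self.head.next) - 1:
            lvl += 1                           # promote while the coin shows heads
        for i in range(self.level + 1, lvl + 1):
            update.append(self.head)           # brand-new levels start at the head
        self.level = max(self.level, lvl)
        x = SkipNode(key, lvl)
        for i in range(lvl + 1):               # splice x into levels 0..lvl
            x.next[i] = update[i].next[i]
            update[i].next[i] = x
```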

  • Theorem: With high probability, every search in an n-element skip list costs O(lg n)
    - With high probability: event E occurs w.h.p. if, for any $\alpha \ge 1$, there is an appropriate choice of constants for which E occurs with probability at least $1 - O(1/n^{\alpha})$. Formally: parameterized event $E_\alpha$ occurs w.h.p. if, for any $\alpha \ge 1$, there is an appropriate choice of constants for which $E_\alpha$ occurs with probability at least $1 - c_\alpha/n^{\alpha}$.
    -Proof:
    (1) W.h.p., a skip list has O(lg n) levels:
    $$\Pr\{\text{more than } c\lg n \text{ levels}\} \le n\cdot\Pr\{x \text{ promoted at least } c\lg n \text{ times}\} = n\cdot\frac{1}{2^{c\lg n}} = n\cdot\frac{1}{n^c} = \frac{1}{n^{c-1}}$$
    The error probability for exceeding $c\lg n$ levels is at most $1/n^{c-1}$, which is polynomially small. We can make $\alpha$ arbitrarily large by choosing the constant c in the O(lg n) bound accordingly.
    (2) W.h.p., every search in a skip list costs O(lg n):
    Analyze the search backwards, from the leaf level up to the top. If the node was not promoted higher (its coin flip came up tails), we go left; otherwise we go up.
    Number of "up" moves < #levels $\le c\lg n$ w.h.p. (by Lemma (1)), so it suffices to show:
    #coin flips until $c\lg n$ heads $= \Theta(\lg n)$ w.h.p.
    Clearly at least $c\lg n$ flips are needed to see $c\lg n$ heads, so the count is $\Omega(\lg n)$.
    Suppose we make $10c\lg n$ flips. Then
    $$\Pr\{\text{at most } c\lg n \text{ heads}\} \le \binom{10c\lg n}{c\lg n}\left(\frac{1}{2}\right)^{9c\lg n} \le \left(\frac{10ec\lg n}{c\lg n}\right)^{c\lg n}\left(\frac{1}{2}\right)^{9c\lg n} = 2^{\lg(10e)\,c\lg n}\,2^{-9c\lg n} = 2^{[\lg(10e)-9]\,c\lg n} = 1/n^{\alpha} \text{ for } \alpha = [9-\lg(10e)]\,c$$
    With even more flips the probability of seeing so few heads only shrinks, so w.h.p. $O(\lg n)$ flips suffice. ∎

  • Expansion
    Randomly built binary search trees can be maintained explicitly by assigning each key a random priority on insertion (a treap)

Dynamic Programming

  • Basic idea:
    Subproblems are dependent; if we treat them independently, we compute the same subproblems redundantly.
  • Procedure:
    • Divide into subproblems
    • Solve each subproblem and store the solution in a table; look it up whenever the solution is reused
    • Build solutions bottom-up
  • Elements:
    • Optimal substructure: an optimal solution to the problem contains within it optimal solutions to subproblems
    • Overlapping subproblems
  • Cases
    • Matrix-chain Multiplication
      A[i,j] = minimum cost of computing $A_i A_{i+1}\cdots A_j$
      $A[1,n] = \min_{1\le k<n}\{A[1,k] + A[k+1,n] + p_0 p_k p_n\}$
      In general, $A[i,j] = \min_{i\le k<j}\{A[i,k] + A[k+1,j] + p_{i-1} p_k p_j\}$
    • Longest Common Subsequence
      c[i,j] = length of an LCS of the prefixes $X_i$ and $Y_j$ (a sketch follows this list):
      $$c[i,j] = \begin{cases}0 & \text{if } i=0 \text{ or } j=0,\\ c[i-1,j-1]+1 & \text{if } i,j>0 \text{ and } x_i = y_j,\\ \max\{c[i-1,j],\ c[i,j-1]\} & \text{if } i,j>0 \text{ and } x_i \ne y_j\end{cases}$$
    • Triangle Decomposition (optimal polygon triangulation)
      t[i,j] = minimum cost of triangulating the polygon $v_{i-1}, v_i, \dots, v_j$:
      $$t[i][j] = \begin{cases}0 & \text{if } i = j,\\ \min_{i\le k<j}\{t[i][k] + t[k+1][j] + w(v_{i-1}v_k v_j)\} & \text{if } i < j\end{cases}$$
    • Edit distance between X and Y
      $$d[i,j] = \begin{cases}i & \text{if } j = 0,\\ j & \text{if } i = 0,\\ \min\{d[i-1,j-1]+\mathrm{diff}(i,j),\ d[i-1,j]+1,\ d[i,j-1]+1\} & \text{if } i,j>0\end{cases}$$
      where $\mathrm{diff}(i,j) = 0$ if $x_i = y_j$ and 1 otherwise
    • Knapsack problem
      $$c[i,w] = \begin{cases}0 & \text{if } i=0 \text{ or } w=0,\\ c[i-1,w] & \text{if } w_i > w,\\ \max\{v_i + c[i-1, w-w_i],\ c[i-1,w]\} & \text{otherwise}\end{cases}$$
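
As promised above, a sketch of the LCS recurrence filled bottom-up (the other recurrences in this list are tabulated the same way):

```python
def lcs_length(X, Y):
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 / column 0 are the base cases
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:            # x_i == y_j: extend the diagonal
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
    return c[m][n]

assert lcs_length("ABCBDAB", "BDCABA") == 4     # classic CLRS example, e.g. "BCBA"
```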

Greedy Algorithms

Simpler and more efficient than dynamic programming, when applicable.

  • Activity-selection problem
    Greedily select the activity with the earliest finish time. Proof by cut and paste: $|S_{ij} - \{a_k\} \cup \{a_m\}| = |S_{ij}|$, i.e., swapping any chosen activity for the earliest-finishing one preserves the solution size.
    O(n) time once the activities are sorted by finish time (a sketch follows this list).
  • knapsack prob
    The 0-1 knapsack problem does not have the greedy-choice property, but the fractional problem does.
  • Huffman codes
    Proof by an exchange argument: the two lowest-frequency characters can be assumed to be sibling leaves at maximum depth
  • unit-time task scheduling
  • Elements of the greedy strategy:
    1. Make a choice and be left with only one subproblem to solve.
    2. Prove that there is always an optimal solution consistent with the greedy choice, so the greedy choice is always safe.
    3. Show that the greedy choice combined with an optimal solution to the remaining subproblem yields an optimal solution to the whole problem.
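
The activity-selection sketch referenced above; encoding activities as (start, finish) pairs is an illustrative choice:

```python
def select_activities(activities):
    """Greedy: sort by finish time, take each activity compatible with the last chosen."""
    chosen, last_finish = [], float("-inf")
    for s, f in sorted(activities, key=lambda a: a[1]):
        if s >= last_finish:          # compatible with everything chosen so far
            chosen.append((s, f))
            last_finish = f
    return chosen                     # O(n) scan after the O(n lg n) sort

print(select_activities([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10)]))
```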

Linear Programming

  • Procedure:
    Simplex method: find the tightest constraint on the entering variable and pivot (switch a basic and a nonbasic variable), repeating until all coefficients of the nonbasic variables in the objective z are negative.
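
The notes don't tie the procedure to any library, but as a quick numerical check one can solve a small LP with scipy's `linprog` (which minimizes, so the objective is negated here):

```python
# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x, y >= 0.
from scipy.optimize import linprog

res = linprog(c=[-3, -2],                       # minimize -(3x + 2y)
              A_ub=[[1, 1], [1, 3]],
              b_ub=[4, 6],
              bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, -res.fun)                          # optimum x = 4, y = 0, value 12
```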

Amortized Analysis

  • Analysis of dynamic tables
    The worst-case total cost of n insertions is $\Theta(n)$, i.e., O(1) amortized per insertion
  • Amortized analysis
    Guarantees the average performance of each operation in the worst case
    • aggregate method
    • accounting method
      Charge an amortized cost $\hat{c}_i$ while the actual cost is $c_i$; ensure that
      $$\sum^{n}_{i=1} c_i \le \sum^{n}_{i=1} \hat{c}_i$$
      Thus the total amortized cost provides an upper bound on the total true cost
    • potential method
      $$\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})$$
      $$\sum^{n}_{i=1}\hat{c}_i = \sum^{n}_{i=1}\left(c_i + \Phi(D_i) - \Phi(D_{i-1})\right) = \sum^{n}_{i=1} c_i + \Phi(D_n) - \Phi(D_0) \ge \sum^{n}_{i=1} c_i \quad \text{(provided } \Phi(D_n) \ge \Phi(D_0)\text{)}$$
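
A small experiment, not from the notes, comparing the true cost of n insertions into a doubling table against the amortized charge of 3 per insertion used in the accounting method:

```python
def table_insert_costs(n):
    """Total elementary cost of n appends into a table that doubles when full."""
    size, num, total = 0, 0, 0
    for _ in range(n):
        if num == size:              # table full: allocate double, copy num items
            size = max(1, 2 * size)
            total += num             # copying cost
        total += 1                   # cost of the insertion itself
        num += 1
    return total

n = 1000
assert table_insert_costs(n) <= 3 * n    # total true cost <= sum of amortized charges
```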