EPI: hash tables

Latest recommended article published on 2021-12-22 07:55:00

hello_qingwen

Category: EPI

Article link: https://blog.csdn.net/hello_qingwen/article/details/37989335

A hash table is a data structure used to implement the associative array, a structure that can map keys to values.

(1) Design a hash function for words in a dictionary

Idea: The hash function should examine every character in each word. It should produce a large range of values, and no single character should dominate the result (e.g., if we simply cast characters to integers and multiplied them, a single 0 would result in a hash code of 0). We would also like a rolling hash function: one in which, if a character is deleted from the front of the string and another is added to the end, the new hash code can be computed in O(1) time.

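int string_hash(const string &str, int modulus){
     const int MULT=997;
     long long val=0; //64-bit so that val*MULT cannot overflow
     for(unsigned char c: str){ //unsigned so that bytes >= 0x80 are not negative
           val=(val*MULT+c)%modulus;
     }
     return static_cast<int>(val);
}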
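
The rolling property can be illustrated with a minimal sketch. It assumes the same polynomial scheme as string_hash; the names kMult, poly_hash, top_power and roll are hypothetical helpers introduced only for this illustration, not part of the original solution.

```cpp
#include <cstddef>
#include <string>

// Hypothetical multiplier, same value as MULT in string_hash.
const long long kMult = 997;

// Polynomial hash of the whole string, computed from scratch in O(n).
long long poly_hash(const std::string& s, long long mod) {
    long long val = 0;
    for (unsigned char c : s) val = (val * kMult + c) % mod;
    return val;
}

// kMult^(len-1) mod mod: the weight carried by the first character
// of a window of length len.
long long top_power(std::size_t len, long long mod) {
    long long p = 1 % mod;
    for (std::size_t i = 1; i < len; ++i) p = (p * kMult) % mod;
    return p;
}

// O(1) update: drop `out` from the front of the window, append `in` at the end.
// `power` must be top_power(window length, mod).
long long roll(long long h, unsigned char out, unsigned char in,
               long long power, long long mod) {
    long long without_front = (h - (out * power) % mod + mod) % mod;
    return (without_front * kMult + in) % mod;
}
```

For instance, rolling the hash of "abcd" by removing 'a' and appending 'e' should give the same value as hashing "bcde" from scratch, as long as the same modulus is used for both.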

(2) Longest subarray A[i:j] such that all elements in A[i:j] are distinct

Idea: Use two pointers to index the start and the end of the candidate subarray. A lookup table exist[256] records which characters currently occur in the window. If the character at the end pointer already occurs in the window, record the length of the window so far, then advance the start pointer just past the earlier occurrence of that character.

    //two pointers index the start (j) and the end (i) of the current substring
    int lengthOfLongestSubstring(string s) {
        bool exist[256]={false};
        int i=0, j=0;
        int maxlen=0;
        int n=s.length();
        for(i=0; i<n; i++){ //end pointer proceeds
            unsigned char c=s[i]; //unsigned so that the array index is never negative
            if(exist[c]){  //duplicate found: revise the start pointer
                maxlen=max(maxlen, i-j); //length of the window that just ended
                while(s[j]!=s[i]){
                    exist[(unsigned char)s[j]]=false;
                    j++;
                }
                j++; //skip the earlier copy of s[i]; exist[c] stays true
            }
            else exist[c]=true;
        }
        maxlen=max(maxlen, i-j);
        return maxlen;
    }
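
The listing above handles only single-byte characters. For a general array, one possible variant (a sketch, not the author's method) replaces the boolean table with a hash map from each value to the index of its latest occurrence. The function name longest_distinct_subarray and the choice of int elements are assumptions made for illustration.

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

// Returns the length of the longest subarray of A whose elements are all distinct.
int longest_distinct_subarray(const std::vector<int>& A) {
    std::unordered_map<int, int> last; // value -> index of its latest occurrence
    int start = 0;                     // start of the current duplicate-free window
    int best = 0;
    for (int i = 0; i < static_cast<int>(A.size()); ++i) {
        auto it = last.find(A[i]);
        // Only a repeat that lies inside the current window forces the start to move.
        if (it != last.end() && it->second >= start) start = it->second + 1;
        last[A[i]] = i;
        best = std::max(best, i - start + 1);
    }
    return best;
}
```

Because the map stores indices, the start pointer can jump directly past the earlier occurrence instead of stepping forward one element at a time.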

(3) Minimum length subarray A[i:j] that covers Q [string version]

Idea: We can achieve a streaming algorithm by keeping track of the latest occurrence of each item in Q as we process A. We use a doubly linked list L to store the last occurrence (index) of each keyword in Q, and a hash table H to map each keyword in Q to the corresponding node in L.

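//List loc keeps the latest occurrence [index in A] of each keyword in Q, ordered from oldest (front) to newest (back).
//Hash table dict maps each keyword in Q to its node in loc, or to loc.end() if the keyword has not occurred yet.
//Despite its name, the function returns the shortest covering subarray as an inclusive (start, end) pair.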
pair<int,int> longestCoveringSubarray(istringstream &sin, const vector<string>& Q){
	unordered_map<string, list<int>::iterator> dict;
	list<int> loc; //last occurrence of each string in Q
	
	for(int i=0; i<Q.size(); i++)
		dict[Q[i]]=loc.end();
		
	pair<int,int> res(-1,-1);
	int index=0; //indexing input string stream
	string s;
	while(sin>>s){
		auto it=dict.find(s);
		if(it!=dict.end()){ //s is in Q
			if(it->second != loc.end()){ //delete the old index
				loc.erase(it->second);
			}
			loc.push_back(index); //push back the current string index
			dict[s]=--loc.end();
		}
		
		//update only when every keyword has been seen, and the window is the first or a shorter one
		if(loc.size()==Q.size() && ((res.first==-1 && res.second==-1) || index-loc.front() < res.second-res.first)){
			res={loc.front(), index};
		}
		++index;
	}
	return res;
}

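Reflection: It is easy to use a hash table to record the most recent position (index) of each keyword in Q, but the table alone cannot tell us the order in which the keywords occurred. Here the list is managed like an LRU cache: the most recently seen keyword is always at the tail of the list, and the oldest one is always at the head.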
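
The LRU-style bookkeeping can be isolated into a small sketch. The RecencyList struct and its member names are hypothetical, and it uses std::list::splice to move an existing node to the tail rather than the erase-and-push_back pair in the listing above. Either approach is O(1) per update.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper: keeps (keyword, latest index) pairs ordered by recency.
struct RecencyList {
    using Node = std::pair<std::string, int>;
    std::list<Node> order;                                       // oldest at front, newest at back
    std::unordered_map<std::string, std::list<Node>::iterator> pos; // keyword -> its node

    // Record that `key` was just seen at position `index`.
    void touch(const std::string& key, int index) {
        auto it = pos.find(key);
        if (it != pos.end()) {
            it->second->second = index;
            // Move the existing node to the tail; the stored iterator stays valid.
            order.splice(order.end(), order, it->second);
        } else {
            order.emplace_back(key, index);
            pos[key] = std::prev(order.end());
        }
    }

    // The keyword whose latest occurrence is the oldest.
    const Node& oldest() const { return order.front(); }
};
```

With this structure, the start of the current covering window is oldest().second, and the window covers Q once order.size() equals the number of keywords.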

(4) Minimum length subarray A[i:j] that covers Q sequentially [string version]

Idea: We solve this with a single pass over the elements of A. We maintain three data structures:

i. A hash map K which maps each element of Q to its index in Q

ii. An array L which maps j to the index of Q[j]'s latest occurrence in A

iii. An array D which maps j to the length of the shortest subarray of A that ends at L[j] and sequentially covers Q[0:j]
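
Written as a recurrence (a restatement of the three definitions above, assuming the keywords in Q are distinct), the update performed when A[i] = Q[j] is:

```latex
\[
D[j] =
\begin{cases}
1 & \text{if } j = 0,\\[2pt]
D[j-1] + \bigl(i - L[j-1]\bigr) & \text{if } j > 0 \text{ and } D[j-1] < \infty,
\end{cases}
\qquad \text{followed by } L[j] \leftarrow i .
\]
```

For example, with A = [a, c, b] and Q = [a, b]: at i = 0 we set D[0] = 1 and L[0] = 0; at i = 2 we get D[1] = 1 + (2 - 0) = 3, so the covering subarray is A[0:2].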

pair<int,int> find_sequentially_covering_subarray(const vector<string>& A, const vector<string>& Q){
	unordered_map<string, int> K; //map keyword into its index in Q
	vector<int> L(Q.size(), -1); //latest occurrence of each keyword of Q in A
	vector<int> D(Q.size(), numeric_limits<int>::max()); //shortest length of subarray which ends at L[j]
	
	for(int i=0; i<Q.size(); i++) 
		K[Q[i]]=i;
		
	pair<int,int> res(-1,-1);
	for(int i=0; i<A.size(); i++){
		auto it=K.find(A[i]);
		if(it!=K.end()){ //if A[i] is in Q, update D and L
			int ind=it->second;
			// update D according to D[j]=D[j-1] + i-L[j-1]
			if(ind==0){
				D[0]=1;
			}
			else if(D[ind-1]!=numeric_limits<int>::max()){
				D[ind]=D[ind-1]+i-L[ind-1]; 
			}
			L[ind]=i; //L keeps track of the latest occurrence
			
			//res is an inclusive pair, so its length is res.second-res.first+1
			if(ind==Q.size()-1 && D.back()!=numeric_limits<int>::max() &&
			   (res.first==-1 || D.back()<res.second-res.first+1)){
				res={i-D.back()+1, i};
			}
		}
	} 
	return res;
}

(5) Minimum length subarray A[i:j] that covers Q [char version]

Two pointers move from left to right: one marks the start of the window and the other marks the end. Two maps need to be maintained (in the code below, S plays the role of A and T the role of Q):

i. A hash map needToFound maps each character of Q to the number of its occurrences in Q

ii. A hash map hasFound maps each character of Q to the number of its occurrences in the current window of A

Once the number of matched characters reaches Q.size(), the window covers Q. We then advance the start pointer past characters that are unneeded or in surplus, and record the length of the window.

class Solution {
public:
    string minWindow(string S, string T) {
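        int needToFound[256]={0}; //# of occurrences of each character in T
        int hasFound[256]={0};    //# of occurrences of each character of T in S[beg:end]
        
        int minWindowlen=INT_MAX;
        int minWindowBeg=-1;
        int minWindowEnd=-1;
        int count=0; //# of characters of T matched so far
        
        for(int i=0; i<T.length(); i++)
            needToFound[(unsigned char)T[i]]+=1;
        
        for(int beg=0, end=0; end<S.length(); end++){
            unsigned char e=S[end]; //unsigned so that the array index is never negative
            if(needToFound[e]==0)
                continue;
            hasFound[e]+=1;
            if(hasFound[e]<=needToFound[e])
                count++;
            if(count==T.length()){
                //advance beg past characters that are not needed or are in surplus
                unsigned char b=S[beg];
                while(hasFound[b]==0||hasFound[b]>needToFound[b]){
                    if(hasFound[b]>needToFound[b])
                        hasFound[b]--;
                    b=S[++beg];
                }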
                int windowlen=end-beg+1;
                if(windowlen<minWindowlen){
                    minWindowlen=windowlen;
                    minWindowBeg=beg;
                    minWindowEnd=end;
                }
                //count=0;
            }
        }
        if(minWindowBeg==-1) return "";
        string res=S.substr(minWindowBeg, minWindowlen);
        return res;
    }
};