A hash table is a data structure used to implement the associative array, a structure that can map keys to values.
(1) Design a hash function for words in a dictionary
Idea: The hash function should examine every character of each word. It should produce a large range of values and should not let one character dominate (e.g., if we simply cast characters to integers and multiplied them, a single 0 would result in a hash code of 0). We would also like a rolling hash function: one in which, if a character is deleted from the front of the string and another is added to the end, the new hash code can be computed in O(1) time.
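int string_hash(const string &str, const int &modulus){
const int MULT=997;
long long val=0; //64-bit intermediate so val*MULT cannot overflow
for(const char &c: str){
val=(val*MULT+static_cast<unsigned char>(c))%modulus; //unsigned char keeps every term non-negative
}
return static_cast<int>(val);
}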
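To make the rolling property concrete, here is a minimal sketch of the O(1) update for a fixed-length window. It assumes the same polynomial scheme and multiplier (997) as string_hash above. The names poly_hash, leading_weight and roll_hash are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <string>

// Same polynomial hash as string_hash, repeated so the sketch is self-contained.
int poly_hash(const std::string &str, int modulus) {
    const int MULT = 997;
    long long val = 0;
    for (unsigned char c : str) {
        val = (val * MULT + c) % modulus;
    }
    return static_cast<int>(val);
}

// MULT^(len-1) mod modulus: the weight carried by the first character
// of a window that is len characters long.
int leading_weight(int len, int modulus) {
    assert(len >= 1 && modulus > 0);
    long long w = 1 % modulus;
    for (int i = 1; i < len; ++i) {
        w = (w * 997) % modulus;
    }
    return static_cast<int>(w);
}

// Given the hash of a window, remove `out` from the front and append `in`
// at the back. Constant time, independent of the window length.
int roll_hash(int old_hash, char out, char in, int weight, int modulus) {
    const int MULT = 997;
    long long val = old_hash;
    long long removed = (static_cast<long long>(weight) * static_cast<unsigned char>(out)) % modulus;
    val = (val - removed + modulus) % modulus;                   // drop the leading character
    val = (val * MULT + static_cast<unsigned char>(in)) % modulus; // shift and append
    return static_cast<int>(val);
}
```

With a window of length 3, rolling the hash of "abc" by removing 'a' and appending 'd' should give the same value as hashing "bcd" from scratch. The weight is computed once per window length and reused for every slide.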
(2) Longest subarray A[i:j] such that all elements in A[i:j] are distinct
Idea: Use two pointers to index the start and the end of the candidate subarray. A table exist[256] records which characters are currently inside the window. When the end pointer reaches a character that is already in the window, we record the length of the current window and then advance the start pointer just past the earlier occurrence of that character.
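//two pointers: j indexes the start and i indexes the end of the current substring
int lengthOfLongestSubstring(string s) {
bool exist[256]={false};
int i,j=0;
int maxlen=0;
int n=s.length();
for(i=0; i<n; i++){ //end pointer proceeds
unsigned char c=s[i]; //unsigned so that bytes >=128 index inside exist
if(exist[c]){ //revise the start pointer
maxlen=max(maxlen, i-j); //count the length of the current substring
while(s[j]!=s[i]){ //drop everything before the earlier occurrence of c
exist[static_cast<unsigned char>(s[j])]=false;
j++;
}
j++; //skip the earlier occurrence itself; exist[c] stays true
}
else exist[c]=true;
}
maxlen=max(maxlen, i-j); //the last window ends at the end of s
return maxlen;
}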
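The heading speaks of a subarray A[i:j], while the code above works on characters. As a minimal sketch of the same two-pointer idea for an integer array, the version below replaces the 256-entry table with a hash map from each value to the index of its latest occurrence. The function name longest_distinct_subarray and the {start, length} return convention are assumptions made for this illustration.

```cpp
#include <unordered_map>
#include <utility>
#include <vector>

// Returns {start, length} of the longest subarray whose elements are all
// distinct; {0, 0} for an empty input. Ties keep the leftmost window.
std::pair<int, int> longest_distinct_subarray(const std::vector<int> &A) {
    std::unordered_map<int, int> last; // value -> index of its latest occurrence
    int start = 0, best_start = 0, best_len = 0;
    for (int i = 0; i < static_cast<int>(A.size()); ++i) {
        auto it = last.find(A[i]);
        // A repeat only matters if the earlier occurrence is inside the window.
        if (it != last.end() && it->second >= start) {
            start = it->second + 1;
        }
        last[A[i]] = i;
        if (i - start + 1 > best_len) {
            best_len = i - start + 1;
            best_start = start;
        }
    }
    return {best_start, best_len};
}
```

Because the map remembers indices, the start pointer jumps directly past the earlier occurrence instead of walking there one element at a time.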
(3) Minimum length subarray A[i:j] that covers Q [string version]
Idea: We can achieve a streaming algorithm by keeping track of the latest occurrence of each item in Q as we process A. We use a doubly linked list L to store the last occurrence (index) of each keyword in Q, and a hash table H to map each keyword in Q to the corresponding node in L.
//List loc keeps the latest occurrence [index in A] of each item in Q; each fresh occurrence is appended to its back.
//Hash table dict maps each keyword in Q to its node in loc, or to loc.end() if the keyword has not occurred yet
pair<int,int> longestCoveringSubarray(istringstream &sin, const vector<string>& Q){
unordered_map<string, list<int>::iterator> dict;
list<int> loc; //last occurrence of each string in Q
for(int i=0; i<Q.size(); i++)
dict[Q[i]]=loc.end();
pair<int,int> res(-1,-1);
int index=0; //indexing input string stream
string s;
while(sin>>s){
auto it=dict.find(s);
if(it!=dict.end()){ //s is in Q
if(it->second != loc.end()){ //delete the old index
loc.erase(it->second);
}
loc.push_back(index); //push back the current string index
dict[s]=--loc.end();
}
if(loc.size()==Q.size() && ((res.first==-1 && res.second==-1) || index-loc.front() < res.second-res.first)){ //all keywords seen, and first or shorter cover
res={loc.front(), index};
}
++index;
}
return res;
}
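Reflection: It is easy to use a hash table to record the most recent position (index) of each keyword in Q, but the hash table alone cannot tell us the order in which those occurrences happened. Here the list is managed like an LRU cache: the most recently seen keyword is always at the back of the list, and the oldest keyword is always at the front.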
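The list-plus-hash-table pattern can be isolated in a few lines. The sketch below is an illustration only: the type name RecencyList and its members are hypothetical, and it stores the keys themselves rather than indices into A.

```cpp
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>

// Keeps keys ordered from least recently seen (front) to most recently
// seen (back), with O(1) updates.
struct RecencyList {
    std::list<std::string> order;
    std::unordered_map<std::string, std::list<std::string>::iterator> pos;

    // Mark `key` as just seen: unlink its old node, if any, and append it.
    void touch(const std::string &key) {
        auto it = pos.find(key);
        if (it != pos.end()) {
            order.erase(it->second);
        }
        order.push_back(key);
        pos[key] = std::prev(order.end());
    }

    // Precondition: at least one key has been touched.
    const std::string &oldest() const { return order.front(); }
};
```

Erasing a node from a std::list leaves iterators to the other nodes valid, which is what makes it safe to store those iterators in the hash table.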
(4) Minimum length subarray A[i:j] that covers Q sequentially [string version]
Idea: We solve this with a single pass over the elements of A. We maintain three data structures:
i. A hash map K which maps each element of Q to its index in Q
ii. An array L which maps j to the index of Q[j]'s latest occurrence in A
iii. An array D which maps j to the length of the shortest subarray of A that ends at L[j] and sequentially covers Q[0:j]
pair<int,int> find_sequentially_covering_subarray(const vector<string>& A, const vector<string>& Q){
unordered_map<string, int> K; //map keyword into its index in Q
vector<int> L(Q.size(), -1); //latest occurrence of each keyword of Q in A
vector<int> D(Q.size(), numeric_limits<int>::max()); //shortest length of subarray which ends at L[j]
for(int i=0; i<Q.size(); i++)
K[Q[i]]=i;
pair<int,int> res(-1,-1);
for(int i=0; i<A.size(); i++){
auto it=K.find(A[i]);
if(it!=K.end()){ //if A[i] is in Q, update D and L
int ind=it->second;
// update D according to D[j]=D[j-1] + i-L[j-1]
if(ind==0){
D[0]=1;
}
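else if(D[ind-1]!=numeric_limits<int>::max()){
D[ind]=D[ind-1]+i-L[ind-1];
}
L[ind]=i; //L keeps track of the latest occurrence
//res stays (-1,-1) until the first full cover, so that case is tested explicitly
if(ind==Q.size()-1 && D.back()!=numeric_limits<int>::max() && (res.first==-1 || D.back()<res.second-res.first+1)){
res={i-D.back()+1, i};
}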
}
}
return res;
}
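When experimenting with the function above, it helps to verify that a returned pair really covers Q in order. A minimal sketch of such a check follows: a greedy scan that matches the keywords of Q one after another inside A[i..j]. The helper name sequentially_covers is hypothetical, and the sketch assumes 0 <= i and j < A.size().

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if Q[0], Q[1], ... appear in that order (not necessarily adjacent)
// within A[i..j], both ends inclusive.
bool sequentially_covers(const std::vector<std::string> &A, int i, int j,
                         const std::vector<std::string> &Q) {
    std::size_t q = 0; // next keyword of Q still to be matched
    for (int k = i; k <= j && q < Q.size(); ++k) {
        if (A[k] == Q[q]) {
            ++q;
        }
    }
    return q == Q.size();
}
```

The check only confirms that a window is a valid cover. It does not confirm that the window is the shortest one.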
(5) Minimum length subarray A[i:j] that covers Q [char version]
Two pointers move from left to right: one marks the start of the window and the other marks the end. Two tables need to be maintained:
i. A table needToFound maps each character of Q to the number of its occurrences in Q
ii. A table hasFound maps each character of Q to the number of its occurrences in the current window of A
When the number of matched characters counted through hasFound equals Q.size(), the window covers Q. We then advance the start pointer past characters that are not needed or are in surplus, and record the length of the window. In the code, A and Q are named S and T.
class Solution {
public:
string minWindow(string S, string T) {
int needToFound[256]={0};
int hasFound[256]={0};
int minWindowlen=INT_MAX;
int minWindowBeg=-1;
int minWindowEnd=-1;
int count=0;
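for(int i=0; i<T.length(); i++)
needToFound[(unsigned char)T[i]]+=1; //index with unsigned char so bytes >=128 stay in range
for(int beg=0, end=0; end<S.length(); end++){
unsigned char c=S[end];
if(needToFound[c]==0) //character is not in T
continue;
hasFound[c]+=1;
if(hasFound[c]<=needToFound[c]) //this occurrence is needed, not surplus
count++;
if(count==T.length()){ //window S[beg..end] covers T
//advance beg past characters that are not in T or are in surplus
for(unsigned char b=S[beg]; hasFound[b]==0||hasFound[b]>needToFound[b]; b=S[beg]){
if(hasFound[b]>needToFound[b])
hasFound[b]--;
beg++;
}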
int windowlen=end-beg+1;
if(windowlen<minWindowlen){
minWindowlen=windowlen;
minWindowBeg=beg;
minWindowEnd=end;
}
//count=0;
}
}
if(minWindowBeg==-1) return "";
string res=S.substr(minWindowBeg, minWindowlen);
return res;
}
};
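The covering condition itself can be checked independently with the same counting idea. The sketch below is a test aid, not part of the solution above: the helper name covers is hypothetical, and it treats T as a multiset, so repeated characters in T must be matched the same number of times.

```cpp
#include <string>

// True if `window` contains every character of T at least as many times
// as T does.
bool covers(const std::string &window, const std::string &T) {
    int need[256] = {0};
    for (unsigned char c : T) {
        ++need[c]; // how many of each character T requires
    }
    for (unsigned char c : window) {
        --need[c]; // each character in the window pays off one requirement
    }
    for (int n : need) {
        if (n > 0) {
            return false; // some requirement is still unpaid
        }
    }
    return true;
}
```

For example, "BANC" covers "ABC", while "AB" does not cover "AAB" because the second 'A' is missing.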