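Because of the National Day holiday and rescheduled classes, I am only now posting last week's algorithm blog. I also took quite a few detours on this problem, which tells me my thinking still needs practice.
This post describes my thought process in detail. It serves as a record, and I hope it reminds me to think more thoroughly and efficiently in the future.
Let's look at the problem:
Problem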
Given two words (beginWord and endWord), and a dictionary’s word list, find all shortest transformation sequence(s) from beginWord to endWord, such that:
- Only one letter can be changed at a time
- Each transformed word must exist in the word list. Note that beginWord is not a transformed word.
Note:
- Return an empty list if there is no such transformation sequence.
- All words have the same length.
- All words contain only lowercase alphabetic characters.
- You may assume no duplicates in the word list.
- You may assume beginWord and endWord are non-empty and are not the same.
Example 1:
Input:
beginWord = "hit",
endWord = "cog",
wordList = ["hot","dot","dog","lot","log","cog"]
Output:
[
["hit","hot","dot","dog","cog"],
["hit","hot","lot","log","cog"]
]
Example 2:
Input:
beginWord = "hit"
endWord = "cog"
wordList = ["hot","dot","dog","lot","log"]
Output: []
Explanation: The endWord "cog" is not in wordList, therefore no possible transformation.
Analysis
First, we define how to check whether two strings differ in exactly one position:
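#include <algorithm>
#include <climits>
#include <iostream>
#include <queue>
#include <stack>
#include <string>
#include <vector>
using namespace std;

// returns true iff s1 and s2 differ in exactly one position (equal lengths assumed)
inline bool cmp(const string& s1, const string& s2){
    int len = s1.length();
    int diff = 0;
    for(int i = 0; i < len; i++){
        if(s1[i] != s2[i]) diff++;
    }
    return diff == 1;
}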
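As a small illustration of the one-letter check, here is a sketch of a variant that stops scanning at the second mismatch. The name differsByOne is hypothetical and not part of the original solution; it assumes, as the problem guarantees, that both words have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): the same test as cmp,
// but it returns as soon as a second mismatch is seen.
bool differsByOne(const std::string& a, const std::string& b) {
    assert(a.size() == b.size());  // the problem guarantees equal lengths
    int diff = 0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (a[i] != b[i] && ++diff > 1) return false;
    }
    return diff == 1;
}
```

With this sketch, "hit" and "hot" count as adjacent, while "hit" and "dot" (two mismatches) and "hit" and "hit" (no mismatch) do not.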
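Building the shortest-path graph between the two ends
Because we are looking for the shortest paths between begin and end, we use BFS (breadth-first search).
Each word can then be treated as a node, so this is essentially a graph problem.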
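For convenience, each word is represented by its index in a vector. A vector<int> is used as a vertex, that is, the list of a node's predecessors, and vector<vertex>[i] is the predecessor list of the i-th node.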
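To make the index-based representation concrete, the following sketch maps a predecessor list back to words. The helper predecessorWords and the parameter name prevLists are hypothetical names chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<int> vertex;  // prev-list, as in the post

// Hypothetical helper (not in the original code): translates the prev-list
// of node i from indices back into words.
std::vector<std::string> predecessorWords(
        int i, const std::vector<vertex>& prevLists,
        const std::vector<std::string>& dictionary) {
    assert(i >= 0 && static_cast<std::size_t>(i) < prevLists.size());
    std::vector<std::string> words;
    for (int p : prevLists[i]) words.push_back(dictionary[p]);
    return words;
}
```

For Example 1, if the dictionary is the word list followed by beginWord (so hit gets index 6), the predecessor list of cog (index 5) would be {2, 4}, which this helper turns into dog and log.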
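Since we need all shortest paths, and several nodes may lead to end, we cannot stop as soon as end is found. We have to keep going until every other node on the previous level that also leads to end has been found.
The same holds for every other node: it may have several predecessors. So during BFS, if a node has a successor that is already in the BFS queue (that is, its distance value has already been updated), the node must still be added to that successor's predecessor list. Only successors that have not yet been enqueued need to be pushed onto the BFS queue and have their distance value updated.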
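The rule above (always record the predecessor, but enqueue only on the first visit) can be sketched on a plain integer graph. This is a simplified illustration, not the post's buildGraph: it assumes an explicit adjacency list adj, which the post does not build, and it does not stop early at the level of end. The names predecessorLists and adj are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical sketch: adj[u] lists the neighbours of u. Returns, for every
// node, the neighbours that precede it on some shortest path from begin.
std::vector<std::vector<int>> predecessorLists(
        const std::vector<std::vector<int>>& adj, int begin) {
    assert(begin >= 0 && static_cast<std::size_t>(begin) < adj.size());
    std::vector<std::vector<int>> prevLists(adj.size());
    std::vector<int> dist(adj.size(), INT_MAX);
    std::queue<int> bfs;
    dist[begin] = 0;
    bfs.push(begin);
    while (!bfs.empty()) {
        const int u = bfs.front();
        bfs.pop();
        for (int v : adj[u]) {
            if (dist[v] == INT_MAX) {      // first visit: enqueue exactly once
                dist[v] = dist[u] + 1;
                bfs.push(v);
            }
            if (dist[v] == dist[u] + 1) {  // u lies on a shortest path to v
                prevLists[v].push_back(u);
            }
        }
    }
    return prevLists;
}
```

On a diamond-shaped graph with edges 0-1, 0-2, 1-3 and 2-3, node 3 ends up with both 1 and 2 in its predecessor list even though it is enqueued only once.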
typedef vector<int> vertex; // prev-list

class Solution{
private:
    vector<vertex> buildGraph(bool& accessible, const int& begin, const int& end,
                              const int& size, const vector<string>& dictionary);
    void findPaths(...); // the exact parameters depend on the algorithm; see the code blocks below
public:
    vector<vector<string>> findLadders(string beginWord, string endWord, vector<string>& wordList);
};

vector<vertex> Solution::buildGraph(bool& accessible, const int& begin, const int& end,
                                    const int& size, const vector<string>& dictionary){
    vector<vertex> vers; // saves prev-lists of vertexes
    vers.assign(size, vertex());
    vector<int> distance(size, INT_MAX); // std::vector instead of a non-standard variable-length array
    distance[begin] = 0;
    queue<int> bfs;
    bfs.push(begin);
    int levelFlag = -1; // records the distance from the end to the begin
    // builds a graph and makes bfs
    while(!bfs.empty()){
        int index = bfs.front();
        bfs.pop();
        if(distance[index] == levelFlag) break;
        else if(cmp(dictionary[index], dictionary[end])){
            vers[end].push_back(index);
            // cout << "[" << dictionary[index] << " " << dictionary[end] << "]" << endl;
            if(distance[end] > distance[index] + 1){ // end is found for the first time
                levelFlag = distance[end] = distance[index] + 1;
                bfs.push(end);
                accessible = true;
            }
            continue;
        }
        for(int i = 0; i < size; i++){
            if(i != index && cmp(dictionary[index], dictionary[i])){
                if(distance[i] >= distance[index] + 1){ // index -> i
                    // cout << "[" << index << " " << dictionary[index] << " "
                    // << i << " " << dictionary[i] << "]" << endl;
                    vers[i].push_back(index);
                }
                if(distance[i] > distance[index] + 1){
                    distance[i] = distance[index] + 1;
                    bfs.push(i);
                }
            }
        }
    }
    return vers;
}
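Recording all paths
By starting from end and backtracking through the predecessor lists, we never have to look at nodes that are not on a shortest path. The paths produced this way run in the opposite direction to the expected ones, so each path has to be reversed at the end.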
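The backtracking idea can be sketched on integer indices as follows. This is only an illustration under assumptions: collect is a hypothetical name, and instead of reversing every path in a final pass (as the post does), this variant copies each path through reverse iterators at the moment it is recorded.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<int> vertex;  // prev-list, as in the post

// Hypothetical helper: walks the prev-lists from node back to begin and
// stores every complete path in begin -> end order.
void collect(int node, int begin, const std::vector<vertex>& prevLists,
             std::vector<int>& path, std::vector<std::vector<int>>& out) {
    assert(node >= 0 && static_cast<std::size_t>(node) < prevLists.size());
    path.push_back(node);
    if (node == begin) {
        out.emplace_back(path.rbegin(), path.rend());  // reversed copy
    } else {
        for (int p : prevLists[node]) collect(p, begin, prevLists, path, out);
    }
    path.pop_back();  // backtrack; the graph itself is never modified
}
```

Called with node set to the index of end and an empty path, the sketch fills out with one vector of indices per shortest path.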
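Here is an example of a fairly complex graph that could arise:
The graph contains cycles, so during traversal we have to deal with choosing between branches and with branches that merge again.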
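A wrong turn: node splitting
I once considered splitting every node with more than one incoming edge into several copies, in order to obtain a tree in which branches never merge. (Here, if A is in B's predecessor list, then the backtracking from end has an edge from B to A, which contributes one to A's in-degree.) Between end and begin there would then be several independent paths that never cross.
Something like the figure below:
But this is actually a wrong turn and is hard to implement. We have to consider the more complex case where several nodes have an in-degree greater than 1.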
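① As in the figure below, if nodes with several successors are split while BFS walks from begin to end, the nested-cycle structure still reappears further on.
(A is begin and H is end. An arrow from u to v means that v is in u's predecessor list: u is a successor of v in the BFS, and the backtracking from end moves from u to v.)
② Alternatively, while backtracking from end to begin, every node encountered with an in-degree greater than 1 could be split so that each node keeps exactly one incoming edge.
This does remove the nested cycles, but it requires a more complex data structure, for example a predecessor list and a successor list for every node. The space cost would be even higher than in ①.
Moreover, a DFS traversal is still needed after the nodes are split, because branches still exist; they just no longer merge.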
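The right way: DFS without deleting edges
Let's honestly take another look at these graphs.
Walking through the graph by hand again shows that what we are following is still DFS. In my earlier DFS traversals of trees, though, I would usually delete a visited node from the tree when popping it off the stack. In this graph, where merging branches form cycles, nodes cannot be deleted, because we may have to come back and visit them again later.
So we may need an auxiliary data structure to record which branches have been visited.
Recursive implementation
The recursive call stack is used to implement the DFS. Only words, which records the current path, is modified; the graph graph is left unchanged.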
void Solution::findPaths(const int& index, vector<vector<string>>& result, vector<string>& words,
                         const vector<vertex>& graph, const vector<string>& dictionary){
    words.push_back(dictionary[index]);
    const vertex& list = graph[index];
    for(const int& v : list){
        findPaths(v, result, words, graph, dictionary);
    }
    if(index == graph.size() - 1){ // index == begin
        result.push_back(words);
    }
    words.pop_back();
}

vector<vector<string>> Solution::findLadders(string beginWord, string endWord, vector<string>& wordList){
    // finds if endWord is in dictionary
    int e = -1;
    for(int i = 0; i < wordList.size(); i++){
        if(wordList[i] == endWord){
            e = i;
            break;
        }
    }
    if(e == -1) return vector<vector<string>>();
    vector<string> dictionary = wordList;
    dictionary.push_back(beginWord);
    const int size = dictionary.size();
    const int begin = size - 1;
    const int end = e;
    bool accessible = false; // indicates whether the end is accessible
    // builds a graph by bfs
    const vector<vertex> graph = buildGraph(accessible, begin, end, size, dictionary);
    if(!accessible) return vector<vector<string>>();
    // finds all the shortest paths by dfs
    vector<vector<string>> result;
    vector<string> words;
    findPaths(end, result, words, graph, dictionary);
    for(vector<string>& v : result){
        reverse(v.begin(), v.end());
    }
    return result;
}
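Runtime: 496 ms.
Viewed purely as a graph traversal, the algorithm is $O(V+E)$, where $V$ is the number of nodes and $E$ the number of edges. Note, however, that buildGraph finds neighbors by comparing each dequeued word against every word in the dictionary, so that step alone costs $O(V^2 \cdot L)$ for words of length $L$, and the DFS takes time proportional to the total length of the paths it outputs.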
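As an aside on where the time goes: buildGraph in this post finds neighbors by running cmp against every dictionary entry. A commonly used alternative, which this post does not use, is to generate every one-letter variant of a word and look it up in a hash set. The sketch below illustrates that swapped-in technique; the names neighbours and dict are hypothetical, and lowercase-only words are assumed, as the problem states.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the dictionary words that are exactly one
// letter away from word, by trying all 25 other letters at each position.
std::vector<std::string> neighbours(const std::string& word,
                                    const std::unordered_set<std::string>& dict) {
    std::vector<std::string> result;
    std::string candidate = word;
    for (std::size_t i = 0; i < candidate.size(); ++i) {
        const char original = candidate[i];
        for (char c = 'a'; c <= 'z'; ++c) {
            if (c == original) continue;
            candidate[i] = c;
            if (dict.count(candidate)) result.push_back(candidate);
        }
        candidate[i] = original;  // restore before moving to the next position
    }
    return result;
}
```

Each call performs 25 lookups per letter position regardless of dictionary size, which tends to pay off when the dictionary is large and the words are short.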
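Loop & stack implementation
The drawback of recursion is that the stack operations involved in function calls are relatively costly. Let's now try to implement the DFS with a loop.
In a loop we cannot, as in the recursive version, simply scan the predecessor list and descend into each branch. We need a stack that records which branch was chosen at each fork.
Take the following graph as an example,
for which the branch stack changes as follows: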
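The algorithm below uses two stacks of int, dfs and branch.
The dfs stack records the nodes encountered during the traversal; the branch stack records which branch was chosen at each fork (a node with more than one predecessor).
- While descending, push the first predecessor of the node on top of dfs onto dfs. Whenever a node pushed onto dfs has several predecessors, push 0 onto branch to mark that its first branch is being taken. Keep pushing until begin is on the dfs stack.
- Then pop from dfs. Whenever a node with several predecessors is encountered, compare the top of branch with that node's number of predecessors to see whether its last branch has already been taken. If it has, pop branch and keep popping dfs until the next fork that still has an untraversed branch is reached. That node is not popped.
- Stop popping and increment the top of branch; call the result top. This selects branch top+1 of the node on top of dfs: push that node's (top+1)-th predecessor onto dfs, and if the pushed node also has several predecessors, push 0 onto branch.
- Repeat the steps above until the dfs stack is empty.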
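For comparison, the same traversal can be sketched with a single stack whose frames pair each node with the index of the next predecessor to try. This is an alternative formulation for illustration, not the two-stack algorithm described above; allPaths and frames are hypothetical names, and the sketch works on integer indices only.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef std::vector<int> vertex;  // prev-list, as in the post

// Hypothetical sketch: iterative DFS from end back to begin. Each frame is
// (node, index of the next predecessor of that node to descend into).
std::vector<std::vector<int>> allPaths(const std::vector<vertex>& prevLists,
                                       int end, int begin) {
    assert(end >= 0 && static_cast<std::size_t>(end) < prevLists.size());
    std::vector<std::vector<int>> out;
    std::vector<std::pair<int, std::size_t>> frames;
    std::vector<int> path;
    frames.emplace_back(end, 0);
    path.push_back(end);
    while (!frames.empty()) {
        const int node = frames.back().first;
        if (node == begin) {
            out.emplace_back(path.rbegin(), path.rend());  // begin -> end order
        }
        if (node != begin && frames.back().second < prevLists[node].size()) {
            // take the next untried branch of this node
            const int next = prevLists[node][frames.back().second++];
            frames.emplace_back(next, 0);
            path.push_back(next);
        } else {
            // begin reached or all branches tried: backtrack one step
            frames.pop_back();
            path.pop_back();
        }
    }
    return out;
}
```

Keeping the branch index inside each frame removes the need to check whether a node is a fork before touching a separate branch stack, at the cost of storing one index for every node on the current path.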
void Solution::findPaths(vector<vector<string>>& result, const int end,
                         const vector<vertex>& graph, const vector<string>& dictionary){
    const int begin = graph.size() - 1;
    vector<string> words;
    stack<int> dfs;
    stack<int> branch;
    dfs.push(end);
    // cout << "push " << end << " " << dictionary[end] << endl;
    words.push_back(dictionary[end]);
    if(graph[end].size() > 1){
        branch.push(0);
        // cout << "branch push 0" << endl;
    }
    while(1){
        // push
        while(dfs.top() != begin){
            const int& prev = dfs.top();
            const int& next = graph[prev][0];
            dfs.push(next);
            words.push_back(dictionary[next]);
            // cout << "push " << next << " " << dictionary[next] << endl;
            if(graph[next].size() > 1){
                branch.push(0);
                // cout << "branch push 0" << endl;
            }
        }
        result.push_back(words);
        // pop till meets a vertex with an unchosen branch
        while(!dfs.empty()){
            const int& prev = dfs.top();
            if(graph[prev].size() > 1){
                if(graph[prev].size() != branch.top() + 1){
                    break;
                }
                else{
                    // cout << "branch pop" << branch.top();
                    branch.pop();
                }
            }
            // cout << " pop " << dfs.top() << " " << words.back() << endl;
            dfs.pop();
            words.pop_back();
        }
        if(dfs.empty()) break;
        // cout << dfs.top() << endl;
        // turn to another branch
        int top = branch.top();
        top++;
        branch.pop();
        branch.push(top);
        // cout << branch.top() << " branch top" << endl;
        const int& prev = dfs.top();
        const int& next = graph[prev][top];
        dfs.push(next);
        words.push_back(dictionary[next]);
        // cout << "push " << next << " " << dictionary[next] << endl;
        if(graph[next].size() > 1) {
            branch.push(0);
            // cout << "branch push 0" << endl;
        }
    }
}

vector<vector<string>> Solution::findLadders(string beginWord, string endWord, vector<string>& wordList){
    // finds if endWord is in dictionary
    int e = -1;
    for(int i = 0; i < wordList.size(); i++){
        if(wordList[i] == endWord){
            e = i;
            break;
        }
    }
    if(e == -1) return vector<vector<string>>();
    vector<string> dictionary = wordList;
    dictionary.push_back(beginWord);
    const int size = dictionary.size();
    const int begin = size - 1;
    const int end = e;
    bool accessible = false; // indicates whether the end is accessible
    // builds a graph by bfs
    const vector<vertex> graph = buildGraph(accessible, begin, end, size, dictionary);
    if(!accessible) return vector<vector<string>>();
    // finds all the shortest paths by dfs
    vector<vector<string>> result;
    findPaths(result, end, graph, dictionary);
    for(vector<string>& v : result){
        reverse(v.begin(), v.end());
    }
    return result;
}
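Runtime: 316 ms.
The complexity analysis is the same as for the recursive version: the traversal is $O(V+E)$, while the buildGraph step, which compares each dequeued word against every word in the dictionary, costs $O(V^2 \cdot L)$ for words of length $L$.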
Test code
The test code is attached here for the convenience of later readers:
void print(vector<vector<string>> result){
    for(vector<string>& v : result){
        for(string& s : v){
            cout << s << " ";
        }
        cout << endl;
    }
    cout << endl;
}

int main(){
    string b1 = "hit", e1 = "cog";
    vector<string> l1;
    l1.push_back("hot"); l1.push_back("dot"); l1.push_back("dog");
    l1.push_back("lot"); l1.push_back("log"); l1.push_back("cog");
    string b2 = "hit", e2 = "cog";
    vector<string> l2;
    l2.push_back("hot"); l2.push_back("dot"); l2.push_back("dog");
    l2.push_back("lot"); l2.push_back("log");
    string b3 = "red", e3 = "tax";
    vector<string> l3;
    l3.push_back("ted"); l3.push_back("tex"); l3.push_back("red");
    l3.push_back("tax"); l3.push_back("tad"); l3.push_back("den");
    l3.push_back("rex"); l3.push_back("pee");
    string b4 = "magic", e4 = "pearl";
    vector<string> l4;
    string strs[20] = {"magic","manic","mania","maria","marta","maris","marty","paris","marks","party",
                       "marry","parks","parry","merry","perks","perry","peaks","peary","pears","pearl"};
    for(int i = 0; i < 20; i++){
        l4.push_back(strs[i]);
    }
    string b5 = "qa", e5 = "sq";
    vector<string> l5;
    string qq[95] = {"si","go","se","cm","so","ph","mt","db","mb","sb",
                     "kr","ln","tm","le","av","sm","ar","ci","ca","br",
                     "ti","ba","to","ra","fa","yo","ow","sn","ya","cr",
                     "po","fe","ho","ma","re","or","rn","au","ur","rh",
                     "sr","tc","lt","lo","as","fr","nb","yb","if","pb",
                     "ge","th","pm","rb","sh","co","ga","li","ha","hz",
                     "no","bi","di","hi","qa","pi","os","uh","wm","an",
                     "me","mo","na","la","st","er","sc","ne","mn","mi",
                     "am","ex","pt","io","be","fm","ta","tb","ni","mr",
                     "pa","he","lr","sq","ye"};
    for(int i = 0; i < 95; i++){
        l5.push_back(qq[i]);
    }
    Solution s;
    print(s.findLadders(b1,e1,l1));
    print(s.findLadders(b2,e2,l2));
    print(s.findLadders(b3,e3,l3));
    print(s.findLadders(b4,e4,l4));
    print(s.findLadders(b5,e5,l5));
    return 0;
}
The correct output is as follows:
hit hot dot dog cog
hit hot lot log cog
red ted tex tax
red rex tex tax
red ted tad tax
magic manic mania maria marta marty party parry perry peary pearl
magic manic mania maria marta marty marry parry perry peary pearl
magic manic mania maria marta marty marry merry perry peary pearl
magic manic mania maria maris paris parks perks peaks pears pearl
magic manic mania maria maris marks parks perks peaks pears pearl
qa ca cm sm sq
qa fa fm sm sq
qa ta tm sm sq
qa pa pm sm sq
qa ca ci si sq
qa ba bi si sq
qa ma mi si sq
qa ha hi si sq
qa na ni si sq
qa la li si sq
qa ta ti si sq
qa pa pi si sq
qa ca cr sr sq
qa ba br sr sq
qa fa fr sr sq
qa ma mr sr sq
qa la lr sr sq
qa ca co so sq
qa ya yo so sq
qa ma mo so sq
qa ga go so sq
qa ha ho so sq
qa na no so sq
qa la lo so sq
qa ta to so sq
qa pa po so sq
qa ba be se sq
qa ra re se sq
qa fa fe se sq
qa ya ye se sq
qa ma me se sq
qa ga ge se sq
qa ha he se sq
qa na ne se sq
qa la le se sq
qa ra rn sn sq
qa ma mn sn sq
qa la ln sn sq
qa ra rh sh sq
qa ta th sh sq
qa pa ph sh sq
qa ra rb sb sq
qa ya yb sb sq
qa ma mb sb sq
qa na nb sb sq
qa ta tb sb sq
qa pa pb sb sq
qa ma mt st sq
qa la lt st sq
qa pa pt st sq
qa ta tc sc sq