Algorithms ---- Word Search Trie

暗夜猎手-大魔王

Category: Algorithms and Data Structures

Article link: https://blog.csdn.net/u014106644/article/details/96475895

Word Search

Given a 2D board and a word, find if the word exists in the grid.

The word can be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once.

Example:

board =
[
  ['A','B','C','E'],
  ['S','F','C','S'],
  ['A','D','E','E']
]

Given word = "ABCCED", return true.
Given word = "SEE", return true.
Given word = "ABCB", return false.

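Given a 2D character array and a word, determine whether the word can be found in the array by moving horizontally or vertically between adjacent cells, without reusing any cell.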

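Start a depth-first search (DFS) from every cell of the array, tracking the index of the next character of the word to match. If the whole word is matched during the search, the match succeeds. If the character in the current cell does not match the current character of the word, the search from this cell fails: return to the previous cell and continue searching in its other directions. If the characters match but the word is not yet finished, continue the DFS from the current cell in all four directions to match the next character of the word.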

package util;

public class WordSearch {

    // Row/column offsets for the four directions: up, down, left, right
    int[] dx = new int[]{-1, 1, 0, 0};
    int[] dy = new int[]{0, 0, -1, 1};

    public boolean exist(char[][] board, String word) {
        // m and n are the largest valid row and column indices
        int m = board.length - 1;
        int n = board[0].length - 1;
        boolean[][] flag = new boolean[m + 1][n + 1];
        // Start a DFS from every cell of the array
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                // If the DFS from some cell finds the word, return true immediately
                if (dfs(board, i, j, word, 0, flag, m, n))
                    return true;
            }
        }
        return false;
    }

    private boolean dfs(char[][] board, int i, int j, String word, int index, boolean[][] flag, int m, int n) {
        // index is the position in word of the character currently being matched;
        // on a mismatch, return so the previous cell can try its other directions
        if (board[i][j] != word.charAt(index))
            return false;
        // The current character matches and it is the last one of word: the word is found
        if (index == word.length() - 1)
            return true;
        // Mark this cell as visited so the four-direction search below cannot reuse it
        flag[i][j] = true;
        // Try to match the next character of word in each neighbor,
        // after checking that the neighbor is inside the board and not yet visited
        for (int k = 0; k < 4; k++) {
            int x = i + dx[k], y = j + dy[k];
            if (0 <= x && x <= m && 0 <= y && y <= n && !flag[x][y]) {
                if (dfs(board, x, y, word, index + 1, flag, m, n))
                    return true;
            }
        }
        // No match is possible from this cell: unmark it and backtrack so the
        // previous cell can continue its DFS in the other directions
        flag[i][j] = false;
        return false;
    }

    public static void main(String[] args) {
        char[][] board = new char[][]{{'A','B','C','E'},{'S','F','C','S'},{'A','D','E','E'}};
        WordSearch ws = new WordSearch();
        System.out.println(ws.exist(board, "ABCCED"));          // true
        System.out.println(ws.exist(board, "SEE"));             // true
        System.out.println(ws.exist(board, "ABCB"));            // false
        System.out.println(ws.exist(new char[][]{{'A'}}, "A")); // true
    }
}

Word Search II

Given a 2D board and a list of words from the dictionary, find all words in the board.

Each word must be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be used more than once in a word.

Example:

Input:
board =
[
  ['o','a','a','n'],
  ['e','t','a','e'],
  ['i','h','k','r'],
  ['i','f','l','v']
]
words = ["oath","pea","eat","rain"]

Output: ["eat","oath"]

Note: All inputs consist of lowercase letters a-z. The values of words are distinct.

Given a character array and a list of words, determine which words in the list can be found in the character array.

Solution 1

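The first solution reuses the Word Search method to check, one word at a time, whether each word in the list exists in the character array. This is inefficient, because every word triggers a separate full search of the board.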

import java.util.ArrayList;
import java.util.List;

class Solution {
    // Row/column offsets for the four directions: up, down, left, right
    int[] dx = new int[]{-1, 1, 0, 0};
    int[] dy = new int[]{0, 0, -1, 1};

    public boolean exist(char[][] board, String word) {
        // m and n are the largest valid row and column indices
        int m = board.length - 1;
        int n = board[0].length - 1;
        boolean[][] flag = new boolean[m + 1][n + 1];
        // Start a DFS from every cell of the array
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                // If the DFS from some cell finds the word, return true immediately
                if (dfs(board, i, j, word, 0, flag, m, n))
                    return true;
            }
        }
        return false;
    }

    private boolean dfs(char[][] board, int i, int j, String word, int index, boolean[][] flag, int m, int n) {
        // index is the position in word of the character currently being matched;
        // on a mismatch, return so the previous cell can try its other directions
        if (board[i][j] != word.charAt(index))
            return false;
        // The current character matches and it is the last one of word: the word is found
        if (index == word.length() - 1)
            return true;
        // Mark this cell as visited so the four-direction search below cannot reuse it
        flag[i][j] = true;
        // Match the next character in each in-bounds, unvisited neighbor
        for (int k = 0; k < 4; k++) {
            int x = i + dx[k], y = j + dy[k];
            if (0 <= x && x <= m && 0 <= y && y <= n && !flag[x][y]) {
                if (dfs(board, x, y, word, index + 1, flag, m, n))
                    return true;
            }
        }
        // No match is possible from this cell: unmark it and backtrack
        flag[i][j] = false;
        return false;
    }

    public List<String> findWords(char[][] board, String[] words) {
        List<String> ll = new ArrayList<>();
        // Run the single-word search once for every word in the list
        for (String s : words) {
            if (exist(board, s))
                ll.add(s);
        }
        return ll;
    }
}

Solution 2: optimized DFS + Trie

You would need to optimize your backtracking to pass the larger test. Could you stop backtracking earlier?

If the current candidate is not a prefix of any word, you could stop backtracking immediately. What kind of data structure could answer such a query efficiently? Does a hash table work? Why or why not? How about a Trie? If you would like to learn how to implement a basic trie, please work on this problem first: Implement Trie (Prefix Tree).

https://leetcode.com/problems/implement-trie-prefix-tree/

Use a trie: first process the word list to build the trie.

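Start a search from every cell of the character array and walk down the trie in step with the string formed by the search path. If the path spells a word stored in the trie, record that word as found. If the string formed by the search path does not exist in the trie, no word can be found by continuing from the current cell, so return immediately.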

import java.util.ArrayList;
import java.util.List;

class Solution {

    // Trie node: word is non-null only at the node that ends a dictionary word
    class TrieNode {
        String word;
        TrieNode[] children = new TrieNode[26];
    }

    TrieNode root = new TrieNode();

    public void insert(String word) {
        TrieNode t = root;
        for (char w : word.toCharArray()) {
            if (t.children[w - 'a'] == null)
                t.children[w - 'a'] = new TrieNode();
            t = t.children[w - 'a'];
        }
        t.word = word;
    }

    public List<String> findWords(char[][] board, String[] words) {
        List<String> ll = new ArrayList<>();
        // Build the trie from the word list
        for (String word : words) {
            insert(word);
        }
        int m = board.length - 1;
        int n = board[0].length - 1;
        // Start a trie-guided DFS from every cell
        for (int i = 0; i <= m; i++) {
            for (int j = 0; j <= n; j++) {
                dfs(root, board, i, j, ll, m, n);
            }
        }
        return ll;
    }

    private void dfs(TrieNode node, char[][] board, int i, int j, List<String> ll, int m, int n) {
        // Stop if out of bounds, if the cell is already on the current path (' '),
        // or if no dictionary word continues with this character (prefix pruning)
        if (i < 0 || i > m || j < 0 || j > n || board[i][j] == ' ' || node.children[board[i][j] - 'a'] == null)
            return;
        node = node.children[board[i][j] - 'a'];
        // A complete word ends here: record it, then clear it so it is not reported twice
        if (node.word != null) {
            ll.add(node.word);
            node.word = null;
        }
        // Temporarily set board[i][j] to ' ' so deeper DFS calls cannot revisit this cell;
        // the board[i][j] == ' ' check above enforces this
        char temp = board[i][j];
        board[i][j] = ' ';
        // Continue the search up, down, left and right
        dfs(node, board, i - 1, j, ll, m, n);
        dfs(node, board, i + 1, j, ll, m, n);
        dfs(node, board, i, j - 1, ll, m, n);
        dfs(node, board, i, j + 1, ll, m, n);
        // Restore the cell when backtracking
        board[i][j] = temp;
    }
}

208. Implement Trie (Prefix Tree)

Implement a trie with insert, search, and startsWith methods.

Example:

Trie trie = new Trie();

trie.insert("apple");
trie.search("apple");   // returns true
trie.search("app");     // returns false
trie.startsWith("app"); // returns true
trie.insert("app");
trie.search("app");     // returns true

Note:

You may assume that all inputs consist of lowercase letters a-z.
All inputs are guaranteed to be non-empty strings.

Build a trie that supports insertion, search, and prefix matching, assuming all strings consist of lowercase letters.

Reference: https://leetcode.com/problems/implement-trie-prefix-tree/solution/

Applications of the trie (prefix tree)

Trie (we pronounce "try") or prefix tree is a tree data structure used to retrieve a key in a dataset of strings. There are various applications of this very efficient data structure, such as:

1. Autocomplete

Figure 1. Google Suggest in action.

2. Spell checker

Figure 2. A spell checker used in word processor.

3. IP routing (Longest prefix matching)

Figure 3. Longest prefix matching algorithm uses Tries in Internet Protocol (IP) routing to select an entry from a forwarding table.

4. T9 predictive text

Figure 4. T9, which stands for Text on 9 keys, was used on phones to input text during the late 1990s.

5. Solving word games

Figure 5. Tries are used to solve Boggle efficiently by pruning the search space.

There are several other data structures, like balanced trees and hash tables, which let us search for a word in a dataset of strings. Then why do we need a trie? Although a hash table has O(1) time complexity for looking up a key, it is not efficient in the following operations:

Finding all keys with a common prefix.
Enumerating a dataset of strings in lexicographical order.

Another reason why a trie outperforms a hash table is that, as the hash table grows, there can be many hash collisions and the search time complexity could deteriorate to O(n), where n is the number of keys inserted. A trie can also use less space than a hash table when storing many keys with the same prefix. Searching a trie takes only O(m) time, where m is the key length, whereas searching for a key in a balanced tree costs O(m log n).

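For searching a set of strings, a hash table also offers efficient lookup, but a trie additionally supports prefix matching and enumerating the elements in lexicographic order. Moreover, when the dataset grows very large, hash collisions can become severe and the search time may degrade to O(n). When the string set contains many shared prefixes, a trie can perform better.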
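To make the prefix-enumeration advantage concrete, here is a minimal Java sketch. The class PrefixEnumeration and its method keysWithPrefix are hypothetical names, not part of the original article or of LeetCode's API, and keys are assumed to consist only of lowercase letters a-z. It walks down to the node for the prefix, then does a depth-first traversal that visits children in 'a' to 'z' order, so matching keys come out already sorted; a hash table would have to scan and sort all of its keys to answer the same query.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixEnumeration {

    // Minimal trie node (assumption: keys use only 'a'..'z')
    static class Node {
        Node[] children = new Node[26];
        boolean isEnd;
    }

    private final Node root = new Node();

    public void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            if (node.children[c - 'a'] == null) {
                node.children[c - 'a'] = new Node();
            }
            node = node.children[c - 'a'];
        }
        node.isEnd = true;
    }

    // Returns every stored key that starts with prefix, in lexicographic order
    public List<String> keysWithPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children[c - 'a'];
            if (node == null) {
                return result; // no stored key has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    // Pre-order DFS; visiting children from 'a' to 'z' yields sorted output
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isEnd) {
            out.add(path.toString());
        }
        for (int i = 0; i < 26; i++) {
            if (node.children[i] != null) {
                path.append((char) ('a' + i));
                collect(node.children[i], path, out);
                path.deleteCharAt(path.length() - 1);
            }
        }
    }

    public static void main(String[] args) {
        PrefixEnumeration trie = new PrefixEnumeration();
        for (String w : new String[]{"tea", "ten", "apple", "app", "tear", "to"}) {
            trie.insert(w);
        }
        System.out.println(trie.keysWithPrefix("te")); // [tea, tear, ten]
        System.out.println(trie.keysWithPrefix(""));   // [app, apple, tea, tear, ten, to]
    }
}
```

Only the subtree under the prefix node is visited, so the cost depends on the prefix length plus the size of that subtree rather than on the total number of stored keys.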

Trie node structure

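Each node has two basic fields. The first is an array of child links, one per possible next character. Assuming strings consist of lowercase letters, the array has length 26, and a character c is located in constant time by using c - 'a' as the array index. The second is a boolean flag indicating whether the string formed by all characters on the path from the root to the current node is a member of the set, or merely a prefix of some member.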

Trie is a rooted tree. Its nodes have the following fields:

At most R links to its children, where each link corresponds to one of the R character values of the dataset alphabet. In this article we assume that R is 26, the number of lowercase Latin letters.
A boolean field which specifies whether the node corresponds to the end of a key, or is just a key prefix.

Figure 6. Representation of a key "leet" in trie.

Node implementation

class TrieNode {

    // R links to node children
    private TrieNode[] links;

    private final int R = 26;

    private boolean isEnd;

    public TrieNode() {
        links = new TrieNode[R];
    }

    public boolean containsKey(char ch) {
        return links[ch -'a'] != null;
    }
    public TrieNode get(char ch) {
        return links[ch -'a'];
    }
    public void put(char ch, TrieNode node) {
        links[ch -'a'] = node;
    }
    public void setEnd() {
        isEnd = true;
    }
    public boolean isEnd() {
        return isEnd;
    }
}

Insertion of a key into a trie

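Start from the root and walk through the characters of the string one by one. For each character, check whether the corresponding slot in the current node's link array is empty; if it is, create a new node there. Move to that child and continue with the next character. After the last character, mark the current node as an end node, meaning that the string formed by all characters on the path from the root to this node is in the set.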

We insert a key by searching into the trie. We start from the root and search for the link that corresponds to the first key character. There are two cases:

A link exists. Then we move down the tree following the link to the next child level. The algorithm continues with searching for the next key character.
A link does not exist. Then we create a new node and link it with the parent's link matching the current key character. We repeat this step until we encounter the last character of the key, then we mark the current node as an end node and the algorithm finishes.

Figure 7. Insertion of keys into a trie.

Implementation:

class Trie {
    private TrieNode root;

    public Trie() {
        root = new TrieNode();
    }

    // Inserts a word into the trie.
    public void insert(String word) {
        TrieNode node = root;
        for (int i = 0; i < word.length(); i++) {
            char currentChar = word.charAt(i);
            if (!node.containsKey(currentChar)) {
                node.put(currentChar, new TrieNode());
            }
            node = node.get(currentChar);
        }
        node.setEnd();
    }
}

Complexity Analysis

Time complexity: O(m), where m is the key length.

In each iteration of the algorithm, we either examine or create a node in the trie till we reach the end of the key. This takes only m operations.

Space complexity: O(m).

In the worst case, the newly inserted key doesn't share a prefix with the keys already inserted in the trie. We have to add m new nodes, which takes O(m) space.
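The space bound can be checked by counting how many nodes each insertion creates. The class TrieNodeCount below is a hypothetical sketch, assuming lowercase a-z keys and the same child-array layout as the node above: a key that shares no prefix with existing keys creates m nodes (the worst case), while keys that share a prefix reuse nodes already in the trie.

```java
class TrieNodeCount {

    // Same layout as the article's node: 26 child links plus an end flag
    static class Node {
        Node[] children = new Node[26];
        boolean isEnd;
    }

    private final Node root = new Node();
    private int nodeCount = 1; // counts the root

    // Inserts word and returns how many new nodes it created
    public int insert(String word) {
        int created = 0;
        Node node = root;
        for (char c : word.toCharArray()) {
            if (node.children[c - 'a'] == null) {
                node.children[c - 'a'] = new Node();
                created++;
            }
            node = node.children[c - 'a'];
        }
        node.isEnd = true;
        nodeCount += created;
        return created;
    }

    public int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        TrieNodeCount trie = new TrieNodeCount();
        System.out.println(trie.insert("leet"));     // 4: no shared prefix, worst case m = 4
        System.out.println(trie.insert("leetcode")); // 4: only "code" is new
        System.out.println(trie.insert("lee"));      // 0: path exists, only the end flag is set
        System.out.println(trie.insert("code"));     // 4: no shared prefix with "lee..."
        System.out.println(trie.nodeCount());        // 13 = root + 4 + 4 + 0 + 4
    }
}
```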

Search for a key in a trie

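Starting from the root, walk through the characters of the string one by one. If the child node for the current character exists, move to it and continue with the next character. Once all characters have been consumed, the string is found if the current node is marked as an end node; if it is not, the string is only a prefix and is not in the set. If the child node for some character does not exist, the string is not in the trie.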

Each key is represented in the trie as a path from the root to an internal node or a leaf. We start from the root with the first key character. We examine the current node for a link corresponding to the key character. There are two cases:

A link exists. We move to the next node in the path following this link, and proceed to search for the next key character.
A link does not exist. If there are no key characters left and the current node is marked as isEnd, we return true. Otherwise, there are two possible cases, and in each of them we return false:
- There are key characters left, but it is impossible to follow the key path in the trie, and the key is missing.
- No key characters left, but current node is not marked as isEnd. Therefore the search key is only a prefix of another key in the trie.

Figure 8. Search for a key in a trie.

Implementation:

    // search a prefix or whole key in trie and
    // returns the node where search ends
    private TrieNode searchPrefix(String word) {
        TrieNode node = root;
        for (int i = 0; i < word.length(); i++) {
            char curLetter = word.charAt(i);
            if (node.containsKey(curLetter)) {
                node = node.get(curLetter);
            } else {
                return null;
            }
        }
        return node;
    }

    // Returns if the word is in the trie.
    public boolean search(String word) {
        TrieNode node = searchPrefix(word);
        return node != null && node.isEnd();
    }

Complexity Analysis

Time complexity: O(m). In each step of the algorithm we search for the next key character. In the worst case the algorithm performs m operations.
Space complexity: O(1).

Search for a key prefix in a trie

The approach is very similar to the one we used for searching for a key in a trie. We traverse the trie from the root until there are no characters left in the key prefix or it is impossible to continue the path in the trie with the current key character. The only difference from the key search algorithm above is that when we reach the end of the key prefix, we always return true. We don't need to consider the isEnd mark of the current trie node, because we are searching for a prefix of a key, not for a whole key.

Figure 9. Search for a key prefix in a trie.

    // Returns if there is any word in the trie
    // that starts with the given prefix.
    public boolean startsWith(String prefix) {
        TrieNode node = searchPrefix(prefix);
        return node != null;
    }

Complexity Analysis

Time complexity: O(m)
Space complexity: O(1)

Final implementation:

class Trie {

    class TrieNode{
		String word;
		TrieNode[] children = new TrieNode[26];
	}
	
	TrieNode root;

	/** Initialize your data structure here. */
	public Trie() {
		root = new TrieNode();
	}

	/** Inserts a word into the trie. */
	public void insert(String word) {
		TrieNode t = root;
		for(char w : word.toCharArray()){
			if(t.children[w-'a'] == null)
				t.children[w-'a'] = new TrieNode();				
			t = t.children[w-'a'];
		}
		t.word = word;
	}

	/** Returns if the word is in the trie. */
	public boolean search(String word) {
		TrieNode t = root;
		for(char w : word.toCharArray()){
			if(t.children[w-'a'] == null) 
				return false;
			t = t.children[w-'a'];
		}
		if(word.equals(t.word))
			return true;
		return false;
	}

	/**
	 * Returns if there is any word in the trie that starts with the given
	 * prefix.
	 */
	public boolean startsWith(String prefix) {
		TrieNode t = root;
		for(char w : prefix.toCharArray()){
			if(t.children[w-'a'] == null) 
				return false;
			t = t.children[w-'a'];
		}
		return true;
	}
}

Trie (word lookup tree): https://blog.csdn.net/u014106644/article/details/89883351

Counting the most frequent words in a text (trie): https://blog.csdn.net/u014106644/article/details/84105305
