Word Break: Segmenting a String with a Dictionary

CodeHuba

Last modified: 2022-02-12 14:15:51

Category: leetcode. Tags: leetcode, algorithms, career development

First published: 2022-02-12 10:49:54

Article link: https://blog.csdn.net/huba_yosa/article/details/122893419

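Problem

Given a string s and a dictionary of words dict, determine whether s can be segmented into a space-separated sequence of one or more dictionary words. For example, given s = "leetcode" and dict = ["leet", "code"], return true, because "leetcode" can be segmented as "leet code".

Solutions

1. Naive approach

The problem can be solved with a straightforward recursive approach, which is always a reasonable place to start the discussion.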

public static boolean wordBreak1(String s, Set<String> dict) {
    return wordBreak1(s, dict, 0);
}

// start is the index in s at which the next word must begin
public static boolean wordBreak1(String s, Set<String> dict, int start) {
    // Reaching the end of the string means the segmentation succeeded
    if (start == s.length()) {
        return true;
    }
    // Try every dictionary word at the current position
    for (String a : dict) {
        int len = a.length();
        int end = len + start;
        if (end > s.length()) {
            continue;
        }
        if (s.substring(start, end).equals(a)) {
            // Recurse on the remaining suffix
            if (wordBreak1(s, dict, end)) {
                return true;
            }
        }
    }
    return false;
}

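Time: exponential in the worst case, because the same suffix can be re-explored many times along different paths. This solution exceeds the time limit.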
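One common way to tame the repeated work in the recursion is to cache the answer for each start index. The following is a minimal sketch of that idea, not part of the original solution set; the class name WordBreakMemo and the helper canBreak are made up for illustration, and empty dictionary words are skipped on the assumption that they carry no meaning here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class WordBreakMemo {
    static boolean wordBreak(String s, Set<String> dict) {
        // memo[start]: null = not computed yet, otherwise whether s.substring(start) can be segmented
        return canBreak(s, dict, 0, new Boolean[s.length() + 1]);
    }

    private static boolean canBreak(String s, Set<String> dict, int start, Boolean[] memo) {
        if (start == s.length()) {
            return true;
        }
        if (memo[start] != null) {
            return memo[start];
        }
        boolean ok = false;
        for (String word : dict) {
            // Skip empty words (they would recurse forever); startsWith compares in place
            if (!word.isEmpty() && s.startsWith(word, start)
                    && canBreak(s, dict, start + word.length(), memo)) {
                ok = true;
                break;
            }
        }
        memo[start] = ok;
        return ok;
    }

    public static void main(String[] args) {
        Set<String> leet = new HashSet<>(Arrays.asList("leet", "code"));
        System.out.println(wordBreak("leetcode", leet)); // true

        // A case that is slow without the cache: many ways to split the run of 'a', none reaching the end
        Set<String> as = new HashSet<>(Arrays.asList("a", "aa", "aaa"));
        System.out.println(wordBreak("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab", as)); // false
    }
}
```

With the cache, each start index is resolved at most once, so the work is bounded by roughly (string length × dictionary size) prefix comparisons.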

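2. Dynamic programming

The key points of the dynamic programming solution are:

Define an array t[] such that t[i] == true means the prefix s[0..i-1] can be segmented using the dictionary;
The initial state is t[0] == true.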

public static boolean wordBreak2(String s, Set<String> dict) {
    boolean[] t = new boolean[s.length() + 1];
    t[0] = true;

    for (int i = 0; i < s.length(); i++) {
        // Only extend from positions that are already reachable; skip the rest
        if (!t[i]) {
            continue;
        }
        for (String a : dict) {
            int len = a.length();
            int end = i + len;
            if (end > s.length()) {
                continue;
            }
            // Already marked reachable, no need to compare again
            if (t[end]) {
                continue;
            }
            if (s.substring(i, end).equals(a)) {
                t[end] = true;
            }
        }
    }
    return t[s.length()];
}

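Time: O(string length × dictionary size), counting each substring comparison as a single step.

One tricky part of this solution is that we must record every possible match from each position rather than stopping at the first word that matches. For example, with s = "programcreek" and dict = ["programcree", "program", "creek"], committing to "programcree" leaves "k" unmatched, whereas "program" followed by "creek" succeeds.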
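The pitfall can be made concrete with a small sketch. The class FirstMatchPitfall and its two methods are hypothetical names introduced here for illustration: greedy is a deliberately flawed scan that commits to the longest word at each position, while dp marks every reachable end position, as the solution above does.

```java
import java.util.Arrays;
import java.util.List;

class FirstMatchPitfall {
    // Deliberately flawed: commit to the longest matching word and never reconsider
    static boolean greedy(String s, List<String> dict) {
        int i = 0;
        while (i < s.length()) {
            int best = 0;
            for (String w : dict) {
                if (w.length() > best && s.startsWith(w, i)) {
                    best = w.length();
                }
            }
            if (best == 0) {
                return false; // stuck: no word starts here
            }
            i += best;
        }
        return true;
    }

    // Marks every end position reachable from a reachable start position
    static boolean dp(String s, List<String> dict) {
        boolean[] t = new boolean[s.length() + 1];
        t[0] = true;
        for (int i = 0; i < s.length(); i++) {
            if (!t[i]) {
                continue;
            }
            for (String w : dict) {
                // startsWith returns false when the word would run past the end of s
                if (s.startsWith(w, i)) {
                    t[i + w.length()] = true;
                }
            }
        }
        return t[s.length()];
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("programcree", "program", "creek");
        System.out.println(greedy("programcreek", dict)); // false: "programcree" leaves "k"
        System.out.println(dp("programcreek", dict));     // true: "program" + "creek"
    }
}
```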

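3. Regular expression

This problem should be equivalent to matching the regular expression (leet|code)*, which means it can be solved by building a DFA in O(2^m) and running it in O(n). Note that the LeetCode online judge does not allow the Pattern class.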

// Needs java.util.HashSet, java.util.Set, java.util.regex.Matcher, java.util.regex.Pattern
public static void main(String[] args) {
    String s = "leetcode";
    Set<String> dict = new HashSet<>();
    dict.add("leets");
    dict.add("leet");
    dict.add("code");

    // Join the words into an alternation such as (leets|leet|code)*;
    // HashSet iteration order is unspecified, so the word order may differ
    String pattern = "(" + String.join("|", dict) + ")*";
    Pattern p = Pattern.compile(pattern);
    Matcher m = p.matcher(s);
    // matches() succeeds only if the entire string matches the pattern
    if (m.matches()) {
        System.out.println("true");
    }
}
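If dictionary words may contain regex metacharacters such as "." or "+", joining them verbatim changes their meaning. A hedged sketch of one way to guard against that uses Pattern.quote on each word; the class name WordBreakRegex is made up for this example. Keep in mind that java.util.regex is a backtracking engine rather than a DFA, so the O(n) bound mentioned above should not be assumed for this code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class WordBreakRegex {
    static boolean wordBreak(String s, List<String> dict) {
        // Quote each word so that characters like '.' are matched literally
        String alternation = dict.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        // (?:...) is a non-capturing group; '*' mirrors the pattern used in the text
        Pattern p = Pattern.compile("(?:" + alternation + ")*");
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(wordBreak("leetcode", Arrays.asList("leets", "leet", "code"))); // true
        System.out.println(wordBreak("axb", Arrays.asList("a.b")));                        // false
        System.out.println(wordBreak("a.b", Arrays.asList("a.b")));                        // true
    }
}
```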

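Extending the problem above leads to a new one.

Problem

Given a string s and a dictionary of words dict, add spaces to s to construct a sentence in which every word is a valid dictionary word, and return all such possible sentences.

For example, given s = "catsanddog" and dict = ["cat", "cats", "and", "sand", "dog"], the solution is ["cats and dog", "cat sand dog"].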

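Solutions

1. Dynamic programming

This problem is very similar to the word break problem above. Instead of using a boolean array to track which positions can be reached, we track the actual words that end at each position. We can then use depth-first search to enumerate every possible path, which gives the list of sentences.

The original figure showing the structure of the tracking array is not available. For s = "catsanddog" the array holds dp[3] = [cat], dp[4] = [cats], dp[7] = [sand, and], dp[10] = [dog]; dp[0] is an empty list and every other entry is null.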

public static List<String> wordBreak(String s, Set<String> dict) {
    // dp[end] lists the dictionary words that end at index end of a segmentable prefix;
    // null means no segmentation reaches that index
    @SuppressWarnings("unchecked")
    List<String>[] dp = new ArrayList[s.length() + 1];
    dp[0] = new ArrayList<>();

    for (int i = 0; i < s.length(); i++) {
        if (dp[i] == null) {
            continue;
        }

        for (String word : dict) {
            int len = word.length();
            int end = len + i;
            if (end > s.length()) {
                continue;
            }
            if (s.substring(i, end).equals(word)) {
                if (dp[end] == null) {
                    dp[end] = new ArrayList<String>();
                }
                dp[end].add(word);
            }
        }
    }
    List<String> result = new LinkedList<>();
    // No sentence exists if the end is unreachable; an empty s has no words to join
    if (s.isEmpty() || dp[s.length()] == null) {
        return result;
    }
    ArrayList<String> temp = new ArrayList<>();
    dfs(dp, s.length(), result, temp);

    return result;
}

public static void dfs(List<String>[] dp, int end, List<String> result, ArrayList<String> tmp) {
    // Words were collected from the end of s backwards, so join them in reverse order
    if (end <= 0) {
        String path = tmp.get(tmp.size() - 1);
        for (int i = tmp.size() - 2; i >= 0; i--) {
            path += " " + tmp.get(i);
        }
        result.add(path);
        return;
    }
    for (String str : dp[end]) {
        tmp.add(str);
        dfs(dp, end - str.length(), result, tmp);
        // Backtrack: drop the current word before trying the next candidate
        tmp.remove(tmp.size() - 1);
    }
}
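To make the shape of the tracking array visible, the following sketch rebuilds it and prints every non-null entry. The class TrackingArrayDemo and its build method are hypothetical names; a List of lists stands in for the generic array used above, and empty dictionary words are skipped as an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TrackingArrayDemo {
    // Entry i lists the words ending at index i of a segmentable prefix, or is null if unreachable
    static List<List<String>> build(String s, Set<String> dict) {
        List<List<String>> dp = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            dp.add(null);
        }
        dp.set(0, new ArrayList<>());
        for (int i = 0; i < s.length(); i++) {
            if (dp.get(i) == null) {
                continue;
            }
            for (String word : dict) {
                if (word.isEmpty() || !s.startsWith(word, i)) {
                    continue;
                }
                int end = i + word.length();
                if (dp.get(end) == null) {
                    dp.set(end, new ArrayList<>());
                }
                dp.get(end).add(word);
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("cat", "cats", "and", "sand", "dog"));
        List<List<String>> dp = build("catsanddog", dict);
        for (int i = 0; i < dp.size(); i++) {
            if (dp.get(i) != null) {
                System.out.println("dp[" + i + "] = " + dp.get(i));
            }
        }
    }
}
```

Reading the array backwards from dp[10] reproduces the two sentences: "dog" leads to index 7, where "sand" continues to "cat" and "and" continues to "cats".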

2. Test code

public static void main(String[] args) {
    String s = "catsanddog";
    Set<String> dict = new HashSet<>();
    dict.add("sand");
    dict.add("and");
    dict.add("cats");
    dict.add("cat");
    dict.add("dog");
    System.out.println(wordBreak(s,dict));
}

Output:

[cat sand dog, cats and dog]

Process finished with exit code 0
