Locating a string within HTML text

戎码亿升

Published 2022-04-07 11:54:08

Category: Problems encountered during development

Article link: https://blog.csdn.net/H785503444/article/details/124011254

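The idea borrows from the KMP algorithm, but without KMP's partial match table, so its performance is somewhat worse. The main complication is that the target string contains HTML tags.

The source string is slid to the right over the target and compared character by character. Once a character matches, the next character of the source is compared against the characters that follow in the target; on a real mismatch, matching restarts one position further along.

Because the target string contains tags, any tag encountered during matching has to be skipped: the scan jumps past its closing '>'.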
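To make the skipping step concrete, here is a minimal sketch of it in isolation. It assumes tags are simple '<...>' runs with no '>' inside attribute values, and it also treats spaces and newlines as ignorable, as the method below does. TagSkipper and skipIgnorable are hypothetical names used only for illustration; they are not part of the original code.

```java
// Hypothetical helper illustrating the tag-skipping step; not part of the original code.
public class TagSkipper {

    /**
     * Returns the first index >= from that is neither inside an HTML tag
     * nor a space/newline, or html.length() if there is none.
     * Assumes tags are simple '<...>' runs without '>' inside attribute values.
     */
    public static int skipIgnorable(String html, int from) {
        int j = from;
        while (j < html.length()) {
            char c = html.charAt(j);
            if (c == '<') {
                int close = html.indexOf('>', j);
                if (close < 0) {
                    return j; // unterminated '<': treat it as ordinary text
                }
                j = close + 1; // jump past the whole tag
            } else if (c == ' ' || c == '\n') {
                j++; // whitespace is ignored while matching
            } else {
                return j;
            }
        }
        return j;
    }

    public static void main(String[] args) {
        String html = "<p>ab <b>c</b></p>";
        System.out.println(skipIgnorable(html, 0)); // 3, the 'a'
        System.out.println(skipIgnorable(html, 5)); // 9, the 'c'
    }
}
```

Starting from index 5 (the space), the helper skips the space and the whole <b> tag and lands on 'c'.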

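import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Finds the position of a string in text (the target string is HTML).
 * Tags (<...>), spaces and newlines in the target are skipped while matching.
 *
 * @param source the plain string to look for
 * @param target the HTML text to search in
 * @return [start, end) offsets in target, or [-1, -1] if there is no match
 */
public static List<Integer> kmpFuzzyMatching(String source, String target) {
    if (source.isEmpty()) {
        return Arrays.asList(-1, -1); // nothing to match (also avoids get(0) on an empty list)
    }
    char[] sourceChar = source.toCharArray();
    char[] targetChar = target.toCharArray();

    int temp = 0; // position in target where the next comparison starts
    List<Integer> pointList = new ArrayList<>(); // target offsets of the matched characters
    for (int i = 0; i < sourceChar.length; i++) {
        for (int j = temp; j < targetChar.length; j++) {
            // Is this the start of a tag? If so, jump to its closing '>'
            if (targetChar[j] == '<') {
                int k = target.indexOf('>', j);
                if (k >= 0) { // indexOf returns -1 if there is no '>'; the old "+ j" check looped forever
                    j = k;
                    continue;
                }
            }
            // Matched
            if (sourceChar[i] == targetChar[j]) {
                pointList.add(j);
                temp = j + 1;
                break;
            } else if (targetChar[j] == '\n'
                    || targetChar[j] == ' ') {
                // Whitespace in the HTML is ignored
                continue;
            } else {
                // Is the next character the start of a tag? (bounds-checked)
                if (j + 1 < targetChar.length && targetChar[j + 1] == '<') {
                    int k = target.indexOf('>', j + 1);
                    if (k >= 0) {
                        j = k;
                        continue;
                    }
                }
                // Real mismatch: restart one position after where this attempt began
                // (j - i + 1 was wrong once tags or whitespace had been skipped)
                temp = pointList.isEmpty() ? j + 1 : pointList.get(0) + 1;
                i = -1;
                pointList.clear();
                break;
            }
        }
        if (pointList.size() == source.length()) {
            break;
        }
    }

    if (pointList.size() == source.length()) {
        // The end is exclusive: one past the last matched character
        return Arrays.asList(pointList.get(0), pointList.get(pointList.size() - 1) + 1);
    }
    return Arrays.asList(-1, -1);
}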

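The method returns the start and end positions of the match (the end is exclusive, i.e. one past the last matched character). If nothing matches, it returns -1, -1.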
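For comparison, here is a sketch of a different technique, not the author's method. It first strips the tags while recording where each kept character sits in the HTML, then runs standard KMP, including the partial match table, on the resulting plain text, and finally maps the hit back to HTML offsets. StrippedKmpLocator and locate are hypothetical names. The sketch assumes well-formed '<...>' tags and, like kmpFuzzyMatching, ignores spaces and newlines in the HTML. It performs exact matching only, so it does not reproduce every fuzzy behavior of the method above. It uses the same [start, end) / [-1, -1] result convention.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical alternative, not the author's method: strip tags first, then run real KMP.
public class StrippedKmpLocator {

    /** Returns [start, end) offsets in html, or [-1, -1] if source is not found. */
    public static List<Integer> locate(String source, String html) {
        if (source.isEmpty()) {
            return Arrays.asList(-1, -1);
        }
        // 1. Build the visible text and remember each kept char's offset in html.
        StringBuilder text = new StringBuilder();
        List<Integer> offsets = new ArrayList<>();
        for (int j = 0; j < html.length(); j++) {
            char c = html.charAt(j);
            if (c == '<') {
                int close = html.indexOf('>', j);
                if (close >= 0) {
                    j = close; // skip the whole tag; the loop's j++ moves past '>'
                    continue;
                }
            }
            if (c == ' ' || c == '\n') {
                continue; // whitespace is ignored, as in kmpFuzzyMatching
            }
            text.append(c);
            offsets.add(j);
        }

        // 2. KMP partial match table: fail[q] = length of the longest proper
        //    prefix of source[0..q] that is also a suffix of it.
        int m = source.length();
        int[] fail = new int[m];
        for (int q = 1, k = 0; q < m; q++) {
            while (k > 0 && source.charAt(q) != source.charAt(k)) {
                k = fail[k - 1];
            }
            if (source.charAt(q) == source.charAt(k)) {
                k++;
            }
            fail[q] = k;
        }

        // 3. KMP search over the stripped text; i never moves backwards.
        for (int i = 0, k = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != source.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == source.charAt(k)) {
                k++;
            }
            if (k == m) {
                int first = offsets.get(i - m + 1); // HTML offset of the first matched char
                int last = offsets.get(i);          // HTML offset of the last matched char
                return Arrays.asList(first, last + 1);
            }
        }
        return Arrays.asList(-1, -1);
    }

    public static void main(String[] args) {
        System.out.println(locate("abc", "<p>ab <b>c</b></p>")); // [3, 10]
    }
}
```

The trade-off is an extra pass and an offset list as long as the visible text. In return, the search itself never re-scans characters, which is exactly what the missing partial match table would provide.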