[Problem Log] Counting the occurrences of a string in a file


巴黎会飞的猪


Columns: JAVA, Android基础 (Android basics) · Tags: android, JAVA, algorithms, file matching, string matching

Original link: https://blog.csdn.net/dakun012/article/details/79580844


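Counting how many times a given string occurs in a file

This question popped into my head today, as in the title.

I found two answers online. I'll post them first; see whether you can spot their problems: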

Approach 1: read the file line by line and match each line

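    /**
     * Counts the occurrences of a given string in a given file.
     *
     * @param filename  file name
     * @param word      the string to search for
     * @return number of times the string occurs in the file
     */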
    public static int countWordInFile(String filename, String word) {
        int counter = 0;
        try (FileReader fr = new FileReader(filename)) {
            try (BufferedReader br = new BufferedReader(fr)) {
                String line = null;
                while ((line = br.readLine()) != null) {
                    int index = -1;
                    while (line.length() >= word.length() && (index = line.indexOf(word)) >= 0) {
                        counter++;
                        line = line.substring(index + word.length());
                    }
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
        return counter;
    }

Approach 2: append all the content to a StringBuilder, then match

    public static int count(String filename, String target)
            throws FileNotFoundException, IOException {
        FileReader fr = new FileReader(filename);
        BufferedReader br = new BufferedReader(fr);
        StringBuilder strb = new StringBuilder();
        while (true) {
            String line = br.readLine();
            if (line == null) {
                break;
            }
            strb.append(line);
        }
        String result = strb.toString();
        int count = 0;
        int index = 0;
        while (true) {
            index = result.indexOf(target, index + 1);
            if (index > 0) {
                count++;
            } else {
                break;
            }
        }
        br.close();
        return count;
    }

What do you think is wrong with each of them?

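Approach 1: if the target string happens to be split across a line break (part at the end of one line, the rest at the start of the next), it is missed.
Approach 2: this is fine for small files, but it keeps the whole file in memory as a single string, so a file of hundreds of megabytes or several gigabytes will likely trigger an OutOfMemoryError (OOM). There is also a smaller bug: index starts at 0 and every search starts at index + 1, so a match at the very beginning of the file is never counted.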
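To see these problems on concrete input, here is a small self-contained sketch. It copies the two search loops above into static methods (FlawDemo, perLineCount and joinedCount are hypothetical names) and feeds them in-memory text through a StringReader instead of a file. The first call shows "world" being missed when it is split across a line break; the second shows that a search loop starting at index + 1 = 1 never counts a match at position 0.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class FlawDemo {

    // Same loop as approach 1, but reading from any BufferedReader
    static int perLineCount(BufferedReader br, String word) throws IOException {
        int counter = 0;
        String line;
        while ((line = br.readLine()) != null) {
            int index = -1;
            while (line.length() >= word.length() && (index = line.indexOf(word)) >= 0) {
                counter++;
                line = line.substring(index + word.length());
            }
        }
        return counter;
    }

    // Same search loop as approach 2, applied to an already-joined string
    static int joinedCount(String result, String target) {
        int count = 0;
        int index = 0;
        while (true) {
            index = result.indexOf(target, index + 1); // first search starts at 1, not 0
            if (index > 0) {
                count++;
            } else {
                break;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "world" occurs twice, but the first one is split by the line break
        System.out.println(perLineCount(
                new BufferedReader(new StringReader("hello wo\nrld, world")), "world")); // 1
        // "ab" occurs twice, but the one at index 0 is skipped
        System.out.println(joinedCount("abab", "ab")); // 1
    }
}
```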

I'll post my solution first and then explain the idea:

    public static int countWordInFile(String filename, String word) {
        FileReader fileReader;
        BufferedReader bufferedReader;
        int count = 0;
        try {
            fileReader = new FileReader(filename);
            bufferedReader = new BufferedReader(fileReader);
            String currentLineStr = bufferedReader.readLine();
            String nextLineStr;
            while (currentLineStr != null) {
                nextLineStr = bufferedReader.readLine();
                // The current line is the last line
                if (nextLineStr == null) {
                    count += getContainCounts(currentLineStr, word);
                    break;
                }
                // The next line is shorter than word.length() (note: it is not necessarily the last line)
                if (nextLineStr.length() < word.length()) {
                    currentLineStr += nextLineStr;
                    count += getContainCounts(currentLineStr, word);
                    currentLineStr = bufferedReader.readLine();
                } else {
                    // Count twice: the current line alone, then with the start of the next line appended
                    int currentCount = getContainCounts(currentLineStr, word);
                    currentLineStr += nextLineStr.substring(0, word.length() - 1);
                    int currentAddNextCount = getContainCounts(currentLineStr, word);
                    // Record the occurrences
                    count += currentAddNextCount;
                    if (currentCount == currentAddNextCount) {
                        currentLineStr = nextLineStr;
                    } else {
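                        // Assumes the straddling match used exactly word.length() - 1 characters of
                        // the next line; a match that starts later uses fewer, so this can skip too much
                        currentLineStr = nextLineStr.substring(word.length() - 1, nextLineStr.length());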
                    }
                }

            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return count;
    }

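1: First check whether the current line is the last one; if it is, count its matches and end the loop.
2: Check whether the next line is at least as long as the target string. If it is shorter, the next line cannot contain a match on its own, so append the whole line to the current line and match against the combined string. If it is at least as long, count the matches in the current line alone, then in the current line plus the first word.length() - 1 characters of the next line. There are two cases. First, the combined count equals the count for the current line alone, i.e. appending the next line makes no difference. Second, the combined count is larger, i.e. a match straddles the line break, starting near the end of the current line and ending at the start of the next one. In the first case the next round can match the whole next line; in the second, the next round must skip the part of the next line that this match already used.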
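A worked example of the second case, as a small sketch with hypothetical names (BoundaryOffsetDemo, lastMatchEnd) and plain String.indexOf standing in for the regex helper. It shows why the number of characters to skip should be computed from where the last match ends, as the revised code below does with pair.second, rather than fixed at word.length() - 1 as in the first version above.

```java
public class BoundaryOffsetDemo {

    // End index of the last non-overlapping match of word in s, or 0 if there is none.
    // word must be non-empty, otherwise the loop would never advance.
    static int lastMatchEnd(String s, String word) {
        int end = 0;
        int from = 0;
        int i;
        while ((i = s.indexOf(word, from)) >= 0) {
            end = i + word.length();
            from = end; // continue after the match, so matches do not overlap
        }
        return end;
    }

    public static void main(String[] args) {
        String word = "abc";
        String current = "xab";
        String next = "cabc";
        // current line + the first word.length() - 1 characters of the next line
        String combined = current + next.substring(0, word.length() - 1); // "xabca"
        int end = lastMatchEnd(combined, word);                            // 4
        // characters of the next line used by the straddling match: 4 - (5 - 3 + 1)
        int used = end - (combined.length() - word.length() + 1);
        System.out.println(used);                              // 1
        System.out.println(next.substring(used));              // abc  (still found next round)
        System.out.println(next.substring(word.length() - 1)); // bc   (the second "abc" is lost)
    }
}
```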

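The revised solution is as follows (it needs java.io.BufferedReader, java.io.FileReader, java.io.FileNotFoundException, java.io.IOException and android.util.Pair):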

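    /**
     * Counts the occurrences of a given string in a given file.
     * @param filename file name
     * @param word     the string to search for
     * @return number of times the string occurs in the file
     */
    public static int countWordInFile(String filename, String word) {
        int count = 0;
        if (word.isEmpty()) {
            // An empty word would make substring(0, word.length() - 1) below throw
            return 0;
        }
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filename))) {
            String currentLineStr = bufferedReader.readLine();
            String nextLineStr;
            while (currentLineStr != null) {
                nextLineStr = bufferedReader.readLine();
                // The current line is the last line
                if (nextLineStr == null) {
                    count += getContainCounts(currentLineStr, word).first;
                    break;
                }
                // The next line is shorter than word.length() (note: it is not necessarily the last line)
                if (nextLineStr.length() < word.length()) {
                    currentLineStr += nextLineStr;
                    count += getContainCounts(currentLineStr, word).first;
                    // Limitation: the combined string is discarded here, so a match that
                    // continues from it into the following line is not detected
                    currentLineStr = bufferedReader.readLine();
                } else {
                    Pair<Integer, Integer> pair = getContainCounts(currentLineStr, word);
                    // Count twice: the current line alone, then with the first
                    // word.length() - 1 characters of the next line appended
                    int currentCount = pair.first;
                    currentLineStr += nextLineStr.substring(0, word.length() - 1);
                    pair = getContainCounts(currentLineStr, word);
                    int currentAddNextCount = pair.first;
                    // Record the occurrences
                    count += currentAddNextCount;
                    if (currentCount == currentAddNextCount) {
                        // No match straddles the line break: match the whole next line next round
                        currentLineStr = nextLineStr;
                    } else {
                        // Don't use lastIndexOf(word) here; I'll leave the reason for you to think about.
                        // pair.second is where the last match ends; subtracting the current line's
                        // original length gives how many characters of the next line it used
                        int useCount = pair.second - (currentLineStr.length() - word.length() + 1);
                        currentLineStr = nextLineStr.substring(useCount, nextLineStr.length());
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return count;
    }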

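Helper method that counts the matches (it needs java.util.regex.Pattern, java.util.regex.Matcher and android.util.Pair):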

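    /**
     * Counts the non-overlapping occurrences of a string using java.util.regex.
     *
     * @param source the text to search
     * @param regex  the string to match; it is wrapped in Pattern.quote so that
     *               characters such as '.' or '*' are matched literally
     * @return Pair of (number of occurrences, end index of the last match, or 0 if there is none)
     */
    public static Pair<Integer, Integer> getContainCounts(String source, String regex) {
        // The word is a plain string, not a regular expression, so quote it
        Pattern expression = Pattern.compile(Pattern.quote(regex));
        Matcher matcher = expression.matcher(source);
        int n = 0;
        int end = 0;
        while (matcher.find()) {
            n++;
            end = matcher.end(); // index just past the most recent match
        }
        return Pair.create(n, end);
    }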
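For comparison, here is a different technique, not the author's: a Knuth-Morris-Pratt (KMP) matcher that reads the file one character at a time. It is a minimal sketch assuming the same semantics as the code above (line breaks are dropped, as readLine() does, and matches do not overlap, as with Matcher.find()); the names StreamingWordCounter, failureTable and count are hypothetical. Because it only tracks how many characters of the word are currently matched, it needs no special cases for line boundaries or short lines, and its memory use does not grow with the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StreamingWordCounter {

    // KMP failure table: fail[i] is the length of the longest proper prefix of
    // word[0..i] that is also a suffix of word[0..i]
    static int[] failureTable(String word) {
        int[] fail = new int[word.length()];
        int k = 0;
        for (int i = 1; i < word.length(); i++) {
            while (k > 0 && word.charAt(i) != word.charAt(k)) {
                k = fail[k - 1];
            }
            if (word.charAt(i) == word.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Counts non-overlapping occurrences of word in the character stream.
    // '\r' and '\n' are skipped, so a match may span a line break.
    static int count(Reader in, String word) throws IOException {
        if (word.isEmpty()) {
            return 0;
        }
        int[] fail = failureTable(word);
        int matched = 0; // number of characters of word matched so far
        int count = 0;
        BufferedReader reader = new BufferedReader(in);
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\n' || c == '\r') {
                continue; // ignore line terminators, as readLine() does
            }
            while (matched > 0 && c != word.charAt(matched)) {
                matched = fail[matched - 1];
            }
            if (c == word.charAt(matched)) {
                matched++;
            }
            if (matched == word.length()) {
                count++;
                matched = 0; // restart after the match, so matches never overlap
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(count(new StringReader("hello wo\nrld, world"), "world")); // 2
        System.out.println(count(new StringReader("aaaa"), "aa"));                   // 2
        System.out.println(count(new StringReader("xa\nb\ncy"), "abc"));             // 1
    }
}
```

With a real file the call would be wrapped like the other snippets, e.g. `try (Reader r = new FileReader(filename)) { n = StreamingWordCounter.count(r, word); }`; memory use then depends on the word length and the reader's buffer, not on the file size or on how the lines are split.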

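That's my solution. By the time I finished writing it was already past midnight and I was exhausted, so the code is a bit messy; please bear with it!
My abilities are limited, so corrections are welcome, and if you have a better idea, please don't hesitate to share it.
Done for today, off to bed~~~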
