One Trick That Handles 99% of Novel Chapter Lists: Splitting a TXT Novel into Chapters with Regex

Latest recommended article published 2023-12-12 14:46:15

喜欢敲代码的一歪风


Views: 22k

Likes: 26

Tags: string, regular expression, java

Article link: https://blog.csdn.net/qq_43257319/article/details/108530208


Step 1: Read the novel file

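public void cutFile() throws IOException {
    // Buffer for the whole novel text (StringBuilder avoids re-copying the string on every +=)
    StringBuilder src = new StringBuilder();
    // Read the novel from the given path through a stream (the author prefers streams).
    // InputStreamReader decodes GBK across chunk boundaries, so a two-byte Chinese
    // character is never cut in half; try-with-resources closes the stream even on error
    try (Reader reader = new InputStreamReader(
            new FileInputStream("src\\main\\webapp\\static\\txt\\拜师九叔.txt"), "GBK")) {
        char[] buf = new char[5440]; // read up to 5440 characters at a time
        int n;
        // Append only the n characters actually read, never the stale tail of the buffer
        while ((n = reader.read(buf)) != -1) {
            src.append(buf, 0, n);
        }
    }
    cutCatlog(src.toString());
}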

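**That's just basic Java I/O. The one thing to watch is the charset: the file is decoded as GBK, so change it if your TXT file is saved as UTF-8.**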
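Why the decoding step matters: decoding each fixed-size byte chunk on its own, as `new String(bt,"GBK")` inside a read loop does, can break characters. In GBK an ASCII character takes one byte and a Chinese character takes two, so a chunk can end halfway through a character. The sketch below (class and method names are made up for illustration) contrasts per-chunk decoding with letting a Reader decode the whole byte stream. The same loop also decodes the full buffer even when `read()` filled only part of it, which appends stale bytes at the end.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.Arrays;

public class ChunkDecodeDemo {
    static final Charset GBK = Charset.forName("GBK");

    // Decodes every byte chunk separately, like new String(bt, "GBK") in a read loop
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, data.length);
            sb.append(new String(Arrays.copyOfRange(data, off, end), GBK));
        }
        return sb.toString();
    }

    // Lets InputStreamReader decode the byte stream as a whole
    static String decodeWithReader(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(data), GBK)) {
            char[] buf = new char[chunkSize];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n); // only the characters actually read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "a" is 1 byte in GBK and "中" is 2 bytes, so a 2-byte chunk splits "中"
        byte[] data = "a中".getBytes(GBK);
        System.out.println(data.length);                             // 3
        System.out.println(decodePerChunk(data, 2).equals("a中"));   // false
        System.out.println(decodeWithReader(data, 2).equals("a中")); // true
    }
}
```

Each half of the split character becomes a replacement character, so the per-chunk result no longer equals the original text.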

Step 2: Write the regular expression

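Before writing the regex, you need to know a few rules.
What exactly is a regular expression?
When writing programs or web pages that process strings, you often need to find strings that follow some complex rule. A regular expression is a tool for describing such rules; in other words, it is code that records a text pattern.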
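Common metacharacters

Code	Description
.	Matches any character except a line break
\w	Matches a letter, digit, or underscore (in Java only ASCII [a-zA-Z0-9_] by default, so not Chinese characters)
\s	Matches any whitespace character
\d	Matches a digit
\b	Matches the start or end of a word (a word boundary)
^	Matches the start of the string
$	Matches the end of the string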
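A quick Java check of several rows in the table. The point most relevant to Chinese novels is that Java's default `\w` is ASCII-only; the inline flag `(?U)` (`Pattern.UNICODE_CHARACTER_CLASS`) switches it to Unicode. The class and helper names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharDemo {
    // Counts non-overlapping matches of regex in text
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("\\w", "a_1"));       // 3: letter, underscore, digit
        System.out.println(countMatches("\\w", "中文"));      // 0: \w is ASCII-only by default
        System.out.println(countMatches("(?U)\\w", "中文"));  // 2: Unicode-aware \w
        System.out.println(countMatches(".", "a\nb"));        // 2: . skips the line break
        System.out.println(countMatches("\\d", "第12章"));    // 2
        System.out.println(countMatches("\\bcat\\b", "cat catalog bobcat")); // 1: whole word only
        System.out.println(countMatches("^第", "第一章"));    // 1: ^ anchors at the start
    }
}
```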

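Common quantifiers

Code/Syntax	Description
*	Repeat zero or more times
+	Repeat one or more times
?	Repeat zero times or once
{n}	Repeat exactly n times
{n,}	Repeat n or more times
{n,m}	Repeat n to m times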
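Quantifiers are what bound the chapter number in a heading. This sketch uses a simplified, digits-only pattern `第\d{1,7}章` (an assumption for illustration, not the full heading pattern) to show how `{1,7}` accepts or rejects a number, together with `*`, `+`, `?` and `{n}`. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Simplified heading: 第 + 1 to 7 digits + 章 (digits only, for illustration)
    private static final Pattern CHAPTER_NO = Pattern.compile("第\\d{1,7}章");

    // True if the whole string is such a heading
    static boolean isChapterNumber(String s) {
        return CHAPTER_NO.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isChapterNumber("第1章"));        // true: 1 digit
        System.out.println(isChapterNumber("第1234567章"));  // true: 7 digits
        System.out.println(isChapterNumber("第12345678章")); // false: 8 digits exceed {1,7}
        System.out.println(isChapterNumber("第章"));         // false: {1,7} needs at least one
        System.out.println("".matches("a*"));                // true: * allows zero
        System.out.println("".matches("a+"));                // false: + needs at least one
        System.out.println("color".matches("colou?r"));      // true: ? allows zero
        System.out.println("colour".matches("colou?r"));     // true: ? allows one
        System.out.println("aaa".matches("a{3}"));           // true: exactly 3
    }
}
```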

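Common negations

Code/Syntax	Description
\W	Matches any character that is not a letter, digit, or underscore (in Java this includes Chinese characters)
\S	Matches any character that is not whitespace
\D	Matches any character that is not a digit
\B	Matches a position that is not the start or end of a word
[^x]	Matches any character except x
[^aeiou]	Matches any character except the letters a, e, i, o, u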
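Negated classes are how a heading pattern grabs a chapter name without running into the next line: `[^\n]{1,35}` stops at the first line break. The sketch below uses a made-up sample line, and the class and method names are hypothetical. Note that the 35-character cap only limits how much is captured: a longer line still matches, and the rest of it is simply left out of the title.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo {
    // 章 followed by 1 to 35 characters that are not a line break
    private static final Pattern TITLE = Pattern.compile("章([^\\n]{1,35})");

    // Returns the text after the first 章 up to the end of its line (max 35 chars), or null
    static String titleAfterZhang(String text) {
        Matcher m = TITLE.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // [^aeiou]: delete everything that is not one of the five vowels
    static String keepVowels(String s) {
        return s.replaceAll("[^aeiou]", "");
    }

    // \D: delete every non-digit
    static String keepDigits(String s) {
        return s.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + titleAfterZhang("第一章 开端\n内容") + "]"); // [ 开端]
        System.out.println(titleAfterZhang("第一章 " + "字".repeat(40)).length()); // 35
        System.out.println(keepVowels("regular")); // eua
        System.out.println(keepDigits("第12章"));  // 12
    }
}
```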

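Common regular expressions

Code/Syntax	Description
[\u4e00-\u9fa5]	A Chinese character (the basic CJK Unified Ideographs range)
\s	A whitespace character (space, tab, line break, ...)
[1-9]	A single digit from 1 to 9
[a-z]	Any lowercase letter
[A-Z]	Any uppercase letter
[s]	The character s itself; s can be any character
(\s|\n)	A whitespace character or a line break (in Java \s already includes \n, so this is equivalent to \s)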
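A short check of the Chinese-character range and of `(\s|\n)`, using an invented sample string and hypothetical names. It also shows a pitfall for Chinese TXT files: the full-width space U+3000, often used to indent lines, is not matched by Java's default `\s`, so a heading indented that way has no `\s` directly before 第; `(?U)\s` does match it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeAndWhitespaceDemo {
    private static final Pattern CHINESE = Pattern.compile("[\\u4e00-\\u9fa5]");

    // Counts characters in the range U+4E00..U+9FA5
    static int countChinese(String s) {
        Matcher m = CHINESE.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True if (\s|\n) and \s agree on the single character c
    static boolean sameAsWhitespace(char c) {
        String s = String.valueOf(c);
        return s.matches("(\\s|\\n)") == s.matches("\\s");
    }

    public static void main(String[] args) {
        System.out.println(countChinese("第12章 Hello 世界")); // 4
        String asciiWhitespace = " \t\n\r\f" + (char) 0x0B;
        for (char c : asciiWhitespace.toCharArray()) {
            System.out.println(sameAsWhitespace(c)); // true for every character
        }
        // Full-width (ideographic) space U+3000
        System.out.println("\u3000".matches("\\s"));     // false
        System.out.println("\u3000".matches("(?U)\\s")); // true
    }
}
```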

Here is the regular expression that matches the chapter headings:

(\s|\n)(第)([\u4e00-\u9fa5a-zA-Z0-9]{1,7})[章][^\n]{1,35}(|\n)

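(\s|\n): the heading starts right after a whitespace character or a line break (in Java \s already includes \n, so the \n branch is redundant but harmless)
(第): the first character of the heading is 第
([\u4e00-\u9fa5a-zA-Z0-9]{1,7}): the xx in 第xx章, i.e. 1 to 7 Chinese characters, letters, or digits, as in ' 第一千五百三十六章：茅山现状 ' (一千五百三十六 is exactly 7 characters)
[章]: the character 章 in the heading
[^\n]{1,35}: the chapter name after 章, 1 to 35 characters that are not a line break (with Windows \r\n line endings this includes the \r, which the code strips later)
(|\n): either nothing or a line break, i.e. an optional trailing newline. Java tries the empty branch first and it already succeeds, so in practice the newline is not part of the match.
In the code, the pattern gets an extra optional (正文){0,1} prefix for files that put 正文 ("main text") before the first chapter, and every backslash is doubled because the pattern is written as a Java string literal.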
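To see the pattern in action, here is a sketch that runs it over a small made-up text (the class name and sample are invented). The last line mentions 第三章 in the middle of a sentence; because the pattern requires whitespace or a line break right before 第, it is not treated as a heading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChapterRegexDemo {
    // The heading pattern, written as a Java string literal
    static final Pattern HEADING = Pattern.compile(
            "(\\s|\\n)(第)([\\u4e00-\\u9fa5a-zA-Z0-9]{1,7})[章][^\\n]{1,35}(|\\n)");

    // Returns every matched heading with surrounding whitespace trimmed
    static List<String> headings(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = HEADING.matcher(text);
        while (m.find()) {
            result.add(m.group().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "书名\n第一章 开端\n内容一\n第1536章：结局\n内容二\n提到第三章的一句话\n";
        System.out.println(headings(text)); // [第一章 开端, 第1536章：结局]
    }
}
```

Note how `{1,7}` backtracks: the character class also contains 章, so it first swallows "一章" and then gives the 章 back to `[章]`.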

public void cutCatlog(String src) {
    // Heading pattern; swap [章] for e.g. [章节卷集部回] to accept other heading words
    String pest = "(正文){0,1}(\\s|\\n)(第)([\\u4e00-\\u9fa5a-zA-Z0-9]{1,7})[章][^\\n]{1,35}(|\\n)";
    // Cleanup pattern: from "PS"/"ps" to the end of the line
    String washpest = "(PS|ps)(.)*(|\\n)";
    // Delete the author's PS notes from the text
    src = src.replaceAll(washpest, "");
    // Compile once and reuse for both split() and find()
    Pattern p = Pattern.compile(pest);
    // Split the novel at every heading: list.get(0) is the text before the first
    // chapter, list.get(k) is the body that follows the k-th heading
    List<String> list = Arrays.asList(p.split(src));
    System.out.println("size" + list.size());
    // Chapter names (used as file names)
    List<String> namelist = new ArrayList<>();
    // "Heading + body" texts, index-aligned with namelist
    List<String> newlist = new ArrayList<>();
    Matcher m = p.matcher(src);
    int i = 1;
    // Pair the k-th heading with list.get(k); stop when the bodies run out
    while (m.find() && i < list.size()) {
        // Remove spaces and carriage returns from the heading
        String temp = m.group(0).replace(" ", "").replace("\r", "");
        // Heading followed by its chapter body
        newlist.add(temp + list.get(i));
        // File name: drop parenthesised notes (full-width and ASCII), characters that
        // Windows forbids in file names, the full-width colon, and all whitespace
        String name = temp.replaceAll("[（](.)*[）]", "").replaceAll("[(](.)*[)]", "")
                .replaceAll("[\\\\/:*?\"<>|：\\s]", "");
        System.out.println("j=" + i + " temp=" + name + ".txt");
        namelist.add(name);
        i++;
    }

    // Create the output directory; mkdirs() also creates E:\BookFile if needed.
    // bookname is a field of the enclosing class (its declaration is not shown here)
    File file = new File("E:\\BookFile\\" + bookname);
    if (!file.exists()) {
        file.mkdirs();
    }

    // Write one TXT file per chapter
    for (int k = 0; k < newlist.size(); k++) {
        File book = new File(file, namelist.get(k) + ".txt");
        // Explicit charset (GBK, same as the input) instead of the platform default;
        // try-with-resources closes the writer even if write() fails
        try (Writer fr = new OutputStreamWriter(new FileOutputStream(book), "GBK")) {
            fr.write(newlist.get(k));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
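A different way to do the same split, offered as an alternative rather than the author's method: walk the matches once with `Matcher.start()` and `Matcher.end()` and cut each body from the end of one heading to the start of the next. Headings and bodies then cannot drift out of step, and a final chapter with an empty body is still kept, whereas `split()` drops trailing empty strings. The class, the `Chapter` type, and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassSplitter {
    // Same heading pattern as cutCatlog
    static final Pattern HEADING = Pattern.compile(
            "(正文){0,1}(\\s|\\n)(第)([\\u4e00-\\u9fa5a-zA-Z0-9]{1,7})[章][^\\n]{1,35}(|\\n)");

    static final class Chapter {
        final String title;
        final String body;

        Chapter(String title, String body) {
            this.title = title;
            this.body = body;
        }
    }

    // Each body runs from the end of one heading to the start of the next (or end of text);
    // text before the first heading is ignored, like list.get(0) in cutCatlog
    static List<Chapter> split(String src) {
        List<Chapter> chapters = new ArrayList<>();
        Matcher m = HEADING.matcher(src);
        String title = null;
        int bodyStart = 0;
        while (m.find()) {
            if (title != null) {
                chapters.add(new Chapter(title, src.substring(bodyStart, m.start())));
            }
            title = m.group().trim();
            bodyStart = m.end();
        }
        if (title != null) {
            chapters.add(new Chapter(title, src.substring(bodyStart)));
        }
        return chapters;
    }

    public static void main(String[] args) {
        String src = "前言\n第一章 开端\n内容一\n第二章 继续";
        for (Chapter c : split(src)) {
            System.out.println(c.title + " -> [" + c.body.trim() + "]");
        }
        // 第一章 开端 -> [内容一]
        // 第二章 继续 -> []
    }
}
```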

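One caveat about the cleanup pattern `(PS|ps)(.)*(|\n)`: it is not anchored, so it also fires on "PS" inside ordinary text and deletes the rest of that line. The sketch below shows this with "GPS" in an invented sample. The stricter line-start variant is a hypothetical alternative that only removes lines beginning with PS/ps and a colon; adjust it to how your files actually mark author notes.

```java
public class PsCleanupDemo {
    // The article's cleanup pattern: from "PS"/"ps" to the end of the line
    static String washOriginal(String src) {
        return src.replaceAll("(PS|ps)(.)*(|\\n)", "");
    }

    // Hypothetical stricter rule: only lines that start (after optional ASCII or
    // full-width spaces) with PS/ps followed by an ASCII or full-width colon
    static String washLineStart(String src) {
        return src.replaceAll("(?m)^[ \\t\u3000]*(PS|ps)[:：].*$", "");
    }

    public static void main(String[] args) {
        String text = "打开GPS定位\nPS：求月票\n";
        // The original rule also eats "PS定位" from the first line
        System.out.println(washOriginal(text).equals("打开G\n\n"));        // true
        // The line-start rule keeps the first line and empties only the PS line
        System.out.println(washLineStart(text).equals("打开GPS定位\n\n")); // true
    }
}
```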
Complete code

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The class name is arbitrary: the original post only shows the two methods
public class NovelSplitter {

    // Output folder name. The original uses bookname without declaring it,
    // so this field and its value are assumptions
    private String bookname = "拜师九叔";

    public void cutFile() throws IOException {
        StringBuilder src = new StringBuilder();
        // Decode GBK through a Reader so two-byte characters survive chunk
        // boundaries; try-with-resources closes the stream
        try (Reader reader = new InputStreamReader(
                new FileInputStream("src\\main\\webapp\\static\\txt\\拜师九叔.txt"), "GBK")) {
            char[] buf = new char[5440];
            int n;
            // Append only the characters actually read
            while ((n = reader.read(buf)) != -1) {
                src.append(buf, 0, n);
            }
        }
        cutCatlog(src.toString());
    }

    public void cutCatlog(String src) {
        // Heading pattern; swap [章] for e.g. [章节卷集部回] to accept other heading words
        String pest = "(正文){0,1}(\\s|\\n)(第)([\\u4e00-\\u9fa5a-zA-Z0-9]{1,7})[章][^\\n]{1,35}(|\\n)";
        // Remove the author's PS notes (from "PS"/"ps" to the end of the line)
        String washpest = "(PS|ps)(.)*(|\\n)";
        src = src.replaceAll(washpest, "");

        Pattern p = Pattern.compile(pest);
        // list.get(0): text before the first chapter; list.get(k): body after the k-th heading
        List<String> list = Arrays.asList(p.split(src));
        System.out.println("size" + list.size());
        List<String> namelist = new ArrayList<>(); // chapter file names
        List<String> newlist = new ArrayList<>();  // heading + body, aligned with namelist

        Matcher m = p.matcher(src);
        int i = 1;
        while (m.find() && i < list.size()) {
            // Remove spaces and carriage returns from the heading
            String temp = m.group(0).replace(" ", "").replace("\r", "");
            newlist.add(temp + list.get(i));
            // File name: drop parenthesised notes, characters Windows forbids, and whitespace
            String name = temp.replaceAll("[（](.)*[）]", "").replaceAll("[(](.)*[)]", "")
                    .replaceAll("[\\\\/:*?\"<>|：\\s]", "");
            System.out.println("j=" + i + " temp=" + name + ".txt");
            namelist.add(name);
            i++;
        }

        // Create the output directory (mkdirs also creates E:\BookFile if needed)
        File file = new File("E:\\BookFile\\" + bookname);
        if (!file.exists()) {
            file.mkdirs();
        }

        // Write one TXT file per chapter, in GBK like the input
        for (int k = 0; k < newlist.size(); k++) {
            File book = new File(file, namelist.get(k) + ".txt");
            try (Writer fr = new OutputStreamWriter(new FileOutputStream(book), "GBK")) {
                fr.write(newlist.get(k));
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    // Hypothetical entry point for running the splitter directly
    public static void main(String[] args) throws IOException {
        new NovelSplitter().cutFile();
    }
}

Remember to follow and like!