A Simple Way to Strip Website Strings from a txt File with Java Regular Expressions

Author: 三文鱼先生

Article link: https://blog.csdn.net/qq_44717657/article/details/124928096

Motivation

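I believe most people have a novel from their middle or high school years that left a deep impression, whether xiuxian (cultivation), xuanhuan (fantasy), wuxia (martial arts), or something else. I still remember pulling all-nighters at the internet cafe in middle school: just before dawn I would take the MP3 player out of my pocket, plug it into a USB port, and download some novels, songs, and videos, then hurry back to school to the sound of the morning wake-up broadcast...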

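But most of the novels downloaded this way were of poor quality: either full of typos, or with website addresses wedged into the middle of sentences, which really hurts the reading experience. For example, a plain address such as www.xxxxx.com, or a disguised one such as wxwxwx.xxxxxx.cxoxm.

How do we get rid of strings like these?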

Using regular expressions

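String processing inevitably calls for regular expressions. I won't explain regex syntax in detail here; you can look it up yourself. For the strings that need to be removed above, I used fairly simple patterns: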

// Removes strings of the form www.xxxxx.com
www.[\w\W]{4,31}.com
// Removes strings of the form wxwxwx.xxxxxx.cxoxm
w[\w\W]w[\w\W]w[\w\W].[\w\W]{4,31}.c[\w\W]o[\w\W]m
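
To make the behavior of these two patterns concrete, here is a small sketch that applies them to a few sample strings. The class name `RegexDemo`, the helper `strip`, and the sample sentences are made up for illustration. One thing worth knowing: an unescaped `.` matches any character and `[\w\W]` matches any character including line breaks, so the patterns are quite loose and can swallow ordinary text that sits between two addresses. The `STRICT` pattern at the end is only one possible tightening, assuming hosts consist of ASCII letters, digits and hyphens and use literal dots; it is not part of the original approach.

```java
import java.util.regex.Pattern;

public class RegexDemo {
    // The two patterns from the article, unchanged
    static final Pattern PLAIN = Pattern.compile("www.[\\w\\W]{4,31}.com");
    static final Pattern DISGUISED = Pattern.compile(
            "w[\\w\\W]w[\\w\\W]w[\\w\\W].[\\w\\W]{4,31}.c[\\w\\W]o[\\w\\W]m");
    // Hypothetical stricter variant: literal dots, restricted host characters
    static final Pattern STRICT = Pattern.compile("www\\.[A-Za-z0-9-]{1,31}\\.com");

    // Removes every match of the given pattern from the text
    static String strip(Pattern pattern, String text) {
        return pattern.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // A plain address is removed
        System.out.println(strip(PLAIN, "abc www.example.com def"));            // abc  def
        // A disguised address is removed by the second pattern
        System.out.println(strip(DISGUISED, "abc wxwxwx.xxxxxx.cxoxm def"));    // abc  def
        // Two nearby addresses: the loose pattern also eats the " or " between them
        String two = "go to www.abcd.com or www.efgh.com now";
        System.out.println(strip(PLAIN, two));                                  // go to  now
        // The stricter variant removes each address separately
        System.out.println(strip(STRICT, two));                                 // go to  or  now
    }
}
```

Whether the loose or the strict form is preferable depends on the files: the loose form catches more disguises, while the strict form is less likely to delete real text.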

Implementation

ClearWebInTest.java

A utility class that removes the specified strings from a file and writes the result to a new file (on drive D).

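import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.text.SimpleDateFormat;
import java.util.Date;

/**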
 * @author 三文鱼先生
 * @title
 * @description
 * @date 2022/5/23
 **/
public class ClearWebInTest {
    /**
     * Removes simple inserted website strings from a text file and writes
     * the cleaned text to a new file on drive D.
     *
     * @author 三文鱼先生
     * @param filePath path of the source file
     */
    public static void clearEasyWebString(String filePath) {
        // Replacement rules, defined once instead of once per read
        String[] regex = {"www.[\\w\\W]{4,31}.com",
                          "w[\\w\\W]w[\\w\\W]w[\\w\\W].[\\w\\W]{4,31}.c[\\w\\W]o[\\w\\W]m"};
        String newPath = getNameFromPath(filePath) + getNowDate();
        // try-with-resources closes the reader and writer (and the streams they wrap),
        // even when an exception is thrown. Both use the platform default charset.
        try (InputStreamReader inReader = new InputStreamReader(new FileInputStream(filePath));
             // The new file is created on drive D
             OutputStreamWriter outWriter = new OutputStreamWriter(
                     new FileOutputStream("D:" + File.separator + newPath + ".txt"))) {
            // Read the whole file before replacing: replacing each 1024-char buffer
            // separately would miss a website string that straddles two buffers
            StringBuilder content = new StringBuilder();
            int len;
            char[] c = new char[1024];
            while ((len = inReader.read(c)) != -1) {
                content.append(c, 0, len);
            }
            String replacedStr = content.toString();
            for (String s : regex) {
                // Apply every replacement rule in turn
                replacedStr = replacedStr.replaceAll(s, "");
            }
            // Write the cleaned text to the new file
            outWriter.write(replacedStr);
        } catch (FileNotFoundException exception) {
            exception.printStackTrace();
        } catch (IOException exception) {
            exception.printStackTrace();
        }
    }

    /**
     * Returns the current time as a string, used as the suffix of the output file name.
     *
     * @author 三文鱼
     * @return the current time formatted as yyyy-MM-dd-HH-mm-ss
     */
    public static String getNowDate() {
        SimpleDateFormat simpleDateFormat = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss");
        Date date = new Date(System.currentTimeMillis());
        return simpleDateFormat.format(date);
    }

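    /**
     * Extracts the file name (without extension) from a Windows-style path.
     *
     * @author 三文鱼先生
     * @param str the file path, with backslashes as separators
     * @return the part of the last path segment before its first dot
     */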
    public static String getNameFromPath(String str) {
        String[] str1 = str.split("\\\\");
        String[] str2 = str1[str1.length - 1].split("\\.");
        return str2[0];
    }
}
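
The same idea can be written more compactly with `java.nio.file`. The sketch below is an alternative, not the version above: it assumes Java 11 or later (for `Files.readString` and `Files.writeString`) and assumes the text file is UTF-8 encoded. Many downloaded novels are GBK-encoded instead, in which case the charset would have to be changed. The class name `ClearWebNio`, the methods `removeSites` and `clean`, and the explicit target path are hypothetical names chosen for illustration; the two patterns are the ones used in this article.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Pattern;

public class ClearWebNio {
    // The article's two patterns, compiled once and reused
    static final List<Pattern> RULES = List.of(
            Pattern.compile("www.[\\w\\W]{4,31}.com"),
            Pattern.compile("w[\\w\\W]w[\\w\\W]w[\\w\\W].[\\w\\W]{4,31}.c[\\w\\W]o[\\w\\W]m"));

    // Pure string step: applies every rule in turn
    static String removeSites(String text) {
        for (Pattern rule : RULES) {
            text = rule.matcher(text).replaceAll("");
        }
        return text;
    }

    // Reads the whole source file, cleans it, and writes the result to target
    // (assumption: both files are UTF-8)
    static void clean(Path source, Path target) throws IOException {
        String text = Files.readString(source, StandardCharsets.UTF_8);
        Files.writeString(target, removeSites(text), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for a real novel and its cleaned copy
        Path source = Files.createTempFile("novel", ".txt");
        Path target = Files.createTempFile("novel-clean", ".txt");
        Files.writeString(source, "Chapter 1 www.example.com The Fallen Genius",
                StandardCharsets.UTF_8);
        clean(source, target);
        System.out.println(Files.readString(target, StandardCharsets.UTF_8));
    }
}
```

Reading the whole file as one string means a website string can never be split across two read buffers, at the cost of holding the entire text in memory, which is usually acceptable for a novel-sized file.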

Test

ClearWebString.java

/**
 * @author 三文鱼先生
 * @title
 * @description
 * @date 2022/5/23
 **/
public class ClearWebString {
    public static void main(String[] args) {
        String path = "F:\\学习记录\\斗破苍穹第一章.txt";
        ClearWebInTest.clearEasyWebString(path);
    }
}