正则表达式

最新推荐文章于 2022-05-25 11:03:15 发布

A-bird

最新推荐文章于 2022-05-25 11:03:15 发布

阅读量151

点赞数

分类专栏： Java基础文章标签：正则表达式

本文链接：https://blog.csdn.net/weixin_44912013/article/details/108936967

版权

Java基础专栏收录该内容

20 篇文章 0 订阅

订阅专栏

正则表达式的详细用法：正则表达式

正则表达式

- 实例一：抓取网页中的email地址(爬虫)
- 实例二：代码统计小程序

实例一：抓取网页中的email地址(爬虫)

假设我们手头上有一些优质的资源, 打算分享给网友, 于是便到贴吧上发出一个留邮箱发资源的帖子. 没想到网友热情高涨, 留下了近百个邮箱. 但逐个复制发送太累了, 我们考虑用程序实现.
这里不展开讲发邮件部分, 重点应用已经学到的正则表达式从网页中截取所有的邮箱地址.
首先获取一个帖子的html代码随便找了一个, 点击跳转, 在浏览器中点击右键保存html文件
接下来看代码:

public class Demo12 {
    public static void main(String[] args) {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader("C:\\emailTest.html"));
            String line = "";
            while((line = br.readLine()) != null){//读取文件的每一行
                parse(line);//解析其中的email地址
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }finally {
            if(br != null){
                try {
                    br.close();
                    br = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    private static void parse(String line){
        Pattern p = Pattern.compile("[\\w[.-]]+@[\\w[.-]]+\\.[\\w]+");
        Matcher m = p.matcher(line);
        while(m.find()){
            System.out.println(m.group());
        }
    }
}
//输出结果
2819531636@qq.com
2819531636@qq.com
2405059759@qq.com
2405059759@qq.com
1013376804@qq.com
...

解析正则表达式：“[\w[.-]]+”，\w匹配数字、字母和下划线，[.-]匹配点和下划线，[]表示里面只有一个字符，里面的内容用来限制字符的范围，+表示字符出现一次或者多次，”@“匹配”@“。

实例二：代码统计小程序

统计一个项目中一共有多少行代码, 多少行注释, 多少个空白行. 不妨对自己做过的项目进行统计
下面是具体的代码, 除了判断空行用了正则表达式外, 判断代码行和注释行用了String类的api

public class Demo13 {
    private static long codeLines = 0;
    private static long commentLines = 0;
    private static long whiteLines = 0;
    private static String filePath = "C:\\TankOnline";

    public static void main(String[] args) {
        process(filePath);
        System.out.println("codeLines : " + codeLines);
        System.out.println("commentLines : " + commentLines);
        System.out.println("whiteLines : " + whiteLines);
    }

    /**
     * 递归查找文件
     * @param pathStr
     */
    public static void process(String pathStr){
        File file = new File(pathStr);
        if(file.isDirectory()){//是文件夹则递归查找
            File[] fileList = file.listFiles();
            for(File f : fileList){
                String fPath = f.getAbsolutePath();
                process(fPath);
            }
        }else if(file.isFile()){//是文件则判断是否是.java文件
            if(file.getName().matches(".*\\.java$")){
                parse(file);
            }
        }
    }

    private static void parse(File file) {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(file));
            String line = "";
            while((line = br.readLine()) != null){
                line = line.trim();//清空每行首尾的空格
                if(line.matches("^[\\s&&[^\\n]]*$")){//注意不是以\n结尾, 因为在br.readLine()会去掉\n
                    whiteLines++;
                }else if(line.startsWith("/*") || line.startsWith("*") || line.endsWith("*/")){
                    commentLines++;
                }else if(line.startsWith("//") || line.contains("//")){
                    commentLines++;
                }else{
                    if(line.startsWith("import") || line.startsWith("package")){//导包不算
                        continue;
                    }
                    codeLines++;
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if(null != br){
                try {
                    br.close();
                    br = null;
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }
}
//输出结果
codeLines : 1139
commentLines : 124
whiteLines : 172

贪婪模式、非贪婪模式和独占模式：

Greedy quantifiers 贪婪模式
X?	X, once or not at all
X*	X, zero or more times
X+	X, one or more times
X{n}	X, exactly n times
X{n,}	X, at least n times
X{n,m}	X, at least n but not more than m times
 
Reluctant quantifiers 非贪婪模式(勉强的, 不情愿的)
X??	X, once or not at all
X*?	X, zero or more times
X+?	X, one or more times
X{n}?	X, exactly n times
X{n,}?	X, at least n times
X{n,m}?	X, at least n but not more than m times
 
Possessive quantifiers  独占模式
X?+	X, once or not at all
X*+	X, zero or more times
X++	X, one or more times
X{n}+	X, exactly n times
X{n,}+	X, at least n times
X{n,m}+	X, at least n but not more than m times

具体例子见原文！

A-bird

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
正则表达式

正则表达式的详细用法：正则表达式正则表达式实例一：抓取网页中的email地址(爬虫)实例2: 代码统计小程序实例一：抓取网页中的email地址(爬虫)假设我们手头上有一些优质的资源, 打算分享给网友, 于是便到贴吧上发出一个留邮箱发资源的帖子. 没想到网友热情高涨, 留下了近百个邮箱. 但逐个复制发送太累了, 我们考虑用程序实现.这里不展开讲发邮件部分, 重点应用已经学到的正则表达式从网页中截取所有的邮箱地址.首先获取一个帖子的html代码随便找了一个, 点击跳转, 在浏览器中点击右键保存htm
复制链接

扫一扫