Parsing HTML with Jsoup in Java

Latest recommended article published 2024-04-23 17:29:36

江上一小白

Article link: https://blog.csdn.net/qq_43429919/article/details/123007517

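Data can be extracted from an HTML file with regular-expression matching, but it can also be done with Jsoup, which parses the markup into a document tree that you can query.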

Jsoup website: https://jsoup.org/

Maven

For a Maven project, add the jsoup dependency to pom.xml:

        <dependency>
            <!-- jsoup HTML parser library @ https://jsoup.org/ -->
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.14.3</version>
        </dependency>

Basic usage

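jsoup handles complete HTML pages, whether read from a file or passed as a string, and it also accepts HTML fragments. Depending on what you need, you can extract all of the text directly or read only the content of specific tags.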
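
The difference between a full page and a fragment can be sketched as follows. This is a minimal illustration, not code from the original article: the class name `ParseModesSketch`, the helper names, and the sample HTML are all hypothetical. It uses `Jsoup.parse` for a complete page and `Jsoup.parseBodyFragment` for a fragment.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical sketch class; the names and sample HTML are assumptions for illustration.
public class ParseModesSketch {

    // Parse a complete page and read its <title>.
    static String titleOf(String pageHtml) {
        Document doc = Jsoup.parse(pageHtml);
        return doc.title();
    }

    // Parse a fragment; jsoup places the parsed nodes under <body>.
    static String fragmentText(String fragmentHtml) {
        Document doc = Jsoup.parseBodyFragment(fragmentHtml);
        return doc.body().text();
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Demo</title></head><body><p>Hi</p></body></html>";
        System.out.println(titleOf(page));
        System.out.println(fragmentText("<p>Hello <b>world</b></p>"));
    }
}
```

In both cases the result is a `Document`, so the same query methods apply afterwards.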

Extracting text content from HTML

All HTML tags are stripped, and the remaining pieces of text are joined with spaces.

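    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public static void testStr() {
        // Sample fragment: an ordered list followed by an unordered list
        String html = "<ol class=\" list-paddingleft-2\" style=\"width: 758.094px; white-space: normal;\">" +
                "<li><p>First paragraph</p></li>" +
                "<li><p>Second paragraph</p></li>" +
                "<li><p>Third paragraph</p></li></ol>" +
                "<ul class=\" list-paddingleft-2\" style=\"width: 758.094px; white-space: normal;\">" +
                "<li><p>111</p></li>" +
                "<li><p>222</p></li>" +
                "<li><p>333</p></li>" +
                "<li><p>444<br/></p></li>" +
                "</ul><p><br/></p>";
        // Parse the string into a document tree
        Document document = Jsoup.parse(html);
        // text() drops every tag and returns only the text
        String resultView = document.text();
        System.out.println("resultView: " + resultView);
    }

Parsed text:

    resultView: First paragraph Second paragraph Third paragraph 111 222 333 444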
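
Because `document.text()` joins everything into one string, the boundaries between list items are lost. If each item is needed separately, one option is to select the `li` elements and read them one by one. The sketch below assumes a hypothetical class `ListItemsSketch` and made-up sample HTML; it relies on `Elements.eachText()`, which returns the text of each matched element as a list.

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical sketch class; the names and sample HTML are assumptions for illustration.
public class ListItemsSketch {

    // Return the text of each <li> as its own list entry instead of one joined string.
    static List<String> itemTexts(String html) {
        Document document = Jsoup.parse(html);
        return document.select("li").eachText();
    }

    public static void main(String[] args) {
        String html = "<ol><li><p>one</p></li><li><p>two</p></li></ol>"
                + "<ul><li><p>111</p></li><li><p>222</p></li></ul>";
        for (String item : itemTexts(html)) {
            System.out.println(item);
        }
    }
}
```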

Extracting the content of a specific tag

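    import java.io.File;
    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public static void testFile() {
        File file = new File("D:\\MyProject\\fileStorage\\response.html");
        try {
            // Read and parse the file using the given charset
            Document document = Jsoup.parse(file, "utf-8");
            /* Content being parsed
           <td width="670">
                <table class="resultView" width="95%" height="60" align="center" cellspacing="0">
                    <tbody class="resultTBody">
                        <tr align="center">
                            <td width="20%">
                                Sorry! No matching records were found.
                            </td>
                        </tr>
                    </tbody>
                </table>
            </td>
            */

            // html() returns the inner HTML of the table whose class attribute is "resultView"
            String resultView = document.select("table[class=resultView]").html();
            System.out.println("resultView: " + resultView);

            // text() returns only the text inside the same table
            String text = document.select("table[class=resultView]").text();
            System.out.println("text: " + text);

        } catch (IOException e) {
            // The file is missing or could not be read
            e.printStackTrace();
        }
    }

Parsed output:

    resultView: <tbody class="resultTBody">
     <tr align="center">
      <td width="20%"> Sorry! No matching records were found. </td>
     </tr>
    </tbody>

    text: Sorry! No matching records were found.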
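
When the page might not contain the table at all, it can help to look up a single element and check for its absence explicitly. The following is a minimal sketch under stated assumptions: the class `ResultCellSketch`, its helper names, and the inline sample HTML are hypothetical stand-ins for the `response.html` file above. It uses `selectFirst`, which returns `null` when nothing matches, and the class selector `table.resultView`, which matches any table carrying that class.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical sketch class; the names and sample HTML are assumptions for illustration.
public class ResultCellSketch {

    // Text of the first cell inside table.resultView, or "" if there is no such cell.
    static String firstCellText(String html) {
        Document document = Jsoup.parse(html);
        Element cell = document.selectFirst("table.resultView td");
        return cell == null ? "" : cell.text();
    }

    // Value of the width attribute on that cell, or "" if the cell is missing.
    static String firstCellWidth(String html) {
        Document document = Jsoup.parse(html);
        Element cell = document.selectFirst("table.resultView td");
        return cell == null ? "" : cell.attr("width");
    }

    public static void main(String[] args) {
        String html = "<table class=\"resultView\"><tr><td width=\"20%\"> No records. </td></tr></table>";
        System.out.println(firstCellText(html));
        System.out.println(firstCellWidth(html));
    }
}
```

Returning an empty string for a missing cell is only one possible convention; throwing an exception or returning an `Optional` would work equally well, depending on how the caller treats an absent result.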
