jsoup: getting the body content of an HTML document; JSoup - parsing HTML tags by tag

I'm currently developing a text parser in Java, and I was asked to extend it so that it can also parse HTML.

The parser's purpose is to split the parsed file into three other files: one with all the words contained in the file, one with all the sentences, and one with all the questions.
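To make the three outputs concrete, here is a minimal sketch of that kind of splitting. It is not my actual parser: the class name `TextSplitter`, the regular expressions, and the rule that a "question" is any unit ending in `?` (with everything else ending in `.` or `!` counted as a sentence) are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the original parser.
final class TextSplitter {
    // Assumption: a unit is a run of text that ends in '.', '!' or '?'.
    private static final Pattern UNIT = Pattern.compile("[^.!?]+[.!?]");
    // Assumption: a word is a run of letters, digits or apostrophes.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{Nd}']+");

    /** Returns every word in the text, in order of appearance. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    /** Returns the units that do not end with a question mark. */
    static List<String> sentences(String text) {
        return units(text, false);
    }

    /** Returns the units that end with a question mark. */
    static List<String> questions(String text) {
        return units(text, true);
    }

    private static List<String> units(String text, boolean wantQuestions) {
        List<String> out = new ArrayList<>();
        Matcher m = UNIT.matcher(text);
        while (m.find()) {
            String unit = m.group().trim();
            if (unit.endsWith("?") == wantQuestions) {
                out.add(unit);
            }
        }
        return out;
    }
}
```

Each returned list would then be written to its own output file. Text without terminal punctuation is ignored by this sketch, which may or may not match what a real parser should do.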

The *.txt part works perfectly, but I have a problem when parsing HTML.

I create a temporary file with a *.txt extension and pass it to my text parser, but if I pass a URL that points to an HTML file structured like this:

... some HTML here ...

  • n1
  • n2
  • n2

This is a question ?

This is a sentence .

... some other text ...
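For reference, the temporary-file step I describe can be sketched roughly as follows. The class name `TempTextFile` and the file prefix are hypothetical, and the sketch assumes the visible text has already been extracted from the HTML by some library (with jsoup, something along the lines of `Jsoup.parse(html).body().text()`); only the JDK is used here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not the original code.
final class TempTextFile {
    /**
     * Writes already-extracted text to a new temporary *.txt file
     * and returns its path, ready to be handed to the text parser.
     */
    static Path write(String extractedText) {
        try {
            // The "parsed-html-" prefix is an arbitrary choice for this sketch.
            Path tmp = Files.createTempFile("parsed-html-", ".txt");
            // Ask the JVM to delete the file when it exits normally.
            tmp.toFile().deleteOnExit();
            Files.write(tmp, extractedText.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Whether list items such as n1 and n2 end up on separate lines in that file depends entirely on how the text was extracted, which this sketch leaves open.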
