Parsing an HTML string with Jsoup

Suppose we have the following HTML snippet:

<p>20200310<img src="/downloadImg?id=7566876320816252412" title="835637e39dc0bdb4c29f5e1adb5528a.png" alt="835637e39dc0bdb4c29f5e1adb5528a.png"/></p><p style="line-height: 16px;"><img src="http://localhost/static/ueditor/dialogs/attachment/fileTypeImages/icon_txt.gif"/><a style="font-size:12px; color:#0066cc;" target="_blank" href="/tellEditor/previewOrDownload/6405467898840828674?&hasDownload=true" title="本地数据库连接.txt">本地数据库连接 .txt</a></p><p style="line-height: 16px;"><img src="http://localhost/static/ueditor/dialogs/attachment/fileTypeImages/icon_txt.gif"/><a style="font-size:12px; color:#0066cc;" target="_blank" href="/tellEditor/previewOrDownload/3250930489916801852?&hasDownload=true" title="工作安排计划.xls">工作安排计划.xls</a></p><p style="line-height: 16px;"><img src="http://localhost/static/ueditor/dialogs/attachment/fileTypeImages/icon_txt.gif"/><a style="font-size:12px; color:#0066cc;" target="_blank" href="/tellEditor/previewOrDownload/8833048008381252472?&hasDownload=true" title="项目开发帮助文档.docx">项目开发帮助文档.docx</a></p><p><br/></p>

I need to extract the ids from src="/downloadImg?id=7566876320816252412", href="/tellEditor/previewOrDownload/6405467898840828674?&hasDownload=true", href="/tellEditor/previewOrDownload/3250930489916801852?&hasDownload=true" and href="/tellEditor/previewOrDownload/8833048008381252472?&hasDownload=true". I remembered that Jsoup, which I had used before when writing a crawler, can parse HTML. My approach is as follows:

		String html = "<p>20200310<img src=\"/downloadImg?id=7566876320816252412\" title=\"835637e39dc0bdb4c29f5e1adb5528a.png\" alt=\"835637e39dc0bdb4c29f5e1adb5528a.png\"/></p><p style=\"line-height: 16px;\"><img src=\"http://localhost/static/ueditor/dialogs/attachment/fileTypeImages/icon_txt.gif\"/><a style=\"font-size:12px; color:#0066cc;\" target=\"_blank\" href=\"/tellEditor/previewOrDownload/6405467898840828674?&hasDownload=true\" title=\"本地数据库连接 -自己.txt\">本地数据库连接 -自己.txt</a></p><p style=\"line-height: 16px;\"><img src=\"http://localhost/static/ueditor/dialogs/attachment/fileTypeImages/icon_txt.gif\"/><a style=\"font-size:12px; color:#0066cc;\" target=\"_blank\" href=\"/tellEditor/previewOrDownload/3250930489916801852?&hasDownload=true\" title=\"工作安排计划.xls\">工作安排计划.xls</a></p><p style=\"line-height: 16px;\"><img src=\"http://localhost/static/ueditor/dialogs/attachment/fileTypeImages/icon_txt.gif\"/><a style=\"font-size:12px; color:#0066cc;\" target=\"_blank\" href=\"/tellEditor/previewOrDownload/8833048008381252472?&hasDownload=true\" title=\"187项目开发帮助文档.docx\">187项目开发帮助文档.docx</a></p><p><br/></p>";
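		// Requires: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
		// org.jsoup.select.Elements, java.util.ArrayList, java.util.List
		Document document = Jsoup.parse(html);
		// Select only <img> tags that carry a title attribute: this matches the uploaded image
		// and skips the attachment icons, which have no title
		Elements imgElements = document.select("img[title]");
		// Select <a> tags that have an href attribute
		Elements aElements = document.select("a[href]");
		List<String> imgStrings = new ArrayList<>();
		List<String> aStrings = new ArrayList<>();
		for (Element element : imgElements) {
			imgStrings.add(element.attr("src"));
		}
		for (Element element : aElements) {
			aStrings.add(element.attr("href"));
		}

		for (String aString : aStrings) {
			// The id sits between "previewOrDownload/" and the "?" that starts the query string
			System.out.println("Attachment id: " + aString.substring(aString.indexOf("d/") + 2, aString.indexOf("?")));
		}
		for (String imgString : imgStrings) {
			// The id is everything after "id=" in "/downloadImg?id=..."
			System.out.println("Image id: " + imgString.substring(imgString.indexOf("id=") + 3));
		}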

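Running this prints the three attachment ids (6405467898840828674, 3250930489916801852, 8833048008381252472), followed by the image id 7566876320816252412.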
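
The substring offsets above depend on the exact shape of each URL, so a different path prefix or an extra query parameter would silently produce a wrong id. As a rough sketch of a more tolerant alternative, the id could be pulled out of each extracted src or href with a regular expression. The class name IdExtractor and its methods imageId and attachmentId are hypothetical names made up for this illustration, and the patterns assume that ids consist only of digits, as they do in the sample HTML.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup: extracts ids from already-extracted src/href values.
final class IdExtractor {
    // Assumption: image links carry the id in an "id" query parameter, e.g. /downloadImg?id=123
    private static final Pattern IMAGE_ID = Pattern.compile("[?&]id=(\\d+)");
    // Assumption: attachment links carry the id as the path segment after previewOrDownload/
    private static final Pattern ATTACHMENT_ID = Pattern.compile("/previewOrDownload/(\\d+)");

    static Optional<String> imageId(String src) {
        return firstGroup(IMAGE_ID, src);
    }

    static Optional<String> attachmentId(String href) {
        return firstGroup(ATTACHMENT_ID, href);
    }

    // Returns the first capture group of the first match, or empty if the pattern does not match.
    private static Optional<String> firstGroup(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String src = "/downloadImg?id=7566876320816252412";
        String href = "/tellEditor/previewOrDownload/6405467898840828674?&hasDownload=true";
        String icon = "http://localhost/static/ueditor/dialogs/attachment/fileTypeImages/icon_txt.gif";

        System.out.println("Image id: " + imageId(src).orElse("(none)"));
        System.out.println("Attachment id: " + attachmentId(href).orElse("(none)"));
        // The attachment icon has no id parameter, so nothing is extracted from it.
        System.out.println("Icon id: " + imageId(icon).orElse("(none)"));
    }
}
```

Returning an Optional makes the "no id found" case explicit, so a URL that does not match (such as the attachment icon) is reported as empty rather than being cut at a fixed offset.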
