Extracting Links with Jsoup

狂爵

Category: Java Web. Tags: java, Jsoup

Article link: https://blog.csdn.net/loveainitian/article/details/32336235


In this project I needed to extract the links to the images and attachments embedded in an article's content.

			// Requires: import org.jsoup.Jsoup; import org.jsoup.nodes.Document;
			//           import org.jsoup.nodes.Element; import org.jsoup.select.Elements;
			// info, ContextPath, strSiteID, strColumnID, keyID, path, strInfoID and
			// importCopy(...) are defined elsewhere in the project.

			// Parse the article's WebContent to find its images and attachments
			Document doc = Jsoup.parse(info.getWebc().getWebContent());
			// Target directory shared by every copied file
			String targetDir = "/html/" + strSiteID + "/" + strColumnID + "/" + keyID + "/";

			// Images on the current page
			Elements srcLinks = doc.select("img[src]");
			for (Element link : srcLinks) {
				// Strip the tag and keep only the link path
				String src = link.attr("src").trim();
				String imagesPath = "";
				// Only site-local paths (starting with "/") are imported. startsWith
				// avoids the StringIndexOutOfBoundsException that substring(0, 4)
				// throws when src is shorter than four characters.
				if (src.startsWith("/")) {
					// replace() treats ContextPath as a literal, not as a regex
					imagesPath = src.replace(ContextPath, "");
					imagesPath = imagesPath.substring(imagesPath.lastIndexOf("/") + 1);
				}
				//System.out.println("---Importing image from WebContent---" + imagesPath);
				if (!imagesPath.equals("")) {
					importCopy(targetDir + imagesPath, path, strInfoID, targetDir);
				}
			}
			// Extract all href links. Unlike the image loop, external links are
			// not filtered out here.
			Elements linehrefs = doc.select("a[href]");
			for (Element link : linehrefs) {
				String filePaths = link.attr("href").trim().replace(ContextPath, "");
				filePaths = filePaths.substring(filePaths.lastIndexOf("/") + 1);
				//System.out.println("---Importing file from WebContent---" + filePaths);
				if (!filePaths.equals("")) {
					importCopy(targetDir + filePaths, path, strInfoID, targetDir);
				}
			}
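
The path handling in the two loops above could be pulled into one small helper so that both images and attachments go through the same checks. The following is only a minimal sketch, not the project's actual code: `LinkFileNames` and `localFileName` are hypothetical names, and it assumes that only root-relative links (those starting with `/`) count as site-local files, as the image loop does.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of the original project. */
final class LinkFileNames {

    private LinkFileNames() {}

    /**
     * Returns the file name of a site-local link, or an empty Optional when
     * the link is null, does not start with "/", or has no file name part.
     */
    static Optional<String> localFileName(String link, String contextPath) {
        if (link == null) {
            return Optional.empty();
        }
        String trimmed = link.trim();
        // Assumption: only root-relative links ("/...") are local files.
        // startsWith is safe for strings of any length, including "".
        if (!trimmed.startsWith("/")) {
            return Optional.empty();
        }
        // String.replace does a literal replacement (no regex interpretation).
        String withoutContext = trimmed.replace(contextPath, "");
        String name = withoutContext.substring(withoutContext.lastIndexOf('/') + 1);
        return name.isEmpty() ? Optional.empty() : Optional.of(name);
    }

    public static void main(String[] args) {
        // "/app" is a made-up context path used only for this demo.
        System.out.println(localFileName("/app/upload/a.png", "/app"));        // Optional[a.png]
        System.out.println(localFileName("http://example.com/a.png", "/app")); // Optional.empty
        System.out.println(localFileName("/", "/app"));                        // Optional.empty
    }
}
```

With a helper like this, each loop body would reduce to calling `localFileName(link.attr("src"), ContextPath)` (or `"href"`) and invoking `importCopy` only when a name is present.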
