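Task
Try using jsoup to crawl a web page and extract all of its links and their text.
Trying jsoup
Writing the Main code
jsoup can connect directly to a URL and fetch the site's HTML. Selecting the a elements that carry an href attribute then yields each link and its link text, and printing those gives the cleaned result.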
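import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class Main {
    public static void main(String[] args) {
        try {
            // URL of the target site (Baidu News search for the keyword 通义千问)
            String url = "https://www.baidu.com/s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE";
            // Connect to the site with Jsoup.connect() and fetch the Document
            Document document = Jsoup.connect(url).get();
            // Extract and print the page title (the label 页面标题 means "page title")
            String title = document.title();
            System.out.println("页面标题: " + title);
            // Select every <a> element that has an href attribute
            Elements links = document.select("a[href]");
            for (Element link : links) {
                // Print each link's href attribute (label 链接, "link")
                String linkHref = link.attr("href");
                System.out.println("链接: " + linkHref);
                // Print each link's visible text (label 链接文本, "link text")
                String linkText = link.text();
                System.out.println("链接文本: " + linkText);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}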
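The word parameter at the end of the hard-coded URL is the search keyword, percent-encoded as UTF-8. As a minimal sketch of how that URL could be built for an arbitrary keyword, the following uses only the Java standard library. The class name SearchUrlBuilder and the method buildUrl are hypothetical, and the fixed prefix is simply copied from the URL used here, not taken from any Baidu documentation.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SearchUrlBuilder {
    // Fixed part of the query string, copied from the URL in the Main code.
    static final String PREFIX =
            "https://www.baidu.com/s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=";

    // Percent-encodes the keyword as UTF-8 and appends it to the fixed prefix.
    // URLEncoder uses form encoding, so a space in the keyword becomes '+'.
    static String buildUrl(String keyword) {
        return PREFIX + URLEncoder.encode(keyword, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The escapes below spell the keyword from the URL (Tongyi Qianwen);
        // they are used only to keep this source file ASCII.
        System.out.println(buildUrl("\u901A\u4E49\u5343\u95EE"));
    }
}
```

For the keyword used in this post, buildUrl reproduces the hard-coded URL, so the result can be passed to Jsoup.connect() unchanged.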
Crawling Baidu News search (百度资讯搜索_通义千问)
页面标题: 百度资讯搜索_通义千问
链接: /
链接文本:
链接: /
链接文本: 百度首页
链接: https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F
链接文本: 登录
链接: https://www.baidu.com/s?&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 网页
链接: http://image.baidu.com/i?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&dyTabStr=MCwxLDMsMiw2LDQsNSw4LDcsOQ%3D%3D&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 图片
链接: /sf/vsearch?pd=video&tn=vsearch&lid=c1cdf4500000f24e&ie=utf-8&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&rsv_spt=7&rsv_bp=1&f=8&oq=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&rsv_pq=c1cdf4500000f24e
链接文本: 视频
链接: http://tieba.baidu.com/f?fr=wwwt&ie=utf-8&dyTabStr=MCwxLDMsMiw2LDQsNSw4LDcsOQ%3D%3D&kw=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 贴吧
链接: http://zhidao.baidu.com/q?ct=17&pn=0&tn=ikaslist&rn=10&fr=wwwt&ie=utf-8&dyTabStr=MCwxLDMsMiw2LDQsNSw4LDcsOQ%3D%3D&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 知道
链接: http://wenku.baidu.com/search?lm=0&od=0&ie=utf-8&dyTabStr=MCwxLDMsMiw2LDQsNSw4LDcsOQ%3D%3D&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 文库
链接: https://b2b.baidu.com/s?fr=wwwt&q=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 采购
链接: https://map.baidu.com/?newmap=1&ie=utf-8&from=pstab&s=s%26wd%3D%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 地图
链接: http://www.baidu.com/more/
链接文本: 更多
链接: https://top.baidu.com/board?platform=pc&sa=pcindex_a_right
链接文本:
链接: javascript:void(0);
链接文本: 换一换
链接文本: 派出所成了最放心的晚托班
链接: https://baijiahao.baidu.com/s?id=1800849270801169189&wfr=spider&for=pc
链接文本: 两个企业样本透见AI杭州发展新趋势
链接: https://baijiahao.baidu.com/s?id=1800849270801169189&wfr=spider&for=pc
链接文本:
链接: https://baijiahao.baidu.com/s?id=1800849270801169189&wfr=spider&for=pc
链接文本: 杭州网
链接: https://baijiahao.baidu.com/s?id=1800352528510752915&wfr=spider&for=pc
链接文本: 通义大模型降价不到一周,有头部企业调用量翻了100倍
链接: https://baijiahao.baidu.com/s?id=1800352528510752915&wfr=spider&for=pc
链接文本: 齐鲁壹点
链接: https://baijiahao.baidu.com/s?id=1798567672571266931&wfr=spider&for=pc
链接文本: 阿里云通义千问APP更名为通义APP,免费开放全栈能力
链接: https://baijiahao.baidu.com/s?id=1798567672571266931&wfr=spider&for=pc
链接文本:
链接: https://baijiahao.baidu.com/s?id=1798567672571266931&wfr=spider&for=pc
链接文本: 大象新闻
链接: https://baijiahao.baidu.com/s?id=1799910206296311247&wfr=spider&for=pc
链接文本: 百词斩等已接入通义千问,四川大模型调用量年内有望增长数十倍
链接: https://baijiahao.baidu.com/s?id=1799910206296311247&wfr=spider&for=pc
链接文本:
链接: https://baijiahao.baidu.com/s?id=1799910206296311247&wfr=spider&for=pc
链接文本: 上游新闻
链接: https://baijiahao.baidu.com/s?id=1799559339118108439&wfr=spider&for=pc
链接文本: 「数字风洞」AI大模型安全测评内容安全篇丨通义千问Qwen-72B(开源...
链接: https://baijiahao.baidu.com/s?id=1799559339118108439&wfr=spider&for=pc
链接文本:
链接: https://baijiahao.baidu.com/s?id=1799559339118108439&wfr=spider&for=pc
链接文本: 中国发展网
链接: https://baijiahao.baidu.com/s?id=1800808122385550311&wfr=spider&for=pc
链接文本: 中文大模型排位赛开打,阿里百度腾讯等20款国产大模型角逐“最强...
链接: https://baijiahao.baidu.com/s?id=1800808122385550311&wfr=spider&for=pc
链接文本:
链接: https://baijiahao.baidu.com/s?id=1800808122385550311&wfr=spider&for=pc
链接文本: 新闻晨报
链接: https://baijiahao.baidu.com/s?id=1800463164169100381&wfr=spider&for=pc
链接文本: 通义千问助力精准学打造多模态教育大模型,将发布首个AI辅学机
链接: https://baijiahao.baidu.com/s?id=1800463164169100381&wfr=spider&for=pc
链接文本:
链接: https://baijiahao.baidu.com/s?id=1800463164169100381&wfr=spider&for=pc
链接文本: 钱江晚报
链接: https://baijiahao.baidu.com/s?id=1799654301983578368&wfr=spider&for=pc
链接文本: 通义千问GPT-4级主力模型降价97%
链接: https://baijiahao.baidu.com/s?id=1799654301983578368&wfr=spider&for=pc
链接文本:
链接: https://baijiahao.baidu.com/s?id=1799654301983578368&wfr=spider&for=pc
链接文本: 杭州日报
链接: https://baijiahao.baidu.com/s?id=1800707177433003344&wfr=spider&for=pc
链接文本: 西湖区:争当长三角政务服务改革城区样板
链接: https://baijiahao.baidu.com/s?id=1800707177433003344&wfr=spider&for=pc
链接文本:
链接: https://baijiahao.baidu.com/s?id=1800707177433003344&wfr=spider&for=pc
链接文本: 钱江晚报
链接: https://baijiahao.baidu.com/s?id=1798572768798687802&wfr=spider&for=pc
链接文本: 阿里云发布通义千问2.5
链接: https://baijiahao.baidu.com/s?id=1798572768798687802&wfr=spider&for=pc
链接文本: 光明网
链接: http://www.baidu.com/s?rsv_xinwen=1&wd=%CD%A8%D2%E5%C7%A7%CE%CA
链接文本: 去网页搜:通义千问
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AEapp%E4%B8%8B%E8%BD%BD%E5%AE%98%E7%BD%91%E6%9C%80%E6%96%B0%E7%89%88&rsv_dl=news_b_rs
链接文本: 通义千问app下载官网最...
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE%E5%9C%A8%E5%93%AA%E9%87%8C%E6%89%93%E5%BC%80&rsv_dl=news_b_rs
链接文本: 通义千问在哪里打开
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE%E4%B8%8B%E8%BD%BDapp%E5%85%8D%E8%B4%B9%E5%AE%89%E8%A3%85&rsv_dl=news_b_rs
链接文本: 通义千问下载app免费安装
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E7%81%B5%E7%A0%81&rsv_dl=news_b_rs
链接文本: 通义灵码
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E7%99%BE%E5%BA%A6ai%E6%99%BA%E8%83%BD%E9%97%AE%E7%AD%94%E5%9C%A8%E7%BA%BF&rsv_dl=news_b_rs
链接文本: 百度ai智能问答在线
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE%E8%B7%B3%E8%88%9E%E6%80%8E%E4%B9%88%E5%BC%84&rsv_dl=news_b_rs
链接文本: 通义千问跳舞怎么弄
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%90%AC%E6%82%9F&rsv_dl=news_b_rs
链接文本: 通义听悟
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E7%99%BE%E5%BA%A6ai%E6%96%87%E5%BF%83%E4%B8%80%E8%A8%80%E5%AE%98%E7%BD%91&rsv_dl=news_b_rs
链接文本: 百度ai文心一言官网
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE%E5%92%8C%E6%96%87%E5%BF%83%E4%B8%80%E8%A8%80%E7%9A%84%E5%8C%BA%E5%88%AB&rsv_dl=news_b_rs
链接文本: 通义千问和文心一言的区别
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E5%95%86%E6%B1%A4%E7%A7%91%E6%8A%80&rsv_dl=news_b_rs
链接文本: 商汤科技
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=10
链接文本: 2
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=20
链接文本: 3
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=30
链接文本: 4
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=40
链接文本: 5
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=50
链接文本: 6
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=60
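As the output shows, there is a lot of noise (navigation links, repeated URLs, links with empty text), but overall the crawl succeeded and returned the URLs we wanted.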
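One way to reduce that noise is a small post-processing pass over the (href, text) pairs before printing. This is only a sketch under stated assumptions, not part of the original program: the class LinkCleaner and its method clean are hypothetical names, and the sample link texts in main are placeholders, not real output. It resolves relative links such as / against the page URL, drops javascript: links and links with empty text, and keeps one entry per URL.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class LinkCleaner {
    // Resolves each href against baseUrl, skips non-HTTP links and links with
    // empty text, and keeps the first non-empty text seen for each absolute URL.
    static Map<String, String> clean(String baseUrl, String[][] hrefTextPairs) {
        URI base = URI.create(baseUrl);
        Map<String, String> cleaned = new LinkedHashMap<>();
        for (String[] pair : hrefTextPairs) {
            String href = pair[0].trim();
            String text = pair[1].trim();
            if (href.isEmpty() || text.isEmpty()) {
                continue;
            }
            URI resolved;
            try {
                resolved = base.resolve(href);
            } catch (IllegalArgumentException e) {
                continue; // href is not a syntactically valid URI
            }
            String scheme = resolved.getScheme();
            if (!"http".equals(scheme) && !"https".equals(scheme)) {
                continue; // e.g. javascript:void(0);
            }
            cleaned.putIfAbsent(resolved.toString(), text);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        String article = "https://baijiahao.baidu.com/s?id=1800849270801169189&wfr=spider&for=pc";
        // Placeholder texts; the hrefs follow the shapes seen in the output above.
        String[][] pairs = {
            {"/", ""},
            {"/", "Baidu home"},
            {"javascript:void(0);", "refresh"},
            {article, "headline"},
            {article, ""},
            {article, "source name"},
        };
        clean("https://www.baidu.com/s?tn=news", pairs)
                .forEach((url, text) -> System.out.println(url + " -> " + text));
    }
}
```

Keeping the first non-empty text per URL fits the pattern in the output, where each article URL tends to appear first with its headline and then again with an empty text or the source name. In the jsoup loop itself, the pairs would come from link.attr("href") and link.text().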
Crawling Zhihu
<!doctype html>
<html lang="zh-cn">
<head>
<meta charset="UTF-8">
<script></script>
</head>
<body>
<script>var a=['aXNBcnJheQ==','bGVuZ3Ro','W29iamVjdCBBcmd1bWVudHNd','SW52YWxpZCBhdHRlbXB0IHRvIHNwcmVhZCBub24taXRlcmFibGUgaW5zdGFuY2U=','bm93','c2xpY2U=','cG9w','am9pbg==','dW5zaGlmdA==','Y29uY2F0','c3RyaW5n','c2hpZnQ=','Y2hhckNvZGVBdA==','bnVtYmVy','ZnJvbUNoYXJDb2Rl','QUJkNVdBQUFTQUFBU0FBQVRnQUJWQUFBVGdBQ1VnQUFXQUFEVGdBQlZFQUFUZ0FDVUVnQUd3UUFTTUFBU01BQVRnQUJWQUFBVGdBQ1VnQUFXQUFFVGdBQlZFQUFUZ0FDVUVnQUd3UUFTTUFBU01BQVRnQUJWQUFBVGdBQ1VnQUFFQUFBQ0FBQkNJQVBHd1FSRXdBQkRBQUJFQUFDREFBQkVBQUREQUFCRUFBRURBQUJFQUFGREFBQkVBQUdEQUFCRUFBSERBQUJFQUFJREFBQkVBQUpXQUFGU0FBQVRnQUdWQUFBVUFBQUdBQUtXUUFIR3dRSk13QUdPQUVpQ0FBQkNJQU9Hd1FSU01BQUV3QUtEQUFDRElBS0d3UVRTTUFBRXdBQ09BRWxUZ0FJVkFBQVRnQUpRQUFCRXdBTFRnQUlWQUFBVGdBSlFBQUJFd0FNVGdBSVZBQUFUZ0FKUUFBQkV3QU5EQUFMRElBTUd3UUFEQUFORzJBQUV3QU9EQUFMQ0lBREd3UUNFd0FQREFBT0RJQVBHd1FGTXdBR09BSEdDQUFCQ0lBTkd3UVJTTUFBRXdBS0RBQUNESUFLR3dRVFNNQUFFd0FDT0F
... (omitted)
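Running the code above against Zhihu produced no output at all, so I printed the raw HTML instead. It shows that Zhihu applies anti-crawling measures: the response body contains only an obfuscated script, so no links can be extracted.
Crawling CSDN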
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="referrer" content="always">
<meta name="report" content="{"spm":"1018.2226","disabled":"true"}">
<meta name="csdn-baidu-search" content="{"keyword":""}">
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0;">
<link rel="icon" href="https://csdnimg.cn/public/favicon.ico">
<title></title>
<script src="https://csdnimg.cn/public/common/libs/jquery/jquery-1.9.1.min.js"></script>
<script src="https://g.csdnimg.cn/common/csdn-report/report.js"></script>
<script src="https://g.csdnimg.cn/baidu-search/1.0.12/baidu-search.js"></script>
<script>var CFG = {
API_URL: '//so.csdn.net/so/',
js_insert_first: true,
js_insert_count: 0
}</script>
<style>.hiddenToolbar {
display: none !important;
}</style>
<link href="https://csdnimg.cn/release/searchv2-fe/css/chunk-3f347618.b0ddc6ee.css" rel="prefetch">
<link href="https://csdnimg.cn/release/searchv2-fe/css/chunk-41bc631b.c129fb20.css" rel="prefetch">
<link href="https://csdnimg.cn/release/searchv2-fe/css/chunk-4f49e98c.7f76accb.css" rel="prefetch">
<link href="https://csdnimg.cn/release/searchv2-fe/css/chunk-6ff9512d.429545ad.css" rel="prefetch">
<link href="https://csdnimg.cn/release/searchv2-fe/js/chunk-3f347618.3a30234e.js" rel="prefetch">
<link href="https://csdnimg.cn/release/searchv2-fe/js/chunk-41bc631b.b1475de8.js" rel="prefetch">
<link href="https://csdnimg.cn/release/searchv2-fe/js/chunk-4f49e98c.9fda438d.js" rel="prefetch">
<link href="https://csdnimg.cn/release/searchv2-fe/js/chunk-6ff9512d.41a70fc7.js" rel="prefetch">
<link href="https://csdnimg.cn/release/searchv2-fe/css/element-ui.6b92dc4c.css" rel="preload" as="style">
<link href="https://csdnimg.cn/release/searchv2-fe/css/highlight.9276efd2.css" rel="preload" as="style">
<link href="https://csdnimg.cn/release/searchv2-fe/css/index.183186f5.css" rel="preload" as="style">
<link href="https://csdnimg.cn/release/searchv2-fe/js/element-ui.1410c515.js" rel="preload" as="script">
<link href="https://csdnimg.cn/release/searchv2-fe/js/highlight.6f38c3f5.js" rel="preload" as="script">
<link href="https://csdnimg.cn/release/searchv2-fe/js/index.1fc42b3e.js" rel="preload" as="script">
<link href="https://csdnimg.cn/release/searchv2-fe/css/element-ui.6b92dc4c.css" rel="stylesheet">
<link href="https://csdnimg.cn/release/searchv2-fe/css/highlight.9276efd2.css" rel="stylesheet">
<link href="https://csdnimg.cn/release/searchv2-fe/css/index.183186f5.css" rel="stylesheet">
</head>
<body style="position: relative;">
<noscript>
<strong>We're sorry but search-fe-v2 doesn't work properly without JavaScript enabled. Please enable it to continue.</strong>
</noscript>
<div id="app"></div>
<script src="https://g.csdnimg.cn/common/csdn-login-box/csdn-login-box.js"></script>
<script src="https://g.csdnimg.cn/user-ordercart/3.0.1/user-ordercart.js"></script>
<script src="https://g.csdnimg.cn/lib/qrcode/1.0.0/qrcode.min.js"></script>
<script src="https://g.csdnimg.cn/user-ordertip/5.0.4_so_v2/user-ordertip.js"></script>
<script>const header = document.createElement('script')
header.type = 'text/javascript'
header.prod = 'so'
header.skin = 'black'
header.domain = '//so.csdn.net/so/'
if (
location.pathname.includes('/chat') ||
location.pathname.includes('/so/ai') ||
location.pathname.includes('/so/ask')
) {
// PC端显示C知道自己的toolbar
if (
navigator.userAgent.match(/(iPhone|iPod|Android|ios|iOS|iPad|Backerry|WebOS|Symbian|Windows Phone|Phone)/i)
) {
header.src = '//csdnimg.cn/public/common/toolbar/js/m_toolbar-2.1.2.js'
const link = document.createElement('link')
link.rel = 'stylesheet'
link.href = '//csdnimg.cn/public/common/toolbar/content_toolbar_css/m_toolbar-1.1.1.css'
document.head.appendChild(link)
// 兼容app
if (document.cookie.includes('CSDN-APP') || /csdn/i.test(window.navigator.userAgent)) {
document.body.className = 'csdn-app'
}
}
} else {
header.src = 'https://g.csdnimg.cn/common/csdn-toolbar/csdn-toolbar.js'
}
document.body.appendChild(header)</script>
<script>// 判断是不是ie浏览器
if (!!window.ActiveXObject || 'ActiveXObject' in window) {
// 判断是不是ie10以上
if (!/msie [6|7|8|9]/i.test(navigator.userAgent)) {
//ie10以上
if (!window.upgrade) {
window.upgrade = true
let s = document.createElement('script')
s.src = 'https://g.csdnimg.cn/browser_upgrade/1.0.2/browser_upgrade.js'
let x = document.getElementsByTagName('script')[0]
x.parentNode.insertBefore(s, x)
}
}
}</script>
<script>window.onload = function() {
if (window.csdn && typeof window.csdn.configuration_tool_parameterv === 'function') {
window.csdn.configuration_tool_parameterv({
need_change_function: function(flag) {
let c_toolbar = $('#csdn-toolbar')
let s_toolbar = $('.so-toolbar')
let advert = $('#csdn-toolbar .toolbar-advert')
if (flag === 'fixed') {
if (advert.length) advert.hide()
s_toolbar.addClass('fixed').css('top', '0px')
c_toolbar.addClass('hiddenToolbar')
} else if (flag === 'noFixed') {
if (advert.length) advert.show()
s_toolbar.removeClass('fixed')
c_toolbar.removeClass('hiddenToolbar')
}
}
})
}
}</script>
<script src="//g.csdnimg.cn/fixed-sidebar/1.1.6/fixed-sidebar.js"></script>
<script src="//g.csdnimg.cn/user-tooltip/2.4/user-tooltip.js"></script>
<script src="https://csdnimg.cn/release/searchv2-fe/js/element-ui.1410c515.js"></script>
<script src="https://csdnimg.cn/release/searchv2-fe/js/highlight.6f38c3f5.js"></script>
<script src="https://csdnimg.cn/release/searchv2-fe/js/index.1fc42b3e.js"></script>
<script src="https://csdnimg.cn/release/searchv2-fe/js/chunk-vendors.ed98c7a2.js"></script>
</body>
</html>
页面标题:
Process finished with exit code 0
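CSDN likewise yields no results. The returned HTML is only a shell: an empty title, an empty <div id="app"></div>, and scripts that load the content afterwards. jsoup does not execute JavaScript, so this load-later strategy works as an anti-crawling measure against it.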
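Both failures look the same from the crawler's side: the fetched HTML has no visible text outside of scripts. A rough, dependency-free heuristic for spotting such a page is sketched below. The class ShellPageCheck and the method looksLikeEmptyShell are hypothetical names, and regular expressions are not a real HTML parser, so treat this as a quick diagnostic and not a reliable detector.

```java
import java.util.regex.Pattern;

class ShellPageCheck {
    // Whole <script>, <style> and <noscript> elements, including their contents.
    private static final Pattern NON_VISIBLE = Pattern.compile(
            "(?is)<(script|style|noscript)\\b[^>]*>.*?</\\1\\s*>");
    // Any remaining tag, including the doctype declaration.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");

    // Returns true when nothing but whitespace remains after removing
    // script/style/noscript elements and all other tags.
    static boolean looksLikeEmptyShell(String html) {
        String withoutScripts = NON_VISIBLE.matcher(html).replaceAll(" ");
        String visibleText = TAG.matcher(withoutScripts).replaceAll(" ").trim();
        return visibleText.isEmpty();
    }

    public static void main(String[] args) {
        // A cut-down page in the shape of the CSDN response above.
        String shell = "<!doctype html><html><head><title></title></head>"
                + "<body><noscript><strong>enable JavaScript</strong></noscript>"
                + "<div id=\"app\"></div><script src=\"index.js\"></script></body></html>";
        String normal = "<html><body><a href=\"/\">home</a></body></html>";
        System.out.println(looksLikeEmptyShell(shell));
        System.out.println(looksLikeEmptyShell(normal));
    }
}
```

A check like this could run on the fetched HTML before link extraction, so that an empty result is reported as "page needs JavaScript" and not mistaken for a page without links.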
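Results
jsoup can extract URLs from sites that have no anti-crawling measures, but it stops being usable as soon as a site adds them.
Next I will keep working on this feature, either by using Selenium to get around the sites' anti-crawling measures or by finding aggregator-style sites that have none.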