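Project Training - Trying jsoup for Crawling

Task

Try using jsoup to crawl a web page and extract all of its links and their text.

Trying jsoup

Writing the Main code

jsoup can connect directly to a URL and fetch the site's HTML. Selecting the `<a>` elements that carry an `href` attribute yields each link and its link text; printing them gives the cleaned result.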

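    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class Main {
        public static void main(String[] args) {
            try {
                // URL of the target site (Baidu News search for 通义千问)
                String url = "https://www.baidu.com/s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE";
                // Connect to the site with Jsoup.connect() and fetch the parsed Document
                Document document = Jsoup.connect(url).get();
                // Extract and print the page title ("页面标题" = page title)
                String title = document.title();
                System.out.println("页面标题: " + title);
                // Select every <a> element that has an href attribute
                Elements links = document.select("a[href]");
                for (Element link : links) {
                    // Print the href attribute of each link ("链接" = link)
                    String linkHref = link.attr("href");
                    System.out.println("链接: " + linkHref);
                    // Print the visible text of each link ("链接文本" = link text)
                    String linkText = link.text();
                    System.out.println("链接文本: " + linkText);
                }
            } catch (Exception e) {
                // Network and HTTP errors from get() end up here
                e.printStackTrace();
            }
        }
    }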
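
The hard-coded URL in the example carries the search keyword in percent-encoded form: the value of `word=` is the UTF-8 encoding of 通义千问. Below is a minimal sketch of building that URL from a plain keyword with the JDK's `URLEncoder`. The class name `NewsSearchUrl` and the method `forKeyword` are hypothetical, and the remaining query parameters are simply copied from the URL in the example; what each of them means to Baidu is not something this post establishes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of jsoup): builds the Baidu News search URL
// used in the example from a plain-text keyword.
final class NewsSearchUrl {
    // Fixed query parameters copied verbatim from the URL in the example.
    private static final String PREFIX =
            "https://www.baidu.com/s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=";

    static String forKeyword(String keyword) {
        // Percent-encode the keyword as UTF-8, matching ie=utf-8 in the query string.
        return PREFIX + URLEncoder.encode(keyword, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(forKeyword("通义千问"));
    }
}
```

The returned string can be passed to `Jsoup.connect(...)` in place of the literal URL, which makes it easier to try other keywords.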

Crawling Baidu News search: 通义千问

页面标题: 百度资讯搜索_通义千问
链接: /
链接文本: 
链接: /
链接文本: 百度首页
链接: https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F
链接文本: 登录
链接: https://www.baidu.com/s?&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 网页
链接: http://image.baidu.com/i?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&ie=utf-8&dyTabStr=MCwxLDMsMiw2LDQsNSw4LDcsOQ%3D%3D&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 图片
链接: /sf/vsearch?pd=video&tn=vsearch&lid=c1cdf4500000f24e&ie=utf-8&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&rsv_spt=7&rsv_bp=1&f=8&oq=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&rsv_pq=c1cdf4500000f24e
链接文本: 视频
链接: http://tieba.baidu.com/f?fr=wwwt&ie=utf-8&dyTabStr=MCwxLDMsMiw2LDQsNSw4LDcsOQ%3D%3D&kw=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 贴吧
链接: http://zhidao.baidu.com/q?ct=17&pn=0&tn=ikaslist&rn=10&fr=wwwt&ie=utf-8&dyTabStr=MCwxLDMsMiw2LDQsNSw4LDcsOQ%3D%3D&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 知道
链接: http://wenku.baidu.com/search?lm=0&od=0&ie=utf-8&dyTabStr=MCwxLDMsMiw2LDQsNSw4LDcsOQ%3D%3D&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 文库
链接: https://b2b.baidu.com/s?fr=wwwt&q=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 采购
链接: https://map.baidu.com/?newmap=1&ie=utf-8&from=pstab&s=s%26wd%3D%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE
链接文本: 地图
链接: http://www.baidu.com/more/
链接文本: 更多
链接: https://top.baidu.com/board?platform=pc&sa=pcindex_a_right
链接文本: 
链接: javascript:void(0);
链接文本: 换一换

链接文本: 派出所成了最放心的晚托班
链接: https://baijiahao.baidu.com/s?id=1800849270801169189&wfr=spider&for=pc
链接文本: 两个企业样本透见AI杭州发展新趋势
链接: https://baijiahao.baidu.com/s?id=1800849270801169189&wfr=spider&for=pc
链接文本: 
链接: https://baijiahao.baidu.com/s?id=1800849270801169189&wfr=spider&for=pc
链接文本: 杭州网
链接: https://baijiahao.baidu.com/s?id=1800352528510752915&wfr=spider&for=pc
链接文本: 通义大模型降价不到一周,有头部企业调用量翻了100倍
链接: https://baijiahao.baidu.com/s?id=1800352528510752915&wfr=spider&for=pc
链接文本: 齐鲁壹点
链接: https://baijiahao.baidu.com/s?id=1798567672571266931&wfr=spider&for=pc
链接文本: 阿里云通义千问APP更名为通义APP,免费开放全栈能力
链接: https://baijiahao.baidu.com/s?id=1798567672571266931&wfr=spider&for=pc
链接文本: 
链接: https://baijiahao.baidu.com/s?id=1798567672571266931&wfr=spider&for=pc
链接文本: 大象新闻
链接: https://baijiahao.baidu.com/s?id=1799910206296311247&wfr=spider&for=pc
链接文本: 百词斩等已接入通义千问,四川大模型调用量年内有望增长数十倍
链接: https://baijiahao.baidu.com/s?id=1799910206296311247&wfr=spider&for=pc
链接文本: 
链接: https://baijiahao.baidu.com/s?id=1799910206296311247&wfr=spider&for=pc
链接文本: 上游新闻
链接: https://baijiahao.baidu.com/s?id=1799559339118108439&wfr=spider&for=pc
链接文本: 「数字风洞」AI大模型安全测评内容安全篇丨通义千问Qwen-72B(开源...
链接: https://baijiahao.baidu.com/s?id=1799559339118108439&wfr=spider&for=pc
链接文本: 
链接: https://baijiahao.baidu.com/s?id=1799559339118108439&wfr=spider&for=pc
链接文本: 中国发展网
链接: https://baijiahao.baidu.com/s?id=1800808122385550311&wfr=spider&for=pc
链接文本: 中文大模型排位赛开打,阿里百度腾讯等20款国产大模型角逐“最强...
链接: https://baijiahao.baidu.com/s?id=1800808122385550311&wfr=spider&for=pc
链接文本: 
链接: https://baijiahao.baidu.com/s?id=1800808122385550311&wfr=spider&for=pc
链接文本: 新闻晨报
链接: https://baijiahao.baidu.com/s?id=1800463164169100381&wfr=spider&for=pc
链接文本: 通义千问助力精准学打造多模态教育大模型,将发布首个AI辅学机
链接: https://baijiahao.baidu.com/s?id=1800463164169100381&wfr=spider&for=pc
链接文本: 
链接: https://baijiahao.baidu.com/s?id=1800463164169100381&wfr=spider&for=pc
链接文本: 钱江晚报
链接: https://baijiahao.baidu.com/s?id=1799654301983578368&wfr=spider&for=pc
链接文本: 通义千问GPT-4级主力模型降价97%
链接: https://baijiahao.baidu.com/s?id=1799654301983578368&wfr=spider&for=pc
链接文本: 
链接: https://baijiahao.baidu.com/s?id=1799654301983578368&wfr=spider&for=pc
链接文本: 杭州日报
链接: https://baijiahao.baidu.com/s?id=1800707177433003344&wfr=spider&for=pc
链接文本: 西湖区:争当长三角政务服务改革城区样板
链接: https://baijiahao.baidu.com/s?id=1800707177433003344&wfr=spider&for=pc
链接文本: 
链接: https://baijiahao.baidu.com/s?id=1800707177433003344&wfr=spider&for=pc
链接文本: 钱江晚报
链接: https://baijiahao.baidu.com/s?id=1798572768798687802&wfr=spider&for=pc
链接文本: 阿里云发布通义千问2.5
链接: https://baijiahao.baidu.com/s?id=1798572768798687802&wfr=spider&for=pc
链接文本: 光明网
链接: http://www.baidu.com/s?rsv_xinwen=1&wd=%CD%A8%D2%E5%C7%A7%CE%CA
链接文本: 去网页搜:通义千问
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AEapp%E4%B8%8B%E8%BD%BD%E5%AE%98%E7%BD%91%E6%9C%80%E6%96%B0%E7%89%88&rsv_dl=news_b_rs
链接文本: 通义千问app下载官网最...
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE%E5%9C%A8%E5%93%AA%E9%87%8C%E6%89%93%E5%BC%80&rsv_dl=news_b_rs
链接文本: 通义千问在哪里打开
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE%E4%B8%8B%E8%BD%BDapp%E5%85%8D%E8%B4%B9%E5%AE%89%E8%A3%85&rsv_dl=news_b_rs
链接文本: 通义千问下载app免费安装
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E7%81%B5%E7%A0%81&rsv_dl=news_b_rs
链接文本: 通义灵码
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E7%99%BE%E5%BA%A6ai%E6%99%BA%E8%83%BD%E9%97%AE%E7%AD%94%E5%9C%A8%E7%BA%BF&rsv_dl=news_b_rs
链接文本: 百度ai智能问答在线
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE%E8%B7%B3%E8%88%9E%E6%80%8E%E4%B9%88%E5%BC%84&rsv_dl=news_b_rs
链接文本: 通义千问跳舞怎么弄
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%90%AC%E6%82%9F&rsv_dl=news_b_rs
链接文本: 通义听悟
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E7%99%BE%E5%BA%A6ai%E6%96%87%E5%BF%83%E4%B8%80%E8%A8%80%E5%AE%98%E7%BD%91&rsv_dl=news_b_rs
链接文本: 百度ai文心一言官网
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE%E5%92%8C%E6%96%87%E5%BF%83%E4%B8%80%E8%A8%80%E7%9A%84%E5%8C%BA%E5%88%AB&rsv_dl=news_b_rs
链接文本: 通义千问和文心一言的区别
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&wd=%E5%95%86%E6%B1%A4%E7%A7%91%E6%8A%80&rsv_dl=news_b_rs
链接文本: 商汤科技
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=10
链接文本: 2
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=20
链接文本: 3
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=30
链接文本: 4
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=40
链接文本: 5
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=50
链接文本: 6
链接: /s?rtt=1&bsst=1&cl=2&tn=news&ie=utf-8&word=%E9%80%9A%E4%B9%89%E5%8D%83%E9%97%AE&x_bfe_rqs=03E8000000000000000048&x_bfe_tjscore=0.100000&tngroupname=organic_news&newVideo=12&goods_entry_switch=1&rsv_dl=news_b_pn&pn=60

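As the output shows, there is a lot of noise (navigation tabs, links with empty text, related searches, pagination), but overall the crawl succeeded and returned the URLs we wanted.

Crawling Zhihu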
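
Much of the noise in this output is mechanical: relative links such as `/`, `javascript:void(0);` pseudo-links, links with empty text, and the same article URL repeated several times. Below is a minimal sketch of a post-processing step for the (href, text) pairs, using only the JDK. The class `LinkCleaner` and its `Link` type are hypothetical names, and keeping the first non-empty text per URL is an assumption about what counts as the useful text. With jsoup itself, `link.absUrl("href")` would resolve relative links directly.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical post-processing step for the (href, text) pairs printed by the crawler.
final class LinkCleaner {
    /** One extracted link: the href attribute and the anchor text. */
    static final class Link {
        final String href;
        final String text;

        Link(String href, String text) {
            this.href = href;
            this.text = text;
        }
    }

    /**
     * Resolves each href against baseUrl, drops javascript: pseudo-links and
     * links without text, and keeps the first text seen for each resolved URL.
     */
    static List<Link> clean(String baseUrl, List<Link> raw) {
        URI base = URI.create(baseUrl);
        Map<String, Link> byUrl = new LinkedHashMap<>(); // preserves first-seen order
        for (Link link : raw) {
            String href = link.href.trim();
            String text = link.text.trim();
            if (href.isEmpty() || text.isEmpty()
                    || href.toLowerCase(Locale.ROOT).startsWith("javascript:")) {
                continue;
            }
            String absolute;
            try {
                absolute = base.resolve(href).toString();
            } catch (IllegalArgumentException e) {
                continue; // skip hrefs that are not valid URIs
            }
            byUrl.putIfAbsent(absolute, new Link(absolute, text));
        }
        return new ArrayList<>(byUrl.values());
    }

    public static void main(String[] args) {
        List<Link> raw = new ArrayList<>();
        raw.add(new Link("/", ""));
        raw.add(new Link("/", "百度首页"));
        raw.add(new Link("javascript:void(0);", "换一换"));
        for (Link link : clean("https://www.baidu.com/s?tn=news", raw)) {
            System.out.println(link.href + " | " + link.text);
        }
    }
}
```

This only removes the mechanical noise. Telling news results apart from navigation tabs or related searches would still need a site-specific rule, for example filtering on the host name of the resolved URL.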

<!doctype html>
<html lang="zh-cn">
 <head>
  <meta charset="UTF-8">
  <script></script>
 </head>
 <body>
  <script>var a=['aXNBcnJheQ==','bGVuZ3Ro','W29iamVjdCBBcmd1bWVudHNd','SW52YWxpZCBhdHRlbXB0IHRvIHNwcmVhZCBub24taXRlcmFibGUgaW5zdGFuY2U=','bm93','c2xpY2U=','cG9w','am9pbg==','dW5zaGlmdA==','Y29uY2F0','c3RyaW5n','c2hpZnQ=','Y2hhckNvZGVBdA==','bnVtYmVy','ZnJvbUNoYXJDb2Rl','QUJkNVdBQUFTQUFBU0FBQVRnQUJWQUFBVGdBQ1VnQUFXQUFEVGdBQlZFQUFUZ0FDVUVnQUd3UUFTTUFBU01BQVRnQUJWQUFBVGdBQ1VnQUFXQUFFVGdBQlZFQUFUZ0FDVUVnQUd3UUFTTUFBU01BQVRnQUJWQUFBVGdBQ1VnQUFFQUFBQ0FBQkNJQVBHd1FSRXdBQkRBQUJFQUFDREFBQkVBQUREQUFCRUFBRURBQUJFQUFGREFBQkVBQUdEQUFCRUFBSERBQUJFQUFJREFBQkVBQUpXQUFGU0FBQVRnQUdWQUFBVUFBQUdBQUtXUUFIR3dRSk13QUdPQUVpQ0FBQkNJQU9Hd1FSU01BQUV3QUtEQUFDRElBS0d3UVRTTUFBRXdBQ09BRWxUZ0FJVkFBQVRnQUpRQUFCRXdBTFRnQUlWQUFBVGdBSlFBQUJFd0FNVGdBSVZBQUFUZ0FKUUFBQkV3QU5EQUFMRElBTUd3UUFEQUFORzJBQUV3QU9EQUFMQ0lBREd3UUNFd0FQREFBT0RJQVBHd1FGTXdBR09BSEdDQUFCQ0lBTkd3UVJTTUFBRXdBS0RBQUNESUFLR3dRVFNNQUFFd0FDT0F
                 
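... (omitted)

Running the code above against Zhihu produced no output at all, so I printed the raw HTML instead. It turned out that Zhihu uses anti-crawling measures: the response contains only an obfuscated script, so no results could be extracted.

Crawling CSDN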

<!doctype html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge">
  <meta name="referrer" content="always">
  <meta name="report" content="{&quot;spm&quot;:&quot;1018.2226&quot;,&quot;disabled&quot;:&quot;true&quot;}">
  <meta name="csdn-baidu-search" content="{&quot;keyword&quot;:&quot;&quot;}">
  <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0;">
  <link rel="icon" href="https://csdnimg.cn/public/favicon.ico">
  <title></title>
  <script src="https://csdnimg.cn/public/common/libs/jquery/jquery-1.9.1.min.js"></script>
  <script src="https://g.csdnimg.cn/common/csdn-report/report.js"></script>
  <script src="https://g.csdnimg.cn/baidu-search/1.0.12/baidu-search.js"></script>
  <script>var CFG = {
        API_URL: '//so.csdn.net/so/',
        js_insert_first: true,
        js_insert_count: 0
      }</script>
  <style>.hiddenToolbar {
        display: none !important;
      }</style>
  <link href="https://csdnimg.cn/release/searchv2-fe/css/chunk-3f347618.b0ddc6ee.css" rel="prefetch">
  <link href="https://csdnimg.cn/release/searchv2-fe/css/chunk-41bc631b.c129fb20.css" rel="prefetch">
  <link href="https://csdnimg.cn/release/searchv2-fe/css/chunk-4f49e98c.7f76accb.css" rel="prefetch">
  <link href="https://csdnimg.cn/release/searchv2-fe/css/chunk-6ff9512d.429545ad.css" rel="prefetch">
  <link href="https://csdnimg.cn/release/searchv2-fe/js/chunk-3f347618.3a30234e.js" rel="prefetch">
  <link href="https://csdnimg.cn/release/searchv2-fe/js/chunk-41bc631b.b1475de8.js" rel="prefetch">
  <link href="https://csdnimg.cn/release/searchv2-fe/js/chunk-4f49e98c.9fda438d.js" rel="prefetch">
  <link href="https://csdnimg.cn/release/searchv2-fe/js/chunk-6ff9512d.41a70fc7.js" rel="prefetch">
  <link href="https://csdnimg.cn/release/searchv2-fe/css/element-ui.6b92dc4c.css" rel="preload" as="style">
  <link href="https://csdnimg.cn/release/searchv2-fe/css/highlight.9276efd2.css" rel="preload" as="style">
  <link href="https://csdnimg.cn/release/searchv2-fe/css/index.183186f5.css" rel="preload" as="style">
  <link href="https://csdnimg.cn/release/searchv2-fe/js/element-ui.1410c515.js" rel="preload" as="script">
  <link href="https://csdnimg.cn/release/searchv2-fe/js/highlight.6f38c3f5.js" rel="preload" as="script">
  <link href="https://csdnimg.cn/release/searchv2-fe/js/index.1fc42b3e.js" rel="preload" as="script">
  <link href="https://csdnimg.cn/release/searchv2-fe/css/element-ui.6b92dc4c.css" rel="stylesheet">
  <link href="https://csdnimg.cn/release/searchv2-fe/css/highlight.9276efd2.css" rel="stylesheet">
  <link href="https://csdnimg.cn/release/searchv2-fe/css/index.183186f5.css" rel="stylesheet">
 </head>
 <body style="position: relative;">
  <noscript>
   <strong>We're sorry but search-fe-v2 doesn't work properly without JavaScript enabled. Please enable it to continue.</strong>
  </noscript>
  <div id="app"></div>
  <script src="https://g.csdnimg.cn/common/csdn-login-box/csdn-login-box.js"></script>
  <script src="https://g.csdnimg.cn/user-ordercart/3.0.1/user-ordercart.js"></script>
  <script src="https://g.csdnimg.cn/lib/qrcode/1.0.0/qrcode.min.js"></script>
  <script src="https://g.csdnimg.cn/user-ordertip/5.0.4_so_v2/user-ordertip.js"></script>
  <script>const header = document.createElement('script')
      header.type = 'text/javascript'
      header.prod = 'so'
      header.skin = 'black'
      header.domain = '//so.csdn.net/so/'
      if (
        location.pathname.includes('/chat') ||
        location.pathname.includes('/so/ai') ||
        location.pathname.includes('/so/ask')
      ) {
        // PC端显示C知道自己的toolbar
        if (
          navigator.userAgent.match(/(iPhone|iPod|Android|ios|iOS|iPad|Backerry|WebOS|Symbian|Windows Phone|Phone)/i)
        ) {
          header.src = '//csdnimg.cn/public/common/toolbar/js/m_toolbar-2.1.2.js'
          const link = document.createElement('link')
          link.rel = 'stylesheet'
          link.href = '//csdnimg.cn/public/common/toolbar/content_toolbar_css/m_toolbar-1.1.1.css'
          document.head.appendChild(link)
          // 兼容app
          if (document.cookie.includes('CSDN-APP') || /csdn/i.test(window.navigator.userAgent)) {
            document.body.className = 'csdn-app'
          }
        }
      } else {
        header.src = 'https://g.csdnimg.cn/common/csdn-toolbar/csdn-toolbar.js'
      }
      document.body.appendChild(header)</script>
  <script>// 判断是不是ie浏览器
      if (!!window.ActiveXObject || 'ActiveXObject' in window) {
        // 判断是不是ie10以上
        if (!/msie [6|7|8|9]/i.test(navigator.userAgent)) {
          //ie10以上
          if (!window.upgrade) {
            window.upgrade = true
            let s = document.createElement('script')
            s.src = 'https://g.csdnimg.cn/browser_upgrade/1.0.2/browser_upgrade.js'
            let x = document.getElementsByTagName('script')[0]
            x.parentNode.insertBefore(s, x)
          }
        }
      }</script>
  <script>window.onload = function() {
        if (window.csdn && typeof window.csdn.configuration_tool_parameterv === 'function') {
          window.csdn.configuration_tool_parameterv({
            need_change_function: function(flag) {
              let c_toolbar = $('#csdn-toolbar')
              let s_toolbar = $('.so-toolbar')
              let advert = $('#csdn-toolbar .toolbar-advert')
              if (flag === 'fixed') {
                if (advert.length) advert.hide()
                s_toolbar.addClass('fixed').css('top', '0px')
                c_toolbar.addClass('hiddenToolbar')
              } else if (flag === 'noFixed') {
                if (advert.length) advert.show()
                s_toolbar.removeClass('fixed')
                c_toolbar.removeClass('hiddenToolbar')
              }
            }
          })
        }
      }</script>
  <script src="//g.csdnimg.cn/fixed-sidebar/1.1.6/fixed-sidebar.js"></script>
  <script src="//g.csdnimg.cn/user-tooltip/2.4/user-tooltip.js"></script>
  <script src="https://csdnimg.cn/release/searchv2-fe/js/element-ui.1410c515.js"></script>
  <script src="https://csdnimg.cn/release/searchv2-fe/js/highlight.6f38c3f5.js"></script>
  <script src="https://csdnimg.cn/release/searchv2-fe/js/index.1fc42b3e.js"></script>
  <script src="https://csdnimg.cn/release/searchv2-fe/js/chunk-vendors.ed98c7a2.js"></script>
 </body>
</html>
页面标题: 

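Process finished with exit code 0

CSDN likewise yields no results. Its anti-crawling approach is to leave the content out of the initial HTML: the page body is just an empty `<div id="app">` that JavaScript fills in later, and jsoup does not execute JavaScript.

Results

jsoup can extract URLs from sites that have no anti-crawling measures, but once a site has such measures, jsoup on its own no longer works.

Next I will keep working on this feature, either by using Selenium to get around the anti-crawling measures or by finding resource aggregation sites that have none.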
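
A quick way to tell this situation apart from a working crawl is to check whether the raw response contains any links before trying to extract them. The sketch below is a rough, JDK-only heuristic under stated assumptions: the class `StaticLinkCheck` and its methods are hypothetical, and the regular expression only approximates what an HTML parser does (with jsoup on the classpath, `document.select("a[href]").size()` gives the count more reliably).

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic: does the raw HTML contain any links at all?
final class StaticLinkCheck {
    // Rough pattern for an opening <a> tag that carries an href attribute.
    private static final Pattern ANCHOR =
            Pattern.compile("<a\\s[^>]*\\bhref\\s*=", Pattern.CASE_INSENSITIVE);

    /** Counts opening <a ... href=...> tags in the raw HTML text. */
    static int countAnchors(String html) {
        Matcher matcher = ANCHOR.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    /** True when the page has scripts but no anchors, as in the CSDN and Zhihu responses. */
    static boolean looksScriptRendered(String html) {
        return countAnchors(html) == 0
                && html.toLowerCase(Locale.ROOT).contains("<script");
    }

    public static void main(String[] args) {
        String shell = "<html><body><div id=\"app\"></div>"
                + "<script src=\"index.js\"></script></body></html>";
        System.out.println(looksScriptRendered(shell));
    }
}
```

If the check reports a script-rendered page, changing the selector will not help; the content has to come from a tool that runs the page's JavaScript or from a different source.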
