Disguising a Crawler as a Regular Visitor to Scrape Douban

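This article shows how to make a crawler look like a normal user when fetching data from the Douban website. First, use the browser's developer tools to watch the network requests and copy the request-header information that identifies the browser (the User-Agent). Then put that information into the crawler code and run it, so that the request looks like it came from a browser.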

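1: Open douban.com in the browser, right-click the page and inspect it to open the developer tools, then switch to the Network tab. Click the red record button, refresh the page, and click the button again to stop recording. Select the first segment of the timeline and look up the browser information (the User-Agent) in the request headers.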
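To see why the browser's User-Agent is worth copying, it helps to look at what urllib sends when no header is supplied. The sketch below only inspects urllib's default headers and makes no network request; the variable names are illustrative.

```python
import urllib.request

# build_opener() returns an OpenerDirector that carries urllib's default headers.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# The default value names Python itself, so a server can easily tell
# that the request did not come from a browser.
default_ua = default_headers["User-agent"]
print(default_ua)  # Python-urllib/<version of the running interpreter>
```

Replacing this value with the string copied from the Network tab is what makes the request look like an ordinary browser visit.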

2: Copy the information into the code

# @File : testUrllib.py
# @Software : PyCharm
import urllib.request

url = "https://www.douban.com"

# User-Agent copied from the browser's request headers (Chrome on Windows).
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36"
}

# Attach the headers to the request, send it, and print the decoded HTML.
req = urllib.request.Request(url=url, headers=headers)
response = urllib.request.urlopen(req)
print(response.read().decode("utf-8"))
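
The script above assumes the request always succeeds. A slightly more defensive variant might look like the sketch below. The helper names `build_request` and `fetch`, the 10-second timeout, and the error messages are assumptions made for illustration and are not part of the original script.

```python
import urllib.error
import urllib.request

# Same User-Agent string as in the script above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url=url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Return the decoded page, or None if the request fails."""
    req = build_request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            # Use the charset declared by the server, falling back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status code.
        print("Request rejected with HTTP status", err.code)
    except urllib.error.URLError as err:
        # The server could not be reached at all.
        print("Could not reach the server:", err.reason)
    return None


# Check that the header is attached before anything is sent.
# urllib normalizes header names, so the stored key is "User-agent".
req = build_request("https://www.douban.com")
print(req.get_header("User-agent"))
```

Calling `fetch("https://www.douban.com")` then returns the page text on success and `None` otherwise, which lets the caller distinguish a rejected request from an empty page.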

3: Run the script to get the result

E:\pythoncode\code\douban\venv\Scripts\python.exe E:/pythoncode/code/douban/venv/test/testUrllib.py
<!DOCTYPE HTML>
<html lang="zh-cmn-Hans" class="ua-windows ua-webkit">
<head>
<meta charset="UTF-8">
<meta name="google-site-verification" content="ok0wCgT20tBBgo9_zat2iAcimtN4Ftf5ccsh092Xeyw" />
<meta name="description" content="提供图书、电影、音乐唱片的推荐、评论和价格比较,以及城市独特的文化生活。">
<meta name="keywords" content="豆瓣,广播,登陆豆瓣">
<meta property="qc:admins" content="2554215131764752166375" />
<meta property="wb:webmaster" content="375d4a17a4fa24c2" />
<meta name="mobile-agent" content="format=html5; url=https://m.douban.com">
<title>豆瓣</title>
<script>
function set_cookie(t,e,o,n){
   var i,a,r=new Date;r.setTime(r.getTime()+24*(e||30)*60*60*1e3),i="; expires="+r.toGMTString();for(a in t)document.cookie=a+"="+t[a]+i+"; domain="+(o||"douban.com")+"; path="+(n||"/")}function get_cookie(t){
   var e,o,n=t+"=",i=document.cookie.split(";");for(e=0;e<i.length;e++){
   for(o=i[e];" "==o.charAt(0);)o=o.substring(1,o.length);if(0===o.indexOf(n))return o.substring(n.length,o.length).replace(/\"/g,"")}return null}window.Douban=window.Douban||{};var Do=function(){Do.actions.push([].slice.call(arguments))};Do.ready=function(){Do.actions.push([].slice.call(arguments))},Do.add=Do.define=function(t,e){Do.mods[t]=e},Do.global=function(){Do.global.mods=Array.prototype.concat(Do.global.mods,[].slice.call(arguments))},Do.global.mods=[],Do.mods={},Do.actions=[],Douban.init_show_login=function(t){Do("dialog",function(){var t="/j/misc/login_form";dui.Dialog({title:"登录",url:t,width:/device-mobile/i.test(document.documentElement.className)?.9*document.documentElement.offsetWidth:350,cache:!0,callback:function(t,e){e.node.addClass("dialog-login"),e.node.find("h2").css("display","none"),e.node.find(".hd h3").replaceWith(e.node.find(".bd h3")),e.node.find("form").css({border:"none",width:"auto",padding:"0"}),e.update()}}).open()})},Do(function(){function t(t,e){var o=["ref="+encodeURIComponent(location.pathname)];for(var n in e)e.hasOwnProperty(n)&&o.push(n+"="+e[n]);window._SPLITTEST&&o.push("splittest="+window._SPLITTEST),localStorage.setItem("report",(localStorage.getItem("report")||"")+"_moreurl_separator_"+o.join("&"))}!function(){"localStorage"in window||(window.localStorage=function(){var t=document;if(!t.documentElement.addBehavior)throw"don't support localstorage or userdata.";var e="_localstorage_ie",o=t.createElement("input");o.type="hidden";var n=function(n){return function(){t.body.appendChild(o),o.addBehavior("#default#userData");var i=new Date;i.setDate(i.getDate()+365),o.expires=i.toUTCString(),o.load(e);var a=n.apply(o,arguments);return t.body.removeChild(o),a}};return{getItem:n(function(t){return this.getAttribute(t)}),setItem:n(function(t,o){this.setAttribute(t,o),this.save(e)}),removeItem:n(function(t){this.removeAttribute(t),this.save(e)}),clear:n(function(){for(var 
t,o=this.XMLDocument.documentElement.attributes,n=0;t=o[n];n++)this.removeAttribute(t.name);this.save(e)})}}())}(),$(window).one("load",function(){var t=localStorage.getItem("report");if(t){t=t.split("_moreurl_separator_");var e=function(o){return""==o?void e(t.shift()):void $.get("undefined"==typeof _MOREURL_REQ?"/stat.html?"+o:_MOREURL_REQ+"?"+o,function(){return t.length?(e(t.shift()),void localStorage.setItem("report",t.join("_moreurl_separator_"))):void localStorage.removeItem("report")})};e(t.shift())}}),window.moreurl=t,$(document).click(function(e){var o=e.target,n=$(o).data("moreurl-dict");n&&t(o,n)}),$.ajax_withck=function(t){return"POST"==t.type&&(t.data=$.extend(t.data||{},{ck:get_cookie("ck")})),$.ajax(t)},$.postJSON_withck=function(t,e,o){return $.post_withck(t,e,o,"json")},$.post_withck=function(t,e,o,n){return $.isFunction(e)&&(n=o,o=e,e={}),$.ajax({type:"POST",url:t,data:$.extend(e,{ck:get_cookie("ck")}),success:o,dataType:n||"text"})},$("html").click(function(t){var e=$(t.target),o=e.attr("class");o&&$(o.match(/a_(\w+)/gi)).each($.proxy(function(e,o){var n=Douban[o.replace(/^a_/,"init_")];"function"==typeof n&&(t.preventDefault(),n.call(this,t))},e[0]))})});

Do.add('dialog', {
   path: 'https://img3.doubanio.com/f/shire/383a6e43f2108dc69e3ff2681bc4dc6c72a5ffb0/js/ui/dialog.js', type: 'js', requires: ['https://img3.doubanio.com/f/shire/8377b9498330a2e6f056d863987cc7a37eb4d486/css/ui/dialog.css']});
Do.global('https://img3.doubanio.com/f/sns/b5793c2d7c298173d57ecf7d96708b5615336def/js/sns/fp/base.js', 'dialog');
</script>
<link rel="stylesheet" href="https://img3.doubanio.com/f/shire/929d7e5bfb15cd179ff6df68bbd3d7e501681909/css/core/_init_.css">
<link rel="stylesheet" href="https://img3.doubanio.com/f/sns/8c089d263cf21ddcb983a4646b639a29ee1cd744/css/sns/anonymous_home.css">
<style type="text/css">
.rec_topics_name{
   
    display: inline-block;
    margin-bottom: 6px;
    font-size: 14px;
    line-height: 1.3;
    color: #3377aa;
}
.rec_topics_subtitle{
   
    display: block;
    margin-bottom: 15px;
    font-size: 13px;
    line-height: 1;
    color: #aaaaaa;
    white-space: nowrap;
    overflow: hidden;
    text-overflow: ellipsis;
}
.rec_topics_label{
   
    transform: translateY(-0.5px);
    display: inline-block;
    font-size: 13px;
    margin-left: 2px;
}
.rec_topics{
   
    line-height: 1;
    margin-bottom: 15px;
}
.rec_topics:last-child{
   
    margin-bottom: 0;
}
.rec_topics_label_ad{
   
    color: #c9c9c9;
    -moz-transform: scale(0.91);
    -webkit-transform: scale(0.91);
    transform: scale(0.91);
}

html[class*=ua-ff] .rec_topics_subtitle{
   
    line-height: 14px;
}

</style>
</head>

<body class='gray-mode'>


  <div id="anony-nav">
    <div class="anony-nav-links">
      <ul>
        <li>
          <a target="_blank" class="lnk-book" href="https://book.douban.com">豆瓣读书</a>
        </li>
        <li> 
          <a target="_blank" class="lnk-movie" href="https://movie.douban.com">豆瓣电影</a>
        </li>
        <li>
          <a target="_blank" class="lnk-music" href="https://music.douban.com">豆瓣音乐</a>
        </li>
        <li>
          <a target="_blank" class="lnk-events" href="https://www.douban.com/location/">豆瓣同城</a>
        </li>
        <li>
          <a target="_blank" class="lnk-group" href="https://www.douban.com/group/">豆瓣小组</a>
        </li>
        <li>
          <a target="_blank" class="lnk-read" href="https://read.douban.com">豆瓣阅读</a>
        </li>
        <li>
          <a target="_blank" class="lnk-fm" href="https://douban.fm">豆瓣FM</a>
        </li>
        <li>
          <a target="_blank" class="lnk-shijian" href="https://time.douban.com/?dt_time_source=douban-web_anonymous_index_top_nav">豆瓣时间</a>
        </li>
        <li>
          <a target="_blank" class="lnk-market" href="https://market.douban.com?utm_campaign=anonymous_top_nav&utm_source=douban&utm_medium=pc_web">豆瓣豆品</a>
        </li>
      </ul>
    </div>

    <h1><a href="https://www.douban.com">豆瓣</a></h1>

    <div class="anony-srh">
    <form action="https://www.douban.com/search" method="get">
      <span class="inp"><input type="text" maxlength="60" size="12" placeholder="书籍、电影、音乐、小组、小站、成员" name="q" autocomplete="off"></span>
    <span class="bn"><input type="submit" value="搜索"></span>
    </form>
    </div>
  </div>



<div id="anony-reg-new" style="background-image: url(https://img9.doubanio.com/view/puppy_image/raw/public/16ad990ec23uwn790d4.jpg)">
  <div class="wrapper">
    <div class="login">
      <iframe style="height: 300px; width: 300px;" frameborder='0' src="//accounts.douban.com/passport/login_popup?login_source=anony"></iframe>
    </div>
    <div class="app">
      <p class="app-title">豆瓣<span>6.0</span></p>
      <p class="app-slogan"></p>
      <a href="https://www.douban.com/doubanapp/app?channel=nimingye" class="lnk-app">下载豆瓣 App</a>
      <div class="app-qr">
        <a href="javascript: void 0;" class="lnk-qr" id="expand-qr"><img src="https://img3.doubanio.com/f/sns/0c708de69ce692883c1310053c5748c538938cb0/pics/sns/anony_home/icon_qrcode_green.png" width="28" height="28" /></a>
        <div class="app-qr-expand">
          <img src="https://img3.doubanio.com/f/sns/1cad523e614ec4ecb6bf91b054436bb79098a958/pics/sns/anony_home/doubanapp_qrcode.png" width="160" height="160" />
          <p>iOS / Android 扫码直接下载</p>
        </div>
      </div>
    </div>
  </div>
  <script>
  Do(function() {
   
    var app_qr = $('.app-qr');
    app_qr.hover(function() {
   
      app_qr.addClass('open');
    }, function() {
   
      app_qr.removeClass('open');
    });
  });
  </script>
</div>




      
<div id="anony-sns" class="section">
  <div class="wrapper">
  
<!-- douban ad begin -->
<div id="dale_anonymous_homepage_top_for_crazy_ad"></div>
<!-- douban ad end -->

  
  <div class="side">
  <div style="margin:10px 0px;">
    <div id="dale_anonymous_homepage_right_top"></div>
  </div>
  <div class="online">
    <ul>
      






<div class="mod">
    
    <h2>
        热门话题
            &nbsp;&middot;&nbsp;&middot;&nbsp;&middot;&nbsp;&middot;&nbsp;&middot;&nbsp;&middot;
            <span class="pl">&nbsp;(
                
                    <a href="/gallery/" target="_self">去话题广场</a>
                ) </span>
    </h2>


    <ul>
        
            <li class="rec_topics">
                    <a href="https://www.douban.com/gallery/topic/143419/?from=hot_topic_anony_sns" class="rec_topics_name">收集城市里的野草</a>
                    
                    <span class="rec_topics_subtitle">19.0万次浏览</span>
            </li>
            <li class="rec_topics">
                    <a href="https://www.douban.com/gallery/topic/143594/?from=hot_topic_anony_sns" class="rec_topics_name">你是如何对抗人生的荒谬</a>
                    
                    <span class="rec_topics_subtitle">138.7万次浏览</span>
            </li>
            <li class="rec_topics">
                    <a href="https://www.douban.com/gallery/topic/143526/?from=hot_topic_anony_sns" class="rec_topics_name">我和父母辈都读过的一本书</a>
                    
                    <span class="rec_topics_subtitle">67.4万次浏览</span>
            </li>
            <li class="rec_topics">
                    <a href="https://www.douban.com/gallery/topic/142487/?from=hot_topic_anony_sns" class="rec_topics_name">诗中志与智的碰撞</a>
                    
                    <span class="rec_topics_subtitle">4101次浏览</span>
            </li>
            <li class="rec_topics">
                    <a href="https://www.douban.com/gallery/topic/142515/?from=hot_topic_anony_sns" class="rec_topics_name">一人一段小语种朗读</a>
                    
                    <span class="rec_topics_subtitle">1509次浏览</span>
            </li>
            <li class="rec_topics">
                    <a href="https://www.douban.com/gallery/topic/144059/?from=hot_topic_anony_sns" class="rec_topics_name">纪念《一一》首映20周年</a>
                    
                    <span class="rec_topics_subtitle">新话题 · 7.2万次浏览</span>
            </li>
    </ul>
</div>

      <!-- douban ad begin -->
      <li>
        <div id="dale_homepage_online_activity_promo_1"></div>
      </li>
      <li>
        <div id="dale_anonymous_homepage_doublemint"></div>
      </li>
      <!-- douban ad end -->
    </ul>
  </div>
</div>
  <div class="main">
<div class="mod">

  
    <h2>
        热点内容
            ······
            <span class="pl">&nbsp;(
                
                    <a href="https://www.douban.com/explore/" target="_self">更多</a>
                ) </span>
    </h2>

  <div class="albums">
    <ul>
      <li>
      <div class="pic">
          <a href="https://www.douban.com/photos/album/1873859895/"><img src="https://img3.doubanio.com/f/shire/a1fdee122b95748d81cee426d717c05b5174fe96/pics/blank.gif" data-origin="https://img9.doubanio.com/view/photo/albumcover/public/p2601063575.jpg" alt="" /></a>
      </div>
      <a href="https://www.douban.com/photos/album/1873859895/">后疫情时期的武汉</a>
      <span class="num">48张照片</span>
      <li>
      <div class="pic">
          <a href="https://www.douban.com/photos/album/1617538425/"><img src="https://img3.doubanio.com/f/shire/a1fdee122b95748d81cee426d717c05b5174fe96/pics/blank.gif" data-origin="https://img3.doubanio.com/view/photo/albumcover/public/p2275271473.jpg" alt="" /></a>
      </div>
      <a href="https://www.douban.com/photos/album/1617538425/">姥爷的诗</a>
      <span class="num">19张照片</span>
      <li>
      <div class="pic">
          <a href="https://www.douban.com/photos/album/1690398143/"><img src="https://img3.doubanio.com/f/shire/a1fdee122b95748d81cee426d717c05b5174fe96/pics/blank.gif" data-origin="https://img1.doubanio.com/view/photo/albumcover/public/p2561870128.jpg" alt="" /></a>
      </div>
      <a href="https://www.douban.com/photos/album/1690398143/">孤独宇航员</a>
      <span class="num">186张照片</span>
      <li>
      <div class="pic">
          <a href="https://www.douban.com/photos/album/1677601786/"><img src="https://img3.doubanio.com/f/shire/a1fdee122b95748d81cee426d717c05b5174fe96/pics/blank.gif" data-origin="/pics/photo_album.png" alt="" /></a>
      </div>
      <a href="https://www.douban.com/photos/album/1677601786/">「墟墓之間」──法國</a>
      <span class="num">209张照片</span>
    </ul>
  </div>
  <div class="notes">
    <ul>
      <li class="first">
      <div class="title">
          <a href="https://www.douban.com/note/761902299/">所以爹妈在我们这个阶段在干吗</a>
      </div>
      <div class="author">
        菀的日记
      </div>
      <p>看到今天很多人转的那个关于男女职业冲突多是女方退出的讨论,想到一些不相干的。 ...</p>
      </li>

      <li><a href="https://www.douban.com/note/760049660/">我们每年要面对上千名露宿者,但只能帮助他们中的30个脱离这种生活</a></li>
      <li><a href="https://www.douban.com/note/761920723/">太阳城札记</a></li>
      <li><a href="https://www.douban.com/note/760200500/">为什么我说这几张抗疫海报毫无「苏」味儿</a></li>
      <li><a href="https://www.douban.com/note/758716654/">我的故乡在黑龙江省的一个县里,它快要消失了</a></li>
      <li><a href="https://www.douban.com/note/760646654/">一次深夜神游</a></li>
      <li><a href="https://www.douban.com/note/762207583/">关于《游城南记注》的一篇讨论文字</a></li>
      <li><a href="https://www.douban.com/note/758818708/">对年轻人而言,贫困感是如影随形的吗?</a></li>
      <li><a href="https://www.douban.com/note/760019703/">一脚一个春天</a></li>
      <li><a href="https://www.douban.com/note/759941208/">我在德邦上夜班的一年</a></li>
    </ul>
  </div>
</div>
</div>
  </div>
  

</div>








      
<div id="anony-time" class="section">
  <div class="wrapper">
  
  
    <div class="sidenav">
        <h2 class="section-title"><a href="https://time.douban.com?dt_time_source=douban-web_anonymous">豆瓣时间</a></h2>
    </div>

  <div class="side"></div>
  <div class="main">
    
    <h2>
        热门专栏
            &nbsp;&middot;&nbsp;&middot;&nbsp;&middot;&nbsp;&middot;&nbsp;&middot;&nbsp;&middot;
            <span class="pl">&nbsp;(
                
                    <a href="https://time.douban.com?dt_time_source=douban-web_anonymous" target="_self">更多</a>
                ) </span>
    </h2>


    



<ul class="time-list">
        <li>
            
            <a class="cover video new " href="https://m.douban.com/time/column/194?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img1.doubanio.com/dae/niffler/niffler/images/ffb879fe-91d3-11ea-ac42-de2223cd0c89.jpg" alt="简食知味——20道全生素食料理课">
            </a>
            <a class="title" href="https://m.douban.com/time/column/194?dt_time_source=douban-web_anonymous" target="_blank">简食知味——20道全生素食料理课</a>
            <span class="type">视频专栏</span>
        </li>
        <li>
            
            <a class="cover audio new " href="https://m.douban.com/time/column/193?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img3.doubanio.com/dae/niffler/niffler/images/c1a3cfde-8543-11ea-a58d-5aaa293ef1e1.jpg" alt="人心可测——姜振宇的微表情读心术">
            </a>
            <a class="title" href="https://m.douban.com/time/column/193?dt_time_source=douban-web_anonymous" target="_blank">人心可测——姜振宇的微表情读心术</a>
            <span class="type">音频专栏</span>
        </li>
        <li>
            
            <a class="cover audio new " href="https://m.douban.com/time/column/192?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img1.doubanio.com/dae/niffler/niffler/images/6cf22398-757e-11ea-8774-fea672b52f7a.jpg" alt="生死之间:10堂课学会如何与疾病共处">
            </a>
            <a class="title" href="https://m.douban.com/time/column/192?dt_time_source=douban-web_anonymous" target="_blank">生死之间:10堂课学会如何与疾病共处</a>
            <span class="type">音频专栏</span>
        </li>
        <li>
            
            <a class="cover audio new " href="https://m.douban.com/time/column/191?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img3.doubanio.com/dae/niffler/niffler/images/eb08fc6c-7348-11ea-afb6-5a5bcc9605bf.jpg" alt="爱我,请先懂我——纽约大学艺术治疗师的儿童心理成长课">
            </a>
            <a class="title" href="https://m.douban.com/time/column/191?dt_time_source=douban-web_anonymous" target="_blank">爱我,请先懂我——纽约大学艺术治疗师的儿童心理成长课</a>
            <span class="type">音频专栏</span>
        </li>
        <li>
            
            <a class="cover audio new " href="https://m.douban.com/time/column/190?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img1.doubanio.com/dae/niffler/niffler/images/7df08276-6cea-11ea-b1b3-fea672b52f7a.jpg" alt="罪恶的背后——人人必修的60堂犯罪心理学">
            </a>
            <a class="title" href="https://m.douban.com/time/column/190?dt_time_source=douban-web_anonymous" target="_blank">罪恶的背后——人人必修的60堂犯罪心理学</a>
            <span class="type">音频专栏</span>
        </li>
        <li>
            
            <a class="cover audio new " href="https://m.douban.com/time/column/188?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img9.doubanio.com/dae/niffler/niffler/images/8e457bfe-5872-11ea-916d-4e50984eeed6.jpg" alt="用性别之尺丈量世界——18堂思想课解读女性问题">
            </a>
            <a class="title" href="https://m.douban.com/time/column/188?dt_time_source=douban-web_anonymous" target="_blank">用性别之尺丈量世界——18堂思想课解读女性问题</a>
            <span class="type">音频专栏</span>
        </li>
        <li>
            
            <a class="cover video new " href="https://m.douban.com/time/column/189?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img1.doubanio.com/dae/niffler/niffler/images/a4283c6c-5f80-11ea-b2da-560baafd962b.jpg" alt="微电影剧作——如何在剧作中运用观众心理学">
            </a>
            <a class="title" href="https://m.douban.com/time/column/189?dt_time_source=douban-web_anonymous" target="_blank">微电影剧作——如何在剧作中运用观众心理学</a>
            <span class="type">视频专栏</span>
        </li>
        <li>
            
            <a class="cover video new " href="https://m.douban.com/time/column/187?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img9.doubanio.com/dae/niffler/niffler/images/0b1c1886-4cb1-11ea-a5fc-ce0829b4ff74.jpg" alt="电影产业破壁课——13小时重塑电影世界观">
            </a>
            <a class="title" href="https://m.douban.com/time/column/187?dt_time_source=douban-web_anonymous" target="_blank">电影产业破壁课——13小时重塑电影世界观</a>
            <span class="type">视频专栏</span>
        </li>
        <li>
            
            <a class="cover audio  new" href="https://m.douban.com/time/column/186?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img9.doubanio.com/dae/niffler/niffler/images/bc181fca-48da-11ea-af67-8ac79e38c4d6.jpg" alt="不准无聊!精品大师课免费放送">
            </a>
            <a class="title" href="https://m.douban.com/time/column/186?dt_time_source=douban-web_anonymous" target="_blank">不准无聊!精品大师课免费放送</a>
            <span class="type">音频专栏</span>
        </li>
        <li>
            
            <a class="cover audio new " href="https://m.douban.com/time/column/185?dt_time_source=douban-web_anonymous" target="_blank">
                <img src="https://img9.doubanio.com/dae/niffler/niffler/images/f3573202-3389-11ea-81ed-3e551a2d8b14.jpg" alt="懂得这些再去穿越——古代天文学里的星空密码">
            </a>
            <a class="title" href="https://m.douban.com/time/column/185?dt_time_source=douban-web_anonymous" target="_blank">懂得这些再去穿越——古代天文学里的星空密码
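
The output above is raw HTML, so the next step in a crawl is usually to pull out the parts of interest. As a rough sketch, the standard-library `html.parser` module can collect the hot-topic titles. The `TopicParser` class is hypothetical, and `sample` is a short fragment copied from the output above rather than a live response.

```python
from html.parser import HTMLParser


class TopicParser(HTMLParser):
    """Collect the text of every <a class="rec_topics_name"> element."""

    def __init__(self):
        super().__init__()
        self.topics = []
        self._in_topic = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "a" and dict(attrs).get("class") == "rec_topics_name":
            self._in_topic = True

    def handle_data(self, data):
        # Only keep text that appears inside a matching link.
        if self._in_topic:
            self.topics.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_topic = False


# Fragment taken from the page output shown above.
sample = """
<li class="rec_topics">
  <a href="https://www.douban.com/gallery/topic/143419/?from=hot_topic_anony_sns" class="rec_topics_name">收集城市里的野草</a>
  <span class="rec_topics_subtitle">19.0万次浏览</span>
</li>
<li class="rec_topics">
  <a href="https://www.douban.com/gallery/topic/143594/?from=hot_topic_anony_sns" class="rec_topics_name">你是如何对抗人生的荒谬</a>
  <span class="rec_topics_subtitle">138.7万次浏览</span>
</li>
"""

parser = TopicParser()
parser.feed(sample)
print(parser.topics)
```

In a real run, the string passed to `parser.feed` would be the decoded response from the script in step 2 instead of `sample`.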