Python Web Scraping: A First Example with requests

Python web scraping basics

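The requests module:

Introduction: requests is a third-party Python library for making HTTP requests. It is not part of the standard library, so it has to be installed first. It is powerful, and its API is simple and convenient to work with.
Purpose: send requests the way a browser would.
How to use it (the requests coding workflow):
(1) Specify the URL
(2) Send the request
(3) Get the response data
(4) Persist the data to storage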
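
On the "send requests the way a browser would" point: by default requests identifies itself with a `python-requests/...` User-Agent, so a script usually sets a browser-like header to look like a browser. The sketch below only prepares a request and does not send it, so it runs without network access. The `BROWSER_UA` string and the `build_request` helper are hypothetical names chosen for illustration, not something from the original example.

```python
import requests

# Hypothetical browser-like User-Agent string, used only for illustration.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Prepare (but do not send) a GET request with a browser-like User-Agent."""
    session = requests.Session()
    request = requests.Request("GET", url, headers={"User-Agent": user_agent})
    # prepare_request merges the session defaults with the headers given above;
    # the per-request User-Agent overrides the session default.
    return session.prepare_request(request)


# What requests sends when no User-Agent is set explicitly.
print(requests.utils.default_user_agent())

prepared = build_request("https://www.sogou.com/")
print(prepared.method, prepared.url)
print(prepared.headers["User-Agent"])
```

To actually send the prepared request, one option is `requests.Session().send(prepared, timeout=10)`. Passing `headers={...}` directly to `requests.get` has the same effect for a one-off request.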

Environment setup:

    pip install requests
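
A quick way to confirm the install worked is to import the package and print its version. This is only a sanity check; the version printed depends on your environment.

```python
import requests

# Importing without an ImportError confirms the package is installed.
print("requests version:", requests.__version__)
```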

Hands-on coding:
Goal: scrape the page data of the Sogou home page

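# Goal: scrape the page data of the Sogou home page
import requests
if __name__ == "__main__":
    # step_1: specify the URL
    url = 'https://www.sogou.com/'
    # step_2: send the request
    # get() returns a Response object.
    # verify=False skips TLS certificate verification and triggers an
    # InsecureRequestWarning; drop it unless your environment requires it.
    response = requests.get(url=url, verify=False)
    # step_3: get the response data; .text returns the response body as a string
    page_text = response.text
    print(page_text)
    # step_4: persist the data to a file
    with open('./sougou.html','w',encoding='utf-8') as fp:
        fp.write(page_text)
    print('Scraping finished!')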
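
The script above saves whatever comes back, even if the server returned an error page or the text was decoded with the wrong charset. The sketch below is one possible hardening of steps 2 to 4, not the original author's code: `fetch_text` and `save_text` are hypothetical helper names, and the 10-second timeout is an arbitrary assumption. It relies only on standard requests features (`timeout`, `raise_for_status()`, `encoding`, `apparent_encoding`).

```python
import requests


def fetch_text(url, session=None, timeout=10):
    """Return the decoded body of `url`, raising on HTTP error statuses."""
    if session is None:
        session = requests.Session()
    # A timeout keeps the script from hanging forever on an unresponsive server.
    response = session.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of saving an error page.
    response.raise_for_status()
    # When the headers carry no charset, requests assumes ISO-8859-1 for text/*
    # content, which garbles Chinese pages; fall back to the detected encoding.
    if response.encoding is None or response.encoding.lower() == "iso-8859-1":
        response.encoding = response.apparent_encoding
    return response.text


def save_text(text, path):
    """Write `text` to `path` as UTF-8 and return the number of characters written."""
    with open(path, "w", encoding="utf-8") as fp:
        return fp.write(text)
```

With these helpers, the whole workflow could be written as `save_text(fetch_text('https://www.sogou.com/'), './sougou.html')`. The optional `session` parameter is there so the function can be exercised with a stand-in object instead of a live connection.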

The scraped result is as follows:

<!DOCTYPE html><html lang="cn"><head> <meta name="baidu_union_verify" content="efd6e8ce094119528f66c2d380f6ec94">
<meta name='360_ssp_verify' content='651669fb99b77a4e4efae7ec25d6796a' /> <meta name="viewport" content="width=device-width,minimum-scale=1,maximum-scale=1,user-scalable=no"><script>window._speedMark = new Date();
window.lead_ip = '183.63.119.21';
window.now = 1716200667181;</script><script type="text/javascript">/*file=static/js/resourceErrorReport.js*/!function(a){var n=(new Date).getTime(),r=a.location.protocol;function c(e,t){var o=(new Date).getTime()-n;(new Image).src=["//pb.sogou.com/pv.gif?uigs_productid=wapapp&type=resource-error&stype=",e,"&timestamp=",o,"&protocol=",r,"&host=",encodeURIComponent(a.location.host),"&path=",encodeURIComponent(a.location.pathname),"&resource=",encodeURIComponent(t)].join("")}function e(e){if((e=e||a.event)&&"error"===e.type){var t=e.srcElement?e.srcElement:e.target;if(t){var o,n,r=t.tagName;"LINK"===r?(n="css",(o=t.getAttribute("href"))&&o.match(/\.css($|\?)/)&&c(n,o)):"SCRIPT"===r&&(n="js",(o=t.getAttribute("src"))&&o.match(/\.js($|\?)/)&&c(n,o))}}}r&&(r=r.substring(0,r.length-1)),a.addEventListener?a.addEventListener("error",e,!0):a.attachEvent&&a.attachEvent("onerror",e)}(window);</script><meta charset="utf-8"><link rel="dns-prefetch" href="//img01.sogoucdn.com"><link rel="dns-prefetch" href="//img02.sogoucdn.com"><link rel="dns-prefetch" href="//img03.sogoucdn.com"><link rel="dns-prefetch" href="//img04.sogoucdn.com"><link rel="dns-prefetch" href="//dlweb.sogoucdn.com"><title>搜狗搜索引擎 - 上网从搜狗开始</title><link rel="shortcut icon" href="/images/logo/new/favicon.ico?v=4" type="image/x-icon"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><link rel="search" type="application/opensearchdescription+xml" href="/content-search.xml" title="搜狗搜索"><meta name="keywords" content="搜狗搜索,网页搜索,微信搜索,视频搜索,图片搜索,音乐搜索,新闻搜索,软件搜索,问答搜索,百科搜索,购物搜索"><meta name="description" content="搜狗搜索是全球第三代互动式搜索引擎,支持微信公众号和文章搜索、知乎搜索、英文搜索及翻译等,通过自主研发的人工智能算法为用户提供专业、精准、便捷的搜索服务。"><link rel="stylesheet" type="text/css" href="//dlweb.sogoucdn.com/pcsearch/web/index/css/index_style_39e6e10.css"><style>#voice-btn{display:none}.wrapper .suggestion{border:1px solid #e8e8e8;width:653px;-moz-box-shadow:0 1px 8px rgba(0,0,0,.1);-webkit-box-shadow:0 1px 8px rgba(0,0,0,.1);box-shadow:0 1px 8px 
rgba(0,0,0,.1);border-top-left-radius:0;border-top-right-radius:0;border-bottom-right-radius:2px;border-bottom-left-radius:2px;top:43px}.wrapper .suglist{width:206px}.wrapper .suglist .keyword{color:#7a77c8}.big-scn .suggestion{width:820px}.big-scn .suglist{width:236px}.wrapper .suglist{padding:4px 0}input[type=text]::-ms-clear{display:none}</style><!-- indexSnippetToHeader start --> <style>#voice-btn{display:none}</style> <!-- indexSnippetToHeader end --></head><body color-style="white"><div class="wrapper" id="wrap"><div class="header"> <div class="top-nav"><ul><li class="cur"><span>网页</span></li><li><a onclick="st(this,'73141200','weixin')" href="http://weixin.sogou.com/" uigs-id="nav_weixin" id="weixinch">微信</a></li><li><a onclick="st(this,'40051200','zhihu')" href="http://zhihu.sogou.com/" uigs-id="nav_zhihu" id="zhihu">知乎</a></li><li><a onclick="st(this,'40030500','pic')" href="http://pic.sogou.com" uigs-id="nav_pic" id="pic">图片</a></li><li><a onclick="st(this,'40030600','video')" href="https://v.sogou.com/" uigs-id="nav_v" id="video">视频</a></li><li><a href="http://mingyi.sogou.com?fr=common_index_nav" uigs-id="nav_mingyi" id="mingyi" onclick="st(this,'','myingyi')">医疗</a></li><li><a href="http://hanyu.sogou.com?fr=pcweb_index_nav" uigs-id="nav_hanyu" id="hanyu" onclick="st(this,'','hanyu')">汉语</a></li><li><a href="http://fanyi.sogou.com?fr=common_index_nav_pc" uigs-id="nav_fanyi" id="fanyi" onclick="st(this,'','fanyi')">翻译</a></li><li><a onclick="st(this,'web2ww','wenwen')" href="https://wenwen.sogou.com/?ch=websearch" uigs-id="nav_wenwen" id="index_more_wenwen">问问</a></li><li><a onclick="st(this,'web2ww','baike')" href="http://baike.sogou.com/Home.v" uigs-id="nav_baike" id="index_baike">百科</a></li><li><a onclick="st(this,'40031000')" href="http://map.sogou.com" uigs-id="nav_map" id="map">地图</a></li><li class="show-more"><a href="javascript:void(0);" id="more-product">更多<i class="m-arr"></i></a><div class="pos-more" id="products-box" style="top:40px"><span 
class="ico-san"></span><a onclick="st(this)" href="http://zhishi.sogou.com" uigs-id="nav_zhishi" id="index_more_zhishi">知识</a><a onclick="st(this,'40051205')" href="http://as.sogou.com/" uigs-id="nav_app" id="index_more_appli">应用</a><span class="all"><a onclick="st(this,'40051206')" href="http://www.sogou.com/docs/more.htm?v=1" uigs-id="nav_all" target="_blank">全部</a></span></div></li></ul></div><div class="user-box"><div class="local-weather" id="local-weather"><div class="wea-box" id="cur-weather" style="display:none"></div><div class="pos-more" id="detail-weather" style="top:40px;left:-110px"></div></div><span class="line" id="user-box-line" style="display:none"></span>  <a href="javascript:void(0)" id="cniil_wza" style="float:left;text-decoration:none;color:#000;opacity:.75;padding-left:8px;margin-right:20px;line-height:14px;position:relative;top:5px">无障碍</a>  </div></div><div class="content" id="content"><div class="pos-header" id="top-float-bar"><div class="part-one"></div><div class="part-two" id="card-tab-layer"><div class="c-top" id="top-card-tab"></div></div></div><div class="logo2" id="logo-s"><span></span></div><div class="logo" id="logo-l"><span></span></div> <div class="search-box querybox-focus" id="search-box"><form action="/web" name="sf" id="sf"><span class="sec-input-box"><input type="text" class="sec-input active" name="query" id="query" maxlength="100" len="80" autocomplete="off"></span><span class="enter-input"><input type="submit" value="搜狗搜索" id="stb"></span><input type="hidden" name="_asf" value="www.sogou.com"> <input type="hidden" name="_ast"> <input type="hidden" name="w" value="01019900"> <input type="hidden" name="p" value="40040100"> <input type="hidden" name="ie" value="utf8"> <input type="hidden" name="from" value="index-nologin"> <input type="hidden" name="s_from" value="index"><div class="keywords-tips" id="keywordsTips" style="display:none"><i></i><p>“<strong 
id="keywordsTipsStrong">369</strong>”后面的文字被忽略,搜狗的查询限制在40个汉字以内。</p></div></form></div>  </div><div class="ft" id="footer"  style="display:none" ><a href="//e.qq.com?from=sougou01" target="_blank" uigs-id="footer_tuiguang">企业推广</a><span class="line"></span><a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" uigs-id="footer_disclaimer">免责声明</a><span class="line"></span><a href="https://fankui.sogou.com/index.php/web/web/index/type/4" target="_blank" uigs-id="footer_feedback">意见反馈及投诉</a><span class="line"></span><a href="https://www.sogou.com/docs/privacy.htm?v=1" target="_blank" uigs-id="footer_private">隐私政策</a><br><span class="g">药品医疗器械网络信息服务备案:(京)网药械信息备字(2021)第00047号</span><br>&copy;&nbsp;2004-2024&nbsp;Sogou.com&nbsp;/&nbsp;<a href="http://www.12377.cn" class="g" target="_blank">网上有害信息举报专区</a>&nbsp;/&nbsp;<span class="g">京网文(2019)6117-724号</span>&nbsp;/&nbsp;<a class="g" href="https://beian.miit.gov.cn/" target="_blank">京ICP证050897号</a>&nbsp;/&nbsp;<a class="g" href="https://beian.miit.gov.cn/" target="_blank">京ICP备11001839号-1</a>&nbsp;/&nbsp;<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a></div>  <div class="ft-v1" id="QRcode-footer" style="padding-bottom:28px"><div class="ft-info"><a uigs-id="mid_pinyin" href="http://pinyin.sogou.com/" target="_blank"><i class="i1"></i>搜狗输入法</a><span class="line"></span><a uigs-id="mid_liulanqi" href="http://ie.sogou.com/" target="_blank"><i class="i2"></i>浏览器</a><span class="line"></span><a uigs-id="mid_daohang" href="http://123.sogou.com/" target="_blank"><i class="i3"></i>网址导航</a><br><a href="//e.qq.com?from=sougou01" target="_blank" class="g">企业推广</a>&nbsp;-&nbsp;<a href="http://www.sogou.com/docs/terms.htm?v=1" target="_blank" class="g">免责声明</a>&nbsp;-&nbsp;<a href="https://fankui.sogou.com/index.php/web/web/index/type/4" target="_blank" class="g">意见反馈及投诉</a>&nbsp;-&nbsp;<a href="https://www.sogou.com/docs/privacy.htm?v=1" 
target="_blank" class="g" uigs-id="footer_private">隐私政策</a><br><span class="g">药品医疗器械网络信息服务备案:(京)网药械信息备字(2021)第00047号</span><br>&copy;&nbsp;2004-2024&nbsp;Sogou.com&nbsp;/&nbsp;<a href="http://www.12377.cn" class="g" target="_blank">网上有害信息举报专区</a>&nbsp;/&nbsp;<span class="g">京网文(2019)6117-724号</span>&nbsp;/&nbsp;<a class="g" href="https://beian.miit.gov.cn/" target="_blank">京ICP证050897号</a>&nbsp;/&nbsp;<a class="g" href="https://beian.miit.gov.cn/" target="_blank">京ICP备11001839号-1</a>&nbsp;/&nbsp;<a href="http://www.beian.gov.cn/portal/registerSystemInfo?recordcode=11000002000025" class="ba" target="_blank">京公网安备11000002000025号</a></div>  <div class="fit-older"></div>  </div> <div class="kuozhan" id="QRcode-box" style="display:none"><a href="javascript:void(0);" id="miniQRcode"></a><span id="QRcode"></span></div><a href="javascript:void(0);" class="back-top" id="back-top"></a></div><script>var SugPara, uigs_para, msBrowserName = navigator.userAgent.toLowerCase(),msIsSe = false,msIsMSearch = false, hasDoodle = false, queryinput = document.getElementById('query');</script><script>/*file=static/js/indexjs.js*/function indexjsInit(e,n){function o(){try{window.external.metasearch("make_connection","www.google.com.hk")}catch(e){}}uigs_para={uigs_productid:"webapp",type:"webindex_new",stype:"nologin",scrnwi:screen.width,scrnhi:screen.height,uigs_pbtag:"A",uigs_cookie:"SUID,sct",protocol:"https:"==location.protocol.toLowerCase()?"https":"http"},window.loginCardConfig={},SugPara={queryboxid:"search-box",enableSug:!0,sugType:"web",domain:"w.sugg.sogou.com",productId:"web",sugFormName:"sf",inputid:"query",submitId:"stb",suggestRid:"01015002",normalRid:"01019900",useParent:1,sugglocation:"index",showVr:!0,showHotwords:!0,suggAbtestObject:e},/se 
2\.x/i.test(msBrowserName)&&(msIsSe=!0),/metasr/i.test(msBrowserName)&&(msIsMSearch=!0),queryinput&&msIsSe&&msIsMSearch&&(queryinput.addEventListener?(queryinput.addEventListener("keypress",o,!1),queryinput.addEventListener("keydown",o,!1)):queryinput.attachEvent?(queryinput.attachEvent("onkeypress",o),queryinput.attachEvent("onkeydown",o)):(queryinput.onkeypress=o,queryinput.onkeydown=o)),window.m_s_index=function(){var e=document.sf.query,o=Math.round(1e3*((new Date).getTime()+Math.random()));e.focus(),new RegExp("kw=([^&]+)").test(location.search)&&0==e.value.length&&(e.value=decodeURIComponent(RegExp.$1)),document.cookie.indexOf("SUV=")<0&&(document.cookie="SUV="+o+";path=/;expires=Sun, 29 July 2026 00:00:00 UTC;domain="+function(){var e=document.domain;return e.indexOf("sogou.com")==e.length-9?".sogou.com":e.indexOf("soso.com")==e.length-8?".soso.com":-1!=e.indexOf("sogo.com")?".sogo.com":void 0}()),n&&((new Image).src="//pb6.sogou.com/v6")},window.st=function(e,o,n,t){var u=document.sf.query,s=encodeURIComponent(u.value),i={news:"http://news.sogou.com/news?ie=utf8&query=",web:"web?ie=utf8&query=",weixin:"http://weixin.sogou.com/weixin?type=2&ie=utf8&query=",zhihu:"http://zhihu.sogou.com/zhihu?ie=utf8&query=",pic:"http://pic.sogou.com/pics?ie=utf8&query=",video:"https://v.sogou.com/v?ie=utf8&query=",myingyi:"https://www.sogou.com/web?m2web=mingyi.sogou.com&ie=utf8&query=",overseas:"http://english.sogou.com?b_o_e=1&ie=utf8&fr=pcweb_index_nav&query=",scholar:"http://scholar.sogou.com?ie=utf8&fr=common_index_nav&query=",fanyi:"http://fanyi.sogou.com/?fr=common_index_nav_pc&ie=utf8&keyword=",wenwen:"http://wenwen.sogou.com/s/?ch=websearch&w=",hanyu:"https://hanyu.sogou.com/?query=",science:"https://baike.sogou.com/kexue/home.htm?query="},r=i[n]||e.href;function 
c(e){return-1<e.indexOf("?")?"&":"?"}u&&""!==u.value&&(["hanyu"].includes(n)?r=r.match(/.*(?=\?query\=)/)[0]+{hanyu:{index:"",result:"result"}}[n].result+"?query="+s:i[n]?r=i[n]+s:0<r.indexOf("kw=")?r=r.replace(new RegExp("kw=[^&$]*"),"kw="+s):r+=c(r)+"kw="+s),o&&(r+=c(r)+"p="+o),t&&0<t.length&&(r+="#"+t),!u||""!=u.value||"wenwen"!=n&&"science"!=n||(r=e.href),e.href=r},window.cid=function(e,o){var n=document.sf.query,t=encodeURIComponent(n.value);t?"web2ww"===o?e.href+="s/?cid=web2ww&w="+t:"web2bk"===o&&(e.href+="Search.e?sp=S"+t+"&cid=web2bk"):e.href+="?cid="+o},window.m_s_index()}indexjsInit({"suggestHistoryStrategy1":"","suggestHistoryStrategy2":"0|1|2|3|4|5|6|7|8","suggHistoryAbtest":""}, true);</script><script src="//dlweb.sogoucdn.com/pcsearch/js/jquery-1.11.0.min_8fc25e2.js"></script><script src="//dlweb.sogoucdn.com/pcsearch/js/lib/jquery.mousewheel.min_639d1c3.js"></script><script src="//dlweb.sogoucdn.com/pcsearch/js/lib/juicer-min_2a2bf35.js"></script><script src="//dlweb.sogoucdn.com/pcsearch/js/pb_v.1.9.6.min_2030e16.js"></script><script src="//search.sogoucdn.com/websearch/pc/static/js/sugg.40833b1d.js"></script><script src="//dlweb.sogoucdn.com/pcsearch/web/index/js/searchbase_453304b.js"></script>  <script defer="defer" async type="text/javascript" src="//dlweb.sogoucdn.com/barrier_free/pc/wzaV15/aria.js?appid=c4d5562ec7daa12a5a351cbe1a292da1" charset="utf-8"></script></body></html><!--zly-->