Python Crawler Frameworks - Python in Practice: Crawler Frameworks (6)


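Data mining

Obtaining data from servers across the Internet.

Data

Public data: the data a client browser sees when it visits a web page.

Private data: data held inside the server that is not exposed and requires access permissions.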

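Crawlers

Web crawler: collects public data.

Worm crawler: a crawler that carries an aggressive (malicious) payload and goes after private data.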

Types of network data

What kinds of data can a server return?

HTML, JSON, XML, plain text, and files.
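A server usually states which of these types it is sending in the `Content-Type` response header (with requests: `response.headers.get('Content-Type', '')`). The sketch below maps such a header value onto the five categories above; the helper name `classify_content_type` and the exact mapping are illustrative assumptions, not part of any library.

```python
def classify_content_type(content_type: str) -> str:
    """Hypothetical helper: map a Content-Type value to html/json/xml/text/file."""
    # Drop parameters such as "; charset=utf-8" and normalise the case
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    if mime == "application/json" or mime.endswith("+json"):
        return "json"
    if mime in ("application/xml", "text/xml") or mime.endswith("+xml"):
        return "xml"
    if mime.startswith("text/"):
        return "text"
    # Everything else (images, PDFs, archives, ...) is treated as a file
    return "file"


for header in ("text/html; charset=utf-8", "application/json",
               "text/xml", "text/plain", "image/png"):
    print(header, "->", classify_content_type(header))
```

A real crawler would branch on the result, for example calling `response.json()` for JSON and writing `response.content` to disk for files.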

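The crawling process

Real browser access:

1. The browser requests a URL.
2. The server verifies the request, and it passes.
3. The server returns the data for that URL (HTML, JSON, XML, text, or files); most pages are HTML.
4. The browser renders the data and displays it.

Simulated access by a crawler:

1. Code imitates a browser and requests a URL.
2. The server verifies the request: a script request may be treated as malicious, so the crawler has to get past anti-crawler checks.
3. The server returns the data for that URL (HTML, JSON, XML, text, or files); most pages are HTML.
4. The data goes back to the script, and code processes it.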
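The four simulated steps can be sketched with requests and re. To keep it runnable offline, the "website" here is a throwaway local server built with Python's `http.server`; `PAGE`, `DemoHandler` and `crawl` are hypothetical names, and a real site would add the anti-crawler checks of step 2.

```python
import re
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests

# Hypothetical page served by the local stand-in website
PAGE = "<html><head><title>Demo page</title></head><body>hello</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """Local stand-in for a real site: answers every GET with PAGE as UTF-8 HTML."""

    def do_GET(self):
        body = PAGE.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def crawl(url: str) -> str:
    # 1. Code imitates a browser and requests the URL
    response = requests.get(url, timeout=5)
    # 2. Stop if the server refused us (raises requests.HTTPError for 4xx/5xx)
    response.raise_for_status()
    # 3. The data returned for the URL, decoded to a str
    html = response.text
    # 4. The script processes the data: here it pulls out the page title
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    return match.group(1) if match else ""


server = ThreadingHTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    title = crawl(f"http://127.0.0.1:{server.server_address[1]}/")
    print(title)
finally:
    server.shutdown()
    server.server_close()
```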

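Developer tools

Learn to use the developer tools of Google Chrome or another browser.

Shortcut: F12 (or Fn+F12 on keyboards where F12 is a secondary function)

What they do: show the page's source structure and data; the Network panel works as a packet-capture tool and lists every URL (API endpoint) the page loads.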


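Crawler learning path

1. Basics

urllib3: the basic request library

re: regular expressions, used for parsing

2. Intermediate

requests: a request library built on top of urllib3; simpler, easier to learn, and more feature-rich

bs4 (Beautiful Soup): a parsing library that works on top of an HTML parser such as html.parser or lxml; much simpler to use than hand-written regular expressions

3. Advanced

Large websites: strong anti-crawling measures

selenium: drives a real local browser to request the data

Data volume: e.g. 300 million items a day

scrapy: a crawling framework that can be deployed as a distributed crawler, with N machines working at the same time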
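As a taste of the "basics" level, the sketch below parses a small, made-up HTML fragment (modelled on the Baidu home page used later in this article) with `re` alone. Patterns like these break as soon as the markup changes (extra attributes, different quoting, line breaks), which is the problem bs4 addresses; with bs4 the title would be `BeautifulSoup(html, 'html.parser').title.string`.

```python
import re

# A small, made-up HTML fragment standing in for a downloaded page
html = """
<html><head><title>百度一下,你就知道</title></head>
<body>
<a href="http://news.baidu.com" class="mn">新闻</a>
<a href="http://map.baidu.com" class="mn">地图</a>
</body></html>
"""

# Non-greedy (.*?) stops at the first closing tag instead of the last one
title = re.search(r"<title>(.*?)</title>", html).group(1)

# findall with two groups returns a list of (href, link text) tuples
links = re.findall(r'<a href="(.*?)"[^>]*>(.*?)</a>', html)

print(title)
for href, name in links:
    print(name, href)
```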

Documentation for these frameworks:

Quickstart - Requests 2.18.1 documentation (requests.readthedocs.io)

Beautiful Soup 4.4.0 documentation (beautifulsoup.readthedocs.io)

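requests framework, part 1: request functions

HTTP: short-lived request/response exchanges (the client sends a request, the server answers it)

1. GET requests

The data is carried in the browser's address bar (the URL), so it is visible. HTTP itself sets no fixed length limit, but browsers and servers cap URL length in practice, so GET suits only small amounts of text data.

2. POST requests

The data is carried in the request body, so it is not visible in the address bar, and there is no protocol-level size limit (servers may still impose their own).

3. The two objects in a request

Request object: performs the request and submits data

Response object: used to check whether the request succeeded and to get the data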

import requests

# 1. Check the types of the module and of the response object
response_get = requests.get(url='http://httpbin.org/get')

response_post = requests.post(url='http://httpbin.org/post')

print(type(requests),type(response_get))

Output

<class 'module'> <class 'requests.models.Response'>

requests framework, part 2: submitting data with request functions

https://mp.weixin.qq.com/wxamp/basicprofile/index?token=1509181277&lang=zh_CN

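Everything before the ? is the domain and path; everything after it is the submitted data.

Data format: name=value, with multiple pairs joined by &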
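Python's standard `urllib.parse` module can split the example URL above into its parts and decode the `name=value` pairs. This is only a sketch for inspecting URLs; requests does the reverse encoding for you when you pass `params=`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

url = "https://mp.weixin.qq.com/wxamp/basicprofile/index?token=1509181277&lang=zh_CN"

parts = urlsplit(url)
print(parts.netloc)  # mp.weixin.qq.com  (the domain)
print(parts.path)    # /wxamp/basicprofile/index
print(parts.query)   # token=1509181277&lang=zh_CN  (the submitted data)

# parse_qs turns the name=value pairs into a dict of lists
print(parse_qs(parts.query))  # {'token': ['1509181277'], 'lang': ['zh_CN']}

# urlencode goes the other way: dict -> name=value&name=value
print(urlencode({"user": "lpf"}))  # user=lpf
```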

GET request parameters: params={'user':'lpf'}

POST request parameters: data={'user':'lpf'}
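To see where each kind of parameter ends up without sending anything, requests lets you build a `Request` and call `.prepare()` on it. The sketch below reuses the httpbin URLs from this section; it only inspects the prepared requests and makes no network call.

```python
import requests

# Build (but do not send) the two requests from this section
get_req = requests.Request("GET", "http://httpbin.org/get", params={"user": "lpf"}).prepare()
post_req = requests.Request("POST", "http://httpbin.org/post", data={"user": "lpf"}).prepare()

# GET: the parameters are appended to the URL, visible in the address bar
print(get_req.url)   # http://httpbin.org/get?user=lpf
print(get_req.body)  # None

# POST: the URL is unchanged; the form data travels in the request body
print(post_req.url)   # http://httpbin.org/post
print(post_req.body)  # user=lpf
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded
```

This is why `response_post.url` in the output below shows no `?user=lpf`.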

# 2. Submitting request parameters

response_get = requests.get(url='http://httpbin.org/get',params={'user':'lpf'})

response_post = requests.post(url='http://httpbin.org/post',data={'user':'lpf'})


print('response_get',response_get.url)

print('response_post',response_post.url)

Output

<class 'module'> <class 'requests.models.Response'>
response_get http://httpbin.org/get?user=lpf
response_post http://httpbin.org/post

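requests framework, part 3: checking whether a request succeeded, and the page encoding

Checking the status code in the browser

Press F12, open the Network tab, press F5 to reload, click any entry in the Name column, and look at General > Status Code.

Success: 200

Page cannot be loaded: 404, 403, ...

Server errors: 505, 503, ...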
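The groups above follow the first digit of the code: 2xx success, 4xx client errors, 5xx server errors. A minimal sketch using the standard library's `http.HTTPStatus` for the reason phrases; `describe_status` is a hypothetical helper, and `HTTPStatus(code)` raises `ValueError` for codes it does not know.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Hypothetical helper: label a status code with the groups used above."""
    phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. "Not Found"
    if 200 <= code < 300:
        group = "success"
    elif 400 <= code < 500:
        group = "page cannot be loaded (client error)"
    elif 500 <= code < 600:
        group = "server error"
    else:
        group = "other (e.g. informational or redirect)"
    return f"{code} {phrase}: {group}"


for code in (200, 403, 404, 503, 505):
    print(describe_status(code))
```

With requests you rarely need such a helper: `response.status_code` holds the number, `response.ok` is True for codes below 400, and `response.raise_for_status()` raises `requests.HTTPError` for 4xx and 5xx responses.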

# 3. Check the status code and the page encoding
response_get = requests.get(url='http://www.baidu.com')

print('status_code-',response_get.status_code)
print('encoding-',response_get.encoding)

Output

<class 'module'> <class 'requests.models.Response'>
response_get http://httpbin.org/get?user=lpf
response_post http://httpbin.org/post
status_code- 200
encoding- ISO-8859-1
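The `ISO-8859-1` above comes from how requests picks `response.encoding`: it looks only at the `Content-Type` response header, and for a `text/*` type without a `charset` it falls back to ISO-8859-1. The sketch below calls the helper requests uses for this, `requests.utils.get_encoding_from_headers`, directly so it runs offline; that the Baidu server omitted the charset is an inference from the output, not something shown here.

```python
from requests.utils import get_encoding_from_headers

# With an explicit charset, requests uses it as response.encoding
print(get_encoding_from_headers({"content-type": "text/html; charset=utf-8"}))  # utf-8
# A text/* type without a charset falls back to ISO-8859-1
print(get_encoding_from_headers({"content-type": "text/html"}))  # ISO-8859-1

# Decoding UTF-8 bytes with the wrong codec silently produces garbled text
raw = "百度一下".encode("utf-8")
wrong = raw.decode("ISO-8859-1")
right = raw.decode("utf-8")
print(wrong == right)  # False
print(right)           # 百度一下

# With a real response, the usual fixes are:
#   response.encoding = "utf-8"                     # when you know the charset
#   response.encoding = response.apparent_encoding  # let requests guess from the bytes
```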

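requests framework, part 4: getting the response data

text: a decoded character string (str); human-readable, but it requires a decoding step, so it is slower

content: the raw bytes; not human-readable, but no decoding is needed, so it is faster, and it is what you need for images and other binary files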
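The relationship between the two can be shown without a network call. The sketch below uses a made-up byte string as a stand-in for `response.content`; with a real response, `response.text` is essentially `response.content` decoded with `response.encoding`. The file name `demo.png` and the temporary directory are illustrative.

```python
import os
import tempfile

# Stand-in for response.content: the raw bytes exactly as sent by the server
content = "<title>百度一下,你就知道</title>".encode("utf-8")

# Stand-in for response.text: the same bytes decoded with the page's encoding
text = content.decode("utf-8")

print(type(content).__name__, len(content))  # bytes: one item per byte
print(type(text).__name__, len(text))        # str: one item per character

# Binary files (images, PDFs, ...) must stay as bytes: write them in "wb" mode.
# With a real response this would be f.write(response.content).
png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature every PNG file starts with
path = os.path.join(tempfile.mkdtemp(), "demo.png")
with open(path, "wb") as f:
    f.write(png_signature)
```

Each Chinese character takes three bytes in UTF-8, which is why the bytes object is longer than the string.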

# 4. Page source: text is a decoded string, content is raw bytes (binary)
print('text-',response_get.text)
print('content-',response_get.content)

Output

<class 'module'> <class 'requests.models.Response'>
response_get http://httpbin.org/get?user=lpf
response_post http://httpbin.org/post
status_code- 200
encoding- None
text- <!DOCTYPE html><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="never" name="referrer"><title>百度一下,你就知道</title>...(remaining HTML, CSS and JavaScript omitted)...</body></html>
content- b'<!DOCTYPE html><html><head><meta http-equiv="content-type" content="text/html;charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=Edge"><meta content="never" name="referrer"><title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>...(remaining bytes omitted)...</body></html>'

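requests framework, part 5: custom headers

A browser silently sends extra data with each request (the request headers) that a bare script does not; the server checks this data to decide whether the visitor is a browser or a script.

1. Test URL: the Qianlima bidding website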

http://www.qianlima.com/zb/area_305


2. Visiting with a browser


3. Visiting with a script

response_get = requests.get(url='http://www.qianlima.com/zb/area_305')

print('status_code-',response_get.status_code)

status_code- 403

4. Analyze the browser's request headers: User-Agent


User-Agent:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36

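Submitting a custom header

User-Agent: the user agent string

Mozilla/5.0: a historical compatibility token sent by almost every browser (it does not mean Firefox); (Macintosh; Intel Mac OS X 10_15_4): operating system and version; AppleWebKit: rendering engine; Chrome: Google Chrome and its version; Safari: Apple Safari compatibility token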
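Why the plain script gets a 403: unless told otherwise, requests announces itself with a `User-Agent` of the form `python-requests/<version>`, which is easy for a server to spot. The sketch below shows that default and, as an alternative to passing `headers=head` on every call, a `requests.Session` whose headers are sent with every request it makes; `prepare_request` reveals the merged headers without contacting the Qianlima site.

```python
import requests

head = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36"
}
url = "http://www.qianlima.com/zb/area_305"

# Default headers of a fresh session (requests.get uses the same defaults)
default_ua = requests.Session().headers["User-Agent"]
print(default_ua)  # python-requests/<installed version>

# A session with the browser User-Agent: every request it sends carries it
session = requests.Session()
session.headers.update(head)
prepared = session.prepare_request(requests.Request("GET", url))
print(prepared.headers["User-Agent"] == head["User-Agent"])  # True
```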

head = {
    'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
}

response_get = requests.get(url='http://www.qianlima.com/zb/area_305')

print('status_code-',response_get.status_code)

response_get = requests.get(url='http://www.qianlima.com/zb/area_305',headers=head)

print('status_code-',response_get.status_code)

Output

status_code- 403
status_code- 200

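requests framework, part 6: proxy IPs

If your IP visits a site too frequently, the site may flag the IP as abnormal and block it; further visits then fail (for example with a 404).

VPN / proxy

Your computer - a server in Hong Kong or Macau - that server visits the overseas site on your behalf.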
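requests matches the keys of the `proxies` dict against the URL being requested: a key may be a scheme (`'http'`, `'https'`), `'scheme://hostname'`, or `'all'`. A key such as `'http://'` matches nothing, so the proxy is silently skipped and the request goes out directly from your own IP. The sketch makes the rule visible with `requests.utils.select_proxy`, a helper in requests' utils module that is not part of its documented API; the proxy addresses are the example ones from this article and are likely no longer online.

```python
from requests.utils import select_proxy

url = "http://www.qianlima.com/zb/area_305"

# "http://" is neither a scheme nor "scheme://host", so no proxy is selected
wrong = {"http://": "51.79.160.101:8080", "https://": "165.225.226.119:10605"}
print(select_proxy(url, wrong))  # None -> the request is sent directly

# Keys are URL schemes; values are proxy URLs
right = {"http": "http://51.79.160.101:8080", "https": "http://165.225.226.119:10605"}
print(select_proxy(url, right))  # http://51.79.160.101:8080

# A key may also target a single host: "scheme://hostname"
per_host = {"http://www.qianlima.com": "http://51.79.160.101:8080"}
print(select_proxy(url, per_host))  # http://51.79.160.101:8080
```

In real code you pass the correctly keyed dict as `requests.get(url, proxies=right)`.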

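# Keys are URL schemes and values are proxy URLs; these example proxies may no longer be online
proxies = {
    'http':'http://51.79.160.101:8080',
    'https':'http://165.225.226.119:10605',
}

response_get = requests.get(url='http://www.qianlima.com/zb/area_305',proxies=proxies,headers=head)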

print('status_code-',response_get.status_code)

Output

status_code- 403
status_code- 200
status_code- 200

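requests framework, part 7: handling connection timeouts

By default requests sets no timeout, so a request to a server that never answers can hang for a long time.

timeout sets a limit in seconds; once it is exceeded, requests raises an exception immediately instead of waiting any longer.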
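The timeout behaviour can be tried offline against a deliberately slow local server. In the sketch below (`SlowHandler` and `fetch` are hypothetical stand-ins), a 0.2 s timeout gives up on a server that needs 1 s to answer, while a `(connect, read)` tuple, which requests also accepts for `timeout`, waits long enough. Catching `requests.exceptions.Timeout` covers both connect and read timeouts; `RequestException` catches the remaining requests errors.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Local stand-in for a server that takes one second to answer."""

    def do_GET(self):
        time.sleep(1)
        try:
            self.send_response(200)
            self.send_header("Content-Length", "0")
            self.end_headers()
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client already gave up

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch(url, timeout):
    try:
        response = requests.get(url, timeout=timeout)
        return f"status_code- {response.status_code}"
    except requests.exceptions.Timeout:
        return "timed out"  # ConnectTimeout or ReadTimeout
    except requests.exceptions.RequestException as exc:
        return f"request failed: {type(exc).__name__}"


server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

first = fetch(url, timeout=0.2)      # gives up after 0.2 s
second = fetch(url, timeout=(3, 5))  # 3 s to connect, 5 s to wait for data
print(first)
print(second)

server.shutdown()
server.server_close()
```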

Exception handling

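try:
    # Keys are URL schemes and values are proxy URLs; these example proxies may no longer be online
    proxies = {
        'http':'http://51.79.160.101:10605',
        'https':'http://165.225.226.119:10605',
    }

    response_get = requests.get(url='https://www.leirucha.com',timeout=0.1,proxies=proxies,headers=head)

    print('status_code-',response_get.status_code)

except requests.exceptions.Timeout:
    print('Connection timed out')
except requests.exceptions.RequestException as exc:
    # Any other requests failure, e.g. the proxy refusing the connection
    print('Request failed:', exc)

Output

Connection timed out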

Complete code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 29 15:44:35 2020

@author: lpf

@content: requests framework tutorial (requests框架教程.py)

"""


import requests


# 1. Check the types of the module and of the response object
response_get = requests.get(url='http://httpbin.org/get')

response_post = requests.post(url='http://httpbin.org/post')

print(type(requests),type(response_get))


# 2. Submitting request parameters

response_get = requests.get(url='http://httpbin.org/get',params={'user':'lpf'})

response_post = requests.post(url='http://httpbin.org/post',data={'user':'lpf'})


print('response_get',response_get.url)

print('response_post',response_post.url)


# 3. Check the status code and the page encoding
response_get = requests.get(url='http://www.baidu.com')

print('status_code-',response_get.status_code)
print('encoding-',response_get.encoding)


# 4. Page source: text is a decoded string, content is raw bytes (binary)
print('text-',response_get.text)
print('content-',response_get.content)


# 5. Anti-crawler check: send a browser User-Agent header
head = {
    'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36'
}

response_get = requests.get(url='http://www.qianlima.com/zb/area_305')

print('status_code-',response_get.status_code)

response_get = requests.get(url='http://www.qianlima.com/zb/area_305',headers=head)

print('status_code-',response_get.status_code)


# 6. Proxy IPs: keys are URL schemes, values are proxy URLs (ip:port)
# These example proxies may no longer be online; replace them with working ones
proxies = {
    'http':'http://51.79.160.101:8080',
    'https':'http://165.225.226.119:10605',
}

response_get = requests.get(url='http://www.qianlima.com/zb/area_305',proxies=proxies,headers=head)

print('status_code-',response_get.status_code)


# 7. Connection timeout
# requests sets no timeout by default, so a server that never answers can block the script
# timeout=<seconds>: once exceeded, requests raises an exception instead of waiting
# Handle the exception so the script does not crash
try:
    proxies = {
        'http':'http://51.79.160.101:10605',
        'https':'http://165.225.226.119:10605',
    }

    response_get = requests.get(url='https://www.leirucha.com',timeout=0.1,proxies=proxies,headers=head)

    print('status_code-',response_get.status_code)

except requests.exceptions.Timeout:
    print('Connection timed out')
except requests.exceptions.RequestException as exc:
    print('Request failed:', exc)



