The get() method of the Requests library

Latest recommended article published 2023-03-23 20:40:12

weixin_39978101


Article link: https://blog.csdn.net/weixin_39978101/article/details/111614009


The most commonly used method of the Requests library is get().

The simplest code for fetching a web page is:

r = requests.get(url)  # url: the address of the target page

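requests.get(url) constructs a Request object that asks the server for a resource; the Requests library creates this object internally. Note that Python is case-sensitive: the Request class is written with a capital R, while the module and function names (requests.get) are lowercase.

What comes back is a Response object, which contains everything the server returned, including the requested resource.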

The full signature of requests.get() is:

requests.get(url,params=None,**kwargs)

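url: the URL of the page to fetch.

params: optional extra parameters added to the URL's query string, given as a dict, a list of tuples or bytes.

**kwargs: 12 optional arguments that control the request: data, json, headers, cookies, files, auth, timeout, allow_redirects, proxies, verify, stream and cert.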
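To see what params does without touching the network, the sketch below builds the request that requests.get() would send and prints the final URL. The field names wd and ie are taken from the Baidu search form that appears in the page source later in this post; the search URL itself is an illustrative assumption. Requests percent-encodes the Chinese value as UTF-8 and joins the pairs into the query string.

```python
import requests

# Query parameters as a dict; "wd" (search words) and "ie" (input encoding)
# mirror fields of the Baidu search form shown later in this post.
params = {"wd": "百度", "ie": "utf-8"}

# Build and prepare the request without sending it
prepared = requests.Request("GET", "https://www.baidu.com/s", params=params).prepare()

print(prepared.url)  # https://www.baidu.com/s?wd=%E7%99%BE%E5%BA%A6&ie=utf-8
```

Calling requests.get("https://www.baidu.com/s", params=params) would request the same URL, and control arguments such as timeout are simply passed alongside, e.g. requests.get(url, params=params, timeout=10).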

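The Requests library has two important objects, Request and Response. A Request represents the request sent to the target URL asking for a resource; a Response contains everything returned to the crawler.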
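Conceptually, requests.get() creates a Request, turns it into a PreparedRequest (the final method, URL, headers and body) and sends it through a Session, which returns a Response. The sketch below stops just before the network call so it can run offline; the User-Agent value is a made-up example.

```python
import requests

# 1. A Request describes what we want to ask for
req = requests.Request("GET", "https://www.baidu.com", headers={"User-Agent": "my-crawler/0.1"})

# 2. prepare() normalises it into what would actually be sent
prepared = req.prepare()
print(prepared.method)                 # GET
print(prepared.url)                    # https://www.baidu.com/
print(prepared.headers["User-Agent"])  # my-crawler/0.1

# 3. Sending it (network required) would return a Response object:
# with requests.Session() as session:
#     r = session.send(prepared, timeout=10)
#     print(type(r))  # <class 'requests.models.Response'>
```

requests.get() itself goes through Session.request, which also merges the session's default headers (such as Requests' own User-Agent) before sending, so the headers of a real call will contain more entries than this hand-built one.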

Example:

import requests
# Fetch the page with get()
r = requests.get('https://www.baidu.com')
# Check the connection status
print(r.status_code)
# Check the type of r
print(type(r))
# Print the response headers
print(r.headers)

Output:

200
<class 'requests.models.Response'>
{'Content-Encoding': 'gzip', 'Content-Length': '1145', 'Content-Type': 'text/html', 'Server': 'bfe', 'Date': 'Tue, 24 Mar 2020 07:31:58 GMT'}

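The Response object has the following main attributes:

1. r.status_code: the HTTP status code of the response; 200 means the request succeeded, while codes such as 404 (Not Found) mean it did not

2. r.text: the response body as a string, i.e. the content of the page at the URL

3. r.encoding: the response encoding guessed from the HTTP headers

4. r.apparent_encoding: the response encoding inferred by analysing the content itself (a fallback)

5. r.content: the response body as raw bytes

These are the attributes you will use most often when fetching a page.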
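The difference between r.content and r.text is bytes versus str: r.text is r.content decoded with r.encoding. A minimal standard-library sketch of that relationship, using a hand-written UTF-8 byte string as a stand-in for a real response body:

```python
# Stand-in for r.content: the raw bytes of a UTF-8 page fragment (hypothetical)
content = "百度一下".encode("utf-8")
encoding = "utf-8"  # stand-in for r.encoding

# Roughly how Requests derives r.text from r.content and r.encoding
text = str(content, encoding, errors="replace")

print(type(content).__name__, len(content))  # bytes 12
print(type(text).__name__, len(text))        # str 4
```

Each of these Chinese characters takes three bytes in UTF-8, which is why the byte length is three times the character count.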

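Here are the main steps for fetching and scraping a web page.

Step 1: import the Requests library.

Step 2: fetch a URL with get().

Step 3: check the connection status.

Step 4: if the status code is 200, use the Response attributes (e.g. r.text, r.encoding, r.content) to get the page content; if it is 404 or another error code, the request failed for some reason and should be handled as an error.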
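For step 4, instead of comparing r.status_code with 200 by hand, Requests offers raise_for_status(), which raises requests.HTTPError for 4xx and 5xx codes and does nothing otherwise. The sketch builds Response objects by hand so it runs offline; in real code they would come from requests.get(). The helper name is_ok and the URL are hypothetical.

```python
import requests

def is_ok(response: requests.Response) -> bool:
    """Return True unless the response carries a 4xx/5xx status code."""
    try:
        response.raise_for_status()  # raises requests.HTTPError for 400-599
    except requests.HTTPError as err:
        print("request failed:", err)
        return False
    return True

# Offline demo: hand-built Response objects stand in for requests.get() results
ok = requests.Response()
ok.status_code = 200

missing = requests.Response()
missing.status_code = 404
missing.url = "https://www.baidu.com/no-such-page"  # hypothetical URL

print(is_ok(ok))       # True
print(is_ok(missing))  # prints the error message, then False
```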

Example:

# 1. Import requests
import requests
# 2. Fetch the page with get()
r = requests.get('https://www.baidu.com')
# 3. Check the connection status
print(r.status_code)
# 4. View the content
print(r.text)

Output:

200
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title>ç¾åº¦ä¸ä¸ï¼ä½ å°±ç¥é</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus=autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=ç¾åº¦ä¸ä¸ class="bg s_btn" autofocus></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>æ°é»</a> <a href=https://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>å°å¾</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>è§é¢</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>è´´å§</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>ç»å½</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">ç»å½</a>');
                </script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">æ´å¤äº§å</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>å³äºç¾åº¦</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>ä½¿ç¨ç¾åº¦åå¿è¯»</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>æè§åé¦</a>&nbsp;äº¬ICPè¯030173å·&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>

We can see that the whole page has been downloaded, but there is a problem: the Chinese text is garbled.

Let's check the encoding:

# Check the encoding
print(r.encoding)

Output:

ISO-8859-1

Then check it with the other attribute, apparent_encoding:

print(r.apparent_encoding)

Output:

utf-8

Analysing the content shows that the page is actually encoded in UTF-8, so we can use that to replace the original encoding:

# Replace the encoding
r.encoding = 'utf-8'
# Print the content
print(r.text)

Output:

<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/bdorz/baidu.min.css><title>百度一下，你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus=autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn" autofocus></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=https://www.hao123.com name=tj_trhao123 class=mnav>hao123</a> <a href=http://map.baidu.com name=tj_trmap class=mnav>地图</a> <a href=http://v.baidu.com name=tj_trvideo class=mnav>视频</a> <a href=http://tieba.baidu.com name=tj_trtieba class=mnav>贴吧</a> <noscript> <a href=http://www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');
                </script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使用百度前必读</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a>&nbsp;京ICP证030173号&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>

Now the content displays correctly.

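Why does this happen? A page travels over the network as bytes in some character encoding (Baidu uses UTF-8), and the text only displays correctly when those bytes are decoded with the same encoding.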
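Concretely, each Chinese character becomes three bytes in UTF-8, while ISO-8859-1 maps every single byte to one character. Decoding UTF-8 bytes as ISO-8859-1 therefore never raises an error; it silently produces garbled text like the ç¾åº¦ in the first output. A standard-library-only sketch:

```python
original = "百度"                  # Chinese text on the page
raw = original.encode("utf-8")     # bytes the server actually sent (like r.content)
garbled = raw.decode("ISO-8859-1") # what r.text gives when r.encoding is ISO-8859-1
print(repr(garbled))               # 'ç\x99¾åº¦'

# Undo the damage: turn the characters back into the same bytes, then decode correctly
fixed = garbled.encode("ISO-8859-1").decode("utf-8")
print(fixed)                       # 百度
```

The \x99 byte becomes an invisible control character, which is why it does not show up in the garbled page output above.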

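Requests sets r.encoding from the Content-Type header: if a text response (such as text/html) has no charset in that header, it assumes ISO-8859-1. Baidu's response above has 'Content-Type': 'text/html' with no charset, which is why we got ISO-8859-1, and that encoding cannot represent Chinese characters.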
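Requests fills in r.encoding from the response headers using requests.utils.get_encoding_from_headers(). Calling it directly on header dictionaries shows the fallback; the first dictionary copies the Content-Type from the Baidu output earlier in this post, the second is a hypothetical header with an explicit charset.

```python
from requests.structures import CaseInsensitiveDict
from requests.utils import get_encoding_from_headers

# Same Content-Type as Baidu's response above: text type, no charset
baidu_headers = CaseInsensitiveDict({"Content-Type": "text/html"})
# Hypothetical header that declares its charset explicitly
with_charset = CaseInsensitiveDict({"Content-Type": "text/html; charset=utf-8"})

print(get_encoding_from_headers(baidu_headers))        # ISO-8859-1
print(get_encoding_from_headers(with_charset))         # utf-8
print(get_encoding_from_headers(CaseInsensitiveDict()))  # None (no Content-Type at all)
```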

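That is where r.apparent_encoding comes in. It is an attribute rather than a method: it analyses the page content itself and guesses the most likely encoding. So when a page does not display correctly, you can use it to find the encoding to set.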
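Putting the steps together, the function below is one way to wrap fetching, status checking and the encoding fix. It is a sketch, not the only approach: get_html_text is a hypothetical name, the 10-second timeout is an arbitrary default, and it only replaces r.encoding when the header-based value is missing or the ISO-8859-1 fallback, so pages that declare their charset are left alone.

```python
from typing import Optional

import requests  # step 1: import the library

def get_html_text(url: str, timeout: float = 10.0) -> Optional[str]:
    """Fetch url and return the page as text, or None if the request fails."""
    try:
        r = requests.get(url, timeout=timeout)  # step 2: send the GET request
        r.raise_for_status()                    # step 3: HTTPError for 4xx/5xx codes
        # Only override the header-based encoding when it is missing or the fallback
        if r.encoding is None or r.encoding.upper() == "ISO-8859-1":
            r.encoding = r.apparent_encoding    # guess the encoding from the content
        return r.text                           # step 4: the decoded page content
    except requests.RequestException as err:    # bad URLs, connection errors, timeouts, HTTP errors
        print(f"request failed: {err}")
        return None
```

For example, get_html_text('https://www.baidu.com') should return the page with 百度一下 readable, or None if the request fails. Keep in mind that apparent_encoding is a statistical guess over the whole body, so it can be slow on large pages and is occasionally wrong.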

(Notes)
