requests, beautifulsoup

Latest recommended article published on 2020-07-28 23:46:47

liu_liuqiu

Category: python

Original link: http://www.cnblogs.com/wupeiqi/articles/6283017.html


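requests module

User-Agent: identifies the type of client making the request
Referer: the URL of the page visited immediately before this request
requests.Session() keeps client state, such as cookies, across successive requests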
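A minimal sketch of how a session carries state, assuming a placeholder host (`example.com`), a made-up client name, and a made-up cookie; none of these come from the original notes. Nothing is sent over the network: the request is only prepared, which is enough to see what the session would attach.

```python
import requests

# A session stores default headers and cookies and reuses them for every request.
session = requests.Session()
session.headers.update({'User-Agent': 'example-client/0.1'})  # hypothetical client name
session.cookies.set('sessionid', 'abc123')                    # hypothetical cookie

# Prepare (but do not send) a request to a placeholder URL.
request = requests.Request('GET', 'http://example.com/profile')
prepared = session.prepare_request(request)

print(prepared.headers['User-Agent'])
print(prepared.headers['Cookie'])
```

In real use the cookies usually arrive from the server (for example after a login POST made with `session.post(...)`) and are then sent back automatically by later `session.get(...)` calls.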

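How the methods relate: requests.get(), requests.post() and the other helper functions all call requests.request() internally.
Parameters of requests.request()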

method: the HTTP method
url: the target URL
params: parameters passed in the URL query string, typically with GET
requests.request(
    method='GET',
    url='http://…',
    params={'k1': 'v1', 'k2': 'v2'}
)
# http://…?k1=v1&k2=v2
data: data sent in the request body (form-encoded when given a dictionary)
requests.request(
    method='POST',
    url='http://…',
    params={'k1': 'v1', 'k2': 'v2'},
    data={'user': 'aaa', 'pwd': 123}
)
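json: data sent in the request body, serialized as JSON
requests.request(
    method='POST',
    url='http://…',
    params={'k1': 'v1', 'k2': 'v2'},
    json={'user': 'aaa', 'pwd': 123}
)
PS: a dictionary that contains nested dictionaries must be sent with json, because the form encoding used by data cannot represent nested structures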
cookies: cookies to send with the request
headers: request headers
requests.request(
    method='POST',
    url='http://…',
    params={'k1': 'v1', 'k2': 'v2'},
    json={'user': 'aaa', 'pwd': 123},
    headers={
        'Referer': 'http://',  # the previously visited page; can be forged
        'User-Agent': '(Windows NT)…'  # client type
    }
)
files: upload files, optionally with a custom file name
files = {
    'f1': ('aa.txt', open('…', 'rb'))  # 'aa.txt' is the file name sent to the server
}
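auth: HTTP basic authentication (the username and password are Base64-encoded, not encrypted, and sent in the Authorization header)
timeout: how long to wait for the connection and for the response, in seconds; a (connect, read) tuple sets the two limits separately
allow_redirects: whether to follow redirects, True/False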
proxies: proxy servers
requests.request(
    method='POST',
    url='http://…',
    data='aaaa',
    proxies={'http': 'http://…'}  # the data is first posted to the proxy, which forwards it to url
)
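cert: client certificate files (.pem, .key)
verify: when False, SSL certificate verification is skipped
stream: if False, the response content is downloaded immediately; if True, the body is fetched piece by piece as it is read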
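To see where params, data, json and auth actually end up, the sketch below builds requests without sending them, using `requests.Request(...).prepare()`. The URL `http://example.com/login` and the nested `profile` field are placeholders invented for this illustration; they are not part of the original notes.

```python
import base64
import json

import requests

# Hypothetical URL used only for illustration; nothing is sent over the network.
url = 'http://example.com/login'

# params go into the query string, data is form-encoded into the body.
form = requests.Request(
    'POST', url,
    params={'k1': 'v1', 'k2': 'v2'},
    data={'user': 'aaa', 'pwd': 123},
).prepare()
print(form.url)
print(form.body)
print(form.headers['Content-Type'])

# Form encoding cannot represent a nested dictionary: the inner value is lost.
nested_form = requests.Request('POST', url, data={'profile': {'age': 18}}).prepare()
print(nested_form.body)

# json serializes the whole structure, nesting included.
as_json = requests.Request(
    'POST', url,
    json={'user': 'aaa', 'profile': {'age': 18}},
).prepare()
print(json.loads(as_json.body))
print(as_json.headers['Content-Type'])

# Basic auth is only Base64-encoded, so the credentials can be decoded again.
with_auth = requests.Request('GET', url, auth=('aaa', '123')).prepare()
scheme, token = with_auth.headers['Authorization'].split(' ', 1)
print(scheme, base64.b64decode(token).decode())
```

Inspecting a prepared request like this is a convenient way to check what a call would send before pointing it at a real server.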

Install: pip install requests

import requests

response = requests.get(
    url='https://www.autohome.com.cn/news/'
)
# Decode the body with the encoding detected from its content
response.encoding = response.apparent_encoding

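Summary:
response = requests.get('URL')
response.text  # body decoded to text
response.content  # body as raw bytes
response.encoding  # encoding used to decode response.text
response.apparent_encoding  # encoding detected from the content
response.status_code  # HTTP status code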
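The attributes above can be tried without touching a real site. The sketch below is only an illustration: it starts a throwaway local HTTP server (the `PageHandler` class and the `PAGE` content are made up for this example) that omits the charset from its Content-Type header, which is the situation where assigning `apparent_encoding` helps. Note that `apparent_encoding` is a guess made by a detection library, so it can be wrong on short pages.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE = '<html><body><p>你好, requests</p></body></html>'  # made-up test page


class PageHandler(BaseHTTPRequestHandler):
    """Serves PAGE as UTF-8 bytes but does not declare a charset."""

    def do_GET(self):
        body = PAGE.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/html')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example output quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(('127.0.0.1', 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

session = requests.Session()
session.trust_env = False  # ignore proxy settings from the environment
response = session.get(f'http://127.0.0.1:{server.server_port}/', timeout=5)

print(response.status_code)        # HTTP status code
print(response.content[:20])       # raw bytes
header_encoding = response.encoding
print(header_encoding)             # fallback chosen because no charset was declared

# Switch to the encoding detected from the bytes before reading .text.
response.encoding = response.apparent_encoding
print(response.encoding)
print(response.text)

server.shutdown()
server.server_close()
```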

beautifulsoup module

pip install beautifulsoup4

from bs4 import BeautifulSoup
soup = BeautifulSoup(response.text, features='html.parser')
target = soup.find(id='auto-channel-lazyload-article')  # a single Tag object
li_list = target.find_all('li')  # a list of Tag objects
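A small sketch of the same find-then-find_all pattern on an inline HTML string, so it runs without a network request. The markup below is invented for illustration and only assumes a container with the id used above holding `li` items; the real page structure may differ.

```python
from bs4 import BeautifulSoup

# Made-up markup that imitates a news list; not copied from the real site.
html = """
<div id="auto-channel-lazyload-article">
  <ul>
    <li><a href="/news/1.html"><h3>First headline</h3></a></li>
    <li><a href="/news/2.html"><h3>Second headline</h3></a></li>
    <li class="ad"></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, features='html.parser')
target = soup.find(id='auto-channel-lazyload-article')

articles = []
if target is not None:  # find() returns None when nothing matches
    for li in target.find_all('li'):
        link = li.find('a')
        if link is None:  # skip items without a link, such as ad slots
            continue
        title = link.find('h3')
        articles.append((title.text if title else '', link.get('href')))

print(articles)
```

Checking for `None` before calling `find_all` or reading attributes avoids an `AttributeError` when the page layout changes.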

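Summary:
find_all() returns a list of all matches, while find() returns only the first match, or None if nothing matches
For the arguments accepted by find_all, see: https://blog.csdn.net/depers15/article/details/51934210
soup = BeautifulSoup('…', features='html.parser')
v1 = soup.find('div')  # returns a single object
soup.find(id='li')
soup.find('div', id='li')
v2 = soup.find_all('div')  # returns a list of objects

obj1 = v1
obj2 = v2[0]  # index into the list to get an object

obj1.attrs  # dictionary of the tag's attributes; read a value with .get() or [] indexing
obj1.text  # the text inside the tag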
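The calls in this summary can be checked on a tiny document. This is a sketch on made-up HTML (the ids, classes and link are invented for the example), showing what find(), find_all(), attrs and text return.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = '<div id="li" class="box main"><a href="/a">first</a></div><div>second</div>'
soup = BeautifulSoup(html, features='html.parser')

v1 = soup.find('div')            # first matching tag
v2 = soup.find_all('div')        # list of every matching tag
missing = soup.find('table')     # None, because there is no <table>

print(v1.attrs)                  # attribute dictionary of the first div
print(v1.attrs.get('id'))        # .get() returns None for a missing key
print(v1['id'])                  # [] raises KeyError for a missing key
print(v1.text)                   # text inside the tag
print(len(v2), v2[1].text)

# Filters can be combined, and find_all accepts limit to cap the result size.
same_div = soup.find('div', id='li')
first_only = soup.find_all('div', limit=1)
print(same_div == v1, len(first_only))
```

Note that `class` is a multi-valued attribute, so Beautiful Soup returns it as a list of class names rather than a single string.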
