Python Web Scraping with requests and bs4

天府新青年

Category: 语法练习豆瓣. Tags: python, web scraping

Article link: https://blog.csdn.net/qq_43517596/article/details/119618873

1. Using requests

1.1 Sending requests

requests: a third-party Python library for making network requests over the HTTP protocol

requests.get(url, params=..., headers=..., proxies=...) - sends a GET request
requests.post(url, params=..., headers=..., proxies=...) - sends a POST request

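Parameters:
url - the address to request (a website URL, an API endpoint, an image URL, and so on)
headers - the request headers (used when setting a cookie or a User-Agent)
params - the query parameters
proxies - the proxies to route the request through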

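Send a GET request with the parameters appended directly to the URL

requests.get('http://api.tianapi.com/auto/index?key=c9d408fefd8ed4081a9079d0d6165d43&num=10')

Send a POST request with the parameters passed through params (requests still encodes params into the query string; to send them in the request body, pass them through data instead)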

params = {
    'key': 'c9d408fefd8ed4081a9079d0d6165d43',
    'num': 10
}
requests.post('http://api.tianapi.com/auto/index', params=params)
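To see where the parameters actually end up, a request can be prepared without being sent. The sketch below is only an illustration: the endpoint http://api.example.com/auto/index and the key YOUR_API_KEY are placeholders, not the real API used in this article.

```python
import requests

# Placeholder endpoint and key, used only for illustration.
URL = "http://api.example.com/auto/index"
QUERY = {"key": "YOUR_API_KEY", "num": 10}

# Prepare (but do not send) a GET request to inspect the final URL.
get_req = requests.Request("GET", URL, params=QUERY).prepare()
print(get_req.url)

# params= on a POST still goes into the query string; the body stays empty.
post_query = requests.Request("POST", URL, params=QUERY).prepare()
print(post_query.url, post_query.body)

# data= on a POST is form-encoded into the request body instead.
post_form = requests.Request("POST", URL, data=QUERY).prepare()
print(post_form.url, post_form.body)
```

Which of the two POST forms an API expects depends on the API, so it is worth checking its documentation before choosing between params and data.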

1.2 Reading the response

response = requests.get('http://www.yingjiesheng.com/')

Set the encoding (only needed when the text comes out garbled)

response.encoding = 'GBK'
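Garbled text usually means the body was decoded with the wrong codec; for example, requests falls back to ISO-8859-1 for text responses whose headers declare no charset. The sketch below reproduces the effect offline. It builds a Response by hand and sets the private _content attribute, which is a testing trick and an assumption of this example; real code receives the object from requests.get.

```python
import requests

# Build a Response by hand so the example runs without network access.
# Setting the private _content attribute is a testing trick only.
response = requests.Response()
response.status_code = 200
response._content = "应届生求职网".encode("gbk")  # sample GBK-encoded body

# Wrong encoding: the bytes are decoded with the wrong codec and come out garbled.
response.encoding = "ISO-8859-1"
garbled = response.text

# Correct encoding: the same bytes now decode to readable text.
response.encoding = "GBK"
fixed = response.text

print(repr(garbled))
print(fixed)
```

When the right encoding is not known in advance, response.apparent_encoding offers a guess based on the body, although a guess on short pages may be unreliable.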

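Get the response headers

print(response.headers)

Get the response body

a. Get the text value (used when requesting a web page; returns the page source directly)

print(response.text)

b. Get the parsed JSON result (used for data APIs that return JSON)

print(response.json())

c. Get the content value (the raw binary data, used for downloading images, video, and audio)

print(response.content)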
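The three accessors are different views of the same body. The following sketch shows them side by side on a hand-built Response; the JSON payload is made up, and setting the private _content attribute is a testing trick used here only to avoid a network call.

```python
import json
import tempfile
from pathlib import Path

import requests

# Offline stand-in for a real response. The payload is made up, and
# assigning _content directly is a testing trick, not normal usage.
response = requests.Response()
response.status_code = 200
response.encoding = "utf-8"
response._content = json.dumps({"data": [{"Title": "hello"}]}).encode("utf-8")

raw = response.content    # bytes, exactly as received
page = response.text      # str, decoded using response.encoding
parsed = response.json()  # Python objects parsed from the JSON body

# Binary downloads (images, audio, video) are written to disk as-is.
target = Path(tempfile.gettempdir()) / "requests_demo.bin"
target.write_bytes(raw)

print(type(raw).__name__, type(page).__name__, parsed["data"][0]["Title"])
```

For a real image download, the same write_bytes call would receive the content of the response returned by requests.get for the image URL.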

2. Adding request headers

2.1 Adding only a User-Agent

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
response = requests.get('https://www.51job.com/', headers=headers)
response.encoding = 'gbk'
print(response.text)
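A custom User-Agent matters because requests otherwise identifies itself as python-requests, which some sites reject. As a minimal sketch, assuming several pages will be fetched with the same header, a Session can carry the header for every request. The target https://www.example.com/ is a placeholder, and the request is only prepared, never sent.

```python
import requests

# Without a custom header, requests announces itself as python-requests/<version>.
print(requests.utils.default_user_agent())

BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36"
)

# A Session applies its headers to every request made through it.
session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})

# Prepare a request (no network traffic) to inspect the headers that would be sent.
prepared = session.prepare_request(requests.Request("GET", "https://www.example.com/"))
print(prepared.headers["User-Agent"])
```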

2.2 Adding both a User-Agent and a cookie

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36',
    'cookie': 'your own cookie string'
}

response = requests.get('https://www.zhihu.com/', headers=headers)
print(response.text)
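Besides placing the raw cookie string in the headers, requests also accepts cookies as a dict through the cookies argument. The sketch below is one possible way to convert a copied cookie string; parse_cookie_string is a hypothetical helper written for this example, and the cookie values and the URL are made up.

```python
import requests


def parse_cookie_string(raw: str) -> dict:
    """Split a 'name=value; name2=value2' string into a dict (hypothetical helper)."""
    cookies = {}
    for pair in raw.split(";"):
        if "=" in pair:
            name, value = pair.strip().split("=", 1)
            cookies[name] = value
    return cookies


# Made-up cookie string in the format shown by browser developer tools.
raw_cookie = "sessionid=abc123; theme=dark"
cookies = parse_cookie_string(raw_cookie)
print(cookies)

# Prepare (do not send) a request to confirm that a Cookie header is generated.
prepared = requests.Request(
    "GET", "https://www.example.com/", cookies={"sessionid": "abc123"}
).prepare()
print(prepared.headers["Cookie"])
```

Either approach sends a Cookie header; the dict form is easier to edit when only one value changes between runs.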

3. Parsing JSON

import requests

response=requests.get('https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc&_signature=_02B4Z6wo00f01k.O6AwAAIDDZESzymP6Zp5P6uyAAPLvqanBbyCJEJJP8E.2Pol60fAraR8Zuvkny9gsdVRqamqSqAjbC0WRO65XKkqkN3dsrKIyrPCcZVn40kghH6SLPb-hGdDuVDqJI1b6c8')

all_news=response.json()['data']
for news in all_news:
    print(news['Title'])
    print(news[
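The loop above indexes each item directly, which raises a KeyError if a field is missing. A slightly more defensive sketch follows, run on a made-up payload with the same "data" and "Title" keys; the "Url" field is a hypothetical extra and is not taken from the real API.

```python
import json

# Made-up payload in the shape the loop above expects: a "data" list whose
# items carry a "Title" key. "Url" is a hypothetical extra field.
sample = json.loads("""
{"data": [
    {"Title": "First headline", "Url": "https://example.com/1"},
    {"Title": "Second headline"}
]}
""")

titles = []
for news in sample.get("data", []):
    # .get avoids a KeyError when a field is missing from one item.
    titles.append(news.get("Title", ""))
    print(news.get("Title"), news.get("Url", "(no link)"))

print(titles)
```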
