Web Scraping, Day 1

小爷爱呆桃啵啵奶茶

Article link: https://blog.csdn.net/m0_65216324/article/details/129802813

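This article introduces the basic usage of Python's requests library: advice on creating virtual environments, sending requests with requests.get, setting headers such as user-agent and cookie, handling encoding problems, retrieving JSON data, and downloading image and audio files. It stresses the importance of using a separate virtual environment for each project.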

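Advice on creating virtual environments:
While learning: one virtual environment per kind of project (one for scraping, one for data analysis, and so on).
At work or on real projects: one virtual environment per project.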
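
One way to set up such an environment is Python's built-in venv module. The following is a minimal sketch, assuming a hypothetical environment name `spider_env` and a temporary parent folder chosen only for illustration:

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent_dir, name):
    """Create a virtual environment named `name` inside `parent_dir`."""
    env_dir = Path(parent_dir) / name
    # with_pip=False keeps creation fast and offline; pass with_pip=True
    # if pip should be bootstrapped into the new environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# "spider_env" is a hypothetical name for a scraping-only environment.
env_dir = create_env(tempfile.mkdtemp(), "spider_env")
print(env_dir / "pyvenv.cfg")  # this file marks the folder as a virtual environment
```

From a terminal, the equivalent is `python -m venv spider_env`, after which the environment is activated and requests is installed into it with pip.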

1. Basic usage of requests

Request data over the network: requests.get(url)

import requests

response = requests.get('https://game.gtimg.cn/images/lol/act/img/js/heroList/hero_list.js')
response1 = requests.get('https://cd.zu.ke.com/zufang')

Set the decoding (only needed when the text comes out garbled; it must be set before the result is read)
```
response.encoding = 'utf-8'
```
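
As a rough illustration of why the encoding matters: response.text is essentially response.content decoded with response.encoding. The sketch below imitates that step on plain bytes, so it runs without a network call. The sample string and the helper name `decode_body` are made up for the example.

```python
def decode_body(content: bytes, encoding: str) -> str:
    """Decode raw response bytes with the given encoding."""
    # errors="replace" avoids exceptions on bytes that do not fit the encoding
    return content.decode(encoding, errors="replace")


# Hypothetical body: UTF-8 bytes, as a server would send them.
raw = "英雄列表".encode("utf-8")

garbled = decode_body(raw, "latin-1")  # wrong guess: unreadable characters
fixed = decode_body(raw, "utf-8")      # right encoding: original text

print(garbled)
print(fixed)
```

With requests, setting response.encoding = 'utf-8' before reading response.text has the same effect. When the right encoding is unknown, response.apparent_encoding can suggest one.
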
Get the request result

1) Get the result as text: for scraping web pages
```
import requests
print(response.text)
```
2) Get the result as binary data: for downloading images, video, audio
```
print(response.content)
```
3) Get the result parsed from JSON: for JSON APIs
```
data = response.json()  # parse the response body as JSON once and reuse it
print(data)
for hero in data['hero']:
    print(hero['name'])
```
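
The loop above assumes the response has a top-level 'hero' list whose items carry a 'name' key. A slightly more defensive sketch follows, using a small made-up payload in that shape instead of a live request. The entries are placeholders, not real API output, and `hero_names` is a hypothetical helper.

```python
import json


def hero_names(payload: dict) -> list:
    """Return the 'name' of every entry in payload['hero'], skipping entries without one."""
    names = []
    for hero in payload.get("hero", []):
        name = hero.get("name")
        if name is not None:
            names.append(name)
    return names


# Placeholder text standing in for response.text from a JSON endpoint.
sample_text = '{"hero": [{"name": "hero_a"}, {"name": "hero_b"}, {"title": "no name"}]}'
payload = json.loads(sample_text)  # response.json() performs this parsing step

print(hero_names(payload))
```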

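2. Adding request headers

Send the request:

Add headers: a. browser disguise (user-agent), b. login without a password (cookie). c. A proxy is configured through the separate proxies argument of requests.get rather than inside headers.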

import requests

headers = {
    'cookie': 'bid=58Gyjz_NAcA; ll="118318"; douban-fav-remind=1; viewed="36164018_36221918"; ap_v=0,6.0',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36',
}
response = requests.get('https://movie.douban.com/top250', headers=headers)

Get the result

result = response.text
print(result)
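
The options listed at the start of this section can be collected into keyword arguments before the call. This is a minimal sketch: the helper name `build_request_kwargs`, the cookie value, and the proxy address are placeholders invented for illustration.

```python
def build_request_kwargs(user_agent, cookie=None, proxy=None, timeout=10):
    """Assemble keyword arguments for requests.get()."""
    headers = {"user-agent": user_agent}
    if cookie:
        headers["cookie"] = cookie  # reuses an already logged-in browser session
    kwargs = {"headers": headers, "timeout": timeout}
    if proxy:
        # proxies is a separate argument of requests.get, not a header
        kwargs["proxies"] = {"http": proxy, "https": proxy}
    return kwargs


kwargs = build_request_kwargs(
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    cookie="bid=placeholder",       # placeholder, not a real cookie
    proxy="http://127.0.0.1:8888",  # placeholder proxy address
)
print(kwargs)
```

The result would then be passed along as requests.get('https://movie.douban.com/top250', **kwargs). The timeout keeps a stalled connection from blocking the script indefinitely.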

3. Downloading images

Fetch the image data from the network

import requests
response = requests.get('https://file.gamefk.com/2021/0927/3406ae05f8d674e2b93aaed58fd092a9.jpg')
result = response.content
print(type(result))

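Save the data to a local file

# The files/ folder must already exist, otherwise open() raises FileNotFoundError
with open('files/a.jpg','wb') as f:
    f.write(result)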

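# Audio files are fetched the same way as images
response1 = requests.get('https://game.gtimg.cn/images/lol/act/img/vo/choose/1.ogg')
result = response1.content
print(type(result))

# The source file is .ogg, so keep that extension when saving
with open('files/aa.ogg','wb') as f: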
    f.write(result)
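
To keep the saved file's extension in line with what was actually downloaded, the local name can be taken from the URL. This is a sketch under assumptions: the helper names are hypothetical, the target folder is a temporary one, and the bytes written here are dummy data rather than a real download.

```python
import tempfile
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_path_for(url, folder):
    """Build a local path that keeps the file name (and extension) from the URL."""
    name = PurePosixPath(urlparse(url).path).name
    return Path(folder) / name


def save_bytes(data, path):
    """Write binary data to path, creating the parent folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path


url = "https://game.gtimg.cn/images/lol/act/img/vo/choose/1.ogg"
target = local_path_for(url, Path(tempfile.mkdtemp()) / "files")
# b"dummy" stands in for response.content from requests.get(url)
save_bytes(b"dummy", target)
print(target.name)  # 1.ogg
```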