Python Web Crawlers

Latest recommended article published on 2024-10-02 10:53:34

yviul

Tags: python, web crawler, programming language

Article link: https://blog.csdn.net/YuanXiaoHei88/article/details/125608287

Contents

Web crawlers
requests

Web crawlers

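A web crawler (also called a web spider or web robot) is a program that imitates a browser: it sends network requests, receives the responses, and automatically collects information from the internet according to a set of rules.
The crawling process has three steps:
1. Fetch the web page.
2. Extract data using the page's front-end structure (for example HTML tags or CSS selectors) or regular expressions.
3. Save the page or the extracted data.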
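
The three steps above can be sketched as three small functions. This is only an illustration under stated assumptions: the function names, the sample HTML, and the `links.txt` file name are hypothetical, and step 1 uses the standard-library `urllib.request` as a stand-in so the sketch has no third-party dependency (the rest of this article uses requests for that step).

```python
import re
import urllib.request


def fetch_html(url):
    """Step 1: download the page and return its source as a string."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_links(html):
    """Step 2: pull every double-quoted href value out with a regular expression."""
    return re.findall(r'href="([^"]+)"', html)


def save_lines(lines, path):
    """Step 3: write the extracted items to a text file, one per line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines))


# Offline demonstration of step 2 on a hand-written page (hypothetical sample).
sample_html = '<a href="/a.html">A</a> <a href="/b.html">B</a>'
links = extract_links(sample_html)
print(links)
```

With network access, the whole pipeline would be chained as `save_lines(extract_links(fetch_html(url)), "links.txt")`. A regular expression is enough for a simple pattern like this one; pages with irregular markup are usually easier to handle with an HTML parser.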

requests

requests is a third-party library. It sends a request to a server at a given URL and waits for the server's response.

Installing requests

pip install requests

Fetching a web page

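Most websites have anti-crawler mechanisms, so at a minimum we need to disguise the User-Agent (UA) so that the request looks like it comes from a browser.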

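import requests

URL = 'https://www.bilibili.com/'
# Basic UA spoofing: send a browser-like User-Agent (example value; copy the one your own browser sends)
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
# Send a request to Bilibili and get the server's response
response = requests.get(url=URL, headers=HEADERS)
print(response)
# status_code: the HTTP status code, which tells you whether the request succeeded
print(response.status_code)
# text: the page source (a str)
print(response.text, type(response.text))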
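
A crawler usually needs to cope with failed requests and wrongly decoded pages as well. The following is a minimal sketch, not a definitive implementation: `fetch_text` and `DEFAULT_HEADERS` are hypothetical names, and the User-Agent value and the 10-second timeout are assumptions chosen for illustration. It relies only on documented requests features (`timeout`, `raise_for_status()`, `apparent_encoding`).

```python
import requests

# Hypothetical default headers; the User-Agent value is only an example.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_text(url, timeout=10.0):
    """Return the decoded page source, or None if the request fails."""
    try:
        response = requests.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx status codes.
        response.raise_for_status()
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, bad URLs, and HTTP errors.
        print(f"Request failed: {exc}")
        return None
    # Let requests guess the encoding from the body, which helps when
    # the server's Content-Type header is missing or wrong.
    response.encoding = response.apparent_encoding
    return response.text


if __name__ == "__main__":
    html = fetch_text("https://www.bilibili.com/")
    if html is not None:
        print(html[:200])
```

Returning `None` on failure keeps the calling code simple; a real crawler might instead retry or log the error.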

Downloading an image

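import requests

URL = 'https://game.gtimg.cn/images/yxzj/coming/v2/skins//image/20220623/16559919637365.jpg'
response = requests.get(url=URL)
if response.status_code == 200:
    # content: the raw response body (bytes); print its size rather than the binary data itself
    print(len(response.content), 'bytes received')
    # Write the image to a local file; the with block closes the file automatically
    with open('1.jpg', 'wb') as photo:
        photo.write(response.content)
else:
    print(f'Status code: {response.status_code}')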
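
For larger files, reading the whole body into memory through `response.content` can be wasteful. One possible alternative, sketched here under assumptions, is to stream the body to disk in chunks with `stream=True` and `iter_content()`. The helper names `filename_from_url` and `download_file`, the fallback name `download.bin`, and the chunk size are all hypothetical choices.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

import requests


def filename_from_url(url, default="download.bin"):
    """Use the last path segment of the URL as the local file name."""
    name = PurePosixPath(urlparse(url).path).name
    return name or default


def download_file(url, path=None, chunk_size=8192):
    """Stream the response body to disk in chunks; return the path written."""
    path = path or filename_from_url(url)
    # stream=True defers downloading the body until it is iterated over.
    with requests.get(url, stream=True, timeout=10) as response:
        response.raise_for_status()
        with open(path, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return path
```

Calling `download_file(URL)` with the image URL above would save the file under the name taken from the URL's last path segment instead of the fixed name `1.jpg`.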