Scraping links with the requests library

.Passion

Article link: https://blog.csdn.net/qq_43923045/article/details/103863993

import requests

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Host": "www.mzitu.com",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"
}
url = 'https://www.mzitu.com/'
resp = requests.get(url, headers=headers).text
# print(resp)
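# Target markup looks like: <img class='lazy' src='https:/xx'/>
# Extract the image links
import re
# The non-greedy .*? stops at the first closing quote, so each src value is captured on its own
p = re.compile("<img.*?src='(.*?)'.*?/>")
res = p.findall(resp)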
for i in res:
    print(i)

print(len(res))

Key point: use .*? for non-greedy matching, then pull out the src link with a capture group.
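
To see why the non-greedy form matters, here is a minimal sketch that runs the same pattern in both modes on a small string. The sample markup and the example.com URLs are made up for illustration and are not taken from the target site.

```python
import re

# Hypothetical sample markup, made up for illustration only.
html = (
    "<img class='lazy' src='https://example.com/a.jpg'/>"
    "<img src='https://example.com/b.jpg'/>"
)

# Non-greedy: each .*? consumes as little as possible,
# so every <img> tag produces its own match.
lazy = re.findall(r"<img.*?src='(.*?)'.*?/>", html)

# Greedy: .* consumes as much as possible and then backtracks,
# so both tags collapse into a single match.
greedy = re.findall(r"<img.*src='(.*)'.*/>", html)

print(lazy)    # two links, one per tag
print(greedy)  # one link only
```

With the greedy pattern the first tag is swallowed by the leading .*, so only the last link survives. Note that both patterns assume single-quoted src attributes, as in the markup shown in the original comment; pages that use double quotes would need a different pattern or an HTML parser.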
