Python Web Scraping (11): The BeautifulSoup Module

Latest recommended article published on 2024-05-04 16:40:09

过度引用


Category: python  Tags: python, web scraping, beautifulsoup

Article link: https://blog.csdn.net/m0_61885507/article/details/136747589


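1. Module overview

BeautifulSoup is a module that parses HTML source code and lets you filter out the elements you need. It serves a purpose similar to regular expressions, but it works on the structure of the document (tags and attributes) rather than on raw text patterns.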
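To make the comparison with regular expressions concrete, here is a minimal sketch that extracts image addresses from a small HTML string both ways. The HTML snippet and the variable names `bs_srcs` and `re_srcs` are made up for illustration; the regular expression is only a rough equivalent and would break on markup that uses single quotes or unusual attribute spacing.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration
html = """
<div class="gallery">
  <img src="/a.jpg" alt="first">
  <img alt="no source">
  <p>Some text</p>
  <img src="https://example.com/b.png" alt="second">
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all returns every <img> tag; .get('src') returns None if the attribute is missing
bs_srcs = [img.get('src') for img in soup.find_all('img') if img.get('src')]

# A rough regex equivalent: match src="..." inside an <img ...> tag
re_srcs = re.findall(r'<img[^>]*\ssrc="([^"]+)"', html)

print(bs_srcs)
print(re_srcs)
```

Both approaches find the same two addresses here, but the BeautifulSoup version states the intent directly (all `img` tags that have a `src`) and does not depend on how the attributes happen to be written.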

2. Code

import os
from io import BytesIO
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from PIL import Image

base_url = 'https://www.douban.com/'
# headers must be a dict; the User-Agent makes the request look like a normal browser
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36 Edg/122.0.0.0'}
res = requests.get(base_url, headers=headers)
output_dir = 'downloaded_images'
os.makedirs(output_dir, exist_ok=True)
soup = BeautifulSoup(res.text, 'html.parser')
img_tags = soup.find_all('img')
for idx, img in enumerate(img_tags):
    img_url = img.get('src')
    if not img_url:
        continue
    # Resolve relative and protocol-relative addresses against the page URL
    img_url = urljoin(base_url, img_url)
    try:
        response = requests.get(img_url, headers=headers)
        response.raise_for_status()  # make sure the request succeeded
        img_name = f'image_{idx}.jpg'
        img_path = os.path.join(output_dir, img_name)
        with open(img_path, 'wb') as file:
            file.write(response.content)
        print(f"Image {img_name} downloaded")

        try:
            image = Image.open(BytesIO(response.content))
            image.verify()  # check whether the image is corrupted
            print(f"Image {img_name} passed the automatic check")
        except Exception as e:
            print(f"Image {img_name} failed the automatic check: {e}")
            os.remove(img_path)
    except Exception as e:
        print(f"Failed to download image: {e}")

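That is what the code looks like. In the call to BeautifulSoup, res.text is the response body converted to a string (in other code you may see the response passed in a different form), and the second argument, 'html.parser', specifies which parser to use.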
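The two arguments can be tried out without any network request. The sketch below uses a made-up HTML string as a stand-in for `res.text` and its UTF-8 encoded bytes as a stand-in for `res.content`, on the assumption that the "different form" seen elsewhere is the raw bytes. BeautifulSoup accepts either a string or bytes as its first argument.

```python
from bs4 import BeautifulSoup

# Stand-ins for res.text (str) and res.content (bytes); no request is made here
text = '<html><head><title>Demo</title></head><body><img src="/logo.png"></body></html>'
content = text.encode('utf-8')

# The second argument selects the parser; 'html.parser' ships with Python
soup_from_text = BeautifulSoup(text, 'html.parser')
soup_from_bytes = BeautifulSoup(content, 'html.parser')

title_a = soup_from_text.title.string
title_b = soup_from_bytes.title.string
src = soup_from_bytes.find('img')['src']

print(title_a, title_b, src)
```

When bytes are passed, BeautifulSoup has to work out the encoding itself, so passing an already decoded string is usually the simpler choice when the encoding reported by the server is trustworthy.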
