Python 图片去重，删除重复图片

蜡笔小新星

于 2024-08-29 15:40:33 发布

阅读量208

点赞数 6

文章标签： python 哈希算法开发语言经验分享学习算法

本文链接：https://blog.csdn.net/m0_54490473/article/details/141602229

版权

删除文件夹中相似图片的任务比较复杂，因为需要定义“相似”的标准，并且这通常涉及到图像内容分析。一种常见的做法是使用图像哈希（如感知哈希、平均哈希等）来比较图像之间的相似度。在Python中，可以使用一些库如Pillow（PIL的更新版）来处理图像，以及ImageHash库来生成图像哈希。

下面是一个基本的步骤说明，展示如何使用Python和这些库来识别并删除相似图片：

步骤 1: 安装必要的库

首先，你需要安装Pillow和ImageHash库。你可以使用pip来安装它们：

pip install Pillow imagehash

步骤 2: 编写代码以计算文件夹中所有图片的哈希值

接下来，你需要编写一个脚本来遍历文件夹中的所有图片，计算每张图片的哈希值，并将它们存储在一个字典中，键为哈希值，值为图片路径列表。

from PIL import Image
import imagehash
import os

def image_hash_dict(folder_path, hash_size=8):
    hash_dict = {}
    for root, dirs, files in os.walk(folder_path):
        for file in files:
            if file.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):
                img_path = os.path.join(root, file)
                try:
                    with Image.open(img_path) as img:
                        img_hash = imagehash.average_hash(img, hash_size=hash_size)
                        if img_hash in hash_dict:
                            hash_dict[img_hash].append(img_path)
                        else:
                            hash_dict[img_hash] = [img_path]
                except Exception as e:
                    print(f"Error processing {img_path}: {e}")
    return hash_dict

# 使用示例
folder_path = '/path/to/your/images'
hash_dict = image_hash_dict(folder_path)

步骤 3: 识别并删除相似图片

在这个步骤中，你需要遍历哈希字典，找到具有多个图片路径的哈希值（表示这些图片相似），并决定哪些图片需要被删除。这通常需要你根据具体需求来编写逻辑，比如只保留第一张或最后一张，或者根据文件名、大小等其他标准来选择。

def delete_similar_images(hash_dict, keep_first=True):
    for hash_value, paths in hash_dict.items():
        if len(paths) > 1:
            to_delete = paths[1:] if keep_first else paths[:-1]
            for path in to_delete:
                try:
                    os.remove(path)
                    print(f"Deleted: {path}")
                except Exception as e:
                    print(f"Failed to delete {path}: {e}")

# 使用示例
delete_similar_images(hash_dict)

注意：删除文件是一个不可逆的操作，所以在执行删除操作之前，请确保你已经做好了备份，并且确实想要删除这些文件。此外，上面的代码只是一个基本的示例，你可能需要根据自己的具体需求对其进行调整和优化。

完整示例和解析

# 导入Pillow库，用于图像处理  
from PIL import Image  
  
# 导入imagehash库，用于生成图像哈希  
import imagehash  
  
# 导入os库，用于操作系统功能，如文件路径操作和删除文件  
import os  
  
# 定义一个函数，用于计算文件夹中所有图片的哈希值，并返回一个字典  
# 字典的键是哈希值，值是具有相同哈希值的图片路径列表  
def image_hash_dict(folder_path, hash_size=8):  
    hash_dict = {}  # 初始化一个空字典来存储哈希值和图片路径的映射  
    for root, dirs, files in os.walk(folder_path):  # 遍历指定文件夹及其子文件夹  
        for file in files:  # 遍历当前文件夹中的文件  
            if file.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif')):  # 检查文件是否是图片文件  
                img_path = os.path.join(root, file)  # 构建图片的完整路径  
                try:  
                    with Image.open(img_path) as img:  # 使用Pillow打开图片  
                        img_hash = imagehash.average_hash(img, hash_size=hash_size)  # 计算图片的哈希值  
                        if img_hash in hash_dict:  # 如果哈希值已经在字典中  
                            hash_dict[img_hash].append(img_path)  # 将图片路径添加到对应的哈希值列表中  
                        else:  
                            hash_dict[img_hash] = [img_path]  # 如果哈希值不在字典中，则创建一个新条目  
                except Exception as e:  # 如果在打开或处理图片时发生错误  
                    print(f"Error processing {img_path}: {e}")  # 打印错误信息  
    return hash_dict  # 返回包含哈希值和图片路径的字典  
  
# 调用image_hash_dict函数，并传入图片文件夹的路径  
folder_path = '/path/to/your/images'  
hash_dict = image_hash_dict(folder_path)  
  
# 定义一个函数，用于删除具有相同哈希值的图片中的重复项  
# 参数keep_first决定是否保留每组相似图片中的第一张（True）或最后一张（False）  
def delete_similar_images(hash_dict, keep_first=True):  
    for hash_value, paths in hash_dict.items():  # 遍历哈希字典  
        if len(paths) > 1:  # 如果某个哈希值对应多个图片路径（即存在相似图片）  
            to_delete = paths[1:] if keep_first else paths[:-1]  # 确定要删除的图片路径列表  
            for path in to_delete:  # 遍历要删除的图片路径  
                try:  
                    os.remove(path)  # 删除文件  
                    print(f"Deleted: {path}")  # 打印已删除的文件路径  
                except Exception as e:  # 如果删除时发生错误  
                    print(f"Failed to delete {path}: {e}")  # 打印错误信息  
  
# 调用delete_similar_images函数，并传入之前计算得到的哈希字典  
# 这里选择保留每组相似图片中的第一张  
delete_similar_images(hash_dict)

蜡笔小新星

关注

6
点赞
踩
2

收藏

觉得还不错? 一键收藏
0
评论
Python 图片去重，删除重复图片

删除文件夹中相似图片的任务比较复杂，因为需要定义“相似”的标准，并且这通常涉及到图像内容分析。一种常见的做法是使用图像哈希（如感知哈希、平均哈希等）来比较图像之间的相似度。在Python中，可以使用一些库如Pillow（PIL的更新版）来处理图像，以及ImageHash库来生成图像哈希。
复制链接

扫一扫