This article outlines how to use perceptual image hashes to detect duplicate and near-duplicate images.
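As a rough illustration of the idea (this is not code from txtai): perceptual hashes are designed so that visually similar images produce hashes that differ in only a few bits, so near-duplicates can be found by counting the differing bits, known as the Hamming distance. The sketch below uses only the Python standard library. The function names, the sample hashes, and the `max_bits` threshold are all assumptions made up for illustration.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_near_duplicate(hash_a: str, hash_b: str, max_bits: int = 5) -> bool:
    """Treat two hashes as near-duplicates when few bits differ.

    max_bits is an assumed, illustrative threshold, not a txtai default.
    """
    return hamming_distance(hash_a, hash_b) <= max_bits


# Made-up 64-bit hashes, for illustration only
original = "ffff00000000ffff"
resized = "ffff00000000fffe"    # differs from original in a single bit
unrelated = "0000ffffffff0000"  # every bit differs from original

print(hamming_distance(original, resized))
print(is_near_duplicate(original, resized))
print(is_near_duplicate(original, unrelated))
```

A distance of 0 suggests an exact or visually identical match. How many differing bits still count as a near-duplicate depends on the hash algorithm and the image set, so the threshold is worth tuning.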
Install dependencies
Install txtai and all dependencies.
pip install txtai[pipeline] textdistance
wget -N https://github.com/neuml/txtai/releases/download/v3.5.0/tests.tar.gz
tar -xvzf tests.tar.gz
Generate hashes
The example below generates perceptual image hashes for a list of images.
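Before the txtai code, it may help to see what a hash of this kind encodes. The following is a minimal standalone sketch of the simple "average hash" variant, which is not necessarily the algorithm that ImageHash uses. It assumes the image has already been reduced to a small grayscale grid (8x8 here), and the helper name and sample grids are hypothetical.

```python
def average_hash(pixels):
    """Hypothetical helper: hash a grid of grayscale values (0-255) to hex.

    Assumes the image was already reduced to a small grid, e.g. 8x8.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    # One bit per pixel: 1 if brighter than the mean, otherwise 0
    bits = "".join("1" if value > mean else "0" for value in flat)
    # Pack the bits into a zero-padded hex string (4 bits per character)
    width = len(flat) // 4
    return f"{int(bits, 2):0{width}x}"


# Made-up 8x8 grid: bright left half, dark right half
grid = [[200] * 4 + [20] * 4 for _ in range(8)]
# The same grid uniformly brightened, standing in for a near-duplicate
brighter = [[value + 30 for value in row] for row in grid]

print(average_hash(grid))
print(average_hash(grid) == average_hash(brighter))
```

Because each bit records only whether a pixel is brighter than the image's own mean, a uniform brightness change leaves the hash unchanged in this sketch, which is the property that makes such hashes useful for near-duplicate detection.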
import glob

from PIL import Image

from txtai.pipeline import ImageHash

def show(image):
    # Scale the image down for display, preserving the aspect ratio
    width, height = image.size
    return image.resize((int(width / 2.25), int((width / 2.25) * height / width)))

# Get and scale images
images = [Image.open(image) for image in glob.glob('txtai/*jpg')]

# Create image pipeline
ihash = ImageHash()

# Generate hashes
hashes = ihash(images)
hashes
['000000c0feffff00',
 '0859dd04ffbfbf00',
 '78f8f8d8f8f8f8f0',
 '0000446c6f2f2724',
 'ffffdf0700010100',
 '00000