

You meant well. You intended to be a good file custodian, but somewhere along the way things got out of hand and now you’ve got duplicate photos galore. If you’re afraid that deleting them means losing important photos, read on as we show you how to clean up safely.


Deleting duplicate files, especially important ones like personal photos, makes a lot of people quite anxious (and rightfully so). Nobody wants to be the one to realize that they deleted all the photos of their child’s first birthday party during a hard drive purge gone wrong.


In this tutorial we’re going to show you how to go beyond the limited reach of tools which simply compare file names and file sizes. Instead we’ll be using a program that combines that kind of comparison with actual image analysis to help you weed out not just perfect 1:1 file duplicates but also those piles of resized-for-email images, cropped images, and other modified images that might be cluttering up your hard drive.


What Do I Need?

For this tutorial you’ll need the following:


  • VisiPics (Windows XP or above / WINE compatible)


  • An internal or external hard drive to back up the entire collection you’ll be cleaning


We can’t emphasize the second entry in the list enough; it’s reckless to unleash any file-weeding application upon your files without a proper backup in place to restore files in case of error (user, application, or otherwise).


Backing Up Your Files and Best Practices


We just mentioned this, but it’s important enough to merit a separate entry in the guide. You must back up your files before continuing. Ideally this means copying all your image directories (no matter how cluttered or poorly organized they are) onto an external hard drive which can be disconnected from the primary machine during the image weeding process. At minimum, you should copy the image directories to another hard drive within your machine or to another directory on the disk you’re working on.


Whatever you choose to do (or can do, based on the hardware you have on hand) you should not proceed unless there is, at minimum, a copy of every photo you’re working with in a location that will not be touched by the application we’re using.
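The backup step can also be scripted. The following is a minimal Python sketch, not part of VisiPics, that copies a picture directory to a backup location and then verifies every copy byte for byte. The function name `back_up_tree` and any paths you pass to it are illustrative assumptions.

```python
import filecmp
import shutil
from pathlib import Path


def back_up_tree(source: Path, backup: Path) -> list[Path]:
    """Copy source into backup; return relative paths that failed verification."""
    # copytree refuses to write into an existing destination, so an older
    # backup is never overwritten by accident.
    shutil.copytree(source, backup)

    mismatched = []
    for original in source.rglob("*"):
        if original.is_file():
            relative = original.relative_to(source)
            copy = backup / relative
            # shallow=False forces a comparison of the file contents.
            if not copy.is_file() or not filecmp.cmp(original, copy, shallow=False):
                mismatched.append(relative)
    return mismatched
```

An empty list back from something like `back_up_tree(Path(r"C:\Pictures"), Path(r"F:\Backup\Pictures"))` (the `F:` drive is a hypothetical external disk) means every file was copied and verified; anything else is a file to recopy before going further.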


Besides making sure you’re only working on one set of files (with the other properly backed up), the other critical step is to decide which directory is going to be the home directory and which is going to be the dupe directory.


Let’s say, for example, that you have a pile of photos in C:\Pictures\ and C:\Picture Dump\. Any duplicate file finder you use will find the dupes in both directories. What you don’t want to do is start deleting duplicates from both directories, as this breaks apart the sets and collections you have.


Say there is a folder called 2011 Birthday in both directories, with the same files in each. If you don’t pay attention to the process and delete 5 dupes from the first 2011 Birthday folder and 5 dupes from the second one, you’ll end up with a split collection that is even messier than the original pile of dupes you had on your hands.


Always check to see if there is a cluster of duplicate files and remove as many of them as you can from the dupe directory while leaving the home directory’s files intact. This way, when you’re done, you’ll have the least amount of work to do reincorporating the remaining files in the dupe directory into your now dupe-free and mostly clean home directory.


Before continuing, ensure your files are backed up and that you have established which directory is going to be your home directory—the place where the files will remain untouched while the duplicates elsewhere will be purged.
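To make the home/dupe split concrete, here is a minimal Python sketch of the simple kind of duplicate finder described earlier, one that only compares file hashes. It is not how VisiPics works and it only catches byte-identical copies. The function names are illustrative assumptions. Note that it only ever reports files from the dupe directory, and it deletes nothing.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dupes_outside_home(home: Path, dupe: Path) -> list[Path]:
    """List files under dupe whose bytes match some file under home."""
    home_digests = {file_digest(p) for p in home.rglob("*") if p.is_file()}
    # Only paths inside the dupe directory are ever returned, so the
    # home directory's collections stay whole.
    return sorted(
        p for p in dupe.rglob("*")
        if p.is_file() and file_digest(p) in home_digests
    )
```

Because the home directory is only read, a 2011 Birthday folder that exists in both places can never end up split between them.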


Install and Configure VisiPics


VisiPics is a small, free, and easy to install app. Simply download it, run the installer, and accept the license agreement. Once the installer is done the application will launch.


To configure VisiPics you need to specify which directories you wish to scan and how strictly you wish VisiPics to compare the files. VisiPics is not a simple duplicate file finder; it doesn’t restrict itself to comparing names, file sizes, or file hashes. VisiPics uses image analysis algorithms to compare photos and will (depending on the settings you select) even offer two photos as duplicates when they differ in size and resolution but are otherwise the same image.
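To see why image analysis can match a resized copy when a file hash cannot, consider the following minimal Python sketch of an "average hash," one common perceptual-hashing technique. This is an illustration only; VisiPics’ actual algorithm is not described here and may be entirely different. The sketch assumes each image has already been decoded into a grid of grayscale values (a list of rows) at least 8 pixels on each side; decoding real photo files is outside its scope.

```python
def shrink(pixels: list[list[float]], size: int = 8) -> list[list[float]]:
    """Downscale a grayscale image to size x size by averaging blocks."""
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        line = []
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            line.append(sum(block) / len(block))
        small.append(line)
    return small


def average_hash(pixels: list[list[float]], size: int = 8) -> list[bool]:
    """One bit per cell: is the cell brighter than the image's mean?"""
    flat = [value for line in shrink(pixels, size) for value in line]
    mean = sum(flat) / len(flat)
    return [value > mean for value in flat]


def hamming_distance(a: list[bool], b: list[bool]) -> int:
    """Count the positions where two hashes disagree."""
    return sum(x != y for x, y in zip(a, b))
```

Two files with completely different bytes, such as a photo and a half-size copy of it, can still produce hashes that are a small distance apart, while unrelated pictures tend to land far apart. How small a distance counts as "the same image" is the kind of strictness setting a tool like this has to expose.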


First, let’s pick our directories. For the purpose of this demonstration we’ll be selecting two directories that we know have duplicate files in them. In our My Documents folder we have a folder called \Picture Dump\. We took this folder and copied the images to the E:\ drive to create our duplicate set. By clicking on File –> Add Folder (or by using the folder browser pane and the Add Arrow button) we can easily add the two folders to VisiPics.



Now would be a good time to mention that VisiPics has a Project function which allows you to save all your settings between sessions. If you’ve spent a bit of time selecting folders (or later, tweaking settings), you’ll definitely want to take a moment to go to File –> Save Project and store the resulting VSP project file in a place where it won’t get accidentally deleted.


Once you have your folders selected, you can then move the folders up or down in the list in order to create prioritization for the auto-select tool. Your home directory should be the directory at the top—use the up and down arrows at the right side of the folder list to change the position of the folders. You can see the rules for Auto-Select by clicking on the Auto-Select tab. The default is to select uncompressed files, lower resolution files, and smaller files, first. You can uncheck any of these options to alter the behavior of the duplicate finder. Note: Auto-Select will never actually automatically select files unless you click the Auto-Select button.
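The interplay between folder order and the Auto-Select rules can be sketched in a few lines of Python. This is a hypothetical model, not VisiPics’ real logic: it assumes the keeper in each group of duplicates is the highest-resolution file, then the largest file, with ties going to the folder nearest the top of the list. The uncompressed-file rule is left out, and the `Candidate` type and its fields are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str
    folder_rank: int  # 0 = top of the folder list (the home directory)
    width: int
    height: int
    size_bytes: int


def flag_for_removal(group: list[Candidate]) -> list[Candidate]:
    """Return every file in a duplicate group except the one to keep."""
    keeper = max(
        group,
        # Higher resolution wins, then larger file size, then the folder
        # closer to the top of the list (a lower rank number).
        key=lambda c: (c.width * c.height, c.size_bytes, -c.folder_rank),
    )
    return [c for c in group if c is not keeper]
```

Under these assumptions, an identical copy in a lower-ranked folder is flagged ahead of the home directory’s file, and a shrunken copy is flagged wherever it lives. As in VisiPics, flagging is only a suggestion; nothing is removed at this stage.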


Once you have the directories picked out and prioritized, you can run your initial test run. No files will be deleted; this test run simply lets you see whether you need to adjust your filter settings for better results. Go ahead and press the green play arrow in the middle of the interface panel to begin the process. Depending on how many files you have, this may take anywhere from a few minutes to an hour or more for large collections of 20,000+ files.



In the case of our test run, we have two directories: one on the C drive and one on the E drive. We purposely altered some of the files on the E drive (reduced the file size, altered the dimensions, and so on) to double-check VisiPics’ search algorithms. VisiPics found all the duplicate files, including the files with different sizes, resolutions, and file names.



More importantly, when we used the Auto-Select button, it accurately picked out the duplicate files from the non-prioritized directory first while still respecting the Auto-Select rules that instructed it to also flag the lower-quality files for deletion.



Now that you have your files scanned, and you’ve hit Auto-Select to see the files that are VisiPics’ best choices, you have several options. You can bulk delete or move the files all at once by clicking the Move and Delete buttons in the Actions section located on the right-hand side of the interface. We’d recommend, however, not firing off the Delete button until you’ve taken a moment to look over the results and confirm that the files are the ones you want deleted.


Move allows you to take all the duplicate files and move them somewhere new, essentially creating a backup of the dupes. If you’re pretty sure VisiPics has selected the best files but you want to err on the side of caution, move the files to a secondary directory or drive.
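The same move-instead-of-delete idea is easy to apply outside VisiPics as well. Below is a minimal Python sketch, with the hypothetical function name `quarantine`, that moves a list of flagged files into a holding directory while preserving their subfolder layout, and that never overwrites a file already in the holding area.

```python
import shutil
from pathlib import Path


def quarantine(flagged: list[Path], root: Path, holding: Path) -> list[Path]:
    """Move flagged files from under root into holding, keeping subfolders."""
    moved = []
    for source in flagged:
        target = holding / source.relative_to(root)
        if target.exists():
            # Never overwrite: skip and leave the file where it is.
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
        moved.append(target)
    return moved
```

Keeping the relative paths intact means a file pulled by mistake can be put back exactly where it came from, and the holding directory can simply be deleted once you are satisfied with the results.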


Finally, the safest way to use VisiPics is to go down the list and check each file by hand. While this is the surest way to ensure there are no accidental deletions, on a large collection it is by far the most time-consuming approach. If you’re trying to sort out a mess of 15,000 duplicate photos, we’d recommend using the Move function to back them up (or relying on the original backup you created earlier in the tutorial) and simply checking the first few hundred images to ensure VisiPics has sorted them according to your settings. After the initial check, let the application handle deleting the dupes.


If you do opt to hand-check the entire list of files, we’d strongly suggest taking advantage of the previously mentioned Save Project function so that you can save the entire process at any point and return to it later without having to rescan or reflag your photos.


Regardless of how much hand-checking or automation you use, when you’re done you’ll have a tidied directory with the highest quality versions of your images—without a duplicate in sight.


Have a tip, trick, or tool for ferreting out duplicate files? Share your knowledge in the comments below.


