Code for Splitting the SAR Ship Detection Dataset (SSDD) into Training and Test Sets

Article link: https://blog.csdn.net/qq_39910747/article/details/124940519

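This article describes the split strategy for the classic SAR Ship Detection Dataset and stresses why the training set and test set need to keep a consistent distribution. A script shows how to divide the samples into training, validation, and test sets by a file-number rule, so that feature learning is effective and results are stable.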


Preface

The SAR Ship Detection Dataset (SSDD) is one of the classic datasets for SAR ship detection. The paper SAR Ship Detection Dataset (SSDD): Official Release and Comprehensive Data Analysis contains the following passage:

The images with the last digits of the file number 1 and 9 are uniquely determined as the test set, and the rest are regarded as the training set. Such a rule can also maintain the distribution consistency of the training set and test set, which is conducive to network feature learning.

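In short, images whose file number ends in 1 or 9 are fixed as the test set, and all remaining images form the training set (my note: this includes the validation set). According to the paper, this rule also keeps the distributions of the training set and test set consistent, which helps the network learn features.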
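The rule comes down to a check on the last digit of the file number. A minimal sketch, assuming the file names are zero-padded numbers such as 000001; the helper name is_test_image is hypothetical and does not come from the paper:

```python
def is_test_image(file_stem: str) -> bool:
    """Return True if the file number ends in 1 or 9 (the SSDD test-set rule)."""
    # Assumption: file_stem is the file name without extension, e.g. "000001".
    return file_stem[-1] in ("1", "9")


print(is_test_image("000001"))  # True
print(is_test_image("000128"))  # False
```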

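The dataset is small, with only 1160 samples, so a random split could break the distribution consistency between the training set and the test set and make results differ from one split to the next. With so few samples, every one of them is valuable. For the split between training and validation sets, however, the paper gives no explicit rule; it only suggests building a cross-validation set. Here I use the images whose file number ends in 8 as the validation set, so that the validation set contains both inshore and offshore targets.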
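The paper only suggests cross-validation and does not say how to build the folds. One possible scheme, sketched below purely as an assumption of mine, is to let each non-test last digit act as the validation set once, which gives 8 folds. The names FOLD_DIGITS and make_fold are hypothetical.

```python
NUM_IMAGES = 1160      # SSDD size, as stated above
TEST_DIGITS = {1, 9}   # last digits fixed as the test set by the paper
# Assumption: every non-test last digit serves as the validation digit once.
FOLD_DIGITS = [d for d in range(10) if d not in TEST_DIGITS]


def make_fold(val_digit):
    """Return (train_ids, val_ids) for one fold; test images are never included."""
    train_ids, val_ids = [], []
    for i in range(1, NUM_IMAGES + 1):
        digit = i % 10
        if digit in TEST_DIGITS:
            continue  # test images stay out of every fold
        if digit == val_digit:
            val_ids.append(i)
        else:
            train_ids.append(i)
    return train_ids, val_ids


for d in FOLD_DIGITS:
    train_ids, val_ids = make_fold(d)
    print(f"val digit {d}: {len(train_ids)} train, {len(val_ids)} val")
```

Because the file numbers run from 1 to 1160, each last digit occurs exactly 116 times, so every fold has 812 training images and 116 validation images. Choosing digit 8 reproduces the fixed split used in this post.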

So I wrote a script that splits the data into training, validation, and test sets.

Code

import os

NUM_IMAGES = 1160        # the script assumes files are numbered 000001 to 001160
OUT_DIR = "ImageSets/Main"
TEST_DIGITS = (1, 9)     # last digits fixed as the test set by the SSDD paper
VAL_DIGIT = 8            # change this if you do not want last digit 8 as the validation set

os.makedirs(OUT_DIR, exist_ok=True)  # create the output directory if it is missing

def write_split(name, ids):
    # Write one zero-padded 6-digit file number per line, e.g. 000001
    with open(os.path.join(OUT_DIR, name + ".txt"), 'w') as f:
        for i in ids:
            f.write(str(i).zfill(6) + '\n')

all_ids = range(1, NUM_IMAGES + 1)
#-----------------------test---------------------#
test = [i for i in all_ids if i % 10 in TEST_DIGITS]
write_split("test", test)
#-------------------train&val--------------------#
trainval = [i for i in all_ids if i % 10 not in TEST_DIGITS]
write_split("trainval", trainval)
#-----------------val----------------------------#
val = [i for i in trainval if i % 10 == VAL_DIGIT]
write_split("val", val)
#-----------------train--------------------------#
train = [i for i in trainval if i % 10 != VAL_DIGIT]
write_split("train", train)
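
After running the script, it may be worth checking that the four list files are consistent with each other. The sketch below is an optional sanity check and not part of the original script; the function names read_ids and check_splits are hypothetical, and the default directory is assumed to be the same ImageSets/Main used above.

```python
import os


def read_ids(path):
    """Read one file number per line, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def check_splits(split_dir="ImageSets/Main"):
    """Verify that train/val/trainval/test are mutually consistent; return their sizes."""
    names = ("train", "val", "trainval", "test")
    splits = {n: read_ids(os.path.join(split_dir, n + ".txt")) for n in names}
    train, val, trainval, test = (set(splits[n]) for n in names)

    assert train.isdisjoint(val), "train and val overlap"
    assert train | val == trainval, "trainval is not the union of train and val"
    assert trainval.isdisjoint(test), "trainval and test overlap"
    # Test-set rule from the paper: the file number ends in 1 or 9.
    assert all(s[-1] in ("1", "9") for s in test), "unexpected file in test set"
    assert not any(s[-1] in ("1", "9") for s in trainval), "test file in trainval"

    return {n: len(ids) for n, ids in splits.items()}
```

Calling check_splits("ImageSets/Main") on the output of the script above should return 812 for train, 116 for val, 928 for trainval, and 232 for test, which is a 70/10/20 split of the 1160 images.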