A script for quickly downloading the COCO 2014 dataset

```bash
#!/bin/bash
# Usage: pass an optional target directory as the first argument;
# without one, everything goes to ~/data/coco.
set -e  # stop at the first failing command (e.g. a corrupt zip) instead of carrying on

start=$(date +%s)

# Handle the optional download directory.
if [ -z "$1" ]; then
    # Default location: ~/data/coco
    echo "navigating to ~/data/coco ..."
    mkdir -p ~/data/coco
    cd ~/data/coco
else
    # Check that the specified directory exists.
    if [ ! -d "$1" ]; then
        echo "$1 is not a valid directory"
        exit 1
    fi
    echo "navigating to $1 ..."
    cd "$1"
fi

# Create the target folders if they are missing.
mkdir -p ./images ./annotations

# Download the image data.
cd ./images
echo "Downloading MSCOCO train images ..."
curl -LO http://images.cocodataset.org/zips/train2014.zip
echo "Downloading MSCOCO val images ..."
curl -LO http://images.cocodataset.org/zips/val2014.zip

# Download the annotation data.
cd ../annotations
echo "Downloading MSCOCO train/val annotations ..."
curl -LO http://images.cocodataset.org/annotations/annotations_trainval2014.zip
echo "Finished downloading. Now extracting ..."

# Unzip data (the working directory is now ./annotations).
echo "Extracting train images ..."
unzip ../images/train2014.zip -d ../images
echo "Extracting val images ..."
unzip ../images/val2014.zip -d ../images
echo "Extracting annotations ..."
unzip ./annotations_trainval2014.zip

echo "Removing zip files ..."
rm ../images/train2014.zip
rm ../images/val2014.zip
rm ./annotations_trainval2014.zip

echo "Creating trainval35k dataset..."

# Download the trainval35k annotations json into ./annotations.
# Note: the archive is only downloaded here; unzip it yourself before use.
echo "Downloading trainval35k annotations from S3"
curl -LO https://s3.amazonaws.com/amdegroot-datasets/instances_trainval35k.json.zip

# Combine train and val images into images/trainval35k.
# A plain "cp *.jpg" can fail with "Argument list too long" on folders this large,
# so find hands the files to cp in batches ("{} +"); cp -t (GNU coreutils)
# names the target directory first because "{}" must come last.
# This copies every train2014 and val2014 image; the trainval35k annotation
# file decides which of them are actually used.
echo "Combining train and val images"
mkdir -p ../images/trainval35k
cd ../images/train2014
find . -maxdepth 1 -name '*.jpg' -exec cp -t ../trainval35k {} +
cd ../val2014
find . -maxdepth 1 -name '*.jpg' -exec cp -t ../trainval35k {} +

end=$(date +%s)
runtime=$((end-start))

echo "Completed in $runtime seconds"
```
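The `find ... -exec cp -t ../trainval35k {} +` lines exist because expanding `*.jpg` for tens of thousands of files can exceed the limit on how many arguments one command may receive; `find` with `{} +` passes the names in batches instead. If you would rather do the merge step in Python, here is a minimal sketch that iterates each folder entry by entry, so no argument list is ever built. The function name `merge_image_dirs` and its `use_hardlinks` option are hypothetical, not part of the script above; hard links avoid storing a second copy of every image but only work when source and destination are on the same filesystem, so the sketch falls back to a copy.

```python
import os
import shutil


def merge_image_dirs(src_dirs, dst_dir, use_hardlinks=False):
    """Hypothetical helper: put every top-level *.jpg of each src_dir into dst_dir.

    Returns the number of files placed in dst_dir.
    """
    os.makedirs(dst_dir, exist_ok=True)
    count = 0
    for src_dir in src_dirs:
        # os.scandir yields one entry at a time, so no huge argument list is built.
        with os.scandir(src_dir) as entries:
            for entry in entries:
                # Mirror `find . -maxdepth 1 -name '*.jpg'`: top-level .jpg files only.
                if not entry.is_file() or not entry.name.endswith('.jpg'):
                    continue
                dst = os.path.join(dst_dir, entry.name)
                if use_hardlinks:
                    try:
                        os.link(entry.path, dst)  # no extra disk space, same filesystem only
                    except OSError:
                        shutil.copy2(entry.path, dst)  # e.g. cross-device or existing file
                else:
                    shutil.copy2(entry.path, dst)
                count += 1
    return count
```

Run from the COCO root, `merge_image_dirs(['images/train2014', 'images/val2014'], 'images/trainval35k')` reproduces the layout the script builds.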

You can use the following script to split a COCO dataset:

```python
import os
import json
import random
from shutil import copyfile


def split_coco_dataset(coco_dir, train_percentage, val_percentage, output_dir):
    # Validate the split fractions; whatever is left over becomes the test split.
    assert train_percentage + val_percentage <= 1, \
        "the train and val fractions must not sum to more than 1"

    # Read the COCO annotation file.
    annotation_path = os.path.join(coco_dir, 'annotations', 'instances_trainval.json')
    with open(annotation_path, 'r') as f:
        coco_data = json.load(f)

    # Get the image list and shuffle it in place.
    image_list = coco_data['images']
    random.shuffle(image_list)

    # Compute the split sizes.
    num_images = len(image_list)
    num_train = int(num_images * train_percentage)
    num_val = int(num_images * val_percentage)

    # Split the image list.
    splits = {
        'train': image_list[:num_train],
        'val': image_list[num_train:num_train + num_val],
        'test': image_list[num_train + num_val:],
    }

    # Copy each image into output_dir/<split>/, creating the folders first.
    for split_name, images in splits.items():
        split_dir = os.path.join(output_dir, split_name)
        os.makedirs(split_dir, exist_ok=True)
        for image in images:
            src_file = os.path.join(coco_dir, 'train2017', image['file_name'])
            dst_file = os.path.join(split_dir, image['file_name'])
            copyfile(src_file, dst_file)
```

Call `split_coco_dataset` to split the dataset. `coco_dir` is the root directory of the COCO dataset, `train_percentage` and `val_percentage` are the fractions (between 0 and 1) of images assigned to the training and validation sets, with the remainder going to the test set, and `output_dir` is where the split dataset is stored. The function creates the `train`, `val`, and `test` subdirectories itself. The annotation file name `instances_trainval.json` and the image folder `train2017` are hard-coded; adjust them to your own layout (the download script above, for example, produces `images/train2014` and `images/val2014`).

Note that the script only copies image files and does not touch the annotations. If you also need annotation files that match the split images, you will have to extend the code accordingly.
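One way to handle the annotations is to keep, for each split, only the entries in `annotations` whose `image_id` belongs to an image in that split, and copy the other top-level fields (such as `categories`) unchanged. The sketch below assumes the standard COCO instance layout (`images[].id`, `annotations[].image_id`); the helper name `write_split_annotations` and the output names `instances_<split>.json` are hypothetical. Pass it the loaded annotation dict and the train/val/test image lists that `split_coco_dataset` builds from the shuffled `image_list`.

```python
import json
import os


def write_split_annotations(coco_data, splits, output_dir):
    """Hypothetical helper: write one COCO-style annotation file per split.

    coco_data: dict loaded from the original annotation JSON.
    splits: mapping of split name -> list of image dicts from coco_data['images'].
    Returns a mapping of split name -> path of the written file.
    """
    os.makedirs(output_dir, exist_ok=True)
    written = {}
    for split_name, images in splits.items():
        image_ids = {image['id'] for image in images}
        split_data = {
            # Carry over other top-level fields, e.g. 'info', 'licenses', 'categories'.
            **{k: v for k, v in coco_data.items() if k not in ('images', 'annotations')},
            'images': list(images),
            # Keep only annotations that point at an image in this split.
            'annotations': [ann for ann in coco_data.get('annotations', [])
                            if ann['image_id'] in image_ids],
        }
        path = os.path.join(output_dir, f'instances_{split_name}.json')
        with open(path, 'w') as f:
            json.dump(split_data, f)
        written[split_name] = path
    return written
```

Using a set for `image_ids` keeps each membership check constant-time, which matters when a split holds tens of thousands of images and the annotation list is several times larger.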

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值