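I recently needed to work on SSD object detection, so the first step was to build a dataset in VOC format.
This post is mainly about augmenting annotated data I already have: each image is rotated by 90, 180 and 270 degrees, and its bounding box is transformed to match.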
#!/bin/bash
: <<'COMMENT'  # comment out a whole section in a shell script
# this shell script does data augmentation
# created by bingolwang
# Date: 2016-7-6
# augmentation method: rotate each source image by 90, 180 and 270 degrees
# any questions? please contact sa615168@mail.ustc.edu.cn
COMMENT
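# first column = image path; the ".jpg" extension is stripped below
awk '{print $1}' imageInfo.txt > image_to_path.txt
cp imageInfo.txt imageInfo.txt.cache
# the dot is escaped so that only a literal ".jpg" is removed
sed -i 's/\.jpg//' imageInfo.txt.cache
sed -i 's/\.jpg$//' image_to_path.txt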
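while IFS= read -r line
do
    # Note: the suffix in the output name is the counter-clockwise angle, but
    # ImageMagick's `convert -rotate` rotates clockwise for positive angles,
    # so "-rotate 90" produces the _270 copy and "-rotate 270" the _90 copy.
    convert "${line}.jpg" -rotate 90 "${line}_270.jpg" &   # & runs it in the background, in parallel
    convert "${line}.jpg" -rotate 180 "${line}_180.jpg" &
    convert "${line}.jpg" -rotate 270 "${line}_90.jpg"
done < image_to_path.txt
wait   # wait for background conversions that are still running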
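# generate the annotation lines of the rotated copies (90.txt, 180.txt, 270.txt)
cat imageInfo.txt.cache | while IFS= read -r line
do
    imagePath=""
    width=""
    height=""
    xmin=""
    ymin=""
    xmax=""
    ymax=""
    eval "$(echo "$line" | awk '{printf("imagePath=%s;width=%s;height=%s;xmin=%s;ymin=%s;xmax=%s;ymax=%s;", $1, $2, $3, $4, $5, $6, $7)}' | tr -d '\r')"
    # The tr -d '\r' is required; without it every expression that uses ymax goes wrong.
    # ymax is the last field on the line, so if the file has Windows (CRLF) line endings
    # it carries a trailing carriage return '\r' (not a newline; read already drops that).
    # tr -d '\r' removes the carriage return and leaves only the number itself.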
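    # Clamp the box to the image. The coordinates may be floating-point, which
    # the shell's [ ] cannot compare, so bc evaluates the comparison and prints 1 or 0.
    # Note: $(...) runs a command and substitutes its output, like backquotes `...`.
    # if xmin < 0 then xmin = 0
    if [ "$(echo "$xmin < 0" | bc)" -eq 1 ]
    then
        xmin="0"
    fi
    # if ymin < 0 then ymin = 0
    if [ "$(echo "$ymin < 0" | bc)" -eq 1 ]
    then
        ymin="0"
    fi
    # if xmax > width then xmax = width
    if [ "$(echo "$xmax > $width" | bc)" -eq 1 ]
    then
        xmax=$width
    fi
    # if ymax > height then ymax = height
    if [ "$(echo "$ymax > $height" | bc)" -eq 1 ]
    then
        ymax=$height
    fi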
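    # 180 degrees: both axes are mirrored; the image size is unchanged
    echo "${imagePath}_180.jpg" $width $height $(echo "$width-$xmax" | bc) $(echo "$height-$ymax" | bc) $(echo "$width-$xmin" | bc) $(echo "$height-$ymin" | bc) >> 180.txt
    # 90 degrees counter-clockwise: width and height swap, and (x, y) -> (y, width - x)
    echo "${imagePath}_90.jpg" $height $width $ymin $(echo "$width-$xmax" | bc) $ymax $(echo "$width-$xmin-1" | bc) >> 90.txt
    # 270 degrees counter-clockwise (= 90 clockwise): width and height swap, and (x, y) -> (height - y, x)
    echo "${imagePath}_270.jpg" $height $width $(echo "$height-$ymax" | bc) $xmin $(echo "$height-$ymin-1" | bc) $xmax >> 270.txt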
done
cat imageInfo.txt 90.txt 180.txt 270.txt > result.txt
rm 180.txt 270.txt 90.txt
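The box arithmetic in the script can be tried on a single record without touching any images. The sketch below is a hypothetical helper, `rotate_box` (not part of the original script), that applies the same three formulas, including the `-1` the script subtracts from one coordinate in the 90 and 270 degree cases, but uses awk instead of bc so one process handles a whole record. Setting awk's `OFMT` to `%.10g` is my own choice, to avoid awk's default 6-significant-digit output for non-integer results.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the three rotated annotation lines for one record,
# using the same formulas as the script. Suffixes are counter-clockwise angles.
# Arguments: base_path width height xmin ymin xmax ymax (base_path without ".jpg").
rotate_box() {
  awk -v p="$1" -v w="$2" -v h="$3" -v x0="$4" -v y0="$5" -v x1="$6" -v y1="$7" 'BEGIN {
    OFMT = "%.10g"   # print non-integer results with up to 10 significant digits
    # 180 degrees: both axes mirrored, size unchanged
    print p "_180.jpg", w, h, w - x1, h - y1, w - x0, h - y0
    # 90 degrees counter-clockwise: size swaps, (x, y) -> (y, w - x)
    print p "_90.jpg", h, w, y0, w - x1, y1, w - x0 - 1
    # 270 degrees counter-clockwise (90 clockwise): (x, y) -> (h - y, x)
    print p "_270.jpg", h, w, h - y1, x0, h - y0 - 1, x1
  }'
}
```

For example, `rotate_box VOC2007/JPEGImages/001 558 500 40 128 387 402` (the first sample record in the appendix) prints `VOC2007/JPEGImages/001_180.jpg 558 500 171 98 518 372` as its first line.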
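Summary: this shell script touches on if statements, image rotation, floating-point comparison, reading structured data line by line, running commands in parallel, shell variables, commenting out a block of shell code, and floating-point arithmetic. It took me a day to write; I hope it is useful to other readers as well as to myself.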
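On the parallel part: in the script's first loop, the first two `convert` calls for each image run in the background with `&` while the third runs in the foreground, so the next image can start while the previous image's background jobs are still running. One way to cap the number of concurrent jobs is to `wait` after each image. The sketch below is a hypothetical `rotate_images` function that does this; the `CONVERT` variable is an assumption added so the commands can be printed with `echo` instead of executed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: rotate every image listed in a file (one path per line,
# without ".jpg"), running the three rotations of each image in parallel.
# CONVERT defaults to ImageMagick's convert; set CONVERT=echo for a dry run.
CONVERT=${CONVERT:-convert}

rotate_images() {
  local list=$1 base
  while IFS= read -r base; do
    # convert -rotate turns clockwise; the file suffix is the counter-clockwise angle
    "$CONVERT" "${base}.jpg" -rotate 90 "${base}_270.jpg" &
    "$CONVERT" "${base}.jpg" -rotate 180 "${base}_180.jpg" &
    "$CONVERT" "${base}.jpg" -rotate 270 "${base}_90.jpg" &
    wait   # at most three jobs at once; all finish before the next image starts
  done < "$list"
}
```

Run it as `rotate_images image_to_path.txt`. Waiting per image trades a little idle time for a fixed upper bound on concurrent `convert` processes.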
# Appendix
Sample of the imageInfo.txt data format (one image and one bounding box per line):
#path/to/image.jpg width height xmin ymin xmax ymax
VOC2007/JPEGImages/001.jpg 558 500 40 128 387 402
VOC2007/JPEGImages/002.jpg 1024 759 10.934 106.677 1023 758
VOC2007/JPEGImages/003.jpg 768 1024 50.9924 432.736 599.747 776.405
VOC2007/JPEGImages/ff3.jpg 677 500 130 46 533 374
VOC2007/JPEGImages/496.jpg 1123 499 109 40 950 364
VOC2007/JPEGImages/ff5.jpg 576 1024 0.0 282.261 532.137 622.122
VOC2007/JPEGImages/ff7.jpg 3024 3024 125 1320 2712 2566
VOC2007/JPEGImages/b66.jpg 600 469 37 53 585 427
VOC2007/JPEGImages/3e4.jpg 575 1024 21.5149 240.66 386.873 800.216
VOC2007/JPEGImages/41c.jpg 768 1024 57.1591 7.87365 670.6 1010.31
VOC2007/JPEGImages/199.jpg 540 960 107 286 462 844
VOC2007/JPEGImages/070.jpg 480 640 24 178 453 446
VOC2007/JPEGImages/fc0.jpg 3024 4032 788 1192 2396 3751
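Since some of these coordinates are floating-point and the file may have been saved with Windows line endings, here is a minimal sketch of the clamping step done in a single awk pass instead of one bc call per comparison. `clamp_boxes` is a hypothetical name; it strips carriage returns with `tr -d '\r'`, skips lines that do not have exactly seven fields, and leaves unclamped values exactly as written.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: clamp each "path width height xmin ymin xmax ymax" box
# to [0, width] x [0, height]; "+ 0" forces numeric (floating-point) comparison.
clamp_boxes() {
  tr -d '\r' < "$1" | awk 'NF == 7 {
    if ($4 + 0 < 0) $4 = 0              # xmin < 0      -> 0
    if ($5 + 0 < 0) $5 = 0              # ymin < 0      -> 0
    if ($6 + 0 > $2 + 0) $6 = $2        # xmax > width  -> width
    if ($7 + 0 > $3 + 0) $7 = $3        # ymax > height -> height
    print
  }'
}
```

Usage: `clamp_boxes imageInfo.txt > imageInfo.clamped.txt`.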
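Commenting out a block of lines in shell: feed the lines to the no-op command `:` as a here-document. The closing COMMENT must stand alone on its line, and quoting the opening delimiter ('COMMENT') keeps $ and backquotes inside the block from being expanded.
: <<'COMMENT'
shell script section...
shell script section...
COMMENT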
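To check that lines inside such a block really never run, the sketch below places a command substitution inside a quoted here-document. With an unquoted delimiter the shell would expand `$(...)` in the body and run it; with `'COMMENT'` it stays inert text. `demo_block_comment` and its marker-file argument are hypothetical names used only for this test.

```shell
#!/usr/bin/env bash
# Sketch: a quoted here-document fed to the no-op builtin ":" acts as a block comment.
# The body, including the $(touch ...) line, is neither expanded nor executed.
demo_block_comment() {
  local marker=$1
  echo "before"
  : <<'COMMENT'
echo "this line is never executed"
$(touch "$marker")
COMMENT
  echo "after"
}
```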