A Survey of Annotation File Formats

雪急飞绪

Article link: https://blog.csdn.net/qq_38689395/article/details/105714010

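Label specification

Format: label_name <prefix>input_type=attribute_name:attribute_value1,attribute_value2

Multiple labels and attributes can be specified, separated by spaces.
label_name: the top-level class
<prefix>:
- @ a unique attribute that cannot be changed, e.g. occlusion
- ~ a mutable, temporary attribute, e.g. truncation
input_type: one of select, checkbox, radio, number, text
attribute_name: e.g. speed limit
attribute_value: e.g. a speed limit of 60
select and radio attributes may use the special value __undefined__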

Examples:

car person bike - three labels with no attributes
vehicle @select=type:__undefined__,car,truck,bus,train ~radio=quality:good,bad ~checkbox=parked:false - one label with several attributes
circle @radio=color:green,red,blue @number=radius:0,10,0.1 line square - one label with two attributes and two labels with no attributes

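My own specification (two labels, 车辆 "vehicle" and 人 "person", each with a type select, an occlusion radio, and a truncation checkbox):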

车辆 @select=type:__undefined__,汽车,面包车,卡车,公交车,摩托车,自行车,电动车,三轮车,车群,其他 ~radio=遮挡:__undefined__,无遮挡,少部分遮挡,大部分遮挡 ~checkbox=截断:false 人 @select=type:__undefined__,行人,骑手 ~radio=遮挡:__undefined__,无遮挡,少部分遮挡,大部分遮挡 ~checkbox=截断:false
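The label format above is regular enough to parse mechanically. The following is a minimal sketch, assuming that tokens are separated by whitespace and that any token starting with @ or ~ belongs to the most recent label. The helper parse_label_spec is hypothetical and is not part of CVAT.

```python
def parse_label_spec(spec):
    """Parse 'label <prefix>input_type=name:v1,v2 ...' into a list of dicts."""
    labels = []
    for token in spec.split():
        if token[0] in "@~":
            if not labels:
                raise ValueError("attribute before any label: " + token)
            # '@select=type:a,b' -> head 'select=type', values 'a,b'
            head, _, values = token[1:].partition(":")
            input_type, _, name = head.partition("=")
            labels[-1]["attributes"].append({
                "mutable": token[0] == "~",  # '~' marks a mutable attribute
                "input_type": input_type,
                "name": name,
                "values": values.split(",") if values else [],
            })
        else:
            # Anything without a prefix starts a new label.
            labels.append({"name": token, "attributes": []})
    return labels


spec = ("vehicle @select=type:__undefined__,car,truck,bus,train "
        "~radio=quality:good,bad ~checkbox=parked:false")
parsed = parse_label_spec(spec)
print(parsed[0]["name"], [a["name"] for a in parsed[0]["attributes"]])
# prints: vehicle ['type', 'quality', 'parked']
```

Label names that contain spaces are not handled by this sketch.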

Downloading annotation files

Box coordinates are measured from the top-left corner of the image.

[Figure: box coordinates]

CVAT XML format

<?xml version="1.0" encoding="utf-8"?>
<annotations>
  <version>1.0</version>
  <meta>
    <task>
      <id>1062</id>
      <name>My interpolation task</name>
      <size>30084</size>
      <mode>interpolation</mode>
      <!-- number of overlapping frames -->
      <overlap>20</overlap>
      <!-- URL of a page that describes the task -->
      <bugtracker></bugtracker>
      <created>2018-05-31 14:13:36.483219+03:00</created>
      <updated>2018-06-06 13:56:32.113705+03:00</updated>
      <labels>
        <label>
          <name>car</name>
          <attributes>
            <attribute>@select=model:1,2,3,4</attribute>
          </attributes>
        </label>
      </labels>
      <segments>
        <segment>
          <id>3085</id>
          <start>0</start>
          <stop>30083</stop>
          <url>http://cvat.example.com:8080/?id=3085</url>
        </segment>
      </segments>
      <owner>
        <username>admin</username>
        <email></email>
      </owner>
    </task>
    <dumped>2018-06-06 15:52:11.138470+03:00</dumped>
  </meta>
  <track id="0" label="car">
    <!-- top-left and bottom-right corners -->
    <box frame="110" xtl="634.12" ytl="37.68" xbr="661.50" ybr="71.37" outside="0" occluded="1" keyframe="1">
      <attribute name="model">1</attribute>
    </box>
    <box frame="111" xtl="634.21" ytl="38.50" xbr="661.59" ybr="72.19" outside="0" occluded="1" keyframe="0">
      <attribute name="model">1</attribute>
    </box>
    <box frame="112" xtl="634.30" ytl="39.32" xbr="661.67" ybr="73.01" outside="1" occluded="1" keyframe="1">
      <attribute name="model">1</attribute>
    </box>
  </track>
  <track id="1" label="car">
    <box frame="0" xtl="626.81" ytl="30.96" xbr="656.05" ybr="58.88" outside="0" occluded="0" keyframe="1">
      <attribute name="model">3</attribute>
    </box>
    <box frame="1" xtl="626.63" ytl="31.56" xbr="655.87" ybr="59.48" outside="0" occluded="0" keyframe="0">
      <attribute name="model">3</attribute>
    </box>
    <box frame="2" xtl="626.09" ytl="33.38" xbr="655.33" ybr="61.29" outside="1" occluded="0" keyframe="1">
      <attribute name="model">3</attribute>
    </box>
  </track>
</annotations>
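A file like the one above can be read with Python's standard library. The following is a minimal sketch, assuming that outside="1" means the object is no longer visible from that frame on; the SAMPLE string is a shortened copy of the example, and read_tracks is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example above (meta section omitted).
SAMPLE = """<annotations>
  <track id="0" label="car">
    <box frame="110" xtl="634.12" ytl="37.68" xbr="661.50" ybr="71.37" outside="0" occluded="1" keyframe="1">
      <attribute name="model">1</attribute>
    </box>
    <box frame="112" xtl="634.30" ytl="39.32" xbr="661.67" ybr="73.01" outside="1" occluded="1" keyframe="1">
      <attribute name="model">1</attribute>
    </box>
  </track>
</annotations>"""


def read_tracks(xml_text):
    """Return one record per <box> whose outside flag is "0"."""
    records = []
    root = ET.fromstring(xml_text)
    for track in root.iter("track"):
        for box in track.iter("box"):
            if box.get("outside") == "1":
                continue  # assumption: the object is not visible here
            records.append({
                "track_id": int(track.get("id")),
                "label": track.get("label"),
                "frame": int(box.get("frame")),
                # (xtl, ytl, xbr, ybr): top-left and bottom-right corners
                "box": tuple(float(box.get(k)) for k in ("xtl", "ytl", "xbr", "ybr")),
                "occluded": box.get("occluded") == "1",
                "attributes": {a.get("name"): a.text for a in box.iter("attribute")},
            })
    return records


records = read_tracks(SAMPLE)
for record in records:
    print(record)
```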

PASCAL VOC format

The export can be thought of as a decomposed version of the CVAT XML: one XML file per image.

<annotation>
    <folder></folder>
    <filename>25_348.jpg</filename>
    <path></path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>1280</width>
        <height>720</height>
        <depth>3</depth>
    </size>
    <!-- whether the image is used for segmentation -->
    <segmented>0</segmented>
    <object>
        <name>car</name>
        <!-- viewing angle (pose) -->
        <pose>Unspecified</pose>
        <!-- whether the object is truncated (0 means complete) -->
        <truncated>0</truncated>
        <!-- whether the object is difficult to recognize (0 means easy) -->
        <difficult>0</difficult>
        <!-- top-left and bottom-right corners -->
        <bndbox>
            <xmin>495.6484375</xmin>
            <ymin>350.79296875</ymin>
            <xmax>666.5</xmax>
            <ymax>469.2000000000007</ymax>
        </bndbox>
    </object>
</annotation>
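Reading one of these per-image files follows the same pattern. The following is a minimal sketch using the standard library; VOC_SAMPLE is a shortened copy of the example above, and read_voc is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example above.
VOC_SAMPLE = """<annotation>
    <filename>25_348.jpg</filename>
    <size><width>1280</width><height>720</height><depth>3</depth></size>
    <object>
        <name>car</name>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>495.6484375</xmin>
            <ymin>350.79296875</ymin>
            <xmax>666.5</xmax>
            <ymax>469.2000000000007</ymax>
        </bndbox>
    </object>
</annotation>"""


def read_voc(xml_text):
    """Return the image size and every object box in one VOC file."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    objects = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "truncated": obj.findtext("truncated") == "1",
            "difficult": obj.findtext("difficult") == "1",
            # (xmin, ymin, xmax, ymax): top-left and bottom-right corners
            "box": tuple(float(bnd.findtext(k)) for k in ("xmin", "ymin", "xmax", "ymax")),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "objects": objects,
    }


voc = read_voc(VOC_SAMPLE)
print(voc["filename"], voc["width"], voc["height"], len(voc["objects"]))
```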

YOLO format

Reference

The exported data differs from the format we normally use: the coordinates are normalized by the image width and height.

width="1280" height="720"
# Coordinates in the XML file
xtl="495.65" ytl="350.79" xbr="666.50" ybr="469.20"
xtl="362.63" ytl="366.40" xbr="492.07" ybr="434.33"
# YOLO txt file
# 0: class index
# 1st value after the class: normalized center x
# 2nd value: normalized center y
# 3rd value: normalized box width w
# 4th value: normalized box height h
0 0.453964 0.569440 0.133478 0.164454
0 0.333868 0.556060 0.101119 0.094346
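The normalization can be checked by hand against the numbers above. The following is a minimal sketch, assuming the usual YOLO convention of center and size divided by the image dimensions; both function names are hypothetical.

```python
def corners_to_yolo(xtl, ytl, xbr, ybr, img_w, img_h):
    """Convert corner coordinates in pixels to normalized (cx, cy, w, h)."""
    cx = (xtl + xbr) / 2 / img_w
    cy = (ytl + ybr) / 2 / img_h
    w = (xbr - xtl) / img_w
    h = (ybr - ytl) / img_h
    return cx, cy, w, h


def yolo_to_corners(cx, cy, w, h, img_w, img_h):
    """Inverse conversion: normalized (cx, cy, w, h) back to pixel corners."""
    xtl = (cx - w / 2) * img_w
    ytl = (cy - h / 2) * img_h
    xbr = (cx + w / 2) * img_w
    ybr = (cy + h / 2) * img_h
    return xtl, ytl, xbr, ybr


# First box from the example above, in a 1280x720 image.
yolo_box = corners_to_yolo(495.65, 350.79, 666.50, 469.20, 1280, 720)
print("0 " + " ".join(f"{v:.6f}" for v in yolo_box))
```

The result agrees with the first exported line to within about 1e-5; the small differences come from the XML coordinates being rounded to two decimals.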

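COCO format

The exported file follows the object detection format, the first of the five COCO data formats described below: each annotation carries segmentation, area, bbox, and iscrowd fields.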

// Coordinates shown in the XML: top-left (735,533), bottom-right (825,583)
<box frame="0" outside="0" occluded="0" keyframe="1" xtl="735.14" ytl="532.90" xbr="824.75" ybr="582.90"> 
// JSON format
"annotations": [
    {
        "category_id": 2,
        "id": 1,
        "image_id": 0,
        "iscrowd": 0,
        "segmentation": [
         // vertices are listed clockwise, starting from the top-left
         // top-left (735,533), top-right (825,533), bottom-right (825,583), bottom-left (735,583)
            [
                735.140625,
                532.90234375,
                824.7462158203125,
                532.90234375,
                824.7462158203125,
                582.9025268554688,
                735.140625,
                582.9025268554688
            ]
        ],
        "area": 4500.0,
        // bbox[x,y,width,height]
        "bbox": [
            735.0,
            533.0,
            90.0,
            50.0
        ]
    }
]
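The relationship between the XML box and the JSON fields can be sketched as follows. This is an illustration only, assuming a rectangular box; corners_to_coco is a hypothetical helper, and unlike the export above it does not round bbox and area to whole pixels.

```python
def corners_to_coco(xtl, ytl, xbr, ybr, ann_id, image_id, category_id):
    """Build a COCO-style annotation dict from a corner box."""
    width = xbr - xtl
    height = ybr - ytl
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "iscrowd": 0,
        # One polygon, clockwise: top-left, top-right, bottom-right, bottom-left
        "segmentation": [[xtl, ytl, xbr, ytl, xbr, ybr, xtl, ybr]],
        "area": width * height,
        # bbox is [x, y, width, height] with (x, y) at the top-left corner
        "bbox": [xtl, ytl, width, height],
    }


ann = corners_to_coco(735.14, 532.90, 824.75, 582.90,
                      ann_id=1, image_id=0, category_id=2)
print(ann["bbox"])
```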

Data format reference

There are five data formats: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning.

Object Detection

Box coordinates are measured from the top left image corner and are 0-indexed.

annotation{
    "id": int, 
    "image_id": int, 
    "category_id": int, 
    "segmentation": RLE or [polygon], 
    "area": float,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}

categories[{
    "id": int, 
    "name": str, 
    "supercategory": str,
}]

Keypoint Detection

"keypoints" is a length 3k array, where k is the number of keypoints defined for the category.

annotation{
    "keypoints": [x1,y1,v1,...],
    "num_keypoints": int,
    "[cloned]": ...,
}

categories[{
    "keypoints": [str], 
    "skeleton": [edge], 
    "[cloned]": ...,
}]
"[cloned]": denotes fields copied from object detection annotations defined above.
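The flat keypoints array is easier to work with once it is regrouped into triples. The following is a minimal sketch, assuming the visibility convention from the COCO documentation as I understand it (v=0 not labeled, v=1 labeled but not visible, v=2 labeled and visible); the sample values and both helper names are made up for illustration.

```python
def split_keypoints(flat):
    """Regroup [x1, y1, v1, x2, y2, v2, ...] into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Count keypoints with v > 0, which is what num_keypoints reports."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


# Made-up sample with k = 3 keypoints.
sample = [120, 80, 2, 0, 0, 0, 131, 95, 1]
triples = split_keypoints(sample)
print(triples, count_labeled(sample))
```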

Stuff Segmentation

COCO provides scripts for converting between JSON and PNG.

cocostuffapi

Using the COCO API

Panoptic Segmentation

To match an annotation with an image, use the image_id field (that is, annotation.image_id==image.id).

images{
    "id": int,
    "file_name": str,
    "height": int,
    "width": int
}

categories[{
    "id": int,
    "name": str,
    "supercategory": str,
    "isthing": 0 or 1,
    "color": [R,G,B],
}]

annotation{
    "image_id": int, 
    "file_name": str,
    "segments_info": [segment_info],
}

segment_info{
    "id": int,
    "category_id": int,
    "area": int,
    "bbox": [x,y,width,height],
    "iscrowd": 0 or 1,
}
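The matching rule annotation.image_id==image.id amounts to a dictionary lookup. The following is a minimal sketch; the sample records are made up, and pair_annotations is a hypothetical helper.

```python
def pair_annotations(images, annotations):
    """Pair every annotation with the image whose id equals its image_id."""
    by_id = {image["id"]: image for image in images}
    pairs = []
    for ann in annotations:
        image = by_id.get(ann["image_id"])
        if image is None:
            raise KeyError(f"no image with id {ann['image_id']}")
        pairs.append((image, ann))
    return pairs


# Made-up sample data shaped like the structures above.
images = [
    {"id": 0, "file_name": "frame_000000.jpg", "height": 720, "width": 1280},
    {"id": 1, "file_name": "frame_000001.jpg", "height": 720, "width": 1280},
]
annotations = [
    {"image_id": 1, "segments_info": [
        {"id": 7, "category_id": 2, "area": 4500,
         "bbox": [735, 533, 90, 50], "iscrowd": 0},
    ]},
]

pairs = pair_annotations(images, annotations)
for image, ann in pairs:
    print(image["file_name"], len(ann["segments_info"]))
```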

Image Captioning

annotation{
    "id": int, 
    "image_id": int, 
    "caption": str,
}

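Mask format

PNG class mask + instance mask

The export contains a PNG mask image for every frame and a colormap.txt that lists the value of each color.

TFRecord format

The export contains: "description" + .tfrecord and label_map.pbtxt

A TFRecord is a binary file. It is harder to read than the other formats, but it makes better use of memory, is easier to copy and move, and does not need a separate label file.

The graph structure of a TensorFlow model can be saved as a .pb, .pbtxt, or .meta file; of these, only the .pbtxt file is human-readable.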

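Keyboard shortcuts

Shortcut	Action
N	Start/stop drawing
Enter	Change the color of the box
Q	Set the occluded property on the current box
L	Lock the current box
T+L	Lock all boxes in the current frame
Esc	Exit drawing mode
H	Hide the current box
T+H	Hide all boxes
Delete	Delete the current box; in interpolation mode, delete all boxes of the same object
F	Next frame
D	Previous frame
Shift+B/Alt+B	Increase/decrease image brightness
Shift+C/Alt+C	Increase/decrease image contrast
Shift+S/Alt+S	Increase/decrease image saturation
Ctrl+S	Save
M	Enter/apply merge mode
R	In interpolation mode, jump to the next frame of the current object
E	In interpolation mode, jump to the previous frame of the current object
Shift+Enter	Zoom mode, which makes attributes easier to see
Shift+Tab	In zoom mode, switch between instances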
