Introduction to the Visual Genome Project and Dataset

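I am doing preliminary research for a VQA project, so while the GPU is running at full load I am writing down some notes on this dataset: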

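The official website defines it as:

Visual Genome is a dataset, a knowledge base, an ongoing effort to connect structured image concepts to language.

It was built through crowdsourcing, an approach proposed by Michael Bernstein, a colleague of Fei-Fei Li.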


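As of today (2016/12/08) it contains:

108,077 images

5.4 million region descriptions (Region Descriptions)

1.7 million visual question answers (Visual Question Answers)

3.8 million object instances (Object Instances)

2.8 million attributes (Attributes)

2.3 million relationships (Relationships)

Everything is mapped to WordNet synsets

A more detailed description of the dataset will be added later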


References

http://visualgenome.org/

The official website, which hosts the original paper


http://www.ccf.org.cn/sites/ccf/nry.jsp?contentId=2912552761248

This site hosts a Chinese translation of the paper



Each entry below lists, in order: the file, what it returns, and its format
images[2].zip
Return all images in jpg format.
IMAGE_ID.jpg,
            


image_data.json.zip
Return metadata about all images (basic information for each image)
Name         Type    Description
image_id    int    ID of image
url    hyperlink string    Visual Genome-hosted image URL
width    int    width of image in px
height    int    height of image in px
coco_id    int    ID of the image in the coco dataset
flickr_id    int    ID of the image in the flickr dataset
The coco_id and flickr_id fields show that the dataset draws on images from the COCO and Flickr datasets.

[...
      {
            "image_id": 2412112,
            "url": "https://cs.stanford.edu/people/rak248/VG_100K/2370463.jpg",
            "width": 500,
            "height": 281,
            "coco_id": 547168,
            "flickr_id": 8505158818
      }
...]
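
To make the record layout above concrete, here is a minimal sketch of indexing the metadata by image_id. The sample record is copied from the example above; in practice one would presumably json.load() the unzipped image_data.json instead, and the helper name index_by_image_id is only illustrative.

```python
import json

# One record copied from the example above. A real run would load the
# unzipped image_data.json from disk instead (file location is an assumption).
sample = json.loads("""
[{"image_id": 2412112,
  "url": "https://cs.stanford.edu/people/rak248/VG_100K/2370463.jpg",
  "width": 500, "height": 281,
  "coco_id": 547168, "flickr_id": 8505158818}]
""")

def index_by_image_id(records):
    """Map image_id -> metadata record for constant-time lookup."""
    return {rec["image_id"]: rec for rec in records}

images = index_by_image_id(sample)
meta = images[2412112]
# File name is the last path component of the hosted URL.
file_name = meta["url"].rsplit("/", 1)[-1]
print(meta["width"], meta["height"], file_name)
```

Keying on image_id is convenient because every other file in the dataset refers to images by that same field.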
                    

region_descriptions.json.zip

Return all region descriptions

Name     Type     Description
image_id    int    ID of image containing region
regions    object array    Array of region descriptions for this image

    .region_id    int    ID of region description
    .x    int    x-coordinate of region bounding box
    .y    int    y-coordinate of region bounding box
    .width    int    width of region bounding box
    .height    int    height of region bounding box
    .phrase    str    region description phrase
    .synsets    object array    synsets (synonym sets) in the description
        .synset_name    str    synset name
        .entity_name    str    string from phrase
        .entity_idx_start    int    index where synset starts in the phrase
        .entity_idx_end    int    index where synset ends in the phrase


[...
      {
          "image_id": 2407890,
          "regions": [...
              {
                  "region_id": 1353,
                  "x": 117,
                  "y": 79,
                  "width": 249,
                  "height": 107,
                  "phrase": "a cat sitting on a table.",
                  "synsets": [...
                      {
                          "synset_name": "cat.n.01",
                          "entity_name": "cat",
                          "entity_idx_start": 2,
                          "entity_idx_end": 5
                      },
                  ...]
              },
              {
                  "region_id": 1354,
                  "x": 116,
                  "y": 29,
                  "width": 239,
                  "height": 135,
                  "phrase": "a white cat with a tan tail and face markings",
                  "synsets": [...
                  ...]
              },
          ...]
      },
      {
          "image_id": 2407890,
          "regions": [...
          ...]
      },
...]
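
As a rough illustration of how the region fields fit together, the sketch below converts a region box to corner coordinates and slices the entity text out of the phrase. It assumes that x/y mark the top-left corner and that entity_idx_start/entity_idx_end behave like a Python slice; the example record is consistent with that ("cat" sits at characters 2 to 5), but neither assumption is stated in the table. The helper names are hypothetical.

```python
# Region record copied from the example above.
region = {
    "region_id": 1353, "x": 117, "y": 79, "width": 249, "height": 107,
    "phrase": "a cat sitting on a table.",
    "synsets": [{"synset_name": "cat.n.01", "entity_name": "cat",
                 "entity_idx_start": 2, "entity_idx_end": 5}],
}

def region_corners(r):
    """Return (left, top, right, bottom), assuming x/y is the top-left corner."""
    return (r["x"], r["y"], r["x"] + r["width"], r["y"] + r["height"])

def synset_spans(r):
    """Return (synset_name, text) pairs, slicing the phrase by the index fields."""
    return [(s["synset_name"],
             r["phrase"][s["entity_idx_start"]:s["entity_idx_end"]])
            for s in r["synsets"]]

box = region_corners(region)
spans = synset_spans(region)
print(box, spans)
```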
                    

question_answers.json.zip

All visual question answers

Name     Type     Description
image_id    int    ID of image
qas    object array    list of qas for the image

    .qa_id    int    ID of question answer
    .question    str    question
    .answer    str    answer
    .question_synsets    object array    array of synsets in the question
        .synset_name    str    synset name
        .entity_name    str    string from question
        .entity_idx_start    int    index where synset starts in the question
        .entity_idx_end    int    index where synset ends in the question
    .answer_synsets    object array    array of synsets in the answer
        .synset_name    str    synset name
        .entity_name    str    string from answer
        .entity_idx_start    int    index where synset starts in the answer
        .entity_idx_end    int    index where synset ends in the answer


[...
      {
          "image_id": 2317993,
          "qas": [...
              {
                  "qa_id": 912402,
                  "question": "Where are the clouds?",
                  "answer": "sky",
                  "question_synsets": [...
                      {
                          "synset_name": "cloud.n.01",
                          "entity_name": "cloud",
                          "entity_idx_start": 14,
                          "entity_idx_end": 20
                      },
                  ...],
                  "answer_synsets": [...
                      {
                          "synset_name": "sky.n.01",
                          "entity_name": "sky",
                          "entity_idx_start": 0,
                          "entity_idx_end": 3
                      },
                  ...]
              },
          ...]
      },
...]
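
For a VQA pipeline the nested structure above usually has to be flattened into one row per question. The following is a small sketch of that step using the sample record; iter_qa_rows is a hypothetical helper, not part of any official tooling.

```python
# Record copied from the example above.
qa_data = [{
    "image_id": 2317993,
    "qas": [{
        "qa_id": 912402,
        "question": "Where are the clouds?",
        "answer": "sky",
        "question_synsets": [{"synset_name": "cloud.n.01", "entity_name": "cloud",
                              "entity_idx_start": 14, "entity_idx_end": 20}],
        "answer_synsets": [{"synset_name": "sky.n.01", "entity_name": "sky",
                            "entity_idx_start": 0, "entity_idx_end": 3}],
    }],
}]

def iter_qa_rows(data):
    """Yield one (image_id, qa_id, question, answer) tuple per QA pair."""
    for image in data:
        for qa in image["qas"]:
            yield image["image_id"], qa["qa_id"], qa["question"], qa["answer"]

rows = list(iter_qa_rows(qa_data))
print(rows)
```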
                    

objects.json.zip

All object instances

Name     Type     Description
image_id    int    ID of image
objects    object array    Array of object instances for this image

    .object_id    int    ID of object
    .x    int    x-coordinate of object bounding box
    .y    int    y-coordinate of object bounding box
    .w    int    width of object bounding box
    .h    int    height of object bounding box
    .name    str    name of object
    .synsets    str array    synset names associated with this object


[...
      {
          "image_id": 2,
          "objects": [...
              {
                  "object_id": 1023847,
                  "x": 405,
                  "y": 34,
                  "w": 78,
                  "h": 438,
                  "name": "pole",
                  "synsets": ["pole.n.01"]
              },
              {
                  "object_id": 1023836,
                  "x": 239,
                  "y": 347,
                  "w": 136,
                  "h": 126,
                  "name": "car",
                  "synsets": ["car.n.01"]
              },
          ...]
      },
...]
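
A quick way to get a feel for the object annotations is to count object names and compare box sizes. The sketch below does both on the sample record; the helper names are illustrative only. Note that objects use w/h for box size, while regions use width/height.

```python
from collections import Counter

# Record copied from the example above.
objects_data = [{
    "image_id": 2,
    "objects": [
        {"object_id": 1023847, "x": 405, "y": 34, "w": 78, "h": 438,
         "name": "pole", "synsets": ["pole.n.01"]},
        {"object_id": 1023836, "x": 239, "y": 347, "w": 136, "h": 126,
         "name": "car", "synsets": ["car.n.01"]},
    ],
}]

def name_counts(data):
    """Count how often each object name occurs across all images."""
    return Counter(obj["name"] for image in data for obj in image["objects"])

def largest_object(image):
    """Return the object with the largest bounding-box area (w * h)."""
    return max(image["objects"], key=lambda o: o["w"] * o["h"])

counts = name_counts(objects_data)
biggest = largest_object(objects_data[0])
print(counts, biggest["name"])
```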
                    

attributes.json.zip

All attributes in the dataset

Name    Type    Description
image_id    int    ID of image
attributes    object array    Array of attributes with object instances for this image

    .object_id    int    ID of object
    .x    int    x-coordinate of object bounding box
    .y    int    y-coordinate of object bounding box
    .w    int    width of object bounding box
    .h    int    height of object bounding box
    .name    str    name of object
    .synsets    str array    synset names associated with this object
    .attributes    str array    list of attributes associated with this object


[...
      {
          "image_id": 2,
          "attributes": [...
              {
                  "object_id": 1023847,
                  "x": 405,
                  "y": 34,
                  "w": 78,
                  "h": 438,
                  "name": "pole",
                  "synsets": ["pole.n.01"],
                  "attributes": ["brown"]
              },
              {
                  "object_id": 1023836,
                  "x": 239,
                  "y": 347,
                  "w": 136,
                  "h": 126,
                  "name": "car",
                  "synsets": ["car.n.01"],
                  "attributes": ["red", "broken"]
              },
          ...]
      },
...]
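
Since attributes hang off individual objects, a common need is the reverse lookup: which objects carry a given attribute. Below is a minimal sketch of such an inverted index on the sample record. The helper name is hypothetical, and using .get() for the attribute list is a defensive assumption in case some objects have none.

```python
from collections import defaultdict

# Record copied from the example above.
attr_data = [{
    "image_id": 2,
    "attributes": [
        {"object_id": 1023847, "x": 405, "y": 34, "w": 78, "h": 438,
         "name": "pole", "synsets": ["pole.n.01"], "attributes": ["brown"]},
        {"object_id": 1023836, "x": 239, "y": 347, "w": 136, "h": 126,
         "name": "car", "synsets": ["car.n.01"], "attributes": ["red", "broken"]},
    ],
}]

def objects_by_attribute(data):
    """Map attribute -> list of (image_id, object_id, object name)."""
    index = defaultdict(list)
    for image in data:
        for obj in image["attributes"]:
            # .get() guards against objects without an attribute list.
            for attr in obj.get("attributes", []):
                index[attr].append((image["image_id"], obj["object_id"], obj["name"]))
    return dict(index)

attr_index = objects_by_attribute(attr_data)
print(attr_index["red"])
```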
                    

relationships.json.zip

All relationships

Name    Type    Description
image_id    int    ID of image
relationships    object array    array of relationships in the image

    .relationship_id    int    ID of relationship
    .predicate    str    predicate of the relationship
    .synsets    str array    synset names associated with the predicate
    .subject    object    subject of the relationship
        .object_id    int    ID of object
        .x    int    x-coordinate of object bounding box
        .y    int    y-coordinate of object bounding box
        .w    int    width of object bounding box
        .h    int    height of object bounding box
        .name    str    name of object
        .synsets    str array    synset names associated with this object
    .object    object    object of the relationship
        .object_id    int    ID of object
        .x    int    x-coordinate of object bounding box
        .y    int    y-coordinate of object bounding box
        .w    int    width of object bounding box
        .h    int    height of object bounding box
        .name    str    name of object
        .synsets    str array    synset names associated with this object


[...
      {
          "image_id": 2,
          "relationships": [...
              {
                  "relationship_id": 15947,
                  "predicate": "wears",
                  "synsets": ["wear.v.01"],
                  "subject": {
                      "object_id": 1023838,
                      "x": 324,
                      "y": 320,
                      "w": 142,
                      "h": 255,
                      "name": "man",
                      "synsets": ["man.n.01"]
                  },
                  "object": {
                      "object_id":  5071,
                      "x": 359,
                      "y": 362,
                      "w": 72,
                      "h": 81,
                      "name": "backpack",
                      "synsets": ["backpack.n.01"]
                  },
              },
          ...],
      }
...]
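
Each relationship above is effectively a (subject, predicate, object) triple with full object records embedded on both sides. The following sketch pulls the triples out by name, using the sample record; iter_triples is an illustrative helper.

```python
# Record copied from the example above.
rel_data = [{
    "image_id": 2,
    "relationships": [{
        "relationship_id": 15947,
        "predicate": "wears",
        "synsets": ["wear.v.01"],
        "subject": {"object_id": 1023838, "x": 324, "y": 320, "w": 142, "h": 255,
                    "name": "man", "synsets": ["man.n.01"]},
        "object": {"object_id": 5071, "x": 359, "y": 362, "w": 72, "h": 81,
                   "name": "backpack", "synsets": ["backpack.n.01"]},
    }],
}]

def iter_triples(data):
    """Yield (subject name, predicate, object name) for every relationship."""
    for image in data:
        for rel in image["relationships"]:
            yield rel["subject"]["name"], rel["predicate"], rel["object"]["name"]

triples = list(iter_triples(rel_data))
print(triples)
```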
                    

synsets.json.zip

All the synsets and their descriptions

Name    Type    Description

synset_name    str    unique synset name
synset_definition    str    definition of synset according to WordNet


[...
      {
          "synset_name": "phonograph_record.n.01",
          "synset_definition": "sound recording consisting of a disk with a continuous groove; used to reproduce music by rotating while a phonograph needle tracks in the groove",
      },
      {
          "synset_name": "truck.n.01",
          "synset_definition": "an automotive vehicle suitable for hauling",
      }
...]
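
Because every other file refers to synsets only by name, a lookup table from name to definition is handy. The sketch below builds one from the two sample entries and also splits a name into its parts, assuming the usual WordNet lemma.pos.sense-number naming that the examples follow. Both helper names are hypothetical.

```python
# Entries copied from the example above.
synsets = [
    {"synset_name": "phonograph_record.n.01",
     "synset_definition": "sound recording consisting of a disk with a continuous "
                          "groove; used to reproduce music by rotating while a "
                          "phonograph needle tracks in the groove"},
    {"synset_name": "truck.n.01",
     "synset_definition": "an automotive vehicle suitable for hauling"},
]

# synset_name is described as unique, so it can serve as a dictionary key.
definitions = {s["synset_name"]: s["synset_definition"] for s in synsets}

def split_synset_name(name):
    """Split 'lemma.pos.nn' into (lemma, pos, sense number)."""
    lemma, pos, sense = name.rsplit(".", 2)
    return lemma, pos, int(sense)

parts = split_synset_name("phonograph_record.n.01")
print(definitions["truck.n.01"], parts)
```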
                    

region_graphs.json.zip

All the region graphs

Name    Type    Description
image_id    int    ID of image containing region
regions    object array    Array of region descriptions for this image

    .region_id    int    ID of region description
    .x    int    x-coordinate of region bounding box
    .y    int    y-coordinate of region bounding box
    .width    int    width of region bounding box
    .height    int    height of region bounding box
    .phrase    str    region description phrase
    .synsets    object array    synsets in the description
        .synset_name    str    synset name
        .entity_name    str    string from phrase
        .entity_idx_start    int    index where synset starts in the phrase
        .entity_idx_end    int    index where synset ends in the phrase
    .objects    object array    Array of object instances for this region
        .object_id    int    ID of object
        .x    int    x-coordinate of object bounding box
        .y    int    y-coordinate of object bounding box
        .w    int    width of object bounding box
        .h    int    height of object bounding box
        .name    str    name of object
        .synsets    str array    synset names associated with this object
    .relationships    object array    array of relationships in the region
        .relationship_id    int    ID of relationship
        .predicate    str    predicate of the relationship
        .synsets    str array    synset names associated with the predicate
        .subject_id    int    ID of subject (found in objects list)
        .object_id    int    ID of object (found in objects list)


[...
      {
          "image_id": 2407890,
          "regions": [...
              {
                  "region_id": 1353,
                  "x": 117,
                  "y": 79,
                  "width": 249,
                  "height": 107,
                  "phrase": "a cat sitting on a table.",
                  "synsets": [...
                      {
                          "synset_name": "cat.n.01",
                          "entity_name": "cat",
                          "entity_idx_start": 2,
                          "entity_idx_end": 5
                      },
                  ...],
                  "objects": [...
                      {
                          "object_id": 1023838,
                          "x": 324,
                          "y": 320,
                          "w": 142,
                          "h": 255,
                          "name": "cat",
                          "synsets": ["cat.n.01"]
                      },
                      {
                          "object_id":  5071,
                          "x": 359,
                          "y": 362,
                          "w": 72,
                          "h": 81,
                          "name": "table",
                          "synsets": ["table.n.01"]
                      },
                  ...],
                  "relationships": [...
                      {
                      "relationship_id": 15947,
                      "predicate": "wears",
                      "synsets": ["wear.v.01"],
                      "subject_id": 1023838,
                      "object_id":  5071,
                      }
                  ...]
              },
          ...]
      },
...]
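
In a region graph the relationships point at objects by ID rather than embedding them, so it can be worth checking that every reference resolves inside the same region. The sketch below is one possible sanity check; the record is trimmed from the example above (bounding boxes omitted, predicate kept exactly as shown there), and dangling_relationships is a hypothetical helper.

```python
# Region trimmed from the example above; bounding boxes omitted for brevity.
region_graph = {
    "region_id": 1353,
    "phrase": "a cat sitting on a table.",
    "objects": [
        {"object_id": 1023838, "name": "cat", "synsets": ["cat.n.01"]},
        {"object_id": 5071, "name": "table", "synsets": ["table.n.01"]},
    ],
    "relationships": [
        {"relationship_id": 15947, "predicate": "wears", "synsets": ["wear.v.01"],
         "subject_id": 1023838, "object_id": 5071},
    ],
}

def dangling_relationships(region):
    """Return IDs of relationships whose subject or object is not in the region."""
    known = {obj["object_id"] for obj in region["objects"]}
    return [rel["relationship_id"] for rel in region["relationships"]
            if rel["subject_id"] not in known or rel["object_id"] not in known]

print(dangling_relationships(region_graph))
```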
                  

scene_graphs.json.zip

All the scene graphs

Name    Type    Description
image_id    int    ID of image containing region
objects    object array    Array of object instances for this image

    .object_id    int    ID of object
    .x    int    x-coordinate of object bounding box
    .y    int    y-coordinate of object bounding box
    .w    int    width of object bounding box
    .h    int    height of object bounding box
    .name    str    name of object
    .synsets    str array    synset names associated with this object
relationships    object array    array of relationships in the image
    .relationship_id    int    ID of relationship
    .predicate    str    predicate of the relationship
    .synsets    str array    synset names associated with the predicate
    .subject_id    int    ID of subject (found in objects list)
    .object_id    int    ID of object (found in objects list)


[...
      {
          "image_id": 2407890,
          "objects": [...
              {
                  "object_id": 1023838,
                  "x": 324,
                  "y": 320,
                  "w": 142,
                  "h": 255,
                  "name": "cat",
                  "synsets": ["cat.n.01"]
              },
              {
                  "object_id":  5071,
                  "x": 359,
                  "y": 362,
                  "w": 72,
                  "h": 81,
                  "name": "table",
                  "synsets": ["table.n.01"]
              },
          ...],
          "relationships": [...
              {
              "relationship_id": 15947,
              "predicate": "wears",
              "synsets": ["wear.v.01"],
              "subject_id": 1023838,
              "object_id":  5071,
              }
          ...]
      },
...]
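
A scene graph in this layout maps naturally onto an adjacency list: objects are nodes and each relationship is a directed, labelled edge from subject_id to object_id. The sketch below builds that structure for the sample record (bounding boxes omitted, predicate kept as in the example); build_adjacency is an illustrative helper, not an official API.

```python
# Scene graph trimmed from the example above; bounding boxes omitted for brevity.
scene_graph = {
    "image_id": 2407890,
    "objects": [
        {"object_id": 1023838, "name": "cat", "synsets": ["cat.n.01"]},
        {"object_id": 5071, "name": "table", "synsets": ["table.n.01"]},
    ],
    "relationships": [
        {"relationship_id": 15947, "predicate": "wears", "synsets": ["wear.v.01"],
         "subject_id": 1023838, "object_id": 5071},
    ],
}

def build_adjacency(graph):
    """Return (names, adjacency): node labels and outgoing (predicate, target) edges."""
    names = {obj["object_id"]: obj["name"] for obj in graph["objects"]}
    adjacency = {object_id: [] for object_id in names}
    for rel in graph["relationships"]:
        adjacency[rel["subject_id"]].append((rel["predicate"], rel["object_id"]))
    return names, adjacency

names, adjacency = build_adjacency(scene_graph)
for source, edges in adjacency.items():
    for predicate, target in edges:
        print(names[source], predicate, names[target])
```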
                  

qa_to_region_mapping.json.zip

Mapping from qa to their corresponding region descriptions


{...
      QA_ID: REGION_DESCRIPTION_ID,
      "1885736": "2072251"
...}
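
The mapping above can be used to jump from a question to the region it was written about. In the example both the key and the value are strings, while qa_id and region_id are integers in the other files, so the sketch below converts on the way in and out. That type mismatch is an assumption based only on this one example, and region_for_qa is a hypothetical helper.

```python
# Entry copied from the example above; keys and values appear as strings there.
qa_to_region = {"1885736": "2072251"}

def region_for_qa(qa_id, mapping):
    """Return the region_id (as int) linked to qa_id, or None if unmapped."""
    region_id = mapping.get(str(qa_id))
    return int(region_id) if region_id is not None else None

print(region_for_qa(1885736, qa_to_region))
```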

