Drawing the Street View Text Dataset's Ground Truth onto the Images

chenxp2311

Category: Machine Learning. Tags: natural scene text, OCR, SVT

Article link: https://blog.csdn.net/u010167269/article/details/52934599


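Among natural scene image datasets, Street View Text (SVT) is one of the best known. All of its images come from Google Street View; they are low in resolution, and the text in them varies widely in appearance.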

The dataset's ground truth comes as two XML files, train.xml and test.xml. An excerpt:

<?xml version="1.0" encoding="utf-8"?>
<tagset>
   <image>
      <imageName>img/14_03.jpg</imageName>
      <address>341 Southwest 10th Avenue Portland OR</address>
      <lex>LIVING,ROOM,THEATERS,KENNY,ZUKE,DELICATESSEN,CLYDE,COMMON,ACE,HOTEL,PORTLAND,ROSE,CITY,BOOKS,STUMPTOWN,COFFEE,ROASTERS,RED,CAP,GARAGE,FISH,GROTTO,SEAFOOD,RESTAURANT,AURA,RESTAURANT,LOUNGE,ROCCO,PIZZA,PASTA,BUFFALO,EXCHANGE,MARK,SPENCER,LIGHT,FEZ,BALLROOM,READING,FRENZY,ROXY,SCANDALS,MARTINOTTI,CAFE,DELI,CROWSENBERG,HALF</lex>  
      <Resolution x="1280" y="880"/>
      <taggedRectangles>
         <taggedRectangle height="75" width="236" x="375" y="253">
            <tag>LIVING</tag>
         </taggedRectangle>
         <taggedRectangle height="76" width="175" x="639" y="272">
            <tag>ROOM</tag>
         </taggedRectangle>
         <taggedRectangle height="87" width="281" x="839" y="283">
            <tag>THEATERS</tag>
         </taggedRectangle>
      </taggedRectangles>
   </image>

......
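
To make the structure concrete, here is a minimal sketch that parses an abbreviated copy of the excerpt from an inline string. The helper name `read_boxes` is hypothetical, and a closing `</tagset>` has been added so the fragment is well-formed on its own. The real files contain more fields per image.

```python
from xml.etree import ElementTree

# Abbreviated copy of the excerpt, closed with </tagset> so it parses on its own.
SAMPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<tagset>
   <image>
      <imageName>img/14_03.jpg</imageName>
      <taggedRectangles>
         <taggedRectangle height="75" width="236" x="375" y="253">
            <tag>LIVING</tag>
         </taggedRectangle>
         <taggedRectangle height="76" width="175" x="639" y="272">
            <tag>ROOM</tag>
         </taggedRectangle>
      </taggedRectangles>
   </image>
</tagset>
"""


def read_boxes(xml_bytes):
    """Return {image name: [(tag, x, y, width, height), ...]} (hypothetical helper)."""
    root = ElementTree.fromstring(xml_bytes)
    boxes = {}
    for image in root.iter('image'):
        name = image.find('imageName').text
        rects = []
        for rect in image.iter('taggedRectangle'):
            attrs = rect.attrib  # attribute values arrive as strings
            rects.append((rect.find('tag').text,
                          int(attrs['x']), int(attrs['y']),
                          int(attrs['width']), int(attrs['height'])))
        boxes[name] = rects
    return boxes


boxes = read_boxes(SAMPLE_XML.encode('utf-8'))
print(boxes)
```

Each `taggedRectangle` stores the top-left corner together with a width and a height, so the opposite corner has to be computed before a box can be drawn.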

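So how do we draw the ground truth boxes listed here onto the original images?

I wrote a Python script that parses the XML file and draws the boxes on the original images. It also serves as an example of how to parse an XML file.

The code is as follows: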


import os
from PIL import Image
from PIL import ImageDraw

from xml.etree import ElementTree

train_XML_src_dir = '/media/chenxp/Datadisk/ocr_dataset/StreetViewTextDataset/svt1/train.xml'
test_XML_src_dir = '/media/chenxp/Datadisk/ocr_dataset/StreetViewTextDataset/svt1/test.xml'

images_src_dir = '/media/chenxp/Datadisk/ocr_dataset/StreetViewTextDataset/svt1/'
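images_save_dir = '/media/chenxp/Datadisk/ocr_dataset/StreetViewTextDataset/svt1/img_groundtruth'

# Make sure the output directory exists before any image is saved
if not os.path.isdir(images_save_dir):
    os.makedirs(images_save_dir)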

with open(train_XML_src_dir) as f:
    tree = ElementTree.parse(f)


train_fd = open('train.txt', 'w')

for node in tree.iter('image'):
    img_name = []  # file name used when saving the annotated image
    for each_image in node:
        if each_image.tag == 'imageName':
            img_name.append(each_image.text[4::])  # strip the leading 'img/' to get the file name

            tmp_img = Image.open(images_src_dir + each_image.text)
            tmp_draw = ImageDraw.Draw(tmp_img)

            train_fd.write(images_src_dir + each_image.text + '\n')  # write the image path to the txt file

        if each_image.tag == 'taggedRectangles':
            count = 0 # count the number of taggedRectangle
            x = []; y = []
            width = []; height = []
            for each_taggedRect in each_image:
                count = count + 1
                tmp_dict = each_taggedRect.attrib  # box attributes, returned as a dict of strings
                x.append(tmp_dict['x']); tmp_x = int(tmp_dict['x'])
                y.append(tmp_dict['y']); tmp_y = int(tmp_dict['y'])
                width.append(tmp_dict['width']); tmp_w = int(tmp_dict['width'])
                height.append(tmp_dict['height']); tmp_h = int(tmp_dict['height'])
                tmp_draw.polygon((tmp_x, tmp_y, tmp_x + tmp_w, tmp_y, tmp_x + tmp_w, tmp_y + tmp_h, tmp_x, tmp_y + tmp_h), outline='red')

            tmp_img.save(os.path.join(images_save_dir, img_name[0]))
            train_fd.write(str(count) + '\n')

            for i in range(len(x)):
                train_fd.write(str(x[i]) + ',' + str(y[i]) + ',' + str(width[i]) + ',' + str(height[i]) + ',\n')

train_fd.close()
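
The `polygon` call in the script turns each box's top-left corner, width, and height into four corner points. The sketch below isolates that conversion. The helper names `box_to_corners` and `box_to_polygon` are hypothetical and are not part of the script.

```python
def box_to_corners(x, y, width, height):
    """Convert an SVT box (top-left x, y plus width, height) to (x0, y0, x1, y1)."""
    return (x, y, x + width, y + height)


def box_to_polygon(x, y, width, height):
    """Return the four corners as a flat tuple, in the same order the script uses:
    top-left, top-right, bottom-right, bottom-left."""
    x0, y0, x1, y1 = box_to_corners(x, y, width, height)
    return (x0, y0, x1, y0, x1, y1, x0, y1)


# The LIVING box from the XML excerpt: x=375, y=253, width=236, height=75
living_corners = box_to_corners(375, 253, 236, 75)
living_polygon = box_to_polygon(375, 253, 236, 75)
print(living_corners)   # (375, 253, 611, 328)
print(living_polygon)   # (375, 253, 611, 253, 611, 328, 375, 328)
```

The two-corner form is what Pillow's `ImageDraw.Draw.rectangle` accepts, so it could be used in place of the four-point polygon if only axis-aligned boxes are needed.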

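Running the script produces two kinds of output: the images with the ground truth boxes drawn on them, and a txt file that I need for my own work. The txt file has the following format: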

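/home/chenxp/.../.../***.jpg # absolute path of the image
3                            # number of ground truth boxes in this image
x,y,w,h,                     # one line per box: top-left x, top-left y, width, height (the script writes a trailing comma)
x,y,w,h,
...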
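
For reading this file back, a minimal sketch follows. It assumes every image path is followed by a count line and exactly that many box lines, which is what the script writes when an image has a `taggedRectangles` element. The function name `parse_annotations` and the sample path are hypothetical.

```python
def parse_annotations(text):
    """Parse the txt layout into [(image_path, [(x, y, w, h), ...]), ...]."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    i = 0
    while i < len(lines):
        path = lines[i]
        count = int(lines[i + 1])
        boxes = []
        for ln in lines[i + 2:i + 2 + count]:
            # Drop empty fields so the trailing comma written by the script is harmless
            fields = [f for f in ln.split(',') if f.strip()]
            boxes.append(tuple(int(f) for f in fields))
        records.append((path, boxes))
        i += 2 + count
    return records


# Hypothetical sample in the layout the script writes
SAMPLE_TXT = """/data/svt1/img/14_03.jpg
3
375,253,236,75,
639,272,175,76,
839,283,281,87,
"""

records = parse_annotations(SAMPLE_TXT)
print(records)
```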

A few example images:

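The original UCSD page for SVT, http://vision.ucsd.edu/~kai/svt/, is no longer reachable (at least, I tried several times today and none of the attempts succeeded).

For that reason I have uploaded the SVT dataset to cloud storage, together with the annotated images I generated: Street View Text Dataset

If you would rather download SVT yourself, you can register and download it here: http://tc11.cvc.uab.es/datasets/SVT_1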
