Semi-automatic labeling for OCR tasks in Label Studio, end to end

花兰兮

Last modified 2023-11-23 21:25:50

Tags: ocr python

First published 2023-11-22 16:57:15

Article link: https://blog.csdn.net/qq_34364342/article/details/134550191

# Standing on the shoulders of giants: the official docs, reposted, plus a log of the pitfalls I hit

I. Environment setup

1. Installation (someone else did this part for me, so I'm skipping the details)

pip install label-studio

II. Creating the ML backend

1. Clone the whole repository:

git clone https://github.com/HumanSignal/label-studio-ml-backend.git

This gives you a local label-studio-ml-backend directory.

2. Set up the environment

cd label-studio-ml-backend/
pip install -U -e .

3. Create a new backend

label-studio-ml create my_ml_backend

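This creates the following directory layout:

my_ml_backend/
├── Dockerfile # used together with docker-compose.yml to run the ML backend in Docker
├── docker-compose.yml
├── model.py # the main file, where you implement your own training and inference logic
├── _wsgi.py # helper file for running the ML backend via Docker (no need to modify it)
├── README.md # instructions on how to run the ML backend
└── requirements.txt # Python dependencies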

4. Run the backend server and start Label Studio

(1) Starting with Docker

cd my_ml_backend/
docker-compose up

The ML backend server is now available at http://localhost:9090. Use this URL when connecting the ML backend to Label Studio.
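Before wiring the backend into Label Studio, it can help to confirm that it is actually answering. A minimal sketch using only the standard library; it assumes the backend exposes a /health endpoint that returns JSON containing "status": "UP" (true of the label-studio-ml versions I have seen, but check yours), and the helper name backend_is_up is my own:

```python
import json
import urllib.error
import urllib.request


def backend_is_up(base_url="http://localhost:9090", timeout=3.0):
    """Return True if the ML backend answers /health with status UP (assumed endpoint)."""
    health_url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body: treat as "not up".
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"
```

Calling backend_is_up() right after docker-compose up is a quick way to tell a backend that is still starting from one that failed.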

label-studio start

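Label Studio starts at http://localhost:8080.

(2) Starting without Docker, for debugging

Label Studio can automatically generate all the configuration and scripts needed to run an ML backend from a newly created model.

Name your ML backend my_ml_backend and initialize the ML backend directory ./my_ml_backend from the command line: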

label-studio-ml init my_ml_backend \
--script ./my_ml_backend/model.py  create my_ml_backend --force

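Every initialization creates a folder, so during debugging you will run into a "folder already exists" error. That is why the second line adds --force, which forces the directory to be created.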

label-studio-ml start my_ml_backend

Backends belonging to different users should be given different ports... otherwise multiple backends conflict and fail to start.

label-studio-ml start my_ml_backend -p 9091

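By default the server starts at http://localhost:9090 (or on the port passed with -p, 9091 in the command above) and prints its logs to the console.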
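Since every backend needs its own port, checking which ports are free before choosing a value for -p can save a failed start. A minimal sketch using only the standard library; the helper names is_port_free and first_free_port and the 9090-9099 default range are my own choices, not part of label-studio-ml:

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": another backend owns the port.
            return False
    return True


def first_free_port(start=9090, end=9100, host="127.0.0.1"):
    """Return the first free port in [start, end), or raise if none is free."""
    for port in range(start, end):
        if is_port_free(port, host):
            return port
    raise RuntimeError(f"no free port between {start} and {end - 1}")
```

The value returned by first_free_port() can then be passed to label-studio-ml start my_ml_backend -p. Another process could still grab the port between the check and the start, so treat this as a convenience, not a guarantee.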

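III. Calling the ML backend from Label Studio

1. Open port 8080 of the server

http://localhost:8080

2. Register a personal account with an email address and log in

Skipped.

3. After creating a project, go to Settings > Machine Learning and click Add model:

Then enter the backend's URL and tick both options, and that's it:

4. The labeling interface loads the model's predictions automatically

In the labeling interface there is now an INITIAL tab holding the labels produced by the backend model (read-only), plus a copy that you can edit.

Once a sample has been labeled by hand and submitted with the button at the bottom right (Submit the first time, Update afterwards), the number of annotations is displayed, which shows whether the sample has been through manual labeling.

In the data view you can also see both the manually annotated data and the data labeled automatically by the OCR backend: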

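IV. Developing a model backend for a specific task and model

As mentioned in Part II, my_ml_backend/model.py is the main file; inference and training (which I have not tried yet) are implemented there.

Using Baidu's PaddleOCR as the example, here are the main pitfalls I ran into.

1. Choose the project type that matches your task by picking a template:

Here, of course, I chose the OCR task.

After you pick a template the interface returns to the setup page, where you configure labels and so on; the details vary by task. Type aa under Add label names and click Add, and a new aa label appears on the right.

2. Label Studio's data structure

To develop the ML backend you also need to look at the Code tab in the screenshot above and at the exported annotation data.

Figure 1: the front-end components used for labeling. The component types here determine the data format of the annotation results.

Figure 2: the annotation data produced with the settings above:

How do the two map onto each other?

1. <Image name="image" value="$ocr"> in Figure 1 corresponds to red box 1 in Figure 2.

2. Figure 1 then has three more blocks: <Labels>, <Rectangle> or <Polygon>, and <TextArea>. They correspond to the three outputs of an OCR task, namely the label type (Text, Handwriting, aa), the region type (rectangle or polygon), and the text content. All three produce data during labeling.

<Labels> yields the label type used for the region plus the region coordinates. (This is also a pitfall; more on it later.)

<Rectangle> or <Polygon> yields the region type used for the region plus the region coordinates; the region type is rectangle or polygon.

<TextArea> yields the text entered for the region plus the region coordinates.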
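To make the mapping concrete, the sketch below parses a labeling config and lists each control tag with the result type it produces. The XML is my approximation of the OCR template and is an assumption; copy the real one from your project's Code tab. The tag names label, bbox, transcription and image match the from_name and to_name values used in the backend code later in this post; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed labeling config, modeled on Label Studio's OCR template.
# Replace it with the XML shown in the Code tab of your own project.
LABEL_CONFIG = """
<View>
  <Image name="image" value="$ocr"/>
  <Labels name="label" toName="image">
    <Label value="Text" background="green"/>
    <Label value="Handwriting" background="blue"/>
  </Labels>
  <Rectangle name="bbox" toName="image" strokeWidth="3"/>
  <Polygon name="poly" toName="image" strokeWidth="3"/>
  <TextArea name="transcription" toName="image" editable="true" perRegion="true"/>
</View>
"""


def control_tags(config_xml):
    """Map each control tag's name to (result type, name of the object it targets)."""
    root = ET.fromstring(config_xml)
    controls = {}
    for element in root.iter():
        if "toName" in element.attrib:
            # For these tags the result "type" is the tag name in lowercase.
            controls[element.attrib["name"]] = (element.tag.lower(), element.attrib["toName"])
    return controls


def data_key(config_xml):
    """Return the task data key referenced by the <Image> tag, e.g. 'ocr' for value="$ocr"."""
    image = ET.fromstring(config_xml).find(".//Image")
    return image.attrib["value"].lstrip("$")
```

With this config, control_tags returns one entry per control, for example label mapped to ('labels', 'image'), and data_key returns 'ocr', which is the key the backend reads from task['data'].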

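These three blocks correspond to the data in result in Figure 2 (the large red box). Figure 2 shows the data for <Labels>: small box 2 indicates that the entry belongs to the <Labels> block, i.e. 'type':'labels' (likewise there are 'type':'rectangle' and 'type':'textarea'), and small box 3 is the label type "Text" plus the region coordinates.

Take a look at the data structures of the other two as well; the backend code assembles its results according to them.

To summarize:

For an OCR task, the labeling configuration defines three outputs: label type, region type, and text result. Other tasks work the same way; just match your data to the format.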
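Because one drawn region produces up to three entries in result, a reader of the exported data has to stitch them back together by their shared id. A minimal sketch, assuming rectangle regions and the field names described above; SAMPLE_RESULT is made-up data for illustration and group_by_region is a hypothetical helper:

```python
from collections import defaultdict

# Made-up excerpt of a task's "result" list: one region, three entries sharing id "0".
SAMPLE_RESULT = [
    {"id": "0", "from_name": "bbox", "to_name": "image", "type": "rectangle",
     "value": {"x": 10.0, "y": 20.0, "width": 30.0, "height": 5.0, "rotation": 0}},
    {"id": "0", "from_name": "label", "to_name": "image", "type": "labels",
     "value": {"x": 10.0, "y": 20.0, "width": 30.0, "height": 5.0, "rotation": 0,
               "labels": ["Text"]}},
    {"id": "0", "from_name": "transcription", "to_name": "image", "type": "textarea",
     "value": {"x": 10.0, "y": 20.0, "width": 30.0, "height": 5.0, "rotation": 0,
               "text": ["hello"]}},
]

BOX_KEYS = ("x", "y", "width", "height", "rotation")


def group_by_region(result):
    """Merge result entries that share an id into one record per region."""
    regions = defaultdict(dict)
    for item in result:
        region = regions[item["id"]]
        value = item["value"]
        # Every entry repeats the geometry, so any of them can supply the box.
        region["box"] = {key: value[key] for key in BOX_KEYS if key in value}
        if item["type"] == "labels":
            region["labels"] = value["labels"]
        elif item["type"] == "textarea":
            region["text"] = value["text"]
    return dict(regions)
```

Running group_by_region(SAMPLE_RESULT) yields a single record under key "0" holding the box, the label list and the transcription.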

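3. Building the ML backend's predict function

First, a walkthrough based on the official example:

At its core, Label Studio calls the predict function in my_ml_backend/model.py to obtain the model's labels. So the main change is task-specific (OCR here): run OCR on the image inside predict and package the output in the format the platform expects.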

from label_studio_ml.model import LabelStudioMLBase


class DummyModel(LabelStudioMLBase):

    def __init__(self, **kwargs):
        # Initialize the LabelStudioMLBase parent class
        super(DummyModel, self).__init__(**kwargs)

        # Then set up whatever the model needs
        from_name, schema = list(self.parsed_label_config.items())[0]
        self.from_name = from_name
        self.to_name = schema['to_name'][0]
        self.labels = schema['labels']

    def predict(self, tasks, **kwargs):
        """Core inference function: fetch the image from Label Studio, run the model,
        and package the results to return.
        """
        predictions = []
        for task in tasks:
            predictions.append({
                'score': 0.987,  # prediction overall score, visible in the data manager columns
                'model_version': 'delorean-20151021',  # all predictions will be differentiated by model version
                'result': [{
                    'from_name': self.from_name,
                    'to_name': self.to_name,
                    'type': 'choices',
                    'score': 0.5,  # per-region score, visible in the editor 
                    'value': {
                        'choices': [self.labels[0]]
                    }
                }]
            })
        return predictions

    def fit(self, annotations, **kwargs):
        """Used for training while labeling."""
        return {'path/to/created/model': 'my/model.bin'}

    def other(self, xx):
        """Any other helper function."""
        return xx
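When predict returns something Label Studio cannot render, the front end tends to show nothing at all, so a quick structural check during debugging can help. The sketch below is my own suggestion for the list-of-dicts format shown in the example above; check_predictions is a hypothetical helper and not part of label-studio-ml.

```python
REQUIRED_ITEM_KEYS = ("from_name", "to_name", "type", "value")


def check_predictions(predictions):
    """Raise ValueError if `predictions` does not look like predict()'s list-of-dicts output."""
    if not isinstance(predictions, list):
        raise ValueError("predict() should return a list with one entry per task")
    for index, prediction in enumerate(predictions):
        if "result" not in prediction:
            raise ValueError(f"prediction {index} has no 'result' key")
        for item in prediction["result"]:
            missing = [key for key in REQUIRED_ITEM_KEYS if key not in item]
            if missing:
                raise ValueError(f"prediction {index}: result item is missing {missing}")
    return True
```

Calling check_predictions(model.predict(tasks)) in a local test catches a missing from_name or value before the backend is ever connected to the UI.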

Here is my own model backend built on Baidu's PaddleOCR. The pitfalls are all noted in the comments; there are about 4 big ones, and each of them hurt...:

from typing import List, Dict, Optional
from label_studio_ml.model import LabelStudioMLBase
import cv2
import sys
sys.path.append('./PaddleOCR')  # path to your own PaddleOCR checkout
from paddleocr import PaddleOCR
import os
from PIL import Image
import pandas as pd
import numpy as np
import requests
from io import BytesIO
import math

global_ocr_instance = PaddleOCR(
    # Pitfall 1
    # PaddleOCR init parameters. The model is created here at module level; otherwise the model-loading parameters get printed page after page
    det_model_dir='./inference/ch_PP-OCRv4_det_infer/',
    rec_model_dir='./inference/ch_PP-OCRv4_rec_infer/',
    rec_char_dict_path='./PaddleOCR/ppocr/utils/ppocr_keys_v1.txt',
    lang="ch",det_algorithm='DB',use_gpu=True,ocr_version='PP-OCRv4'
    # adjust other inference parameters as needed
)

class NewModel(LabelStudioMLBase):
    def __init__(self,project_id=None,**kwargs):
        super(NewModel, self).__init__(**kwargs)
        self.ocr = global_ocr_instance
        self.token = '376641251a1be5a6d94******5767b2113c1afe'
        
    def predict(self, tasks, **kwargs):
        results = []
        for task in tasks:
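            # Pitfall 2: reading the image file. Label Studio and the model backend run on the
            # same server but on different ports, which means: (1) images uploaded in Label
            # Studio cannot simply be loaded from the model server's directory; (2) the model
            # backend cannot directly read images uploaded to Label Studio, whose source path
            # looks like "/data/upload/12/bf68a25f-0034.jpg". So the image is fetched with an
            # HTTP request instead. One more small catch: every account has its own token,
            # which must be sent with the request.
            image_path = task['data']['ocr']
            image_url = 'http://localhost:8080'+image_path
            image = self.load_image_from_url(image_url,self.token)
            # Run the OCR model on the image
            ocr_results = self.ocr.ocr(np.array(image), cls=True)

            # Convert the OCR results into the format Label Studio expects
            predictions = []
            # Pitfall 3: every entry must carry an id. As explained above, the OCR task has
            # three results per region; without a shared id the front end shows 3 separate results.
            ocr_id = 0
            # Guard: some PaddleOCR versions return [None] when no text is detected
            for result in (ocr_results[0] or []):
                points, text_score = result
                text, score = text_score

                # Pitfall 4: region coordinates are not absolute pixel positions but relative
                # ones... so they must be converted to percentages in the range 0-100, which
                # is what the conversion function below does.
                x, y, width, height, rotation = self.convert_points_to_relative_xywhr(points,np.array(image))

                # Labels component prediction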
                label_prediction = {
                    'from_name': 'label',
                    'id':str(ocr_id),
                    'to_name': 'image',
                    'type': 'labels',
                    'value': {
                        'x': x,
                        'y': y,
                        'width': width,
                        'height': height,
                        'rotation':rotation,
                        'labels': ['Text']
                    }
                }

                # Rectangle component prediction
                
                rectangle_prediction = {
                    'from_name': 'bbox',
                    'id':str(ocr_id),
                    'to_name': 'image',
                    'type': 'rectangle',
                    'value': {
                        'x': x,
                        'y': y,
                        'width': width,
                        'height': height,
                        'rotation':rotation
                    }
                }

                # TextArea component prediction
                textarea_prediction = {
                    'from_name': 'transcription',
                    'id':str(ocr_id),
                    'to_name': 'image',
                    'type': 'textarea',
                    'value': {
                        'x': x,
                        'y': y,
                        'width': width,
                        'height': height,
                        'rotation':rotation,
                        'text':[text]
                    }
                }
                
                predictions.extend([label_prediction, rectangle_prediction, textarea_prediction])
                ocr_id += 1
                
            results.append({
                'result': predictions
            })

        return results

    def load_image_from_url(self,url,token):
        headers = {'Authorization': f'Token {token}'}
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            image = Image.open(BytesIO(response.content))
            return image
        else:
            raise Exception(f"Error loading image from {url}") 


    def convert_points_to_relative_xywhr(self, points, image):
        """
        Convert a list of points representing a rectangle to relative x, y, width, height, and rotation.
        The values are relative to the dimensions of the given image.

        Points are expected to be in the order: top-left, top-right, bottom-right, bottom-left.

        The rotation is calculated as the clockwise angle between the top edge and the horizontal line.

        Args:
        - points (list of lists): A list of four points, each point is a list of two coordinates [x, y].
        - image (numpy array): An image array.

        Returns:
        - tuple: (x, y, width, height, rotation) where x and y are the relative coordinates of the top-left point,
          width and height are the relative dimensions of the rectangle, and rotation is the angle in degrees.
        """
        # Extracting points
        top_left, top_right, bottom_right, bottom_left = points

        # Image dimensions
        img_height, img_width = image.shape[:2]

        # Calculate width and height of the rectangle
        width = math.sqrt((top_right[0] - top_left[0])**2 + (top_right[1] - top_left[1])**2)
        height = math.sqrt((bottom_right[0] - top_right[0])**2 + (bottom_right[1] - top_right[1])**2)

        # Calculate rotation in radians
        dx = top_right[0] - top_left[0]
        dy = top_right[1] - top_left[1]
        angle_radians = math.atan2(dy, dx)

        # Convert rotation to degrees
        rotation = math.degrees(angle_radians)

        # The top-left point is the origin (x, y)
        x, y = top_left

        # Convert dimensions to relative values (percentage of image dimensions)
        rel_x = x / img_width * 100
        rel_y = y / img_height * 100
        rel_width = width / img_width * 100
        rel_height = height / img_height * 100

        return rel_x, rel_y, rel_width, rel_height, rotation
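Pitfall 4 is easier to verify with numbers. The sketch below is a standalone version of the same conversion that takes the image size directly instead of an array, so it can be tried without PaddleOCR; the function name quad_to_relative_xywhr is my own, and it needs Python 3.8+ for math.dist.

```python
import math


def quad_to_relative_xywhr(points, img_width, img_height):
    """Convert 4 pixel points (TL, TR, BR, BL) to x, y, width, height in percent plus rotation in degrees."""
    top_left, top_right, bottom_right, _bottom_left = points
    # Edge lengths in pixels: top edge is the width, right edge is the height.
    width = math.dist(top_left, top_right)
    height = math.dist(top_right, bottom_right)
    # Angle of the top edge against the horizontal; clockwise is positive
    # because the image y axis points down.
    rotation = math.degrees(math.atan2(top_right[1] - top_left[1], top_right[0] - top_left[0]))
    return (
        top_left[0] / img_width * 100,
        top_left[1] / img_height * 100,
        width / img_width * 100,
        height / img_height * 100,
        rotation,
    )


# A 100x50 pixel box whose top-left corner sits at (10, 20) in a 200x100 image.
box = [[10, 20], [110, 20], [110, 70], [10, 70]]
print(quad_to_relative_xywhr(box, img_width=200, img_height=100))
```

For this box the result works out to x = 5, y = 20, width = 50, height = 50 (all percentages of the image size) and rotation = 0, which is the form Label Studio expects in value.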

After saving, follow the steps above and... backend, launch!

Then configure it in Label Studio and the automatic labels will show up.
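The backend code above hardcodes the Label Studio address and the account token. One possible refinement, sketched below, reads both from environment variables and builds the request in one place. The variable names LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY and the helper build_image_request are assumptions for illustration; use whatever names your deployment defines.

```python
import os
from urllib.parse import urljoin


def build_image_request(task, data_key="ocr", base_url=None, token=None):
    """Return (url, headers) for downloading the image referenced by a task."""
    # Assumed environment variable names; explicit arguments take priority.
    base_url = base_url or os.environ.get("LABEL_STUDIO_URL", "http://localhost:8080")
    token = token or os.environ.get("LABEL_STUDIO_API_KEY", "")
    # urljoin resolves paths like /data/upload/... against the Label Studio host
    # and leaves absolute URLs (e.g. external storage links) untouched.
    url = urljoin(base_url, task["data"][data_key])
    # Only send the token to Label Studio itself, never to an external host.
    send_token = bool(token) and url.startswith(base_url)
    headers = {"Authorization": f"Token {token}"} if send_token else {}
    return url, headers
```

The returned pair can be passed straight to requests.get(url, headers=headers), which keeps the token out of the source file.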

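This version fetches the pre-labeling results for the whole image in one go.

When I have time I'll add a tutorial on getting the OCR result for each box in real time as you draw it.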