Migrating Label Studio's Django database from SQLite to PostgreSQL with Docker

  • Version information

    Label Studio v1.12.0

  • Implementation

    1. Enter the directory containing the Django management script in the old Label Studio container

      docker exec -it <Your-SQLite-Label-Studio-Container> bash
      cd /label-studio/label_studio
      
    2. Export the data

      python3 manage.py dumpdata > sqlite_data_dump.json
      

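      The export may fail with an error saying that a table does not exist:
      CommandError: Unable to serialize database: no such table: django_rq_queue

      Out of habit, I guessed that only the last underscore needs to become a period (.), which matches Django's default table naming of <app_label>_<model_name>. Excluding that model with the following command made the export succeed (I have never studied databases systematically).

      python3 manage.py dumpdata --exclude=django_rq.queue > sqlite_data_dump.json
      
      • Note❗ (basis for the guess) In the source code that raises the exception below, the app label and the model name are separated by exactly one period (.). If you replace two or more of the underscores (_) in a name like django_rq_queue with periods, you may hit the following exception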

        Traceback (most recent call last):
          File "/label-studio/label_studio/manage.py", line 23, in <module>
            execute_from_command_line(sys.argv)
          File "/usr/local/lib/python3.10/dist-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
            utility.execute()
          File "/usr/local/lib/python3.10/dist-packages/django/core/management/__init__.py", line 413, in execute
            self.fetch_command(subcommand).run_from_argv(self.argv)
          File "/usr/local/lib/python3.10/dist-packages/django/core/management/base.py", line 354, in run_from_argv
            self.execute(*args, **cmd_options)
          File "/usr/local/lib/python3.10/dist-packages/django/core/management/base.py", line 398, in execute
            output = self.handle(*args, **options)
          File "/usr/local/lib/python3.10/dist-packages/django/core/management/commands/dumpdata.py", line 98, in handle
            excluded_models, excluded_apps = parse_apps_and_model_labels(excludes)
          File "/usr/local/lib/python3.10/dist-packages/django/core/management/utils.py", line 98, in parse_apps_and_model_labels
            model = installed_apps.get_model(label)
          File "/usr/local/lib/python3.10/dist-packages/django/apps/registry.py", line 204, in get_model
            app_label, model_name = app_label.split('.')
        ValueError: too many values to unpack (expected 2)
        Sentry is attempting to send 2 pending events
        Waiting up to 2 seconds
        Press Ctrl-C to quit
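The guess above can be sketched in a few lines of Python. This is only an illustration, assuming the table uses Django's default name `<app_label>_<model_name>`; models with a custom `db_table` will not follow it. The function name `table_to_label` and the list `APP_LABELS` are hypothetical, not part of Django or Label Studio.

```python
def table_to_label(table_name, app_labels):
    """Guess 'app_label.model_name' from a default Django table name."""
    # Try longer app labels first so that 'django_rq' wins over a shorter prefix.
    for app_label in sorted(app_labels, key=len, reverse=True):
        prefix = app_label + "_"
        if table_name.startswith(prefix) and len(table_name) > len(prefix):
            return f"{app_label}.{table_name[len(prefix):]}"
    raise ValueError(f"no app label matches table {table_name!r}")


# Hypothetical subset of installed app labels, for illustration only.
APP_LABELS = ["auth", "authtoken", "django_rq", "users"]

label = table_to_label("django_rq_queue", APP_LABELS)
print(label)  # django_rq.queue

# Exactly one period, so the two-value unpacking seen in the traceback works.
app_label, model_name = label.split(".")
```

A label such as `django.rq.queue` splits into three parts, which is what triggers the `ValueError: too many values to unpack (expected 2)` shown above.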
        
    3. Manually delete all user-authentication entries from the JSON file [1]

      • Delete entries like the following

        {
            "model": "authtoken.token",
            "pk": "3d24bdd29c698f6c09de3df9152cfdd4048cee09",
            "fields": {
                "user": 3,
                "created": "2024-03-21T00:52:39.157Z"
            }
        },
        {
            "model": "authtoken.token",
            "pk": "b658b0b1d81f8e8a7a60b4906c6568a3d7df340a",
            "fields": {
                "user": 2,
                "created": "2024-01-24T02:14:59.932Z"
            }
        },
        {
            "model": "authtoken.token",
            "pk": "f019beebd8fecc1602790b471fc9c1a84f71a6d2",
            "fields": {
                "user": 1,
                "created": "2024-01-18T00:46:15.175Z"
            }
        }
        
      • Otherwise you may run into a Django exception like the following

        django.db.utils.IntegrityError: Problem installing fixture '/label-studio/label_studio/data/sqlite_data_dump.json': Could not load users.User(pk=1): duplicate key value violates unique constraint "authtoken_token_user_id_key"
        DETAIL:  Key (user_id)=(1) already exists.
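Deleting the entries by hand gets tedious in a large dump, so here is a minimal Python sketch of the same step. It assumes the dump is the JSON list that `dumpdata` writes by default, where every object has a `"model"` key. The helper names `drop_models` and `filter_fixture` and the output file name are my own choices, not something Label Studio provides.

```python
import json

EXCLUDED = frozenset({"authtoken.token"})


def drop_models(objects, excluded=EXCLUDED):
    """Return the fixture objects whose "model" label is not excluded."""
    return [obj for obj in objects if obj.get("model") not in excluded]


def filter_fixture(src_path, dst_path, excluded=EXCLUDED):
    """Rewrite a dumpdata JSON file without the excluded models."""
    with open(src_path, encoding="utf-8") as f:
        objects = json.load(f)
    kept = drop_models(objects, excluded)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(kept, f, ensure_ascii=False)
    return len(objects) - len(kept)  # number of removed objects


# Tiny in-memory demonstration; the users.user entry is a made-up placeholder.
sample = [
    {"model": "users.user", "pk": 1, "fields": {}},
    {
        "model": "authtoken.token",
        "pk": "f019beebd8fecc1602790b471fc9c1a84f71a6d2",
        "fields": {"user": 1, "created": "2024-01-18T00:46:15.175Z"},
    },
]
print(len(drop_models(sample)))  # 1
```

On a real dump you would call something like `filter_fixture("sqlite_data_dump.json", "sqlite_data_dump.filtered.json")` and load the filtered file instead. The script reads the whole file into memory, which should be fine for a dump of a few tens of thousands of objects.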
        
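    4. Prepare the docker-compose.yml for the Label Studio instance that connects to PostgreSQL

      Adapt it as needed. Suggestions in the comments for making the configuration more concise, efficient, or readable are welcome 👏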

      name: label-studio
      
      services:
        app:
          image: heartexlabs/label-studio:latest
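          ports:
            # Host port 6666 is on the unsafe-port list of common browsers (Chrome, Firefox); pick another one if the UI does not load
            - "6666:8080"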
          restart: always
          environment:
            DJANGO_DB: default
            POSTGRE_NAME: labelstudio
            POSTGRE_USER: Guido
            POSTGRE_PASSWORD: postgres_password
            POSTGRE_PORT: 5432
            POSTGRE_HOST: db
            DATA_UPLOAD_MAX_NUMBER_FILES: 10000
            LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED: "true"
            LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT: /label-studio/data
          depends_on:
            db:
              condition: service_healthy
              restart: true
          volumes:
            - type: bind
              source: /data/Guido/label-studio/data
              target: /label-studio/data
      
        db:
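          # No named volume or bind mount is declared here, so the database files live in an anonymous volume;
          # consider mounting /var/lib/postgresql/data explicitly
          image: postgres:16.3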
          restart: always
          hostname: db
          environment:
            POSTGRES_DB: labelstudio
            POSTGRES_USER: Guido
            POSTGRES_PASSWORD: postgres_password
            POSTGRES_HOST_AUTH_METHOD: trust
          healthcheck:
            test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
            interval: 10s
            retries: 3
            start_period: 30s
            timeout: 10s
      
    5. Start the services with docker compose

      docker compose up -d

      docker compose logs -f -n 500 # follow the logs, starting from the last 500 lines

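    6. Make the .json file dumped from the SQLite database available inside the new PostgreSQL-backed container

      • For example, copy sqlite_data_dump.json into the bind-mounted directory
      • Or copy it into the new container with docker cp
    7. Enter the container and change to the directory of manage.py to prepare for the migration

      docker exec -it label-studio-app-1 bash
      cd /label-studio/label_studio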

    8. Migrate the data

      python3 manage.py loaddata ../data/sqlite_data_dump.json

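    9. Success message

      The terminal should finally print something like the following. Then open Label Studio in a browser and check that all the data has been loaded.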

      Installed 57428 object(s) from 1 fixture(s)
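To cross-check that number, you can count the objects in the fixture before loading it. The sketch below assumes the same default `dumpdata` JSON layout (a list of objects, each with a `"model"` key). The names `count_by_model` and `summarize` are hypothetical helpers. If every object was loaded, the total should equal the count in the `Installed ... object(s)` line.

```python
import json
from collections import Counter


def count_by_model(objects):
    """Count fixture objects per "model" label."""
    return Counter(obj["model"] for obj in objects)


def summarize(path):
    """Print per-model counts and the total for a dumpdata JSON file."""
    with open(path, encoding="utf-8") as f:
        objects = json.load(f)
    counts = count_by_model(objects)
    for model, n in counts.most_common():
        print(f"{n:8d}  {model}")
    print(f"{sum(counts.values()):8d}  total")
    return counts


# Made-up miniature fixture, for illustration only.
sample = [
    {"model": "users.user", "pk": 1, "fields": {}},
    {"model": "users.user", "pk": 2, "fields": {}},
    {"model": "authtoken.token", "pk": "abc", "fields": {"user": 1}},
]
counts = count_by_model(sample)
print(counts["users.user"], sum(counts.values()))  # 2 3
```

Running `summarize("sqlite_data_dump.json")` (file name as used in this post) also shows quickly whether any `authtoken.token` entries are still left in the file.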
      

      Searching annotations is much faster now, and the app no longer freezes until it crashes 🤓

  • References

    1. https://github.com/HumanSignal/label-studio/issues/1658
    2. https://gist.github.com/sirodoht/f598d14e9644e2d3909629a41e3522ad
    3. https://www.jianshu.com/p/425c3725a4bb
    4. https://github.com/HumanSignal/label-studio/blob/develop/docker-compose.yml