- Version information
  Label Studio v1.12.0
- Implementation
- Enter the directory that contains the Django management script of the old Label Studio:
  docker exec -it <Your-SQLite-Label-Studio-Container> bash
  cd /label-studio/label_studio
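- Export the data:
  python3 manage.py dumpdata > sqlite_dump_data.json
  The export may fail with an error about a missing table:
  CommandError: Unable to serialize database: no such table: django_rq_queue
  Going by habit, I guessed that only the last underscore needs to become a period (.), so I excluded the corresponding model with the following command and the export succeeded (I have never formally studied databases):
  python3 manage.py dumpdata --exclude=django_rq.queue > sqlite_dump_data.json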
- Note❗ (basis for the guess) When I read the source code behind the exception, I saw that the app label and the model name are separated by exactly one period (.). So if you replace two or more of the underscores (_) in a name such as django_rq_queue with periods (.), you may hit the following exception:
  Traceback (most recent call last):
    File "/label-studio/label_studio/manage.py", line 23, in <module>
      execute_from_command_line(sys.argv)
    File "/usr/local/lib/python3.10/dist-packages/django/core/management/__init__.py", line 419, in execute_from_command_line
      utility.execute()
    File "/usr/local/lib/python3.10/dist-packages/django/core/management/__init__.py", line 413, in execute
      self.fetch_command(subcommand).run_from_argv(self.argv)
    File "/usr/local/lib/python3.10/dist-packages/django/core/management/base.py", line 354, in run_from_argv
      self.execute(*args, **cmd_options)
    File "/usr/local/lib/python3.10/dist-packages/django/core/management/base.py", line 398, in execute
      output = self.handle(*args, **options)
    File "/usr/local/lib/python3.10/dist-packages/django/core/management/commands/dumpdata.py", line 98, in handle
      excluded_models, excluded_apps = parse_apps_and_model_labels(excludes)
    File "/usr/local/lib/python3.10/dist-packages/django/core/management/utils.py", line 98, in parse_apps_and_model_labels
      model = installed_apps.get_model(label)
    File "/usr/local/lib/python3.10/dist-packages/django/apps/registry.py", line 204, in get_model
      app_label, model_name = app_label.split('.')
  ValueError: too many values to unpack (expected 2)
  Sentry is attempting to send 2 pending events
  Waiting up to 2 seconds
  Press Ctrl-C to quit
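The guess above can be checked with a few lines of Python. By default Django names a model's table `<app_label>_<lowercased model name>`, which is why only the last underscore turns into the period. This is a minimal sketch under that assumption; `table_to_label` and `is_valid_label` are hypothetical helper names for illustration, not Django APIs, and `is_valid_label` only imitates the `app_label.split('.')` unpacking visible in the traceback.

```python
def table_to_label(table_name: str) -> str:
    """Turn a default Django table name into an 'app_label.model' label.

    Assumption: the table uses Django's default naming,
    '<app_label>_<lowercased model name>', so only the LAST underscore
    separates the app label from the model name.
    """
    app_label, model_name = table_name.rsplit("_", 1)
    return f"{app_label}.{model_name}"


def is_valid_label(label: str) -> bool:
    """Imitate the unpacking from the traceback: exactly one period is allowed."""
    try:
        app_label, model_name = label.split(".")
    except ValueError:
        # Zero periods or more than one period both end up here.
        return False
    return bool(app_label) and bool(model_name)


print(table_to_label("django_rq_queue"))
print(is_valid_label("django_rq.queue"))
print(is_valid_label("django.rq.queue"))
```

A table with a custom `db_table` would not follow this pattern, so treat the result as a starting point for `--exclude`, not a guarantee.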
- Manually delete every authentication-related JSON entry [1]
  Delete entries like the following:
  {
    "model": "authtoken.token",
    "pk": "3d24bdd29c698f6c09de3df9152cfdd4048cee09",
    "fields": {
      "user": 3,
      "created": "2024-3-21T00:52:39.157Z"
    }
  },
  {
    "model": "authtoken.token",
    "pk": "b658b0b1d81f8e8a7a60b4906c6568a3d7df340a",
    "fields": {
      "user": 2,
      "created": "2024-1-24T02:14:59.932Z"
    }
  },
  {
    "model": "authtoken.token",
    "pk": "f019beebd8fecc1602790b471fc9c1a84f71a6d2",
    "fields": {
      "user": 1,
      "created": "2024-1-18T00:46:15.175Z"
    }
  }
- Otherwise you may hit a Django exception like the following:
  django.db.utils.IntegrityError: Problem installing fixture '/label-studio/label_studio/data/sqlite_data_dump.json': Could not load users.User(pk=1): duplicate key value violates unique constraint "authtoken_token_user_id_key"
  DETAIL: Key (user_id)=(1) already exists.
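Deleting these entries by hand gets tedious on a large dump, so the same step can be scripted. The following is a minimal sketch, assuming the dump is a single JSON array of objects that each carry a "model" key (the shape shown above). `drop_models` and `clean_fixture` are hypothetical helper names, and the sample objects are made up.

```python
import json
from pathlib import Path


def drop_models(objects, models):
    """Return the fixture objects whose "model" is not in `models`."""
    return [obj for obj in objects if obj.get("model") not in models]


def clean_fixture(src, dst, models):
    """Write a copy of `src` without the given models; return how many were removed."""
    objects = json.loads(Path(src).read_text(encoding="utf-8"))
    kept = drop_models(objects, models)
    Path(dst).write_text(json.dumps(kept, ensure_ascii=False), encoding="utf-8")
    return len(objects) - len(kept)


# Tiny in-memory demo with made-up objects.
sample = [
    {"model": "authtoken.token", "pk": "abc", "fields": {"user": 1}},
    {"model": "users.user", "pk": 1, "fields": {}},
]
print(drop_models(sample, {"authtoken.token"}))
```

On a real dump, a call such as `clean_fixture("sqlite_dump_data.json", "sqlite_dump_data.cleaned.json", {"authtoken.token"})` would write the filtered copy next to the original; both file names are only examples. Writing to a new file keeps the untouched dump around in case the filter removes too much.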
- Prepare the docker-compose.yml for the Label Studio instance that connects to PostgreSQL
  Readers can adapt it as needed; suggestions in the comments on how to make this configuration more concise, efficient, or readable are welcome 👏
  name: label-studio
  services:
    app:
      image: heartexlabs/label-studio:latest
      ports:
        - "6666:8080"
      restart: always
      environment:
        DJANGO_DB: default
        POSTGRE_NAME: labelstudio
        POSTGRE_USER: Guido
        POSTGRE_PASSWORD: postgres_password
        POSTGRE_PORT: 5432
        POSTGRE_HOST: db
        DATA_UPLOAD_MAX_NUMBER_FILES: 10000
        LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED: "true"
        LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT: /label-studio/data
      depends_on:
        db:
          condition: service_healthy
          restart: true
      volumes:
        - type: bind
          source: /data/Guido/label-studio/data
          target: /label-studio/data
    db:
      image: postgres:16.3
      restart: always
      hostname: db
      environment:
        POSTGRES_DB: labelstudio
        POSTGRES_USER: Guido
        POSTGRES_PASSWORD: postgres_password
        POSTGRES_HOST_AUTH_METHOD: trust
      healthcheck:
        test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER} -d $${POSTGRES_DB}"]
        interval: 10s
        retries: 3
        start_period: 30s
        timeout: 10s
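- Start docker compose:
  docker compose up -d
  docker compose logs -f -n 500  # view the logs
- Make the .json data file produced by dumpdata on the SQLite-backed Django available inside the new PostgreSQL-backed container:
  - for example, copy sqlite_dump_data.json into the bind-mounted directory
  - or copy it into the new container with docker cp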
- Enter the container to prepare for the database migration:
  docker exec -it label-studio-app-1 bash
- Migrate the database (run this from the directory that contains manage.py, as in the first step):
  python3 manage.py loaddata ../data/sqlite_dump_data.json
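- Success message
  At the end, the terminal should print something like the following; afterwards, open Label Studio in a browser and check that all the data was loaded.
  Installed 57428 object(s) from 1 fixture(s)
  Searching annotations is much faster now, and it no longer freezes to the point of crashing 🤓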
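Besides checking in the browser, the number in the "Installed N object(s)" line can be compared against the fixture itself. This is a rough sketch, assuming every object in the fixture gets installed, in which case the total below should equal N. `count_by_model` is a hypothetical helper name and the inline sample is made up; for a real check, load your own dump with `json.load` instead.

```python
import json
from collections import Counter


def count_by_model(objects):
    """Count fixture objects per "model" label."""
    return Counter(obj["model"] for obj in objects)


# Made-up sample standing in for the contents of the dump file.
sample = json.loads("""
[
  {"model": "users.user", "pk": 1, "fields": {}},
  {"model": "users.user", "pk": 2, "fields": {}},
  {"model": "authtoken.token", "pk": "abc", "fields": {"user": 1}}
]
""")

counts = count_by_model(sample)
print("total:", sum(counts.values()))
for model, n in counts.most_common():
    print(model, n)
```

The per-model breakdown also helps to spot which kind of data is missing when the totals do not match.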
- References
  Migrating the Django DB of Label Studio from SQLite to PostgreSQL with Docker
  First published on 2024-07-15 16:28:41