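Step 1: Installation (assuming docker-compose is already installed)
Download docker-compose.yml and docker-compose.env: (GitHub download link)
Run the installation:
# Run the following commands in PowerShell
# cd to the directory that contains docker-compose.yml and docker-compose.env
Run: docker-compose up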
Step 2: Test access:
http://127.0.0.1:8000
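The containers can take a while to start, so the page may not answer right away. Below is a minimal sketch of a readiness check in Python, assuming only the URL above. The helper name wait_for_http and its parameters are hypothetical and are not part of paperless.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, attempts=30, delay=2.0,
                  fetch=urllib.request.urlopen, sleep=time.sleep):
    """Return True once `url` answers with an HTTP status below 500, else False.

    `fetch` and `sleep` are injectable so the helper can be tried without a server.
    """
    for attempt in range(attempts):
        try:
            with fetch(url, timeout=5) as response:
                if response.status < 500:
                    return True
        except urllib.error.HTTPError as exc:
            # A 4xx answer still means the web server is up; 5xx means not ready yet.
            if exc.code < 500:
                return True
        except (urllib.error.URLError, OSError):
            pass  # connection refused or timed out: the container is still starting
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

After docker-compose up, calling wait_for_http("http://127.0.0.1:8000") should return True once the web UI is reachable. The retry count and delay are arbitrary defaults and can be adjusted.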
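Step 3: Problems found during testing:
1. The default OCR setup does not include Chinese language data. Download the Chinese data pack and add it to the /usr/share/tesseract-ocr/5/tessdata directory (download link).
Then add this setting to /etc/paperless.conf: PAPERLESS_OCR_LANGUAGE=chi_sim
2. PDFs that carry a digital signature fail with DigitalSignatureError:
Add this setting to /etc/paperless.conf: PAPERLESS_OCR_USER_ARGS={"invalidate_digital_signatures": true}
3. Office documents (Word, Excel, PowerPoint) cannot be uploaded.
Fix: add a gotenberg service (the final yml file further down also adds a tika service and the PAPERLESS_TIKA_* settings that go with it)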
gotenberg:
  image: thecodingmachine/gotenberg:latest
  restart: unless-stopped
  command:
    - "gotenberg"
    - "--chromium-disable-routes=true"
The final yml file:
# docker-compose file for running paperless from the Docker Hub.
# This file contains everything paperless needs to run.
# Paperless supports amd64, arm and arm64 hardware.
#
# All compose files of paperless configure paperless in the following way:
#
# - Paperless is (re)started on system boot, if it was running before shutdown.
# - Docker volumes for storing data are managed by Docker.
# - Folders for importing and exporting files are created in the same directory
# as this file and mounted to the correct folders inside the container.
# - Paperless listens on port 8000.
#
# In addition to that, this docker-compose file adds the following optional
# configurations:
#
# - Instead of SQLite (default), PostgreSQL is used as the database server.
#
# To install and update paperless with this file, do the following:
#
# - Copy this file as 'docker-compose.yml' and the files 'docker-compose.env'
# and '.env' into a folder.
# - Run 'docker-compose pull'.
# - Run 'docker-compose run --rm webserver createsuperuser' to create a user.
# - Run 'docker-compose up -d'.
#
# For more extensive installation and update instructions, refer to the
# documentation.
version: "3.4"
services:
  broker:
    image: docker.io/library/redis:7
    restart: unless-stopped
    volumes:
      - ./redisdata:/data
  db:
    image: docker.io/library/postgres:15
    restart: unless-stopped
    volumes:
      - ./pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
  gotenberg:
    image: thecodingmachine/gotenberg:latest
    restart: unless-stopped
    command:
      - "gotenberg"
      - "--chromium-disable-routes=true"
  tika:
    image: apache/tika
    restart: unless-stopped
  webserver:
    image: ghcr.io/paperless-ngx/paperless-ngx:latest
    restart: unless-stopped
    depends_on:
      - db
      - broker
      - gotenberg
      - tika
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "curl", "-fs", "-S", "--max-time", "2", "http://localhost:8000"]
      interval: 30s
      timeout: 10s
      retries: 5
    volumes:
      - ./data:/usr/src/paperless/data
      - ./media:/usr/src/paperless/media
      - ./export:/usr/src/paperless/export
      - ./consume:/usr/src/paperless/consume
    env_file: docker-compose.env
    environment:
      PAPERLESS_REDIS: redis://broker:6379
      PAPERLESS_TIKA_GOTENBERG_ENDPOINT: http://gotenberg:3000
      PAPERLESS_TIKA_ENDPOINT: http://tika:9998
      PAPERLESS_TIKA_ENABLED: 1
      PAPERLESS_ADMIN_USER: admin
      PAPERLESS_ADMIN_PASSWORD: admin
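Final /etc/paperless.conf (you have to create it yourself; it is not generated automatically when the container is created)
# Set the default OCR language to Chinese
PAPERLESS_OCR_LANGUAGE=chi_sim
# Work around the DigitalSignatureError raised for signed PDFs
PAPERLESS_OCR_USER_ARGS={"invalidate_digital_signatures": true}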
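The value of PAPERLESS_OCR_USER_ARGS has to be valid JSON, and a stray quote or a Python-style True is an easy mistake to make. Below is a minimal sketch in Python that checks a paperless.conf-style text before it is copied into the container. The helpers parse_conf and check_ocr_user_args are hypothetical names, and the KEY=VALUE parsing is a simplification that may differ from how paperless reads the file.

```python
import json


def parse_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        settings[key.strip()] = value.strip()
    return settings


def check_ocr_user_args(settings):
    """Return the decoded PAPERLESS_OCR_USER_ARGS dict, or {} if the key is absent."""
    raw = settings.get("PAPERLESS_OCR_USER_ARGS")
    if raw is None:
        return {}
    args = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(args, dict):
        raise ValueError("PAPERLESS_OCR_USER_ARGS must be a JSON object")
    return args


# Sample text mirroring the two settings used in this post.
conf = """\
# Set the default OCR language to Chinese
PAPERLESS_OCR_LANGUAGE=chi_sim
PAPERLESS_OCR_USER_ARGS={"invalidate_digital_signatures": true}
"""
settings = parse_conf(conf)
print(settings["PAPERLESS_OCR_LANGUAGE"])  # chi_sim
print(check_ocr_user_args(settings))       # {'invalidate_digital_signatures': True}
```

Writing True instead of true in the file would make json.loads raise, which is the kind of error this check is meant to catch early.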