Intelligent Customer Service System Series 2: End-to-End Intelligent Question Answering System

Author: 愚昧之山绝望之谷开悟之坡

Last modified: 2022-06-17 17:51:42

Categories: 客服系统 (Customer Service Systems), NLP实战项目 (Hands-on NLP Projects), PaddlePaddle. Tags: elasticsearch, paddlepaddle, python

First published: 2022-06-16 11:30:40

Article link: https://blog.csdn.net/qq_15821487/article/details/125282152

0. References

Reference code: https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/experimental/pipelines/examples/question-answering
Visualization tool: https://streamlit.io/
Elasticsearch installation (official documentation): https://www.elastic.co/guide/en/enterprise-search/current/docker.html
Elasticsearch installation tutorial: https://blog.csdn.net/smilehappiness/article/details/118466378
Docker images: see the PaddlePaddle image summary

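1. Setting up and running in a local virtual environment

1.1 Creating the virtual environment

Run everything under Anaconda: on Linux, create a virtual environment with conda and run the commands below inside it.

1.2 Commands to run

For the contents of requirements.txt, see section 2.1.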

# 1) Install the pipelines package
cd ${HOME}/PaddleNLP/applications/experimental/pipelines/
pip install -r requirements.txt
python setup.py install
# 2) Install the REST API dependencies
python ./rest_api/setup.py install
# 3) Install the Streamlit WebUI dependencies
python ./ui/setup.py install

1.3 Installing Elasticsearch (skip this step if you already have an ES cluster)

Docker reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html
Package download: https://www.elastic.co/cn/downloads/elasticsearch
ES installation tutorial: https://blog.csdn.net/smilehappiness/article/details/118466378

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.1.2-linux-x86_64.tar.gz

tar -xzvf elasticsearch-8.1.2-linux-x86_64.tar.gz

cd elasticsearch-8.1.2

./bin/elasticsearch

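Elasticsearch refuses to start as root. If you are logged in as root, create a dedicated user first:
useradd user-es

Give that user ownership of the installation directory:
chown user-es:user-es -R /data/mart/elasticsearch-8.1.2

Switch to the user-es user:
su user-es

Enter the bin directory:
cd /data/mart/elasticsearch-8.1.2/bin

Start Elasticsearch:
./elasticsearch

Check that the node responds (use your own host IP):
curl 'http://193.168.57.39:9200/_aliases?pretty=true'

View the documents in an index, for example baike_cities:

http://193.168.57.111:9200/baike_cities/_search?pretty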

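1. List all indices: curl 'localhost:9200/_cat/indices?v'
2. Search the test index (the ?pretty suffix pretty-prints the response): curl -XGET 'localhost:9200/test/_search?pretty'
3. Search all indices whose names start with test_: curl -XGET 'localhost:9200/test_*/_search?pretty'
4. Find indices whose names contain student: curl 'localhost:9200/_cat/indices?v' | grep 'student'
5. Match documents whose ip field equals a value and return only the listed fields: curl -XPOST 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"match":{"ip":"2.2.2.2"}},"_source":["ip","name"]}'
6. Combine conditions (must acts as AND, should acts as OR): curl -XPOST 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"bool":{"must":[{"term":{"name":"123"}},{"term":{"type":"3"}}]}}}'
7. Filter by time range, sort by time, and take a few hits (from is a zero-based offset, so from 1 skips the first hit): curl -XGET 'localhost:9200/test/_search?pretty' -H 'Content-Type: application/json' -d '{"query":{"range":{"start_time":{"lte":1632363010000,"gte":1610467210000}}},"_source":["id","name","score","type","start_time"],"sort":[{"start_time":{"order":"asc"}}],"from":1,"size":4}'
8. Take two hits: curl -XGET 'localhost:9200/test*/_search?pretty' -H 'Content-Type: application/json' -d '{"from":1,"size":2}'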
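
The request bodies used in queries 5 to 7 can also be assembled in Python, which sidesteps shell-quoting problems. The following is a minimal sketch using only the standard library. The helper names (match_query, bool_must_query, range_query, search) and the default host http://localhost:9200 are assumptions for illustration; they are not part of the pipelines package.

```python
import json
from urllib import request


def match_query(field, value, source_fields):
    """Query 5: match one field and return only the listed fields."""
    return {"query": {"match": {field: value}}, "_source": list(source_fields)}


def bool_must_query(terms):
    """Query 6: every term must match (logical AND)."""
    must = [{"term": {name: value}} for name, value in terms.items()]
    return {"query": {"bool": {"must": must}}}


def range_query(field, gte, lte, size, offset=0):
    """Query 7: range filter, sorted ascending on the same field, paginated."""
    return {
        "query": {"range": {field: {"gte": gte, "lte": lte}}},
        "sort": [{field: {"order": "asc"}}],
        "from": offset,  # zero-based offset into the hit list
        "size": size,
    }


def search(index, body, host="http://localhost:9200"):
    """POST the body to <host>/<index>/_search and return the decoded JSON."""
    req = request.Request(
        f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```

With a node running, search("test", bool_must_query({"name": "123", "type": "3"})) sends the same body as query 6. The builder functions can be used on their own to inspect a body before sending it.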

1.4 Error when running python rest_api/application.py 8891

/bin/sh: pdftotext: command not found
Traceback (most recent call last):
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 832, in _load_or_get_component
    component_type=component_type, **component_params)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/nodes/base.py", line 67, in load_from_args
    instance = subclass(**kwargs)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/nodes/file_converter/pdf.py", line 66, in __init__
    """)
Exception: pdftotext is not installed. It is part of xpdf or poppler-utils software suite.
                
                   Installation on Linux:
                   wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.03.tar.gz &&
                   tar -xvf xpdf-tools-linux-4.03.tar.gz && sudo cp xpdf-tools-linux-4.03/bin64/pdftotext /usr/local/bin
                   
                   Installation on MacOS:
                   brew install xpdf
                   
                   You can find more details here: https://www.xpdfreader.com
                

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rest_api/application.py", line 33, in <module>
    from rest_api.controller.router import router as api_router
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines_rest_api-0.0.1a0-py3.7.egg/rest_api/controller/router.py", line 17, in <module>
    from rest_api.controller import file_upload, search, feedback, document
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines_rest_api-0.0.1a0-py3.7.egg/rest_api/controller/file_upload.py", line 59, in <module>
    Path(PIPELINE_YAML_PATH), pipeline_name=INDEXING_PIPELINE_NAME)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 278, in load_from_yaml
    overwrite_with_env_variables=overwrite_with_env_variables,
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 789, in load_from_config
    components=components)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/pipelines-0.1.0a0-py3.7.egg/pipelines/pipelines/base.py", line 835, in _load_or_get_component
    raise Exception(f"Failed loading pipeline component '{name}': {e}")
Exception: Failed loading pipeline component 'PDFFileConverter': pdftotext is not installed. It is part of xpdf or poppler-utils software suite.
                
                   Installation on Linux:
                   wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.03.tar.gz &&
                   tar -xvf xpdf-tools-linux-4.03.tar.gz && sudo cp xpdf-tools-linux-4.03/bin64/pdftotext /usr/local/bin
                   
                   Installation on MacOS:
                   brew install xpdf

Download page for pdftotext, the tool the error reports as missing: http://www.xpdfreader.com/download.html

wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz && tar -xvf xpdf-tools-linux-4.04.tar.gz && sudo cp xpdf-tools-linux-4.04/bin64/pdftotext /usr/local/bin

1.5 Fix: install the dependencies first

pdftotext installation notes: https://github.com/jalan/pdftotext

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
pip install pdftotext
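
Because the REST API fails at import time when the pdftotext binary is absent, it can help to check for it before launching the service. The following is a minimal sketch using the standard library; the function name missing_tools is hypothetical and not part of pipelines.

```python
import shutil


def missing_tools(tools=("pdftotext",)):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Install before starting the REST API:", ", ".join(missing))
    else:
        print("All external tools found.")
```

An empty list means the PDFFileConverter component should be able to find the binary.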

1.5.1 Error when sending a request

es/preprocessor/preprocessor.py", line 265, in split
    language=self.language)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/tokenize/__init__.py", line 106, in sent_tokenize
    tokenizer = load(f"tokenizers/punkt/{language}.pickle")
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/data.py", line 750, in load
    opened_resource = _open(resource_url)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/data.py", line 876, in _open
    return find(path_, path + [""]).open()
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/nltk/data.py", line 583, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

1.5.2 Fix for the request error

(python37) [root@k8s-master02 pipelines]# python
Python 3.7.0 (default, Oct  9 2018, 10:31:47) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download('punkt')
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
True
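
The same download can be scripted so that it only runs when the resource is actually missing, for example during deployment. The following is a sketch under stated assumptions: ensure_punkt is a hypothetical helper, and its find and download parameters exist only so the logic can be exercised without NLTK installed. By default they resolve to nltk.data.find and nltk.download.

```python
def ensure_punkt(find=None, download=None):
    """Download the punkt tokenizer only if NLTK cannot already find it.

    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily; requires `pip install nltk`

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find("tokenizers/punkt")  # raises LookupError when the resource is absent
        return False
    except LookupError:
        download("punkt")
        return True
```

Calling ensure_punkt() once before starting the REST API has the same effect as the interactive session above.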

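1.5.3 Garbled-text error when uploading a file

[Image: screenshot of the error]

1.5.4 Fix for the garbled-text upload error

Re-save the file in UTF-8 encoding, then upload it again.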
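
Re-saving can also be done in a script when there are many files. The following is a minimal sketch; the function name resave_as_utf8 is hypothetical, and the default source encoding gbk is an assumption (the article does not say which encoding the original file used), so adjust it to match your files.

```python
from pathlib import Path


def resave_as_utf8(src, dst, source_encoding="gbk"):
    """Decode `src` with `source_encoding` and write the text to `dst` as UTF-8."""
    text = Path(src).read_text(encoding=source_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return dst
```

If the source encoding is wrong, read_text raises UnicodeDecodeError instead of silently writing garbled text, which makes a bad guess easy to spot.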

1.7 Error when running python -m streamlit run ui/webapp_question_answering.py --server.port 8502

Traceback (most recent call last):
  File "/data/anaconda3/envs/python37/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/data/anaconda3/envs/python37/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/__main__.py", line 21, in <module>
    main(prog_name="streamlit")
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/cli.py", line 204, in main_run
    _main_run(target, args, flag_options=kwargs)
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/cli.py", line 232, in _main_run
    command_line = _get_command_line_as_string()
  File "/data/anaconda3/envs/python37/lib/python3.7/site-packages/streamlit-1.7.0-py3.7.egg/streamlit/cli.py", line 221, in _get_command_line_as_string
    cmd_line_as_list.extend(click.get_os_args())
AttributeError: module 'click' has no attribute 'get_os_args'

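1.8 Fix

Reference solution: https://blog.csdn.net/qq_37223079/article/details/124315174

Streamlit 1.7.0 calls click.get_os_args, which the installed click release no longer provides, so pin click to 8.0.0 (there must be no spaces around ==):

pip install click==8.0.0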
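
To confirm whether an environment is affected before launching the UI, one can check for the attribute named in the traceback. The following is a minimal sketch; has_get_os_args is a hypothetical helper, and the module name is a parameter only so the check can be tried against other modules.

```python
import importlib


def has_get_os_args(module_name="click"):
    """Import the named module and report whether it exposes get_os_args."""
    module = importlib.import_module(module_name)
    return hasattr(module, "get_os_args")
```

In an environment that shows the AttributeError above, has_get_os_args() returns False; after the pin it should return True.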

2. Running inside Docker

2.1 Contents of requirements.txt

paddlenlp==2.3.3
paddleocr==2.5.0.3
requests==2.28.0
pydantic==1.9.1
mmh3==3.0.0
more-itertools==8.13.0
elasticsearch==7.10.0
SQLAlchemy==1.4.37
SQLAlchemy-Utils==0.38.2
langdetect==1.0.9
python-docx==0.8.11
nltk==3.7
pdfplumber==0.7.1
importlib-metadata==4.2.0
faiss-gpu==1.7.2
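
When an environment misbehaves, it can be useful to compare what is installed against the pins above. The following is a sketch under stated assumptions: parse_pins and mismatches are hypothetical helpers, and only simple name==version lines are handled, which is all the list above contains.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7: falls back to the importlib-metadata backport
    import importlib_metadata as metadata


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for every pin not satisfied."""
    result = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:
            result[name] = (pinned, installed)
    return result
```

For example, mismatches(parse_pins(open("requirements.txt").read())) returns an empty dict when every pinned package is installed at exactly the pinned version.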

2.2 Running with docker commands

docker pull registry.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc54-dev

nvidia-docker run -it --entrypoint=/bin/bash registry.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc54-dev

pip3.7 install paddlepaddle-gpu==2.3.0.post101 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

pip3.7 install -r requirements.txt

python3.7 setup.py install

python3.7 examples/question-answering/dense_qa_example.py --device gpu

2.3 Building and running an image from a Dockerfile; the Dockerfile contents are:

FROM registry.baidubce.com/paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7-gcc54-dev


COPY . /deploy
WORKDIR /deploy

RUN pip config set global.index-url https://mirror.baidu.com/pypi/simple \
    && python3.7 -m pip install --upgrade setuptools \
    && python3.7 -m pip install --upgrade pip \
    && python3.7 -m pip install paddlepaddle-gpu==2.3.0.post101 \
    -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html \
    # 1) Install the pipelines package\
    && cd pipelines/ \
    && pip3.7 install -r requirements.txt \
    && python3.7 setup.py install

ENTRYPOINT export CUDA_VISIBLE_DEVICES=0 && \
           python3.7 pipelines/examples/question-answering/dense_qa_example.py --device gpu

2.3.1 Building the image and running the container

nvidia-docker build -t chatbot-qa:1.0.0.0630 .

nvidia-docker run --name chatbot-qa -d chatbot-qa:1.0.0.0630

nvidia-docker exec -it chatbot-qa /bin/bash

docker logs chatbot-qa
