Problem 1
File "/usr/local/lib/python3.6/dist-packages/cv2/__init__.py", line 8, in <module>
from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Solution
Install the missing system library in the Dockerfile:
# Dependency for opencv-python (cv2). Without it, `import cv2` raises ImportError: libGL.so.1: cannot open shared object file: No such file or directory
# Solution from https://askubuntu.com/a/1015744
RUN apt-get update && apt-get install -y libgl1-mesa-glx
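To confirm the fix without importing cv2, one option is to ask the dynamic loader directly. The following is a minimal sketch that uses only the standard library; the helper name `can_load` is hypothetical and is not part of OpenCV.

```python
import ctypes


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be opened.
        return False
    return True


if __name__ == "__main__":
    # libGL.so.1 is the library named in the ImportError above.
    print("libGL.so.1 loadable:", can_load("libGL.so.1"))
```

The same check should work for any other shared library named in a "cannot open shared object file" error.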
Problem 2
Traceback (most recent call last):
File "/home/jovyan/.conda/envs/paddle/bin/paddleocr", line 8, in <module>
sys.exit(main())
File "/home/jovyan/.conda/envs/paddle/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 673, in main
result = engine.ocr(img_path,
File "/home/jovyan/.conda/envs/paddle/lib/python3.8/site-packages/paddleocr/paddleocr.py", line 555, in ocr
dt_boxes, rec_res, _ = self.__call__(img, cls)
File "/home/jovyan/.conda/envs/paddle/lib/python3.8/site-packages/paddleocr/tools/infer/predict_system.py", line 71, in __call__
dt_boxes, elapse = self.text_detector(img)
File "/home/jovyan/.conda/envs/paddle/lib/python3.8/site-packages/paddleocr/tools/infer/predict_det.py", line 244, in __call__
self.input_tensor.copy_from_cpu(img)
File "/home/jovyan/.conda/envs/paddle/lib/python3.8/site-packages/paddle/fluid/inference/wrapper.py", line 38, in tensor_copy_from_cpu
self.copy_from_cpu_bind(data)
RuntimeError: (PreconditionNotMet) Cannot load cudnn shared library. Cannot invoke method cudnnGetVersion.
[Hint: cudnn_dso_handle should not be null.] (at /paddle/paddle/phi/backends/dynload/cudnn.cc:60)
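Solution
Step 1: Run ls /usr/lib | grep lib in a terminal and check for libcudnn.so and libcublas.so. In the failing environment neither library was present in /usr/lib. The listing below does show both entries, so it was presumably captured after the symlinks from step 3 had been created.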
(base) jovyan@wangzy-p2-0:/usr/lib$ ls /usr/lib |grep lib
libcublas.so
libcudnn.so
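Step 2: Find where libcudnn.so and libcublas.so actually live. The locate command comes from the mlocate package (apt-get install mlocate); if it returns nothing right after installation, run sudo updatedb once to build its database.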
(base) jovyan@wangzy-p2-0:/usr/lib$ locate libcublas
/opt/conda/pkgs/cudatoolkit-10.2.89-h713d32c_10/lib/libcublas.so
/opt/conda/pkgs/cudatoolkit-10.2.89-h713d32c_10/lib/libcublas.so.10
/opt/conda/pkgs/cudatoolkit-10.2.89-h713d32c_10/lib/libcublas.so.10.2.2.89
/opt/conda/pkgs/cudatoolkit-10.2.89-h713d32c_10/lib/libcublasLt.so
/opt/conda/pkgs/cudatoolkit-10.2.89-h713d32c_10/lib/libcublasLt.so.10
/opt/conda/pkgs/cudatoolkit-10.2.89-h713d32c_10/lib/libcublasLt.so.10.2.2.89
/opt/conda/pkgs/cudatoolkit-11.7.0-hd8887f6_11/lib/libcublas.so
/opt/conda/pkgs/cudatoolkit-11.7.0-hd8887f6_11/lib/libcublas.so.11
/opt/conda/pkgs/cudatoolkit-11.7.0-hd8887f6_11/lib/libcublas.so.11.10.1.25
/opt/conda/pkgs/cudatoolkit-11.7.0-hd8887f6_11/lib/libcublasLt.so
/opt/conda/pkgs/cudatoolkit-11.7.0-hd8887f6_11/lib/libcublasLt.so.11
/opt/conda/pkgs/cudatoolkit-11.7.0-hd8887f6_11/lib/libcublasLt.so.11.10.1.25
(base) jovyan@wangzy-p2-0:/usr/lib$ locate libcudnn.so
/opt/conda/pkgs/cudnn-7.6.5-cuda10.2_0/lib/libcudnn.so
/opt/conda/pkgs/cudnn-7.6.5-cuda10.2_0/lib/libcudnn.so.7
/opt/conda/pkgs/cudnn-7.6.5-cuda10.2_0/lib/libcudnn.so.7.6.5
/opt/conda/pkgs/cudnn-8.4.1.50-hed8a83a_0/lib/libcudnn.so
/opt/conda/pkgs/cudnn-8.4.1.50-hed8a83a_0/lib/libcudnn.so.8
/opt/conda/pkgs/cudnn-8.4.1.50-hed8a83a_0/lib/libcudnn.so.8.4.1
Step 3: Create libcudnn.so and libcublas.so symlinks in /usr/lib, pointing at the copies that match the CUDA version in use (11.7 here)
cd /usr/lib
sudo ln -s /opt/conda/pkgs/cudnn-8.4.1.50-hed8a83a_0/lib/libcudnn.so.8.4.1 libcudnn.so
sudo ln -s /opt/conda/pkgs/cudatoolkit-11.7.0-hd8887f6_11/lib/libcublas.so.11.10.1.25 libcublas.so
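If locate is not available, the search in step 2 can be approximated in Python. This is only a sketch: `find_shared_libs` is a hypothetical helper, and the root directory /opt/conda/pkgs is taken from the locate output above and may differ on other machines.

```python
from pathlib import Path
from typing import List


def find_shared_libs(root: str, stem: str) -> List[Path]:
    """Recursively list files under `root` whose name starts with `<stem>.so`."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    # "libcublas.so*" matches libcublas.so.11 but not libcublasLt.so.
    return sorted(p for p in root_path.rglob(f"{stem}.so*") if p.is_file())


if __name__ == "__main__":
    for stem in ("libcudnn", "libcublas"):
        for path in find_shared_libs("/opt/conda/pkgs", stem):
            print(path)
```

Because the pattern is anchored on `<stem>.so`, a search for libcublas does not return libcublasLt, which is a separate library.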
Problem 3
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
No stack trace in paddle, may be caused by external reasons.
----------------------
Error Message Summary:
----------------------
FatalError: `Segmentation fault` is detected by the operating system.
[TimeInfo: *** Aborted at 1676881286 (unix time) try "date -d @1676881286" if you are using GNU date ***]
[SignalInfo: *** SIGSEGV (@0x0) received by PID 19715 (TID 0x7f0da97c4740) from PID 0 ***]
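Solution
- Terminal: add the conda environment's lib directory to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/home/jovyan/.conda/envs/paddle/lib:$LD_LIBRARY_PATH
Here paddle is the name of the conda environment; replace it with the name of your own environment.
- Jupyter: set the same environment variable
%env LD_LIBRARY_PATH=/home/jovyan/.conda/envs/paddle/lib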
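When PaddleOCR is launched from a Python script rather than a shell, the same setting can be passed to the child process. The sketch below assumes Linux; `prepend_library_dir` and `child_env` are hypothetical helper names, and the example path is the one used above.

```python
import os
from typing import Dict, Optional


def prepend_library_dir(lib_dir: str, current: Optional[str]) -> str:
    """Return an LD_LIBRARY_PATH value with `lib_dir` first and no duplicates."""
    parts = [p for p in (current or "").split(":") if p and p != lib_dir]
    return ":".join([lib_dir] + parts)


def child_env(lib_dir: str) -> Dict[str, str]:
    """Copy the current environment and put `lib_dir` first in LD_LIBRARY_PATH."""
    env = dict(os.environ)
    env["LD_LIBRARY_PATH"] = prepend_library_dir(lib_dir, env.get("LD_LIBRARY_PATH"))
    return env


if __name__ == "__main__":
    env = child_env("/home/jovyan/.conda/envs/paddle/lib")
    print(env["LD_LIBRARY_PATH"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run`. As far as I know, the glibc loader reads LD_LIBRARY_PATH when a process starts, so the variable needs to be in place before the Python process that imports paddle is launched.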
Problem 4
File "/home/jovyan/vol-1/github/PaddleOCR/test.py", line 8, in <module>
result = ocr.ocr(img_path, cls=True)
File "/home/jovyan/vol-1/github/PaddleOCR/paddleocr.py", line 523, in ocr
img = check_img(img)
File "/home/jovyan/vol-1/github/PaddleOCR/paddleocr.py", line 431, in check_img
img, flag_gif, flag_pdf = check_and_read(image_file)
File "/home/jovyan/vol-1/github/PaddleOCR/ppocr/utils/utility.py", line 93, in check_and_read
for pg in range(0, pdf.pageCount):
AttributeError: 'Document' object has no attribute 'pageCount'
Solution
Pin PyMuPDF to a release that still provides the pageCount attribute:
pip install pymupdf==1.18.14 -i https://pypi.tuna.tsinghua.edu.cn/simple
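If pinning the package is not an option, an alternative is to make the calling code tolerate both attribute names. This sketch assumes that newer PyMuPDF releases expose the page count as `page_count`; `get_page_count` is a hypothetical helper and is not part of PaddleOCR or PyMuPDF.

```python
def get_page_count(doc) -> int:
    """Return the page count for either the old or the new attribute name."""
    if hasattr(doc, "page_count"):  # snake_case name (assumed for newer releases)
        return doc.page_count
    return doc.pageCount  # camelCase name used by the failing code above
```

With such a helper, the loop in check_and_read could be written as `for pg in range(0, get_page_count(pdf)):`.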
Problem 5
ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory
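Solution
Add the directory that contains libcudart.so.10.2 to LD_LIBRARY_PATH (here, the lib directory of the paddle conda environment):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/conda/envs/paddle/lib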
Problem 6
Exception has occurred: OSError
In user code:
File "tools/export_model.py", line 172, in <module>
main()
File "tools/export_model.py", line 165, in main
sub_model_save_path, logger)
File "tools/export_model.py", line 99, in export_single_model
paddle.jit.save(model, save_path)
File "<decorator-gen-101>", line 2, in save
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/jit.py", line 744, in save
inner_input_spec)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 517, in concrete_program_specify_input_spec
*desired_input_spec)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 427, in get_concrete_program
concrete_program, partial_program_layer = self._program_cache[cache_key]
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 723, in __getitem__
self._caches[item] = self._build_once(item)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 714, in _build_once
**cache_key.kwargs)
File "<decorator-gen-99>", line 2, in from_func_spec
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 51, in __impl__
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/dygraph_to_static/program_translator.py", line 662, in from_func_spec
outputs = static_func(*inputs)
File "/paddle/debug/PaddleOCR/ppocr/modeling/architectures/base_model.py", line 79, in forward
x = self.backbone(x)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/paddle/debug/PaddleOCR/ppocr/modeling/backbones/det_mobilenet_v3.py", line 146, in forward
x = self.conv(x)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/paddle/debug/PaddleOCR/ppocr/modeling/backbones/det_mobilenet_v3.py", line 179, in forward
x = self.conv(x)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 917, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py", line 907, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/nn/layer/conv.py", line 677, in forward
use_cudnn=self._use_cudnn)
File "/usr/local/lib/python3.7/site-packages/paddle/nn/functional/conv.py", line 148, in _conv_nd
type=op_type, inputs=inputs, outputs=outputs, attrs=attrs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/layer_helper.py", line 43, in append_op
return self.main_program.current_block().append_op(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/framework.py", line 3184, in append_op
attrs=kwargs.get("attrs", None))
File "/usr/local/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2224, in __init__
for frame in traceback.extract_stack():
ExternalError: CUDNN error(4), CUDNN_STATUS_INTERNAL_ERROR.
[Hint: 'CUDNN_STATUS_INTERNAL_ERROR'. An internal cuDNN operation failed. ] (at ../paddle/phi/backends/gpu/gpu_resources.cc:285)
[operator < conv2d_fusion > error]
File "/home/jovyan/vol-1/github/PaddleOCR2/PaddleOCR/tools/infer/predict_det.py", line 243, in __call__
self.predictor.run()
File "/home/jovyan/vol-1/github/PaddleOCR2/PaddleOCR/tools/infer/predict_system.py", line 76, in __call__
dt_boxes, elapse = self.text_detector(img)
File "/home/jovyan/vol-1/github/PaddleOCR2/PaddleOCR/paddleocr.py", line 556, in ocr
dt_boxes, rec_res, _ = self.__call__(img, cls)
File "/home/jovyan/vol-1/github/PaddleOCR2/PaddleOCR/ppocr/data/imaug/label_ops.py", line 1148, in _load_ocr_info
ocr_result = self.ocr_engine.ocr(data['image'], cls=False)[0]
File "/home/jovyan/vol-1/github/PaddleOCR2/PaddleOCR/ppocr/data/imaug/label_ops.py", line 1016, in __call__
ocr_info = self._load_ocr_info(data)
File "/home/jovyan/vol-1/github/PaddleOCR2/PaddleOCR/ppocr/data/imaug/__init__.py", line 56, in transform
data = op(data)
File "/home/jovyan/vol-1/github/PaddleOCR2/PaddleOCR/tools/infer_kie_token_ser.py", line 106, in __call__
batch = transform(data, self.ops)
File "/home/jovyan/vol-1/github/PaddleOCR2/PaddleOCR/tools/infer_kie_token_ser.py", line 149, in <module>
result, _ = ser_engine(data)
Solution
[root@pkm-05 ~]# nvidia-smi
Fri Aug 4 19:12:16 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:5E:00.0 Off | 0 |
| N/A 53C P0 26W / 70W | 2139MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla T4 Off | 00000000:5F:00.0 Off | 0 |
| N/A 48C P8 15W / 70W | 46MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla T4 Off | 00000000:86:00.0 Off | 0 |
| N/A 48C P8 15W / 70W | 12MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla T4 Off | 00000000:D8:00.0 Off | 0 |
| N/A 51C P8 16W / 70W | 4MiB / 15360MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
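Check GPU memory usage with nvidia-smi and kill leftover zombie processes that still hold GPU memory; for details see https://blog.csdn.net/qq_39698985/article/details/130111562
To inspect a user's processes: ps aux | grep <process name>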
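The memory check can also be scripted. The sketch below parses the CSV output of `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`; the names `GpuMemory`, `parse_gpu_memory`, `query_gpu_memory`, and `least_used` are hypothetical helpers introduced for illustration.

```python
import subprocess
from typing import List, NamedTuple


class GpuMemory(NamedTuple):
    index: int
    used_mib: int
    total_mib: int


def parse_gpu_memory(csv_text: str) -> List[GpuMemory]:
    """Parse `index, memory.used, memory.total` rows (CSV, no header, no units)."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append(GpuMemory(index, used, total))
    return rows


def query_gpu_memory() -> List[GpuMemory]:
    """Run nvidia-smi and return per-GPU memory usage (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    )
    return parse_gpu_memory(result.stdout)


def least_used(gpus: List[GpuMemory]) -> GpuMemory:
    """Pick the GPU with the least memory currently in use."""
    return min(gpus, key=lambda gpu: gpu.used_mib)
```

With the numbers from the nvidia-smi table above, `least_used` would select GPU 3 (4 MiB in use); its index could then be exported as CUDA_VISIBLE_DEVICES before starting PaddleOCR.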
Problem 7
AttributeError: module 'paddle' has no attribute 'utils'
Solution
Reinstall paddle:
pip uninstall paddlepaddle
python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
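Before reinstalling, it may help to see which Paddle distributions pip believes are installed. The notes above do not state the root cause, so treating a partial or duplicate installation as the culprit is an assumption; `installed_versions` is a hypothetical helper built on the standard-library `importlib.metadata` (Python 3.8+).

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Both the CPU and the GPU distribution install a module named `paddle`.
    print(installed_versions(["paddlepaddle", "paddlepaddle-gpu"]))
```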
Problem 8
1. Problem description:
WARNING:root:PaddlePaddle meets some problem with 4 GPUs. This may be caused by:
1. There is not enough GPUs visible on your system
2. Some GPUs are occupied by other process now
3. NVIDIA-NCCL2 is not installed correctly on your system. Please follow instruction on https://github.com/NVIDIA/nccl-tests
to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
WARNING:root:
Original Error is: (PreconditionNotMet) The third-party dynamic library (libnccl.so) that Paddle depends on is not configured correctly. (error code is libnccl.so: cannot open shared object file: No such file or directory)
Suggestions:
1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
2. Configure third-party dynamic library environment variables as follows:
- Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
- Windows: set PATH by `set PATH=XXX; (at /paddle/paddle/phi/backends/dynload/dynamic_loader.cc:305)
2. Check the OS version
(paddle) jovyan@wangzy-p3-0:~/vol-1/soft$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
3. Download the package
Download page: https://developer.nvidia.com/nccl/nccl-legacy-downloads
4. Install the package
(paddle) jovyan@wangzy-p3-0:~/vol-1/soft$ sudo dpkg -i nccl-local-repo-ubuntu2004-2.14.3-cuda11.7_1.0-1_amd64.deb
Selecting previously unselected package nccl-local-repo-ubuntu2004-2.14.3-cuda11.7.
(Reading database ... 25987 files and directories currently installed.)
Preparing to unpack nccl-local-repo-ubuntu2004-2.14.3-cuda11.7_1.0-1_amd64.deb ...
Unpacking nccl-local-repo-ubuntu2004-2.14.3-cuda11.7 (1.0-1) ...
Setting up nccl-local-repo-ubuntu2004-2.14.3-cuda11.7 (1.0-1) ...
The public nccl-local-repo-ubuntu2004-2.14.3-cuda11.7 GPG key does not appear to be installed.
To install the key, run this command:
sudo cp /var/nccl-local-repo-ubuntu2004-2.14.3-cuda11.7/nccl-local-44000BE4-keyring.gpg /usr/share/keyrings/
sudo cp /var/nccl-local-repo-ubuntu2004-2.14.3-cuda11.7/nccl-local-44000BE4-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install libnccl2 libnccl-dev
5. Verify the installation
(paddle) jovyan@wangzy-p3-0:~/vol-1/soft$ python
Python 3.8.13 (default, Oct 21 2022, 23:50:54)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
[4pdvGPU Msg(31110:140474696726336:libvgpu.c:805)]: Initializing...
[4pdvGPU Msg(31110:140474696726336:context.c:120)]: vdevices_pci=0000:5e:00.0
[4pdvGPU Msg(31110:140474696726336:context.c:120)]: vdevices_pci=0000:5f:00.0
[4pdvGPU Msg(31110:140474696726336:context.c:120)]: vdevices_pci=0000:86:00.0
[4pdvGPU Msg(31110:140474696726336:context.c:120)]: vdevices_pci=0000:d8:00.0
[4pdvGPU Msg(31110:140474696726336:libvgpu.c:823)]: Initialized
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0928 06:53:31.457798 31110 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.5, Driver API Version: 11.7, Runtime API Version: 11.7
W0928 06:53:31.472273 31110 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
PaddlePaddle works well on 1 GPU.
/home/jovyan/.conda/envs/paddle/lib/python3.8/site-packages/paddle/fluid/executor.py:1583: UserWarning: Standalone executor is not used for data parallel
warnings.warn(
W0928 06:53:37.109949 31110 fuse_all_reduce_op_pass.cc:79] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 2.
PaddlePaddle works well on 4 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
References:
https://www.freesion.com/article/7014941903/
https://stackoverflow.com/questions/72365190/libgl-so-1-cannot-open-shared-object-file-no-such-file-or-directory-even-whe