[Development] A Summary of Common Problems and Errors Encountered During Development

Latest recommended article published 2024-08-15 15:02:58

子韵如初

Tags: docker c++ computer vision python

Article link: https://blog.csdn.net/weixin_43953700/article/details/123698852

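This post collects dependency problems encountered on Linux and their solutions, including missing .so files, missing libraries, build and installation errors, network problems inside Docker, NVIDIA driver problems, Python package installation problems, and GPU memory management. Each one is resolved by checking environment variables, installing missing packages, upgrading or downgrading library versions, or changing the Docker configuration.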

Summary generated by CSDN using AI.

For more, follow
计算机视觉-Paper&Code on Zhihu

Problem	Screenshot	Solution	Notes
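A required .so file cannot be found		1. Check that the environment variables are set correctly. 2. Locate the file with `find / -name "{missing .so file}"`, then add its directory to the library search path: `export LD_LIBRARY_PATH={DIRECTORY}:$LD_LIBRARY_PATH`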
libxml2.so.2: cannot open shared object file: No such file or directory		`apt-get install libxml2 -y && apt-get install openssl libssl-dev -y`
libgmpxx.so.4: cannot open shared object file: No such file or directory		`apt-get install libgmpxx4ldbl`
Installing h5py fails with `error: Unable to load dependency HDF5, make sure HDF5 is installed properly` and `error: libhdf5.so: cannot open shared object file: No such file or directory`		`apt-get install libhdf5-dev -y`
Building and installing mmcv fails with `ERROR: Could not find a version that satisfies the requirement pytest-runner` and `ERROR: No matching distribution found for pytest-runner`		`pip install pytest-runner`	Check the Dockerfile and make sure pytest-runner is installed first
pip times out inside a Docker container: `ERROR: No matching distribution found for numpy`		1. Use the `--net host` option: `docker run --net host --name ubuntu -it ubuntu bash` 2. Use the `--dns` option: `docker run --dns 8.8.8.8 --dns 8.8.4.4 --name ubuntu -it ubuntu bash` 3. Change the DNS server	The container has no network access, so DNS resolution fails
A Docker build that uses the NVIDIA package repository fails at `apt-get update`: Reading package lists... Done W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools cudatools@nvidia.com E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release' is not signed. N: Updating from such a repository can't be done securely, and is therefore disabled by default. N: See apt-secure(8) manpage for repository creation and user configuration details.		`RUN rm /etc/apt/sources.list.d/*`	Remove the NVIDIA sources; NVIDIA's DNS goes down from time to time
Running an NVIDIA image with `docker run` fails with `no such file or directory): exec: "nvidia-container-runtime": executable file not found in $PATH: : unknown.`		`apt-get install nvidia-container-runtime`
autogluon raises `multiprocessing.context.TimeoutError`		`docker run --shm-size=4096m`	Loading a dataset with a multi-process DataLoader uses a large amount of shared memory, and Docker's default is 64 MB
The SSH server fails to start on the platform with `/lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.25' not found`	https://apulis-gitlab.apulis.cn/apulis/apulis-wiki/-/blob/master/algorithm/wiki/docker/How-to-write-a-better-dockerfile.md		An Ubuntu 16.04 image prevents the SSH server from starting on the platform; switch to an Ubuntu 18.04 image
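During GPU training: `Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED`		`config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))`	GPU memory has to be requested as it is needed; `allow_growth=True` lets TensorFlow grow its allocation on demand instead of reserving it all up front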
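horovod cannot use multiple GPUs: `TensorFlow device (GPU:0) is being mapped to multiple CUDA devices`		1. If `list_gpu_devices` from `device_lib` was called at the start of the program, multiple GPUs cannot be used. 2. If you use `MonitoredTrainingSession`, run `hvd.broadcast_global_variables` after the global variables have been initialized.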
Using OpenCV from Python fails because libGL.so.1 cannot be found		`sudo apt-get update && sudo apt-get install libgl1-mesa-glx -y`	The OpenCV installation is missing system packages
NotImplementedError: Cannot convert a symbolic Tensor (strided_slice_4:0) to a numpy array.			The installed numpy version does not support symbolic tensors
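RuntimeError: mindspore/ccsrc/transform/graph_ir/convert.cc:102 FindAdapter] Can't find OpAdapter for Div		Upgrade MindSpore	AIR export does not currently support networks with control-flow semantics, such as `for`, `while`, or `if` statements in the network's `construct`. Converting CenterNet to AIR raises this error; converting it to MINDIR does not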
No module named 'object_detection'		1. Add the directory to the Python path: `sys.path.append("./object_detection")` 2. `export PYTHONPATH=./object_detection` 3. Copy the dependency folders into /usr/local/lib/python/site-packages	The imported package cannot be found
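libX11.so.6: cannot open shared object file: No such file or directory		1. Turn off on-screen plotting: `matplotlib.use('agg')` 2. Install libx11: `apt-get install libx11-dev`	matplotlib needs a graphical backend to draw, and the GUI plugins cannot be used on this Linux machine, so on-screen display is switched off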
numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject		`pip uninstall numpy && pip install numpy`	numpy version mismatch; reinstall it
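Running a script on Linux fails with `$'\r': command not found`		1. `apt-get install dos2unix` 2. `dos2unix <filename>`	A shell script edited on Windows and then uploaded to Linux keeps Windows line endings (`\r\n`), whereas Linux uses `\n`. The shell does not treat `\r` as a line break, so it reports an error. Running dos2unix on the script removes the `\r` characters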
NotImplementedError: Cannot convert a symbolic Tensor (2nd_target:0) to a numpy array		`pip install numpy==1.19.5`	Wrong numpy version; downgrade from 1.20.2 to 1.19.5
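`git pull` fails with `fatal: refusing to merge unrelated histories`		`git pull origin master --allow-unrelated-histories`	Commits merged on the remote mean that the commit histories of the local and remote repositories no longer line up. Cloning the repository again also solves it. Make a habit of pulling before every code change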
self._abc_registry = extra._abc_registry AttributeError: type object 'Callable' has no attribute '_abc_registry'		`pip uninstall typing`; if that is not enough, also run `pip uninstall dataclasses`
`pip install autogluon` fails with `AttributeError: type object 'Callable' has no attribute '_abc_registry'`		`pip uninstall typing`
import apt_pkg ModuleNotFoundError: No module named 'apt_pkg'		`apt-get install -y python3-apt python-apt python-dev python3-dev && python3.7 -m pip install --upgrade setuptools pip`
Installing autogluon or ConfigSpace fails with `error: command 'x86_64-linux-gnu-gcc' failed with exit status 1` and `ERROR: Could not build wheels for ConfigSpace which use PEP 517 and cannot be installed directly`		`sudo apt-get install python3 python-dev python3-dev build-essential libssl-dev libffi-dev libxml2-dev libxslt1-dev zlib1g-dev python-pip`
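PyTorch raises, at `correct_k = correct[:k].view(-1).float().sum(0).item()`: RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.		Call `contiguous()` before operating on the tensor, for example `tensor.contiguous().view()`	The tensor is non-contiguous, meaning its elements are not laid out in a single block of memory in index order, so `view` cannot reinterpret it; the author ran into this during multi-GPU training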
CMake Error at CMakeLists.txt:183 (message): Protobuf compiler not found Call Stack (most recent call first): CMakeLists.txt:202 (RELATIVE_PROTOBUF_GENERATE_CPP)		`apt-get install libprotobuf-dev protobuf-compiler -y && export CMAKE_ARGS="-DONNX_USE_PROTOBUF_SHARED_LIBS=ON"`	Raised while installing onnxruntime 1.2.1
Fitting model: KNeighborsUnif ... Training model for up to 3599.93s of the 3599.93s of remaining time. BLAS : Program is Terminated. Because you tried to allocate too many memory regions.		Pass hyperparameters ={"KNN":,"n_jobs":16} to fit	autogluon AutoML error
fatal error: Python.h: No such file or directory		`sudo apt-get install python3.7-dev`
Installing pycuda fails		`pip install pycuda --global-option="-I/usr/local/cuda-10.0/targets/aarch64-linux/include/" --global-option="-L/usr/local/cuda-10.0/targets/aarch64-linux/lib/"`
fatal: unable to access 'http://apulis-gitlab.apulis.cn/apulis/model-gallery/': Problem with the SSL CA cert (path? access rights?)		`sudo apt install -y ca-certificates`	The CA certificates in the image have expired
Error: No such container:path: pytorch_backend_ptlib:/opt/conda/lib/libomp.so		`apt-get install libomp5 libomp-dev -y && cp /usr/lib/x86_64-linux-gnu/libomp.so .`
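
The first row of the table locates a missing .so file with `find` and then adds its directory to `LD_LIBRARY_PATH`. The same two steps can be sketched in Python. This is only an illustration: `find_library_dirs` and `extend_library_path` are hypothetical helper names that do not come from the original post, and the search simply walks a directory tree the way `find` does.

```python
import os
from typing import List


def find_library_dirs(root: str, library_name: str) -> List[str]:
    """Return the sorted directories under `root` that contain `library_name`."""
    matches = set()
    for current_dir, _subdirs, files in os.walk(root):
        if library_name in files:
            matches.add(current_dir)
    return sorted(matches)


def extend_library_path(directories: List[str], existing: str = "") -> str:
    """Build an LD_LIBRARY_PATH value with `directories` placed before `existing`."""
    parts = list(directories)
    if existing:
        # Keep whatever was already on the path instead of overwriting it.
        parts.append(existing)
    return os.pathsep.join(parts)
```

One way to use it is to print an `export` line for the shell, for example `print("export LD_LIBRARY_PATH=" + extend_library_path(find_library_dirs("/usr", "libxml2.so.2"), os.environ.get("LD_LIBRARY_PATH", "")))`. The value has to be exported before the failing program starts, because changing the variable inside an already running process generally does not affect how that process's dynamic loader resolves libraries.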
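
For the `$'\r': command not found` row, the table relies on dos2unix to strip Windows line endings. Where installing that tool is not convenient, the same conversion can be sketched in a few lines of Python. This is a minimal sketch rather than a replacement for dos2unix: the function names `strip_carriage_returns` and `convert_file` are hypothetical, and only `\r\n` pairs are rewritten.

```python
from pathlib import Path


def strip_carriage_returns(data: bytes) -> bytes:
    """Convert Windows (CRLF) line endings to Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> bool:
    """Rewrite the file in place; return True if its contents changed."""
    target = Path(path)
    original = target.read_bytes()
    converted = strip_carriage_returns(original)
    if converted != original:
        target.write_bytes(converted)
        return True
    return False
```

Working on bytes rather than text avoids any guess about the script's encoding, and a file that already uses Unix line endings is left untouched.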