[Development] Summary of Common Problems and Errors

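This post collects dependency problems encountered on Linux and their solutions: missing .so files, missing libraries, build and install errors, network problems inside Docker, NVIDIA driver problems, Python package installation problems, GPU memory management, and more. Each one is resolved by checking environment variables, installing missing packages, upgrading or downgrading library versions, or changing the Docker configuration.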

Python

Follow for more:
计算机视觉-Paper&Code (Computer Vision Paper & Code) on Zhihu

Columns: Problem, Screenshot, Solution, Notes
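Problem: A required .so file cannot be found.
Solution: 1. Check that the environment variables are set correctly. 2. Locate the file with find / -name "{missing .so file}", then add its directory to the library path: export LD_LIBRARY_PATH={DIRECTORY}:$LD_LIBRARY_PATH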
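The two steps above can be sketched in Python as a rough equivalent of find followed by export. The helper names find_library_dirs and ld_library_path_with are made up for this illustration, and the search root is passed in explicitly rather than scanning / by default.

```python
import os


def find_library_dirs(filename, root):
    """Return the sorted directories under root that contain filename."""
    hits = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            hits.add(dirpath)
    return sorted(hits)


def ld_library_path_with(directories, current=""):
    """Prepend directories to an LD_LIBRARY_PATH value, skipping duplicates."""
    parts = [p for p in current.split(":") if p]
    for directory in reversed(directories):
        if directory not in parts:
            parts.insert(0, directory)
    return ":".join(parts)
```

A possible use is to print the export line for the shell, for example `print("export LD_LIBRARY_PATH=" + ld_library_path_with(find_library_dirs("libxml2.so.2", "/usr"), os.environ.get("LD_LIBRARY_PATH", "")))`.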
Problem: libxml2.so.2: cannot open shared object file: No such file or directory
Solution: apt-get install libxml2 -y && apt-get install openssl libssl-dev -y
Problem: libgmpxx.so.4: cannot open shared object file: No such file or directory
Solution: apt-get install libgmpxx4ldbl
Problem: Installing h5py fails with:
error: Unable to load dependency HDF5, make sure HDF5 is installed properly
error: libhdf5.so: cannot open shared object file: No such file or directory
Solution: apt-get install libhdf5-dev -y
Problem: Building and installing mmcv fails with:
ERROR: Could not find a version that satisfies the requirement pytest-runner
ERROR: No matching distribution found for pytest-runner
Solution: pip install pytest-runner
Notes: Check the Dockerfile and make sure pytest-runner is installed first.
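Problem: pip times out inside a Docker container: ERROR: No matching distribution found for numpy
Solution:
1. Use the --net host option: docker run --net host --name ubuntu -it ubuntu bash
2. Use the --dns option: docker run --dns 8.8.8.8 --dns 8.8.4.4 --name ubuntu -it ubuntu bash
3. Change the DNS server.
Notes: The container has no network access, so DNS names cannot be resolved.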
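For option 3, one common way to change the DNS server for all containers is the dns key in the Docker daemon configuration, followed by a daemon restart. The sketch below only builds the JSON. The helper name with_dns is hypothetical, and the assumption is that the configuration lives in /etc/docker/daemon.json, as on a default Linux install.

```python
import json


def with_dns(config, servers):
    """Return a copy of a daemon.json dict with the given DNS servers set."""
    updated = dict(config)
    updated["dns"] = list(servers)
    return updated


existing = {"log-driver": "json-file"}  # example of settings already in the file
print(json.dumps(with_dns(existing, ["8.8.8.8", "8.8.4.4"]), indent=2))
```

Copying the dict rather than editing it in place keeps the settings that were already present and leaves the input unchanged.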
Problem: A Docker build that uses the NVIDIA package source fails at apt-get update:
Reading package lists... Done
W: GPG error: https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1804/x86_64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools cudatools@nvidia.com
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 Release' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
Solution: RUN rm /etc/apt/sources.list.d/*
Notes: This deletes the NVIDIA sources; NVIDIA's DNS goes down from time to time.
Problem: Running an NVIDIA image with docker run fails with:
no such file or directory): exec: "nvidia-container-runtime": executable file not found in $PATH: : unknown.
Solution: apt-get install nvidia-container-runtime
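Problem: autogluon raises multiprocessing.context.TimeoutError
Solution: docker run --shm-size 4096m
Notes: Loading a dataset with a multi-process DataLoader uses a lot of shared memory, and Docker's default is 64m.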
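To confirm whether a container is running with the small default, the size of the shared-memory mount can be read from inside it. This is a minimal sketch that assumes shared memory is mounted at /dev/shm, as is usual on Linux; the function name shared_memory_mib is made up for this example.

```python
import os
import shutil


def shared_memory_mib(path="/dev/shm"):
    """Return the total size of the shared-memory mount in MiB, or None if absent."""
    if not os.path.isdir(path):
        return None
    return shutil.disk_usage(path).total // (1024 * 1024)


size = shared_memory_mib()
if size is None:
    print("/dev/shm not found")
elif size <= 64:
    print(f"/dev/shm is only {size} MiB; consider a larger --shm-size")
else:
    print(f"/dev/shm is {size} MiB")
```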
Problem: The ssh server cannot start on the platform and reports /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.25' not found
Solution: See https://apulis-gitlab.apulis.cn/apulis/apulis-wiki/-/blob/master/algorithm/wiki/docker/How-to-write-a-better-dockerfile.md
Notes: An Ubuntu 16.04 image prevents the ssh server from starting on the platform; switch to an Ubuntu 18.04 image.
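Ubuntu 16.04 ships glibc 2.23 and Ubuntu 18.04 ships glibc 2.27, which is consistent with the note above. A quick way to check an image before deploying it is to compare its glibc version against the required one. The sketch below uses platform.libc_ver() from the standard library; the helper names are made up for this illustration.

```python
import platform


def parse_version(text):
    """Turn a dotted version such as '2.27' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def glibc_at_least(required, found=None):
    """Check whether the glibc version meets the required one.

    When found is None the version is read from platform.libc_ver(), which
    returns empty strings on systems that do not use glibc.
    """
    if found is None:
        lib, found = platform.libc_ver()
        if lib != "glibc" or not found:
            return False
    return parse_version(found) >= parse_version(required)


print("glibc >= 2.25:", glibc_at_least("2.25"))
```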
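Problem: GPU training fails with Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
Solution: config = tf.compat.v1.ConfigProto(gpu_options=tf.compat.v1.GPUOptions(allow_growth=True))
Notes: By default TensorFlow tries to reserve nearly all GPU memory up front; with allow_growth=True it requests memory as it is needed.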
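Problem: horovod cannot use multiple GPUs: TensorFlow device (GPU:0) is being mapped to multiple CUDA devices
Solution: 1. Do not call list_gpu_devices from device_lib at the start of the program; once it has been called, multiple GPUs cannot be used. 2. When using MonitoredTrainingSession, run hvd.broadcast_global_variables after the global variables have been initialized.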
Problem: Python OpenCV reports that libGL.so.1 cannot be found.
Solution: sudo apt-get update && sudo apt-get install libgl1-mesa-glx -y
Notes: Packages that OpenCV depends on are missing.
Problem: NotImplementedError: Cannot convert a symbolic Tensor (strided_slice_4:0) to a numpy array.
Notes: The installed numpy version does not support symbolic tensors.
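Problem: RuntimeError: mindspore/ccsrc/transform/graph_ir/convert.cc:102 FindAdapter] Can't find OpAdapter for Div
Solution: Upgrade mindspore.
Notes: Exporting to AIR does not currently support networks with control-flow semantics, such as for, while, or if statements inside the network's construct. Converting centernet to AIR raises this error; converting to MINDIR does not.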
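Problem: The object_detection module cannot be found.
Solution:
1. Add it to the Python sys path: sys.path.append("./object_detection")
2. export PYTHONPATH=./object_detection
3. Copy the required dependency folder into /usr/local/lib/python/site-packages
Notes: The imported package cannot be found.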
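Option 1 depends on the working directory because the path is relative. A slightly more robust variant, sketched below, resolves the directory to an absolute path and avoids adding it twice. The helper name add_to_sys_path is made up for this example.

```python
import os
import sys


def add_to_sys_path(directory):
    """Append the absolute form of directory to sys.path, at most once."""
    absolute = os.path.abspath(directory)
    if absolute not in sys.path:
        sys.path.append(absolute)
    return absolute


add_to_sys_path("./object_detection")
```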
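Problem: libX11.so.6: cannot open shared object file: No such file or directory
Solution: 1. Turn off interactive plotting: matplotlib.use('agg') 2. Install libx11: apt-get install libx11-dev
Notes: matplotlib needs a graphical backend to display plots. A Linux machine without a display cannot load one, so on-screen rendering has to be turned off.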
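An alternative to calling matplotlib.use('agg') in every script is the MPLBACKEND environment variable, which matplotlib reads when it is imported. The sketch below sets it only when no X display is configured. The function name is hypothetical, and treating a missing DISPLAY variable as "headless" is an assumption that may not hold on every system.

```python
import os


def force_headless_backend(environ=os.environ):
    """Select the non-interactive Agg backend when no X display is available.

    An MPLBACKEND value that is already set is left untouched.
    """
    if not environ.get("DISPLAY"):
        environ.setdefault("MPLBACKEND", "Agg")
    return environ.get("MPLBACKEND")
```

The call has to run before matplotlib is imported for the setting to take effect.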
Problem: numpy.ndarray size changed, may indicate binary incompatibility. Expected 88 from C header, got 80 from PyObject
Solution: pip uninstall numpy && pip install numpy
Notes: The numpy version does not match; reinstall it.
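Problem: Running a script on Linux fails with $'\r': command not found
Solution:
1. apt-get install dos2unix
2. dos2unix {file name}
Notes: The shell script was edited on Windows and then uploaded to Linux. Windows ends lines with \r\n, while Linux uses \n only. The shell treats the leftover \r as part of the command, hence the error. Running dos2unix on the script strips the \r characters.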
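When dos2unix is not available, the same conversion is easy to do by hand. This is a minimal sketch of the idea, not a replacement for every dos2unix option; the function names are made up for this example.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows (CRLF) line endings with Unix (LF) ones."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite a file in place with Unix line endings."""
    with open(path, "rb") as handle:
        original = handle.read()
    with open(path, "wb") as handle:
        handle.write(crlf_to_lf(original))
```

Working on bytes rather than text avoids any guess about the file's encoding.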
Problem: NotImplementedError: Cannot convert a symbolic Tensor (2nd_target:0) to a numpy array
Solution: pip install numpy==1.19.5
Notes: The numpy version is wrong; downgrade from 1.20.2 to 1.19.5.
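A script can check the installed numpy version up front instead of failing later. The sketch below reads the version with importlib.metadata (Python 3.8+). Treating every release from 1.20 onward as affected is an assumption extrapolated from the two versions named above, and the helper names are made up for this example.

```python
import re
from importlib import metadata


def major_minor(version):
    """Extract (major, minor) from a version string such as '1.20.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def numpy_needs_downgrade(version):
    """True for 1.20 or newer (assumed affected, based on the entry above)."""
    return major_minor(version) >= (1, 20)


try:
    installed = metadata.version("numpy")
except metadata.PackageNotFoundError:
    print("numpy is not installed")
else:
    verdict = "downgrade" if numpy_needs_downgrade(installed) else "ok"
    print(f"numpy {installed}: {verdict}")
```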
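Problem: git pull fails with fatal: refusing to merge unrelated histories
Solution: git pull origin master --allow-unrelated-histories
Notes: The remote repository merged related commits, so the commit histories of the local and remote repositories no longer line up. Cloning the repository again also solves it. Make a habit of pulling before every code change.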
Problem: Installing autogluon with pip install autogluon fails with:
self._abc_registry = extra._abc_registry
AttributeError: type object 'Callable' has no attribute '_abc_registry'
Solution: pip uninstall typing. If it still fails, also run pip uninstall dataclasses.
Problem:
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Solution:
apt-get install -y python3-apt python-apt python-dev python3-dev
python3.7 -m pip install --upgrade setuptools pip
Problem: Installing autogluon or ConfigSpace fails with:
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
ERROR: Could not build wheels for ConfigSpace which use PEP 517 and cannot be installed directly
Solution: sudo apt-get install python3 python-dev python3-dev build-essential libssl-dev libffi-dev libxml2-dev libxslt1-dev zlib1g-dev python-pip
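Problem: pytorch fails at correct_k = correct[:k].view(-1).float().sum(0).item() with RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
Solution: Call contiguous() before operating on the tensor, for example tensor.contiguous().view()
Notes: The tensor is not contiguous, meaning its elements are not laid out in a single block of memory (or GPU memory) in the order view() expects. The author ran into this during multi-GPU training.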
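What "contiguous" means can be shown without PyTorch. In this simplified model (a sketch, not PyTorch's actual implementation) a tensor is described by its shape and its strides in elements, and it is contiguous when the strides match the row-major layout. The function names are made up for this example.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a tensor of the given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True when strides match the row-major layout (size-1 dims are ignored)."""
    expected = contiguous_strides(shape)
    return all(
        size == 1 or actual == wanted
        for size, actual, wanted in zip(shape, strides, expected)
    )
```

A 2x3 tensor has strides (3, 1). Transposing it gives shape (3, 2) with strides (1, 3), which no longer matches the row-major strides (2, 1), so view() cannot reinterpret it without a copy; contiguous() makes that copy.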
Problem:
CMake Error at CMakeLists.txt:183 (message):
Protobuf compiler not found
Call Stack (most recent call first):
CMakeLists.txt:202 (RELATIVE_PROTOBUF_GENERATE_CPP)
Solution: apt-get install libprotobuf-dev protobuf-compiler -y && export CMAKE_ARGS="-DONNX_USE_PROTOBUF_SHARED_LIBS=ON"
Notes: The error appears when installing onnxruntime 1.2.1.
Problem:
Fitting model: KNeighborsUnif ... Training model for up to 3599.93s of the 3599.93s of remaining time.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Solution: Add hyperparameters ={"KNN":,"n_jobs":16} to the fit call.
Notes: This error comes from autogluon AutoML.
Problem: fatal error: Python.h: No such file or directory
Solution: sudo apt-get install python3.7-dev
Problem: Installing pycuda fails.
Solution: pip install pycuda --global-option="-I/usr/local/cuda-10.0/targets/aarch64-linux/include/" --global-option="-L/usr/local/cuda-10.0/targets/aarch64-linux/lib/"
Problem: fatal: unable to access 'http://apulis-gitlab.apulis.cn/apulis/model-gallery/': Problem with the SSL CA cert (path? access rights?)
Solution: sudo apt install -y ca-certificates
Notes: The certificates in the image have expired.
Problem: Error: No such container:path: pytorch_backend_ptlib:/opt/conda/lib/libomp.so
Solution:
apt-get install libomp5 libomp-dev -y
cp /usr/lib/x86_64-linux-gnu/libomp.so .