1. 2019-5-57: conda command not found after installing anaconda3:
echo 'export PATH="/code/anaconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
conda --version
2. 2019-5-31: ImportError: No module named 'nets'
vim ~/.bashrc
export PYTHONPATH=$PYTHONPATH:/code/anaconda3/lib/python3.5/site-packages/tensorflow/models/research:/code/anaconda3/lib/python3.5/site-packages/tensorflow/models/research/slim
source ~/.bashrc
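To confirm that the PYTHONPATH change took effect, a quick check from Python can help. This is only a minimal sketch and not part of the original notes: find_missing is a hypothetical helper, and the module names passed to it are just examples.

```python
import importlib.util


def find_missing(module_names):
    """Return the top-level module names that Python cannot locate."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not on sys.path
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

Calling find_missing(["nets"]) in a fresh shell should return an empty list once the slim directory is on PYTHONPATH.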
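3. If you download the dataset yourself:
comment out the download-related steps in the script, then run it directly with bash.
4. File-not-found errors: set the file path to an absolute path. This should resolve most of these problems.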
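One way to apply the absolute-path advice in Python is to resolve every path against a known base directory. The sketch below assumes a hypothetical helper named to_absolute; the base directory is whatever the project treats as its root.

```python
from pathlib import Path


def to_absolute(path, base_dir):
    """Resolve path against base_dir unless it is already absolute."""
    p = Path(path).expanduser()
    if not p.is_absolute():
        p = Path(base_dir) / p
    # resolve() normalizes ".." segments and symlinks
    return p.resolve()
```

For example, to_absolute("data/train.txt", Path(__file__).parent) gives a path that no longer depends on the directory the script was launched from.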
5. Jupyter password changed for no apparent reason: run jupyter notebook password to overwrite the password stored in the config file.
6. ImportError: libSM.so.6: cannot open shared object file: No such file or directory
ImportError: libXrender.so.1: cannot open shared object file: No such file or directory
ImportError: libXext.so.6: cannot open shared object file: No such file or directory
Fix:
apt-get install libsm6
apt-get install libxrender1
apt-get install libxext-dev
7. ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory
Fix:
apt-get update
apt-get install libglib2.0-dev
8. ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm).
Fix: start the container with --ipc=host so that it uses the host's shared memory:
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0,1 -p 7000:7000 -it --rm -v /mnt/cjy/code:/code --ipc=host docker.local/2018140256/dockerfile/road_extraction:torch1.1.0-1
9. THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=383 error=11 : invalid argument
Fix, quoting a reported solution: "I actually just solved it compiling pytorch from source. Might be useful for someone else having the same problem"
10. RuntimeError: cannot join current thread
Fix: install version 4.19 (presumably of tqdm, where this error is a known issue).
11. ValueError: num_samples should be a positive integer value, but got num_samples=0
Cause: no images were loaded, so the dataset is empty. Check the data path.
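A simple guard is to list the image files before building the dataset and fail early with a readable message. The sketch below is an assumption-laden example: list_images is a hypothetical helper, and the set of file extensions is a guess that should be adjusted to the actual data.

```python
from pathlib import Path

# Assumed image extensions; adjust to match the dataset
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def list_images(root):
    """Return sorted image paths under root, raising early if none are found."""
    root = Path(root)
    images = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not images:
        # Fail here instead of letting the sampler report num_samples=0
        raise FileNotFoundError(f"no images found under {root.resolve()}")
    return images
```

Failing at this point names the directory that was actually searched, which is usually enough to spot a wrong or relative path.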
12. View the disk partition layout:
lsblk
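======== I came across a junior labmate's blog... which reminded me that I still have this thing, so let me write a few lines.
Good grief... I can't even tell when this post was written.
Time to start rambling.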
1. Installing h5py: https://stackoverflow.com/questions/29831052/error-importing-h5py
sudo pip install cython
sudo apt-get install libhdf5-dev
sudo pip install h5py
2. ImportError: No module named pywt
pip install PyWavelets
3. Opening the server's GUI remotely:
Xming + putty
4. from pip import main ImportError: cannot import name 'main'
vi /usr/bin/pip3
from pip import __main__  # this import line must be changed too
if __name__ == '__main__':
    sys.exit(__main__._main())  # add the __main__._ prefix
5. Speeding up installs (use the Tsinghua PyPI mirror):
python -m pip install torch==0.4.0 torchvision -i https://pypi.tuna.tsinghua.edu.cn/simple
6. Restarting the docker service:
sudo systemctl daemon-reload
sudo systemctl restart docker
7. docker: Error response from daemon: create nvidia_driver_430.14: error looking up volume plugin nvidia-docker: plugin "nvidia-docker" not found.
Fix: sudo service nvidia-docker start
8. View disk information:
sudo hdparm -I /dev/sda
9. Exposing container network ports (use the host network):
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=0,1,2,3 -it --rm -v /home/cjy:/code --ipc=host --network host docker.local/2018140256/dockerfile/road_extraction:torch1.0-cuda10-cudnn7-tensorboard
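10. RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
from torch.autograd import Variable as V
loss = StableBCELoss()(logits, V(labels.float(),requires_grad=True))
Note: this only silences the error. If logits itself does not require grad, also check that the model parameters are not frozen or detached, otherwise nothing gets trained.
AttributeError: module 'yaml' has no attribute 'FullLoader'
Fix: install PyYAML (FullLoader needs PyYAML 5.1 or newer).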
11. The following are a family of TensorFlow / cuDNN version-compatibility errors. If you know, you know; I'm just jotting them down. If you don't, just use torch.
ERROR: tensorflow-gpu 1.14.0 has requirement tensorboard<1.15.0,>=1.14.0, but you'll have tensorboard 1.13.1 which is incompatible.
ERROR: tensorflow-gpu 1.14.0 has requirement tensorflow-estimator<1.15.0rc0,>=1.14.0rc0, but you'll have tensorflow-estimator 1.13.0 which is incompatible.
Check the installed cuDNN version:
cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
ImportError: libcudnn.so.7: cannot open shared object file: No such file or directory
https://github.com/tensorflow/tensorflow/issues/20271
Error : Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
https://github.com/tensorflow/tensorflow/issues/24828
Cause: the TensorFlow and cuDNN versions do not match. Switch to a compatible TensorFlow / cuDNN pair.
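The cuDNN version check can also be done from Python by reading the same header that the grep command inspects. This is a minimal sketch: parse_cudnn_version is a hypothetical helper, and it assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integer macros.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN header."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7"
        match = re.search(rf"^\s*#define\s+{key}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{key} not found in header")
        parts.append(int(match.group(1)))
    return tuple(parts)
```

Usage would look like parse_cudnn_version(open("/usr/local/cuda/include/cudnn.h").read()). On cuDNN 8 and later the macros live in cudnn_version.h instead, so the path would need to change.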
12. Error: bool value of Tensor with more than one value is ambiguous
Fix:
loss_function=nn.MSELoss # wrong: this is the class itself, not an instance
loss_function=nn.MSELoss() # correct
13. Error: SyntaxError: non-default argument follows default argument
Fix: parameters with default values must come after the ones without defaults. Swap their positions.
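As an illustration of the ordering rule, here is a small sketch with a made-up function named scale. The commented-out definition is the form that triggers the SyntaxError.

```python
# def scale(factor=2, value):   # SyntaxError: a defaulted parameter comes first
#     return value * factor


def scale(value, factor=2):
    """Parameters without defaults come first, defaulted ones after."""
    return value * factor
```

With this order, scale(3) uses the default factor and scale(3, factor=4) overrides it.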
14. Error: RuntimeError: copy_if failed to synchronize: device-side assert triggered
Fix: make sure the labels are in the 0-1 range.
That's all I can remember. The rest I'll add when I run into it again.
==================== Q: How can I keep myself in a better mood? =