3DStyleGAN environment setup: a record of the pitfalls

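While running the GitHub source repo for 3D StyleGAN for Three Dimensional Medical Images, I ran into a lot of problems at the environment setup stage, so I wrote this post to record my love-hate relationship with TensorFlow and CUDA.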

Original GitHub repo: https://github.com/sh4174/3DStyleGAN

Environment: Linux + RTX A6000 + CUDA 11.3 + tensorflow-gpu 2.4.1 + Python 3.8

  1. Create a virtual environment and install CUDA and cuDNN

conda create -n py38 python=3.8  # create the virtual environment
conda activate py38  # activate the virtual environment
conda install cudatoolkit=11.3 cudnn=8.2.1  # install CUDA and cuDNN
# install TensorFlow and Keras
pip install tensorflow-gpu==2.4.1 keras==2.4.3 -i https://pypi.douban.com/simple/

Install some basic libraries:

pip install  pillow==8.2 numpy==1.21 matplotlib scipy pandas scikit-learn   tqdm  imutils PyYAML    seaborn protobuf==3.20  -i https://pypi.douban.com/simple/
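Because several install commands touch the same environment, a later one can silently replace a version chosen by an earlier one. The sketch below compares installed versions against the pins used in this post. The helper names `release_tuple` and `find_mismatches` are my own, and the comparison only looks at the leading numeric part of each version, so treat it as a rough check rather than a full version resolver.

```python
import re
from importlib import metadata  # standard library since Python 3.8


def release_tuple(version):
    """Leading numeric part of a version string as a tuple, trailing zeros dropped."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a numeric version: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def find_mismatches(pins, lookup=metadata.version):
    """Return {package: (pinned, installed)} for every pin that is not honored."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed is None or release_tuple(installed) != release_tuple(pinned):
            mismatches[name] = (pinned, installed)
    return mismatches


# Pins copied from the pip commands in this post.
PINS = {
    "tensorflow-gpu": "2.4.1",
    "keras": "2.4.3",
    "pillow": "8.2",
    "numpy": "1.21",
    "protobuf": "3.20",
}

if __name__ == "__main__":
    for name, (pinned, installed) in find_mismatches(PINS).items():
        print(f"{name}: pinned {pinned}, installed {installed}")
```

Run inside the activated environment, it prints one line per package whose installed version differs from the pin, and nothing if everything matches.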

Install wget, then download and unzip the archive:

apt-get install wget
wget https://img.iduodou.com/images/docs/20230205/E5FCA0B3-F44C-4EF8-BDC1-9248A57F6603.zip
unzip E5FCA0B3-F44C-4EF8-BDC1-9248A57F6603.zip 

Use the following program to check whether the installation succeeded:

import tensorflow as tf

# True if TensorFlow can see a CUDA-capable GPU
print(tf.test.is_gpu_available())
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus, cpus)

If a GPU shows up in the output, the installation succeeded.

  2. Running run_training.py fails with: nvcc fatal : Value 'sm_86' is not defined for option 'gpu-architecture'

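This most likely means the nvcc being invoked predates CUDA 11.1, the first release that recognizes sm_86. As a workaround, in /dnnlib/tflib/custom_ops.py change line 142 to: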

compile_opts += ' --gpu-architecture=sm_60'
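To see whether the nvcc on the PATH is the culprit, its release number can be read from `nvcc --version`. The following is a minimal sketch: the function names are hypothetical, and the only fact it relies on is that sm_86 support arrived with CUDA 11.1.

```python
import re
import subprocess


def parse_nvcc_release(version_text):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def supports_sm_86(release):
    """sm_86 was introduced with CUDA 11.1, so older toolkits reject it."""
    return release >= (11, 1)


def nvcc_release(nvcc="nvcc"):
    """Run nvcc and return its release; raises if nvcc is missing or fails."""
    out = subprocess.run([nvcc, "--version"], check=True,
                         capture_output=True, text=True).stdout
    return parse_nvcc_release(out)


if __name__ == "__main__":
    release = nvcc_release()
    print("nvcc release:", release, "| knows sm_86:", supports_sm_86(release))
```

If this reports a release older than 11.1 even though cudatoolkit 11.3 is installed in the conda environment, the compiler being picked up is a different, older system toolkit.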

3. Error: NotImplementedError: Cannot convert a symbolic Tensor (input) to a numpy array. This error may indicate that you're trying to pass a Tensor to a NumPy call, which is not supported

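This error is caused by a mismatch between the NumPy and TensorFlow versions. Some bloggers said the NumPy version needs to be changed to numpy==1.19.5. When I ran conda install numpy==1.19.5 it failed and printed a pile of URLs, so I opened the last one: https://anaconda.org/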

That page showed the following install command:

conda install -c cctbx202105 numpy

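Unfortunately, that still did not work for me. Other bloggers said Python 3.8 is required, but my environment was already created with 3.8. In the end I found a post saying that installing numpy==1.19.2 fixes it. It did, and I moved on to the next error.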

conda install numpy==1.19.2
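A quick way to tell whether a given NumPy version falls in the range TensorFlow expects is to test it against a compatible-release constraint. The sketch below assumes that tensorflow 2.4.x declares `numpy ~= 1.19.2` (that is my recollection of its package metadata, not something stated in this post), and the helper names are my own.

```python
import re


def release_parts(version):
    """Numeric release components of a version string, e.g. '1.19.2' -> (1, 19, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a numeric version: {version!r}")
    return tuple(int(p) for p in match.group(0).split("."))


def satisfies_compatible_release(version, base):
    """True if `version` satisfies `~= base` for a three-part base X.Y.Z.

    That means: same X.Y series, and not older than X.Y.Z.
    """
    v, b = release_parts(version), release_parts(base)
    v = v + (0,) * (3 - len(v))  # treat '1.19' as '1.19.0'
    return v[:2] == b[:2] and v >= b


# Assumption: the constraint declared by tensorflow 2.4.x.
TF24_NUMPY_BASE = "1.19.2"

if __name__ == "__main__":
    for candidate in ("1.21.0", "1.19.5", "1.19.2"):
        ok = satisfies_compatible_release(candidate, TF24_NUMPY_BASE)
        print(candidate, "ok" if ok else "outside the expected range")
```

Under that assumption both 1.19.2 and 1.19.5 are acceptable, which suggests the failure with 1.19.5 was an installation problem on the conda side rather than an incompatibility.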

4. Your CUDA software stack is old. We fallback to the NVIDIA driver for some compilation. Update your CUDA version to get the best performance. The ptxas error was: ptxas fatal : Value 'sm_86' is not defined for option 'gpu-name'

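This is a warning, not an error. It means the installed CUDA and cuDNN do not support compute capability 8.6 (RTX 3090/3080; the RTX A6000 is also 8.6). Computation still works, but the GPU's performance cannot be fully used. It can be ignored.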

5. Original stack trace for 'G_synthesis_2/noise0/Initializer/random_normal/RandomStandardNormal':

The run log is as follows:

Original stack trace for 'G_synthesis_2/noise0/Initializer/random_normal/RandomStandardNormal':
  File "run_training.py", line 588, in <module>
    main()
  File "run_training.py", line 583, in main
    run(**vars(args))
  File "run_training.py", line 512, in run
    dnnlib.submit_run(**kwargs)
  File "/opt/data/private/3DStyleGAN-master/dnnlib/submission/submit.py", line 343, in submit_run
    return farm.submit(submit_config, host_run_dir)
  File "/opt/data/private/3DStyleGAN-master/dnnlib/submission/internal/local.py", line 22, in submit
    return run_wrapper(submit_config)
  File "/opt/data/private/3DStyleGAN-master/dnnlib/submission/submit.py", line 280, in run_wrapper
    run_func_obj(**submit_config.run_func_kwargs)
  File "/opt/data/private/3DStyleGAN-master/training/training_loop_3d.py", line 218, in training_loop
    G_gpu = G if gpu == 0 else G.clone(G.name + '_shadow')
  File "/opt/data/private/3DStyleGAN-master/dnnlib/tflib/network.py", line 314, in clone
    net._init_graph()
  File "/opt/data/private/3DStyleGAN-master/dnnlib/tflib/network.py", line 156, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/opt/data/private/3DStyleGAN-master/training/networks3d_stylegan2.py", line 195, in G_main
    components.synthesis = tflib.Network('G_synthesis', func_name=globals()[synthesis_func], **kwargs)
  File "/opt/data/private/3DStyleGAN-master/dnnlib/tflib/network.py", line 99, in __init__
    self._init_graph()
  File "/opt/data/private/3DStyleGAN-master/dnnlib/tflib/network.py", line 156, in _init_graph
    out_expr = self._build_func(*self.input_templates, **build_kwargs)
  File "/opt/data/private/3DStyleGAN-master/training/networks3d_stylegan2.py", line 372, in G_synthesis_stylegan2_3d_curated_real
    noise_inputs.append(tf.get_variable('noise%d' % layer_idx, shape=shape, initializer=tf.initializers.random_normal(), trainable=False))
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 1577, in get_variable
    return get_variable_scope().get_variable(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 1320, in get_variable
    return var_store.get_variable(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 576, in get_variable
    return _true_getter(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 529, in _true_getter
    return self._get_single_variable(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 950, in _get_single_variable
    v = variables.VariableV1(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 260, in __call__
    return cls._variable_v1_call(*args, **kwargs)
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 206, in _variable_v1_call
    return previous_getter(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 199, in <lambda>
    previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variable_scope.py", line 2620, in default_variable_creator
    return variables.RefVariable(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 264, in __call__
    return super(VariableMetaclass, cls).__call__(*args, **kwargs)
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 1656, in __init__
    self._init_from_args(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/variables.py", line 1797, in _init_from_args
    initial_value = initial_value()
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/init_ops.py", line 308, in __call__
    return random_ops.random_normal(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/random_ops.py", line 94, in random_normal
    rnd = gen_random_ops.random_standard_normal(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/ops/gen_random_ops.py", line 653, in random_standard_normal
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/framework/op_def_library.py", line 748, in _apply_op_helper
    op = g._create_op_internal(op_type_name, inputs, dtypes=None,
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 3528, in _create_op_internal
    ret = Operation(
  File "/opt/tools/anaconda3/envs/py38/lib/python3.8/site-packages/tensorflow/python/framework/ops.py", line 1990, in __init__
    self._traceback = tf_stack.extract_stack()

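This problem was harder to pin down, and I searched through a lot of material. Eventually I found a Japanese blog post that raised this point: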

"--num-gpus=8" seems to mean using 8 GPUs at the same time, but surely that is not possible on Google Colab?

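Looking back at the error message, I saw that it mentioned Variablev2:GPU:1, and it suddenly hit me: my server has a single GPU, but I had launched the run with num-gpus=4. I had been overlooking this argument every time I typed the command. The value has to match the number of GPUs actually available, which is 1 in my case.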
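A small pre-flight check can catch this kind of mismatch before the graph is built. The sketch below counts GPUs from the output of `nvidia-smi -L` and rejects a request that exceeds it. The function names are hypothetical and this is not part of the 3DStyleGAN code.

```python
import subprocess


def count_gpus(listing):
    """Count the 'GPU <index>: ...' lines printed by `nvidia-smi -L`."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def check_num_gpus(requested, available):
    """Raise early instead of letting graph construction fail on a missing device."""
    if requested < 1:
        raise ValueError("num_gpus must be at least 1")
    if requested > available:
        raise ValueError(
            f"requested {requested} GPUs but only {available} are visible")
    return requested


def visible_gpu_count():
    """Ask nvidia-smi for the GPU list; raises if nvidia-smi is missing or fails."""
    out = subprocess.run(["nvidia-smi", "-L"], check=True,
                         capture_output=True, text=True).stdout
    return count_gpus(out)


if __name__ == "__main__":
    available = visible_gpu_count()
    print("GPUs visible:", available)
    check_num_gpus(4, available)  # raises ValueError on a single-GPU machine
```

On my single-GPU server, the final call would raise a ValueError immediately, which is much easier to read than the stack trace above.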

  6. Error: /bin/sh: 1: nvcc: not found

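At first I was on a server rented from AutoDL, where the commands under /usr/local/cuda could not be found. After I moved to my own server, the problem was still there.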

Solution:

Check whether the nvcc executable exists under /usr/local/cuda/bin. If it does, nvcc is installed but not on the PATH. Run the following command:

 $ ls /usr/local/cuda/bin/nvcc

If nvcc is indeed installed, all that remains is to add it to the environment variables by editing ~/.bashrc:

 $ vi ~/.bashrc

Append the environment variables at the end of the file:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export PATH=$PATH:/usr/local/cuda/bin

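Then type :wq and press Enter to save the file and quit. Afterwards run source ~/.bashrc (or open a new terminal) so the new variables take effect.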

Run nvcc -V to test.
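The same lookup can be scripted, which is handy when checking a freshly rented machine. The sketch below first searches the PATH and then falls back to the conventional /usr/local/cuda/bin location. `find_nvcc` is a hypothetical helper, and the default `cuda_home` is an assumption about where the toolkit lives.

```python
import os
import shutil


def find_nvcc(path_env=None, cuda_home="/usr/local/cuda"):
    """Return the nvcc path found on PATH, else under cuda_home/bin, else None."""
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    found = shutil.which("nvcc", path=path_env)
    if found:
        return found
    # Not on PATH: look in the conventional install location (an assumption).
    candidate = os.path.join(cuda_home, "bin", "nvcc")
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None


if __name__ == "__main__":
    location = find_nvcc()
    if location is None:
        print("nvcc not found; the CUDA toolkit may not be installed")
    else:
        print("nvcc found at", location)
```

If it reports a path under /usr/local/cuda/bin while the shell still says nvcc is not found, the PATH export above has not taken effect in the current session.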

Use the following code to check that tensorflow-gpu is installed correctly:

import tensorflow as tf 
print(tf.test.is_gpu_available())
gpus=tf.config.experimental.list_physical_devices(device_type='GPU')
cpus=tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus,cpus)

With that, apart from the warning about not fully using CUDA's performance, the project finally runs.
