https://github.com/cuiziteng/Aleth-NeRF
https://arxiv.org/abs/2312.09093
Training Aleth-NeRF
For low-light conditions, the defaults are con = 12 and eta = 0.45 (the Table 2 results):
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 run.py --ginc configs/LOM/aleth_nerf/aleth_nerf_buu.gin --logbase ./logs --con 12 --eta 0.45
UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
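The warning above is about num_workers being set higher than the machine's core count. A minimal check (where exactly num_workers is configured depends on the repo's data loading code, so treat this as a sketch):

```shell
# Print the number of CPU cores; set the DataLoader's num_workers no higher
# than this (the warning reports 12 on this machine, vs. 16 requested).
nproc
```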
RuntimeError: CUDA out of memory. Tried to allocate 386.00 MiB (GPU 0; 15.70 GiB total capacity; 13.59 GiB already allocated; 119.00 MiB free; 13.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
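The error's own suggestion can be tried first: setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF makes PyTorch's caching allocator split large cached blocks, which relieves fragmentation-driven OOMs. Whether it is enough here is uncertain; reducing the batch size (the fix below) is the surer route. The value 128 is only an example:

```shell
# Ask PyTorch's CUDA caching allocator to split cached blocks larger than
# 128 MiB (example value) before the next training run.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```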
Solution: reduce the batch_size, then re-run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 run.py --ginc configs/LOM/aleth_nerf/aleth_nerf_buu.gin --logbase ./logs --con 12 --eta 0.45
Since we are running the buu scene, only the buu-related batch_size needs changing.
If the out-of-memory error recurs, halve the batch_size again. Training then ran partway and errored once more:
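The halving step can be sketched as below; 4096 is an assumed starting value, so read the real one from the batch_size entry in aleth_nerf_buu.gin:

```shell
# Keep halving the ray batch size until training fits in GPU memory.
bs=4096          # assumed starting value from the gin config
bs=$((bs / 2))   # first halving: 4096 -> 2048
echo "$bs"
```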
RuntimeError: CUDA out of memory. Tried to allocate 1.67 GiB (GPU 0; 15.70 GiB total capacity; 3.63 GiB already allocated; 723.62 MiB free; 5.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
In call to configurable ‘run’ (<function run at 0x7f6a3fc5bd30>)
Also fixed the earlier warning by lowering num_workers.
Still an out-of-memory error, and it fails at iteration 12500 every time.
kill -9 <PID>  # kill some stray processes to free memory
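Finding which PID to kill can be sketched like this on a Linux box; on a machine with NVIDIA drivers, nvidia-smi also lists the PIDs holding GPU memory:

```shell
# List the five largest processes by resident memory (PID, KiB, command),
# then forcibly stop the chosen one with kill -9.
ps -eo pid,rss,comm --sort=-rss | head -n 5
# kill -9 <PID>   # <PID> taken from the list above
```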
A new error appeared (my guess: Firefox got killed along with the other processes):
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
Reopened what was needed, killed the unnecessary processes,
then ran this command again:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 run.py --ginc configs/LOM/aleth_nerf/aleth_nerf_buu.gin --logbase ./logs --con 12 --eta 0.45
The same error appears again.
Suspected a proxy/connection issue (this approach did not help):
export | grep "proxy"
unset https_proxy  # https_proxy sets the proxy server used for outgoing network requests
Solution (the one that actually worked):
Manually copy the already-downloaded .pth file into place.
Since the .pth file was copied over by hand, the download error can now be ignored (forgot to screenshot epoch 2). Run:
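The manual copy can be sketched like this. The destination is torch hub's checkpoint cache; the exact cache path and checkpoint file name are assumptions here, so check the failing download's traceback for the path it actually expects:

```shell
# Create torch hub's checkpoint cache (assumed destination) and drop the
# pre-downloaded weights there so the script finds them locally.
mkdir -p "$HOME/.cache/torch/hub/checkpoints"
# cp /path/to/downloaded/vgg.pth "$HOME/.cache/torch/hub/checkpoints/"
```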
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 run.py --ginc configs/LOM/aleth_nerf/aleth_nerf_buu.gin --logbase ./logs --con 12 --eta 0.45
It then proceeds automatically through epoch 0, epoch 2, and epoch 3.
Success!
Attached screenshots:
Update 2024/08/05: today I reproduced this on a different machine and could not find the /home/uriky/.cache directory to put the downloaded vgg.pth into. Then it suddenly came back to me that it is a hidden directory; I had hit this once before and forgotten. Here is how to show hidden files:
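For reference, hidden entries like ~/.cache simply start with a dot; a terminal shows them with `ls -a`, and in GNOME's file manager Ctrl+H toggles them:

```shell
# List dot-prefixed (hidden) entries in the home directory; .cache should
# appear among them once anything has used it.
ls -a "$HOME" | grep "^\." | head -n 5
```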