https://github.com/cuiziteng/Aleth-NeRF
https://arxiv.org/abs/2312.09093
Training Aleth-NeRF
For low-light conditions, the defaults are con = 12 and eta = 0.45 (the Table 2 results):
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 run.py --ginc configs/LOM/aleth_nerf/aleth_nerf_buu.gin --logbase ./logs --con 12 --eta 0.45
UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 12, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
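The warning above is about num_workers being set higher than the machine's core count. A minimal check (where exactly num_workers is configured depends on the repo's data loading code, so treat this as a sketch):

```shell
# Print the number of CPU cores; set the DataLoader's num_workers no higher
# than this (the warning reports 12 on this machine, vs. 16 requested).
nproc
```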
RuntimeError: CUDA out of memory. Tried to allocate 386.00 MiB (GPU 0; 15.70 GiB total capacity; 13.59 GiB already allocated; 119.00 MiB free; 13.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
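The error's own suggestion can be tried first: setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF makes PyTorch's caching allocator split large cached blocks, which relieves fragmentation-driven OOMs. Whether it is enough here is uncertain; reducing the batch size (the fix below) is the surer route. The value 128 is only an example:

```shell
# Ask PyTorch's CUDA caching allocator to split cached blocks larger than
# 128 MiB (example value) before the next training run.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
```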
Solution: reduce the batch_size, then re-run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 run.py --ginc configs/LOM/aleth_nerf/aleth_nerf_buu.gin --logbase ./logs --con 12 --eta 0.45
Since we are running the buu scene, only the buu-related batch_size needs changing.
If the out-of-memory error recurs, halve the batch_size again. Training then ran partway and errored once more:
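The halving step can be sketched as below; 4096 is an assumed starting value, so read the real one from the batch_size entry in aleth_nerf_buu.gin:

```shell
# Keep halving the ray batch size until training fits in GPU memory.
bs=4096          # assumed starting value from the gin config
bs=$((bs / 2))   # first halving: 4096 -> 2048
echo "$bs"
```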
RuntimeError: CUDA out of memory. Tried to allocate 1.67 GiB (GPU 0; 15.70 GiB total capacity; 3.63 GiB already allocated; 723.62 MiB free; 5.89 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
In call to configurable ‘run’ (<function run at 0x7f6a3fc5bd30>)
Also fixed the earlier warning by lowering num_workers.
Still an out-of-memory error, and it fails at iteration 12500 every time.
kill -9 <PID>  # kill some stray processes to free memory
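Finding which PID to kill can be sketched like this on a Linux box; on a machine with NVIDIA drivers, nvidia-smi also lists the PIDs holding GPU memory:

```shell
# List the five largest processes by resident memory (PID, KiB, command),
# then forcibly stop the chosen one with kill -9.
ps -eo pid,rss,comm --sort=-rss | head -n 5
# kill -9 <PID>   # <PID> taken from the list above
```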
A new error appeared (my guess: Firefox got killed along with the other processes):
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
Reopened what was needed, killed the unnecessary processes,
then ran this command again:
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 run.py --ginc configs/LOM/aleth_nerf/aleth_nerf_buu.gin --logbase ./logs --con 12 --eta 0.45
The same error appears again.
Suspected a proxy/connection issue (this approach did not help):
export | grep "proxy"
unset https_proxy  # https_proxy sets the proxy server used for outgoing network requests
Solution (the one that actually worked):
Manually copy the already-downloaded .pth file into place.
Since the .pth file was copied over by hand, the download error can now be ignored (forgot to screenshot epoch 2). Run:
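The manual copy can be sketched like this. The destination is torch hub's checkpoint cache; the exact cache path and checkpoint file name are assumptions here, so check the failing download's traceback for the path it actually expects:

```shell
# Create torch hub's checkpoint cache (assumed destination) and drop the
# pre-downloaded weights there so the script finds them locally.
mkdir -p "$HOME/.cache/torch/hub/checkpoints"
# cp /path/to/downloaded/vgg.pth "$HOME/.cache/torch/hub/checkpoints/"
```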
CUDA_VISIBLE_DEVICES=0,1,2,3 python3 run.py --ginc configs/LOM/aleth_nerf/aleth_nerf_buu.gin --logbase ./logs --con 12 --eta 0.45
It then proceeds automatically through epoch 0, epoch 2, and epoch 3.
Success!
Attached screenshots:
Update 2024/08/05: today I reproduced this on a different machine and could not find the /home/uriky/.cache directory to put the downloaded vgg.pth into. Then it suddenly came back to me that it is a hidden directory; I had hit this once before and forgotten. Here is how to show hidden files:
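For reference, hidden entries like ~/.cache simply start with a dot; a terminal shows them with `ls -a`, and in GNOME's file manager Ctrl+H toggles them:

```shell
# List dot-prefixed (hidden) entries in the home directory; .cache should
# appear among them once anything has used it.
ls -a "$HOME" | grep "^\." | head -n 5
```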