Running a PyTorch program on Windows produced the following error:
'The paging file is too small for this operation to complete. Error loading "D:\\anaconda\\envs\\py38cu112\\lib\\site-packages\\torch\\lib\\caffe2_detectron_ops_gpu.dll" or one of its dependencies.'
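According to an explanation from one expert on GitHub, this happens because memory allocation for the DataLoader works differently on Windows than on Linux. See the following link for details:
DLL errors when running PyTorch on Windows, caused by NVIDIA hardware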
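He explains why the error occurs, though I only half understood it. What I can say is that after running his Python script as instructed, the internal flags of the torch DLLs were modified (per the script's own comments, it disables ASLR and makes the .nv_fatb sections read-only), which changes how much memory the DLLs require when they are loaded.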
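To make the "modified flags" part a bit more concrete, here is a minimal sketch of the bit arithmetic involved. It is not the author's script, only an illustration that uses the standard library. The two constants are the values defined by the Windows PE format; the sample header values (`0x0160`, `0xC0000040`) are hypothetical and chosen only for demonstration.

```python
# Constants defined by the Windows PE format.
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040  # DLL can be relocated at load time (ASLR)
IMAGE_SCN_MEM_WRITE = 0x80000000                # section can be written to


def has_flag(value, flag):
    """Return True if every bit of `flag` is set in `value`."""
    return (value & flag) == flag


def clear_flag(value, flag):
    """Return `value` with the bits of `flag` cleared, leaving other bits untouched."""
    return value & ~flag


if __name__ == "__main__":
    # Hypothetical header values, for illustration only.
    dll_characteristics = 0x0160
    section_characteristics = 0xC0000040

    print("ASLR before:", has_flag(dll_characteristics, IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE))
    print("Writable before:", has_flag(section_characteristics, IMAGE_SCN_MEM_WRITE))

    dll_characteristics = clear_flag(dll_characteristics, IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
    section_characteristics = clear_flag(section_characteristics, IMAGE_SCN_MEM_WRITE)

    print("DllCharacteristics after:", hex(dll_characteristics))
    print("Section characteristics after:", hex(section_characteristics))
```

Clearing a flag with `value & ~flag` only touches that one bit, so the other header flags stay exactly as they were.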
Concretely, install pefile first, then run fixNvPe.py.
########(1). Install from the command line###########
pip install pefile
fixNvPe.py can be downloaded from the author's GitHub page.
Here is the code of fixNvPe.py:
# Simple script to disable ASLR and make .nv_fatb sections read-only
# Requires: pefile ( python -m pip install pefile )
# Usage: fixNvPe.py --input path/to/*.dll

import argparse
import pefile
import glob
import os
import shutil

def main(args):
    failures = []
    for file in glob.glob( args.input, recursive=args.recursive ):
        print(f"\n---\nChecking {file}...")
        pe = pefile.PE(file, fast_load=True)
        nvbSect = [ section for section in pe.sections if section.Name.decode().startswith(".nv_fatb")]
        if len(nvbSect) == 1:
            sect = nvbSect[0]
            size = sect.Misc_VirtualSize
            aslr = pe.OPTIONAL_HEADER.IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE
            writable = 0 != ( sect.Characteristics & pefile.SECTION_CHARACTERISTICS['IMAGE_SCN_MEM_WRITE'] )
            print(f"Found NV FatBin! Size: {size/1024/1024:0.2f}MB ASLR: {aslr} Writable: {writable}")
            if (writable or aslr) and size > 0:
                print("- Modifying DLL")
                if args.backup:
                    bakFile = f"{file}_bak"
                    print(f"- Backing up [{file}] -> [{bakFile}]")
                    if os.path.exists( bakFile ):
                        print( f"- Warning: Backup file already exists ({bakFile}), not modifying file! Delete the 'bak' to allow modification")
                        failures.append( file )
                        continue
                    try:
                        shutil.copy2( file, bakFile)
                    except Exception as e:
                        print( f"- Failed to create backup! [{str(e)}], not modifying file!")
                        failures.append( file )
                        continue
                # Disable ASLR for DLL, and disable writing for section
                pe.OPTIONAL_HEADER.DllCharacteristics &= ~pefile.DLL_CHARACTERISTICS['IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE']
                sect.Characteristics = sect.Characteristics & ~pefile.SECTION_CHARACTERISTICS['IMAGE_SCN_MEM_WRITE']
                try:
                    newFile = f"{file}_mod"
                    print( f"- Writing modified DLL to [{newFile}]")
                    pe.write( newFile )
                    pe.close()
                    print( f"- Moving modified DLL to [{file}]")
                    os.remove( file )
                    shutil.move( newFile, file )
                except Exception as e:
                    print( f"- Failed to write modified DLL! [{str(e)}]")
                    failures.append( file )
                    continue

    print("\n\nDone!")
    if len(failures) > 0:
        print("***WARNING**** These files needed modification but failed: ")
        for failure in failures:
            print( f" - {failure}")

def parseArgs():
    parser = argparse.ArgumentParser( description="Disable ASLR and make .nv_fatb sections read-only", formatter_class=argparse.ArgumentDefaultsHelpFormatter )
    parser.add_argument('--input', help="Glob to parse", default="*.dll")
    parser.add_argument('--backup', help="Backup modified files", default=True, required=False)
    parser.add_argument('--recursive', '-r', default=False, action='store_true', help="Recurse into subdirectories")
    return parser.parse_args()

###############################
# program entry point
#
if __name__ == "__main__":
    args = parseArgs()
    main( args )
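The script takes a glob via `--input`, so it has to be pointed at the DLLs inside torch's `lib` folder. Below is a small sketch of how that command line could be assembled. The helper name `build_fix_command` is my own (hypothetical), and the sketch assumes fixNvPe.py sits in the current working directory. The example path is the environment from the error message above, so adjust it to your own install.

```python
import sys
from pathlib import PureWindowsPath


def build_fix_command(torch_lib_dir, script="fixNvPe.py"):
    """Build the argument list that runs fixNvPe.py on every DLL in torch_lib_dir.

    `script` is assumed to be a path to the downloaded fixNvPe.py.
    """
    # fixNvPe.py expands the glob itself, so the pattern is passed unexpanded.
    pattern = str(PureWindowsPath(torch_lib_dir) / "*.dll")
    return [sys.executable, script, "--input", pattern]


if __name__ == "__main__":
    # Example location taken from the error message in this post.
    lib_dir = r"D:\anaconda\envs\py38cu112\Lib\site-packages\torch\lib"
    cmd = build_fix_command(lib_dir)
    # Only print the command here; pass `cmd` to subprocess.run(cmd, check=True)
    # (or paste it into a terminal) to actually patch the DLLs.
    print(" ".join(cmd))
```

Passing the arguments as a list avoids quoting problems if the environment path contains spaces.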
Running fixNvPe.py produces output like the following:
---
Checking D:\anaconda\envs\py38cu112\Lib\site-packages\torch\lib\torch_cuda.dll...
Found NV FatBin! Size: 680.65MB ASLR: True Writable: True
- Modifying DLL
- Backing up [D:\anaconda\envs\py38cu112\Lib\site-packages\torch\lib\torch_cuda.dll] -> [D:\anaconda\envs\py38cu112\Lib\site-packages\torch\lib\torch_cuda.dll_bak]
- Writing modified DLL to [D:\anaconda\envs\py38cu112\Lib\site-packages\torch\lib\torch_cuda.dll_mod]
- Moving modified DLL to [D:\anaconda\envs\py38cu112\Lib\site-packages\torch\lib\torch_cuda.dll]
---
Checking D:\anaconda\envs\py38cu112\Lib\site-packages\torch\lib\torch_global_deps.dll...
---
Checking D:\anaconda\envs\py38cu112\Lib\site-packages\torch\lib\torch_python.dll...
---
Checking D:\anaconda\envs\py38cu112\Lib\site-packages\torch\lib\uv.dll...
Done!
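As the log shows, the script keeps a copy of each DLL it patches next to the original, with `_bak` appended to the file name. If a patched DLL ever causes trouble, the copy can be put back. The following is a minimal sketch of that, not part of fixNvPe.py; the function name `restore_backup` is hypothetical.

```python
import os
import shutil
import sys


def restore_backup(dll_path, suffix="_bak"):
    """Copy `<dll_path><suffix>` back over dll_path.

    Returns True if a backup existed and was restored, False otherwise.
    """
    backup_path = f"{dll_path}{suffix}"
    if not os.path.exists(backup_path):
        return False
    # copy2 keeps the backup in place and preserves file metadata.
    shutil.copy2(backup_path, dll_path)
    return True


if __name__ == "__main__":
    # Usage: python restore_backup.py path\to\torch_cuda.dll [more DLLs...]
    for path in sys.argv[1:]:
        status = "restored" if restore_backup(path) else "no backup found"
        print(f"{path}: {status}")
```

Note that fixNvPe.py refuses to patch a DLL again while its `_bak` file exists, so delete the backup afterwards if you intend to rerun the script.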
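Be sure to close any PyTorch program that is using the failing DLL first! Otherwise fixNvPe.py may fail to modify the DLL because PyTorch still has it loaded. When that happens, the end of fixNvPe.py's output will show: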
***WARNING**** These files needed modification but failed:
So make sure none of the affected DLLs are in use by any program before running the script.
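One way to check this up front is to try opening each DLL for writing before running the script. This is only a rough sketch: it assumes that Windows refuses write access to a DLL while some process has it loaded, which is the usual behavior but not something I have verified for every setup. The function name `can_open_for_write` is hypothetical.

```python
import sys


def can_open_for_write(path):
    """Return True if `path` can be opened in read/write mode without truncating it."""
    try:
        # "r+b" opens an existing file for reading and writing and leaves its content alone.
        with open(path, "r+b"):
            return True
    except OSError:
        # Covers a missing file, a read-only file, and a file locked by another process.
        return False


if __name__ == "__main__":
    # Usage: python check_locked.py path\to\torch_cuda.dll [more DLLs...]
    for path in sys.argv[1:]:
        state = "free" if can_open_for_write(path) else "locked, read-only, or missing"
        print(f"{path}: {state}")
```

If a DLL is reported as locked, close the Python processes and notebooks that imported torch and check again.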
Okay, that's it. I'm tired, and there's another bug waiting for me to go search for.