Important note: according to the bug-fix notes for PyTorch 1.6.0, PyTorch 1.5.1 has memory-leak problems, specifically in the OpenMP module and in the RReLU implementation. Avoid 1.5.1 and build a different version instead; the process is the same.
Most of the time the official prebuilt libraries work fine and are convenient, but occasionally you need to build from source. I ran into a case where I had to build PyTorch's libtorch library myself, so here is a rough record of the steps and the problems I hit.
Since libtorch 1.2.0, the prebuilt libraries on the official site (the four versions libtorch 1.4.0 / 1.5.0 / 1.5.1 / 1.6.0) only support cu92, cu101, and cu102. The only versions that support CUDA 10.0 are libtorch 1.0.0 / 1.0.1 / 1.1.0 / 1.2.0.
libtorch 1.2.0 and 1.5.1 happen to differ in some of their interfaces, so the only way to get a CUDA 10.0 build of 1.5.1 is to compile it yourself.
Step 1: Download the source
The latest version at the time was already 1.6.0, but I wanted 1.5.1, so the command below pins that tag.
git clone -b v1.5.1 --recursive https://github.com/pytorch/pytorch.git
Step 2: Generate the project with CMake
I had planned to follow the tutorial step by step, but when I saw a CMakeLists.txt in the downloaded source folder I got excited: I had used CMake before and was fairly comfortable with it.
Although the official docs say to use VS 2017, I tried VS 2015 anyway. The project generated correctly, but the build reported more than thirty-six thousand errors.
So I switched to CMake + VS 2017 (with CUDA 10.0 configured), generated the project, and built the x64 Release configuration.
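For reference, that configure step can be sketched as the cmake invocation below. The generator string is the standard one for VS 2017 x64; the paths are examples for my layout, and the flags mirror the ones the Python build script passes later in this post, not the exact command I ran:

```shell
# Example configure + build for VS 2017 x64; adjust paths to your checkout.
# BUILD_PYTHON=OFF builds only libtorch; BUILD_TEST=ON keeps the gtest binaries.
cmake -G "Visual Studio 15 2017 Win64" ^
  -DCMAKE_BUILD_TYPE=Release ^
  -DBUILD_PYTHON=OFF ^
  -DBUILD_TEST=ON ^
  -DCMAKE_INSTALL_PREFIX=..\pytorch\torch ^
  ..\pytorch
cmake --build . --config Release --target INSTALL
```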
The VS 2017 build also ended in failure, with 18 errors in total:
>D:\documents\vs2015\Project\pytorch1.5.1\build_libtorch\aten\src\ATen\native\cpu\GridSamplerKernel.cpp.AVX.cpp(737): error C2672: "convert_to_int_of_same_size": no matching overloaded function found
10>D:\documents\vs2015\Project\pytorch1.5.1\build_libtorch\aten\src\ATen\native\cpu\GridSamplerKernel.cpp.AVX.cpp(741): error C3536: "i_x_nearest": cannot be used before it is initialized
All of the errors come from the following code:
auto i_x_nearest = convert_to_int_of_same_size<scalar_t>(x_nearest);
auto i_y_nearest = convert_to_int_of_same_size<scalar_t>(y_nearest);
auto i_mask = must_in_bound ? iVec(-1)
: (i_x_nearest > iVec(-1)) & (i_x_nearest < iVec(inp_W)) &
(i_y_nearest > iVec(-1)) & (i_y_nearest < iVec(inp_H));
auto i_gInp_offset = i_y_nearest * iVec(inp_W) + i_x_nearest; // gInp is contiguous
integer_t mask_arr[iVec::size()];
i_mask.store(mask_arr);
integer_t gInp_offset_arr[iVec::size()];
i_gInp_offset.store(gInp_offset_arr);
That is, lines 737 to 749 of \aten\src\ATen\native\cpu\GridSamplerKernel.cpp.AVX.cpp.
Next attempt:
Building with the Python script
Create a build folder next to tools, open an Anaconda Prompt and activate the pytorch environment, cd into the build folder, and run the following command:
python ../tools/build_libtorch.py
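Before running the script, the build can also be steered with environment variables. USE_CUDA, BUILD_TEST, and MAX_JOBS are flags the PyTorch build scripts actually read; the values below are just the ones that match this walkthrough (on a Windows cmd prompt, use `set` instead of `export`):

```shell
# Optional knobs read by the PyTorch build scripts via environment variables.
export USE_CUDA=1     # build the CUDA backend
export BUILD_TEST=1   # also build the test binaries under torch/test and torch/bin
export MAX_JOBS=8     # cap parallel compile jobs if memory is tight
echo "USE_CUDA=$USE_CUDA BUILD_TEST=$BUILD_TEST MAX_JOBS=$MAX_JOBS"
```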
Running the script hit a few problems, mainly Python module-import path issues; two files had to be modified. In build_libtorch.py, change
from tools.build_pytorch_libs import build_caffe2
from tools.setup_helpers.cmake import CMake
to:
from build_pytorch_libs import build_caffe2
from setup_helpers.cmake import CMake
and in build_pytorch_libs.py, change
from .setup_helpers.env import IS_64BIT, IS_WINDOWS, check_negative_env_flag
from .setup_helpers.cmake import USE_NINJA
to:
from setup_helpers.env import IS_64BIT, IS_WINDOWS, check_negative_env_flag
from setup_helpers.cmake import USE_NINJA
With those edits, the script build finally succeeded!
[2482/2501] Building NVCC (Device) object modules/detectro...etectron_ops_gpu_generated_select_smooth_l1_loss_op.cu.obj
select_smooth_l1_loss_op.cu
select_smooth_l1_loss_op.cu
[2483/2501] Building NVCC (Device) object modules/detectro...fe2_detectron_ops_gpu_generated_upsample_nearest_op.cu.obj
upsample_nearest_op.cu
upsample_nearest_op.cu
[2484/2501] Building NVCC (Device) object modules/detectro...2_detectron_ops_gpu_generated_sigmoid_focal_loss_op.cu.obj
sigmoid_focal_loss_op.cu
sigmoid_focal_loss_op.cu
[2485/2501] Building NVCC (Device) object modules/detectro...ron_ops_gpu_generated_sigmoid_cross_entropy_loss_op.cu.obj
sigmoid_cross_entropy_loss_op.cu
sigmoid_cross_entropy_loss_op.cu
[2486/2501] Building NVCC (Device) object modules/detectro...e2_detectron_ops_gpu_generated_spatial_narrow_as_op.cu.obj
spatial_narrow_as_op.cu
spatial_narrow_as_op.cu
[2487/2501] Building NVCC (Device) object modules/detectro...2_detectron_ops_gpu_generated_softmax_focal_loss_op.cu.obj
softmax_focal_loss_op.cu
softmax_focal_loss_op.cu
[2488/2501] Building NVCC (Device) object modules/detectro...affe2_detectron_ops_gpu_generated_smooth_l1_loss_op.cu.obj
smooth_l1_loss_op.cu
smooth_l1_loss_op.cu
[2500/2501] Linking CXX shared library bin\caffe2_detectron_ops_gpu.dll
Creating library lib\caffe2_detectron_ops_gpu.lib and object lib\caffe2_detectron_ops_gpu.exp
[2500/2501] Install the project...
-- Install configuration: "Release"
install_manifest.txt in the build folder records where every file was installed.
The manifest is too long to paste here, so to summarize: all header files are installed to:
/torch/include
all library files to:
torch/lib
some dynamic libraries to:
torch/bin
and the test programs to:
torch/test
A quick test
Copy the first test program in the torch/test directory, AlgorithmsTest.exe, into the library directory torch/lib, and run it from PowerShell:
PS D:\documents\vs2015\Project\pytorch1.5.1\pytorch\torch\lib> .\AlgorithmsTest.exe
Running main() from ..\..\third_party\googletest\googletest\src\gtest_main.cc
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from DominatorTree
[ RUN ] DominatorTree.Test1
[ OK ] DominatorTree.Test1 (0 ms)
[ RUN ] DominatorTree.Test2
[ OK ] DominatorTree.Test2 (0 ms)
[----------] 2 tests from DominatorTree (1 ms total)
[----------] 2 tests from Subgraph
[ RUN ] Subgraph.InduceEdges
[ OK ] Subgraph.InduceEdges (0 ms)
[ RUN ] Subgraph.InduceEdgesCycle
[ OK ] Subgraph.InduceEdgesCycle (0 ms)
[----------] 2 tests from Subgraph (1 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (2 ms total)
[ PASSED ] 4 tests.
As a second test, I copied test_api.exe from /torch/bin into /torch/lib and ran it from PowerShell. Of 855 tests, 852 passed; the 3 failures were tests that require extra data files, which I had not provided.
Part of the output is pasted below for reference:
> .\test_api.exe
Only one CUDA device detected. Disabling MultiCUDA tests
Note: Google Test filter = *-*_MultiCUDA
[==========] Running 855 tests from 36 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from AutogradAPITests
[ RUN ] AutogradAPITests.BackwardSimpleTest
[ OK ] AutogradAPITests.BackwardSimpleTest (3 ms)
[ RUN ] AutogradAPITests.BackwardTest
[ OK ] AutogradAPITests.BackwardTest (0 ms)
[ RUN ] AutogradAPITests.GradSimpleTest
[ OK ] AutogradAPITests.GradSimpleTest (0 ms)
[ RUN ] AutogradAPITests.GradTest
[ OK ] AutogradAPITests.GradTest (1 ms)
[ RUN ] AutogradAPITests.GradNonLeafTest
[ OK ] AutogradAPITests.GradNonLeafTest (1 ms)
[ RUN ] AutogradAPITests.GradUnreachableTest
[ OK ] AutogradAPITests.GradUnreachableTest (1 ms)
[ RUN ] AutogradAPITests.RetainGrad
[ OK ] AutogradAPITests.RetainGrad (1 ms)
[----------] 7 tests from AutogradAPITests (10 ms total)
[----------] 20 tests from CustomAutogradTest
[ RUN ] CustomAutogradTest.CustomFunction
[ OK ] CustomAutogradTest.CustomFunction (1 ms)
[ RUN ] CustomAutogradTest.FunctionReturnsInput
Warning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (operator () at ..\..\aten\src\ATen\native\TensorFactories.cpp:361)
[ OK ] CustomAutogradTest.FunctionReturnsInput (1 ms)
[ RUN ] CustomAutogradTest.NoGradCustomFunction
[ OK ] CustomAutogradTest.NoGradCustomFunction (0 ms)
[ RUN ] CustomAutogradTest.MarkDirty
[ OK ] CustomAutogradTest.MarkDirty (0 ms)
[ RUN ] CustomAutogradTest.MarkNonDifferentiable
[ OK ] CustomAutogradTest.MarkNonDifferentiable (1 ms)
[ RUN ] CustomAutogradTest.MarkNonDifferentiableMixed
[ OK ] CustomAutogradTest.MarkNonDifferentiableMixed (0 ms)
[ RUN ] CustomAutogradTest.MarkNonDifferentiableNone
[ OK ] CustomAutogradTest.MarkNonDifferentiableNone (0 ms)
[ RUN ] CustomAutogradTest.ReturnLeafInplace
[ OK ] CustomAutogradTest.ReturnLeafInplace (1 ms)
[ RUN ] CustomAutogradTest.ReturnDuplicateInplace
[ OK ] CustomAutogradTest.ReturnDuplicateInplace (0 ms)
[ RUN ] CustomAutogradTest.ReturnDuplicate
[ OK ] CustomAutogradTest.ReturnDuplicate (0 ms)
[ RUN ] CustomAutogradTest.SaveEmptyForBackward
[ OK ] CustomAutogradTest.SaveEmptyForBackward (0 ms)
[ RUN ] CustomAutogradTest.InvalidGradients
[ OK ] CustomAutogradTest.InvalidGradients (1 ms)
[ RUN ] CustomAutogradTest.NoGradInput
[ OK ] CustomAutogradTest.NoGradInput (0 ms)
[ RUN ] CustomAutogradTest.TooManyGrads
[ OK ] CustomAutogradTest.TooManyGrads (0 ms)
[ RUN ] CustomAutogradTest.DepNoGrad
[ OK ] CustomAutogradTest.DepNoGrad (0 ms)
[ RUN ] CustomAutogradTest.Reentrant
[ OK ] CustomAutogradTest.Reentrant (0 ms)
[ RUN ] CustomAutogradTest.DeepReentrant
[ OK ] CustomAutogradTest.DeepReentrant (449 ms)
[ RUN ] CustomAutogradTest.ReentrantPriority
[ OK ] CustomAutogradTest.ReentrantPriority (0 ms)
[ RUN ] CustomAutogradTest.Hooks
[ OK ] CustomAutogradTest.Hooks (1 ms)
[ RUN ] CustomAutogradTest.HookNone
[ OK ] CustomAutogradTest.HookNone (0 ms)
[----------] 20 tests from CustomAutogradTest (481 ms total)
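Since these test programs are plain googletest executables, the standard gtest command-line flags work on them, which is handy when you only care about a subset of the 855 tests. The flags below are standard googletest options, not anything PyTorch-specific:

```shell
# List all registered tests without running them.
.\test_api.exe --gtest_list_tests
# Run only the autograd API tests.
.\test_api.exe --gtest_filter=AutogradAPITests.*
```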
References:
https://oldpan.me/archives/pytorch-build-simple-instruction
https://blog.csdn.net/weixin_40448140/article/details/105345593
Reference blog: building the libtorch library on Windows
Reference: the "ModuleNotFoundError: No module named '__main__.xxx'; '__main__' is not a package" issue
Other issues
Although the script build succeeded, the result still differs from what the official site ships. The screenshot originally shown here (not reproduced; it was from the failed CMake build and included only to illustrate the compute-capability issue, the machine actually having an RTX 2070) showed the generated NVCC gencode flags. These flags are tied to the GPU's compute capability: a library built this way cannot run on cards whose compute capability is below that value. The libraries downloaded from the official site have no such restriction, which suggests the official build sets this parameter explicitly.
When building with the script, the corresponding line is the one below (shown for contrast; this output is from a machine with a GTX 1080 Ti):
-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
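A plausible explanation: the build reads the TORCH_CUDA_ARCH_LIST environment variable to decide which -gencode flags to emit, and by default it detects only the local GPU (compute 6.1 for the GTX 1080 Ti above, 7.5 for an RTX 2070), whereas the official builds apparently bake in a whole list. Setting it explicitly before building should yield a library that runs on all listed architectures; the list below is an example, not the exact list the official builds use (on cmd, use `set` instead of `export`):

```shell
# Cover several compute capabilities instead of only the locally detected one.
# Example list; extend or trim to the GPUs the library must support.
export TORCH_CUDA_ARCH_LIST="5.0;6.1;7.0;7.5"
echo "TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST"
# then rebuild: python ./tools/build_libtorch.py
```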
Script build output (kept for study and reverse-engineering)
(pytorch) D:\documents\vs2017\pytorch1.5.1\pytorch>python ./tools/build_libtorch.py
cmake -GNinja -DBUILD_PYTHON=False -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=D:\documents\vs2017\pytorch1.5.1\pytorch\torch -DCMAKE_PREFIX_PATH=D:\Anaconda3\envs\pytorch\Lib\site-packages -DJAVA_HOME=C:\Program Files\Java\jdk1.8.0_181 -DNUMPY_INCLUDE_DIR=D:\Anaconda3\envs\pytorch\lib\site-packages\numpy\core\include -DPYTHON_EXECUTABLE=D:\Anaconda3\envs\pytorch\python.exe -DPYTHON_INCLUDE_DIR=D:\Anaconda3\envs\pytorch\include -DUSE_NUMPY=True D:\documents\vs2017\pytorch1.5.1\pytorch
-- The CXX compiler identification is MSVC 19.16.27042.0
-- The C compiler identification is MSVC 19.16.27042.0
-- Check for working CXX compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
-- Check for working CXX compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
-- Check for working C compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Not forcing any particular BLAS to be found
-- Performing Test COMPILER_WORKS
-- Performing Test COMPILER_WORKS - Success
-- Performing Test SUPPORT_GLIBCXX_USE_C99
-- Performing Test SUPPORT_GLIBCXX_USE_C99 - Success
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED - Success
-- std::exception_ptr is supported.
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS - Success
-- Current compiler supports avx2 extension. Will build perfkernels.
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Success
-- Current compiler supports avx512f extension. Will build fbgemm.
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Failed
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Failed
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Failed
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:D:/documents/vs2017/pytorch1.5.1/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
-- Trying to find preferred BLAS backend of choice: MKL
-- MKL_THREADING = OMP
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- MKL_THREADING = OMP
CMake Warning at cmake/Dependencies.cmake:141 (message):
MKL could not be found. Defaulting to Eigen
Call Stack (most recent call first):
CMakeLists.txt:411 (include)
CMake Warning at cmake/Dependencies.cmake:159 (message):
Preferred BLAS (MKL) cannot be found, now searching for a general BLAS
library
Call Stack (most recent call first):
CMakeLists.txt:411 (include)
-- MKL_THREADING = OMP
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - libiomp5md]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core - libiomp5md]
-- Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core]
-- Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_sequential - mkl_core]
-- Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_core - libiomp5md - pthread]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - libiomp5md - pthread]
-- Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_core - pthread]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - pthread]
-- Library mkl_intel: not found
-- Checking for [mkl - guide - pthread - m]
-- Library mkl: not found
-- MKL library not found
-- Checking for [Accelerate]
-- Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND
-- Checking for [vecLib]
-- Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND
-- Checking for [openblas]
-- Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
-- Checking for