Building Libtorch 1.5.1 from Source on Windows 10

This post describes compiling Libtorch 1.5.1 from source on Windows 10 to get a build targeting CUDA 10.0. A cmake + VS2017 attempt failed; the build eventually succeeded via the bundled Python script. The post records the build steps, the problems hit along the way and their fixes (module import paths in the build scripts, build options, etc.), and verifies that the resulting libraries run. Note that a self-built library is tied to specific GPU compute capabilities, so supporting other GPUs needs special handling, and that 1.5.1 itself has a known memory-leak problem (see the warning below).

Important note: according to the bug-fix notes for PyTorch 1.6.0, PyTorch 1.5.1 has memory leaks, specifically in the OpenMP module and the RReLU implementation. So avoid 1.5.1 and build a different version instead; the process is the same.

  1. The official prebuilt libraries are usually fine and convenient, but sometimes you need to build from source. I ran into a case that required building PyTorch's libtorch myself, so here are the rough steps and the problems encountered.

  2. From libtorch 1.2.0 onward, the prebuilt libraries on the official site (libtorch 1.4.0 / 1.5.0 / 1.5.1 / 1.6.0, four versions) only support cu92, cu101, and cu102. The only versions with CUDA 10.0 builds are libtorch 1.0.0 / 1.0.1 / 1.1.0 / 1.2.0.

  3. libtorch 1.2.0 and libtorch 1.5.1 differ in some of their APIs, so the only way to get a CUDA 10.0 build of 1.5.1 is to compile it yourself.

 

Step 1: Download the source

The latest release at this point was already 1.6.0, and I wanted 1.5.1, so the command below pins the branch:

git clone -b v1.5.1 --recursive https://github.com/pytorch/pytorch.git

Step 2: Generate the project with cmake

I had planned to follow a tutorial step by step, but on seeing a CMakeLists.txt at the top of the source tree I got ahead of myself; I had used cmake before and felt at home with it.

The official docs say to use VS2017, but I tried VS2015 anyway. The project generated correctly, and the build then reported more than thirty-six thousand errors.

So I switched to cmake + VS2017 (configured for CUDA 10.0), generated the project, and built the x64-release configuration.

The VS2017 build also ended in failure, with 18 errors in total:

>D:\documents\vs2015\Project\pytorch1.5.1\build_libtorch\aten\src\ATen\native\cpu\GridSamplerKernel.cpp.AVX.cpp(737): error C2672: "convert_to_int_of_same_size": no matching overloaded function found

10>D:\documents\vs2015\Project\pytorch1.5.1\build_libtorch\aten\src\ATen\native\cpu\GridSamplerKernel.cpp.AVX.cpp(741): error C3536: "i_x_nearest": cannot be used before it is initialized

All of the errors come from the code below (the C3536 "cannot be used before it is initialized" messages are just knock-on effects of the C2672 overload-resolution failure: once the call fails, `auto` leaves `i_x_nearest` with no deduced type):

    auto i_x_nearest = convert_to_int_of_same_size<scalar_t>(x_nearest);
    auto i_y_nearest = convert_to_int_of_same_size<scalar_t>(y_nearest);

    auto i_mask = must_in_bound ? iVec(-1)
                                : (i_x_nearest > iVec(-1)) & (i_x_nearest < iVec(inp_W)) &
                                  (i_y_nearest > iVec(-1)) & (i_y_nearest < iVec(inp_H));

    auto i_gInp_offset = i_y_nearest * iVec(inp_W) + i_x_nearest;  // gInp is contiguous

    integer_t mask_arr[iVec::size()];
    i_mask.store(mask_arr);
    integer_t gInp_offset_arr[iVec::size()];
    i_gInp_offset.store(gInp_offset_arr);

i.e. lines 737 through 749 of \aten\src\ATen\native\cpu\GridSamplerKernel.cpp.AVX.cpp.

Next attempt:

Building with the Python script

Create a build folder at the same level as tools, open an Anaconda Prompt and activate the pytorch environment, cd into build, and run:

python ../tools/build_libtorch.py

This hit a few problems, mainly with Python module import paths; two files needed edits. In build_libtorch.py:

from tools.build_pytorch_libs import build_caffe2
from tools.setup_helpers.cmake import CMake

change to:

from build_pytorch_libs import build_caffe2
from setup_helpers.cmake import CMake

And in build_pytorch_libs.py:

from .setup_helpers.env import IS_64BIT, IS_WINDOWS, check_negative_env_flag
from .setup_helpers.cmake import USE_NINJA

change to:

from setup_helpers.env import IS_64BIT, IS_WINDOWS, check_negative_env_flag
from setup_helpers.cmake import USE_NINJA
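Instead of editing the import lines in the two scripts, the same effect can in principle be had by putting the checkout root on sys.path before the package imports run, so that `tools` resolves as a package. A minimal sketch of the general mechanism (not code from the PyTorch tree; the literal path below is a placeholder, in real use you would pass `__file__`):

```python
import os
import sys

def add_repo_root(script_path):
    """Insert the parent of the script's directory (the pytorch checkout
    root) at the front of sys.path, so that 'tools.*' imports resolve."""
    root = os.path.abspath(os.path.join(os.path.dirname(script_path), os.pardir))
    if root not in sys.path:
        sys.path.insert(0, root)
    return root

# Placed at the top of tools/build_libtorch.py (called with __file__),
# this would let 'from tools.build_pytorch_libs import build_caffe2'
# work regardless of the directory the script is launched from.
root = add_repo_root("/path/to/pytorch/tools/build_libtorch.py")
```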

With those changes, the script build finally succeeded!

[2482/2501] Building NVCC (Device) object modules/detectro...etectron_ops_gpu_generated_select_smooth_l1_loss_op.cu.obj
select_smooth_l1_loss_op.cu
select_smooth_l1_loss_op.cu
[2483/2501] Building NVCC (Device) object modules/detectro...fe2_detectron_ops_gpu_generated_upsample_nearest_op.cu.obj
upsample_nearest_op.cu
upsample_nearest_op.cu
[2484/2501] Building NVCC (Device) object modules/detectro...2_detectron_ops_gpu_generated_sigmoid_focal_loss_op.cu.obj
sigmoid_focal_loss_op.cu
sigmoid_focal_loss_op.cu
[2485/2501] Building NVCC (Device) object modules/detectro...ron_ops_gpu_generated_sigmoid_cross_entropy_loss_op.cu.obj
sigmoid_cross_entropy_loss_op.cu
sigmoid_cross_entropy_loss_op.cu
[2486/2501] Building NVCC (Device) object modules/detectro...e2_detectron_ops_gpu_generated_spatial_narrow_as_op.cu.obj
spatial_narrow_as_op.cu
spatial_narrow_as_op.cu
[2487/2501] Building NVCC (Device) object modules/detectro...2_detectron_ops_gpu_generated_softmax_focal_loss_op.cu.obj
softmax_focal_loss_op.cu
softmax_focal_loss_op.cu
[2488/2501] Building NVCC (Device) object modules/detectro...affe2_detectron_ops_gpu_generated_smooth_l1_loss_op.cu.obj
smooth_l1_loss_op.cu
smooth_l1_loss_op.cu
[2500/2501] Linking CXX shared library bin\caffe2_detectron_ops_gpu.dll
  Creating library lib\caffe2_detectron_ops_gpu.lib and object lib\caffe2_detectron_ops_gpu.exp
[2500/2501] Install the project...
-- Install configuration: "Release"

install_manifest.txt in the build folder records where every file was installed.

The file is too long to paste here, so to summarize the conclusions: all header files are installed under:

torch/include

all library files under:

torch/lib

some dynamic libraries under:

torch/bin

and the test programs under:

torch/test
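install_manifest.txt is just a newline-separated list of installed file paths, so the summary above can be produced mechanically. A small sketch (the sample entries are made up; point it at the real manifest):

```python
import os
from collections import Counter

def summarize_manifest(lines):
    """Count installed files per destination directory."""
    counts = Counter()
    for line in lines:
        path = line.strip()
        if path:
            counts[os.path.dirname(path)] += 1
    return counts

# Hypothetical sample entries in install_manifest.txt format:
sample = [
    "D:/pytorch/torch/include/torch/script.h",
    "D:/pytorch/torch/lib/torch.lib",
    "D:/pytorch/torch/lib/c10.lib",
    "D:/pytorch/torch/bin/caffe2_detectron_ops_gpu.dll",
]
counts = summarize_manifest(sample)
```

In real use, replace `sample` with `open("install_manifest.txt").readlines()`.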

A quick test

Copy AlgorithmsTest.exe, the first test program under torch/test, into the library directory torch/lib and run it from PowerShell:

PS D:\documents\vs2015\Project\pytorch1.5.1\pytorch\torch\lib> .\AlgorithmsTest.exe
Running main() from ..\..\third_party\googletest\googletest\src\gtest_main.cc
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from DominatorTree
[ RUN      ] DominatorTree.Test1
[       OK ] DominatorTree.Test1 (0 ms)
[ RUN      ] DominatorTree.Test2
[       OK ] DominatorTree.Test2 (0 ms)
[----------] 2 tests from DominatorTree (1 ms total)

[----------] 2 tests from Subgraph
[ RUN      ] Subgraph.InduceEdges
[       OK ] Subgraph.InduceEdges (0 ms)
[ RUN      ] Subgraph.InduceEdgesCycle
[       OK ] Subgraph.InduceEdgesCycle (0 ms)
[----------] 2 tests from Subgraph (1 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (2 ms total)
[  PASSED  ] 4 tests.

As a second test, I copied test_api.exe from torch/bin into torch/lib and ran it from PowerShell: 855 tests ran and 852 passed; the 3 failures were tests that require extra data files, which I had not provided.

Partial output for reference:

> .\test_api.exe
Only one CUDA device detected. Disabling MultiCUDA tests
Note: Google Test filter = *-*_MultiCUDA
[==========] Running 855 tests from 36 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from AutogradAPITests
[ RUN      ] AutogradAPITests.BackwardSimpleTest
[       OK ] AutogradAPITests.BackwardSimpleTest (3 ms)
[ RUN      ] AutogradAPITests.BackwardTest
[       OK ] AutogradAPITests.BackwardTest (0 ms)
[ RUN      ] AutogradAPITests.GradSimpleTest
[       OK ] AutogradAPITests.GradSimpleTest (0 ms)
[ RUN      ] AutogradAPITests.GradTest
[       OK ] AutogradAPITests.GradTest (1 ms)
[ RUN      ] AutogradAPITests.GradNonLeafTest
[       OK ] AutogradAPITests.GradNonLeafTest (1 ms)
[ RUN      ] AutogradAPITests.GradUnreachableTest
[       OK ] AutogradAPITests.GradUnreachableTest (1 ms)
[ RUN      ] AutogradAPITests.RetainGrad
[       OK ] AutogradAPITests.RetainGrad (1 ms)
[----------] 7 tests from AutogradAPITests (10 ms total)

[----------] 20 tests from CustomAutogradTest
[ RUN      ] CustomAutogradTest.CustomFunction
[       OK ] CustomAutogradTest.CustomFunction (1 ms)
[ RUN      ] CustomAutogradTest.FunctionReturnsInput
Warning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (operator () at ..\..\aten\src\ATen\native\TensorFactories.cpp:361)
[       OK ] CustomAutogradTest.FunctionReturnsInput (1 ms)
[ RUN      ] CustomAutogradTest.NoGradCustomFunction
[       OK ] CustomAutogradTest.NoGradCustomFunction (0 ms)
[ RUN      ] CustomAutogradTest.MarkDirty
[       OK ] CustomAutogradTest.MarkDirty (0 ms)
[ RUN      ] CustomAutogradTest.MarkNonDifferentiable
[       OK ] CustomAutogradTest.MarkNonDifferentiable (1 ms)
[ RUN      ] CustomAutogradTest.MarkNonDifferentiableMixed
[       OK ] CustomAutogradTest.MarkNonDifferentiableMixed (0 ms)
[ RUN      ] CustomAutogradTest.MarkNonDifferentiableNone
[       OK ] CustomAutogradTest.MarkNonDifferentiableNone (0 ms)
[ RUN      ] CustomAutogradTest.ReturnLeafInplace
[       OK ] CustomAutogradTest.ReturnLeafInplace (1 ms)
[ RUN      ] CustomAutogradTest.ReturnDuplicateInplace
[       OK ] CustomAutogradTest.ReturnDuplicateInplace (0 ms)
[ RUN      ] CustomAutogradTest.ReturnDuplicate
[       OK ] CustomAutogradTest.ReturnDuplicate (0 ms)
[ RUN      ] CustomAutogradTest.SaveEmptyForBackward
[       OK ] CustomAutogradTest.SaveEmptyForBackward (0 ms)
[ RUN      ] CustomAutogradTest.InvalidGradients
[       OK ] CustomAutogradTest.InvalidGradients (1 ms)
[ RUN      ] CustomAutogradTest.NoGradInput
[       OK ] CustomAutogradTest.NoGradInput (0 ms)
[ RUN      ] CustomAutogradTest.TooManyGrads
[       OK ] CustomAutogradTest.TooManyGrads (0 ms)
[ RUN      ] CustomAutogradTest.DepNoGrad
[       OK ] CustomAutogradTest.DepNoGrad (0 ms)
[ RUN      ] CustomAutogradTest.Reentrant
[       OK ] CustomAutogradTest.Reentrant (0 ms)
[ RUN      ] CustomAutogradTest.DeepReentrant
[       OK ] CustomAutogradTest.DeepReentrant (449 ms)
[ RUN      ] CustomAutogradTest.ReentrantPriority
[       OK ] CustomAutogradTest.ReentrantPriority (0 ms)
[ RUN      ] CustomAutogradTest.Hooks
[       OK ] CustomAutogradTest.Hooks (1 ms)
[ RUN      ] CustomAutogradTest.HookNone
[       OK ] CustomAutogradTest.HookNone (0 ms)
[----------] 20 tests from CustomAutogradTest (481 ms total)

 

References:
https://oldpan.me/archives/pytorch-build-simple-instruction
https://blog.csdn.net/weixin_40448140/article/details/105345593

Reference post: building the libtorch library on Windows

Reference: the "ModuleNotFoundError: No module named '__main__.xxx'; '__main__' is not a package" problem

Other issues

Although the script build succeeded, the result still differs from the official builds in one important way: the NVCC arch flags are derived from the compute capability of the GPU in the build machine (the cmake+VS build never actually succeeded; this point is only about the compute-capability issue; my own card is an RTX 2070). A library built this way will not run on GPUs whose compute capability is below the value it was compiled for. The official downloads carry no such restriction, which means the official builds set this parameter specially to cover multiple architectures.

The relevant line in the script build output is the one below (for contrast with the above, this particular run was on a machine with a 1080 Ti):

-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
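As far as I can tell, these -gencode flags come from the TORCH_CUDA_ARCH_LIST environment variable; when it is unset, the build probes the local GPU, which is why this run only targeted sm_61. To build a library usable across several GPU generations, set it before launching the build script. The specific list below is just an example:

```python
import os

# Example arch list: Pascal (6.1), Volta (7.0), Turing (7.5).
# PyTorch's CUDA build reads this variable and emits one
# -gencode arch=compute_XY,code=sm_XY pair per entry.
os.environ["TORCH_CUDA_ARCH_LIST"] = "6.1;7.0;7.5"

# ...then invoke tools/build_libtorch.py from this environment.
```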

Script build output (kept here for reference and study)

(pytorch) D:\documents\vs2017\pytorch1.5.1\pytorch>python ./tools/build_libtorch.py
cmake -GNinja -DBUILD_PYTHON=False -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=D:\documents\vs2017\pytorch1.5.1\pytorch\torch -DCMAKE_PREFIX_PATH=D:\Anaconda3\envs\pytorch\Lib\site-packages -DJAVA_HOME=C:\Program Files\Java\jdk1.8.0_181 -DNUMPY_INCLUDE_DIR=D:\Anaconda3\envs\pytorch\lib\site-packages\numpy\core\include -DPYTHON_EXECUTABLE=D:\Anaconda3\envs\pytorch\python.exe -DPYTHON_INCLUDE_DIR=D:\Anaconda3\envs\pytorch\include -DUSE_NUMPY=True D:\documents\vs2017\pytorch1.5.1\pytorch
-- The CXX compiler identification is MSVC 19.16.27042.0
-- The C compiler identification is MSVC 19.16.27042.0
-- Check for working CXX compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
-- Check for working CXX compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
-- Check for working C compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Not forcing any particular BLAS to be found
-- Performing Test COMPILER_WORKS
-- Performing Test COMPILER_WORKS - Success
-- Performing Test SUPPORT_GLIBCXX_USE_C99
-- Performing Test SUPPORT_GLIBCXX_USE_C99 - Success
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED - Success
-- std::exception_ptr is supported.
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS - Success
-- Current compiler supports avx2 extension. Will build perfkernels.
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Success
-- Current compiler supports avx512f extension. Will build fbgemm.
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Failed
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Failed
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Failed
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:D:/documents/vs2017/pytorch1.5.1/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
-- Trying to find preferred BLAS backend of choice: MKL
-- MKL_THREADING = OMP
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- MKL_THREADING = OMP
CMake Warning at cmake/Dependencies.cmake:141 (message):
  MKL could not be found.  Defaulting to Eigen
Call Stack (most recent call first):
  CMakeLists.txt:411 (include)


CMake Warning at cmake/Dependencies.cmake:159 (message):
  Preferred BLAS (MKL) cannot be found, now searching for a general BLAS
  library
Call Stack (most recent call first):
  CMakeLists.txt:411 (include)


-- MKL_THREADING = OMP
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - libiomp5md]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core - libiomp5md]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_sequential - mkl_core]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_core - libiomp5md - pthread]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - libiomp5md - pthread]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_core - pthread]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - pthread]
--   Library mkl_intel: not found
-- Checking for [mkl - guide - pthread - m]
--   Library mkl: not found
-- MKL library not found
-- Checking for [Accelerate]
--   Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND
-- Checking for [vecLib]
--   Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND
-- Checking for [openblas]
--   Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
-- Checking for 