Important note: according to the bug-fix notes for PyTorch 1.6.0, PyTorch 1.5.1 has memory-leak problems, specifically in the OpenMP module and in the RReLU implementation. Avoid 1.5.1 and build a different version instead; the process is the same.
Most of the time the official prebuilt libraries work fine and are convenient, but occasionally you need to build from source. I ran into a case where I had to build PyTorch's libtorch library myself, so here is a rough record of the steps and the problems I hit.
Since libtorch 1.2.0, the prebuilt libraries on the official site (the four versions libtorch 1.4.0 / 1.5.0 / 1.5.1 / 1.6.0) only support cu92, cu101, and cu102. The only versions that support CUDA 10.0 are libtorch 1.0.0 / 1.0.1 / 1.1.0 / 1.2.0.
libtorch 1.2.0 and 1.5.1 happen to differ in some of their interfaces, so the only way to get a CUDA 10.0 build of 1.5.1 is to compile it yourself.
Step 1: Download the source
The latest version at the time was already 1.6.0, but I wanted 1.5.1, so the command below pins that tag.
git clone -b v1.5.1 --recursive https://github.com/pytorch/pytorch.git
Step 2: Generate the project with CMake
I had planned to follow the tutorial step by step, but when I saw a CMakeLists.txt in the downloaded source folder I got excited: I had used CMake before and was fairly comfortable with it.
Although the official docs say to use VS 2017, I tried VS 2015 anyway. The project generated correctly, but the build reported more than thirty-six thousand errors.
So I switched to CMake + VS 2017 (with CUDA 10.0 configured), generated the project, and built the x64 Release configuration.
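For reference, that configure step can be sketched as the cmake invocation below. The generator string is the standard one for VS 2017 x64; the paths are examples for my layout, and the flags mirror the ones the Python build script passes later in this post, not the exact command I ran:

```shell
# Example configure + build for VS 2017 x64; adjust paths to your checkout.
# BUILD_PYTHON=OFF builds only libtorch; BUILD_TEST=ON keeps the gtest binaries.
cmake -G "Visual Studio 15 2017 Win64" ^
  -DCMAKE_BUILD_TYPE=Release ^
  -DBUILD_PYTHON=OFF ^
  -DBUILD_TEST=ON ^
  -DCMAKE_INSTALL_PREFIX=..\pytorch\torch ^
  ..\pytorch
cmake --build . --config Release --target INSTALL
```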
The VS 2017 build also ended in failure, with 18 errors in total:
>D:\documents\vs2015\Project\pytorch1.5.1\build_libtorch\aten\src\ATen\native\cpu\GridSamplerKernel.cpp.AVX.cpp(737): error C2672: "convert_to_int_of_same_size": no matching overloaded function found
10>D:\documents\vs2015\Project\pytorch1.5.1\build_libtorch\aten\src\ATen\native\cpu\GridSamplerKernel.cpp.AVX.cpp(741): error C3536: "i_x_nearest": cannot be used before it is initialized
All of the errors come from the following code:
auto i_x_nearest = convert_to_int_of_same_size<scalar_t>(x_nearest);
auto i_y_nearest = convert_to_int_of_same_size<scalar_t>(y_nearest);
auto i_mask = must_in_bound ? iVec(-1)
: (i_x_nearest > iVec(-1)) & (i_x_nearest < iVec(inp_W)) &
(i_y_nearest > iVec(-1)) & (i_y_nearest < iVec(inp_H));
auto i_gInp_offset = i_y_nearest * iVec(inp_W) + i_x_nearest; // gInp is contiguous
integer_t mask_arr[iVec::size()];
i_mask.store(mask_arr);
integer_t gInp_offset_arr[iVec::size()];
i_gInp_offset.store(gInp_offset_arr);
That is, lines 737 to 749 of \aten\src\ATen\native\cpu\GridSamplerKernel.cpp.AVX.cpp.
Next attempt:
Building with the Python script
Create a build folder next to tools, open an Anaconda Prompt and activate the pytorch environment, cd into the build folder, and run the following command:
python ../tools/build_libtorch.py
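Before running the script, the build can also be steered with environment variables. USE_CUDA, BUILD_TEST, and MAX_JOBS are flags the PyTorch build scripts actually read; the values below are just the ones that match this walkthrough (on a Windows cmd prompt, use `set` instead of `export`):

```shell
# Optional knobs read by the PyTorch build scripts via environment variables.
export USE_CUDA=1     # build the CUDA backend
export BUILD_TEST=1   # also build the test binaries under torch/test and torch/bin
export MAX_JOBS=8     # cap parallel compile jobs if memory is tight
echo "USE_CUDA=$USE_CUDA BUILD_TEST=$BUILD_TEST MAX_JOBS=$MAX_JOBS"
```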
Running the script hit a few problems, mainly Python module-import path issues; two files had to be modified. In build_libtorch.py, change
from tools.build_pytorch_libs import build_caffe2
from tools.setup_helpers.cmake import CMake
to:
from build_pytorch_libs import build_caffe2
from setup_helpers.cmake import CMake
and in build_pytorch_libs.py, change
from .setup_helpers.env import IS_64BIT, IS_WINDOWS, check_negative_env_flag
from .setup_helpers.cmake import USE_NINJA
to:
from setup_helpers.env import IS_64BIT, IS_WINDOWS, check_negative_env_flag
from setup_helpers.cmake import USE_NINJA
With those edits, the script build finally succeeded!
[2482/2501] Building NVCC (Device) object modules/detectro...etectron_ops_gpu_generated_select_smooth_l1_loss_op.cu.obj
select_smooth_l1_loss_op.cu
select_smooth_l1_loss_op.cu
[2483/2501] Building NVCC (Device) object modules/detectro...fe2_detectron_ops_gpu_generated_upsample_nearest_op.cu.obj
upsample_nearest_op.cu
upsample_nearest_op.cu
[2484/2501] Building NVCC (Device) object modules/detectro...2_detectron_ops_gpu_generated_sigmoid_focal_loss_op.cu.obj
sigmoid_focal_loss_op.cu
sigmoid_focal_loss_op.cu
[2485/2501] Building NVCC (Device) object modules/detectro...ron_ops_gpu_generated_sigmoid_cross_entropy_loss_op.cu.obj
sigmoid_cross_entropy_loss_op.cu
sigmoid_cross_entropy_loss_op.cu
[2486/2501] Building NVCC (Device) object modules/detectro...e2_detectron_ops_gpu_generated_spatial_narrow_as_op.cu.obj
spatial_narrow_as_op.cu
spatial_narrow_as_op.cu
[2487/2501] Building NVCC (Device) object modules/detectro...2_detectron_ops_gpu_generated_softmax_focal_loss_op.cu.obj
softmax_focal_loss_op.cu
softmax_focal_loss_op.cu
[2488/2501] Building NVCC (Device) object modules/detectro...affe2_detectron_ops_gpu_generated_smooth_l1_loss_op.cu.obj
smooth_l1_loss_op.cu
smooth_l1_loss_op.cu
[2500/2501] Linking CXX shared library bin\caffe2_detectron_ops_gpu.dll
Creating library lib\caffe2_detectron_ops_gpu.lib and object lib\caffe2_detectron_ops_gpu.exp
[2500/2501] Install the project...
-- Install configuration: "Release"
install_manifest.txt in the build folder records where every file was installed.
The manifest is too long to paste here, so to summarize: all header files are installed to:
/torch/include
all library files to:
torch/lib
some dynamic libraries to:
torch/bin
and the test programs to:
torch/test
A quick test
Copy the first test program in the torch/test directory, AlgorithmsTest.exe, into the library directory torch/lib, and run it from PowerShell:
PS D:\documents\vs2015\Project\pytorch1.5.1\pytorch\torch\lib> .\AlgorithmsTest.exe
Running main() from ..\..\third_party\googletest\googletest\src\gtest_main.cc
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from DominatorTree
[ RUN ] DominatorTree.Test1
[ OK ] DominatorTree.Test1 (0 ms)
[ RUN ] DominatorTree.Test2
[ OK ] DominatorTree.Test2 (0 ms)
[----------] 2 tests from DominatorTree (1 ms total)
[----------] 2 tests from Subgraph
[ RUN ] Subgraph.InduceEdges
[ OK ] Subgraph.InduceEdges (0 ms)
[ RUN ] Subgraph.InduceEdgesCycle
[ OK ] Subgraph.InduceEdgesCycle (0 ms)
[----------] 2 tests from Subgraph (1 ms total)
[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (2 ms total)
[ PASSED ] 4 tests.
As a second test, I copied test_api.exe from /torch/bin into /torch/lib and ran it from PowerShell. Of 855 tests, 852 passed; the 3 failures were tests that require extra data files, which I had not provided.
Part of the output is pasted below for reference:
> .\test_api.exe
Only one CUDA device detected. Disabling MultiCUDA tests
Note: Google Test filter = *-*_MultiCUDA
[==========] Running 855 tests from 36 test cases.
[----------] Global test environment set-up.
[----------] 7 tests from AutogradAPITests
[ RUN ] AutogradAPITests.BackwardSimpleTest
[ OK ] AutogradAPITests.BackwardSimpleTest (3 ms)
[ RUN ] AutogradAPITests.BackwardTest
[ OK ] AutogradAPITests.BackwardTest (0 ms)
[ RUN ] AutogradAPITests.GradSimpleTest
[ OK ] AutogradAPITests.GradSimpleTest (0 ms)
[ RUN ] AutogradAPITests.GradTest
[ OK ] AutogradAPITests.GradTest (1 ms)
[ RUN ] AutogradAPITests.GradNonLeafTest
[ OK ] AutogradAPITests.GradNonLeafTest (1 ms)
[ RUN ] AutogradAPITests.GradUnreachableTest
[ OK ] AutogradAPITests.GradUnreachableTest (1 ms)
[ RUN ] AutogradAPITests.RetainGrad
[ OK ] AutogradAPITests.RetainGrad (1 ms)
[----------] 7 tests from AutogradAPITests (10 ms total)
[----------] 20 tests from CustomAutogradTest
[ RUN ] CustomAutogradTest.CustomFunction
[ OK ] CustomAutogradTest.CustomFunction (1 ms)
[ RUN ] CustomAutogradTest.FunctionReturnsInput
Warning: Deprecation warning: In a future PyTorch release torch.full will no longer return tensors of floating dtype by default. Instead, a bool fill_value will return a tensor of torch.bool dtype, and an integral fill_value will return a tensor of torch.long dtype. Set the optional `dtype` or `out` arguments to suppress this warning. (operator () at ..\..\aten\src\ATen\native\TensorFactories.cpp:361)
[ OK ] CustomAutogradTest.FunctionReturnsInput (1 ms)
[ RUN ] CustomAutogradTest.NoGradCustomFunction
[ OK ] CustomAutogradTest.NoGradCustomFunction (0 ms)
[ RUN ] CustomAutogradTest.MarkDirty
[ OK ] CustomAutogradTest.MarkDirty (0 ms)
[ RUN ] CustomAutogradTest.MarkNonDifferentiable
[ OK ] CustomAutogradTest.MarkNonDifferentiable (1 ms)
[ RUN ] CustomAutogradTest.MarkNonDifferentiableMixed
[ OK ] CustomAutogradTest.MarkNonDifferentiableMixed (0 ms)
[ RUN ] CustomAutogradTest.MarkNonDifferentiableNone
[ OK ] CustomAutogradTest.MarkNonDifferentiableNone (0 ms)
[ RUN ] CustomAutogradTest.ReturnLeafInplace
[ OK ] CustomAutogradTest.ReturnLeafInplace (1 ms)
[ RUN ] CustomAutogradTest.ReturnDuplicateInplace
[ OK ] CustomAutogradTest.ReturnDuplicateInplace (0 ms)
[ RUN ] CustomAutogradTest.ReturnDuplicate
[ OK ] CustomAutogradTest.ReturnDuplicate (0 ms)
[ RUN ] CustomAutogradTest.SaveEmptyForBackward
[ OK ] CustomAutogradTest.SaveEmptyForBackward (0 ms)
[ RUN ] CustomAutogradTest.InvalidGradients
[ OK ] CustomAutogradTest.InvalidGradients (1 ms)
[ RUN ] CustomAutogradTest.NoGradInput
[ OK ] CustomAutogradTest.NoGradInput (0 ms)
[ RUN ] CustomAutogradTest.TooManyGrads
[ OK ] CustomAutogradTest.TooManyGrads (0 ms)
[ RUN ] CustomAutogradTest.DepNoGrad
[ OK ] CustomAutogradTest.DepNoGrad (0 ms)
[ RUN ] CustomAutogradTest.Reentrant
[ OK ] CustomAutogradTest.Reentrant (0 ms)
[ RUN ] CustomAutogradTest.DeepReentrant
[ OK ] CustomAutogradTest.DeepReentrant (449 ms)
[ RUN ] CustomAutogradTest.ReentrantPriority
[ OK ] CustomAutogradTest.ReentrantPriority (0 ms)
[ RUN ] CustomAutogradTest.Hooks
[ OK ] CustomAutogradTest.Hooks (1 ms)
[ RUN ] CustomAutogradTest.HookNone
[ OK ] CustomAutogradTest.HookNone (0 ms)
[----------] 20 tests from CustomAutogradTest (481 ms total)
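Since these test programs are plain googletest executables, the standard gtest command-line flags work on them, which is handy when you only care about a subset of the 855 tests. The flags below are standard googletest options, not anything PyTorch-specific:

```shell
# List all registered tests without running them.
.\test_api.exe --gtest_list_tests
# Run only the autograd API tests.
.\test_api.exe --gtest_filter=AutogradAPITests.*
```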
References:
https://oldpan.me/archives/pytorch-build-simple-instruction
https://blog.csdn.net/weixin_40448140/article/details/105345593
Reference blog: building the libtorch library on Windows
Reference: the "ModuleNotFoundError: No module named '__main__.xxx'; '__main__' is not a package" issue
Other issues
Although the script build succeeded, the result still differs from what the official site ships. The screenshot originally shown here (not reproduced; it was from the failed CMake build and included only to illustrate the compute-capability issue, the machine actually having an RTX 2070) showed the generated NVCC gencode flags. These flags are tied to the GPU's compute capability: a library built this way cannot run on cards whose compute capability is below that value. The libraries downloaded from the official site have no such restriction, which suggests the official build sets this parameter explicitly.
When building with the script, the corresponding line is the one below (shown for contrast; this output is from a machine with a GTX 1080 Ti):
-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
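A plausible explanation: the build reads the TORCH_CUDA_ARCH_LIST environment variable to decide which -gencode flags to emit, and by default it detects only the local GPU (compute 6.1 for the GTX 1080 Ti above, 7.5 for an RTX 2070), whereas the official builds apparently bake in a whole list. Setting it explicitly before building should yield a library that runs on all listed architectures; the list below is an example, not the exact list the official builds use (on cmd, use `set` instead of `export`):

```shell
# Cover several compute capabilities instead of only the locally detected one.
# Example list; extend or trim to the GPUs the library must support.
export TORCH_CUDA_ARCH_LIST="5.0;6.1;7.0;7.5"
echo "TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST"
# then rebuild: python ./tools/build_libtorch.py
```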
Script build output (kept for study and reverse-engineering)
(pytorch) D:\documents\vs2017\pytorch1.5.1\pytorch>python ./tools/build_libtorch.py
cmake -GNinja -DBUILD_PYTHON=False -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=D:\documents\vs2017\pytorch1.5.1\pytorch\torch -DCMAKE_PREFIX_PATH=D:\Anaconda3\envs\pytorch\Lib\site-packages -DJAVA_HOME=C:\Program Files\Java\jdk1.8.0_181 -DNUMPY_INCLUDE_DIR=D:\Anaconda3\envs\pytorch\lib\site-packages\numpy\core\include -DPYTHON_EXECUTABLE=D:\Anaconda3\envs\pytorch\python.exe -DPYTHON_INCLUDE_DIR=D:\Anaconda3\envs\pytorch\include -DUSE_NUMPY=True D:\documents\vs2017\pytorch1.5.1\pytorch
-- The CXX compiler identification is MSVC 19.16.27042.0
-- The C compiler identification is MSVC 19.16.27042.0
-- Check for working CXX compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
-- Check for working CXX compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe
-- Check for working C compiler: D:/Program Files (x86)/Microsoft Visual Studio/2017/Community/VC/Tools/MSVC/14.16.27023/bin/Hostx64/x64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Not forcing any particular BLAS to be found
-- Performing Test COMPILER_WORKS
-- Performing Test COMPILER_WORKS - Success
-- Performing Test SUPPORT_GLIBCXX_USE_C99
-- Performing Test SUPPORT_GLIBCXX_USE_C99 - Success
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED
-- Performing Test CAFFE2_EXCEPTION_PTR_SUPPORTED - Success
-- std::exception_ptr is supported.
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING
-- Performing Test CAFFE2_NEED_TO_TURN_OFF_DEPRECATION_WARNING - Failed
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX2_EXTENSIONS - Success
-- Current compiler supports avx2 extension. Will build perfkernels.
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS
-- Performing Test CAFFE2_COMPILER_SUPPORTS_AVX512_EXTENSIONS - Success
-- Current compiler supports avx512f extension. Will build fbgemm.
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_VISIBILITY - Failed
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_SUPPORTS_HIDDEN_INLINE_VISIBILITY - Failed
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC
-- Performing Test COMPILER_SUPPORTS_RDYNAMIC - Failed
-- Building using own protobuf under third_party per request.
-- Use custom protobuf build.
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Caffe2 protobuf include directory: $<BUILD_INTERFACE:D:/documents/vs2017/pytorch1.5.1/pytorch/third_party/protobuf/src>$<INSTALL_INTERFACE:include>
-- Trying to find preferred BLAS backend of choice: MKL
-- MKL_THREADING = OMP
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- MKL_THREADING = OMP
CMake Warning at cmake/Dependencies.cmake:141 (message):
MKL could not be found. Defaulting to Eigen
Call Stack (most recent call first):
CMakeLists.txt:411 (include)
CMake Warning at cmake/Dependencies.cmake:159 (message):
Preferred BLAS (MKL) cannot be found, now searching for a general BLAS
library
Call Stack (most recent call first):
CMakeLists.txt:411 (include)
-- MKL_THREADING = OMP
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - libiomp5md]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core - libiomp5md]
-- Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core]
-- Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_sequential - mkl_core]
-- Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_core - libiomp5md - pthread]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - libiomp5md - pthread]
-- Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_core - pthread]
-- Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - pthread]
-- Library mkl_intel: not found
-- Checking for [mkl - guide - pthread - m]
-- Library mkl: not found
-- MKL library not found
-- Checking for [Accelerate]
-- Library Accelerate: BLAS_Accelerate_LIBRARY-NOTFOUND
-- Checking for [vecLib]
-- Library vecLib: BLAS_vecLib_LIBRARY-NOTFOUND
-- Checking for [openblas]
-- Library openblas: BLAS_openblas_LIBRARY-NOTFOUND
-- Checking for