【Tested fix】A chain of TensorRT-LLM pitfalls: Could not build wheels for mpi4py, which is required to install pyproject.toml-based projects

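【I plan to use the same nickname on every platform, 代码随想随记; the Zhihu one cannot be changed right away.】
WeChat official account: leetcode_algos_life, 代码随想随记
Xiaohongshu: 412408155
CSDN: https://blog.csdn.net/woai8339?type=blog , 代码随想随记
GitHub: https://github.com/riverind
Douyin [not started yet, planned]: tian72530, 代码随想随记
Zhihu [not started yet, planned]: happy001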

【Background】
Installing TensorRT-LLM fails with the following error:

 collect2: error: ld returned 1 exit status
      failure.
      removing: _configtest.c _configtest.o
      error: Cannot link MPI programs. Check your configuration!!!
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for mpi4py
Failed to build mpi4py
ERROR: Could not build wheels for mpi4py, which is required to install pyproject.toml-based projects

【Solution】
The pitfalls start here.

1. The error above is fixed with the following command:

conda install mpi4py
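
Before moving on, it may help to confirm that the active Python environment can actually see mpi4py. The snippet below is a minimal sketch, not part of the original steps; the helper name is_importable is made up for illustration. It only looks the module up and does not import it, so no MPI library is loaded.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    # find_spec returns None when the module is not installed in this environment.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "mpi4py" is the package installed by the conda command above.
    print("mpi4py importable:", is_importable("mpi4py"))
```

If this prints False, the conda command probably ran in a different environment than the one python3 resolves to.
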
Then run the following, per https://nvidia.github.io/TensorRT-LLM/installation/linux.html
# Install dependencies, TensorRT-LLM requires Python 3.10
apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev git git-lfs

# Install the latest preview version (corresponding to the main branch) of TensorRT-LLM.
# If you want to install the stable version (corresponding to the release branch), please
# remove the `--pre` option.
pip3 install tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com

# Check installation
python3 -c "import tensorrt_llm"

The third step, Check installation, fails with the following error:

*** The MPI_Comm_split_type() function was called before MPI_INIT was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[bm-220341k:592461] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

The pitfalls continue:

# Install the MPI parallel computing tools; see https://webpages.charlotte.edu/abw/coit-grid01.uncc.edu/ParallelProgSoftware/Software/OpenMPIInstall.pdf
sudo apt-get update 
sudo apt install build-essential 
sudo apt-get install openmpi-bin openmpi-doc libopenmpi-dev
# check
which mpicc
which mpiexec 
# Heads up: both version checks below fail
mpicc --version
mpiexec –version 

The errors are:

mpicc --version
bin/mpicc: line 285: x86_64-conda_cos6-linux-gnu-cc: command not found
mpiexec –version
HYDU_create_process (utils/launch/launch.c:74): execvp error on file –version (No such file or directory)
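
The second error names "–version" as the file it failed to execute, which suggests the option was typed (or pasted) with an en dash rather than two ASCII hyphens. The sketch below shows one way to spot such look-alike characters in a command before running it. The function name and the character table are illustrative assumptions, not part of any MPI tooling.

```python
# Characters that look like an ASCII hyphen but are not one.
LOOKALIKE_DASHES = {
    "\u2010": "hyphen",
    "\u2011": "non-breaking hyphen",
    "\u2012": "figure dash",
    "\u2013": "en dash",
    "\u2014": "em dash",
    "\u2212": "minus sign",
}


def find_lookalike_dashes(command: str) -> list[tuple[int, str]]:
    """Return (index, name) for every look-alike dash found in the command."""
    return [
        (index, LOOKALIKE_DASHES[char])
        for index, char in enumerate(command)
        if char in LOOKALIKE_DASHES
    ]


if __name__ == "__main__":
    # The command as it appears in the error log, with the en dash escaped.
    print(find_lookalike_dashes("mpiexec \u2013version"))
    # The intended command contains only ASCII hyphens.
    print(find_lookalike_dashes("mpiexec --version"))
```

Commands copied from PDFs or word processors are a common source of these characters.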

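Solution

# Fix for the mpicc --version error
conda install -c conda-forge gfortran
# The mpiexec error is a typo rather than a missing package: the option was
# typed with an en dash (–version), so mpiexec tried to launch a program named
# "–version". Type it with two ASCII hyphens: mpiexec --version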

Check again; mpicc now runs:

mpicc --version
gcc (Anaconda gcc) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.

mpiexec –version
--------------------------------------------------------------------------
mpiexec has detected an attempt to run as root.
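
The outputs so far hint that more than one MPI installation is on the machine: the failing mpicc was a conda wrapper, while apt installed its own copy. A rough way to see which binaries win on PATH is sketched below. The helper names are hypothetical, and relying on CONDA_PREFIX assumes an activated conda environment.

```python
import os
import shutil


def locate_tools(names):
    """Map each tool name to the first matching executable on PATH (or None)."""
    return {name: shutil.which(name) for name in names}


def is_under(path, prefix):
    """Return True if path is located inside the directory prefix."""
    if not path or not prefix:
        return False
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


if __name__ == "__main__":
    conda_prefix = os.environ.get("CONDA_PREFIX")  # set by `conda activate`
    for name, path in locate_tools(["mpicc", "mpiexec"]).items():
        if path is None:
            origin = "not found on PATH"
        elif is_under(path, conda_prefix):
            origin = "inside the conda env"
        else:
            origin = "outside the conda env"
        print(f"{name}: {path} ({origin})")
```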

Run the import check again:

python3 -c "import tensorrt_llm"

It fails with the following error:

ImportError: libmpi.so.12: cannot open shared object file: No such file or directory

【Solution】

Install openmpi in the current environment:
conda install openmpi

Then locate libmpi.so.12 with find / -name libmpi.so.12 and symlink the result into /usr/local/lib:
ln -s <path to libmpi.so.12 reported by find> /usr/local/lib/libmpi.so.12
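
Running find over the whole filesystem can be slow. As an alternative, the sketch below searches only a few likely directories. The list of roots is an assumption (the conda environment plus two common library directories) and the function name find_files is made up for illustration; adjust both to the machine at hand.

```python
import os


def find_files(filename, roots):
    """Yield full paths of files named `filename` under each root directory."""
    for root in roots:
        # os.walk silently skips directories it cannot read.
        for dirpath, _dirnames, filenames in os.walk(root):
            if filename in filenames:
                yield os.path.join(dirpath, filename)


if __name__ == "__main__":
    # Assumed search roots; CONDA_PREFIX is only set in an activated conda env.
    candidates = (os.environ.get("CONDA_PREFIX"), "/usr/lib", "/usr/local/lib")
    roots = [path for path in candidates if path]
    for match in find_files("libmpi.so.12", roots):
        print(match)
```

Any path it prints can be used as the source of the ln -s command above.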

Check once more:

python3 -c "import tensorrt_llm"

It runs successfully and prints:

[TensorRT-LLM] TensorRT-LLM version: 0.11.0
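
The installed version can also be read from the package metadata without importing tensorrt_llm, which avoids loading the MPI libraries just to check a version number. This is a minimal sketch; installed_version is a hypothetical helper, and it assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "tensorrt_llm" is the distribution name from the pip install command.
    print("tensorrt_llm:", installed_version("tensorrt_llm"))
```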