ERROR: Failed building wheel for pyarrow

Problem

Installing the HuggingFace datasets library fails with an error.

System: macOS 10.13.6

Environment: Conda virtual environment, python==3.8.1

Command:

pip install datasets

Error output:

CMake Error at CMakeLists.txt:268 (find_package):
        By not providing "FindArrow.cmake" in CMAKE_MODULE_PATH this project has
        asked CMake to find a package configuration file provided by "Arrow", but
        CMake did not find one.
      
        Could not find a package configuration file provided by "Arrow" with any of
        the following names:
      
          ArrowConfig.cmake
          arrow-config.cmake
      
        Add the installation prefix of "Arrow" to CMAKE_PREFIX_PATH or set
        "Arrow_DIR" to a directory containing one of the above files.  If "Arrow"
        provides a separate development package or SDK, be sure it has been
        installed.
      
      
      -- Configuring incomplete, errors occurred!
      See also "/private/var/folders/sd/6d0w7lz121v38498dngh6y540000gn/T/pip-install-ewqnh087/pyarrow_673989b028794d389cba544b08d75516/build/temp.macosx-10.9-x86_64-cpython-38/CMakeFiles/CMakeOutput.log".
      error: command '/usr/local/bin/cmake' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pyarrow
Failed to build pyarrow
ERROR: Could not build wheels for pyarrow, which is required to install pyproject.toml-based projects

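Attempted solution

Working in the Conda virtual environment, first clone Arrow with its test-data submodules and set the test-data environment variables:

$ cd /Users/../anaconda3/envs/env_name # first change into the virtual environment's directory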
$ git clone https://github.com/apache/arrow.git 
$ pushd arrow
$ git submodule update --init
$ export PARQUET_TEST_DATA="${PWD}/cpp/submodules/parquet-testing/data"
$ export ARROW_TEST_DATA="${PWD}/testing/data"
$ popd
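Both paths come from git submodules (`testing` and `cpp/submodules/parquet-testing`). If `git submodule update --init` did not complete, the directories may exist but be empty. The bash sketch below checks the two variables before going further. `check_test_data` is a hypothetical name.

```shell
#!/usr/bin/env bash
# check_test_data (hypothetical helper): verify that PARQUET_TEST_DATA and
# ARROW_TEST_DATA are set and point at non-empty directories.
check_test_data() {
  local var dir status=0
  for var in PARQUET_TEST_DATA ARROW_TEST_DATA; do
    dir=${!var:-}                 # bash indirect expansion: value of $var
    if [ -z "$dir" ]; then
      echo "$var is not set" >&2
      status=1
    elif [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]; then
      # both paths live in git submodules, so an empty dir likely means
      # 'git submodule update --init' did not finish
      echo "$var=$dir is missing or empty" >&2
      status=1
    else
      echo "$var=$dir OK"
    fi
  done
  return "$status"
}
```

Run `check_test_data` in the same shell right after the `export` lines.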

Next, install the Arrow C++ and PyArrow dependencies from conda-forge. This failed with `CondaValueError: Malformed version string '~': invalid character(s).`

$ conda activate env_name # activate the virtual environment
$ conda install -c conda-forge \
       --file arrow/ci/conda_env_unix.txt \
       --file arrow/ci/conda_env_cpp.txt \
       --file arrow/ci/conda_env_python.txt \
       --file arrow/ci/conda_env_gandiva.txt \
       compilers # downloaded from the conda-forge channel
$ export ARROW_HOME=$CONDA_PREFIX
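I didn't track down the cause of the CondaValueError. The message names `~` as the invalid character, so a reasonable first step is to find which spec lines in the files passed via `--file` contain one. My unverified guess is that the installed conda version does not accept the syntax used there, in which case updating conda might help. The helper below only locates those lines and does not fix anything. `find_tilde_specs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# find_tilde_specs (hypothetical helper): print file:line:spec for every line
# that contains a '~' before any '#' comment marker.
find_tilde_specs() {
  # -H always prefixes the file name, -n adds the line number
  grep -Hn '^[^#]*~' "$@"
}
```

Example: `find_tilde_specs arrow/ci/conda_env_*.txt`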

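Next I tried the non-Conda route: install the Arrow C++ dependencies with Homebrew, use a plain Python venv, and set the environment variables. Installing into my existing virtual environment turned up dependency conflicts among many deep-learning packages, so a new virtual environment is needed:

  • lamini 1.0.2 requires pydantic==1.10.*, but gradio 4.4.0 requires pydantic>=2.0
  • tensorflow 2.6.5 requires typing-extensions<3.11,>=3.7, but most other packages require typing-extensions>=4.7.0
  • tensorflow 2.6.5 requires numpy~=1.19.2, but most other packages require numpy>=1.22.0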
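The numpy conflict becomes clear once the `~=` pin is expanded. Under the PEP 440 "compatible release" rule, `~=1.19.2` means `>=1.19.2, ==1.19.*`, i.e. `<1.20`, which can never overlap `numpy>=1.22.0`. The bash sketch below computes that exclusive upper bound. `compat_upper` is a hypothetical helper. It assumes a plain numeric version with at least two components and ignores pre/post-release suffixes. For a real environment, `pip check` lists installed packages whose requirements are not satisfied.

```shell
#!/usr/bin/env bash
# compat_upper (hypothetical helper): exclusive upper bound implied by a
# PEP 440 compatible-release pin "~=V", e.g. ~=1.19.2 -> <1.20.
# Assumes a purely numeric version with at least two components.
compat_upper() {
  local v=$1 base head last
  base=${v%.*}                 # drop the last component: 1.19.2 -> 1.19
  last=${base##*.}             # component to bump:        19
  if [[ $base == *.* ]]; then
    head=${base%.*}.           # everything before it:     "1."
  else
    head=""                    # e.g. ~=2.2 -> bump "2" -> 3
  fi
  echo "${head}$((10#$last + 1))"   # 10# avoids octal parsing of e.g. "08"
}
```

Example: `compat_upper 1.19.2` prints `1.20`.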

$ brew update && brew bundle --file=arrow/cpp/Brewfile
$ python3 -m venv pyarrow-dev # create a new virtual environment
$ source ./pyarrow-dev/bin/activate
$ pip install -r arrow/python/requirements-build.txt # contains oldest-supported-numpy, which can't be used with Conda or Homebrew
$ mkdir dist
$ export ARROW_HOME=$(pwd)/dist
$ export LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH
$ export CMAKE_PREFIX_PATH=$ARROW_HOME:$CMAKE_PREFIX_PATH
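Two caveats about these exports. First, when a variable starts out empty, `dir:$VAR` leaves a trailing colon. As far as I know, on Linux an empty LD_LIBRARY_PATH entry is treated as the current directory. Second, the macOS dynamic loader reads DYLD_LIBRARY_PATH rather than LD_LIBRARY_PATH, and DYLD_* variables are stripped when a SIP-protected system binary is launched. A minimal bash helper that prepends a path without leaving an empty entry is sketched below. `prepend_path` is a hypothetical name.

```shell
#!/usr/bin/env bash
# prepend_path (hypothetical helper): prepend a directory to a ':'-separated
# variable without leaving an empty entry when the variable starts out empty.
prepend_path() {
  local _var=$1 _dir=$2 _cur
  _cur=${!_var:-}              # current value via bash indirect expansion
  if [ -z "$_cur" ]; then
    export "$_var=$_dir"
  else
    export "$_var=$_dir:$_cur"
  fi
}

# Possible usage with the variables above (macOS would use DYLD_LIBRARY_PATH):
#   prepend_path LD_LIBRARY_PATH "$ARROW_HOME/lib"
#   prepend_path CMAKE_PREFIX_PATH "$ARROW_HOME"
```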

Build and install Arrow C++

$ mkdir arrow/cpp/build
$ pushd arrow/cpp/build
$ cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_BUILD_TYPE=Debug \
        -DARROW_BUILD_TESTS=ON \
        -DARROW_COMPUTE=ON \
        -DARROW_CSV=ON \
        -DARROW_DATASET=ON \
        -DARROW_FILESYSTEM=ON \
        -DARROW_HDFS=ON \
        -DARROW_JSON=ON \
        -DARROW_PARQUET=ON \
        -DARROW_WITH_BROTLI=ON \
        -DARROW_WITH_BZ2=ON \
        -DARROW_WITH_LZ4=ON \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_WITH_ZLIB=ON \
        -DARROW_WITH_ZSTD=ON \
        -DPARQUET_REQUIRE_ENCRYPTION=ON \
        ..
$ make -j4
$ make install
$ popd

The cmake step failed again, so I've given up for now 😢

CMake Error at /usr/local/Cellar/cmake/3.22.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find Boost (missing: Boost_INCLUDE_DIR system filesystem)
  (Required is at least version "1.64")
Call Stack (most recent call first):
  /usr/local/Cellar/cmake/3.22.2/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /usr/local/Cellar/cmake/3.22.2/share/cmake/Modules/FindBoost.cmake:2375 (find_package_handle_standard_args)
  cmake_modules/ThirdpartyToolchain.cmake:307 (find_package)
  cmake_modules/ThirdpartyToolchain.cmake:1271 (resolve_dependency)
  CMakeLists.txt:542 (include)
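The error says CMake's FindBoost could not locate Boost's include directory (or the system/filesystem libraries) and requires at least version 1.64. The sketch below checks only the headers. It reads BOOST_VERSION from `<prefix>/include/boost/version.hpp`, which is encoded as major*100000 + minor*100 + patch (so 1.64.0 is 106400), and compares it to that minimum. `check_boost` and `boost_version_at` are hypothetical helpers. If a suitable prefix turns up, the next thing I'd try is passing it to cmake as `-DBOOST_ROOT=<prefix>`, a hint variable documented by FindBoost. I haven't verified that this fixes the build.

```shell
#!/usr/bin/env bash
# boost_version_at (hypothetical helper): print BOOST_VERSION from
# <prefix>/include/boost/version.hpp, e.g. 107600 for Boost 1.76.0.
boost_version_at() {
  local header="$1/include/boost/version.hpp" v
  [ -f "$header" ] || return 1
  v=$(awk '$1 == "#define" && $2 == "BOOST_VERSION" { print $3; exit }' "$header")
  [ -n "$v" ] || return 1
  echo "$v"
}

# check_boost (hypothetical helper): headers-only check against a minimum
# version; the default 106400 is the "at least 1.64" from the error above.
check_boost() {
  local prefix=$1 min=${2:-106400} v
  if ! v=$(boost_version_at "$prefix"); then
    echo "no boost/version.hpp under $prefix/include" >&2
    return 1
  fi
  if [ "$v" -ge "$min" ]; then
    # decode major.minor.patch from the packed integer
    echo "Boost $((v / 100000)).$((v / 100 % 1000)).$((v % 100)) at $prefix"
  else
    echo "Boost $v at $prefix is older than $min" >&2
    return 1
  fi
}
```

Example, assuming Boost came from Homebrew: `check_boost "$(brew --prefix boost)"`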

For reference, these are the environment versions used in the course, to come back to later:

  • pyarrow==13.0.0
  • numpy==1.24.3
  • datasets==2.14.4
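Until then, one way to check an environment against these pins: installing them would be `pip install pyarrow==13.0.0 numpy==1.24.3 datasets==2.14.4` (untested on macOS 10.13). The bash sketch below compares `pip freeze` output with the expected versions. `check_pins` is a hypothetical helper. It matches names case-insensitively but does not normalize `-` vs `_`, and it ignores lines not in `name==version` form.

```shell
#!/usr/bin/env bash
# check_pins (hypothetical helper): read "name==version" lines on stdin
# (e.g. from `pip freeze`) and compare them with the pins given as arguments.
check_pins() {
  local freeze pin name want have status=0
  freeze=$(cat)
  for pin in "$@"; do
    name=${pin%%==*}
    want=${pin#*==}
    # case-insensitive match on the name column
    have=$(printf '%s\n' "$freeze" | awk -F'==' -v n="$name" 'tolower($1) == tolower(n) { print $2; exit }')
    if [ "$have" = "$want" ]; then
      echo "OK $name==$have"
    else
      echo "MISMATCH $name: want $want, have ${have:-<not installed>}"
      status=1
    fi
  done
  return "$status"
}
```

Example: `pip freeze | check_pins pyarrow==13.0.0 numpy==1.24.3 datasets==2.14.4`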

References

https://arrow.apache.org/docs/developers/python.html#python-development
