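Steps:
1. While migrating the model, I first ran it in a GPU environment (CPU: 8 cores, 64 GiB; GPU: 1 * nvidia-v100-pcie-32gb, 32 GiB), and everything worked.
2. After modifying the code according to the TensorFlow network migration documentation, I ran into problems with third-party Python packages on Ascend-Powered-Engine | TF-1.15-python3.7-aarch64.
Problem analysis: Ascend-Powered-Engine uses python3.7-aarch64, which is not the same as the python3.7 used before.
The problem: when installing third-party Python packages with the method described in https://support.huaweicloud.com/modelarts_faq/modelarts_05_0063.html, the following two problems occur:
1. The package Shapely-1.7.1-cp37-cp37m-manylinux1_x86_64.whl (download: https://pypi.org/project/Shapely/#files ) is rejected as not matching the platform, with the following error: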
[Modelarts Service Log][INFO] exec pip install
ERROR: Shapely-1.7.1-cp37-cp37m-manylinux1_x86_64.whl is not a supported wheel on this platform.
[ModelArts Service Log]modelarts-pipe: total length: 168
2. Installing opencv-python with pip install opencv-python fails. The error, abbreviated, is:
Building wheel for opencv-python (PEP 517): finished with status 'error'
ERROR: Command errored out with exit status 1:
command: /usr/local/ma/python3.7/bin/python3.7 /usr/local/ma/python3.7/lib/python3.7/site-packages/pip/_vendor/pep517/_in_process.py build_wheel /tmp/tmpmdbsnvrq
cwd: /tmp/pip-install-s9dvoe_9/opencv-python
Complete output (9 lines):
Traceback (most recent call last):
File "/tmp/pip-build-env-wqggv6l6/overlay/lib/python3.7/site-packages/skbuild/setuptools_wrap.py", line 560, in setup
cmkr = cmaker.CMaker(cmake_executable)
File "/tmp/pip-build-env-wqggv6l6/overlay/lib/python3.7/site-packages/skbuild/cmaker.py", line 95, in __init__
self.cmake_version = get_cmake_version(self.cmake_executable)
File "/tmp/pip-build-env-wqggv6l6/overlay/lib/python3.7/site-packages/skbuild/cmaker.py", line 82, in get_cmake_version
"Problem with the CMake installation, aborting build. CMake executable is %s" % cmake_executable)
Problem with the CMake installation, aborting build. CMake executable is cmake
ERROR: Failed building wheel for opencv-python
Successfully built gast grpcio pyclipper PyYAML termcolor wrapt zope.interface easydict Shapely scikit-image
Failed to build matplotlib opencv-python
ERROR: Could not build wheels for opencv-python which use PEP 517 and cannot be installed directly
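Both errors can be checked for before submitting a job. The first log says the wheel's platform tag (manylinux1_x86_64) does not fit the aarch64 machine. The second suggests that pip tried to build opencv-python from source and the build backend could not run cmake. Below is a minimal preflight sketch using only the standard library. The function names are hypothetical, and the check is deliberately rough: it compares only the CPU architecture and ignores the Python and ABI tags that pip also verifies.

```python
import platform
import shutil


def wheel_platform_tag(filename):
    """Return the platform tag, i.e. the last dash-separated field of a wheel file name."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    return stem.split("-")[-1]


def wheel_matches_machine(filename, machine=None):
    """Rough check: is the wheel pure Python ('any') or built for this CPU architecture?"""
    machine = machine or platform.machine()  # e.g. 'x86_64' or 'aarch64' on Linux
    tag = wheel_platform_tag(filename)
    return tag == "any" or tag.endswith(machine)


def missing_tools(tools=("cmake",)):
    """Return the build tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    wheel = "Shapely-1.7.1-cp37-cp37m-manylinux1_x86_64.whl"
    print("machine:", platform.machine())
    print("wheel architecture matches:", wheel_matches_machine(wheel))
    print("missing build tools:", missing_tools())
```

Run inside the training image, this would report a mismatch for the x86_64 Shapely wheel on an aarch64 host and list cmake as missing if it is not installed there.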
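Questions I would like to ask:
1. Data preprocessing relies on many third-party Python libraries, and installing them on the platform fails because of Python version and similar issues (in both the GPU and the NPU environments). How should we handle this environment problem? Even with a custom docker image, a similar problem occurs: opencv-python cannot be installed because of python3.7-aarch64, which makes the environment rather complicated to sort out.
2. The workaround I have thought of so far: preprocess the data in a local environment in advance, then convert the preprocessed data to tfrecord format. I am not sure whether this design is reasonable.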
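For reference, a minimal sketch of what option 2 might look like on the local machine, assuming each preprocessed sample is an (image_bytes, label) pair. The feature names "image" and "label", the shard naming scheme, and both function names are assumptions made for illustration. The sketch uses tf.io.TFRecordWriter and tf.train.Example, which are available in TF 1.15.

```python
def shard_name(prefix, index, total):
    """Build a shard file name such as 'train-00000-of-00004.tfrecord'."""
    return f"{prefix}-{index:05d}-of-{total:05d}.tfrecord"


def write_tfrecord(samples, path):
    """Write preprocessed (image_bytes, label) pairs to one TFRecord file.

    Returns the number of records written.
    """
    # Imported lazily so that only the machine doing the conversion needs TensorFlow.
    import tensorflow as tf

    count = 0
    with tf.io.TFRecordWriter(path) as writer:
        for image_bytes, label in samples:
            feature = {
                "image": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[image_bytes])),
                "label": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[int(label)])),
            }
            example = tf.train.Example(
                features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())
            count += 1
    return count
```

With this split, the heavy third-party dependencies (opencv-python, Shapely and so on) would only be needed locally, and the training job on the Ascend side would only need TensorFlow to read the records.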