Problems running PySpark on YARN

chnhbhndchngn

Last modified: 2022-09-10 17:58:50

Category: Spark+Python. Tags: big data, spark, distributed, python, yarn

First published: 2022-09-07 18:14:44

Article link: https://blog.csdn.net/a857553315/article/details/126751232

2022-09-07 17:15:12,431 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, node3, e
Error from python worker:
  Could not find platform independent libraries <prefix>
  Could not find platform dependent libraries <exec_prefix>
  Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
  Python path configuration:
    PYTHONHOME = (not set)
    PYTHONPATH = '/opt/module/hadoop-3.1.3/data/nm-local-dir/usercache/xm/filecache/11/__spark_libcache/xm/appcache/application_1662541746188_0001/container_1662541746188_0001_01_000002/pyspark.zintainer_1662541746188_0001_01_000002/py4j-0.10.9-src.zip'
    program name = 'python'
    isolated = 0
    environment = 1
    user site = 1
    import site = 1
    sys._base_executable = ''
    sys.base_prefix = '/tmp/build/80754af9/python_1618343417471/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho'
    sys.base_exec_prefix = '/tmp/build/80754af9/python_1618343417471/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho'
    sys.executable = ''
    sys.prefix = '/tmp/build/80754af9/python_1618343417471/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho'
    sys.exec_prefix = '/tmp/build/80754af9/python_1618343417471/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho'
    sys.path = [
      '/opt/module/hadoop-3.1.3/data/nm-local-dir/usercache/xm/filecache/11/__spark_libs__71874246
      '/opt/module/hadoop-3.1.3/data/nm-local-dir/usercache/xm/appcache/application_1662541746188_
      '/opt/module/hadoop-3.1.3/data/nm-local-dir/usercache/xm/appcache/application_1662541746188_
      '/tmp/build/80754af9/python_1618343417471/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python38.zip',
      '/tmp/build/80754af9/python_1618343417471/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/python3.8',
      '/tmp/build/80754af9/python_1618343417471/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeho/lib/lib-dynload',
    ]
  Fatal Python error: init_fs_encoding: failed to get the Python codec of the filesystem encoding
  Python runtime state: core initialized
  ModuleNotFoundError: No module named 'encodings'

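I ran into this problem when running my PySpark code on YARN. PYTHONHOME was in fact already configured, so the "(not set)" shown in the log can be ignored. The main cause was that PYTHONPATH had not been configured. Because I use an Anaconda3 that I installed myself, the right PYTHONPATH has to be looked up by hand. The method is simple: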

import sys

# Print the interpreter's module search path; run this with the Anaconda python
print(sys.path)

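Note down every entry in the output that is a directory, and add them to the environment variables in the profile as PYTHONPATH. The variable I ended up with is: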

export PYTHONPATH="/usr/anaconda3/lib:/usr/anaconda3/lib/python3.8:/usr/anaconda3/lib/python3.8/lib-dynload:/usr/anaconda3/lib/python3.8/site-packages"
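
Picking the directories out of sys.path by hand is easy to get wrong, so the step could also be scripted. The following is only a sketch: the helper name build_pythonpath is made up for illustration, and the list it produces depends on the interpreter that runs it, so it may not match the hand-written value above exactly.

```python
import os
import sys


def build_pythonpath(entries):
    """Join the entries that are existing directories into a PYTHONPATH value."""
    kept = []
    for entry in entries:
        # Skip '' (the current directory), zip archives, missing paths and duplicates.
        if entry and os.path.isdir(entry) and entry not in kept:
            kept.append(entry)
    return os.pathsep.join(kept)


if __name__ == "__main__":
    # Run this with the Anaconda interpreter whose libraries should be used.
    print('export PYTHONPATH="{}"'.format(build_pythonpath(sys.path)))
```

The printed line can then be reviewed and pasted into the profile instead of being typed manually.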

Then run source /etc/profile and rerun the job. After that it ran without problems.
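
The fatal error in the log is "No module named 'encodings'", and encodings is a package in the Python standard library directory. A quick way to check whether the PYTHONPATH now in effect covers it is sketched below. The function name find_stdlib_dirs is hypothetical, and the check only looks for the package on disk; it does not prove that the Python workers on every node see the same environment.

```python
import os


def find_stdlib_dirs(pythonpath):
    """Return the PYTHONPATH entries that contain the 'encodings' package."""
    hits = []
    for entry in pythonpath.split(os.pathsep):
        # Ignore empty entries so the current directory is never counted.
        if entry and os.path.isfile(os.path.join(entry, "encodings", "__init__.py")):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    found = find_stdlib_dirs(os.environ.get("PYTHONPATH", ""))
    print(found if found else "no PYTHONPATH entry provides 'encodings'")
```

If the script reports that no entry provides encodings, the exported value most likely misses the lib/python3.8 directory of the Anaconda installation.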