The code is below. The exact same code runs fine when submitted with spark-submit --master yarn, but when I submit it to YARN from PyCharm it fails with java.io.IOException: Cannot run program "python3": error=2, No such file or directory
I've searched a lot and tried changing environment variables, but nothing has worked. Any help would be appreciated.
#encoding:utf-8
from pyspark import SparkConf, SparkContext
import json
import os
os.environ['HADOOP_CONF_DIR'] = "/export/server/hadoop/etc/hadoop"
if __name__ == '__main__':
    # Initialize the execution environment: create the SparkConf object
    conf = SparkConf().setAppName("test").setMaster("yarn")
    sc = SparkContext(conf=conf)
    # Read pipe-delimited JSON records from HDFS
    rdd_file = sc.textFile("hdfs://node1:8020/input/order.txt")
    jsons_rdd = rdd_file.flatMap(lambda line: line.split("|"))
    dict_rdd = jsons_rdd.map(lambda json_str: json.loads(json_str))
    # Keep only the orders whose areaName is "北京" (Beijing)
    filter_rdd = dict_rdd.filter(lambda x: x["areaName"] == "北京")
    beijing_rdd = filter_rdd.map(lambda x: x["areaName"] + "_" + x["category"])
    result_rdd = beijing_rdd.distinct()
    print(result_rdd.collect())
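A likely cause: when you run spark-submit on the cluster, the shell environment (e.g. spark-env.sh or your login profile) already tells Spark which Python to use, but a PyCharm-launched process does not inherit that, so executors fall back to looking for a bare "python3" on the worker nodes' PATH. A minimal sketch of a possible fix, assuming /usr/bin/python3 is the interpreter actually installed on every YARN node (adjust the path to your cluster):

```python
import os

# Point Spark at an explicit interpreter path BEFORE the SparkContext
# is created. PYSPARK_PYTHON is used for the executors,
# PYSPARK_DRIVER_PYTHON for the driver process.
# "/usr/bin/python3" is an assumed path -- replace it with the Python
# that exists on all of your cluster nodes.
os.environ['PYSPARK_PYTHON'] = "/usr/bin/python3"
os.environ['PYSPARK_DRIVER_PYTHON'] = "/usr/bin/python3"
```

Equivalently, you can set the Spark configuration properties spark.pyspark.python and spark.pyspark.driver.python on the SparkConf before building the context. Either way, the setting must be in place before SparkContext(conf=conf) runs, which is why placing it next to the HADOOP_CONF_DIR line at the top of the script is the usual spot.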