Submitting a single .py file directly
spark-submit --deploy-mode client --driver-memory 2G --executor-memory 2G --executor-cores 3 --num-executors 3 --properties-file /etc/spark/conf/spark-defaults.conf test.py
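If the job is launched from a Python wrapper rather than typed by hand, the same command can be assembled as an argument list. This is only a sketch: the function name build_spark_submit_cmd and its py_files parameter are made up for illustration, and the flag values are simply copied from the command above.

```python
import shlex


def build_spark_submit_cmd(script, py_files=None):
    """Return the spark-submit argument list used in these notes (hypothetical helper)."""
    cmd = [
        "spark-submit",
        "--deploy-mode", "client",
        "--driver-memory", "2G",
        "--executor-memory", "2G",
        "--executor-cores", "3",
        "--num-executors", "3",
        "--properties-file", "/etc/spark/conf/spark-defaults.conf",
    ]
    if py_files:
        # --py-files takes a comma-separated list of .zip, .egg or .py files
        cmd += ["--py-files", ",".join(py_files)]
    cmd.append(script)
    return cmd


cmd = build_spark_submit_cmd("test.py")
print(shlex.join(cmd))
# To actually launch the job: subprocess.run(cmd, check=True)
```

Keeping the command as a list avoids shell quoting problems when a path contains spaces; shlex.join is used here only to print it in copy-pasteable form.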
When the script depends on code in the helper package
from helper.util_helper import sub_name
data_converted = data.map(lambda x: (sub_name(x[2][1]),
sub_name(x[1][1]), sub_name(x[2][1])))
cd /data/apps/modules/jupyter/
zip -r helper.zip helper/
spark-submit --deploy-mode client --driver-memory 2G --executor-memory 2G --executor-cores 3 --num-executors 3 --properties-file /etc/spark/conf/spark-defaults.conf --py-files ./helper.zip test.py
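Before submitting, it can help to check locally that the archive has the right layout, i.e. that helper/ sits at the root of the zip, since files passed with --py-files are placed on the Python path. The sketch below builds a throwaway helper package, zips it the same way as zip -r helper.zip helper/, and imports from the zip. The body of sub_name here is a placeholder (the real one is not shown in these notes), and build_helper_zip is a made-up name.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_helper_zip(src_root, package="helper", out_name="helper.zip"):
    """Zip src_root/<package> so that <package>/ sits at the archive root."""
    zip_path = os.path.join(src_root, out_name)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirs, files in os.walk(os.path.join(src_root, package)):
            for name in files:
                full = os.path.join(dirpath, name)
                # Store paths relative to src_root, e.g. helper/util_helper.py
                zf.write(full, os.path.relpath(full, src_root))
    return zip_path


# Throwaway package for the check; this sub_name body is a placeholder only.
work_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(work_dir, "helper")
os.makedirs(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "util_helper.py"), "w") as f:
    f.write("def sub_name(name):\n    return name.strip().lower()\n")

zip_path = build_helper_zip(work_dir)
with zipfile.ZipFile(zip_path) as zf:
    archive_names = sorted(zf.namelist())

# Putting the zip on sys.path approximates what --py-files does for imports.
sys.path.insert(0, zip_path)
sub_name = importlib.import_module("helper.util_helper").sub_name
print(archive_names)
print(sub_name("  Alice "))
```

If the import fails here, it will most likely fail on the executors too. A common cause is zipping from inside helper/ so that util_helper.py ends up at the archive root instead of under helper/.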
When the script depends on third-party modules, e.g. requests
def get_bd_res():
    import sys
    import requests
    url = "http://www.baidu.com"
    payload = {}
    headers = {
        'Cookie':