Setting the driver memory the following way has no effect:
sc = SparkContext('local[*]', 'ad_position')
ss = SparkSession(sc).builder.master('local[*]') \
    .config('spark.sql.shuffle.partitions', 200) \
    .config('spark.debug.maxToStringFields', '100') \
    .config('spark.driver.memory', '4g') \
    .config('spark.executor.memory', '4g') \
    .config('spark.driver.maxResultSize', '0') \
    .appName('ad_position').getOrCreate()
This is because, when the context is launched from a Python module, the driver memory cannot be set this way: by the time these configs are applied, the driver's JVM has already started, and once the JVM is up, Java/Scala can no longer change its heap size.
To set it dynamically in the program without spark-submit, add the following before importing the pyspark module:
import os

# Must run BEFORE pyspark is imported, so the driver JVM
# launches with the requested heap size.
memory = '10g'
pyspark_submit_args = ' --driver-memory ' + memory + ' pyspark-shell'
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args

from pyspark import SparkContext, SQLContext, SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import functions as fn
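The trick above can be sketched as a small self-contained helper (the `set_driver_memory` name and the '10g' value are illustrative, not part of the original snippet). It assembles and exports PYSPARK_SUBMIT_ARGS so that the first SparkContext created afterwards launches its JVM with the requested driver memory:

```python
import os

def set_driver_memory(memory: str) -> str:
    """Export PYSPARK_SUBMIT_ARGS with the requested --driver-memory.

    Must be called before pyspark is imported for the first time;
    after the driver JVM has started, the setting is ignored.
    """
    args = ' --driver-memory ' + memory + ' pyspark-shell'
    os.environ['PYSPARK_SUBMIT_ARGS'] = args
    return args

set_driver_memory('10g')
print(os.environ['PYSPARK_SUBMIT_ARGS'])
```

After calling it, `from pyspark import SparkContext; sc = SparkContext('local[*]', 'ad_position')` should start the driver with the 10g heap; the same approach would work for other spark-submit flags appended before the trailing `pyspark-shell` token.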