Spark's default spark.driver.maxResultSize is 1g, so a Spark job whose collected results exceed that limit fails with an error like:
ERROR TaskSetManager: Total size of serialized results of 8113 tasks (1131.0 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)
The fix is to raise the limit, which must be done before the SparkContext is created (the setting cannot be changed on a running context):

from pyspark import SparkConf, SparkContext

# Raise the driver-side result-size limit from the default 1g to 10g
conf = SparkConf().set('spark.driver.maxResultSize', '10g')
sc = SparkContext(conf=conf)
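Alternatively, the same property can be passed on the command line when submitting the job, so the application code needs no change (the application file name here is just a placeholder):

```shell
# Raise spark.driver.maxResultSize for a single submission;
# "your_app.py" stands in for your actual application script.
spark-submit --conf spark.driver.maxResultSize=10g your_app.py
```

Note that raising the limit only moves the threshold: if many tasks return large serialized results, the driver still needs enough memory (spark.driver.memory) to hold them, so it is often better to avoid collecting huge result sets to the driver in the first place.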