bin/spark-shell --master yarn-client
# tuning properties (e.g. in conf/spark-defaults.conf, or passed with --conf key=value):
spark.executor.memory=26000M
spark.executor.cores=4
spark.executor.instances=16   # this key appears twice in the notes (5, then 16); the later value takes effect
spark.driver.cores=4
spark.driver.memory=24000M
spark.default.parallelism=128
spark.streaming.blockInterval=100ms
spark.streaming.receiver.maxRate=20000
spark.akka.timeout=300
spark.storage.memoryFraction=0.6
spark.rdd.compress=true
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max=2047m
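For reference, the same properties can be supplied directly on the command line; a minimal sketch (values taken from the list above, trimmed to a few representative flags — adjust for your cluster):

```shell
# Launch spark-shell with tuning properties as repeated --conf flags;
# any property from the list above can be passed this way.
bin/spark-shell --master yarn-client \
  --conf spark.executor.memory=26000M \
  --conf spark.executor.cores=4 \
  --conf spark.executor.instances=16 \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryoserializer.buffer.max=2047m
```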
======================================================
Log settings:
1. Via a configuration file
log4j.rootLogger=warn,console
# console
log4j.appender.console=org.apache.log4j.ConsoleAppender
# note: with the root logger already at warn, this info threshold adds no extra filtering
log4j.appender.console.Threshold=info
log4j.appender.console.ImmediateFlush=true
log4j.appender.console.Target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=[%-5p] %d %m %x %n
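For Spark to actually pick up a custom log4j.properties like the one above, a common approach is to ship the file with the job and point log4j at it via JVM options; a sketch (app.jar is a placeholder, and in yarn-client mode the driver reads its local copy, so an absolute file: URL may be needed there):

```shell
# Ship log4j.properties to driver and executors and point log4j 1.x at it
spark-submit \
  --files log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  app.jar
```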
2. In code
import org.apache.spark.{SparkConf, SparkContext}

// sparkConf: a SparkConf built earlier
val sc: SparkContext = new SparkContext(sparkConf)
sc.setLogLevel("WARN")
// other valid levels include "DEBUG", "ERROR", "INFO"
==============================================================
Difference between yarn-client and yarn-cluster: distinguished by where the Driver runs.
yarn-client:
The client and the Driver run together; the ApplicationMaster is used only to request resources. Results are printed to the client console in real time, which makes the logs easy to inspect; this mode is recommended for that reason.
After the job is submitted to YARN, YARN first starts the ApplicationMaster and the Executors, both of which run inside Containers. Note: a container runs only one ExecutorBackend.
yarn-cluster:
The Driver runs together with the ApplicationMaster, so results cannot be shown on the client console; they must be written to HDFS or to a database instead.
Because the driver runs on the cluster, its status can be checked through the web UI.
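For contrast with the client-mode commands that follow, a cluster-mode submission differs mainly in --deploy-mode; a sketch (class name and jar are placeholders):

```shell
# yarn-cluster: the Driver runs inside the ApplicationMaster on the cluster;
# console output lands in the YARN container logs (yarn logs -applicationId <id>)
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.Main app.jar
```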
== Successful runs:
/usr/lib/spark-current/bin/spark-submit --class com.tingyun.statistics.MobleAppDiuStatistics \
  --master yarn --deploy-mode client \
  --driver-memory 6g --num-executors 6 --executor-memory 3g --executor-cores 2 \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=512m" \
  /root/andy/diustatisitcs.jar 60
/usr/lib/spark-current/bin/spark-submit --class com.tingyun.statistics.MobleAppDiuStatistics \
  --master yarn --deploy-mode client \
  --driver-memory 6g --num-executors 4 --executor-memory 5g --executor-cores 3 \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseConcMarkSweepGC -XX:MaxDirectMemorySize=512m" \
  /root/andy/diustatisitcs.jar 60
Yarn client mode: 1 Driver (on the submitting client) + 1 ApplicationMaster (on a worker) + n Executors
Yarn cluster mode: 1 Driver (merged with the ApplicationMaster, on a worker) + n Executors
CMS garbage collector: a concurrent mark-sweep collector, recommended for keeping GC pause overhead low. Concurrent GC lowers overall throughput, but it is still advisable here because it shortens batch processing time (less GC pause during processing). To use it, enable it on both the driver and the executors:
on the driver, set it in spark-submit with --driver-java-options;
on the executors, set it with the spark.executor.extraJavaOptions parameter: -XX:+UseConcMarkSweepGC.
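Putting the two settings together, enabling CMS on both sides of a submission might look like this (class and jar reused from the successful commands above; only the GC flags are shown):

```shell
# CMS on the driver (via --driver-java-options) and on the executors
# (via spark.executor.extraJavaOptions), as described above
spark-submit --master yarn --deploy-mode client \
  --driver-java-options "-XX:+UseConcMarkSweepGC" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC" \
  --class com.tingyun.statistics.MobleAppDiuStatistics /root/andy/diustatisitcs.jar 60
```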