Running a complex query on the platform hit an OOM. Workarounds suggested by the error log:
-- SET spark.driver.memory=6/8G; [still OOMed]
set spark.sql.autoBroadcastJoinThreshold=-1; [fixed the problem]
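The same two settings can also be passed at submit time instead of in SQL; a minimal sketch, assuming a typical spark-submit launch (the job file name and memory value are placeholders, not from the log):

```shell
# Workaround 1: raise driver heap (in this case it still OOMed)
spark-submit --conf spark.driver.memory=8g your_job.py

# Workaround 2: disable automatic broadcast joins entirely (what fixed it here)
spark-submit --conf spark.sql.autoBroadcastJoinThreshold=-1 your_job.py
```

Disabling the threshold forces Spark to fall back to a sort-merge or shuffle join, which avoids materializing the build-side table on the driver.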
Exception in thread "broadcast-exchange-0" java.lang.OutOfMemoryError: Not enough memory to build and broadcast the table to all worker nodes. As a workaround, you can either disable broadcast by setting spark.sql.autoBroadcastJoinThreshold to -1 or increase the spark driver memory by setting spark.driver.memory to a higher value
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:115)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1$$anonfun$apply$1.apply(BroadcastExchangeExec.scala:73)
at org.apache.spark.sql.execution.SQLExecution$.withExecutionId(SQLExecution.scala:97)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec$$anonfun$relationFuture$1.apply(BroadcastExchangeExec.scala:72)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
at java.lang.Thread.run(Thread.java:748)
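For context on why `-1` works: the threshold is compared against the estimated size of the smaller join side, and a negative value disables broadcasting altogether. A minimal illustrative sketch of that decision in plain Python (not Spark's actual internals):

```python
def should_broadcast(table_size_bytes: int, threshold_bytes: int) -> bool:
    """Mirror the semantics of spark.sql.autoBroadcastJoinThreshold:
    -1 (or any negative value) disables broadcast joins; otherwise
    tables at or under the byte threshold are broadcast to all workers."""
    if threshold_bytes < 0:
        return False
    return table_size_bytes <= threshold_bytes

DEFAULT_THRESHOLD = 10 * 1024 * 1024  # Spark's default: 10 MB

print(should_broadcast(5 * 1024 * 1024, DEFAULT_THRESHOLD))    # True: fits
print(should_broadcast(200 * 1024 * 1024, DEFAULT_THRESHOLD))  # False: too big
print(should_broadcast(5 * 1024 * 1024, -1))                   # False: disabled
```

When a table that passes this check is still too large for the driver heap (for example because the size estimate was wrong), the driver throws exactly the `broadcast-exchange` OOM above.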
2019-11-20 13:45:38 Scheduler received job instance T_6466665892523663_20191120134538_2 and added it to the time-based scheduling queue.
2019-11-20 13:45:38 Planned start time is 2019-11-19 00:10:00; waiting to be scheduled...
2019-11-20 13:45:38 Instance reached its scheduled time, but the parent node has not finished yet.
2019-11-20 13:47:05 The instance's parent node has finished and the scheduled time has been reached; added to the scheduling queue.
2019-11-20 13:47:05 Currently 0 jobs are queued...
2019-11-20 13:47:05 Waiting to be scheduled...
2019-11-20 13:47:05 Starting to schedule the job instance.
2019-11-20 13:47:05 Attempt 1 to fetch Yarn resource info on node 10.94.0.15.
2019-11-20 13:47:05 Yarn: http://10.94.0.16:8088,10.94.0.10:8088, remaining resources for queue root.hive: "VCores: 17, Memory: 22796M".
2019-11-20 13:47:05 Finished fetching Yarn resources; dispatching to a Node for execution.
2019-11-20 13:47:05 Attempt 1 to dispatch to node 10.94.0.15 succeeded.
the current free memorySize is 7240m, greater than the threshold 64m, satisfying the memory requirement to execute task
Fetched task plugin: SparkSql plugin successfully. Plugin info:
Engine: SparkSql
Engine-adapted versions: []
Vendor:
Vendor version:
start to init task.
Fetched resource-plugin: sparkSql2.2.0-cdh5.12.0-1.0.0-jar-with-dependencies.jar
19/11/20 13:47:06 INFO spark.SparkContext: Running Spark version 2.3.0.cloudera2
19/11/20 13:47:06 INFO spark.SparkContext: Submitted application: dwd_mem_2b_applet_filter_task_detail_walker_T_6466665892523663_20191120134538_2_1
19/11/20 13:47:07 INFO server.Server: jetty-9.3.z-SNAPSHOT
19/11/20 13:47:08 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm177
19/11/20 13:47:08 INFO yarn.Client: Requesting a new application from cluster with 5 NodeManagers
19/11/20 13:47:08 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
19/11/20 13:47:08 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
19/11/20 13:47:08 INFO yarn.Client: Setting up container launch context for our AM
19/11/20 13:47:08 INFO yarn.Client: Setting up the launch environment for our AM container
19/11/20 13:47:08 INFO yarn.Client: Preparing resources for our AM container
19/11/20 13:47:09 INFO yarn.Client: Submitting application application_1573451947838_0670 to ResourceManager
19/11/20 13:47:09 INFO impl.YarnClientImpl: Submitted application application_1573451947838_0670
19/11/20 13:47:10 INFO yarn.Client: Application report for application_1573451947838_0670 (state: ACCEPTED)
19/11/20 13:47:10 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.hive
start time: 1574228829689
final status: UNDEFINED
tracking URL: http://cdh-hadoop-16:8088/proxy/application_1573451947838_0670/
user: deploy
19/11/20 13:47:11 INFO yarn.Client: Application report for application_1573451947838_0670 (state: