Spark
汐朔
Implementing map-side join and reduce-side join in Spark
An easy-to-follow article explaining Spark's map-side join and reduce-side join. Reposted 2023-02-03 10:18:32 · 429 reads · 0 comments
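The distinction the article above draws can be sketched with plain Python data structures (not real Spark RDDs; the sample tables and function names below are hypothetical): in a map-side (broadcast) join the small table is shipped whole to every task and each large-table record is joined locally with no shuffle, while a reduce-side join first shuffles both sides so that equal keys land together before combining.

```python
from collections import defaultdict

def broadcast_join(small, large):
    """Map-side join sketch: probe each large-table record against the
    in-memory small table (in Spark, the sc.broadcast side). Inner join:
    records with no match are dropped."""
    return [(k, small[k], v) for k, v in large if k in small]

def shuffle_join(left, right):
    """Reduce-side join sketch: group both sides by key (what a Spark
    shuffle does), then combine matching groups."""
    groups = defaultdict(lambda: ([], []))
    for k, v in left:
        groups[k][0].append(v)
    for k, v in right:
        groups[k][1].append(v)
    return [(k, lv, rv)
            for k, (lvs, rvs) in groups.items()
            for lv in lvs for rv in rvs]

if __name__ == "__main__":
    small = {1: "apple", 2: "banana"}             # broadcast side
    large = [(1, 10), (2, 20), (1, 30), (3, 40)]  # streamed side
    print(broadcast_join(small, large))
```

The map-side variant avoids the shuffle entirely, which is why it wins whenever one side fits in memory; the reduce-side variant is the general case.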
Cannot initialize Cluster. Please check your configuration for mapreduce.framework .name and the cor
Background: on a new cluster built with Ambari, data jobs hit quite a few problems, but the one above troubled me for a long time and was only solved today; it failed on every run, and none of the fixes found online worked. I knew the root cause had to be the version mismatch between Spark 2.3.1 and Hive 3.1.0: Hive 3.1.0 added many new features (such as transactions) and had not matured long after release, so problems were easy to run into. Environment: Ambari 2.7.1 + Spark 2.3.1 + Hadoop 3.1.1 + Hive 3.1.0, Scala 2.11.8, JDK 1.8. Code: // prints normally. Original 2021-01-11 18:14:03 · 3059 reads · 0 comments
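For reference, "Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name" commonly means that `mapreduce.framework.name` is unset or not visible to the client, or that the MapReduce client jars are missing from the classpath; the excerpt does not show the author's exact fix, so the fragment below is only a typical starting point for a YARN cluster.

```xml
<!-- mapred-site.xml: run MapReduce on YARN rather than the local runner.
     If this property is unset or unreadable, clients fail with
     "Cannot initialize Cluster. Please check your configuration for
     mapreduce.framework.name and the correspond server addresses." -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```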
Yarn application has already ended! It might have been killed or unable to launch application master
When submitting a Spark job, the following exception was thrown: org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master. SLF4J: See http://www.slf4j.org/codes.html#multi... Original 2019-11-28 15:43:14 · 1378 reads · 1 comment
SparkException: Found both spark.driver.extraClassPath and SPARK_CLASSPATH. Use only the former.
When submitting a Spark job, the following exception was thrown: SparkException: Found both spark.driver.extraClassPath and SPARK_CLASSPATH. Use only the former. Warning: Local jar /usr/local/spark-2.1.0-bin-hadoop2.6/conf/hdfs-site.xml does not ex... Original 2019-11-27 20:53:41 · 668 reads · 0 comments
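As the exception message says, `SPARK_CLASSPATH` (usually exported in `spark-env.sh`) is the deprecated mechanism and conflicts with the newer per-role properties; the usual resolution is to remove the environment variable and declare extra jars via `spark-defaults.conf` instead. The jar path below is illustrative, not from the article.

```properties
# spark-defaults.conf: replaces the deprecated SPARK_CLASSPATH.
# Remove "export SPARK_CLASSPATH=..." from spark-env.sh first,
# otherwise Spark refuses to start with the exception above.
spark.driver.extraClassPath    /opt/extra-jars/*
spark.executor.extraClassPath  /opt/extra-jars/*
```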
Exception while debugging Spark SQL in IDEA: This timeout is controlled by spark.executor.heartbeatInterval
Background: while debugging Spark SQL, the following exception was thrown. The main exception was org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval. It turned out that a table referenced later in the statement did not exist; fixing that resolved the problem... Original 2019-10-08 20:26:03 · 4355 reads · 0 comments
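Note that in this case the timeout was only a symptom (the real fault was the missing table), so raising the timeout would have hidden the error rather than fixed it. For situations where executors are genuinely slow, the relevant knobs are sketched below; the values are illustrative, and Spark requires the heartbeat interval to stay below the network timeout.

```properties
# spark-defaults.conf: heartbeat/RPC timeouts from the exception above.
# spark.executor.heartbeatInterval must be (significantly) smaller than
# spark.network.timeout.
spark.executor.heartbeatInterval  30s
spark.network.timeout             300s
```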