Adding spark-sql to CDH 6.3.2

Preface

As everyone knows, CDH strips the spark-sql tool out of Spark in order to push its own Impala. Most of the time we can get by without spark-sql, but there are situations where it is genuinely needed, and that depends on the project or the team (or individual). In my case the project required spark-sql, so I dug through material online and fiddled with it for several days before finally getting spark-sql integrated on the CDH cluster. The steps below are not guaranteed to work verbatim in your environment, but the approach should be a useful reference.

Integration steps

Download Apache Spark 2.4.0

CDH 6.3.2 ships Spark 2.4.0, so to avoid potential version-mismatch issues later on, download Spark 2.4.0 from the official archive:
https://archive.apache.org/dist/spark/spark-2.4.0/
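
For example, on a host with internet access you can pull the tarball straight from the archive (just a convenience sketch; download it however you prefer):

wget https://archive.apache.org/dist/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz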

Unpack the tarball into a suitable directory on a CDH cluster host

Here I put it under /opt/cloudera/parcels/CDH/lib/ and rename the directory to spark2:

tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz
mv spark-2.4.0-bin-hadoop2.7 /opt/cloudera/parcels/CDH/lib/spark2

Modify the configuration files

Delete all of the configuration under spark2/conf, then copy the cluster's Spark conf files into it:

cd /opt/cloudera/parcels/CDH/lib/spark2/conf
rm -rf *
cp -r /etc/spark/conf/* ./
mv spark-env.sh spark-env

Notes:

  • I took spark/conf from the ResourceManager node. Initially I set up spark-sql on a gateway node and copied the conf directly from that gateway node, but it simply would not connect to YARN. Running exactly the same steps on the ResourceManager (master) node worked, so something in the gateway node's configuration must differ; I have not figured out the exact cause, so corrections are welcome.
scp -r root@master1:/etc/spark/conf/* ./
  • spark-env.sh is renamed so that spark-sql does not pick up the Spark environment and instead goes straight through the Hive metastore.

Add hive-site.xml

Copy hive-site.xml from the gateway node into spark2/conf; no changes to the file are needed:

cp /etc/hive/conf/hive-site.xml /opt/cloudera/parcels/CDH/lib/spark2/conf/
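
To sanity-check that the copied file points at your metastore, you can grep for the thrift URI (a quick sketch; hive.metastore.uris is the standard Hive property, and the value usually sits on the following line, with your own host):

grep -A1 'hive.metastore.uris' /opt/cloudera/parcels/CDH/lib/spark2/conf/hive-site.xml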

Configure yarn.resourcemanager

Check whether your CDH YARN configuration includes the settings shown below; they need to be enabled.
(screenshot)

Under normal circumstances the ResourceManager should already have these settings enabled by default.
(screenshot)
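
If you prefer to verify from the shell rather than the Cloudera Manager UI, the ResourceManager address should appear in the client yarn-site.xml (a sketch; these are standard YARN property names, and with RM HA they carry an ID suffix, which grep will still match):

grep -A1 'yarn.resourcemanager.address' /etc/hadoop/conf/yarn-site.xml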

Create the spark-sql launcher script

cat /opt/cloudera/parcels/CDH/bin/spark-sql 
#!/bin/bash  
# Reference: http://stackoverflow.com/questions/59895/can-a-bash-script-tell-what-directory-its-stored-in  
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
SOURCE="${BASH_SOURCE[0]}"  
BIN_DIR="$( dirname "$SOURCE" )"  
while [ -h "$SOURCE" ]  
do  
 SOURCE="$(readlink "$SOURCE")"  
 [[ $SOURCE != /* ]] && SOURCE="$BIN_DIR/$SOURCE"  
 BIN_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"  
done  
BIN_DIR="$( cd -P "$( dirname "$SOURCE" )" && pwd )"  
LIB_DIR=$BIN_DIR/../lib  
export HADOOP_HOME=$LIB_DIR/hadoop  
  
# Autodetect JAVA_HOME if not defined  
. $LIB_DIR/bigtop-utils/bigtop-detect-javahome  
  
exec $LIB_DIR/spark2/bin/spark-submit --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver "$@" 

Set up the command shortcut

alternatives --install /usr/bin/spark-sql spark-sql /opt/cloudera/parcels/CDH/bin/spark-sql 1 
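
You can then confirm the registration and that the wrapper is on the PATH (a quick sketch):

alternatives --display spark-sql
which spark-sql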

Notes

This article is mainly based on the following post, adapted to the particulars of my own cluster:
http://blog.51yip.com/hadoop/2286.html/comment-page-1

Since some readers following this article still could not get spark-sql to work, a few clarifications:
Every cluster and its installed components differ, so here is the relevant information about my cluster for reference.
Cluster overview:
(screenshot)

The full deployment transcript is attached below (uploading and extracting the tarball omitted); everything after extraction:

[root@gateway1 pkg]# ll /opt/cloudera/parcels/CDH/lib/spark2
ls: cannot access /opt/cloudera/parcels/CDH/lib/spark2: No such file or directory
[root@gateway1 pkg]# mv spark-2.4.0-bin-hadoop2.7 /opt/cloudera/parcels/CDH/lib/spark2
[root@gateway1 pkg]# cd cd /opt/cloudera/parcels/CDH/lib/spark2/conf^C
[root@gateway1 pkg]# cd /opt/cloudera/parcels/CDH/lib/spark2/conf
[root@gateway1 conf]# ll
total 36
-rw-r--r-- 1 aw-bdp-collect bdp  996 Oct 29  2018 docker.properties.template
-rw-r--r-- 1 aw-bdp-collect bdp 1105 Oct 29  2018 fairscheduler.xml.template
-rw-r--r-- 1 aw-bdp-collect bdp 2025 Oct 29  2018 log4j.properties.template
-rw-r--r-- 1 aw-bdp-collect bdp 7801 Oct 29  2018 metrics.properties.template
-rw-r--r-- 1 aw-bdp-collect bdp  865 Oct 29  2018 slaves.template
-rw-r--r-- 1 aw-bdp-collect bdp 1292 Oct 29  2018 spark-defaults.conf.template
-rwxr-xr-x 1 aw-bdp-collect bdp 4221 Oct 29  2018 spark-env.sh.template
[root@gateway1 conf]# rm -rf *
[root@gateway1 conf]# pwd
/opt/cloudera/parcels/CDH/lib/spark2/conf
[root@gateway1 conf]# scp -r root@master1:/etc/etc/spark/conf/* ./
The authenticity of host 'master1 (192.168.0.11)' can't be established.
ECDSA key fingerprint is SHA256:+D1XEzAjCwxLbmLy9ezk9Dc7cKnjorDXBghFZpLp/fE.
ECDSA key fingerprint is MD5:b2:9f:c7:f0:a5:ba:03:02:42:61:50:14:db:b2:07:03.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'master1,192.168.0.11' (ECDSA) to the list of known hosts.
root@master1's password: 
scp: /etc/etc/spark/conf/*: No such file or directory
[root@gateway1 conf]# scp -r root@master1:/etc/spark/conf/* ./
root@master1’s password: 
docker.properties.template                                                                100%  996     1.5MB/s   00:00    
fairscheduler.xml.template                                                                100% 1105     2.2MB/s   00:00    
log4j.properties.template                                                                 100% 2025     4.0MB/s   00:00    
metrics.properties.template                                                               100% 7801    11.2MB/s   00:00    
slaves.template                                                                           100%  865     2.0MB/s   00:00    
spark-defaults.conf                                                                       100% 1292     2.6MB/s   00:00    
spark-defaults.conf.template                                                              100% 1292     3.0MB/s   00:00    
spark-env.sh                                                                              100% 5556     9.7MB/s   00:00    
spark-env.sh.template                                                                     100% 4221     8.0MB/s   00:00    
[root@gateway1 conf]# ll
total 48
-rw-r--r-- 1 root root  996 Aug 12 09:12 docker.properties.template
-rw-r--r-- 1 root root 1105 Aug 12 09:12 fairscheduler.xml.template
-rw-r--r-- 1 root root 2025 Aug 12 09:12 log4j.properties.template
-rw-r--r-- 1 root root 7801 Aug 12 09:12 metrics.properties.template
-rw-r--r-- 1 root root  865 Aug 12 09:12 slaves.template
-rw-r--r-- 1 root root 1292 Aug 12 09:12 spark-defaults.conf
-rw-r--r-- 1 root root 1292 Aug 12 09:12 spark-defaults.conf.template
-rwxr-xr-x 1 root root 5556 Aug 12 09:12 spark-env.sh
-rwxr-xr-x 1 root root 4221 Aug 12 09:12 spark-env.sh.template
[root@gateway1 conf]# mv spark-env.sh spark-env
[root@gateway1 conf]# ll
total 48
-rw-r--r-- 1 root root  996 Aug 12 09:12 docker.properties.template
-rw-r--r-- 1 root root 1105 Aug 12 09:12 fairscheduler.xml.template
-rw-r--r-- 1 root root 2025 Aug 12 09:12 log4j.properties.template
-rw-r--r-- 1 root root 7801 Aug 12 09:12 metrics.properties.template
-rw-r--r-- 1 root root  865 Aug 12 09:12 slaves.template
-rw-r--r-- 1 root root 1292 Aug 12 09:12 spark-defaults.conf
-rw-r--r-- 1 root root 1292 Aug 12 09:12 spark-defaults.conf.template
-rwxr-xr-x 1 root root 5556 Aug 12 09:12 spark-env
-rwxr-xr-x 1 root root 4221 Aug 12 09:12 spark-env.sh.template
[root@gateway1 conf]# cp /etc/hive/conf/hive-site.xml /opt/cloudera/parcels/CDH/lib/spark2/conf/
[root@gateway1 conf]# ll
total 56
-rw-r--r-- 1 root root  996 Aug 12 09:12 docker.properties.template
-rw-r--r-- 1 root root 1105 Aug 12 09:12 fairscheduler.xml.template
-rw-r--r-- 1 root root 6792 Aug 12 09:12 hive-site.xml
-rw-r--r-- 1 root root 2025 Aug 12 09:12 log4j.properties.template
-rw-r--r-- 1 root root 7801 Aug 12 09:12 metrics.properties.template
-rw-r--r-- 1 root root  865 Aug 12 09:12 slaves.template
-rw-r--r-- 1 root root 1292 Aug 12 09:12 spark-defaults.conf
-rw-r--r-- 1 root root 1292 Aug 12 09:12 spark-defaults.conf.template
-rwxr-xr-x 1 root root 5556 Aug 12 09:12 spark-env
-rwxr-xr-x 1 root root 4221 Aug 12 09:12 spark-env.sh.template
[root@gateway1 conf]# cd /opt/cloudera/parcels/CDH/bin/
[root@gateway1 bin]# ll
total 308
-rwxr-xr-x 1 root root  594 Nov  9  2019 avro-tools
-rwxr-xr-x 1 root root  771 Nov  9  2019 beeline
lrwxrwxrwx 1 root root   42 Nov  9  2019 bigtop-detect-javahome -> ../lib/bigtop-utils/bigtop-detect-javahome
-rwxr-xr-x 1 root root 2536 Nov  9  2019 catalogd
-rwxr-xr-x 1 root root  600 Nov  9  2019 cli_mt
-rwxr-xr-x 1 root root  600 Nov  9  2019 cli_st
-rwxr-xr-x 1 root root  642 Nov  9  2019 flume-ng
-rwxr-xr-x 1 root root  626 Nov  9  2019 hadoop
-rwxr-xr-x 1 root root 1361 Nov  9  2019 hadoop-fuse-dfs
-rwxr-xr-x 1 root root 1055 Nov  9  2019 hbase
-rwxr-xr-x 1 root root  583 Nov  9  2019 hbase-indexer
-rwxr-xr-x 1 root root  595 Nov  9  2019 hbase-indexer-sentry
-rwxr-xr-x 1 root root  950 Nov  9  2019 hcat
-rwxr-xr-x 1 root root  690 Nov  9  2019 hdfs
-rwxr-xr-x 1 root root  768 Nov  9  2019 hive
-rwxr-xr-x 1 root root  775 Nov  9  2019 hiveserver2
-rwxr-xr-x 1 root root  539 Nov  9  2019 impala-collect-diagnostics
-rwxr-xr-x 1 root root  537 Nov  9  2019 impala-collect-minidumps
-rwxr-xr-x 1 root root 2535 Nov  9  2019 impalad
-rwxr-xr-x 1 root root 2582 Nov  9  2019 impala-shell
-rwxr-xr-x 1 root root  645 Nov  9  2019 kafka-acls
-rwxr-xr-x 1 root root  660 Nov  9  2019 kafka-broker-api-versions
-rwxr-xr-x 1 root root  648 Nov  9  2019 kafka-configs
-rwxr-xr-x 1 root root  657 Nov  9  2019 kafka-console-consumer
-rwxr-xr-x 1 root root  657 Nov  9  2019 kafka-console-producer
-rwxr-xr-x 1 root root  656 Nov  9  2019 kafka-consumer-groups
-rwxr-xr-x 1 root root  659 Nov  9  2019 kafka-consumer-perf-test
-rwxr-xr-x 1 root root  658 Nov  9  2019 kafka-delegation-tokens
-rwxr-xr-x 1 root root  655 Nov  9  2019 kafka-delete-records
-rwxr-xr-x 1 root root  649 Nov  9  2019 kafka-dump-log
-rwxr-xr-x 1 root root  649 Nov  9  2019 kafka-log-dirs
-rwxr-xr-x 1 root root  653 Nov  9  2019 kafka-mirror-maker
-rwxr-xr-x 1 root root  667 Nov  9  2019 kafka-preferred-replica-election
-rwxr-xr-x 1 root root  659 Nov  9  2019 kafka-producer-perf-test
-rwxr-xr-x 1 root root  660 Nov  9  2019 kafka-reassign-partitions
-rwxr-xr-x 1 root root  661 Nov  9  2019 kafka-replica-verification
-rwxr-xr-x 1 root root  650 Nov  9  2019 kafka-run-class
-rwxr-xr-x 1 root root  647 Nov  9  2019 kafka-sentry
-rwxr-xr-x 1 root root  653 Nov  9  2019 kafka-server-start
-rwxr-xr-x 1 root root  652 Nov  9  2019 kafka-server-stop
-rwxr-xr-x 1 root root  666 Nov  9  2019 kafka-streams-application-reset
-rwxr-xr-x 1 root root  647 Nov  9  2019 kafka-topics
-rwxr-xr-x 1 root root  660 Nov  9  2019 kafka-verifiable-consumer
-rwxr-xr-x 1 root root  660 Nov  9  2019 kafka-verifiable-producer
-rwxr-xr-x 1 root root  576 Nov  9  2019 kite-dataset
-rwxr-xr-x 1 root root  527 Nov  9  2019 kudu
-rwxr-xr-x 1 root root  602 Nov  9  2019 load_gen
-rwxr-xr-x 1 root root  702 Nov  9  2019 mapred
-rwxr-xr-x 1 root root 1353 Nov  9  2019 oozie
-rwxr-xr-x 1 root root 1525 Nov  9  2019 oozie-setup
-rwxr-xr-x 1 root root  580 Nov  9  2019 parquet-tools
-rwxr-xr-x 1 root root 1646 Nov  9  2019 pig
-rwxr-xr-x 1 root root  607 Nov  9  2019 pyspark
-rwxr-xr-x 1 root root  674 Nov  9  2019 sentry
-rwxr-xr-x 1 root root  694 Nov  9  2019 solrctl
-rwxr-xr-x 1 root root  611 Nov  9  2019 spark-shell
-rwxr-xr-x 1 root root  612 Nov  9  2019 spark-submit
-rwxr-xr-x 1 root root  922 Nov  9  2019 sqoop
-rwxr-xr-x 1 root root  930 Nov  9  2019 sqoop-codegen
-rwxr-xr-x 1 root root  940 Nov  9  2019 sqoop-create-hive-table
-rwxr-xr-x 1 root root  927 Nov  9  2019 sqoop-eval
-rwxr-xr-x 1 root root  929 Nov  9  2019 sqoop-export
-rwxr-xr-x 1 root root  927 Nov  9  2019 sqoop-help
-rwxr-xr-x 1 root root  929 Nov  9  2019 sqoop-import
-rwxr-xr-x 1 root root  940 Nov  9  2019 sqoop-import-all-tables
-rwxr-xr-x 1 root root  926 Nov  9  2019 sqoop-job
-rwxr-xr-x 1 root root  937 Nov  9  2019 sqoop-list-databases
-rwxr-xr-x 1 root root  934 Nov  9  2019 sqoop-list-tables
-rwxr-xr-x 1 root root  928 Nov  9  2019 sqoop-merge
-rwxr-xr-x 1 root root  932 Nov  9  2019 sqoop-metastore
-rwxr-xr-x 1 root root  930 Nov  9  2019 sqoop-version
-rwxr-xr-x 1 root root 2539 Nov  9  2019 statestored
-rwxr-xr-x 1 root root  743 Nov  9  2019 yarn
-rwxr-xr-x 1 root root  849 Nov  9  2019 zookeeper-client
-rwxr-xr-x 1 root root  663 Nov  9  2019 zookeeper-security-migration
-rwxr-xr-x 1 root root 1175 Nov  9  2019 zookeeper-server
-rwxr-xr-x 1 root root 1176 Nov  9  2019 zookeeper-server-cleanup
-rwxr-xr-x 1 root root 1186 Nov  9  2019 zookeeper-server-initialize
[root@gateway1 bin]# vim spark-sql
[root@gateway1 bin]# chmod + x spark-sql 
chmod: cannot access ‘x’: No such file or directory
[root@gateway1 bin]# chmod +x spark-sql 
[root@gateway1 bin]# ./spark-sql 
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.use.vector.serde.deserialize does not exist
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.adaptor.usage.mode does not exist
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.use.vectorized.input.format does not exist
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.input.format.excludes does not exist
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.bucketing does not exist
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.type.safety does not exist
2022-08-12 09:14:57 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.cartesian.product does not exist
2022-08-12 09:14:57 INFO  metastore:376 - Trying to connect to metastore with URI thrift://utility1.cdh6.aw:9083
2022-08-12 09:14:57 INFO  metastore:472 - Connected to metastore.
2022-08-12 09:14:58 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-08-12 09:14:59 INFO  SessionState:641 - Created HDFS directory: /tmp/hive/root
2022-08-12 09:14:59 INFO  SessionState:641 - Created local directory: /tmp/root
2022-08-12 09:14:59 INFO  SessionState:641 - Created local directory: /tmp/2da6581d-f664-4dad-ba0d-70d0d8ed02d3_resources
2022-08-12 09:14:59 INFO  SessionState:641 - Created HDFS directory: /tmp/hive/root/2da6581d-f664-4dad-ba0d-70d0d8ed02d3
2022-08-12 09:14:59 INFO  SessionState:641 - Created local directory: /tmp/root/2da6581d-f664-4dad-ba0d-70d0d8ed02d3
2022-08-12 09:14:59 INFO  SessionState:641 - Created HDFS directory: /tmp/hive/root/2da6581d-f664-4dad-ba0d-70d0d8ed02d3/_tmp_space.db
2022-08-12 09:14:59 INFO  SparkContext:54 - Running Spark version 2.4.0
2022-08-12 09:14:59 INFO  SparkContext:54 - Submitted application: SparkSQL::192.168.0.13
2022-08-12 09:14:59 INFO  SecurityManager:54 - Changing view acls to: root
2022-08-12 09:14:59 INFO  SecurityManager:54 - Changing modify acls to: root
2022-08-12 09:14:59 INFO  SecurityManager:54 - Changing view acls groups to: 
2022-08-12 09:14:59 INFO  SecurityManager:54 - Changing modify acls groups to: 
2022-08-12 09:14:59 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2022-08-12 09:14:59 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 46813.
2022-08-12 09:14:59 INFO  SparkEnv:54 - Registering MapOutputTracker
2022-08-12 09:14:59 INFO  SparkEnv:54 - Registering BlockManagerMaster
2022-08-12 09:14:59 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2022-08-12 09:14:59 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2022-08-12 09:14:59 INFO  DiskBlockManager:54 - Created local directory at /tmp/blockmgr-3356a81b-b797-40d6-9d1f-a9d816f1de54
2022-08-12 09:14:59 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2022-08-12 09:14:59 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2022-08-12 09:14:59 INFO  log:192 - Logging initialized @3892ms
2022-08-12 09:14:59 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2022-08-12 09:14:59 INFO  Server:419 - Started @3974ms
2022-08-12 09:14:59 INFO  AbstractConnector:278 - Started ServerConnector@6944e53e{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2022-08-12 09:14:59 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@587a1cfb{/jobs,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@508a65bf{/jobs/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@17f2dd85{/jobs/job,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@210308d5{/jobs/job/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22a736d7{/stages,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23b8d9f3{/stages/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7f353d99{/stages/stage,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@38d17d80{/stages/stage/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6ede46f6{/stages/pool,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@66273da0{/stages/pool/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2127e66e{/storage,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1229a2b7{/storage/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@e5cbff2{/storage/rdd,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@51c959a4{/storage/rdd/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4fc3c165{/environment,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@10a0fe30{/environment/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7b6860f9{/executors,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@60f70249{/executors/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@31ee2fdb{/executors/threadDump,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@262816a8{/executors/threadDump/json,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1effd53c{/static,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3968bc60{/,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@26f46fa6{/api,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@70228253{/jobs/job/kill,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@63c12e52{/stages/stage/kill,null,AVAILABLE,@Spark}
2022-08-12 09:14:59 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://gateway1.cdh6.aw:4040
2022-08-12 09:14:59 INFO  Executor:54 - Starting executor ID driver on host localhost
2022-08-12 09:14:59 INFO  Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44931.
2022-08-12 09:14:59 INFO  NettyBlockTransferService:54 - Server created on gateway1.cdh6.aw:44931
2022-08-12 09:14:59 INFO  BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2022-08-12 09:14:59 INFO  BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, gateway1.cdh6.aw, 44931, None)
2022-08-12 09:14:59 INFO  BlockManagerMasterEndpoint:54 - Registering block manager gateway1.cdh6.aw:44931 with 366.3 MB RAM, BlockManagerId(driver, gateway1.cdh6.aw, 44931, None)
2022-08-12 09:14:59 INFO  BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, gateway1.cdh6.aw, 44931, None)
2022-08-12 09:14:59 INFO  BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, gateway1.cdh6.aw, 44931, None)
2022-08-12 09:15:00 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b31a708{/metrics/json,null,AVAILABLE,@Spark}
2022-08-12 09:15:00 INFO  SharedState:54 - loading hive config file: file:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark2/conf/hive-site.xml
2022-08-12 09:15:00 INFO  SharedState:54 - spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
2022-08-12 09:15:00 INFO  SharedState:54 - Warehouse path is '/user/hive/warehouse'.
2022-08-12 09:15:00 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@49d30c6f{/SQL,null,AVAILABLE,@Spark}
2022-08-12 09:15:00 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1115433e{/SQL/json,null,AVAILABLE,@Spark}
2022-08-12 09:15:00 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3bed3315{/SQL/execution,null,AVAILABLE,@Spark}
2022-08-12 09:15:00 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@119b0892{/SQL/execution/json,null,AVAILABLE,@Spark}
2022-08-12 09:15:00 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3bc4ef12{/static/sql,null,AVAILABLE,@Spark}
2022-08-12 09:15:00 INFO  HiveUtils:54 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
2022-08-12 09:15:00 INFO  HiveClientImpl:54 - Warehouse location for Hive client (version 1.2.2) is /user/hive/warehouse
2022-08-12 09:15:00 INFO  StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
Spark master: local[*], Application Id: local-1660266899850
2022-08-12 09:15:01 INFO  SparkSQLCLIDriver:951 - Spark master: local[*], Application Id: local-1660266899850
spark-sql> show databases;
2022-08-12 09:15:28 INFO  CodeGenerator:54 - Code generated in 173.089909 ms
app
default
dmt
dwh
src
tmp
Time taken: 1.872 seconds, Fetched 6 row(s)
2022-08-12 09:15:28 INFO  SparkSQLCLIDriver:951 - Time taken: 1.872 seconds, Fetched 6 row(s)
spark-sql> 

Problems you may run into after installation

1. spark-sql by itself runs fine, but spark-sql --master yarn --deploy-mode client fails with the following error:

2022-08-12 09:48:48 INFO  RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8032
2022-08-12 09:48:49 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-12 09:48:50 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-12 09:48:51 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-12 09:48:52 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-12 09:48:53 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-12 09:48:54 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2022-08-12 09:48:55 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

The root cause is that this machine cannot find any ResourceManager information. Start by checking the /etc/hadoop/conf/ directory: yarn-site.xml is probably missing, and copying one over from an appropriate server fixes it (see the sketch below).
Bear in mind that, the way CDH works, client configuration gets regenerated whenever it is redeployed, so if the cluster is managed by CDH the cleanest long-term fix is to add a YARN gateway role to this host and be done with it.
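If you just need a quick fix before adding the gateway role, copying the file over works (a sketch; master1 is the ResourceManager host in my cluster, substitute your own, and note that CDH may overwrite it on the next client-config redeploy):

scp root@master1:/etc/hadoop/conf/yarn-site.xml /etc/hadoop/conf/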

2. spark-sql on its own works normally, but spark-sql --master yarn --deploy-mode client fails with the error below. The message makes it obvious that root has no permission to write to HDFS; run the command as a user that does, e.g. sudo -u hive spark-sql --master yarn --deploy-mode client
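
An alternative that avoids switching users (a sketch not taken from the original write-up, assuming you can act as the hdfs superuser) is to give root its own HDFS home directory so Spark can stage its YARN resources there:

sudo -u hdfs hdfs dfs -mkdir -p /user/root
sudo -u hdfs hdfs dfs -chown root:root /user/root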

2022-08-12 09:47:52 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x

The complete error output:

[root@master1 bin]# spark-sql --master yarn --deploy-mode client
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.use.checked.expressions does not exist
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.no.partition.filter does not exist
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.use.vector.serde.deserialize does not exist
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.orderby.no.limit does not exist
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.adaptor.usage.mode does not exist
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.use.vectorized.input.format does not exist
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.vectorized.input.format.excludes does not exist
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.bucketing does not exist
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.type.safety does not exist
2022-08-12 09:47:49 WARN  HiveConf:2753 - HiveConf of name hive.strict.checks.cartesian.product does not exist
2022-08-12 09:47:50 INFO  metastore:376 - Trying to connect to metastore with URI thrift://utility1.cdh6.aw:9083
2022-08-12 09:47:50 INFO  metastore:472 - Connected to metastore.
2022-08-12 09:47:51 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-08-12 09:47:51 INFO  SessionState:641 - Created local directory: /tmp/root
2022-08-12 09:47:51 INFO  SessionState:641 - Created local directory: /tmp/6022c9c1-dc51-428a-a8b0-87193c45d846_resources
2022-08-12 09:47:51 INFO  SessionState:641 - Created HDFS directory: /tmp/hive/root/6022c9c1-dc51-428a-a8b0-87193c45d846
2022-08-12 09:47:51 INFO  SessionState:641 - Created local directory: /tmp/root/6022c9c1-dc51-428a-a8b0-87193c45d846
2022-08-12 09:47:51 INFO  SessionState:641 - Created HDFS directory: /tmp/hive/root/6022c9c1-dc51-428a-a8b0-87193c45d846/_tmp_space.db
2022-08-12 09:47:51 INFO  SparkContext:54 - Running Spark version 2.4.0
2022-08-12 09:47:51 INFO  SparkContext:54 - Submitted application: SparkSQL::192.168.0.11
2022-08-12 09:47:51 INFO  SecurityManager:54 - Changing view acls to: root
2022-08-12 09:47:51 INFO  SecurityManager:54 - Changing modify acls to: root
2022-08-12 09:47:51 INFO  SecurityManager:54 - Changing view acls groups to: 
2022-08-12 09:47:51 INFO  SecurityManager:54 - Changing modify acls groups to: 
2022-08-12 09:47:51 INFO  SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
2022-08-12 09:47:51 INFO  Utils:54 - Successfully started service 'sparkDriver' on port 36099.
2022-08-12 09:47:51 INFO  SparkEnv:54 - Registering MapOutputTracker
2022-08-12 09:47:51 INFO  SparkEnv:54 - Registering BlockManagerMaster
2022-08-12 09:47:51 INFO  BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2022-08-12 09:47:51 INFO  BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2022-08-12 09:47:51 INFO  DiskBlockManager:54 - Created local directory at /tmp/blockmgr-d93412b1-33df-49d1-a745-ed4cd89e232e
2022-08-12 09:47:51 INFO  MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2022-08-12 09:47:51 INFO  SparkEnv:54 - Registering OutputCommitCoordinator
2022-08-12 09:47:51 INFO  log:192 - Logging initialized @3564ms
2022-08-12 09:47:51 INFO  Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
2022-08-12 09:47:52 INFO  Server:419 - Started @3647ms
2022-08-12 09:47:52 INFO  AbstractConnector:278 - Started ServerConnector@6920614{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2022-08-12 09:47:52 INFO  Utils:54 - Successfully started service 'SparkUI' on port 4040.
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4c6007fb{/jobs,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@3dffc764{/jobs/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b6e1c0{/jobs/job,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@654c7d2d{/jobs/job/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@26cb5207{/stages,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@15400fff{/stages/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@18d910b3{/stages/stage,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@77774571{/stages/stage/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@277b8fa4{/stages/pool,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6cd64ee8{/stages/pool/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@620c8641{/storage,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2f1d0bbc{/storage/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5460b754{/storage/rdd,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@a9f023e{/storage/rdd/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@c27a3a2{/environment,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4b200971{/environment/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1a2bcd56{/executors,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@68d7a2df{/executors/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@59dc36d4{/executors/threadDump,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@12fcc71f{/executors/threadDump/json,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5679e96b{/static,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5e519ad3{/,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7bc44ce8{/api,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@53e800f9{/jobs/job/kill,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  ContextHandler:781 - Started o.s.j.s.ServletContextHandler@337bbfdf{/stages/stage/kill,null,AVAILABLE,@Spark}
2022-08-12 09:47:52 INFO  SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://master1.cdh6.aw:4040
2022-08-12 09:47:52 INFO  ConfiguredRMFailoverProxyProvider:100 - Failing over to rm363
2022-08-12 09:47:52 INFO  Client:54 - Requesting a new application from cluster with 3 NodeManagers
2022-08-12 09:47:52 INFO  Client:54 - Verifying our application has not requested more than the maximum memory capability of the cluster (1537 MB per container)
2022-08-12 09:47:52 INFO  Client:54 - Will allocate AM container, with 896 MB memory including 384 MB overhead
2022-08-12 09:47:52 INFO  Client:54 - Setting up container launch context for our AM
2022-08-12 09:47:52 INFO  Client:54 - Setting up the launch environment for our AM container
2022-08-12 09:47:52 INFO  Client:54 - Preparing resources for our AM container
2022-08-12 09:47:52 ERROR SparkContext:91 - Error initializing SparkContext.
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
	at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3002)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2970)
	at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1047)
	at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1043)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1061)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1036)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1881)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:600)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:427)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:864)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:178)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:178)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:315)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
	at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy12.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3000)
	... 35 more
2022-08-12 09:47:52 INFO  AbstractConnector:318 - Stopped Spark@6920614{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2022-08-12 09:47:52 INFO  SparkUI:54 - Stopped Spark web UI at http://master1.cdh6.aw:4040
2022-08-12 09:47:52 WARN  YarnSchedulerBackend$YarnSchedulerEndpoint:66 - Attempted to request executors before the AM has registered!
2022-08-12 09:47:52 INFO  YarnClientSchedulerBackend:54 - Stopped
2022-08-12 09:47:52 INFO  MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2022-08-12 09:47:52 INFO  MemoryStore:54 - MemoryStore cleared
2022-08-12 09:47:52 INFO  BlockManager:54 - BlockManager stopped
2022-08-12 09:47:52 INFO  BlockManagerMaster:54 - BlockManagerMaster stopped
2022-08-12 09:47:52 WARN  MetricsSystem:66 - Stopping a MetricsSystem that is not running
2022-08-12 09:47:52 INFO  OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2022-08-12 09:47:52 INFO  SparkContext:54 - Successfully stopped SparkContext
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
	at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3002)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2970)
	at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1047)
	at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1043)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1061)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1036)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1881)
	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:600)
	at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:427)
	at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:864)
	at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:178)
	at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
	at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:178)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:315)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
	at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

	at org.apache.hadoop.ipc.Client.call(Client.java:1475)
	at org.apache.hadoop.ipc.Client.call(Client.java:1412)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
	at com.sun.proxy.$Proxy12.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
	at com.sun.proxy.$Proxy13.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3000)
	... 35 more
2022-08-12 09:47:52 INFO  ShutdownHookManager:54 - Shutdown hook called
2022-08-12 09:47:52 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-3b285f1e-200e-4779-9b96-6c02c3200751
2022-08-12 09:47:52 INFO  ShutdownHookManager:54 - Deleting directory /tmp/spark-7ee59a88-7409-45c2-9e39-82d295bd5467

3. java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$RetrieveDelegationTokens$

When installing on a single machine, running spark-sql hit the following error:

ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 6405719561689998953
java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$RetrieveDelegationTokens$
.......
Caused by: java.lang.ClassNotFoundException: com.cloudera.spark.lineage.NavigatorAppListener

The complete error output:

[aw-bdp-dwh@node00 ~]$ spark-sql
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.checked.expressions does not exist
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.strict.checks.no.partition.filter does not exist
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.vector.serde.deserialize does not exist
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.strict.checks.orderby.no.limit does not exist
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.vectorized.adaptor.usage.mode does not exist
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.vectorized.use.vectorized.input.format does not exist
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.vectorized.input.format.excludes does not exist
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.strict.checks.bucketing does not exist
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.strict.checks.type.safety does not exist
22/10/14 15:05:26 WARN conf.HiveConf: HiveConf of name hive.strict.checks.cartesian.product does not exist
22/10/14 15:05:26 INFO hive.metastore: Trying to connect to metastore with URI thrift://node00.cdh6.test:9083
22/10/14 15:05:26 INFO hive.metastore: Connected to metastore.
22/10/14 15:05:27 INFO session.SessionState: Created local directory: /tmp/9d7c2114-cef7-4963-a0ce-c5e41267550e_resources
22/10/14 15:05:27 INFO session.SessionState: Created HDFS directory: /tmp/hive/aw-bdp-dwh/9d7c2114-cef7-4963-a0ce-c5e41267550e
22/10/14 15:05:27 INFO session.SessionState: Created local directory: /tmp/aw-bdp-dwh/9d7c2114-cef7-4963-a0ce-c5e41267550e
22/10/14 15:05:27 INFO session.SessionState: Created HDFS directory: /tmp/hive/aw-bdp-dwh/9d7c2114-cef7-4963-a0ce-c5e41267550e/_tmp_space.db
22/10/14 15:05:28 INFO util.log: Logging initialized @3411ms
22/10/14 15:05:28 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
22/10/14 15:05:28 INFO server.Server: Started @3491ms
22/10/14 15:05:28 INFO server.AbstractConnector: Started ServerConnector@7a0ef219{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6403e24c{/jobs,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@54e3658c{/jobs/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@43e9089{/jobs/job,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@352c44a8{/jobs/job/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7aac8884{/stages,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@a66e580{/stages/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b852b49{/stages/stage,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f5ac102{/stages/stage/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5df778c3{/stages/pool,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@895416d{/stages/pool/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71a06021{/storage,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@80bfdc6{/storage/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6edcad64{/storage/rdd,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4c6007fb{/storage/rdd/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3e33d73e{/environment,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6981f8f3{/environment/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@623dcf2a{/executors,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2eae4349{/executors/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e84fb85{/executors/threadDump,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68a4dcc6{/executors/threadDump/json,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@646c0a67{/static,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15400fff{/,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@18d910b3{/api,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@277b8fa4{/jobs/job/kill,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6cd64ee8{/stages/stage/kill,null,AVAILABLE,@Spark}
22/10/14 15:05:28 INFO client.RMProxy: Connecting to ResourceManager at node00.cdh6.test/192.168.8.11:8032
22/10/14 15:05:29 INFO impl.YarnClientImpl: Submitted application application_1665728941521_0007
22/10/14 15:05:32 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 6405719561689998953
java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.CoarseGrainedClusterMessages$RetrieveDelegationTokens$
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1868)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2042)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:108)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1$$anonfun$apply$1.apply(NettyRpcEnv.scala:271)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:320)
	at org.apache.spark.rpc.netty.NettyRpcEnv$$anonfun$deserialize$1.apply(NettyRpcEnv.scala:270)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
	at org.apache.spark.rpc.netty.NettyRpcEnv.deserialize(NettyRpcEnv.scala:269)
	at org.apache.spark.rpc.netty.RequestMessage$.apply(NettyRpcEnv.scala:611)
	at org.apache.spark.rpc.netty.NettyRpcHandler.internalReceive(NettyRpcEnv.scala:662)
	at org.apache.spark.rpc.netty.NettyRpcHandler.receive(NettyRpcEnv.scala:647)
	at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:181)
	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:103)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)
22/10/14 15:05:33 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5e67a490{/metrics/json,null,AVAILABLE,@Spark}
22/10/14 15:05:33 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
22/10/14 15:05:33 INFO server.AbstractConnector: Stopped Spark@7a0ef219{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
22/10/14 15:05:33 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
22/10/14 15:05:33 ERROR spark.SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Exception when registering SparkListener
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2398)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:315)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.cloudera.spark.lineage.NavigatorAppListener
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2682)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2680)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2680)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2387)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2386)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2386)
	... 22 more
Exception in thread "main" org.apache.spark.SparkException: Exception when registering SparkListener
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2398)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:48)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:315)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:166)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.cloudera.spark.lineage.NavigatorAppListener
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:348)
	at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2682)
	at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2680)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
	at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
	at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2680)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2387)
	at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2386)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2386)
	... 22 more

Solution:
The key change is marked at the bottom of the configuration below: the two com.cloudera.spark.lineage listener entries are commented out, since those classes ship only with Cloudera's own Spark build and are not on the Apache Spark 2.4.0 classpath, which is exactly what the ClassNotFoundException above complains about. This approach mainly suits the local style of referencing the jars.

For cluster submission, it is recommended to upload the relevant jars to HDFS and point spark.yarn.jars at the HDFS path instead,
for example: spark.yarn.jars=hdfs://nameservice1/user/spark2/jars/*
Upload the jars:

hdfs dfs -put /opt/cloudera/parcels/CDH/lib/spark2/jars/* hdfs://nameservice1/user/spark2/jars/
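A minimal sketch of the surrounding steps, assuming the same nameservice1 path as above (create the target directory before the put, then verify the upload):

hdfs dfs -mkdir -p hdfs://nameservice1/user/spark2/jars/        # create the target directory if it does not exist
hdfs dfs -ls hdfs://nameservice1/user/spark2/jars/ | head -5    # after the put, confirm the jars landed on HDFS
# and in spark2/conf/spark-defaults.conf, replace the local: entries with:
# spark.yarn.jars=hdfs://nameservice1/user/spark2/jars/*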
[root@node00 conf]# cat /opt/cloudera/parcels/CDH/lib/spark2/conf/spark-defaults.conf

spark.authenticate=false
spark.driver.log.dfsDir=/user/spark/driverLogs
spark.driver.log.persistToDfs.enabled=true
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60
spark.dynamicAllocation.minExecutors=0
spark.dynamicAllocation.schedulerBacklogTimeout=1
spark.eventLog.enabled=true
spark.io.encryption.enabled=false
spark.network.crypto.enabled=false
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.service.enabled=true
spark.shuffle.service.port=7337
spark.ui.enabled=true
spark.ui.killEnabled=true
spark.lineage.log.dir=/var/log/spark/lineage
spark.lineage.enabled=true
spark.master=yarn
spark.submit.deployMode=client
spark.eventLog.dir=hdfs://node00.cdh6.test:8020/user/spark/applicationHistory
spark.yarn.historyServer.address=http://node00.cdh6.test:18088
spark.yarn.jars=local:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark2/jars/*,local:/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/spark2/hive/*
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.yarn.am.extraLibraryPath=/opt/cloudera/parcels/CDH-6.3.2-1.cdh6.3.2.p0.1605554/lib/hadoop/lib/native
spark.yarn.config.gatewayPath=/opt/cloudera/parcels
spark.yarn.config.replacementPath={{HADOOP_COMMON_HOME}}/../../..
spark.yarn.historyServer.allowTracking=true
spark.yarn.appMasterEnv.MKL_NUM_THREADS=1
spark.executorEnv.MKL_NUM_THREADS=1
spark.yarn.appMasterEnv.OPENBLAS_NUM_THREADS=1
spark.executorEnv.OPENBLAS_NUM_THREADS=1
#spark.extraListeners=com.cloudera.spark.lineage.NavigatorAppListener
#spark.sql.queryExecutionListeners=com.cloudera.spark.lineage.NavigatorQueryListener
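
With the two com.cloudera.spark.lineage listeners commented out as above, spark-sql should be able to start against YARN. A quick smoke test using the wrapper installed earlier (the -e flag runs a single statement and exits):

spark-sql -e "show databases;"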

