CDH 5.7.0 does not ship a spark-sql CLI by default. I had previously deployed standalone Spark clusters from the tarball, which does include the spark-sql command, so my first idea was to copy the spark-sql script from the tarball into $SPARK_HOME/bin:
cp ./bin/spark-sql /opt/cloudera/parcels/CDH/lib/spark/bin/
Running ./spark-sql then, sadly, fails with an error:
java.lang.ClassNotFoundException: org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:175)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.
You need to build Spark with -Phive and -Phive-thriftserver.
The error message is explicit. Most of the advice online says the same thing: rebuild Spark from source with the hive and hive-thriftserver profiles enabled. For someone like me who is not an infrastructure engineer and just wants to use Spark well, compiling the source is not a habit, so I ruled that approach out straight away.
Since the missing class is org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver, I decided to follow the tarball approach all the way and copy over the jar that contains it as well, which is spark-assembly-1.6.1-hadoop2.6.0.jar (I had downloaded the official Spark 1.6.1 build for Hadoop 2.6). Then I modified the spark-sql script to put that jar on the classpath:
export _SPARK_CMD_USAGE="Usage: ./bin/spark-sql [options] [cli option]"
exec "${SPARK_HOME}"/bin/spark-submit --jars /opt/cloudera/parcels/CDH/lib/spark/spark-assembly-1.6.1-hadoop2.6.0.jar --class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver "$@"
The --jars option is the part I added.
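The patched script is just a thin wrapper around spark-submit, and the only moving part is forwarding the user's arguments with "$@". A minimal stand-alone sketch of the same wrapper pattern (wrapper.sh, the paths, and the class name here are placeholders I made up; echo stands in for spark-submit so the sketch runs anywhere):

```shell
# wrapper.sh mirrors the shape of the patched spark-sql script: exec hands
# off to spark-submit with the extra --jars baked in, and "$@" forwards
# whatever the user typed after the script name.
cat > wrapper.sh <<'EOF'
#!/bin/sh
exec echo spark-submit --jars /path/to/extra.jar --class some.Main "$@"
EOF
chmod +x wrapper.sh
./wrapper.sh -e "show tables;"
# prints: spark-submit --jars /path/to/extra.jar --class some.Main -e show tables;
```

Once the real script is patched, ./spark-sql -e "show tables;" makes a handy non-interactive smoke test (-e is a standard spark-sql CLI option).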
Running ./spark-sql again, the CLI came up, and the queries I tried next, show tables; and select * from table*, ran without any problems. Done!
Important note: my environment is CDH 5.7.0 with its bundled Spark 1.6.0, and the tarball is Spark 1.6.1 on Hadoop 2.6. I have not tested other, older versions.