Using Spark SQL: The Spark SQL CLI

Spark SQL CLI Overview

The Spark SQL CLI makes it more convenient to query Hive directly from Spark SQL through the Hive metastore; in the current version, the Spark SQL CLI cannot yet interact with the ThriftServer.

Before using the Spark SQL CLI, note the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory;

2. Append the JDBC driver jar to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.27-bin.jar
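For reference, here is a minimal hive-site.xml sketch for a MySQL-backed metastore; the database name and credentials below are placeholders for whatever your Hive installation actually uses. This MySQL connection is exactly why the JDBC driver jar above is needed.

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- placeholder URL; point at your own metastore database -->
    <value>jdbc:mysql://hadoop000:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>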

Spark SQL CLI command-line options:

cd $SPARK_HOME/bin

spark-sql --help

Usage: ./bin/spark-sql [options] [cli option]

Spark assembly has been built with Hive, including Datanucleus jars on classpath

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma-separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
 -d,--define <key=value>         Variable substitution to apply to hive
                                 commands. e.g. -d A=B or --define A=B
    --database <databasename>    Specify the database to use
 -e <quoted-query-string>        SQL from command line
 -f <filename>                   SQL from files
 -h <hostname>                   connecting to Hive Server on remote host
    --hiveconf <property=value>  Use value for given property
    --hivevar <key=value>        Variable substitution to apply to hive
                                 commands. e.g. --hivevar A=B
 -i <filename>                   Initialization SQL file
 -p <port>                       connecting to Hive Server on port number
 -S,--silent                     Silent mode in interactive shell
 -v,--verbose                    Verbose mode (echo executed SQL to the console)
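A few illustrative invocations combining these options; the script path and query below are made up for the example, and hive.cli.print.header is a standard Hive property that prints column headers:

cd $SPARK_HOME/bin

# Run one statement non-interactively and exit (-e)
./spark-sql -e "SHOW TABLES"

# Execute a SQL script file (-f) against a chosen database;
# /home/spark/queries/daily.sql is a hypothetical path
./spark-sql --database default -f /home/spark/queries/daily.sql

# Override a Hive property for this session only (--hiveconf)
./spark-sql --hiveconf hive.cli.print.header=true -e "SELECT * FROM page_views LIMIT 3"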

When starting spark-sql, if no master is specified it runs in local mode; the master can be either a standalone cluster address or yarn.

When the master is set to yarn (spark-sql --master yarn), the execution of the whole job can be monitored from the page at http://hadoop000:8088.

Note: if spark.master spark://hadoop000:7077 is configured in $SPARK_HOME/conf/spark-defaults.conf, then spark-sql will run on the standalone cluster even when started without specifying a master.
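Putting the two approaches side by side (the executor sizing below is illustrative, not from the article):

# Explicit master: run on YARN and monitor the job at http://hadoop000:8088
spark-sql --master yarn --executor-memory 1G --num-executors 2

# Or set a default once in $SPARK_HOME/conf/spark-defaults.conf,
# after which a bare `spark-sql` starts against the standalone cluster:
# spark.master spark://hadoop000:7077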

Using spark-sql

Start spark-sql. Since I have already configured spark.master spark://hadoop000:7077 in spark-defaults.conf, I do not specify a master when starting spark-sql:

cd $SPARK_HOME/bin

spark-sql

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 LIMIT 10;

SELECT session_id, count(*) c FROM page_views GROUP BY session_id ORDER BY c DESC LIMIT 10;

The table used by the two SQL statements above should already exist in Hive; if it does not, create it manually. The creation script and the data-import script are as follows:

create table page_views(

track_time string,

url string,

session_id string,

referer string,

ip string,

end_user_id string,

city_id string

)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data local inpath '/home/spark/software/data/page_views.dat' overwrite into table page_views;
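If you don't have the article's page_views.dat, a tiny fabricated file matching the seven tab-separated columns is enough to sanity-check the table; the two rows below are invented for the example:

# Write two fake rows; \t produces the tab delimiter declared in the DDL
printf '2013-05-19 13:00:00\thttp://www.example.com/topic/1\tSESSION_001\thttp://www.example.com/\t10.4.2.80\tU0001\t-1000\n' > /tmp/page_views_sample.dat
printf '2013-05-19 13:00:05\thttp://www.example.com/topic/2\tSESSION_001\thttp://www.example.com/topic/1\t10.4.2.80\tU0001\t-1000\n' >> /tmp/page_views_sample.dat

Then, inside spark-sql (note that a plain INTO appends, while OVERWRITE INTO, as used above, replaces the table contents):

load data local inpath '/tmp/page_views_sample.dat' into table page_views;
SELECT * FROM page_views LIMIT 2;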
