Using Spark SQL: The Spark SQL CLI

Spark SQL CLI Overview

The Spark SQL CLI makes it more convenient to query Hive directly from Spark SQL through the Hive metastore; in the current version, the Spark SQL CLI cannot yet interact with the ThriftServer.

Before using the Spark SQL CLI, note the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory;

2. Append the JDBC driver jar to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.27-bin.jar
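For reference, here is a minimal hive-site.xml sketch for a MySQL-backed metastore; the database name and credentials below are placeholders for whatever your Hive installation actually uses. This MySQL connection is exactly why the JDBC driver jar above is needed.

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- placeholder URL; point at your own metastore database -->
    <value>jdbc:mysql://hadoop000:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
</configuration>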

Spark SQL CLI command-line options:

cd $SPARK_HOME/bin

spark-sql --help

Usage: ./bin/spark-sql [options] [cli option]

Spark assembly has been built with Hive, including Datanucleus jars on classpath

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma-separated list of archives to be extracted into the
                              working directory of each executor.

CLI options:
 -d,--define <key=value>         Variable substitution to apply to hive
                                 commands. e.g. -d A=B or --define A=B
    --database <databasename>    Specify the database to use
 -e <quoted-query-string>        SQL from command line
 -f <filename>                   SQL from files
 -h <hostname>                   connecting to Hive Server on remote host
    --hiveconf <property=value>  Use value for given property
    --hivevar <key=value>        Variable substitution to apply to hive
                                 commands. e.g. --hivevar A=B
 -i <filename>                   Initialization SQL file
 -p <port>                       connecting to Hive Server on port number
 -S,--silent                     Silent mode in interactive shell
 -v,--verbose                    Verbose mode (echo executed SQL to the console)
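A few illustrative invocations combining these options; the script path and query below are made up for the example, and hive.cli.print.header is a standard Hive property that prints column headers:

cd $SPARK_HOME/bin

# Run one statement non-interactively and exit (-e)
./spark-sql -e "SHOW TABLES"

# Execute a SQL script file (-f) against a chosen database;
# /home/spark/queries/daily.sql is a hypothetical path
./spark-sql --database default -f /home/spark/queries/daily.sql

# Override a Hive property for this session only (--hiveconf)
./spark-sql --hiveconf hive.cli.print.header=true -e "SELECT * FROM page_views LIMIT 3"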

When starting spark-sql, if no master is specified it runs in local mode; the master can be either a standalone cluster address or yarn.

When the master is set to yarn (spark-sql --master yarn), the execution of the whole job can be monitored from the page at http://hadoop000:8088.

Note: if spark.master spark://hadoop000:7077 is configured in $SPARK_HOME/conf/spark-defaults.conf, then spark-sql will run on the standalone cluster even when started without specifying a master.
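Putting the two approaches side by side (the executor sizing below is illustrative, not from the article):

# Explicit master: run on YARN and monitor the job at http://hadoop000:8088
spark-sql --master yarn --executor-memory 1G --num-executors 2

# Or set a default once in $SPARK_HOME/conf/spark-defaults.conf,
# after which a bare `spark-sql` starts against the standalone cluster:
# spark.master spark://hadoop000:7077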

Using spark-sql

Start spark-sql. Since I have already configured spark.master spark://hadoop000:7077 in spark-defaults.conf, I do not specify a master when starting spark-sql:

cd $SPARK_HOME/bin

spark-sql

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 LIMIT 10;

SELECT session_id, count(*) c FROM page_views GROUP BY session_id ORDER BY c DESC LIMIT 10;

The table used by the two SQL statements above should already exist in Hive; if it does not, create it manually. The creation script and the data-import script are as follows:

create table page_views(

track_time string,

url string,

session_id string,

referer string,

ip string,

end_user_id string,

city_id string

)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

load data local inpath '/home/spark/software/data/page_views.dat' overwrite into table page_views;
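If you don't have the article's page_views.dat, a tiny fabricated file matching the seven tab-separated columns is enough to sanity-check the table; the two rows below are invented for the example:

# Write two fake rows; \t produces the tab delimiter declared in the DDL
printf '2013-05-19 13:00:00\thttp://www.example.com/topic/1\tSESSION_001\thttp://www.example.com/\t10.4.2.80\tU0001\t-1000\n' > /tmp/page_views_sample.dat
printf '2013-05-19 13:00:05\thttp://www.example.com/topic/2\tSESSION_001\thttp://www.example.com/topic/1\t10.4.2.80\tU0001\t-1000\n' >> /tmp/page_views_sample.dat

Then, inside spark-sql (note that a plain INTO appends, while OVERWRITE INTO, as used above, replaces the table contents):

load data local inpath '/tmp/page_views_sample.dat' into table page_views;
SELECT * FROM page_views LIMIT 2;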
