SparkSQL (1): Integrating SparkSQL with Hive


一、What SparkSQL Needs from Hive

SparkSQL only needs the metadata of Hive tables. If a Hive metastore service is configured (via hive.metastore.uris), SparkSQL fetches the metadata directly from that service; otherwise it has to read the metadata from the metastore database (MySQL) itself.
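As a sketch of what this means on the application side (Spark 2.x; the application name is illustrative), it is enableHiveSupport() that makes SparkSQL resolve table metadata through the Hive metastore:

import org.apache.spark.sql.SparkSession

// With enableHiveSupport(), SparkSQL looks up table metadata in the Hive
// metastore: through the thrift service when hive.metastore.uris is set,
// otherwise by connecting to the metastore database (MySQL) directly.
val spark = SparkSession.builder()
  .appName("metastore-demo")   // illustrative name
  .enableHiveSupport()
  .getOrCreate()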

二、Configuring the Integration with Hive

1. Copy or symlink Hive's hive-site.xml into Spark's conf directory

cd /opt/modules/spark-2.1.0-bin-2.7.3/conf/

ln -s /opt/modules/hive-1.2.1/conf/hive-site.xml

2. The Hive configuration file hive-site.xml

<configuration>
    <!-- Let Hive decide automatically whether to run a query in local mode
         instead of submitting it to the cluster -->
    <property>
        <name>hive.exec.mode.local.auto</name>
        <value>true</value>
        <description>Let Hive determine whether to run in local mode automatically</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://bigdata.ibeifeng.com:3306/metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>

    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
        <description>Whether to include the current database in the Hive prompt.</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
        <description>Whether to print the names of the columns in query output.</description>
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://bigdata.ibeifeng.com:9083</value>
        <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
    </property>

    <property>
        <name>hive.exec.max.dynamic.partitions.pernode</name>
        <value>100</value>
        <description>Maximum number of dynamic partitions allowed to be created in each mapper/reducer node.</description>
    </property>
    <property>
        <name>hive.exec.max.dynamic.partitions</name>
        <value>1000</value>
        <description>Maximum number of dynamic partitions allowed to be created in total.</description>
    </property>
    <property>
        <name>hive.exec.dynamic.partition</name>
        <value>true</value>
        <description>Whether or not to allow dynamic partitions in DML/DDL.</description>
    </property>
    <property>
        <name>hive.exec.dynamic.partition.mode</name>
        <value>nonstrict</value>
        <description>In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions.</description>
    </property>

    <property>
        <name>hive.support.sql11.reserved.keywords</name>
        <value>false</value>
    </property>

    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>bigdata.ibeifeng.com</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>bigdata.ibeifeng.com</value>
    </property>
</configuration>
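The dynamic-partition properties above (hive.exec.dynamic.partition, hive.exec.dynamic.partition.mode, and the two limits) only come into play when partition values are derived from query results. A minimal sketch from spark-shell, assuming the emp table created in section 三 below and a hypothetical partitioned table emp_part:

// emp_part is a hypothetical table partitioned by deptno; with
// hive.exec.dynamic.partition=true and mode=nonstrict the partition
// values are taken from the SELECT instead of being listed statically.
spark.sql("""
  CREATE TABLE IF NOT EXISTS emp_part (
    empno INT, ename STRING, sal DOUBLE
  ) PARTITIONED BY (deptno INT)
""")

spark.sql("""
  INSERT OVERWRITE TABLE emp_part PARTITION (deptno)
  SELECT empno, ename, sal, deptno FROM default.emp
""")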

3. Choose the integration approach based on the value of hive.metastore.uris in hive-site.xml

(1) If hive.metastore.uris is not set (the default)

Add the JDBC driver of the Hive metastore database (the MySQL connector jar here) to Spark's classpath, for example via --jars / --driver-class-path or by placing the jar in Spark's jars directory; that alone completes the Spark-Hive integration.

(2) If hive.metastore.uris points to a concrete metastore service node (non-empty value; this is the more common setup)

    (a) Start the Hive metastore service (from the Hive home directory /opt/modules/hive-1.2.1):

bin/hive --service metastore &

    (b) With the metastore service running and hive-site.xml visible to Spark, the Spark-Hive integration is complete; a quick sanity check is shown below.
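A minimal sanity-check sketch from spark-shell (which pre-creates the spark session with Hive support when hive-site.xml is present in Spark's conf directory):

// List the databases and tables visible through the Hive metastore.
spark.sql("show databases").show()
spark.catalog.listTables("default").show()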

三、Testing the SparkSQL and Hive Integration

1. Create the tables in Hive and load test data

【Employee table】
create table emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
row format delimited fields terminated by '\t';


load data local inpath '/opt/datas/emp.txt' into table emp;

To reload the file and replace the data already in the table, use overwrite:

load data local inpath '/opt/datas/emp.txt' overwrite into table emp;


【Department table】
create table dept(
deptno int,
dname string,
loc string
)
row format delimited fields terminated by '\t';

load data local inpath '/opt/datas/dept.txt' into table dept;

2. Test the integration in two ways

(1) Testing with the spark-sql CLI

  -> Start the spark-sql CLI:

 ./bin/spark-sql

  -> Run a few statements:

spark-sql (default)> select * from default.emp;
spark-sql (default)> select * from default.emp a join default.dept b on a.deptno = b.deptno; 
spark-sql (default)> explain select * from default.emp a join default.dept b on a.deptno = b.deptno;

(2) Testing with spark-shell

  -> Start the spark-shell:

./bin/spark-shell

  -> Run a few statements:

scala> spark.sqlContext
scala> spark.sqlContext.sql("select * from default.emp a join default.dept b on a.deptno = b.deptno").show()

Result:
+-----+------+---------+----+----------+------+------+------+------+----------+--------+
|empno| ename|      job| mgr|  hiredate|   sal|  comm|deptno|deptno|     dname|     loc|
+-----+------+---------+----+----------+------+------+------+------+----------+--------+
| 7369| SMITH|    CLERK|7902|1980-12-17| 800.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7499| ALLEN| SALESMAN|7698| 1981-2-20|1600.0| 300.0|    30|    30|     SALES| CHICAGO|
| 7521|  WARD| SALESMAN|7698| 1981-2-22|1250.0| 500.0|    30|    30|     SALES| CHICAGO|
| 7566| JONES|  MANAGER|7839|  1981-4-2|2975.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7654|MARTIN| SALESMAN|7698| 1981-9-28|1250.0|1400.0|    30|    30|     SALES| CHICAGO|
| 7698| BLAKE|  MANAGER|7839|  1981-5-1|2850.0|  null|    30|    30|     SALES| CHICAGO|
| 7782| CLARK|  MANAGER|7839|  1981-6-9|2450.0|  null|    10|    10|ACCOUNTING|NEW YORK|
| 7788| SCOTT|  ANALYST|7566| 1987-4-19|3000.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7839|  KING|PRESIDENT|null|1981-11-17|5000.0|  null|    10|    10|ACCOUNTING|NEW YORK|
| 7844|TURNER| SALESMAN|7698|  1981-9-8|1500.0|   0.0|    30|    30|     SALES| CHICAGO|
| 7876| ADAMS|    CLERK|7788| 1987-5-23|1100.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7900| JAMES|    CLERK|7698| 1981-12-3| 950.0|  null|    30|    30|     SALES| CHICAGO|
| 7902|  FORD|  ANALYST|7566| 1981-12-3|3000.0|  null|    20|    20|  RESEARCH|  DALLAS|
| 7934|MILLER|    CLERK|7782| 1982-1-23|1300.0|  null|    10|    10|ACCOUNTING|NEW YORK|
+-----+------+---------+----+----------+------+------+------+------+----------+--------+
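A side note on the spark-shell example above: spark.sqlContext is kept only for backward compatibility in Spark 2.x; the same query can be issued directly through the SparkSession:

scala> spark.sql("select * from default.emp a join default.dept b on a.deptno = b.deptno").show()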

四、Code Implementation

1. Reference blog: https://blog.csdn.net/u010886217/article/details/82916520. Note that when the cluster is accessed from an external network, the client must be configured to use the datanode hostname (e.g. dfs.client.use.datanode.hostname), otherwise the data nodes cannot be reached.
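For reference, a minimal standalone sketch (not the exact code from the linked blog; the object name is illustrative) of a Spark 2.x program that queries the Hive tables created above:

import org.apache.spark.sql.SparkSession

// Requires hive-site.xml on the classpath (e.g. linked into Spark's conf
// directory) and the MySQL driver available, as described in section 二.
object SparkSQLHiveApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSQLHiveApp")
      .enableHiveSupport()   // resolve table metadata via the Hive metastore
      .getOrCreate()

    // Join the emp and dept tables created in section 三
    spark.sql(
      """select a.empno, a.ename, a.sal, b.dname, b.loc
        |from default.emp a join default.dept b on a.deptno = b.deptno""".stripMargin
    ).show()

    spark.stop()
  }
}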

 
