Configuring Spark to Connect to the Hive Metastore
Step 1: Install Hive and Spark (omitted)
Step 2: Configure the Metastore to Use MySQL
By default, Hive stores the Metastore in its embedded Derby database, which only supports a single active session; MySQL is recommended for storing the Metastore instead.
2.1 Copy the JDBC driver
Upload the mysql-connector-java-5.1.27.tar.gz driver package, extract it, and copy the jar into the hive/lib/ directory:
[root@hadoop102 mysql-libs]# tar -zxvf mysql-connector-java-5.1.27.tar.gz
[root@hadoop102 mysql-connector-java-5.1.27]# cp mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/
2.2 Configure the Metastore to use MySQL
1. Create a hive-site.xml file in the /opt/module/hive/conf directory:
[atguigu@hadoop102 conf]$ touch hive-site.xml
[atguigu@hadoop102 conf]$ vi hive-site.xml
2. Following the official documentation, copy the configuration parameters into hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>
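Before deploying the file it can help to verify that the four JDBC connection properties are all present. A minimal stdlib-only sketch (the `check_hive_site` helper and the embedded XML sample are hypothetical, mirroring the example above):

```python
# Sanity-check sketch: parse a hive-site.xml-style document and confirm the
# four javax.jdo.option.* metastore properties are set. Values mirror the
# example config above; adjust them to your environment.
import xml.etree.ElementTree as ET

HIVE_SITE = """<?xml version="1.0"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
    </property>
</configuration>"""

REQUIRED = {
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword",
}

def check_hive_site(xml_text: str) -> dict:
    """Return {name: value} for every <property>, raising if one is missing."""
    root = ET.fromstring(xml_text)
    props = {p.findtext("name"): p.findtext("value") for p in root.iter("property")}
    missing = REQUIRED - props.keys()
    if missing:
        raise ValueError(f"hive-site.xml is missing: {sorted(missing)}")
    return props

props = check_hive_site(HIVE_SITE)
```

Running this against the real file (e.g. via `ET.parse("/opt/module/hive/conf/hive-site.xml")`) catches a mistyped property name before Hive reports a much less readable stack trace.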
3. After the configuration is complete, if Hive fails to start, try rebooting the virtual machine. (After the reboot, remember to start the Hadoop cluster before launching Hive.)
[root@hadoop102 hive]# ./bin/hive
ls: cannot access /opt/module/spark/lib/spark-assembly-*.jar: No such file or directory
Logging initialized using configuration in file:/opt/module/hive/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
hive (default)> show databases;
OK
database_name
default
ecommerce
hive
hivewindow
sparksql
Time taken: 0.707 seconds, Fetched: 5 row(s)
hive (default)>
Step 3: Configure Spark to Connect to the Hive Metastore
Spark connects to the remote metastore through Hive's Thrift server.
3.1 Copy Hive's hive-site.xml into Spark's conf directory
3.2 Edit the hive-site.xml under Spark's conf directory and add the following:
<property>
    <name>hive.metastore.uris</name>
    <value>thrift://hadoop102:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
3.3 In a separate terminal window, start the metastore service:
[root@hadoop102 hive]# bin/hive --service metastore &
[1] 7288
[root@hadoop102 hive]# ls: cannot access /opt/module/spark/lib/spark-assembly-*.jar: No such file or directory
Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/module/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/module/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
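Before launching spark-shell, it can be worth confirming that the metastore service is actually listening on the Thrift port. A small stdlib-only sketch (the helper names are hypothetical; the host and port come from the hive.metastore.uris value above):

```python
# Hedged sketch: split a hive.metastore.uris value into (host, port) and
# probe whether a plain TCP connection to that port succeeds. This does not
# speak the Thrift protocol; it only checks that something is listening.
import socket
from urllib.parse import urlparse

METASTORE_URI = "thrift://hadoop102:9083"  # value from hive-site.xml above

def split_metastore_uri(uri: str):
    """Return (host, port) parsed from a thrift:// metastore URI."""
    parsed = urlparse(uri)
    return parsed.hostname, parsed.port

def metastore_reachable(uri: str, timeout: float = 2.0) -> bool:
    """True if a TCP connection to the metastore host:port succeeds."""
    host, port = split_metastore_uri(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

host, port = split_metastore_uri(METASTORE_URI)
```

If `metastore_reachable(METASTORE_URI)` returns False on the cluster, spark-shell will hang or fail on the first `spark.sql` call, so checking here saves a restart cycle.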
3.4 Start Spark:
[root@hadoop102 spark]# ./bin/spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/04/07 14:19:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://192.168.174.102:4040
Spark context available as 'sc' (master = local[*], app id = local-1617776367025).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.1
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_144)
Type in expressions to have them evaluated.
Type :help for more information.
3.5 Test:
scala> spark.sql("show databases").show
+------------+
|databaseName|
+------------+
| default|
| ecommerce|
| hive|
+------------+
If the databases created through Hive appear in this output, Spark is reading the shared metastore and the setup is complete.