Hive Installation and Integration with Spark

 

This post walks through the basics of running Spark SQL with Hive as the metastore, covering the following:

1. Hive installation

2. Integrating Spark with Hive

3. Spark SQL operations

Note: Hadoop and Spark need to be installed before following the steps in this post.

For Hadoop installation, see: https://my.oschina.net/u/729917/blog/1556872

For Spark installation, see: https://my.oschina.net/u/729917/blog/1556871

1. Hive installation

a) Install the MySQL database; installation guides are easy to find online.
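Hive's metastore will connect to MySQL with the account configured later in hive-site.xml (root / 0000 in this post). A minimal check after installing MySQL, assuming those credentials — note that the JDBC URL used later has createDatabaseIfNotExist=true, so pre-creating the hive database is optional:

mysql -u root -p
mysql> CREATE DATABASE IF NOT EXISTS hive;    -- optional: pre-create the metastore database
mysql> exit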

b) Download Hive from a mirror: http://mirror.bit.edu.cn/apache/hive/. In this post the downloaded archive is saved as /home/hadoop/tools/apache-hive-2.2.0-bin.tar.gz.

c) Move the archive to the target directory and extract it. Here it is extracted under /usr/local/, the same directory where Hadoop and Spark are installed.

sudo mv /home/hadoop/tools/apache-hive-2.2.0-bin.tar.gz /usr/local/
cd /usr/local
sudo tar -zxvf apache-hive-2.2.0-bin.tar.gz

d) Configure environment variables

vim ~/.bashrc
export HIVE_HOME=/usr/local/apache-hive-2.2.0-bin
export PATH=$PATH:${HIVE_HOME}/bin

Apply the environment variables:

 source ~/.bashrc
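A quick way to confirm the variables took effect (assuming the paths above):

echo $HIVE_HOME     # should print /usr/local/apache-hive-2.2.0-bin
hive --version      # should report Hive 2.2.0 if bin is on PATH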

e) Create a new hive-site.xml in the conf directory and configure Hive to store its metadata in MySQL.

hadoop@Master:/usr/local/apache-hive-2.2.0-bin/conf$ touch hive-site.xml 

The contents of hive-site.xml:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
   <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>0000</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
        <description>
        Enforce metastore schema version consistency.
        True: Verify that version information stored in metastore matches with one from Hive jars.  Also disable automatic
              schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
              proper metastore schema migration. (Default)
        False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
        </description>
    </property>
</configuration>
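Two things are usually also needed before starting a MySQL-backed Hive 2.x metastore (these are assumptions about a typical setup, not steps shown in the original post): the MySQL JDBC driver jar must be in Hive's lib directory, and the metastore schema normally has to be initialized once with schematool. The driver jar name and location below are only examples.

# copy the MySQL JDBC driver into Hive's lib directory (jar name/version is illustrative)
sudo cp /home/hadoop/tools/mysql-connector-java-5.1.44.jar /usr/local/apache-hive-2.2.0-bin/lib/

# initialize the metastore schema in MySQL (run once)
schematool -dbType mysql -initSchema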

f) Start Hive by simply running the hive command.

Output on a successful start:

hadoop@Master:~$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-2.2.0-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/apache-hive-2.2.0-bin/lib/hive-common-2.2.0.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> 
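At the hive> prompt, a quick smoke test can confirm the MySQL-backed metastore is working; the test table below is only an illustration:

hive> show databases;
hive> create table test(id int, name string);
hive> show tables;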

2. Integrating Spark with Hive

a) Create a hive-site.xml file in Spark's conf directory:

hadoop@Master:/usr/local/spark-2.2.0-bin-hadoop2.7/conf$ touch hive-site.xml 
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://Master:9083</value>
    </property>
</configuration>

b) Start Hadoop and Spark, for example:
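A sketch using the standard start scripts, assuming the install locations used elsewhere in this post and a Spark standalone cluster:

# start HDFS and YARN
/usr/local/hadoop-2.7.3/sbin/start-dfs.sh
/usr/local/hadoop-2.7.3/sbin/start-yarn.sh

# start the Spark standalone cluster
/usr/local/spark-2.2.0-bin-hadoop2.7/sbin/start-all.sh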

c) Start the Hive metastore service:

hadoop@Master:/usr/local/spark-2.2.0-bin-hadoop2.7/bin$ hive --service metastore&
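Before moving on, it can help to confirm the metastore service is actually listening on port 9083 (the port referenced in Spark's hive-site.xml); one way:

netstat -tlnp | grep 9083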

d) Start spark-sql to test the integration:

hadoop@Master:/usr/local/spark-2.2.0-bin-hadoop2.7/bin$ ./spark-sql 

Partial output after a successful start:

17/11/19 21:50:37 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/ce95f463-74ca-42de-ac85-3a283aa1520a
17/11/19 21:50:37 INFO SessionState: Created local directory: /tmp/hadoop/ce95f463-74ca-42de-ac85-3a283aa1520a
17/11/19 21:50:37 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/ce95f463-74ca-42de-ac85-3a283aa1520a/_tmp_space.db
17/11/19 21:50:37 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/usr/local/spark-2.2.0-bin-hadoop2.7/bin/spark-warehouse
17/11/19 21:50:37 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
17/11/19 21:50:38 INFO SessionState: Created local directory: /tmp/2110b645-b83e-4b65-87a8-5e9f1482699e_resources
17/11/19 21:50:38 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/2110b645-b83e-4b65-87a8-5e9f1482699e
17/11/19 21:50:38 INFO SessionState: Created local directory: /tmp/hadoop/2110b645-b83e-4b65-87a8-5e9f1482699e
17/11/19 21:50:38 INFO SessionState: Created HDFS directory: /tmp/hive/hadoop/2110b645-b83e-4b65-87a8-5e9f1482699e/_tmp_space.db
17/11/19 21:50:38 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1) is file:/usr/local/spark-2.2.0-bin-hadoop2.7/bin/spark-warehouse
spark-sql> 
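From the spark-sql> prompt, queries now run against the same MySQL-backed metastore as Hive. A small illustrative check — the test table refers to the one created in the earlier Hive session and is only an example:

spark-sql> show databases;
spark-sql> show tables;
spark-sql> select * from test;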

 

Reposted from: https://my.oschina.net/u/729917/blog/1575860
