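Hive is a data warehouse built on Hadoop, so Hadoop must be installed and configured before installing Hive, and both HDFS and YARN must be running (YARN handles resource scheduling). This article stores Hive's metadata in MySQL, so install and configure MySQL on the local machine first.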
Before we start
- Host environment: VMware® Workstation 12 Pro 12.5.6 build-5528349
- Linux version: CentOS-7-x86_64-Minimal-1611.iso
- JDK version: jdk-8u65-linux-x64.tar.gz
- Hadoop version: hadoop-2.7.6.tar.gz
- MySQL version: mysql-5.6.41-linux-glibc2.12-x86_64.tar.gz
- Hive version: apache-hive-2.3.3-bin.tar.gz
I. Installing Hive
- Download Hive: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/stable-2/
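- Extract apache-hive-2.3.3-bin.tar.gz into /soft and create a symbolic link
//extract apache-hive-2.3.3-bin.tar.gz
$> tar -xzvf apache-hive-2.3.3-bin.tar.gz -C /soft
//create the symbolic link /soft/hive pointing at the extracted directory (absolute paths, so it works from any working directory)
$> ln -s /soft/apache-hive-2.3.3-bin /soft/hive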
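- Configure the environment variables in /etc/profile
$> sudo nano /etc/profile
//add the following lines
export HIVE_HOME=/soft/hive
export PATH=$PATH:$HIVE_HOME/bin
//reload the profile so the variables take effect in the current shell
$> source /etc/profile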
- Verify that Hive is set up correctly: the command below should print the version number
$> hive --version
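If hive --version reports "command not found", the environment variables are the usual suspect. The following is a minimal sketch of a shell function that checks them; the name check_hive_env is made up for this example and is not part of Hive.

```shell
# Hypothetical helper (not shipped with Hive): verify that HIVE_HOME is set
# and that $HIVE_HOME/bin appears as an entry in PATH.
check_hive_env() {
  if [ -z "${HIVE_HOME:-}" ]; then
    echo "HIVE_HOME is not set"
    return 1
  fi
  # Wrap PATH in colons so the match works for the first and last entries too.
  case ":$PATH:" in
    *":$HIVE_HOME/bin:"*) echo "ok: $HIVE_HOME/bin is on PATH" ;;
    *) echo "$HIVE_HOME/bin is missing from PATH"; return 1 ;;
  esac
}
```

Paste the function into a shell and run check_hive_env after reloading /etc/profile.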
II. Configuring Hive
Use MySQL to store Hive's metadata
- Copy the MySQL JDBC driver jar into /soft/hive/lib
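The copy step can be sketched as a small shell function. The function name install_jdbc_driver is hypothetical, and the jar path is left as an argument because the file name depends on which MySQL Connector/J version you downloaded.

```shell
# Hypothetical helper: copy a MySQL Connector/J jar into Hive's lib directory.
# $1 = path to the driver jar
# $2 = Hive lib directory (defaults to /soft/hive/lib, the path used in this article)
install_jdbc_driver() {
  jar="$1"
  lib_dir="${2:-/soft/hive/lib}"
  if [ ! -f "$jar" ]; then
    echo "driver jar not found: $jar" >&2
    return 1
  fi
  cp "$jar" "$lib_dir/" && echo "copied $(basename "$jar") to $lib_dir"
}
```

Call it with the path to the jar you downloaded as the first argument; the second argument is only needed if Hive lives somewhere other than /soft/hive.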
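- Copy /soft/hive/conf/hive-default.xml.template to hive-site.xml, then set the properties below. In the template, the local-directory properties use ${system:java.io.tmpdir} placeholders, which is why they are given concrete paths here.
$> cp /soft/hive/conf/hive-default.xml.template /soft/hive/conf/hive-site.xml
//set the following properties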
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive</value>
<description>
JDBC connect string for a JDBC metastore.
To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/home/centosmin0/hive</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/home/centosmin0/hive/downloads</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/home/centosmin0/hive/querylog</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/home/centosmin0/hive/server2_logs</value>
<description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>
Enforce metastore schema version consistency.
True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
proper metastore schema migration. (Default)
False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
</description>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>false</value>
<description>
Setting this property to true will have HiveServer2 execute
Hive operations as the user making the calls to it.
</description>
</property>
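The four local paths configured above (scratch directory, downloaded resources, query log, and HiveServer2 operation logs) all sit under /home/centosmin0/hive. Hive usually creates them itself when it has permission, so the following is only an optional sketch for creating them up front and surfacing permission problems early. The function name make_hive_local_dirs is made up for this example.

```shell
# Hypothetical helper: create the local directories referenced in hive-site.xml.
# $1 = base directory (the default matches the values configured above)
make_hive_local_dirs() {
  base="${1:-/home/centosmin0/hive}"
  # The scratch dir is the base itself; the other three are subdirectories of it.
  mkdir -p "$base" "$base/downloads" "$base/querylog" "$base/server2_logs"
}
```

Run make_hive_local_dirs as the user that will start Hive, passing a different base directory if your user name is not centosmin0.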
- In MySQL, create the database hive that will hold Hive's metadata
mysql> create database hive;
- Initialize Hive's metadata schema in MySQL
$> cd /soft/hive/bin
$> schematool -dbType mysql -initSchema
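Note: initialization may well fail at this step with an access-denied (permission) error
Fix: in MySQL, grant privileges to the account at 127.0.0.1: grant all privileges on *.* to 'root'@'127.0.0.1' identified by 'root';
When schematool finishes without errors, the metadata has been initialized successfully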
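- Check HDFS with the hdfs command: Hive's default warehouse directory is /user/hive/warehouse (it may not show up until it has been created manually or Hive has first written to it)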
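The HDFS check can be sketched as follows, assuming the hdfs command is on PATH and HDFS is running. The function name ensure_hive_warehouse is made up for this example; it lists the warehouse directory and creates it first if it is missing.

```shell
# Hypothetical helper: make sure the Hive warehouse directory exists in HDFS,
# then list it. /user/hive/warehouse is the default of hive.metastore.warehouse.dir.
ensure_hive_warehouse() {
  warehouse="${1:-/user/hive/warehouse}"
  # `hdfs dfs -test -d` exits 0 when the path exists and is a directory.
  if ! hdfs dfs -test -d "$warehouse"; then
    hdfs dfs -mkdir -p "$warehouse"
    hdfs dfs -chmod g+w "$warehouse"
  fi
  hdfs dfs -ls "$warehouse"
}
```

On a fresh install the listing is empty; subdirectories such as mydb2.db appear once databases are created.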
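III. Basic Hive command-line operations
Hive's statements are largely the same as MySQL's
$> hive --version //print the version (run from the Linux shell)
$> hive --help //show help (run from the Linux shell)
$> hive //start the Hive CLI; same as hive --service cli
$hive>create database mydb2 ; //create the database mydb2
$hive>show databases ; //list all databases
$hive>use mydb2 ; //switch to mydb2
$hive>create table mydb2.t(id int,name string,age int); //create table t in mydb2
$hive>select * from mydb2.t ; //query the table in the given database
$hive>drop table t ; //drop the table from the current database
$hive>drop table mydb2.t ; //or drop it by its fully qualified name
$hive>exit ; //quit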
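The same statements can also be run non-interactively. Below is a minimal sketch that writes them to a temporary script and runs it with hive -f; the function name run_hive_demo is made up for this example, and IF NOT EXISTS is added so the script can be re-run safely.

```shell
# Hypothetical helper: run this section's statements as one batch via `hive -f`.
run_hive_demo() {
  script="$(mktemp)"
  # Quoted heredoc delimiter: the shell passes the SQL through unchanged.
  cat > "$script" <<'EOF'
CREATE DATABASE IF NOT EXISTS mydb2;
USE mydb2;
CREATE TABLE IF NOT EXISTS mydb2.t (id INT, name STRING, age INT);
SELECT * FROM mydb2.t;
DROP TABLE mydb2.t;
EOF
  hive -f "$script"
  status=$?
  rm -f "$script"
  return $status
}
```

For a single statement, hive -e "show databases;" does the same job without a script file.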