Introduction to Hive
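Hive is a data warehouse tool built on Hadoop for extracting, transforming, and loading (ETL) data; it provides a mechanism for storing, querying, and analyzing large-scale data kept in Hadoop. Hive maps structured data files onto database tables and offers SQL querying, translating SQL statements into MapReduce jobs for execution. Its main advantage is a low learning curve: SQL-like statements are enough to run MapReduce aggregations quickly, so there is no need to write dedicated MapReduce applications. Hive is well suited to statistical analysis in data warehouses.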
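To make the idea of mapping a structured file onto a table concrete, here is a minimal shell sketch. The helper name hive_csv_table_ddl, the table access_log, its columns, and the HDFS path /data/access_log are all hypothetical; the function only prints standard HiveQL (CREATE EXTERNAL TABLE ... ROW FORMAT DELIMITED ... LOCATION) that could then be run with hive -f.

```shell
#!/bin/sh
# Hypothetical helper: print HiveQL that maps a comma-delimited file
# (stored in an HDFS directory) onto a Hive table.
# Usage: hive_csv_table_ddl TABLE HDFS_DIR COL:TYPE [COL:TYPE ...]
hive_csv_table_ddl() {
  table=$1
  location=$2
  shift 2
  cols=""
  for spec in "$@"; do
    name=${spec%%:*}   # text before the first ':'
    type=${spec#*:}    # text after the first ':'
    if [ -n "$cols" ]; then
      cols="$cols, "
    fi
    cols="$cols$name $type"
  done
  printf 'CREATE EXTERNAL TABLE IF NOT EXISTS %s (%s)\n' "$table" "$cols"
  printf "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
  printf "LOCATION '%s';\n" "$location"
}
```

For example, hive_csv_table_ddl access_log /data/access_log ip:STRING url:STRING status:INT > access_log.sql followed by hive -f access_log.sql would define the table; queries against it are then compiled into MapReduce jobs. An EXTERNAL table is used here so that dropping the table leaves the underlying files in place.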
Hive use cases
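Hive is built on Hadoop, which is designed for static batch processing. Hadoop typically has high latency and incurs significant overhead when submitting and scheduling jobs. As a result, Hive cannot deliver low-latency queries over large datasets and is not suitable for applications that need real-time responses, such as online transaction processing (OLTP). Hive queries strictly follow the Hadoop MapReduce job execution model: Hive's interpreter translates the user's HiveQL statements into MapReduce jobs and submits them to the Hadoop cluster, Hadoop monitors their execution, and the results are then returned to the user. Hive was not designed for OLTP and does not offer real-time queries or row-level updates. It is best used for batch jobs over large datasets, such as web log analysis.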
Installing Hive
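- Install MySQL before installing Hive (Hive stores its metastore in it)
Follow a MySQL installation guide; once MySQL is installed, grant privileges to the root user: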
mysql> use mysql
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> grant all on *.* to root@'hadoop110';
Query OK, 0 rows affected (0.00 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
mysql>
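The same two statements can also be fed to the mysql client non-interactively. A minimal sketch, assuming a hypothetical helper mysql_grant_sql that only prints the SQL from the session above with the host as a parameter. Depending on the MySQL version (for example 5.7 with NO_AUTO_CREATE_USER in sql_mode, or 8.0), GRANT may fail unless the account root@'hadoop110' already exists, in which case it has to be created first with CREATE USER.

```shell
#!/bin/sh
# Hypothetical helper: print the GRANT/FLUSH statements from the session
# above for a given host, so they can be piped into the mysql client.
# Usage: mysql_grant_sql HOST
mysql_grant_sql() {
  host=$1
  printf "GRANT ALL ON *.* TO root@'%s';\n" "$host"
  printf 'FLUSH PRIVILEGES;\n'
}
```

Usage: mysql_grant_sql hadoop110 | mysql -u root -p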
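- Hive installation package (extraction code: 8vll)
- Extract the Hive package under /opt (the rest of this guide assumes /opt/hive)
- Configure the environment variables: open the file with vi /etc/profile and add the following: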
export HIVE_PATH=/opt/hive
export PATH=$PATH:$HIVE_PATH/bin
- Apply the configuration by running:
source /etc/profile
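Appending to /etc/profile by hand is easy to repeat by accident. A minimal idempotent sketch, assuming a hypothetical helper add_hive_profile that writes the same two lines shown above, but only when HIVE_PATH is not yet exported in the file:

```shell
#!/bin/sh
# Hypothetical helper: append the Hive environment lines to a profile file
# only if HIVE_PATH is not already exported there, so re-running is safe.
# Usage: add_hive_profile PROFILE_FILE HIVE_DIR
add_hive_profile() {
  profile=$1
  hive_dir=$2
  if ! grep -q '^export HIVE_PATH=' "$profile" 2>/dev/null; then
    {
      printf 'export HIVE_PATH=%s\n' "$hive_dir"
      # Single quotes keep $PATH and $HIVE_PATH literal in the file.
      printf 'export PATH=$PATH:$HIVE_PATH/bin\n'
    } >> "$profile"
  fi
}
```

Usage: add_hive_profile /etc/profile /opt/hive, then source /etc/profile so the current shell picks up the change.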
Configuring Hive
- Go to /opt/hive/conf/ (cd /opt/hive/conf/), create a file named hive-site.xml, and enter:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>hdfs://hadoop110:9000/hive/warehouse</value>
<description>fs.default.name</description>
<!-- Where managed tables are stored: either a Linux directory or a path relative to fs.default.name -->
</property>
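<!-- Not recognized by this Hive version: startup logs "HiveConf of name hive.metastore.local does not exist" -->
<property>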
<name>hive.metastore.local</name>
<value>true</value>
</property>
<!-- MySQL address where Hive stores its metadata -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<!-- JDBC driver for the metastore database -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<!-- Username for the metastore database -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<!-- Password for the metastore database (note: this is the MySQL root user's own password) -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>ok</value>
</property>
</configuration>
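To double-check what Hive will read, a small line-based lookup can print a single property's value. This is a sketch that assumes the one-element-per-line layout used above; the helper name hive_site_get is hypothetical, and awk stands in for a real XML parser, so values split across lines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of a property from a hive-site.xml
# laid out like the one above (each <name> and <value> on its own line).
# Line-based awk sketch, not a real XML parser.
# Usage: hive_site_get FILE PROPERTY_NAME
hive_site_get() {
  awk -v want="$2" '
    # Remember that the requested <name> line has been seen.
    index($0, "<name>" want "</name>") { found = 1; next }
    # Print the first <value> after it, without the surrounding tags.
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}
```

With the file above, hive_site_get /opt/hive/conf/hive-site.xml javax.jdo.option.ConnectionURL should print jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true.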
- Create hive-env.sh in the same directory and enter:
export HADOOP_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HIVE_AUX_JARS_PATH=/opt/hive/lib
export JAVA_HOME=/opt/jdk
export HIVE_CONF_DIR=/opt/hive/conf
- Rename hive-log4j.properties.template to hive-log4j.properties and change line 20 to:
hive.log.dir=/opt/hive/logs
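Editing by line number depends on the exact template shipped with a given release. A sketch that matches the hive.log.dir key instead; the helper name configure_hive_log4j is hypothetical, and it writes through a temporary file rather than using sed -i, because GNU and BSD sed handle -i differently. If the template has no hive.log.dir line, the file is left unchanged.

```shell
#!/bin/sh
# Hypothetical helper: create hive-log4j.properties from the template and
# set hive.log.dir by matching the key rather than a fixed line number.
# Usage: configure_hive_log4j CONF_DIR LOG_DIR
configure_hive_log4j() {
  conf_dir=$1
  log_dir=$2
  mv "$conf_dir/hive-log4j.properties.template" "$conf_dir/hive-log4j.properties" || return 1
  # Portable in-place edit: write to a temp file, then move it back.
  sed "s|^hive\.log\.dir=.*|hive.log.dir=$log_dir|" \
    "$conf_dir/hive-log4j.properties" > "$conf_dir/hive-log4j.properties.tmp" &&
    mv "$conf_dir/hive-log4j.properties.tmp" "$conf_dir/hive-log4j.properties"
}
```

Usage: configure_hive_log4j /opt/hive/conf /opt/hive/logs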
- Create the directories referenced by the configuration:
mkdir /opt/hive/warehouse
mkdir /opt/hive/logs
- Copy mysql-connector-java-5.1.48-bin.jar into /opt/hive/lib/ and grant permissions on the hive directory:
chmod 777 /opt/hive
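The preparation steps above can be combined into one re-runnable function. A minimal sketch; the name prepare_hive_dirs is hypothetical, the jar path is passed in rather than assumed, and the permissions mirror the chmod 777 used in this guide.

```shell
#!/bin/sh
# Hypothetical helper: create the local directories, copy the MySQL JDBC
# driver into lib/, and open up permissions on the Hive directory.
# Usage: prepare_hive_dirs HIVE_DIR CONNECTOR_JAR
prepare_hive_dirs() {
  hive_dir=$1
  jar=$2
  # mkdir -p does not fail when a directory already exists.
  mkdir -p "$hive_dir/warehouse" "$hive_dir/logs" "$hive_dir/lib" || return 1
  cp "$jar" "$hive_dir/lib/" &&
    chmod 777 "$hive_dir"
}
```

Usage: prepare_hive_dirs /opt/hive ./mysql-connector-java-5.1.48-bin.jar (the jar location depends on where it was downloaded).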
Starting and verifying Hive
- Initialize the metastore schema in MySQL: schematool -initSchema -dbType mysql
If the command is not found, run it from Hive's bin directory.
[root@localhost lib]# schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-06-28 21:20:06,615 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/28 21:20:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/28 21:20:07 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
Metastore connection URL: jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 1.1.0-cdh5.14.2
Initialization script hive-schema-1.1.0.mysql.sql
Initialization script completed
schemaTool completed
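A "command not found" error here often means /etc/profile has not been sourced in the current shell. A sketch of a hypothetical helper find_schematool that prefers schematool on PATH and otherwise falls back to the bin directory of a given Hive installation:

```shell
#!/bin/sh
# Hypothetical helper: print the schematool to use, preferring PATH and
# falling back to HIVE_DIR/bin when PATH has not been updated yet.
# Usage: find_schematool HIVE_DIR
find_schematool() {
  if command -v schematool >/dev/null 2>&1; then
    command -v schematool
  elif [ -x "$1/bin/schematool" ]; then
    printf '%s\n' "$1/bin/schematool"
  else
    echo "schematool not found on PATH or in $1/bin" >&2
    return 1
  fi
}
```

Usage: "$(find_schematool /opt/hive)" -initSchema -dbType mysql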
- Start the Hive CLI by entering hive:
[root@localhost lib]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hbase/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2020-06-28 21:20:16,938 WARN [main] util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/28 21:20:17 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/28 21:20:18 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
Logging initialized using configuration in file:/opt/hive/conf/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> show tables;
OK
Time taken: 2.679 seconds
hive>
- beeline
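Grant permissions on /tmp and /hive/warehouse in HDFS, then start HiveServer2 (it runs in the foreground, so use a second terminal for Beeline):
hiveserver2
Then connect with Beeline from Hive's bin directory:
./beeline -u jdbc:hive2://localhost:10000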
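The HDFS permission step is not spelled out above. A sketch using the hdfs dfs -mkdir -p and -chmod -R subcommands, with a default mode of 777 (an assumption; Hive's Getting Started guide uses the narrower g+w). The helper names run_or_print and hive_hdfs_permissions are hypothetical, and DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run_or_print() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: create /tmp and /hive/warehouse in HDFS and grant
# permissions on them. MODE defaults to 777 (an assumption).
# Usage: hive_hdfs_permissions [MODE]
hive_hdfs_permissions() {
  mode=${1:-777}
  run_or_print hdfs dfs -mkdir -p /tmp /hive/warehouse &&
    run_or_print hdfs dfs -chmod -R "$mode" /tmp /hive/warehouse
}
```

Run DRY_RUN=1 hive_hdfs_permissions to preview the commands, then hive_hdfs_permissions (or hive_hdfs_permissions g+w) before starting hiveserver2.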