Common Hive Commands and a Hive API Tutorial by Example

墨绳

Last modified 2022-08-19 11:09:58

Categories: hive, hands-on tutorials. Tags: hive, hadoop, big data

First published 2021-10-28 17:45:52

Article link: https://blog.csdn.net/weixin_43206536/article/details/121008542


Contents

Preface
I. What Is Hive?
II. Installing and Configuring Hive
Summary

Preface

The body of the article follows; the examples below are for reference.

I. What Is Hive?

This article covers how to install and configure Hive, and how Hive relates to Hadoop.

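II. Installing and Configuring Hive

1. Version selection

As the saying goes, sharpening the axe does not delay the woodcutting. The components have to be on mutually compatible versions; these are the ones I used: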

apache-tomcat-8.5.66.tar.gz
apache-zookeeper-3.5.9-bin.tar.gz
hadoop-2.7.5.tar.gz
hbase-1.7.0-bin.tar.gz
apache-hive-2.3.0-bin.tar.gz
mysql57-community-release-el7-11.noarch.rpm

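To see why these versions go together, download the Hive source (src) package and check which dependency versions it uses:
[Screenshot: dependency versions in the Hive source package]
Hive 2.3.0 is built against ZooKeeper 3.4.6. The same source shows the Hadoop and HBase versions:

Hadoop is 2.7.2 and HBase is 1.1.1, both close to the versions listed above. I do not recommend picking the newest releases; the dependency combinations that are most widely used on Maven are more reliable.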

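2. Setting up and configuring Hive

2.1 Configuring the Hive environment variables

In this article Hive is unpacked into the following directory: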

[root@hadoop-2 apache-hive-2.3.0-bin]# pwd
/export/server/apache-hive-2.3.0-bin

Then set the Hive environment variables:

[root@hadoop-2 apache-hive-2.3.0-bin]# vi /etc/profile


# Hive environment variables
export HIVE_HOME=/export/server/apache-hive-2.3.0-bin
export PATH=$PATH:$HIVE_HOME/bin

# After saving and exiting
[root@hadoop-2 apache-hive-2.3.0-bin]# source /etc/profile

2.2 Editing Hive's configuration files

Open Hive's hive-env.sh. I changed only the following:
#Set HADOOP_HOME to point to a specific hadoop install directory
#HADOOP_HOME=${bin}/../../hadoop
export HADOOP_HOME=/export/server/hadoop
export JAVA_HOME=/export/server/jdk1.8
export HIVE_HOME=/export/server/apache-hive-2.3.0-bin
#Hive Configuration Directory can be controlled by:
#export HIVE_CONF_DIR=
export HIVE_CONF_DIR=/export/server/apache-hive-2.3.0-bin/conf

The full configuration file is as follows:

# Change to the conf directory
[root@hadoop-2 conf]# pwd
/export/server/apache-hive-2.3.0-bin/conf

[root@hadoop-2 conf]# vi hive-env.sh

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI etc.) is available via the environment
# variable SERVICE


# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi

# The heap size of the jvm stared by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be
# appropriate for hive server.


# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
export HADOOP_HOME=/export/server/hadoop
export JAVA_HOME=/export/server/jdk1.8
export HIVE_HOME=/export/server/apache-hive-2.3.0-bin

# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
export HIVE_CONF_DIR=/export/server/apache-hive-2.3.0-bin/conf


# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=

Open Hive's hive-site.xml:

# Open hive-site.xml
[root@hadoop-2 conf]# vi hive-site.xml 

<configuration>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/user/hive/tmp</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/user/hive/log</value>
  </property>
  <property>
    <!-- Use a local (embedded) metastore -->
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop-1:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=latin1&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.transport.mode</name>
    <value>binary</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.webui.host</name>
    <value>127.0.0.1</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>
</configuration>

Of these,

<property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>

are there to prevent metastore schema errors during initialization.

The setting

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>

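is the location in Hadoop's HDFS file system where Hive stores its databases, as shown below:
[Screenshot: HDFS listing of /user/hive/warehouse]
Each name in the listing is a table name.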

The settings

  <property>
    <!-- Use a local (embedded) metastore -->
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop-1:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=latin1&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>

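are, from top to bottom:
1. hive.metastore.local says whether the metastore runs locally (embedded in the Hive process) or as a separate remote service; set it to false for a remote one. Mine is local, so it is true. As far as I know, this property has had no effect since Hive 0.10: Hive 2.3.0 treats the metastore as remote only when hive.metastore.uris is set, so the entry is harmless but can be left out.
2. The JDBC URL of the MySQL database that backs the metastore. Create a database named hive in MySQL first. Its character set must be latin1, not utf8; with utf8 the metastore tables tend to fail with encoding-related errors once Hive is in use (typically MySQL key-length errors), which is painful to track down.
3. The JDBC driver class for MySQL.
4. The MySQL user name.
5. The MySQL password.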
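
The ConnectionURL value above is an ordinary MySQL JDBC URL; its `&` separators are written as `&amp;` only because hive-site.xml is XML. The following is a minimal Java sketch of how such a value could be assembled and escaped. The class and method names (MetastoreUrl, jdbcUrl, xmlEscape) are made up for illustration and are not part of Hive or MySQL Connector/J.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Hive: assembles the metastore JDBC URL
// and escapes it for use inside an XML <value> element.
final class MetastoreUrl {

    // Builds jdbc:mysql://host:port/db?key=value&key=value
    static String jdbcUrl(String host, int port, String db, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + e.getValue());
        }
        String base = "jdbc:mysql://" + host + ":" + port + "/" + db;
        return params.isEmpty() ? base : base + "?" + query;
    }

    // Escapes the characters that are special in XML text content.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("createDatabaseIfNotExist", "true");
        params.put("characterEncoding", "latin1");
        params.put("useSSL", "false");
        String url = jdbcUrl("hadoop-1", 3306, "hive", params);
        System.out.println(url);            // the form the JDBC driver receives
        System.out.println(xmlEscape(url)); // the form written into hive-site.xml
    }
}
```

The second printed line is the form that belongs inside the value element; pasting the unescaped URL into the XML file would make it ill-formed.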

The settings

  <property>
    <name>hive.server2.transport.mode</name>
    <value>binary</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.server2.webui.host</name>
    <value>127.0.0.1</value>
  </property>
  <property>
    <name>hive.server2.webui.port</name>
    <value>10002</value>
  </property>

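are, from top to bottom:
1. The transport mode of the Hive service: either http or binary. I chose binary.
2. The host (IP) the Hive service binds to, set to localhost here.
3. The port the Hive service listens on; the default is 10000.
4-5. The host and port of the Hive service's web UI.
Note: "the Hive service" here means HiveServer2.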

2.3 Hadoop settings that affect Hive

Open core-site.xml and add:

 <property>
    <!-- Allow the root user to act as a proxy user from any host -->
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>

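The xxx in hadoop.proxyuser.xxx.hosts is a user name, and it has to match the OS user that runs the Hive service on your server. Here is a screenshot of my shell prompt:
[Screenshot: shell prompt showing user root on host hadoop-2]
My user is root and the host name is hadoop-2, so xxx is root. If you configure this and then log in as a different user, beeline fails with a connection-refused error. I was stuck on this for a long time before I worked it out, so it bears repeating: check the user name, check it again, and check it once more.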
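
To avoid guessing the xxx part, the property names can be derived from the account that is actually logged in. The following is a minimal Java sketch, assuming it is run as the same OS user that starts HiveServer2; the class ProxyUserProps and its method are hypothetical helpers, not a Hadoop API.

```java
// Hypothetical helper, not a Hadoop API: prints the core-site.xml
// proxy-user entries for the OS user that runs this JVM.
final class ProxyUserProps {

    // Renders one <property> element for hadoop.proxyuser.<user>.<suffix>
    static String property(String user, String suffix, String value) {
        return "<property>\n"
             + "  <name>hadoop.proxyuser." + user + "." + suffix + "</name>\n"
             + "  <value>" + value + "</value>\n"
             + "</property>";
    }

    public static void main(String[] args) {
        // user.name is the standard JVM system property for the current OS user
        String user = System.getProperty("user.name");
        System.out.println(property(user, "hosts", "*"));
        System.out.println(property(user, "groups", "*"));
    }
}
```

Run as root, this prints the two entries shown above; run as another account, it prints the names that account would need.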

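3. Basic Hive shell operations

Start MySQL first (service mysqld start), then ZooKeeper, then the Hadoop services. Because the Hive environment variables are set, typing hive in any directory opens the Hive shell. Hive shell commands are very close to MySQL commands. Hive's main job is to spare you from writing Hadoop MapReduce code by hand: it compiles queries into MapReduce jobs that run over data stored in HDFS (or in HBase, through a storage handler) to do statistics, filtering, fast lookups, and so on.
If you have changed Hive's database connection settings, you have to initialize the metastore schema in MySQL before Hive can connect: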

# Change to Hive's bin directory and initialize the metastore schema
[root@hadoop-2 bin]# ./schematool -dbType mysql -initSchema
# Start the HiveServer2 service
[root@hadoop-2 bin]# ./hive --service hiveserver2
# Open a second session and start the beeline client
[root@hadoop-2 bin]# ./beeline
# Connect to HiveServer2 from the beeline prompt
beeline> !connect jdbc:hive2://hadoop-2:10000
# If the connection still fails, check whether the mysql-connector-java jar is missing from Hive's lib directory

# Start the Hive CLI
[root@hadoop-2 apache-hive-2.3.0-bin]# hive

-- Hive shell commands
-- List databases
show databases;
-- Create a database
create database financials;
-- List databases whose names start with f
show databases like 'f.*';
-- Show details of a database
describe database financials;

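As shown:
[Screenshot: output of describe database financials]
The location is hdfs://ns1/user/hive/warehouse/financials.db, which splits into hdfs://ns1 and /user/hive/warehouse/financials.db.
1. hdfs://ns1 identifies which Hadoop cluster's HDFS holds the database. I had previously configured a Hadoop cluster named ns1, as in this configuration: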

 <property>
    <!-- Which NameNodes the nameservice contains, with a name for each -->
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>

nn1: 192.168.xxx.xxx
nn2: 192.168.xxx.xxx
These are the addresses of two virtual machines.
2. /user/hive/warehouse/financials.db is where the database is stored, relative to the root directory / of HDFS.
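
The split described in the two points above is what java.net.URI does when it parses the location string. A minimal Java sketch follows; the class name WarehouseLocation and its methods are illustrative only and not part of Hive.

```java
import java.net.URI;

// Illustrative helper: splits a Hive database location into the
// cluster part (scheme + nameservice) and the HDFS path.
final class WarehouseLocation {

    // "hdfs://ns1/user/hive/..." -> "hdfs://ns1"
    static String cluster(String location) {
        URI uri = URI.create(location);
        return uri.getScheme() + "://" + uri.getAuthority();
    }

    // "hdfs://ns1/user/hive/..." -> "/user/hive/..."
    static String path(String location) {
        return URI.create(location).getPath();
    }

    public static void main(String[] args) {
        String location = "hdfs://ns1/user/hive/warehouse/financials.db";
        System.out.println(cluster(location));
        System.out.println(path(location));
    }
}
```

Here ns1 is the logical nameservice from the HA configuration, so the authority part is a cluster name rather than a single host and port.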

-- Create a database at a specific HDFS location
create database financials1 location '/user/hive/warehouse/test';
-- Create the table employee1
create table employee1(
name string,
salary float,
subordinates array<string>,
deductions map<string,float>,
address struct<street:string,city:string,state:string,zip:int>);
-- Then create a table emp with the same structure as employee1
create table if not exists emp like employee1;

-- Show detailed information about the table
describe extended emp;

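4. Hive API operations

4.1 Starting the HiveServer2 service

Before using Hive: if MySQL is deployed on a remote machine and hive.metastore.local=false, start Hive's metastore service first. The MySQL driver jar, mysql-connector-java-5.1.40-bin.jar, has to be in Hive's lib directory:
[Screenshot: mysql-connector-java-5.1.40-bin.jar in Hive's lib directory]

$ hive --service metastore
Then start MySQL and Hadoop and run the following: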

# cd to Hive's bin directory
[root@hadoop-1 bin]# pwd
/export/server/apache-hive-2.3.0-bin/bin
# Then run
[root@hadoop-1 bin]# hive --service hiveserver2
# Open a new terminal and use the following command to check that HiveServer2 has started
[root@hadoop-1 bin]# netstat -anop|grep 10000

[Screenshot: netstat output showing port 10000 in use]
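
The same check can be made from the machine that will run the Java client, which also confirms that the port is reachable over the network and not only on the server itself. The following is a minimal sketch using a plain TCP connect; the class PortCheck is a hypothetical helper, and the host name and port are the ones used in this article.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether a TCP port accepts connections.
final class PortCheck {

    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;  // something is listening and accepted the connection
        } catch (IOException e) {
            return false; // refused, timed out, or host not resolvable
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "hadoop-1";
        System.out.println(host + ":10000 open? " + isOpen(host, 10000, 2000));
    }
}
```

A true result only shows that something is listening on the port; it does not prove that the listener is HiveServer2 or that login will succeed.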
You can also confirm it with the following steps:

# Start the Hive CLI
[root@hadoop-1 bin]# hive
# List the Hive databases
hive> show databases;
OK
default
financials
financials1
financials3
financials4
test

Here we will connect to the default database. First quit the Hive CLI, then connect a different way:

# Quit the Hive CLI
hive> quit;
# Connect to HiveServer2
[root@hadoop-1 /]# beeline

beeline> !connect jdbc:hive2://hadoop-1:10000

# The user name and password used here are both root
Enter username for jdbc:hive2://hadoop-1:10000: root
Enter password for jdbc:hive2://hadoop-1:10000: root

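As shown:
[Screenshot: beeline connected to HiveServer2]
Once this connection succeeds, you can go on to create the Java project in IDEA. If this step fails, the steps below will fail too.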

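4.2 Using Hive in practice

Create a Java project in IDEA named apache-hive-test, create a lib folder next to src, and add every jar from the lib directory of the Hive installation to it:
[Screenshot: Hive jars in the project's lib folder]
Then add three jars from the Hadoop installation to the project: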

hadoop-common-2.7.5.jar
commons-cli-1.2.jar
hadoop-mapreduce-client-core-2.7.5.jar

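They are in the following directories, respectively:
/export/server/hadoop/share/hadoop/common/
/export/server/hadoop/share/hadoop/common/lib/
/export/server/hadoop/share/hadoop/mapreduce/
Because the MySQL jar was already placed in Hive's lib directory earlier, it does not need to be added again.

Create a text file in the Hive directory as follows: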

[root@hadoop-1 apache-hive-2.3.0-bin]# touch userinfo.txt
[root@hadoop-1 apache-hive-2.3.0-bin]# vi  userinfo.txt



1          xiaopi
2          xiaoxue
3          qingqing
4          wangwu
5          zhangsan
6          lisi

Write the program:

package com.hive;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.sql.*;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";
    private static String url = "jdbc:hive2://192.168.149.110:10000/default";
    // Same credentials as the beeline login above
    private static String user = "root";
    private static String password = "root";
    private static String sql = "";
    private static ResultSet res;
    private static final Logger log = LoggerFactory.getLogger(HiveJdbcClient.class);

    public static void main(String[] args) {
        try {
            Class.forName(driverName);
            Connection conn = DriverManager.getConnection(url, user, password);
            Statement stmt = conn.createStatement();
            String tableName = "testHiveDriverTable";
            sql = "drop table if exists " + tableName;
            stmt.executeUpdate(sql);
            sql = "create  table "+tableName+"(key int ,value string) row format delimited fields terminated by '\t'";
            stmt.executeUpdate(sql);
            sql = "show tables '"+tableName+"'";
            System.out.println("Running: "+sql);
            res = stmt.executeQuery(sql);
            System.out.println("Result of 'show tables':");
            if (res.next()){
                System.out.println(res.getString(1));
            }
            sql = "describe "+tableName;
            System.out.println("Running: "+sql);
            res = stmt.executeQuery(sql);
            System.out.println("Result of 'describe table':");
            while (res.next()){
                System.out.println(res.getString(1)+"\t"+res.getString(2));
            }
            String filepath = "/export/server/apache-hive-2.3.0-bin/userinfo.txt";
            // LOCAL means the path is read from the file system of the HiveServer2 host
            sql = "load data local inpath '"+filepath+"' into table "+tableName;
            System.out.println("Running: "+sql);
            stmt.executeUpdate(sql);
            sql = "select * from "+tableName;
            System.out.println("Running: "+sql);
            res=stmt.executeQuery(sql);
            System.out.println("Result of 'select * query':");
            while (res.next()){
                System.out.println(res.getInt(1)+"\t"+res.getString(2));
            }
            sql = "select count(1) from "+tableName;
            System.out.println("Running: "+sql);
            res=stmt.executeQuery(sql);
            System.out.println("Result of 'regular hive query':");
            while (res.next()){
                System.out.println(res.getString(1));
            }
            // Release the JDBC resources
            res.close();
            stmt.close();
            conn.close();
        }catch (ClassNotFoundException e){
            e.printStackTrace();
            log.error(driverName + " not found!", e);
            System.exit(1);
        }catch (SQLException e){
            e.printStackTrace();
            log.error("Connection error!",e);
            System.exit(1);
        }
    }
}

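Output of the first run:
[Screenshot: program output with 0 and null in every row]
Every row at the bottom is 0 and null, not the data from the userinfo.txt file we created. Why?
The problem is the delimiter. My CREATE TABLE statement is: sql = "create  table "+tableName+"(key int ,value string) row format delimited fields terminated by '\t'";
The delimiter is '\t', the tab character. If you are in the habit of separating columns with spaces, Hive cannot split the fields: the file is uploaded to HDFS, but Hive cannot read the data back. After changing userinfo.txt to use tabs:
[Screenshot: program output after the fix]
the results are all correct. Also, select count(1) from testHiveDriverTable runs as a MapReduce job, so it is slow and takes roughly 20 to 30 seconds.
[Screenshot: MapReduce job output on the Linux host]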
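
The fix to userinfo.txt can also be scripted instead of edited by hand. The following is a minimal Java sketch that rewrites space-separated lines as tab-separated ones, assuming each line is a key followed by a value as in userinfo.txt above. The class TabDelimiter and the output file name are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts "key<spaces>value" lines into "key<TAB>value".
final class TabDelimiter {

    // Splits on the first run of whitespace only, so the value may contain spaces.
    static String toTabDelimited(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        return String.join("\t", parts);
    }

    // Converts every non-blank line; blank lines are dropped.
    static List<String> convert(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                out.add(toTabDelimited(line));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java TabDelimiter <input> <output>");
            return;
        }
        // e.g. java TabDelimiter userinfo.txt userinfo_tab.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        Files.write(Paths.get(args[1]), convert(lines), StandardCharsets.UTF_8);
    }
}
```

Lines that already use a tab pass through unchanged, so running it on a file that has been fixed does no harm.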

Important: if HiveServer2 is not running, the IDEA project fails with a connection error.

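Summary

Using Hive on its own is not complicated. The trouble comes from the relationships between systems, especially where Hadoop has to grant permissions to the user: you get a lot of baffling errors (the other system refusing the connection), and the service seems to hang there for a whole day without moving while you wait for nothing.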
