Hive: From Getting Started to Giving Up — Hive Installation and Basic Usage (2)

Prerequisites

  Hive has to sit on top of a Hadoop cluster: it serves as the cluster's data warehouse, so you need a Hadoop cluster first. For building one, see the post Hadoop Cluster Big Data Solutions: Setting Up Hadoop 3.X + HA Mode (2).
  That post builds the cluster on Hadoop 3.x, which honestly was just showing off; what we want here is hadoop-2.8.5 + hive-2.3.5. The differences between installing Hadoop 2.x and 3.x are clearly marked in that show-off's post, so use it with confidence.

Downloading Hive

  This post uses Hive 2.3.5 as the example.
  Official download page: http://archive.apache.org/dist/hive/hive-2.3.5
  You normally grab the pre-built binary package apache-hive-2.3.5-bin.tar.gz, as shown in Figure 1.
(Figure 1: choosing which tar package to download)

Installing the metadata database

Choose MySQL and create the hive database

  In keeping with the open-source spirit of the Hadoop family, the metadata store here is MySQL. First install MySQL on one machine in the cluster; if you can't, ask a DBA, and if the DBA ignores you, see the post Installing MySQL-5.7.30 on Linux (CentOS-7). Then create a hive database on that MySQL instance, using the relatively pitfall-free collation utf8mb4_unicode_ci; we'll need it later. In practice you don't administer MySQL as the root user, so also create a hive user on the MySQL server and grant it all privileges on the hive database; see the post A Brief Introduction to Creating Databases, Managing Users and Grants in MySQL.

CREATE DATABASE hive DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
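  A minimal sketch of that user and grant, run as the MySQL root user; hive1234 is a placeholder password, chosen here to match the one used in hive-site.xml further below:

# Run on the MySQL host; creates the hive user and grants it the hive database.
# 'hive1234' is a placeholder password; keep it in sync with hive-site.xml.
mysql -u root -p <<'SQL'
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive1234';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
FLUSH PRIVILEGES;
SQL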

Preparing the MySQL driver for Hive

  As mentioned before, Hive's metadata store doesn't have to be MySQL, so different database types naturally need different JDBC driver jars, and current Hive releases don't ship them for you. You have to make sure that the lib directory under your Hive home contains the MySQL driver mysql-connector-java-xxx.jar, where xxx must be compatible with the MySQL version hosting the metadata. Mine is MySQL-5.7.30, so I downloaded mysql-connector-java-8.0.13.jar from the official site (downloading requires an Oracle login/registration), put it into Hive's lib directory, /data/tools/apache-hive-2.3.5-bin/lib in my case, and then fixed the jar's permissions with the following command;

chmod 755 mysql-connector-java-8.0.13.jar

   mysql-connector-java-xxx.jar download link;
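  If you'd rather skip the Oracle login, the same driver jar is published on Maven Central; a sketch, assuming Maven Central's standard layout and my install path:

# Fetch the driver from Maven Central and drop it into Hive's lib directory
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.13/mysql-connector-java-8.0.13.jar
mv mysql-connector-java-8.0.13.jar /data/tools/apache-hive-2.3.5-bin/lib/
chmod 755 /data/tools/apache-hive-2.3.5-bin/lib/mysql-connector-java-8.0.13.jar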

Installation and configuration

  Downloading Hive gives you a tarball, apache-hive-2.3.5-bin.tar.gz. Downloads usually land with fairly restrictive permissions, so loosen them with chmod 755 apache-hive-2.3.5-bin.tar.gz, unpack with tar -zxvf apache-hive-2.3.5-bin.tar.gz, then enter the unpacked folder apache-hive-2.3.5-bin and look around; conf holds the configuration files and bin the launch scripts. The exact commands are collected below.
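  The same steps as one copy-paste block, assuming the tarball sits in /data/tools, which is my install root; adjust to yours:

# Unpack Hive under /data/tools and enter the directory
cd /data/tools
chmod 755 apache-hive-2.3.5-bin.tar.gz
tar -zxvf apache-hive-2.3.5-bin.tar.gz
cd apache-hive-2.3.5-bin

  Listing the unpacked directory: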

[hadoop@node1 apache-hive-2.3.5-bin]$ ll
总用量 56
drwxrwxr-x. 3 hadoop hadoop   133 5月   5 10:59 bin
drwxrwxr-x. 2 hadoop hadoop  4096 5月   5 10:59 binary-package-licenses
drwxrwxr-x. 2 hadoop hadoop  4096 5月   5 10:59 conf
drwxrwxr-x. 4 hadoop hadoop    34 5月   5 10:59 examples
drwxrwxr-x. 7 hadoop hadoop    68 5月   5 10:59 hcatalog
drwxrwxr-x. 2 hadoop hadoop    44 5月   5 10:59 jdbc
drwxrwxr-x. 4 hadoop hadoop 12288 5月   5 10:59 lib
-rw-r--r--. 1 hadoop hadoop 20798 5月   7 2019 LICENSE
-rw-r--r--. 1 hadoop hadoop   230 5月   8 2019 NOTICE
-rw-r--r--. 1 hadoop hadoop   277 5月   8 2019 RELEASE_NOTES.txt
drwxrwxr-x. 4 hadoop hadoop    35 5月   5 10:59 scripts

Configuring environment variables

  Add Hive's environment variables with vim /etc/profile, as below. /data/tools/apache-hive-2.3.5-bin is my Hive home directory; write whatever path you installed Hive to rather than copying mine. Appending $HIVE_HOME/bin to PATH makes launching Hive convenient. Save and quit /etc/profile, and don't forget to run source /etc/profile to reload it;

export HIVE_HOME=/data/tools/apache-hive-2.3.5-bin
export PATH=$PATH:$HIVE_HOME/bin

  With that configured correctly, you can check the installed Hive version with hive --version:

[hadoop@node1 tools]$ hive --version
Hive 2.3.5
Git git://HW13934/Users/gates/git/hive -r 76595628ae13b95162e77bba365fe4d2c60b3f29
Compiled by gates on Tue May 7 15:45:09 PDT 2019
From source with checksum c7864fc25abcb9cf7a36953ac6be4665

Hive's layout across the cluster

  For Hive's configuration, pick one node as the Hive master and treat the rest as slaves; the master and slave configs differ slightly, so don't copy one to the other blindly. Strictly speaking, only one node acts as the Hive server, responsible for talking to the metadata database and the rest of the cluster; the other Hive nodes really just run the client, and when they access the metadata they are first redirected internally to the Hive server node. That is why the server node's config differs slightly from the client nodes'.
  Our development Hadoop cluster has four nodes, laid out as below: node1 is the Hive server node, and all four nodes get the Hive client. Strictly speaking the other nodes don't need the client at all; the only cost of skipping it is that you can't use the Hive CLI on those nodes, which is easy enough to understand, and nothing else is affected.

  • NN: Hadoop NameNode
  • DN: Hadoop DataNode
  • ZK: Hadoop ZooKeeper
  • ZKFC: ZooKeeper Failover Controller, the HA failover mechanism
  • JN: JournalNode, which stores the Hadoop cluster's metadata (NameNode edit logs)
  • RM: YARN ResourceManager
  • DM: YARN NodeManager
        metadata(MySQL)  hive master  hive client  NN  DN  ZK  ZKFC  JN  RM  DM
node1          1              1            1        1       1    1    1
node2                                      1        1   1   1    1    1        1
node3                                      1            1   1         1    1
node4                                      1            1                  1   1

Configuring the Hive master node

  As noted, our Hive master node really acts as the server, and its configuration differs from the clients'. On node1, go into the conf directory under the Hive home. Hive ships only a hive-default.xml.template, so copy it to a new hive-site.xml and edit the configuration there; the shell commands:

# Go to Hive's conf directory
cd /data/tools/apache-hive-2.3.5-bin/conf

# Copy hive-default.xml.template to create a new hive-site.xml
cp hive-default.xml.template hive-site.xml

# Edit hive-site.xml
vim hive-site.xml

  The master node's hive-site.xml is below. I've excerpted only the essential parts of my config, but I also had the luck to crib a 5,000-line configuration file from a Baidu engineer, with most settings tuned fairly well. It's far too long to inline, so it lives in the resource bundle, with separate master and slave configs, if you're interested: hive-2.3.5配置文件.rar

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>
 <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->
 <!-- WARNING!!! Any changes you make to this file will be ignored by Hive.   -->
 <!-- WARNING!!! You must make your changes in hive-site.xml instead.         -->
 <!-- Hive Execution Parameters -->
 
<!--JDBC connection string for Hive's metadata database-->
<property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
</property>

<!--JDBC driver class for the metadata database-->
<property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
</property>

<!--Username for logging in to the metadata database-->
<property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
</property>

<!--Password for logging in to the metadata database-->
<property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hive1234</value>
</property>

<property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
   <description>
     Enforce metastore schema version consistency.
     True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
           schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
           proper metastore schema migration. (Default)
     False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
   </description>
</property>

<property>
   <name>hive.server2.thrift.bind.host</name>
   <value>node1</value>
   <description>Bind host on which to run the HiveServer2 Thrift service.</description>
 </property>

<!--Metastore listener port-->
<property> 
   <name>hive.metastore.port</name>  
   <value>9083</value>  
   <description>Hive metastore listener port</description> 
 </property> 

<!--HiveServer2 Thrift port; this is what lets clients reach Hive over JDBC, very handy-->
<property>
   <name>hive.server2.thrift.port</name>
   <value>10000</value>
   <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
</property>

 <property>
   <name>hive.server2.thrift.http.port</name>
   <value>10001</value>
   <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description>
 </property>
</configuration>

  Hive's slave configuration, i.e. the client-node hive-site.xml:

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!--
       Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration> 
 <!-- WARNING!!! This file is auto generated for documentation purposes ONLY! -->  
 <!-- WARNING!!! Any changes you make to this file will be ignored by Hive.   -->  
 <!-- WARNING!!! You must make your changes in hive-site.xml instead.         -->  
 <!-- Hive Execution Parameters -->  
 <property> 
   <name>hive.metastore.local</name>  
   <value>false</value> 
 </property>  
 <property> 
   <name>hive.metastore.warehouse.dir</name>  
   <value>/hive/warehouse</value>  
   <description>location of default database for the warehouse</description> 
 </property>  
 <property> 
   <name>hive.metastore.uris</name>  
   <value>thrift://node1:9083</value>  
   <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description> 
 </property>  
 <property> 
   <name>hive.metastore.schema.verification</name>  
   <value>false</value>  
   <description>Enforce metastore schema version consistency. True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. (Default) False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.</description> 
 </property>  
 <property> 
   <name>hive.server2.transport.mode</name>  
   <value>binary</value>  
   <description>Expects one of [binary, http]. Transport mode of HiveServer2.</description> 
 </property>  
 <property> 
   <name>hive.server2.thrift.bind.host</name>  
   <value>node1</value>  
   <description>Bind host on which to run the HiveServer2 Thrift service.</description> 
 </property>  
 <property> 
   <name>hive.driver.parallel.compilation</name>  
   <value>false</value>  
   <description>Whether to enable parallel compilation of the queries between sessions and within the same session on HiveServer2. The default is false.</description> 
 </property>  
 <property> 
   <name>hive.server2.metrics.enabled</name>  
   <value>false</value>  
   <description>Enable metrics on the HiveServer2.</description> 
 </property>  
 <property> 
   <name>hive.server2.thrift.http.port</name>  
   <value>10001</value>  
   <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'http'.</description> 
 </property>  
 <property> 
   <name>hive.server2.thrift.http.path</name>  
   <value>cliservice</value>  
   <description>Path component of URL endpoint when in HTTP mode.</description> 
 </property>  
 <property> 
   <name>hive.server2.thrift.max.message.size</name>  
   <value>104857600</value>  
   <description>Maximum message size in bytes a HS2 server will accept.</description> 
 </property>  
 <property> 
   <name>hive.server2.thrift.http.max.idle.time</name>  
   <value>1800s</value>  
   <description>Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is msec if not specified. Maximum idle time for a connection on the server when in HTTP mode.</description> 
 </property>  
 <property> 
   <name>hive.server2.thrift.http.worker.keepalive.time</name>  
   <value>60s</value>  
   <description>Expects a time value with unit (d/day, h/hour, m/min, s/sec, ms/msec, us/usec, ns/nsec), which is sec if not specified. Keepalive time for an idle http worker thread. When the number of workers exceeds min workers, excessive threads are killed after this time interval.</description> 
 </property>  
 <property> 
   <name>hive.metastore.port</name>  
   <value>9083</value>  
   <description>Hive metastore listener port</description> 
 </property> 
<property>
   <name>mapreduce.job.queuename</name>
   <value>etl</value>
   <description>
     Used to specify name of Hadoop queue to which
     jobs will be submitted.  Set to empty string to let Hadoop choose the queue.
   </description>
 </property>
</configuration>
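  One way to push the client-side file to every client node, assuming you saved the client flavor locally as hive-site-client.xml (a hypothetical name) and that Hive lives at the same path on all nodes:

# Distribute the client config; hive-site-client.xml is the client flavor above
for host in node2 node3 node4; do
  scp hive-site-client.xml ${host}:/data/tools/apache-hive-2.3.5-bin/conf/hive-site.xml
done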

Initializing Hive

    Before the first launch you need to initialize the metadata store, which simply writes Hive's metadata tables into the MySQL hive database we prepared. The initialization command is schematool -dbType mysql -initSchema:

[hadoop@node1 conf]$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/tools/apache-hive-2.3.5-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/tools/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true&useUnicode=true&characterEncoding=UTF-8
Metastore Connection Driver :    com.mysql.cj.jdbc.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed

   After a successful initialization you'll find many new tables inside the hive metadata database; they are what stores Hive's metadata. Knowing they exist is enough for now; details come later.

(Figure 2: the hive metadata database in MySQL changes after initialization)
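  As a quick sanity check, you can list those tables straight from the shell; a sketch, using the hive account created earlier:

# List the metastore tables that schematool just created
mysql -u hive -p -h node1 -e 'USE hive; SHOW TABLES;'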

   Initialization essentially runs the SQL under /data/tools/apache-hive-2.3.5-bin/scripts/metastore/upgrade/mysql against the hive database in MySQL. Make absolutely sure it succeeds; watch for the words schemaTool completed. If initialization fails, e.g. with no permission to log in to the hive database, re-check the metadata login settings in the master node's hive-site.xml. Other errors are searchable, so I won't list them all; the usual causes are a MySQL version that doesn't match the driver, MySQL restrictions such as the index length cap making Hive's table creation fail, or a wrong IP, port, username, or password for the metadata database. Once initialization succeeds, try Hive on the master node first:

[hadoop@node1 conf]$ hive
which: no hbase in (/data/tools/jdk1.8.0_211/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/data/tools/apache-hive-2.3.5-bin/bin:/data/tools/zookeeper-3.4.14:/data/tools/hadoop-2.8.5/sbin:/data/tools/hadoop-2.8.5/bin:/data/tools/mysql-5.7.30-linux-glibc2.12-x86_64/bin:/home/hadoop/.local/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/tools/apache-hive-2.3.5-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/tools/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/data/tools/apache-hive-2.3.5-bin/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
OK
default
Time taken: 31.212 seconds, Fetched: 1 row(s)
hive> create database dw;
OK
Time taken: 0.884 seconds

   At this point, starting hive on a client (i.e. slave) node will probably fail as below. Don't panic: since we run the master node as a server, we have to start the Hive server processes before the clients can connect.

[hadoop@node2 conf]$ hive
which: no hbase in (/data/tools/jdk1.8.0_211/bin:/data/tools/jdk1.8.0_211/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/data/tools/zookeeper-3.4.14:/data/tools/hadoop-2.8.5/sbin:/data/tools/hadoop-2.8.5/bin:/home/hadoop/.local/bin:/home/hadoop/bin:/data/tools/apache-hive-2.3.5-bin/bin:/data/tools/zookeeper-3.4.14:/data/tools/hadoop-2.8.5/sbin:/data/tools/hadoop-2.8.5/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/tools/apache-hive-2.3.5-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/tools/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/data/tools/apache-hive-2.3.5-bin/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

Fixing garbled Chinese in the metadata

  First check whether the metadata MySQL server's own character sets are sane, using the statement below; they must be utf8 or utf8mb4 variants. If character_set_server is not utf8, add character-set-server=utf8 under the [mysqld] section of /etc/my.cnf (default-character-set=utf8 under [client] only covers the client side), then restart MySQL; see the sketch after the output below.

mysql> show variables like 'char%';
+--------------------------+-----------------------------------------------------------+
| Variable_name            | Value                                                     |
+--------------------------+-----------------------------------------------------------+
| character_set_client     | utf8                                                      |
| character_set_connection | utf8                                                      |
| character_set_database   | utf8                                                      |
| character_set_filesystem | binary                                                    |
| character_set_results    | utf8                                                      |
| character_set_server     | utf8                                                      |
| character_set_system     | utf8                                                      |
| character_sets_dir       | /usr/local/mysql-5.7.31-macos10.14-x86_64/share/charsets/ |
+--------------------------+-----------------------------------------------------------+
8 rows in set (0.00 sec)
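  A sketch of that fix, assuming a stock /etc/my.cnf on CentOS 7; MySQL must be restarted for it to take effect:

# Append the server character set to /etc/my.cnf and restart mysqld
sudo tee -a /etc/my.cnf <<'EOF'
[mysqld]
character-set-server=utf8
EOF
sudo systemctl restart mysqld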

   Then run the following against MySQL; the metastore schema scripts create their tables with the latin1 character set, which can't hold Chinese and produces mojibake. The query generates one ALTER TABLE statement per table in the hive database; execute the statements it prints.

SELECT CONCAT("ALTER TABLE `", TABLE_NAME, "` CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;") AS target_tables
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = "hive"
  AND TABLE_TYPE = "BASE TABLE";

   However, some tables' indexed columns overflow under utf8 with Specified key was too long; max key length is 3072 bytes. In that case you can't convert the whole table; instead, convert only the specific columns that need to hold Chinese, as follows;

-- Make column comments support Chinese
alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8 COLLATE utf8_unicode_ci;

-- Make table comments support Chinese
alter table TABLE_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8 COLLATE utf8_unicode_ci;

-- Make partition-table properties support Chinese
alter table PARTITION_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8 COLLATE utf8_unicode_ci;

-- Make partition-key comments support Chinese
alter table PARTITION_KEYS modify column PKEY_COMMENT varchar(4000) character set utf8 COLLATE utf8_unicode_ci;

-- Make partition-key values support Chinese
alter table PARTITION_KEY_VALS modify column PART_KEY_VAL varchar(256) character set utf8 COLLATE utf8_unicode_ci;

-- Make index comments support Chinese
alter table INDEX_PARAMS modify column PARAM_VALUE varchar(4000) character set utf8 COLLATE utf8_unicode_ci;

-- Make database descriptions support Chinese
alter table DBS modify column `DESC` varchar(4000) character set utf8 COLLATE utf8_unicode_ci;

Starting the Hive services

  In your Hive directory, create a shell script start_hive.sh
(vim /data/tools/apache-hive-2.3.5-bin/start_hive.sh) with the content below, then save and exit;

#!/usr/bin/env bash

# Make sure the log directory exists
mkdir -p /data/logs/hive

# Start the Hive metastore service in the background
nohup hive --service metastore >> /data/logs/hive/meta.log 2>&1 &

# Start HiveServer2, Hive's JDBC service, in the background
nohup hive --service hiveserver2 >> /data/logs/hive/hiveserver2.log 2>&1 &

  Once the script has run, jps shows two RunJar processes, one for the Hive metastore service and one for the JDBC service (HiveServer2). To restart Hive, kill those two RunJar processes and run the script again; see the sketch after the jps output. Note that the JDBC side loads configuration cold: if you change an HDFS or Hive setting, a JDBC connection to Hive won't necessarily pick up the new value and may even throw errors; for example, changing the HDFS block size while someone is querying a Hive table over JDBC can trigger exceptions. When that happens, restart Hive's JDBC service. jps looks like this:

[hadoop@node1 apache-hive-2.3.5-bin]$ jps
8195 DFSZKFailoverController
15686 Jps
7607 NameNode
15303 RunJar
6408 QuorumPeerMain
15304 RunJar
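  A minimal restart sketch, assuming the only RunJar processes on this node belong to Hive:

# Kill the metastore and HiveServer2 RunJar processes, then start them again
jps | awk '/RunJar/ {print $1}' | xargs -r kill
sh /data/tools/apache-hive-2.3.5-bin/start_hive.sh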

Connecting to Hive with the CLI

  The post Hive: From Getting Started to Giving Up — Background and Architecture (1) covered the ways of connecting to Hive, the most common being the CLI. Because the environment variables are in place, just typing hive in a shell gets you in. With the services from the previous section running, the session looks like the node1 session shown earlier; if the metastore service is not running, you'll hit the SessionHiveMetaStoreClient error shown above.



Connecting to Hive over JDBC

  First make sure your Hive config has JDBC enabled and that you started the JDBC service (HiveServer2); then connect over JDBC from the shell as follows:

Method 1:
beeline
!connect jdbc:hive2://dw-test-cluster-007:10000
then enter the username and password.

Method 2:
beeline -u "jdbc:hive2://dw-test-cluster-007:10000" -n hadoop hadoop
-u: the HiveServer2 JDBC connection URL
-n: the username; the trailing hadoop here is the password (you can also pass it explicitly with -p)
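  For scripting, beeline can also run a single statement and exit; a small sketch against my node1 server:

# Run one statement non-interactively over JDBC
beeline -u "jdbc:hive2://node1:10000" -n hadoop -p hadoop -e "show databases;"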

  An interactive JDBC session in the shell looks like this:

[hadoop@node1 apache-hive-2.3.5-bin]$  beeline -u "jdbc:hive2://node1:10000" -n hadoop hadoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/tools/apache-hive-2.3.5-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/tools/hadoop-2.8.5/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://node1:10000
Connected to: Apache Hive (version 2.3.5)
Driver: Hive JDBC (version 2.3.5)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.5 by Apache Hive
0: jdbc:hive2://node1:10000> show databases;
+----------------+
| database_name  |
+----------------+
| default        |
| dw             |
+----------------+
2 rows selected (5.817 seconds)
0: jdbc:hive2://node1:10000>

  You can also connect with any IDE tool that speaks JDBC; for specifics, see Hive: From Getting Started to Giving Up — Recommended Hive Connection Tools (3).
  Note that JDBC is not as stable as the CLI. First, Hive's JDBC side loads configuration cold: freshly changed HDFS or Hive settings aren't necessarily picked up in real time and require a restart to load. Second, the JDBC service can drop out for reasons known and unknown, so if none of the beeline commands above can connect, consider whether the JDBC service has died.

Basic usage

  You'll inevitably hit some pits during installation; once Baidu and Bing have solved them all and Hive starts normally, typing hive in a shell inside the cluster drops you into the CLI. Note that a healthy start is clean and quiet: if errors are thrown on the way up, then even if you eventually land at the 'hive>' prompt, something in the configuration is still imperfect, so keep searching and fixing. Once in, log in through the CLI or JDBC and run some simple statements to check that Hive works.

-- HiveQL looks a lot like MySQL SQL, but they are not identical, and the data
-- types differ; a MySQL background makes Hive friendlier to pick up.
show databases; -- list databases
use default;    -- switch to the default database

-- Create a table.
-- Note: primary keys and indexes are unpopular in Hive; recent versions support
-- them but few people use them. Data may be appended to files, but there are no
-- row-level deletes or updates; later posts cover this in detail.
create table if not exists `user`(
id string comment 'openId', 
phone string comment 'phone number', 
user_name string comment 'user name', 
sex string comment 'gender, stored directly as male/female', 
certificate_no string comment 'ID card number',  
create_time string 
)
row format delimited fields terminated by '\t' NULL DEFINED AS '' stored as textfile;

select * from `user` limit 10; -- query
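  As a quick smoke test of the append-style loading, you can pull a local file into the table; /tmp/user.txt is a hypothetical tab-separated file matching the columns above:

# Load a local tab-separated file into the user table (path is hypothetical)
hive -e "load data local inpath '/tmp/user.txt' into table default.\`user\`;"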

  Note: Hive itself ships no username/password login check the way MySQL does; that takes extra tooling or custom development. Many self-hosted Hive installations don't worry much about it and instead restrict access with firewalls, port rules, and the like; research that yourself if you're interested.

Exceptions

  When starting Hive you may see: Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
This is caused by the hive-site.xml configuration; add the following to hive-site.xml:

<property>
    <name>system:java.io.tmpdir</name>
    <value>/tmp/hive/java</value>
</property>
<property>
    <name>system:user.name</name>
    <value>${user.name}</value>
</property>

  Then check that the following settings are sane;
(Figure 3: the configuration that resolves the exception)