011 Big Data: Hive Basics

1. Hive Basic Concepts

1.1 Hive Overview

Definition: "The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive."
In essence, Hive translates SQL into MapReduce programs: its storage layer is HDFS, its compute engine is MapReduce, and its resource scheduler is YARN.

1.2 Pros and Cons of Hive

Pros: 1) SQL-like syntax that is easy to pick up; 2) SQL is translated into MapReduce automatically, so there is no need to write MapReduce code by hand; 3) well suited to large-scale data analysis; 4) users can define custom functions (UDFs) as needed.
Cons: 1) query latency is relatively high; 2) iterative algorithms cannot be expressed; 3) no advantage on small datasets; 4) the auto-generated MapReduce jobs are not always optimal; 5) tuning Hive is difficult and coarse-grained.

1.3 Hive Architecture

(figure: Hive architecture diagram)
Thrift Server
An optional Hive component. It is a framework service that lets clients written in many languages, including Java, C++, and Ruby, access Hive remotely and programmatically.
Driver
(1) SQL Parser: turns the SQL string into an abstract syntax tree (AST), usually with a third-party library such as ANTLR, then performs semantic checks on the AST: does the table exist, do the columns exist, is the SQL semantically valid?
(2) Compiler: compiles the HQL statement lexically, syntactically, and semantically (consulting the metastore) and produces a logical execution plan.
(3) Query Optimizer: optimizes the logical plan, e.g. pruning unneeded columns, using partitions, using indexes.
(4) Execution: converts the optimized logical plan into a runnable physical plan (for Hive, MapReduce/Spark jobs) and submits it to Hadoop's YARN for execution.
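To see the plan that this pipeline produces for a statement, you can prefix the statement with EXPLAIN; a quick illustration against the test table created later in this post (plan output omitted here):

hive> explain select id, count(*) from test group by id;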

1.4 Hive vs. Traditional Databases

2. Installing Hive

2.1 Installing MySQL

Hive's default metastore database is Derby. A running Hive session locks the Derby metastore and cannot share it with other clients, so opening a second Hive window fails; this is very limiting. We therefore switch the metastore to MySQL (installed and removed here via rpm), which supports multiple concurrent sessions.

Remove the pre-installed MariaDB libraries
[atguigu@hadoop102 ~]$ rpm -qa|grep mariadb
mariadb-libs-5.5.56-2.el7.x86_64
[atguigu@hadoop102 ~]$ sudo rpm -e --nodeps mariadb-libs
Extract the MySQL installation bundle
[atguigu@hadoop102 software]$ tar -zxvf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar 
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
# The MySQL bundle is a plain tar archive, not gzip-compressed, so use -xvf without -z
[atguigu@hadoop102 software]$ tar -xvf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar 
mysql-community-embedded-5.7.28-1.el7.x86_64.rpm
mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
mysql-community-devel-5.7.28-1.el7.x86_64.rpm
mysql-community-embedded-compat-5.7.28-1.el7.x86_64.rpm
mysql-community-libs-5.7.28-1.el7.x86_64.rpm
mysql-community-test-5.7.28-1.el7.x86_64.rpm
mysql-community-common-5.7.28-1.el7.x86_64.rpm
mysql-community-embedded-devel-5.7.28-1.el7.x86_64.rpm
mysql-community-client-5.7.28-1.el7.x86_64.rpm
mysql-community-server-5.7.28-1.el7.x86_64.rpm
[atguigu@hadoop102 software]$ ll
总用量 1720252
-rw-r--r--. 1 atguigu atguigu 609556480 10月 17 17:24 mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar
-rw-r--r--. 1 atguigu atguigu  45109364 9月  30 2019 mysql-community-client-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu    318768 9月  30 2019 mysql-community-common-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu   7037096 9月  30 2019 mysql-community-devel-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu  49329100 9月  30 2019 mysql-community-embedded-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu  23354908 9月  30 2019 mysql-community-embedded-compat-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu 136837816 9月  30 2019 mysql-community-embedded-devel-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu   4374364 9月  30 2019 mysql-community-libs-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu   1353312 9月  30 2019 mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu 208694824 9月  30 2019 mysql-community-server-5.7.28-1.el7.x86_64.rpm
-rw-r--r--. 1 atguigu atguigu 133129992 9月  30 2019 mysql-community-test-5.7.28-1.el7.x86_64.rpm
[atguigu@hadoop102 software]$
Install MySQL with RPM (the install order below matters because of package dependencies)

The RPM packages and what they contain:
mysql-community-server    the database server and related tools
mysql-community-client    MySQL client applications and tools
mysql-community-common    library files shared by server and client
mysql-community-devel     header files and libraries for developing MySQL client applications
mysql-community-libs      shared libraries for MySQL client applications
mysql-community-libs-compat  shared compatibility libraries for previous MySQL installations
mysql-community-embedded  the MySQL embedded library
mysql-community-embedded-devel  headers and libraries for embedded MySQL development
mysql-community-test      the test suite for the MySQL server

[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm 
警告:mysql-community-common-5.7.28-1.el7.x86_64.rpm: 头V3 DSA/SHA1 Signature, 密钥 ID 5072e1f5: NOKEY
准备中...                          ################################# [100%]
正在升级/安装...
   1:mysql-community-common-5.7.28-1.e################################# [100%]
[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm 
警告:mysql-community-libs-5.7.28-1.el7.x86_64.rpm: 头V3 DSA/SHA1 Signature, 密钥 ID 5072e1f5: NOKEY
准备中...                          ################################# [100%]
正在升级/安装...
   1:mysql-community-libs-5.7.28-1.el7################################# [100%]
[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm 
警告:mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm: 头V3 DSA/SHA1 Signature, 密钥 ID 5072e1f5: NOKEY
准备中...                          ################################# [100%]
正在升级/安装...
   1:mysql-community-libs-compat-5.7.2################################# [100%]
[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm 
警告:mysql-community-client-5.7.28-1.el7.x86_64.rpm: 头V3 DSA/SHA1 Signature, 密钥 ID 5072e1f5: NOKEY
准备中...                          ################################# [100%]
正在升级/安装...
   1:mysql-community-client-5.7.28-1.e################################# [100%]
[atguigu@hadoop102 software]$ sudo rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm 
警告:mysql-community-server-5.7.28-1.el7.x86_64.rpm: 头V3 DSA/SHA1 Signature, 密钥 ID 5072e1f5: NOKEY
准备中...                          ################################# [100%]
正在升级/安装...
   1:mysql-community-server-5.7.28-1.e################################# [100%]
Configure MySQL
# Check which directory datadir points to
[atguigu@hadoop102 etc]$ cat /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
# Clear out everything under the directory that datadir points to
[atguigu@hadoop102 etc]$ cd /var/lib/mysql
[atguigu@hadoop102 mysql]$ sudo rm -rf ./*
# Initialize the database
[atguigu@hadoop102 mysql]$ sudo mysqld --initialize --user=mysql
# Look up the temporary password generated for the root user
[atguigu@hadoop102 mysql]$ sudo cat /var/log/mysqld.log
2022-01-03T09:05:10.990246Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2022-01-03T09:05:16.710989Z 0 [Warning] InnoDB: New log files created, LSN=45790
2022-01-03T09:05:17.853899Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.
2022-01-03T09:05:18.010178Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 4586a786-6c74-11ec-9fb9-000c2955b598.
2022-01-03T09:05:18.042571Z 0 [Warning] Gtid table is not ready to be used. Table 'mysql.gtid_executed' cannot be opened.
2022-01-03T09:05:18.451056Z 0 [Warning] CA certificate ca.pem is self signed.
2022-01-03T09:05:18.589675Z 1 [Note] A temporary password is generated for root@localhost: )9Shrzd/oirE
# Start the MySQL service
[atguigu@hadoop102 mysql]$ sudo systemctl start mysqld
# Log in to MySQL with the temporary password
[atguigu@hadoop102 mysql]$ mysql -uroot -p')9Shrzd/oirE'
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
# Change the root password first; other statements will fail until it is changed
mysql> set password = password("root");
Query OK, 0 rows affected, 1 warning (0.00 sec)
# Allow the root user in mysql.user to connect from any host
mysql> update mysql.user set host='%' where user='root';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
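# Optional sanity check (a quick verification, not part of the original run):
mysql> select user, host from mysql.user where user='root';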
# Exit MySQL
mysql> quit;
Bye

2.2 Installing and Deploying Hive

# Extract Hive
[atguigu@hadoop102 software]$ tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/module/
# Add Hive environment variables
[atguigu@hadoop102 software]$ cat /etc/profile.d/my_env.sh 
# JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

# HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

# ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/module/apache-zookeeper-3.5.7-bin
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export PATH=$PATH:$ZOOKEEPER_HOME/sbin

# HIVE_HOME
export HIVE_HOME=/opt/module/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
# Reload the environment variables
[atguigu@hadoop102 software]$ source /etc/profile
# Resolve the log4j/SLF4J jar conflict with Hadoop's copy
[atguigu@hadoop102 software]$ mv $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar $HIVE_HOME/lib/log4j-slf4j-impl-2.10.0.jar.bak

2.3 Configuring the Hive Metastore to Use MySQL

[atguigu@hadoop102 software]$ cp mysql-connector-java-5.1.37.jar $HIVE_HOME/lib
[atguigu@hadoop102 software]$ cat $HIVE_HOME/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- JDBC URL of the metastore database -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>

    <!-- JDBC driver class -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <!-- JDBC username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <!-- JDBC password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>

    <!-- Hive's default working directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    
    <!-- Disable metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
   
    <!-- Disable metastore event DB notification API auth -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>

2.4 Starting Hive

(myclusters.sh, used below, is a custom helper script that starts and stops the whole Hadoop cluster.)

# Log in to MySQL
[atguigu@hadoop102 software]$ mysql -uroot -proot
# Create the Hive metastore database
mysql> create database metastore;
mysql> quit;
# Initialize the Hive metastore schema
[atguigu@hadoop102 software]$ schematool -initSchema -dbType mysql -verbose
# Start the Hadoop cluster
[atguigu@hadoop102 software]$ myclusters.sh start
# Start the Hive CLI
[atguigu@hadoop102 hive]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = 0e138704-f46b-48e6-9b5f-67f1acda92a2

Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 41ae884b-9d14-40eb-b53b-f58a1410049c
# List databases
hive> show databases;
OK
default
Time taken: 1.009 seconds, Fetched: 1 row(s)
# List tables in the default database
hive> show tables;
OK
Time taken: 0.045 seconds
# Create a table in the default database
hive> create table test (id int);
OK
Time taken: 0.864 seconds
# Insert a row (this launches a MapReduce job)
hive> insert into test values(1);
Query ID = atguigu_20220103174139_894e6a7c-1997-41f5-8abe-34478acad198
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Interrupting... Be patient, this might take some time.
Press Ctrl+C again to kill JVM
Starting Job = job_1641202744100_0001, Tracking URL = http://hadoop103:8088/proxy/application_1641202744100_0001/
Kill Command = /opt/module/hadoop-3.1.3/bin/mapred job  -kill job_1641202744100_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-01-03 17:42:33,045 Stage-1 map = 0%,  reduce = 0%
2022-01-03 17:42:59,762 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 11.01 sec
2022-01-03 17:43:25,209 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 22.36 sec
MapReduce Total cumulative CPU time: 22 seconds 360 msec
Ended Job = job_1641202744100_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://hadoop102:9820/user/hive/warehouse/test/.hive-staging_hive_2022-01-03_17-41-40_004_8605542926052281340-1/-ext-10000
Loading data to table default.test
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 22.36 sec   HDFS Read: 12765 HDFS Write: 199 SUCCESS
Total MapReduce CPU Time Spent: 22 seconds 360 msec
OK
Time taken: 112.741 seconds
# Query the table
hive> select * from test;
OK
1
Time taken: 0.211 seconds, Fetched: 1 row(s)
# Exit Hive
hive> quit;
[atguigu@hadoop102 software]$ 

Open a second window and verify that the newly created table is visible there as well:

[atguigu@hadoop102 ~]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/home/atguigu/.local/bin:/home/atguigu/bin)
Hive Session ID = dcc0168b-a0ec-4510-a48a-861a82019407

Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = ec1d304b-3988-40e5-8955-54e3bdfd0eb9
hive> show tables;
OK
test
Time taken: 0.836 seconds, Fetched: 1 row(s)
hive> show databases;
OK
default
Time taken: 0.026 seconds, Fetched: 1 row(s)
hive> quit;

2.5 Accessing Hive via JDBC

Add the following to hive-site.xml (the first two properties are new):

[atguigu@hadoop102 ~]$ cat $HIVE_HOME/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Host that hiveserver2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>

    <!-- Port that hiveserver2 listens on -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>

    <!-- JDBC URL of the metastore database -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>

    <!-- JDBC driver class -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <!-- JDBC username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <!-- JDBC password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>

    <!-- Hive's default working directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    
    <!-- Disable metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
   
    <!-- Disable metastore event DB notification API auth -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>

Start hiveserver2 (keep it running; do not close this terminal while it is in use)

[atguigu@hadoop102 ~]$ hive --service hiveserver2
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/home/atguigu/.local/bin:/home/atguigu/bin)
2022-01-03 20:14:16: Starting HiveServer2
Hive Session ID = a60c0358-d232-4d08-8f11-215017b6122c
Hive Session ID = 43853016-ce83-429f-9670-9d2a96c081ea
Hive Session ID = 496671ff-41d6-44f2-8f41-a838572d61cd
Hive Session ID = da1c1cb7-c0c0-49a0-82df-cf32fb6c6cd0
Hive Session ID = c506f9f9-6a36-4b8f-828b-fb5c9723e514

Start the Beeline client (hiveserver2 takes a while to come up, so be patient)

# -n atguigu must match the OS username in the shell prompt ([atguigu@hadoop102 software])
[atguigu@hadoop102 software]$ beeline -u jdbc:hive2://hadoop102:10000 -n atguigu
Connecting to jdbc:hive2://hadoop102:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
# Beeline formats query output as neat tables
0: jdbc:hive2://hadoop102:10000> show databases;
INFO  : Compiling command(queryId=atguigu_20220103202645_ec2b96fc-0e49-4822-9853-0553a563a64c): show databases
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=atguigu_20220103202645_ec2b96fc-0e49-4822-9853-0553a563a64c); Time taken: 1.023 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=atguigu_20220103202645_ec2b96fc-0e49-4822-9853-0553a563a64c): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=atguigu_20220103202645_ec2b96fc-0e49-4822-9853-0553a563a64c); Time taken: 0.058 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (1.787 seconds)
Starting hiveserver2 (in the background)

The metastore can be started in three ways:
① Embedded (the default): with no metastore configured, every bin/hive or hiveserver2 process starts an embedded metastore internally. This wastes resources: open several windows and you end up with several metastore servers.
② Local metastore: used when the metastore service and the database holding the metadata (MySQL) live on the same machine; the metastore service is started once and shared, avoiding that waste.
③ Remote metastore: used when the metastore service and the metadata database (MySQL) live on different machines; again the service is started once and shared. (See the configuration sketch after this list.)
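For the local and remote modes, clients locate the standalone metastore service through the hive.metastore.uris property; a minimal hive-site.xml sketch (this property is not part of the configuration shown earlier in this post, and 9083 is the metastore's default port):

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hadoop102:9083</value>
    </property>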

[atguigu@hadoop102 hive]$ nohup hive --service metastore 2>&1 &
[atguigu@hadoop102 hive]$ nohup hive --service hiveserver2 2>&1 &

nohup: placed at the start of a command, it keeps the process running even after the terminal that launched it is closed
0: standard input
1: standard output
2: standard error
2>&1: redirects standard error to standard output
&: placed at the end of a command, it runs the command in the background
These are usually combined as nohup <command> > file 2>&1 &, which writes the command's output to file and keeps the process running in the background.
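Applied to the two services above, the combined form would look like this sketch (the log file paths are an assumption, not from the original run):

[atguigu@hadoop102 hive]$ nohup hive --service metastore > /tmp/metastore.log 2>&1 &
[atguigu@hadoop102 hive]$ nohup hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &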

Why does Hive use a standalone metastore service? In short, for the sharing reason described above: one shared metastore service instead of one embedded metastore per session.

2.6 Common Hive Interactive Commands

[atguigu@hadoop102 bin]$ hive -help
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = ab0b4235-084f-4d02-bc00-dc9dbc2a0acf
usage: hive
 -d,--define <key=value>          Variable substitution to apply to Hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to Hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
# Run a SQL statement without entering the interactive shell
[atguigu@hadoop102 bin]$ hive -e "select id from test;"
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = b798f0f2-44d9-4429-b699-327a1f527e96

Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 82850e5e-accf-4a6e-9208-eb8b21e596d3
OK
1
Time taken: 3.405 seconds, Fetched: 1 row(s)
[atguigu@hadoop102 datas]$ cat hivef.sql 
select * from test;
# Run a SQL script from a file
[atguigu@hadoop102 datas]$ hive -f /opt/module/apache-hive-3.1.2-bin/datas/hivef.sql 
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = 6c763bda-fb28-4d62-bdf0-54a2d9d3c31e

Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 01eb09c5-52df-4579-bdc1-1f8ddd37eaa7
OK
1
Time taken: 2.594 seconds, Fetched: 1 row(s)
# Browse the HDFS file system from inside the Hive shell
hive> dfs -ls /;
Found 10 items
-rw-r--r--   2 atguigu supergroup          5 2021-12-14 23:17 /dage.txt
-rw-r--r--   3 atguigu supergroup      12562 2021-12-17 21:24 /fsimage.xml
drwxr-xr-x   - atguigu supergroup          0 2021-12-12 22:10 /hahahha
drwxr-xr-x   - atguigu supergroup          0 2021-12-12 22:09 /input
drwxr-xr-x   - atguigu supergroup          0 2021-12-26 23:19 /output
drwxr-xr-x   - atguigu supergroup          0 2021-12-26 23:23 /output1
drwxr-xr-x   - atguigu supergroup          0 2021-12-28 20:46 /output2
drwxr-xr-x   - atguigu supergroup          0 2021-12-14 21:21 /sanguo
drwxrwx---   - atguigu supergroup          0 2022-01-03 17:38 /tmp
drwxr-xr-x   - atguigu supergroup          0 2022-01-03 17:41 /user
# View all commands previously entered in Hive (stored in ~/.hivehistory)
[atguigu@hadoop102 ~]$ cd;pwd
/home/atguigu
[atguigu@hadoop102 ~]$ cat .hivehistory 
show tables;
show databases;
quit;
quit
;
dfs -ls /;
cat .hivehistory
;
dfs -ls /;
quit;

2.7 Common Hive Configuration Properties

Printing the current database name and column headers in the Hive CLI

[atguigu@hadoop102 ~]$ cat $HIVE_HOME/conf/hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Print the current database name and column headers in the CLI -->
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>

    <!-- Host that hiveserver2 binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>

    <!-- Port that hiveserver2 listens on -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>

    <!-- JDBC URL of the metastore database -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>

    <!-- JDBC driver class -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <!-- JDBC username -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <!-- JDBC password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>

    <!-- Hive's default working directory on HDFS -->
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    
    <!-- Disable metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
   
    <!-- Disable metastore event DB notification API auth -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
</configuration>
[atguigu@hadoop102 ~]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/home/atguigu/.local/bin:/home/atguigu/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin)
Hive Session ID = 2974245c-c05f-49c0-aeff-dad02314e8fc

Logging initialized using configuration in jar:file:/opt/module/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = ac9d8985-4539-4103-bec5-14f6e03676bb
hive (default)> select * FROM test;
OK
test.id
1
Time taken: 2.317 seconds, Fetched: 1 row(s)

Configuring where Hive writes its run log

[atguigu@hadoop102 ~]$ mv $HIVE_HOME/conf/hive-log4j2.properties.template $HIVE_HOME/conf/hive-log4j2.properties
[atguigu@hadoop102 ~]$ cat $HIVE_HOME/conf/hive-log4j2.properties
# By default Hive writes its log to /tmp/<username>/hive.log
# property.hive.log.dir = ${sys:java.io.tmpdir}/${sys:user.name}
property.hive.log.dir = /opt/module/apache-hive-3.1.2-bin/logs
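With this setting you can follow the log at its new location (a quick check, not part of the original steps):

[atguigu@hadoop102 ~]$ tail -f /opt/module/apache-hive-3.1.2-bin/logs/hive.log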

Three ways to set Hive parameters: (1) configuration files (hive-site.xml, which overrides the defaults); (2) command-line options when launching Hive, e.g. --hiveconf property=value, valid for that session; (3) set property=value; inside the session. Later methods take precedence over earlier ones. A short example follows.
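A sketch of methods (2) and (3); mapreduce.job.reduces is just an example property:

[atguigu@hadoop102 ~]$ hive --hiveconf mapreduce.job.reduces=2
hive (default)> set mapreduce.job.reduces;    -- show the current value
hive (default)> set mapreduce.job.reduces=3;  -- override for this session only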

3. Hive Data Types

3.1 Basic and Collection Data Types

[atguigu@hadoop102 ~]$ cat /opt/module/apache-hive-3.1.2-bin/datas/test.txt
songsong,bingbing_lili,xiao song:18_xiaoxiao song:19,hui long guan_beijing
yangyang,caicai_susu,xiao yang:18_xiaoxiao yang:19,chao yang_beijing
hive (default)> create table test(
              >  name string,
              >  friends array<string>,
              >  children map<string,int>,
              >  address struct<street:string,city:string>
              >  )
              >  row format delimited fields terminated by ','
              >  collection items terminated by '_'
              >  map keys terminated by ':'
              >  lines terminated by '\n';
OK
Time taken: 0.541 seconds
hive (default)> load data local inpath'/opt/module/apache-hive-3.1.2-bin/datas/test.txt' into table test;
Loading data to table default.test
OK
Time taken: 1.643 seconds
hive (default)> select friends[1],children['xiao song'],address.city from test
              > where name='songsong';
OK
_c0	_c1	city
lili	18	beijing
Time taken: 1.915 seconds, Fetched: 1 row(s)

3.2 Type Conversion

Implicit conversion rules:
(1) Any integer type can be implicitly converted to a wider type, e.g. TINYINT to INT, INT to BIGINT.
(2) All integer types, FLOAT, and STRING can be implicitly converted to DOUBLE.
(3) TINYINT, SMALLINT, and INT can all be converted to FLOAT.
(4) BOOLEAN cannot be converted to any other type.

# cast() performs an explicit conversion and returns NULL if the conversion fails
hive (default)> select '1'+2, cast('1'as int) + 2;
OK
_c0	_c1
3.0	3
Time taken: 0.189 seconds, Fetched: 1 row(s)
hive (default)> select 1 + 2 + 2.1;
OK
_c0
5.1
Time taken: 0.21 seconds, Fetched: 1 row(s)
hive (default)> select 2.4 + 2.1;
OK
_c0
4.5
Time taken: 0.176 seconds, Fetched: 1 row(s)
hive (default)> select true + 1;
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments '1': No matching method for class org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPNumericPlus with (boolean, int)

4. DDL: Data Definition

4.1 Database Operations

[atguigu@hadoop102 ~]$ hive --service hiveserver2
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-zookeeper-3.5.7-bin/bin:/opt/module/apache-zookeeper-3.5.7-bin/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/apache-hive-3.1.2-bin/bin:/home/atguigu/.local/bin:/home/atguigu/bin)
2022-01-04 22:01:57: Starting HiveServer2
Hive Session ID = 44d9eacb-c941-4f6f-af92-b854323b3d21
Hive Session ID = e97640ae-e1ac-40b1-a84d-150a862043cd
Hive Session ID = 231724bb-f3d1-45ca-bd88-f8d0d7abe7dc
Hive Session ID = 45fc01d9-0b42-43cd-b6ec-bd800c7f084a
[atguigu@hadoop102 ~]$ beeline -u jdbc:hive2://hadoop102:10000 -n atguigu
Connecting to jdbc:hive2://hadoop102:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://hadoop102:10000> 
# Create a database
0: jdbc:hive2://hadoop102:10000> create database if not exists db_hive1;
INFO  : Compiling command(queryId=atguigu_20220104222210_480a7ffb-7135-4d5b-a97f-a86aef34535b): create database if not exists db_hive1
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=atguigu_20220104222210_480a7ffb-7135-4d5b-a97f-a86aef34535b); Time taken: 0.016 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=atguigu_20220104222210_480a7ffb-7135-4d5b-a97f-a86aef34535b): create database if not exists db_hive1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=atguigu_20220104222210_480a7ffb-7135-4d5b-a97f-a86aef34535b); Time taken: 0.031 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
No rows affected (0.066 seconds)
# Alter a database's DBPROPERTIES
0: jdbc:hive2://hadoop102:10000> alter database db_hive1 set dbproperties('createtime'='20170830');
INFO  : Compiling command(queryId=atguigu_20220104220851_49474d3e-f488-4cb8-a280-f7341cacd8e3): alter database db_hive1 set dbproperties('createtime'='20170830')
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=atguigu_20220104220851_49474d3e-f488-4cb8-a280-f7341cacd8e3); Time taken: 0.024 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=atguigu_20220104220851_49474d3e-f488-4cb8-a280-f7341cacd8e3): alter database db_hive1 set dbproperties('createtime'='20170830')
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=atguigu_20220104220851_49474d3e-f488-4cb8-a280-f7341cacd8e3); Time taken: 0.072 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
No rows affected (0.115 seconds)
# Show extended database info (includes the dbproperties set above)
0: jdbc:hive2://hadoop102:10000> describe database extended db_hive1;
INFO  : Compiling command(queryId=atguigu_20220104220904_22e942f7-3c07-41e4-931d-abcedbc16298): describe database extended db_hive1
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:db_name, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer), FieldSchema(name:location, type:string, comment:from deserializer), FieldSchema(name:owner_name, type:string, comment:from deserializer), FieldSchema(name:owner_type, type:string, comment:from deserializer), FieldSchema(name:parameters, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=atguigu_20220104220904_22e942f7-3c07-41e4-931d-abcedbc16298); Time taken: 0.021 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=atguigu_20220104220904_22e942f7-3c07-41e4-931d-abcedbc16298): describe database extended db_hive1
INFO  : Starting task [Stage-0:DDL] in serial mode
+-----------+----------+----------------------------------------------------+-------------+-------------+------------------------+
|  db_name  | comment  |                      location                      | owner_name  | owner_type  |       parameters       |
+-----------+----------+----------------------------------------------------+-------------+-------------+------------------------+
| db_hive1  |          | hdfs://hadoop102:9820/user/hive/warehouse/db_hive1.db | atguigu     | USER        | {createtime=20170830}  |
+-----------+----------+----------------------------------------------------+-------------+-------------+------------------------+
1 row selected (0.059 seconds)
INFO  : Completed executing command(queryId=atguigu_20220104220904_22e942f7-3c07-41e4-931d-abcedbc16298); Time taken: 0.011 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager

# Show basic database info
0: jdbc:hive2://hadoop102:10000> describe database db_hive1;
INFO  : Compiling command(queryId=atguigu_20220104220819_34d3e484-1944-48ae-b09b-87c865498b92): describe database db_hive1
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:db_name, type:string, comment:from deserializer), FieldSchema(name:comment, type:string, comment:from deserializer), FieldSchema(name:location, type:string, comment:from deserializer), FieldSchema(name:owner_name, type:string, comment:from deserializer), FieldSchema(name:owner_type, type:string, comment:from deserializer), FieldSchema(name:parameters, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=atguigu_20220104220819_34d3e484-1944-48ae-b09b-87c865498b92); Time taken: 0.025 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=atguigu_20220104220819_34d3e484-1944-48ae-b09b-87c865498b92): describe database db_hive1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=atguigu_20220104220819_34d3e484-1944-48ae-b09b-87c865498b92); Time taken: 0.012 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
|  db_name  | comment  |                      location                      | owner_name  | owner_type  | parameters  |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
| db_hive1  |          | hdfs://hadoop102:9820/user/hive/warehouse/db_hive1.db | atguigu     | USER        |             |
+-----------+----------+----------------------------------------------------+-------------+-------------+-------------+
1 row selected (0.082 seconds)

4.2 Creating Tables

Full CREATE TABLE syntax:

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)]
[AS select_statement]

row format delimited fields terminated by '\t'  -- field delimiter (Hive's default is '\001')
row format delimited lines terminated by '\n'   -- line delimiter

create table if not exists tb_iptv_data_sequence(
expodate string,
addcode string,
sbtid string,
rscode string,
rsname string,
scid string)
row format delimited fields terminated by '|' lines terminated by '\n'
stored as sequencefile;

Field-by-field explanation:
(01) CREATE TABLE: creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses it.
(02) EXTERNAL: creates an external table and lets you point it at the actual data (LOCATION). Dropping a managed (internal) table deletes both metadata and data; dropping an external table deletes only the metadata, never the data.
(03) COMMENT: adds comments to the table and its columns.
(04) PARTITIONED BY: creates a partitioned table.
(05) CLUSTERED BY: creates a bucketed table.
(06) SORTED BY: rarely used; additionally sorts one or more columns within each bucket.
(07) ROW FORMAT: lets you define your own SerDe or use a built-in one. If neither ROW FORMAT nor ROW FORMAT DELIMITED is specified, the built-in SerDe is used. When you specify the table's columns together with a custom SerDe, Hive uses that SerDe to determine the table's actual column data. SerDe is short for Serializer/Deserializer; Hive uses it to serialize and deserialize row objects.
(08) STORED AS: specifies the file format. Common formats are SEQUENCEFILE (binary sequence file), TEXTFILE (plain text), and RCFILE (columnar format). Use STORED AS TEXTFILE for plain text data; use STORED AS SEQUENCEFILE if the data needs compression.
(09) LOCATION: specifies where the table's data lives on HDFS.
(10) AS: followed by a query; creates the table from the query's results.
(11) LIKE: copies an existing table's structure without copying its data.
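For example, items (10) and (11) in action (test2 and test3 are illustrative names, reusing the test table from earlier):

hive (default)> create table test2 as select id from test;   -- structure and data from the query
hive (default)> create table test3 like test;                -- structure only, no data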

Converting between managed (internal) tables and external tables
# 1. Convert the external table emp to a managed table (emp is the table name)
alter table emp set tblproperties ('EXTERNAL'='FALSE');
# 2. Convert the managed table emp to an external table
alter table emp set tblproperties ('EXTERNAL'='TRUE');
# Note: 'TRUE'/'FALSE' are fixed, case-sensitive values
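You can verify the result afterwards: desc formatted prints a Table Type line that reads MANAGED_TABLE or EXTERNAL_TABLE.

hive (default)> desc formatted emp;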

5. DML: Data Manipulation

5.1 Common Hive Commands: Shell and HDFS Commands

# To run a shell command from the Hive shell, prefix it with ! and end it with a semicolon
hive (default)> ! ls -ls /opt;
总用量 24
4 drwxr-xr-x.  3 atguigu atguigu 4096 12月 26 2021 data
4 drwxr-xr-x.  3 atguigu atguigu 4096 12月 28 2021 ha
4 drwxr-xr-x. 23 atguigu atguigu 4096 5月  26 21:40 module
4 drwxr-xr-x.  2 atguigu atguigu 4096 12月 28 2021 program
4 drwxr-xr-x.  2 root    root    4096 9月   7 2017 rh
4 drwxr-xr-x.  3 atguigu atguigu 4096 5月  22 18:07 software
# To run an HDFS command, use dfs, again ending with a semicolon
hive (default)> dfs -ls /user/hive;
Found 2 items
drwxr-xr-x   - atguigu supergroup          0 2022-05-09 21:36 /user/hive/jars
drwxr-xr-x   - atguigu supergroup          0 2022-07-03 19:27 /user/hive/warehouse

5.2 Hive Shell: Importing and Exporting Data

LanguageManual ImportExport

  • LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
  • INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;
  • INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

Note: the OVERWRITE keyword replaces the table's existing data completely (delete all, then insert); inserting or loading only a subset of a table's columns is not supported.
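A minimal sketch of LOAD and INSERT in use (the file path and the student/student2 tables are illustrative, not from the session above):

hive (default)> load data local inpath '/opt/module/hive/datas/student.txt' into table student;
hive (default)> insert into table student select id, name from student2;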

Redirecting Hive query results to a file

# Basic syntax: hive -e 'statement' > file, or hive -f script > file
[atguigu@hadoop102 hive]$ bin/hive -e 'select * from default.student;' > /opt/module/hive/datas/export/student4.txt;
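Hive can also export results itself with INSERT OVERWRITE DIRECTORY (a sketch; the target directory is an assumption):

hive (default)> insert overwrite local directory '/opt/module/hive/datas/export/student'
              > row format delimited fields terminated by '\t'
              > select * from student;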

5.3 Clearing Table Data

Truncating a Hive table

# Note: TRUNCATE works only on managed tables and cannot clear data in external tables; it is equivalent to delete from table where 1=1
hive (default)> truncate table student;

Note: truncate cannot clear an external table, because Hive does not manage the external table's data; only its metadata lives in the Hive metastore. For a table created with EXTERNAL, even dropping a partition leaves the files in HDFS untouched. To clear an external table's data, either convert it to a managed table first or delete the HDFS files directly.
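Following the note above, a sketch of the convert-then-truncate route (ext_table is a placeholder name):

hive (default)> alter table ext_table set tblproperties('EXTERNAL'='FALSE');
hive (default)> truncate table ext_table;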

Dropping a Hive partition

# alter table table_name drop partition (partition_col='partition value')
hive (default)> alter table tablename drop partition(load_date='2019-01-01');
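To confirm the partition is gone, list the remaining partitions (same placeholder table name as above):

hive (default)> show partitions tablename;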