Hive basics: common operations, database syntax, table syntax, and managed vs. external tables

ListenerDMT

Last modified: 2022-04-05 10:36:10

Category: HIVE学习 (Hive study notes). Tags: hadoop, big data, hive

First published: 2022-04-04 01:53:32

Article link: https://blog.csdn.net/qq_43476430/article/details/123947666

I. Basic Hive operations

1. hive -e followed by a select statement

[root@hadoop ~]# hive -e 'select * from test0401;'
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/home/peizk/app/jdk1.8.0_212/bin:/home/peizk/app/hadoop-3.1.3/bin:/home/peizk/app/hadoop-3.1.3/sbin:/home/peizk/app/hive-3.1.2/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/peizk/app/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/peizk/app/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 22d8f12f-d79a-4977-930a-5a87ceb7b143

Logging initialized using configuration in jar:file:/home/peizk/app/hive-3.1.2/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 5f64da15-a0ea-4735-9d97-05b7bb248eaa
OK
test0401.id	test0401.name
1	zhangsan
Time taken: 2.285 seconds, Fetched: 1 row(s)

2. hive -f followed by a SQL script file

Create the script:

[root@hadoop ~]# vim test0404.sql

Its content is:

select *  from test0401;

Run it:

[root@hadoop ~]# hive -f test0404.sql 
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/home/peizk/app/jdk1.8.0_212/bin:/home/peizk/app/hadoop-3.1.3/bin:/home/peizk/app/hadoop-3.1.3/sbin:/home/peizk/app/hive-3.1.2/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/peizk/app/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/peizk/app/hadoop-3.1.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 5290fc65-c2c9-40f3-a127-0a5417845229

Logging initialized using configuration in jar:file:/home/peizk/app/hive-3.1.2/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = e9664dbb-a7be-4441-be6a-fc3ff518a22c
OK
test0401.id	test0401.name
1	zhangsan
Time taken: 2.233 seconds, Fetched: 1 row(s)

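3. hive -i

hive -i followed by a file name runs that file as an initialization SQL script before the session starts. It is typically used to prepare the session in advance, for example to register UDFs.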
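
As a rough sketch of what such an initialization file might contain: the file name init.sql, the jar path, the class name, and the function name below are all hypothetical placeholders and are not taken from the environment used in this article.

```sql
-- init.sql: a hypothetical initialization file, started with:  hive -i init.sql
-- The jar path and the class name are placeholders for your own UDF.
ADD JAR /path/to/my-udfs.jar;

-- Register the UDF for the current session under the name my_upper
CREATE TEMPORARY FUNCTION my_upper AS 'com.example.hive.udf.MyUpper';
```

Because the function is temporary, it only exists for the session that ran the file, which is why it fits well in an initialization script.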

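II. Database syntax

Hive has a built-in database named default. Its HDFS location is /user/hive/warehouse, which is set by hive.metastore.warehouse.dir in hive-site.xml.

1. create: create a database

# Basic statement format
create database [if not exists] database_name [location hdfs_path]


# Without location, the database directory defaults to /user/hive/warehouse/database_name.db, for example /user/hive/warehouse/test0404.db

Try it with an explicit path: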

hive (default)> create database test0404_b location '/dwd/database';
OK
Time taken: 0.043 seconds

Check the result, for example with desc database test0404_b.

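2. drop: delete a database

# Basic statement format
drop  database [if exists] database_name

### Note: a database that still contains tables cannot be dropped!
To drop it anyway, either
(1) drop all tables in the database first, or
(2) use the cascade keyword:
drop database if exists database_name cascade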
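
The difference between the two drop variants can be tried out with a throwaway database. This is a minimal sketch; demo_db and t1 are hypothetical names chosen only for the illustration.

```sql
-- demo_db and t1 are hypothetical names used only for this sketch
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE TABLE IF NOT EXISTS demo_db.t1 (id INT);

-- A plain drop is expected to fail here because demo_db still contains t1:
-- DROP DATABASE demo_db;

-- CASCADE drops the contained tables first, then the database itself
DROP DATABASE IF EXISTS demo_db CASCADE;
```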

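3. use: switch databases

(1) Switch to a database
   use  database_name

(2) Show a database's properties
  desc  database  [extended]  database_name

(3) Show which database is currently in use
   select  current_database()

Alternatively, set the following property in hive-site.xml so that the CLI prompt always shows the current database:
 <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>

(4) Show the full statement that created a database
    show  create database database_name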
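
Put together, a short session using these statements might look like the sketch below. It assumes the test0404_b database created earlier still exists; setting the property with SET only affects the current session, as opposed to the permanent hive-site.xml setting.

```sql
-- Switch to the database created earlier in this article
USE test0404_b;

-- Confirm which database is active
SELECT current_database();

-- Show the location and properties of the database
DESC DATABASE EXTENDED test0404_b;

-- Session-level alternative to editing hive-site.xml:
-- makes the CLI prompt show the current database
SET hive.cli.print.current.db=true;
```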

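III. Table syntax

1. create: create a table

# Basic format

create [external]  table  [if not exists] table_name [(
col_name  data_type  [comment  col_comment], ...
)]
[comment table_comment]
[partitioned by (col_name data_type [comment col_comment], ...)]
[clustered by (col_name, col_name, ...)  into num_buckets buckets]
[row format  row_format]
[stored as file_format]
[location hdfs_path]



# Explanation

partitioned by : declares the partition columns; partitioning is one of the key concepts in Hive

clustered by   : each table or partition can be further divided into buckets. Hive hashes the column value and takes the remainder after dividing by the number of buckets to decide which bucket a row is stored in

row format     : usually row format delimited  fields terminated by ',', which sets the column delimiter of the stored data. The default is \001 (Ctrl-A, ASCII code 1); other delimiters can be defined here

stored as      : stored as  sequencefile | textfile | ...    Use textfile for plain text, or sequencefile for Hadoop's binary key/value format, which is typically chosen when the data should be compressed

location       : the HDFS path of the table. External tables normally specify it so that they point at existing data; if it is omitted, Hive falls back to a default directory. Without the external keyword the table is a managed (internal) table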
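
To make the template more concrete, here is a small sketch that uses most of the optional clauses at once. The table page_views and its columns are hypothetical and exist only to show how the clauses are ordered.

```sql
-- page_views is a hypothetical table used only to illustrate the clause order
CREATE TABLE IF NOT EXISTS page_views (
  user_id BIGINT COMMENT 'user identifier',
  url     STRING COMMENT 'visited page'
)
COMMENT 'example of a partitioned, bucketed text table'
-- one sub-directory per dt value
PARTITIONED BY (dt STRING COMMENT 'date partition')
-- rows are assigned to one of 4 buckets by hashing user_id
CLUSTERED BY (user_id) INTO 4 BUCKETS
-- comma-separated columns instead of the default \001
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```

Note that the partition column dt is not repeated in the regular column list; Hive treats it as an extra column derived from the directory name.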

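# Other ways to create a table
(1) create table table_name like existing_table
Copies only the table structure; the new table contains no data.

(2) create table table_name  as  select_statement
Creates the table from the result of the query, structure and data together.

(3) create external table over data files that already exist, for example: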

CREATE external  TABLE emp_external 
   (  EMPNO bigint, 
  ENAME string, 
  JOB string, 
  MGR bigint, 
  HIREDATE string, 
  SAL bigint, 
  COMM bigint, 
  DEPTNO bigint)
row format delimited  fields terminated by ','
location '/dwd/';

# This creates an external table over existing data files: the directory /dwd contains the file emp.txt
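
Variants (1) and (2) could be tried against the emp_external table like this. It is only a sketch: the target table names emp_like and emp_dept10 and the filter value 10 are made up for the illustration.

```sql
-- (1) Copy only the structure of emp_external; emp_like starts out empty
CREATE TABLE IF NOT EXISTS emp_like LIKE emp_external;

-- (2) Create a table from a query result (structure and data together);
--     the filter deptno = 10 is an arbitrary example value
CREATE TABLE emp_dept10 AS
SELECT empno, ename, sal
FROM emp_external
WHERE deptno = 10;
```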

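2. Viewing tables

(1) List tables
show tables

(2) List tables in another database
show  tables in database_name

(3) List tables matching a pattern
show  tables  like  'test*'

(4) Show a table's description
desc  [formatted]  table_name

(5) Show the statement that created a table
show create table table_name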

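3. Loading data into a table: load / insert

# load: basic format

load  data  [local]  inpath  'filepath' [overwrite] into table  table_name
[partition (partcol1 = val1, partcol2 = val2, ...)]




# Explanation

[local]  : read the file from the local file system; without it, the path is looked up on HDFS

inpath  : path of the data to load

[overwrite] : with it, existing data in the table (or partition) is replaced; without it, the new data is appended

[partition] : the partition to load into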
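
The options above can be combined as in the following sketch. The tables emp_load and emp_by_day and the file paths /root/emp.txt and /tmp/emp.txt are hypothetical; the files are assumed to be comma-separated with two columns.

```sql
-- Hypothetical target tables for the load examples
CREATE TABLE IF NOT EXISTS emp_load (empno BIGINT, ename STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS emp_by_day (empno BIGINT, ename STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Local file, appended to whatever is already in the table
LOAD DATA LOCAL INPATH '/root/emp.txt' INTO TABLE emp_load;

-- HDFS file, replacing the current contents of the table;
-- a file loaded from HDFS is moved into the table directory
LOAD DATA INPATH '/tmp/emp.txt' OVERWRITE INTO TABLE emp_load;

-- Local file loaded into one specific partition
LOAD DATA LOCAL INPATH '/root/emp.txt'
INTO TABLE emp_by_day PARTITION (dt = '2022-04-04');
```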

# insert: basic format


(1) insert into table table_name ...  values (...)    mostly used to create a few rows of test data by hand
(2) insert overwrite table table_name partition (dt)  select  xxx, yyy, zzz  from a join b ... group by ...
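
Both forms are sketched below. The example assumes that test0401 has the columns (id, name) seen in the query output earlier and that emp_external exists as defined above; the table dept_sal, the inserted row, and the date value are hypothetical.

```sql
-- (1) Hand-made test row; assumes test0401 has columns (id, name)
INSERT INTO TABLE test0401 VALUES (2, 'lisi');

-- (2) Insert a query result into a partitioned table.
-- dept_sal is a hypothetical result table.
CREATE TABLE IF NOT EXISTS dept_sal (deptno BIGINT, total_sal BIGINT)
PARTITIONED BY (dt STRING);

-- partition (dt) without a value is a dynamic partition;
-- nonstrict mode allows all partition columns to be dynamic
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column must come last in the select list
INSERT OVERWRITE TABLE dept_sal PARTITION (dt)
SELECT deptno, SUM(sal) AS total_sal, '2022-04-04' AS dt
FROM emp_external
GROUP BY deptno;
```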

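4. Altering, dropping, and truncating tables

# Alter a table

(1) Rename a table
ALTER TABLE old_name RENAME TO new_name

(2) Rename a column
ALTER TABLE test_table CHANGE col1 col2 STRING COMMENT 'The datatype of col2 is STRING'

(3) Add a column
ALTER TABLE test_table ADD COLUMNS (dept STRING COMMENT 'Department name');

(4) Set the table comment
ALTER TABLE test_table  SET TBLPROPERTIES ('comment' = 'comment text')

(5) Change a column comment
ALTER TABLE test_table  change column id id string comment 'user number';


Note: Hive has no statement for dropping a single column, and changing a column's type only rewrites the metadata (the stored data is not converted), so the usual practice is to leave the column in place and mark it as deprecated in its comment.
If a column really has to go, replace the whole column list:
alter table table_name replace columns (col_name data_type [comment col_comment], ...)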
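
The statements above can be walked through on a scratch table. This is a sketch under assumptions: alter_demo and its columns are hypothetical, and the table is stored as plain text. The column that gets removed is deliberately the last one, because replace columns only changes metadata and a text table maps data to columns by position.

```sql
-- alter_demo is a hypothetical scratch table
CREATE TABLE IF NOT EXISTS alter_demo (id INT, name STRING, obsolete_col STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Rename the table
ALTER TABLE alter_demo RENAME TO alter_demo2;

-- Rename a column (the type must be restated)
ALTER TABLE alter_demo2 CHANGE name full_name STRING COMMENT 'renamed from name';

-- "Drop" obsolete_col by listing only the columns that should remain.
-- Only metadata changes; the files on HDFS still contain the third field.
ALTER TABLE alter_demo2 REPLACE COLUMNS (
  id INT,
  full_name STRING COMMENT 'renamed from name'
);
```

Removing a column from the middle of a text table this way would make the following columns read the wrong fields, which is one more reason the deprecation comment is usually preferred.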

# Drop a table

drop table table_name

# Truncate (empty) a table

truncate table table_name


Note: truncate is not allowed on external tables and fails with an error.

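IV. Managed (internal) vs. external tables

The biggest difference:

When a managed table is dropped, both the table and its data files are deleted: the data on HDFS and the metadata in MySQL are both removed.

When an external table is dropped, the table is gone but the data files under its location remain: the metadata in MySQL is removed, but the data on HDFS is kept.

Summary: because dropping an external table leaves the data files in place, external tables are convenient for shared data such as raw data and logs.

Managed tables are a good fit for intermediate tables and result tables.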
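
The difference can be observed directly with two small tables. This is a minimal sketch; the table names and the path /tmp/external_demo are hypothetical.

```sql
-- Hypothetical tables used only to compare drop behavior
CREATE TABLE managed_demo (id INT, name STRING);

CREATE EXTERNAL TABLE external_demo (id INT, name STRING)
LOCATION '/tmp/external_demo';

-- The "Table Type" row reports MANAGED_TABLE or EXTERNAL_TABLE
DESC FORMATTED managed_demo;
DESC FORMATTED external_demo;

-- Removes the metadata and the table's data directory
DROP TABLE managed_demo;

-- Removes only the metadata; /tmp/external_demo stays on HDFS
DROP TABLE external_demo;
```

After the second drop, re-running the same create external table statement makes the old data queryable again, which is what makes external tables suitable for shared data.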

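V. Questions to think about

1. What stored as means

stored as specifies the file format in which Hive stores the table's data. Commonly used formats include textfile, sequencefile, rcfile, orc, and parquet.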
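
As a small illustration of how the clause is used, the sketch below creates the same hypothetical table in two formats and copies data from one to the other. The table names fmt_text and fmt_orc are made up for the example.

```sql
-- Same columns, two storage formats (hypothetical tables)
CREATE TABLE IF NOT EXISTS fmt_text (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

CREATE TABLE IF NOT EXISTS fmt_orc (id INT, name STRING)
STORED AS ORC;

-- load data only places files into the table directory without converting them,
-- so an insert ... select is used to rewrite the rows in ORC format
INSERT INTO TABLE fmt_orc
SELECT id, name FROM fmt_text;
```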

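2. Which Hive operations trigger MapReduce?

Since Hive 0.10.0, for the sake of efficiency, simple queries (a plain select without count, sum, group by, and the like) do not go through MapReduce; Hive reads the HDFS files directly and applies the filter. The advantage is that no MR job is started, which speeds things up considerably. The drawback is a less friendly user experience: with a large amount of data you may still wait a long time without seeing any feedback.

This behavior is controlled by a single parameter, which can be set in hive-site.xml:

hive.fetch.task.conversion

With more, simple queries (select, filter, limit) skip MapReduce. With minimal, only select * , filters on partition columns, and limit skip it, so other simple selects run as MapReduce jobs. With none, the optimization is turned off and every query runs as a job.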
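
The effect can be checked in a session by switching the parameter and re-running a simple query. The sketch below assumes the test0401 table with columns (id, name) from the beginning of the article, and that the execution engine is MapReduce.

```sql
-- Simple select + filter is served by a fetch task: no job is started
SET hive.fetch.task.conversion=more;
SELECT * FROM test0401 WHERE id = 1;

-- With the conversion disabled, the same query is submitted as a job
SET hive.fetch.task.conversion=none;
SELECT * FROM test0401 WHERE id = 1;

-- Grouping and aggregation need a job regardless of the setting
SET hive.fetch.task.conversion=more;
SELECT name, count(*) AS cnt FROM test0401 GROUP BY name;
```

Whether a job was launched is visible in the console output: a fetch task returns rows right after OK, while a job prints its progress first.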
