Hive Basics (Syntax)

Latest recommended article published on 2024-04-09 10:32:46

艾伦蓝

Categories: Hive, Hadoop. Tags: big data, database

Article link: https://blog.csdn.net/lan12334321234/article/details/88328333

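[size=large][b]I. Introduction to Hive[/b][/size]
[color=red][b]Hive is a data warehouse built on top of the Hadoop Distributed File System (HDFS); all of its data is stored as files.[/b][/color]
[color=blue][b]Each record in Hive corresponds to one line of a file, and the field values within the line are separated by a specified delimiter.[/b][/color] When reading data, Hive splits each line on the delimiter and assigns the resulting values to the columns in order. [color=red][b]Hive permissions are currently file-based: a user who has read permission on the files behind a table has read permission on the table.[/b][/color]
The most heavily used Hive feature today is partitioning; Hive stores the data of each partition in its own directory.
[color=red][b]When Hive executes a SQL statement, it compiles the statement into MapReduce jobs and runs them.[/b][/color]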
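
To see how a statement is turned into jobs, explain prints the plan without running the query. This is a minimal sketch that assumes the order_joined_extend table created in the DDL section already exists; the aggregation itself is only an illustration.

```sql
-- Print the execution plan instead of running the query.
-- Assumes the order_joined_extend table from the DDL section exists.
explain
select create_date, count(*) as order_cnt
from order_joined_extend
group by create_date;
```

The output lists the stages Hive would run; its exact format varies with the Hive version and execution engine.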

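[size=large][b]II. Data types[/b][/size]
[b]Integer[/b] tinyint (1 byte), smallint (2 bytes), int (4 bytes), bigint (8 bytes)
[b]Floating point[/b] float, double
[b]String[/b] string
[b]Boolean[/b] boolean
Date/time types are not supported in early Hive releases (later releases added TIMESTAMP and DATE).
Binary strings are not supported in early Hive releases (later releases added BINARY).

[color=red][b]Other data types[/b][/color]
ARRAY
MAP
STRUCT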


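-- A table with one column of each complex type
create table complex(
    col1 ARRAY<int>,
    col2 MAP<string,int>,
    col3 STRUCT<a:string, b:int, c:double>
  );
-- Array element by index, map value by key, struct field by name
select col1[0],col2['b'],col3.c from complex;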
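
For a text-backed table, the complex types also need delimiters so Hive can split a line into array items, map entries, and struct fields. The sketch below is an illustration only: the table name complex_text and the choice of tab, comma, and colon as delimiters are assumptions, not part of the original example.

```sql
-- Hypothetical table; the delimiters are arbitrary choices for this sketch.
create table complex_text(
    col1 ARRAY<int>,
    col2 MAP<string,int>,
    col3 STRUCT<a:string, b:int, c:double>
  )
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':';

-- A matching line of the data file would look like this (<TAB> is a tab character):
-- 1,2,3<TAB>a:10,b:20<TAB>x,7,1.5

-- size() counts array elements; map_keys() lists the keys of a map.
select size(col1), map_keys(col2), col3.a from complex_text;
```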

[size=large][b]III. Built-in functions[/b][/size]
Omitted...
[size=large][b]IV. DDL (data definition)[/b][/size]
[color=red][b]1. Creating and dropping databases[/b][/color]

create database if not exists db_test
  comment 'for testing';
drop database if exists db_test;
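
A slightly fuller round trip, as a sketch: it reuses the db_test name from above and adds the usual inspection statements. The cascade keyword is only needed when the database still contains tables.

```sql
create database if not exists db_test comment 'for testing';

-- List matching databases and show the comment and location of db_test.
show databases like 'db_*';
describe database db_test;

-- Switch to the database, work in it, then switch back.
use db_test;
use default;

-- cascade also drops any tables still inside the database.
drop database if exists db_test cascade;
```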

[size=medium][b]2. Creating tables[/b][/size]

 create external table order_joined_extend( 
    addr_id bigint comment 'address id' , 
    alliance_id int , 
    allot_quantity int , 
    city_ship_type_desc string 
  ) 
comment 'order_joined_extend' 
partitioned by (create_date string,type string) 
row format delimited fields terminated by '\001' 
lines terminated by '\n' 
stored as textfile 
location '/home/zhouweiping/order_joined_extend/';

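[size=medium][b]external creates an external table.[/b][/size]

Advantages of external tables:
[color=red][b]a. Data files can be placed directly in the directory given by location and queried from Hive right away;[/b][/color]
[color=blue][b]b. Several tables can share one copy of the data simply by pointing their location at the same directory.[/b][/color]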
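
Point b can be sketched as a second external table over the same directory. The table name order_joined_extend_shared is hypothetical; the columns and location are copied from the example above.

```sql
-- Hypothetical second table that reads the same files as order_joined_extend.
create external table order_joined_extend_shared(
    addr_id bigint,
    alliance_id int,
    allot_quantity int,
    city_ship_type_desc string
  )
partitioned by (create_date string,type string)
row format delimited fields terminated by '\001'
stored as textfile
location '/home/zhouweiping/order_joined_extend/';
```

Because the table is partitioned, each partition still has to be registered on the new table with alter table ... add partition before its rows become visible there.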
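partitioned by creates a partitioned table.
A partitioned table puts rows with the same partition column value into one directory; if there are sub-partitions under that partition column, subdirectories are created inside that directory, as shown in the figure: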

[img]http://dl2.iteye.com/upload/attachment/0124/1511/77051b2a-d877-34a0-b8be-57218cefa19f.png[/img]
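
The directory layout can also be sketched in statements. This assumes the order_joined_extend table above and lets Hive choose the default partition directory; the partition values are the ones used elsewhere in this article.

```sql
-- Register one partition and let Hive pick the default directory.
alter table order_joined_extend
  add if not exists partition(create_date='2012-09-01',type='ddclick_bang');

-- With the default layout the partition's files live under:
--   /home/zhouweiping/order_joined_extend/create_date=2012-09-01/type=ddclick_bang/

-- A filter on the partition columns lets Hive read only that directory.
select count(*) from order_joined_extend
where create_date='2012-09-01' and type='ddclick_bang';
```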

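[color=red][b]row format specifies the row and column delimiters of the table.[/b][/color]
[color=blue][b]stored as specifies the file storage format, textfile in this example.[/b][/color]
location specifies the HDFS directory that holds the table's data files. Its default value is:
/user/hive/warehouse/dbname.db/tablename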

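You can also use

create table table_name like old_table_name;

[color=blue][b]In the author's experience this can only create a managed (internal) table, not an external one[/b][/color]: even with external added, the resulting table is still a managed table. The author also observed that when the source table is partitioned, the new table is an ordinary table and the source table's partition columns become ordinary columns in it. This behavior depends on the Hive version, so confirm it with desc formatted on your own installation.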
[size=medium][b]3. Populating a table while creating it[/b][/size]

create table order_joined_extend1 
     comment 'order_joined_extend' 
     row format delimited fields terminated by '\001' 
     lines terminated by '\n' 
     stored as textfile 
     location '/home/zhouweiping/order_joined_extend1/' 
     as 
     select * from order_joined_extend;

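[size=medium][color=red][b]However, this method does not support external tables or partitioned tables,[/b][/color][/size] and the column definitions cannot be spelled out in the create statement.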
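
Since the columns cannot be declared, their names and types are taken from the select list, so aliases and casts are the way to shape the new table. A minimal sketch, with order_addr_summary as a hypothetical table name:

```sql
-- Hypothetical target table; column names and types come from the select list.
create table order_addr_summary
  stored as textfile
  as
  select addr_id                        as address_id,
         cast(allot_quantity as bigint) as allot_quantity
  from order_joined_extend;
```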
[size=medium][b]4. Dropping tables[/b][/size]

drop table if exists order_joined_extend1;

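The table being dropped may be external or managed. [size=medium][color=red][b]Dropping an external table removes only the table definition; the data files remain.[/b][/color][/size] Dropping a managed table deletes its data as well.
[size=medium][b]5. Altering tables[/b][/size]
[color=red][b]Adding and dropping partitions[/b][/color]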

alter table order_joined_extend
  add partition(create_date='2012-09-01',type='ddclick_bang')
  location '/share/comm/ddclick/2012-09-01/ddclick_bang/';
alter table order_joined_extend
  drop if exists partition(create_date='2012-09-01',type='ddclick_bang');

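[color=red][b]Renaming[/b][/color]

alter table order_joined_extend rename to order_joined_extend_rename;

Replace the existing columns. Only the columns before the partition columns are replaced; the partition columns stay unchanged.

ALTER TABLE order_joined_extend REPLACE COLUMNS
  (
   product_id string,
   product_name string,
   bd_name string
   );

Add columns. New columns are appended after the last existing column and before the partition columns; they cannot be placed after a specific column.

alter table order_joined_extend
  add columns (add_col_test string);

Convert a managed (internal) table to an external table

alter table tablePartition set TBLPROPERTIES ('EXTERNAL'='TRUE');

Convert an external table to a managed (internal) table

alter table tablePartition set TBLPROPERTIES ('EXTERNAL'='FALSE');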
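
A quick way to check that the conversion took effect, sketched with the same tablePartition table as above:

```sql
alter table tablePartition set TBLPROPERTIES ('EXTERNAL'='TRUE');

-- Look for the "Table Type" row in the output: EXTERNAL_TABLE or MANAGED_TABLE.
desc formatted tablePartition;
```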

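[color=red][b]6. Show/describe[/b][/color]


show databases;
show tables;
show tables '*tianzhao*';

Lists the tables whose names contain tianzhao.
show partitions table_name;
Lists the partitions that currently exist in the table.
desc formatted table_name;
Describes the table in detail, including its columns, location, partition columns, and whether it is a managed or external table.
show functions;
Lists the available functions, including usable UDFs.
describe function length;
Returns the description of the length function.
show table extended like order_joined_extend partition(create_date='2012-09-01',type='ddclick_bang');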

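[size=large][b]V. DML (data manipulation)[/b][/size]
[color=red][b]The Hive versions covered here support only select and insert, not delete or update[/b][/color] (later releases, from Hive 0.14, added update and delete for transactional tables).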

[size=medium][color=red][b]1. Loading data[/b][/color][/size]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

When loading local data into Hive, it is best to give the absolute path of the local file.
Appending data:

load data local inpath '/home/zhouweiping/d.dat'  into table order_joined_extend1;

Overwriting data:

load data local inpath '/home/zhouweiping/d.dat'  overwrite into table order_joined_extend1;

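Loading data that is already on HDFS into a Hive table
For an external table, you can copy the data file straight into the location directory:

hadoop fs -cp <from> <location>

[size=medium][color=red][b]Both managed and external tables can use load:[/b][/color][/size]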

load data inpath '/home/zhouweiping/d.dat'  into table order_joined_extend1;

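When loading data:
[color=red][b] If the data is local, a copy of the local file is placed in the table's location on HDFS;
if the data is already on HDFS, it is moved directly into location. So if the source file of the load is in the same place as location, an error is raised.[/b][/color]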
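
The PARTITION clause from the syntax above has no example in this section, so here is a sketch. It reuses the sample file path and partition values from this article and assumes the file holds only the four non-partition columns, since the partition values come from the statement rather than from the file.

```sql
-- Reuses the article's sample file path and partition values.
load data local inpath '/home/zhouweiping/d.dat'
  overwrite into table order_joined_extend
  partition (create_date='2012-09-01',type='ddclick_bang');

-- Confirm the partition is now registered.
show partitions order_joined_extend;
```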

[size=medium][color=red][b]2. Insert[/b][/color][/size]
Inserting data into a non-partitioned table

insert overwrite table table1
select * from table2;

[color=blue][b]When inserting into a partitioned table, the partition values must be specified[/b][/color]

insert overwrite table order_joined_extend partition (create_date='2012-09-01',type='ddclick_bang')
select addr_id,alliance_id,allot_quantity,city_ship_type_desc from
order_joined_extend1;

One input, multiple outputs

-- table2 is scanned once and written to both targets
from table2
insert overwrite table table1 select *
insert overwrite table table3 select *;

[color=red][b]Dynamic partitions[/b][/color]

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE order_joined_extend PARTITION(create_date,type)
SELECT * FROM order_joined_extend1;

Hive uses the last two columns of the select as the dynamic partition values and inserts rows with the same create_date and type into the same partition.
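
Static and dynamic partition columns can also be mixed, as long as the static ones come first. The sketch below assumes order_joined_extend1 carries create_date and type as ordinary columns (it was created from select * on the partitioned table); with a static leading partition, strict mode is sufficient.

```sql
set hive.exec.dynamic.partition=true;

-- create_date is fixed in the statement; type is taken from the last select column.
insert overwrite table order_joined_extend
  partition (create_date='2012-09-01', type)
select addr_id, alliance_id, allot_quantity, city_ship_type_desc, type
from order_joined_extend1
where create_date='2012-09-01';
```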
Writing query results to files
To a local file:

 insert overwrite local directory '/home/zhouweiping/directory.dat' 
select * from order_joined_extend limit 10;

To HDFS:

insert overwrite  directory '/home/zhouweiping/directory.dat' 
select * from order_joined_extend limit 10;

[color=red][b]3. Select[/b][/color]
[color=red][b]Most ordinary SQL statements are supported[/b][/color]

SELECT [ALL | DISTINCT] select_expr, select_expr, ... 
  FROM table_reference 
  [WHERE where_condition] 
  [GROUP BY col_list] 
  [CLUSTER BY col_list 
  | [DISTRIBUTE BY col_list] [SORT BY col_list] 
  ] 
[LIMIT number]

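[b]When aggregate functions are used, every column in the select list must either appear in the group by clause or be used only inside an aggregate function.[/b]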
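
Two short queries illustrate the clauses above. They are sketches against the order_joined_extend table from the DDL section; the choice of columns is only for illustration.

```sql
-- Valid: create_date is grouped, allot_quantity appears only inside sum().
select create_date, sum(allot_quantity) as total_quantity
from order_joined_extend
group by create_date;

-- Rows with the same alliance_id go to the same reducer and are sorted within it.
select alliance_id, addr_id, allot_quantity
from order_joined_extend
distribute by alliance_id
sort by alliance_id, allot_quantity desc;
```

Unlike a global sort, sort by orders rows only within each reducer, which is why it is usually paired with distribute by.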

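[color=red][b]4. Join[/b][/color]
Hive supports only equality joins, outer joins, and left semi joins. [color=blue][b]Hive does not support any non-equality joins, because they are very hard to translate into map/reduce jobs.[/b][/color] [color=red][b]Hive also does not support in subqueries, but left semi join can be used to implement in.[/b][/color] (These limits apply to the Hive versions covered here; later releases relaxed some of them, for example in subqueries in the where clause from Hive 0.13.) In addition, Hive supports joins across more than two tables.
The order of tables in the JOIN clause matters: [color=red][b]as a rule, put the largest table last,[/b][/color] because Hive buffers the rows of the earlier tables and streams the last table through the reducers.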
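
The left semi join replacement for in can be sketched as follows. The tables orders and vip_customers and their columns are hypothetical, introduced only for this example.

```sql
-- Hypothetical tables: orders(order_id, customer_id), vip_customers(customer_id).
-- Same intent as:
--   select order_id, customer_id from orders
--   where customer_id in (select customer_id from vip_customers)
select o.order_id, o.customer_id
from orders o
left semi join vip_customers v
  on (o.customer_id = v.customer_id);
```

With left semi join, columns of the right-hand table may be referenced only in the on clause, not in the select list or where clause.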

Reposted from: [url]http://qingyuan-jishu.iteye.com/blog/2068536[/url]