Hive SQL Basic Syntax

第六序列

Article link: https://blog.csdn.net/weixin_55527323/article/details/126005088

HQL (Hive SQL) section

1.DDL

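1.1 External tables

1. The external keyword creates an external table
Effect: dropping the table removes only its metadata; the data files are not deleted.
Use case: website logs are collected every day and periodically written to text files in HDFS. Heavy statistical analysis is run on top of the external table (the raw log table), while the intermediate and result tables are stored as internal (managed) tables, populated via SELECT + INSERT.

2. Convert the external table student2 to an internal table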
alter table student2 set tblproperties('EXTERNAL'='FALSE');
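
The use case above can be sketched as follows. This is only an illustration: the table names web_log_raw and web_log_daily_pv, their columns, and the HDFS path /data/logs/web are hypothetical and do not come from the original notes.

```sql
-- Hypothetical external table over raw log files that already sit in HDFS
create external table if not exists web_log_raw(
    ip  string,
    url string,
    dt  string
)
row format delimited
fields terminated by '\t'
location '/data/logs/web';

-- Hypothetical internal (managed) result table, filled with INSERT + SELECT
create table if not exists web_log_daily_pv(
    dt string,
    pv bigint
);

insert overwrite table web_log_daily_pv
select dt, count(*) as pv
from web_log_raw
group by dt;

-- Dropping the external table removes only its metadata;
-- the files under /data/logs/web stay in HDFS
drop table web_log_raw;

-- The reverse of the conversion shown above: internal back to external
alter table student2 set tblproperties('EXTERNAL'='TRUE');
```

Keeping the raw data external means a mistaken drop table cannot delete the source logs, while the derived tables can be dropped and rebuilt freely.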

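1.2 Altering a table

1. Rename a table
alter table text rename to test;
2. Add/change/replace columns
alter table text add columns(name string);
alter table text change id new_id string;
alter table text replace columns(id string);
replace columns replaces the table's entire column list with the columns given, so any column that is not listed is removed from the schema.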

1.3 Dropping a table

drop table text;

1.4 Inspecting a table

desc formatted test;
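
A few related inspection statements, as a short sketch that assumes the table test exists in the current database. The output of desc formatted includes a Table Type field (MANAGED_TABLE or EXTERNAL_TABLE), which is one way to check whether a table is external.

```sql
show tables;              -- list the tables in the current database
desc test;                -- column names and types only
show create table test;   -- the full CREATE TABLE statement for the table
```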

2.DML

2.1 Loading data

load data local inpath 'data/test.txt' overwrite into table test;
-- overwrite replaces the data already in the table
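
For comparison, a sketch of the two most common variants of the statement above. The HDFS path /user/hive/data/test.txt is a hypothetical example, not a path from the original notes.

```sql
-- Without overwrite, the rows are appended to the existing data
load data local inpath 'data/test.txt' into table test;

-- Without local, the path is read from HDFS;
-- the file is moved (not copied) into the table's directory
load data inpath '/user/hive/data/test.txt' into table test;
```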

2.2 Inserting data

insert into table test values('1006','big');
insert overwrite table test values('1006','big'); -- insert, replacing the existing data
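
Inserts are more often fed by a query than by literal values. A minimal sketch, assuming test has two string columns matching the id and name columns of student:

```sql
-- Append the result of a query
insert into table test
select id, name from student;

-- Replace the table contents with the result of a query
insert overwrite table test
select id, name from student;
```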

2.3 Creating a table with AS SELECT

create table if not exists student3
as select id, name from student;

2.4 Creating a table over existing data with location

create external table test10(
id string,
name string
)
row format delimited
fields terminated by '\t'
location '/user/hive/warehouse/test';
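
Because the table only points at a directory, any file placed there becomes queryable without running load data. A sketch, assuming the commands are run inside the Hive CLI (which passes dfs commands through to HDFS) and that data/test.txt is a tab-separated local file:

```sql
-- Copy a local file into the directory the table points at
dfs -mkdir -p /user/hive/warehouse/test;
dfs -put data/test.txt /user/hive/warehouse/test;

-- The rows are now visible through the table
select * from test10;
```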

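2.5 Queries (large section)

1. Common aggregate functions
	count(), max(), min(), avg(), sum()
2. limit: commonly used to return only the first few rows
3. like and rlike
	like 'A%' matches names that start with A; rlike '[A]' uses a regular expression to find employees whose names contain A
4. group by: groups rows, usually combined with aggregate functions
5. having: placed after group by to filter the groups
6. join on
	join on: inner join
	left join: left outer join
	right join: right outer join
	full join: full outer join
7. Cartesian product
	Produced when the join condition is omitted: every row of one table is joined with every row of the other (the result is huge and highly redundant)
8. order by
	asc: ascending (the default)
	desc: descending
	Global sort through a single reducer, so it is slow on large data
9. sort by
	Rows are sorted within each reducer, not globally
	set mapreduce.job.reduces=3; -- set the number of reducers
10. distribute by
	Usually followed by sort by. Each row goes to the reducer given by the hash of the distribute by column modulo the number of reducers.
11. cluster by
	Combines distribute by and sort by when both use the same column; the sort is ascending only
	select * from emp cluster by deptno;
	select * from emp distribute by deptno sort by deptno;
	-- the two statements above are equivalent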
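
The clauses listed above can be combined roughly as follows. This is a sketch under assumptions: only emp and deptno appear in the original notes, while the columns ename and sal and the table dept(deptno, dname) are hypothetical.

```sql
-- Aggregate per department, keep departments whose average salary
-- exceeds 2000, and return the three highest averages
select deptno, count(*) as cnt, avg(sal) as avg_sal
from emp
group by deptno
having avg(sal) > 2000
order by avg_sal desc
limit 3;

-- like: names starting with A; rlike: names containing A
select * from emp where ename like 'A%';
select * from emp where ename rlike '[A]';

-- left join keeps every row of emp, even without a matching department
select e.ename, d.dname
from emp e
left join dept d on e.deptno = d.deptno;

-- distribute by + sort by on different columns: rows with the same deptno
-- go to the same reducer, and each reducer's output is sorted by sal descending
set mapreduce.job.reduces=3;
select * from emp distribute by deptno sort by sal desc;
```

The last query is the case cluster by cannot express, since cluster by always distributes and sorts on the same column in ascending order.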

2.6 Importing and exporting data

1. Formatted export to the local file system
insert overwrite local directory
'/opt/module/hive/data/export/student1'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select * from student;
-- without local, the path is an HDFS path
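
The HDFS variant mentioned in the comment might look like this; the target directory /user/hive/export/student is a hypothetical path chosen for illustration.

```sql
-- Same export, written to an HDFS directory instead of the local file system
insert overwrite directory '/user/hive/export/student'
row format delimited fields terminated by '\t'
select * from student;
```

In both forms overwrite replaces whatever is already in the target directory, so it is safer to export into a dedicated directory.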