Hive------Basic Statements

The latest recommended article was published on 2024-09-06 22:10:34

luoyunfan6

Column: Hive    Article tags: hive

Article link: https://blog.csdn.net/luoyunfan6/article/details/98106252

Hive basic statements

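Create a database:
create database if not exists online;
Show databases:
show databases;
Switch to a database:
use databasename;
Drop a table:
drop table if exists u1;
Drop an empty database:
drop database if exists name;
Force-drop a database together with all of its tables:
drop database if exists name cascade;
Describe a database:
desc database name;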
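The if not exists / if exists / cascade options above follow a few simple rules, which can be summarized with a toy Python model. The class `ToyCatalog` and its methods are hypothetical and only mimic those rules (Hive's real metastore is far more involved). It assumes, as in Hive's default (restrict) behavior, that dropping a database that still contains tables without cascade is an error.

```python
from typing import Dict, Set


class ToyCatalog:
    """Hypothetical in-memory catalog: database name -> set of table names."""

    def __init__(self) -> None:
        self.dbs: Dict[str, Set[str]] = {}

    def create_database(self, name: str, if_not_exists: bool = False) -> None:
        if name in self.dbs:
            if if_not_exists:
                return  # "if not exists": silently do nothing
            raise ValueError(f"database {name} already exists")
        self.dbs[name] = set()

    def drop_database(self, name: str, if_exists: bool = False,
                      cascade: bool = False) -> None:
        if name not in self.dbs:
            if if_exists:
                return  # "if exists": silently do nothing
            raise ValueError(f"database {name} does not exist")
        if self.dbs[name] and not cascade:
            # Without cascade, a database that still has tables cannot be dropped.
            raise ValueError(f"database {name} is not empty")
        del self.dbs[name]  # with cascade, its tables go with it


catalog = ToyCatalog()
catalog.create_database("online")
catalog.create_database("online", if_not_exists=True)  # no error
catalog.dbs["online"].add("u1")
catalog.drop_database("online", if_exists=True, cascade=True)
print(sorted(catalog.dbs))  # → []
```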
Create a table:

create table  if not exists  u4(
    id bigint comment 'this userId',
    name String,
    sex tinyint
)
comment 'this is table of user'
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile
;
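The row format clause tells Hive how to cut a text file into rows and columns. As a rough illustration, the toy sketch below parses tab-separated lines into (id, name, sex) rows for u4. The function names are hypothetical, and it assumes, as Hive's default text SerDe behaves, that a field which cannot be converted to the column type (or falls outside its range), or a missing trailing field, is read as NULL (`None` here), while extra fields are ignored.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[str], Optional[int]]

BIGINT = (-2**63, 2**63 - 1)
TINYINT = (-128, 127)


def to_int(field: Optional[str], bounds: Tuple[int, int]) -> Optional[int]:
    """Convert a text field to an integer column value; None (NULL) if missing,
    unparseable, or outside the column type's range."""
    if field is None:
        return None
    try:
        value = int(field)
    except ValueError:
        return None
    low, high = bounds
    return value if low <= value <= high else None


def parse_u4(text: str) -> List[Row]:
    """Lines end with a newline, fields are separated by tabs, and fields map
    to (id bigint, name string, sex tinyint) by position."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # the final newline ends the last line; it starts no new one
    rows: List[Row] = []
    for line in lines:
        fields: List[Optional[str]] = list(line.split("\t"))
        fields += [None] * (3 - len(fields))  # missing trailing fields -> NULL
        rows.append((to_int(fields[0], BIGINT), fields[1], to_int(fields[2], TINYINT)))
    return rows


print(parse_u4("1\tzhangsan\t1\n2\tlisi\tx\n3\twangwu\n"))
# → [(1, 'zhangsan', 1), (2, 'lisi', None), (3, 'wangwu', None)]
```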

Create an external table (with the external keyword):

create external table if not exists log(
    id String,
    phone  bigint,
    upflow  int,
    downflow int
)
comment 'this is table of log'
row format delimited fields terminated by '\t'
lines terminated by '\n'
stored as textfile
;

Load data:

// Load data from the local file system (the file is copied into the table)


load data local inpath '/home/hadoop/u1' into table u1;

// Load data from HDFS (the file is moved into the table's directory)

load data inpath '/u1' into table u3;

// Insert the result of a query on another table

insert into u2 select id,name,sex from u1;
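Load operations are pure file operations: Hive does not parse or validate the file while loading. The toy sketch below models only the copy-versus-move difference between the two load statements, using dictionaries as stand-ins for the local file system, HDFS, and the table directories; all function and variable names are hypothetical.

```python
from typing import Dict

FileSystem = Dict[str, str]  # path -> file contents (toy stand-in for a file system)


def load_local(local_fs: FileSystem, table_dir: FileSystem, path: str) -> None:
    """load data local inpath ... into table: copy the local file into the table."""
    table_dir[path.rsplit("/", 1)[-1]] = local_fs[path]  # source file is kept


def load_hdfs(hdfs: FileSystem, table_dir: FileSystem, path: str) -> None:
    """load data inpath ... into table: move the HDFS file into the table."""
    table_dir[path.rsplit("/", 1)[-1]] = hdfs.pop(path)  # source path disappears


local_fs = {"/home/hadoop/u1": "1\tzhangsan\t1\n"}
hdfs = {"/u1": "2\tlisi\t0\n"}
u1_dir: FileSystem = {}
u3_dir: FileSystem = {}

load_local(local_fs, u1_dir, "/home/hadoop/u1")
load_hdfs(hdfs, u3_dir, "/u1")
print("/home/hadoop/u1" in local_fs, "/u1" in hdfs)  # → True False
```

So after loading from HDFS, the original path no longer exists; keep a copy elsewhere if other jobs still need it.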

Show table descriptions:

desc u1;
desc extended u1;
desc formatted u1;
show create table u1;

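What creating a table actually does: Hive creates a directory for the table under the corresponding database's directory in HDFS, and adds the table's entries to the metadata (metastore) tables.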

Alter a table

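alter table: rename a table, change a column, add columns, drop columns, convert an internal table to an external table.
Rename a table - ALTER TABLE old_table_name RENAME TO new_table_name;
Change a column - ALTER TABLE table_name CHANGE COLUMN old_column_name new_column_name data_type;
Add columns - ALTER TABLE table_name ADD COLUMNS (column_name data_type);
Drop a column - Hive has no DROP COLUMN; use REPLACE COLUMNS with the list of columns to keep instead. For example, if t2 has columns (a int, b string, c string), drop c with: alter table t2 replace columns (a int, b string);
Internal to external - ALTER TABLE table_name SET TBLPROPERTIES ('EXTERNAL'='TRUE');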
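In Hive a column is removed with ALTER TABLE ... REPLACE COLUMNS, which changes only the schema stored in the metastore: the data files are not rewritten, and for a delimited text table the fields are matched to the new column list by position. The toy sketch below uses a hypothetical `read_with_schema` helper and a made-up data file for t2 (columns a, b, c) to show what that means, including the pitfall of removing a column that is not the last one.

```python
from typing import Dict, List, Optional


def read_with_schema(lines: List[str], columns: List[str]) -> List[Dict[str, Optional[str]]]:
    """Schema on read: tab-separated fields are matched to columns by position."""
    rows = []
    for line in lines:
        fields = line.split("\t")
        rows.append({col: fields[i] if i < len(fields) else None
                     for i, col in enumerate(columns)})
    return rows


# Data file of the example table t2 (a, b, c); the sketch never rewrites it.
t2_file = ["1\tx\tp", "2\ty\tq"]

# alter table t2 replace columns (a int, b string);  -> only the schema changes
print(read_with_schema(t2_file, ["a", "b"]))
# → [{'a': '1', 'b': 'x'}, {'a': '2', 'b': 'y'}]

# Replacing with (a, c) instead would NOT drop b: c is now read from the 2nd field.
print(read_with_schema(t2_file, ["a", "c"]))
# → [{'a': '1', 'c': 'x'}, {'a': '2', 'c': 'y'}]
```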

// Hive external tables and internal tables

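Creation: tables are internal (managed) by default; creating an external table requires the external keyword.
Dropping: dropping an internal table deletes both the metadata and the data;
dropping an external table deletes only the metadata, not the data (the table disappears from the database, but its files on HDFS are kept).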
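The create and drop rules above can be put side by side in a toy model. `ToyWarehouse` is hypothetical: a dictionary stands in for the metastore and another for HDFS directories. The path `/user/hive/warehouse/u1` assumes Hive's common default warehouse location, and `/data/log` is made up for the external table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyWarehouse:
    """Hypothetical model: a metastore (table -> metadata) plus HDFS (dir -> files)."""
    metastore: Dict[str, dict] = field(default_factory=dict)
    hdfs: Dict[str, List[str]] = field(default_factory=dict)

    def create_table(self, name: str, location: str, external: bool = False) -> None:
        # Creating a table: record its metadata and make sure its directory exists.
        self.metastore[name] = {"location": location, "external": external}
        self.hdfs.setdefault(location, [])

    def drop_table(self, name: str) -> None:
        meta = self.metastore.pop(name)  # the metadata is always removed
        if not meta["external"]:
            self.hdfs.pop(meta["location"], None)  # internal table: data removed too


w = ToyWarehouse()
w.create_table("u1", "/user/hive/warehouse/u1")
w.create_table("log", "/data/log", external=True)
w.hdfs["/user/hive/warehouse/u1"].append("u1.txt")
w.hdfs["/data/log"].append("log.txt")

w.drop_table("u1")
w.drop_table("log")
print(w.metastore, w.hdfs)  # → {} {'/data/log': ['log.txt']}
```

This is also why external tables suit shared source data: dropping the table definition cannot destroy files that other jobs still read.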

// Typical use cases:

Internal tables: mostly used for temporary and intermediate tables
External tables: mostly used for source data

// Queries

1. Left join

left join,left semi join,left outer join

// Find the manager (mgr) of JONES

select e1.ename
from emp e left join emp e1
on e.mgr = e1.empno
where e.ename = "JONES";
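To see how left join and left semi join differ on the self-join above, here is a toy Python sketch. The three emp rows are hypothetical sample data in the style of the classic emp table; the sketch assumes the usual SQL rules that a NULL mgr never matches, that a left join keeps every left row (with NULLs when nothing matches), and that a left semi join returns each matching left row once with only the left table's columns.

```python
from typing import List, Optional, Tuple

Emp = Tuple[int, str, Optional[int]]  # (empno, ename, mgr)

emp: List[Emp] = [
    (7839, "KING", None),
    (7566, "JONES", 7839),
    (7369, "SMITH", 7902),  # 7902 is not in this sample, so SMITH has no match
]


def left_join_mgr(rows: List[Emp]) -> List[Tuple[str, Optional[str]]]:
    """e left join e1 on e.mgr = e1.empno -> (e.ename, e1.ename) for every e row."""
    out: List[Tuple[str, Optional[str]]] = []
    for e in rows:
        # NULL never equals anything, so a NULL mgr finds no match.
        matches = [e1 for e1 in rows if e[2] is not None and e[2] == e1[0]]
        if matches:
            out.extend((e[1], e1[1]) for e1 in matches)
        else:
            out.append((e[1], None))  # no match: right-side columns are NULL
    return out


def left_semi_join_mgr(rows: List[Emp]) -> List[str]:
    """e left semi join e1 on e.mgr = e1.empno -> e.ename for e rows that have a
    match; each such row appears once and only e's columns can be selected."""
    empnos = {e1[0] for e1 in rows}
    return [e[1] for e in rows if e[2] is not None and e[2] in empnos]


print(left_join_mgr(emp))
# → [('KING', None), ('JONES', 'KING'), ('SMITH', None)]
print([mgr for name, mgr in left_join_mgr(emp) if name == "JONES"])  # → ['KING']
print(left_semi_join_mgr(emp))  # → ['JONES']
```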

2. Right join

right join,right outer join

// Hive does not support right semi join

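Both of the above take the right table as the base and look up matching rows in the left table; where no matching left-table row exists, the left table's columns are filled with NULL.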
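A minimal sketch of the right join rule just described, assuming (for illustration only) that the join key is the first field of each row; `right_join` and the sample rows are hypothetical.

```python
from typing import List


def right_join(left: List[tuple], right: List[tuple], left_width: int) -> List[tuple]:
    """left right join right on left[0] = right[0]: every right row is kept;
    if no left row matches, the left-side columns are filled with None (NULL)."""
    out: List[tuple] = []
    for r in right:
        matches = [l for l in left if l[0] is not None and l[0] == r[0]]
        if matches:
            out.extend(l + r for l in matches)
        else:
            out.append((None,) * left_width + r)
    return out


left = [(1, "a"), (2, "b")]
right = [(2, "x"), (3, "y")]
print(right_join(left, right, left_width=2))
# → [(2, 'b', 2, 'x'), (None, None, 3, 'y')]
```

Row 1 of the left table has no partner and simply disappears, which is the mirror image of a left join.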

3. Inner join

Separate multiple tables with "," (with the join condition in where), or use join / inner join.

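Hive's on clause supports only equality join conditions, not > < >= != etc. (this applies to versions before Hive 2.2.0; later versions also accept non-equality conditions in on).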
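A rough intuition for the equality restriction: in a common (reduce-side) join, the map phase sends each row to a reducer chosen from its join key, so rows with equal keys meet on the same reducer and can be matched there with a hash table, whereas a condition such as a.k < b.k gives no single key to route on. The toy sketch below models that shuffle with `key % n` on non-negative integer keys, which is a simplification of Hive's hash partitioning; all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def shuffle(rows: List[tuple], n: int) -> Dict[int, List[tuple]]:
    """Map phase: send each row to reducer (join key % n); the key is field 0."""
    parts: Dict[int, List[tuple]] = defaultdict(list)
    for row in rows:
        parts[row[0] % n].append(row)
    return parts


def reduce_side_join(a: List[tuple], b: List[tuple], n: int) -> List[tuple]:
    """a join b on a.k = b.k, computed reducer by reducer."""
    pa, pb = shuffle(a, n), shuffle(b, n)
    out: List[tuple] = []
    for reducer in range(n):
        # Build a hash table on the key from a's rows that reached this reducer ...
        table: Dict[int, List[tuple]] = defaultdict(list)
        for ra in pa.get(reducer, []):
            table[ra[0]].append(ra)
        # ... and probe it with b's rows; equal keys are guaranteed to be here.
        for rb in pb.get(reducer, []):
            out.extend(ra + rb for ra in table.get(rb[0], []))
    return out


a = [(1, "a"), (2, "b"), (5, "c")]
b = [(2, "x"), (5, "y"), (7, "z")]
print(reduce_side_join(a, b, n=3))  # → [(2, 'b', 2, 'x'), (5, 'c', 5, 'y')]
```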

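4. // Enable map-side join

set hive.auto.convert.join=true;

// If the small table's file is larger than this value, the join is not automatically converted to a map-side join (25000000 bytes, about 23.8 MiB):
set hive.mapjoin.smalltable.filesize=25000000;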
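The toy sketch below illustrates the idea behind the two settings: when the small table is under the threshold, it is held in memory as a hash table by every mapper, and each mapper joins its own split of the big table with no shuffle or reduce phase. `choose_join` follows the size rule stated above and is a simplification (Hive's real planner weighs more factors); all names are hypothetical.

```python
from typing import Dict, List

SMALLTABLE_FILESIZE = 25_000_000  # mirrors hive.mapjoin.smalltable.filesize above


def choose_join(small_table_bytes: int, auto_convert: bool = True,
                threshold: int = SMALLTABLE_FILESIZE) -> str:
    """Convert only if auto conversion is on and the small table is not larger
    than the threshold."""
    if auto_convert and small_table_bytes <= threshold:
        return "map join"
    return "common join"


def map_join(big_splits: List[List[tuple]], small_rows: List[tuple]) -> List[tuple]:
    """Each mapper holds the whole small table as an in-memory hash table and
    joins its own split of the big table; no shuffle or reduce phase is needed."""
    table: Dict[object, List[tuple]] = {}
    for s in small_rows:
        table.setdefault(s[0], []).append(s)
    out: List[tuple] = []
    for split in big_splits:  # one split = one mapper's input
        for row in split:
            out.extend(row + s for s in table.get(row[0], []))
    return out


print(choose_join(10_000_000), "/", choose_join(30_000_000))  # → map join / common join
print(map_join([[(1, "a"), (2, "b")], [(2, "c")]], [(2, "X")]))
# → [(2, 'b', 2, 'X'), (2, 'c', 2, 'X')]
```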

group by statement

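Usually used together with having and with aggregate functions. having filters the result set after grouping is done.
In a statement with group by, every field after select must either appear after group by or be wrapped in an aggregate function.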

select a,sum(b) 
from table1 s 
group by a    
order by s.a desc;
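A minimal Python model of the query above, with an optional having filter added for illustration (the condition sum(b) > having_min and the sample rows of table1 are hypothetical). It shows the order of operations: rows are grouped and aggregated first, having then filters whole groups, and order by sorts what remains. NULL handling is ignored.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def group_sum(rows: List[Tuple[str, int]],
              having_min: Optional[int] = None) -> List[Tuple[str, int]]:
    """select a, sum(b) from table1 group by a
       [having sum(b) > having_min] order by a desc"""
    sums: Dict[str, int] = defaultdict(int)
    for a, b in rows:  # group by a, aggregate sum(b)
        sums[a] += b
    result = [(a, s) for a, s in sums.items()
              if having_min is None or s > having_min]  # having: filter groups
    return sorted(result, key=lambda t: t[0], reverse=True)  # order by a desc


table1 = [("x", 5), ("y", 3), ("x", 7), ("z", 20)]
print(group_sum(table1))                 # → [('z', 20), ('y', 3), ('x', 12)]
print(group_sum(table1, having_min=10))  # → [('z', 20), ('x', 12)]
```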

Sorting

sort by: local sort; only the data within each reducer is sorted.

order by: global sort; all of the job's output is sorted, which is why all the data goes through a single reducer.

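   When there is only one reducer, the two give the same result.
   Both are usually used with desc or asc; the default is asc.
   // Set the number of reducers: set mapreduce.job.reduces=3;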
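The difference between sort by and order by can be seen in a toy model. The sketch deals rows to reducers round-robin, which is an assumption made only for illustration and not Hive's actual assignment rule; all names are hypothetical. Each inner list is one reducer's output.

```python
from typing import List


def deal_to_reducers(values: List[int], n: int) -> List[List[int]]:
    """Assumption for the sketch: rows are dealt to n reducers round-robin."""
    return [values[i::n] for i in range(n)]


def sort_by(values: List[int], n: int) -> List[List[int]]:
    """sort by: each reducer sorts only its own rows (one output file each)."""
    return [sorted(part) for part in deal_to_reducers(values, n)]


def order_by(values: List[int]) -> List[List[int]]:
    """order by: everything goes to a single reducer, so the output is fully sorted."""
    return [sorted(values)]


values = [5, 3, 9, 1, 7, 2]
print(sort_by(values, 3))  # → [[1, 5], [3, 7], [2, 9]]
print(order_by(values))    # → [[1, 2, 3, 5, 7, 9]]
print(sort_by(values, 1) == order_by(values))  # → True
```

Concatenating the sort by outputs gives 1, 5, 3, 7, 2, 9, which is not globally sorted; with a single reducer the two coincide.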

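distribute by: distributes rows across multiple reducers; rows with the same value of the distribute by column go to the same reducer.

It is usually used together with sort by, and it must come before sort by.

	select a,b
	from table1
	distribute by a
	sort by b asc;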
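A toy model of distribute by a sort by b across n reducers. Hive picks the reducer from a hash of the distribute by value; the sketch uses `key % n` on non-negative integers as a simplified stand-in, and the function name and rows are hypothetical.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]


def distribute_sort(rows: List[Row], n: int,
                    dist_key: Callable[[Row], int],
                    sort_key: Callable[[Row], str]) -> List[List[Row]]:
    """distribute by dist_key sort by sort_key (asc) across n reducers."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[dist_key(row) % n].append(row)  # same key -> same reducer
    return [sorted(p, key=sort_key) for p in parts]  # sort within each reducer


rows = [(1, "c"), (2, "a"), (1, "a"), (4, "b"), (2, "d")]
print(distribute_sort(rows, 3, dist_key=lambda r: r[0], sort_key=lambda r: r[1]))
# → [[], [(1, 'a'), (4, 'b'), (1, 'c')], [(2, 'a'), (2, 'd')]]
```

Note that a = 1 and a = 4 share a reducer: distribute by guarantees that equal values end up together, not that each value gets its own reducer, and a reducer may receive no rows at all.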

cluster by:

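Combines distribute by and sort by on the same column: cluster by a is equivalent to distribute by a sort by a. The sort order is always ascending; asc/desc cannot be specified, and cluster by cannot be combined with distribute by or sort by.

	select a,b
	from table1
	cluster by a;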
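A toy sketch of the equivalence: cluster_by simply calls the distribute-then-sort routine with the same key for both steps. As before, `v % n` on non-negative integers is a simplified stand-in for Hive's hash partitioning, and the names are hypothetical.

```python
from typing import List


def distribute_sort(values: List[int], n: int) -> List[List[int]]:
    """distribute by v sort by v: route by v % n, then sort each reducer's rows."""
    parts: List[List[int]] = [[] for _ in range(n)]
    for v in values:
        parts[v % n].append(v)
    return [sorted(p) for p in parts]


def cluster_by(values: List[int], n: int) -> List[List[int]]:
    """cluster by v is shorthand for distribute by v sort by v (ascending only)."""
    return distribute_sort(values, n)


print(cluster_by([4, 1, 7, 1, 4, 2], 2))  # → [[2, 4, 4], [1, 1, 7]]
```

Each reducer's output is ascending with equal values adjacent, but the concatenated result (2, 4, 4, 1, 1, 7) is not globally sorted.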

Combining results

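union and union all: both combine multiple result sets into one.

union: removes duplicate rows (the order of the result is not guaranteed, so do not rely on it being sorted).

union all: keeps duplicates and does not sort; it simply concatenates the results.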
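A minimal sketch of the two behaviors on in-memory rows; the function names and sample result sets are hypothetical. The distinct version keeps first-seen order only because that is convenient in Python, not because Hive promises any order.

```python
from typing import List


def union_all(*results: List[tuple]) -> List[tuple]:
    """union all: concatenate the inputs, duplicates included."""
    out: List[tuple] = []
    for rows in results:
        out.extend(rows)
    return out


def union_distinct(*results: List[tuple]) -> List[tuple]:
    """union: concatenate, then keep one copy of each distinct row."""
    seen = set()
    out: List[tuple] = []
    for row in union_all(*results):
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out


q1 = [(1, "x"), (2, "y")]
q2 = [(2, "y"), (3, "z"), (3, "z")]
print(len(union_all(q1, q2)))  # → 5
print(union_distinct(q1, q2))  # → [(1, 'x'), (2, 'y'), (3, 'z')]
```

Duplicates are removed across and within inputs (the repeated (3, 'z') in q2 collapses too). Because union has to find duplicates, it costs more than union all; prefer union all when duplicates are impossible or acceptable.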
