Hive DDL & DML

## DDL: Hive Data Definition Language

DDL covers statements such as CREATE, DROP, and ALTER.

Hive's data abstractions, from coarse to fine: database → table → partition → bucket.
Overview from the official documentation:
HiveQL DDL statements are documented here, including:

CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
TRUNCATE TABLE
ALTER DATABASE/SCHEMA, TABLE, VIEW
MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, VIEWS, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
DESCRIBE DATABASE/SCHEMA, table_name, view_name
PARTITION statements are usually options of TABLE statements, except for SHOW PARTITIONS.
/user/hive/warehouse is the default path on HDFS where Hive stores its data.

Create/Drop/Alter/Use Database
1. Create Database

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
  [COMMENT database_comment]
  [LOCATION hdfs_path]
  [WITH DBPROPERTIES (property_name=property_value, ...)];
Examples:
1. CREATE DATABASE hive;
2. CREATE DATABASE IF NOT EXISTS hive;
3. CREATE DATABASE IF NOT EXISTS hive2 LOCATION '/test/location';
4. CREATE DATABASE IF NOT EXISTS hive3 WITH DBPROPERTIES('creator'='ss');
5. Inspect a database's metadata:
   desc database hive3;
   desc database extended hive3;

2. Drop Database

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
RESTRICT (the default) fails if the database is not empty; CASCADE also drops the tables inside it.
Example: drop database hive;
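A short sketch of the CASCADE behavior (the database name hive2 is just an illustration, reusing the one created above):

```sql
-- Fails if hive2 still contains tables, because RESTRICT is the default:
DROP DATABASE IF EXISTS hive2;

-- Drops hive2 together with any tables (and their managed data) inside it:
DROP DATABASE IF EXISTS hive2 CASCADE;
```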

3. Alter Database

ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...);   -- (Note: SCHEMA added in Hive 0.14.0)
 
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
  
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)

Note: Hive does not support renaming a database; a statement such as ALTER DATABASE emp RENAME TO emp2; will fail. Only DBPROPERTIES, OWNER, and LOCATION can be altered.

4. Use Database

USE database_name;
USE DEFAULT;

5. Create Table

CREATE TABLE emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Inspect the table: desc formatted emp;

Load data:
LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp;

## DML: Hive Data Manipulation Language

1. Loading files into tables

Hive performs no transformation when loading data into a table. A load is currently a pure copy/move operation that moves the data file into the location corresponding to the Hive table.

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
 
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)

LOCAL: the file is on the local filesystem; without LOCAL, the path refers to HDFS.
OVERWRITE: replaces the existing data; without OVERWRITE, the data is appended.

LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp;

LOAD DATA INPATH 'hdfs://hadoop000:8020/data/emp.txt' INTO TABLE emp;

2. Inserting data into Hive tables from queries

Query results can be inserted into a table with the INSERT clause.

Write query results into new tables (CTAS):
create table emp1 as select * from emp;
create table emp2 as select empno, ename, job, deptno from emp;
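Besides CTAS, query results can also be written into an existing table. A sketch, assuming emp1 already exists with a schema matching emp:

```sql
-- Replace the contents of emp1 with the query result:
INSERT OVERWRITE TABLE emp1 SELECT * FROM emp;

-- Append rows instead of overwriting (INSERT INTO, Hive 0.14 and later):
INSERT INTO TABLE emp1 SELECT * FROM emp WHERE deptno = 10;
```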

3. Writing data into the filesystem from queries

A slight variation of the syntax above inserts query results into filesystem directories:

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select empno,ename,sal,deptno from emp;

4. Common SELECT statements (statistics)

select * from emp;
select ename, empno from emp limit 3;
select * from emp where deptno = 10;
select * from emp where ename = 'ss';
select * from emp where empno >= 780;
select * from emp where empno <= 780;
select * from emp where sal between 800 and 1300;
select * from emp where ename in ('ss','aa');
select * from emp where ename not in ('ss','aa');
select * from emp where comm is null;
select * from emp where comm is not null;

5. Aggregates: max/min/sum/avg

select count(1) from emp where deptno = 10;
Check against: select * from emp where deptno = 10;
select max(sal), min(sal), sum(sal), avg(sal) from emp;

6. Grouping: group by

Average salary per department. Every column in the SELECT list that is not wrapped in an aggregate function must appear in the GROUP BY clause:
select deptno, avg(sal) from emp group by deptno;

Average salary per department and job:
select deptno, job, avg(sal) from emp group by deptno, job;

Departments whose average salary exceeds 2000. This version fails, because WHERE cannot filter on an aggregate:
select deptno, avg(sal) avg_sal from emp group by deptno where avg_sal > 2000;

To filter on aggregates, use HAVING:
select deptno, avg(sal) avg_sal from emp group by deptno having avg_sal > 2000;

7. JOIN: multiple tables
Suppose there are two tables: emp and dept.

Create the dept table:
CREATE TABLE dept(
deptno int,
dname string,
loc string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

Load data:
LOAD DATA LOCAL INPATH '/home/hadoop/data/dept.txt' OVERWRITE INTO TABLE dept;

Join query (the EXPLAIN EXTENDED prefix shows the execution plan instead of running the query):
explain EXTENDED
select  
e.empno,e.ename,e.sal,e.deptno,d.dname 
from emp e join dept d 
on e.deptno=d.deptno;
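The join itself, without the plan, plus an outer-join variant; table and column names follow the examples above:

```sql
-- Inner join: only rows with a matching deptno in both tables:
select e.empno, e.ename, e.sal, d.dname
from emp e join dept d on e.deptno = d.deptno;

-- LEFT OUTER JOIN keeps every emp row even when no dept matches
-- (unmatched rows get NULL for d.dname):
select e.empno, e.ename, d.dname
from emp e left outer join dept d on e.deptno = d.deptno;
```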

8. Managed (internal) & external tables

MANAGED_TABLE (managed/internal table):
Dropping the table deletes both the data on HDFS and the metadata.

EXTERNAL_TABLE (external table):
Dropping the table deletes only the metadata; the data on HDFS is kept.

Create an external table:

CREATE EXTERNAL TABLE emp_external(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
location '/external/emp/';

Load data into the external table emp_external:
LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp_external;

Inspect the external table:
desc formatted emp_external;   (Table Type shows EXTERNAL_TABLE)

Inspect the metadata in MySQL:
1. use hadoop_hive;
2. select * from TBLS \G

9. Partitioned tables

PARTITIONED BY

create external table info(
ip string,
country string,
province string,
city string,
url string,
page string
) partitioned by (day string)
row format delimited fields terminated by '\t'
location '/pss/info/';
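A sketch of loading into and querying one partition of this table; the local file path and the date value are illustrative:

```sql
-- Load one day's file into its partition; note the partition column `day`
-- is supplied in the PARTITION clause, not stored in the data file:
LOAD DATA LOCAL INPATH '/home/hadoop/data/info_20230101.txt'
OVERWRITE INTO TABLE info PARTITION (day='20230101');

-- List existing partitions, then query with partition pruning:
SHOW PARTITIONS info;
select ip, url from info where day = '20230101';
```

Each partition maps to its own subdirectory (e.g. /pss/info/day=20230101/), so filtering on the partition column lets Hive scan only that directory.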