## DDL:Hive Data Definition Language
create、delete、alter…
Hive数据抽象/结构
官网Overview:
HiveQL DDL statements are documented here, including:
CREATE DATABASE/SCHEMA, TABLE, VIEW, FUNCTION, INDEX
DROP DATABASE/SCHEMA, TABLE, VIEW, INDEX
TRUNCATE TABLE
ALTER DATABASE/SCHEMA, TABLE, VIEW
MSCK REPAIR TABLE (or ALTER TABLE RECOVER PARTITIONS)
SHOW DATABASES/SCHEMAS, TABLES, TBLPROPERTIES, VIEWS, PARTITIONS, FUNCTIONS, INDEX[ES], COLUMNS, CREATE TABLE
DESCRIBE DATABASE/SCHEMA, table_name, view_name
PARTITION statements are usually options of TABLE statements, except for SHOW PARTITIONS.
/user/hive/warehouse是Hive默认的存储在HDFS上的路径
Create/Drop/Alter/Use Database
1.创建数据库Create Database
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment]
[LOCATION hdfs_path]
[WITH DBPROPERTIES (property_name=property_value, ...)];
example:1.CREATE DATABASE hive;
2.CREATE DATABASE IF NOT EXISTS hive;
3.CREATE DATABASE IF NOT EXISTS hive2 LOCATION '/test/location';
4.CREATE DATABASE IF NOT EXISTS hive3 WITH DBPROPERTIES('creator'='ss');**
5.查看创建信息 desc database hive3;
desc database extended hive3;
2.删除Drop Database
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
example: drop database hive;
3.修改Alter Database
ALTER (DATABASE|SCHEMA) database_name SET DBPROPERTIES (property_name=property_value, ...); -- (Note: SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
ALTER (DATABASE|SCHEMA) database_name SET LOCATION hdfs_path; -- (Note: Hive 2.2.1, 2.4.0 and later)
ALTER DATABASE emp rename to emp2;
4.Use Database
USE database_name;
USE DEFAULT;
5.创建表Create Table
CREATE TABLE emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
查看表:desc formatted emp;
加载数据:
LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp;
## DML:Hive Data Manipulation Language
1.Loading files into tables
将文件加载到表中
在将数据加载到表中时,Hive不会进行任何转换。加载操作当前是纯复制/移动操作,它将数据文件移动到与Hive表对应的位置
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
LOCAL:本地系统,如果没有local那么就是指的HDFS的路径
OVERWRITE:是否数据覆盖,如果没有那么就是数据追加
LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp;
LOAD DATA INPATH 'hdfs://hadoop000:8020/data/emp.txt' INTO TABLE emp;
2.Inserting data into Hive Tables from queries
从查询中将数据插入Hive表
可以使用insert子句将查询结果插入表中。
通过查询语句写结果到表中去:
create table emp1 as select *from emp;
create table emp2 as select empo,ename,job,deptno from emp;
3.Writing data into the filesystem from queries
从查询中将数据写入文件系统
可以使用上面语法的略微变化将查询结果插入到文件系统目录中:
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/hive/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
select empno,ename,sal,deptno from emp;
4.select常用语句:(统计)
select * from map;
select ename,empno from emp limit 3;
select * from emp where deptno=10;
select * from emp where ename='ss';
select * from emp where enpno >=780;
select * from emp where enpno <=780;
select * from where sal between 800 and 1300;
select * from emp where ename in ('ss','aa');
select * from emp where ename not in ('ss','aa');
select * from emp where comm is null;
select * from emp where comm not is null;
5.聚合max/min/sum/avg
select count(1) from emp where depto=10;
查看:select * from emp where depto=10;
select max(sql),min(sql),sum(sql),avg(sql) from emp;
6.分组函数: group by
求每个部门的平均工资
出现在select中的字段,如果没有出现在聚合函数里,那么一定要实现在group by里
select deptno, avg(sal) from emp group by deptno;
求每个部门、工作岗位的平均工资
select deptno,job, avg(sal) from emp group by deptno,job;
求每个部门的平均工资大于2000的部门
报错:select deptno, avg(sal) avg_sal from emp group by deptno where avg_sal>2000;
对于分组函数过滤要使用having
select deptno, avg(sal) avg_sal from emp group by deptno having avg_sal>2000;
7.join : 多表
假设现在有两张表:emp , dept
创建dept表:
CREATE TABLE dept(
deptno int,
dname string,
loc string
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
导入数据:
LOAD DATA LOCAL INPATH '/home/hadoop/data/dept.txt' OVERWRITE INTO TABLE dept;
explain EXTENDED
select
e.empno,e.ename,e.sal,e.deptno,d.dname
from emp e join dept d
on e.deptno=d.deptno;
8.内部表&外部表
MANAGED_TABLE:内部表
删除表:HDFS上的数据被删除&Meta信息删除
EXTERNAL_TABLE:外部表
删除表:HDFS上的数据不删除&Meta被删除
创建外部表:
CREATE EXTERNAL TABLE emp——external(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
location '/external/emp/;
加载数据到外部表emp_external:
LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE emp_external;
查看外部表:
desc formatted emp_external;(显示EXTERNAL_TABLE)
msql上查看元数据:
1.use hadoop_hive;
2.select * from TBLS \G;
9.分区表
partitoned by
create external table info(
ip string,
country string,
province string,
city string,
url string,
page string
) partitioned by (day string)
Row FORMAT DELIMITED FIELDS TERMINATED By '\t'
location '/pss/info/';