Hive SQL DML Syntax Notes

乐土挚友

Last modified: 2022-08-12 00:57:12

Tags: hive, hadoop, big data

First published: 2022-08-11 17:43:21

Article link: https://blog.csdn.net/qq_62380583/article/details/126269891

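Load from the local file system:
(load data local inpath '/root/hivedata/students.txt' into table itheima.student;)

Load a file that has already been uploaded to the HDFS root directory (the data is on HDFS, so LOCAL is omitted):
(load data inpath '/students.txt' into table itheima.student;)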

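Load: loading data

LOAD moves the data file to the location that corresponds to the Hive table. It is a pure copy or move operation.

Hive does not transform or otherwise process the data content during the load.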

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename;

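LOCAL refers to the local Linux file system of the machine where the HiveServer2 service runs.

Without LOCAL, the file is loaded from HDFS.

filepath is the path of the data to be moved. It can point to a file (in which case Hive moves that file into the table) or to a directory (in which case Hive moves all files in that directory into the table).

(load data local inpath '/root/hivedata/students.txt' into table itheima.student;)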
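The three variants of the LOAD syntax can be sketched side by side. This is a minimal sketch, assuming the itheima database already exists and that the file is comma-delimited; the table definition here is an assumption modeled on the students table used later in these notes.

```sql
-- Assumed table layout; adjust the columns and delimiter to the real file.
CREATE TABLE IF NOT EXISTS itheima.student (
  num INT, name STRING, sex STRING, age INT, dept STRING
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- With LOCAL: the file is copied from the file system of the HiveServer2 machine.
LOAD DATA LOCAL INPATH '/root/hivedata/students.txt' INTO TABLE itheima.student;

-- Without LOCAL: the path is an HDFS path, and the file is moved into the table directory.
LOAD DATA INPATH '/students.txt' INTO TABLE itheima.student;

-- With OVERWRITE: the existing contents of the table are replaced instead of appended to.
LOAD DATA LOCAL INPATH '/root/hivedata/students.txt' OVERWRITE INTO TABLE itheima.student;
```

After a load without LOCAL, the source file no longer exists at its original HDFS path, because it was moved rather than copied.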

Insert: inserting data with select

create table t_2(id int,name string);

insert into table t_2 values(1,'zhangsan'); (slow)

create table students(num int,name string,sex string,age int,dept string) row format delimited fields terminated by ',';

load data local inpath '/root/hivedata/students.txt' into table students;

insert into table student_from_insert select num,name from students;

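insert + select

Inserts the result returned by the query into the specified table.

The number of columns in the query result must match the number of columns in the target table.

If a queried column's data type differs from the type of the corresponding target column, Hive attempts a conversion. The conversion is not guaranteed to succeed, and values that fail to convert become NULL.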

INSERT INTO TABLE tablename select_statement1 FROM from_statement;

(insert into table table_1 select col_1, col_2 from table_2;)
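A minimal sketch of the insert + select flow, assuming the students table from these notes already holds data. The definition of the target table and its column names sno and sname are assumptions for illustration. The last query uses explicit casts to show what a failed conversion looks like.

```sql
-- Hypothetical target table: two columns, matching the two columns selected below.
CREATE TABLE IF NOT EXISTS student_from_insert (sno INT, sname STRING);

-- The query returns 2 columns and the target table has 2 columns.
INSERT INTO TABLE student_from_insert
SELECT num, name FROM students;

-- A conversion that cannot succeed yields NULL rather than an error:
-- the first cast produces a number, the second produces NULL.
SELECT cast('12' AS INT), cast('abc' AS INT);
```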

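Select: querying data

Keyword execution order

from > where > group by (including aggregation) > having > select > order by

Aggregate functions (sum, min, max, avg, count) are evaluated before the having clause.

The where clause is evaluated before the aggregate functions (sum, min, max, avg, count).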
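The two rules above (where runs before aggregation, aggregation runs before having) can be traced on a single query. This sketch reuses the t_usa_covid19 table and the columns that appear later in these notes; the numbers in the comments mark evaluation order, not the order in which the clauses are written.

```sql
SELECT state, sum(deaths) AS total_deaths
FROM t_usa_covid19                 -- 1. read the rows
WHERE count_date = '2021-01-28'    -- 2. filter individual rows
GROUP BY state                     -- 3. group the remaining rows and aggregate
HAVING sum(deaths) > 10000;        -- 4. filter the groups by the aggregated value
```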

select syntax tree and parameters in detail

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[LIMIT [offset,] rows];

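The select_expr parameter

Examples: select * from table_name;

select col_name from table_name;

select 1 from table_name; (pressing Ctrl+Q on the table name shows its details)

select current_database(); (calls a function)

The ALL | DISTINCT parameter

all returns every row (the default)

distinct removes duplicate rows

With several columns, distinct deduplicates on the columns as a whole (the two columns are treated as one unit)

Example:

Before:        After distinct:
a bbb          a bbb
a bbb          a ccc
a ccc          c mmm
c mmm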
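The same behavior written as queries, as a small sketch against the t_usa_covid19 table used elsewhere in these notes (its county and state columns are taken from the later examples).

```sql
-- ALL is the default, so these two queries return the same rows, duplicates included.
SELECT state FROM t_usa_covid19;
SELECT ALL state FROM t_usa_covid19;

-- One row per distinct state value.
SELECT DISTINCT state FROM t_usa_covid19;

-- The (county, state) pair is deduplicated as a unit: two rows are
-- duplicates only when both columns are equal.
SELECT DISTINCT county, state FROM t_usa_covid19;
```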

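The WHERE parameter

WHERE is followed by a boolean expression (true/false) used to filter the query. When the expression is true for a row, the select expressions are returned for that row; otherwise nothing is returned for it.

Examples: select * from t_usa_covid19 where state = 'California'; (returns the rows whose state is California; the condition is evaluated row by row)

select * from t_usa_covid19 where length(state) > 10; (returns the rows whose state name is longer than 10 characters)

Operators that can be used in where

Aggregate functions cannot be used in where

Comparison and logical operators

= < > <= >= != <> (not equal)

and or

select * from t_usa_covid19 where sal>100 or comm>500; (sal greater than 100 or comm greater than 500)

Special conditions

Null check

where comm is null;

between and

where sal between 1500 and 3000;

where sal in(1,2,3);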
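The operators above, applied to the t_usa_covid19 table so that every column used actually appears in these notes. This is a sketch that assumes deaths and cases are numeric columns; the literal values are arbitrary.

```sql
-- Comparison combined with a logical operator.
SELECT * FROM t_usa_covid19 WHERE state = 'California' AND deaths > 1000;

-- between ... and ... includes both endpoints.
SELECT * FROM t_usa_covid19 WHERE cases BETWEEN 1500 AND 3000;

-- in: true when the value equals any item in the list.
SELECT * FROM t_usa_covid19 WHERE state IN ('California', 'Texas');

-- Null check: use is null / is not null, not = null.
SELECT * FROM t_usa_covid19 WHERE county IS NOT NULL;
```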

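The GROUP BY parameter

GROUP BY is used together with aggregate functions to group the result set by one or more columns.
Without a group by clause, all rows of the table are treated as a single group.

Examples:

select count(county) from t_usa_covid19 where count_date = "2021-01-28" group by state;

Counts the county values for the given day, grouped by state.

select state,count(county) from t_usa_covid19 where count_date = "2021-01-28" group by state; (also shows the state name)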

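The LIMIT [offset,] rows parameter

Limits the number of rows returned by the SELECT statement. It takes one or two numeric arguments, both of which must be non-negative integer constants.

limit 5 returns the first 5 rows

limit 2,3 starts at offset 2 (the third row) and returns three rows

Syntax restriction

When GROUP BY is present, every column in select_expr must be either a grouping column or a column wrapped in an aggregate function.
Reason: this avoids the ambiguity of one field holding several values within a group.

Rows are grouped first and the function is applied afterwards (for example, max is the maximum value within each group).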
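The GROUP BY restriction can be seen by comparing a valid query with an invalid one. This sketch uses the t_usa_covid19 columns from these notes; the invalid statement is left commented out because Hive rejects it at compile time.

```sql
-- Valid: state is the grouping column, and deaths appears only inside aggregates.
-- max and min are computed per group, after the rows have been grouped.
SELECT state, max(deaths) AS max_deaths, min(deaths) AS min_deaths
FROM t_usa_covid19
WHERE count_date = '2021-01-28'
GROUP BY state;

-- Invalid: county is neither a grouping column nor aggregated, so one group
-- (one state) could carry many county values.
-- SELECT state, county, max(deaths) FROM t_usa_covid19 GROUP BY state;
```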

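The ORDER BY parameter: sorting

Sorts the result set by the specified columns, row by row. The default is ascending order, asc.

Descending order is desc

select * from t_usa_covid19 order by cases; (sorts by cases)

select * from t_usa_covid19 order by cases asc; (ascending)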
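Sorting is most often combined with LIMIT to get a top-N result. A short sketch on the same table follows. The two-argument LIMIT form is, as far as I know, only available in Hive 2.0 and later, so treat the second query as version-dependent.

```sql
-- The 5 rows with the highest cases value.
SELECT * FROM t_usa_covid19 ORDER BY cases DESC LIMIT 5;

-- Skip the first 2 rows of the sorted result, then return the next 3.
SELECT * FROM t_usa_covid19 ORDER BY cases DESC LIMIT 2, 3;
```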

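Aggregation (max, min, ...)

However many rows the source data contains, an aggregation over it returns a single row.

Common aggregate operations

AVG(column) returns the average value of a column
COUNT(column) returns the number of rows in a column (NULL values are not counted)
COUNT(*) returns the number of selected rows
MAX(column) returns the highest value of a column
MIN(column) returns the lowest value of a column
SUM(column) returns the sum of a column

Examples:

select count(county) from t_usa_covid19; (returns the number of non-NULL rows in the county column)

select count(county) as county_cnts from t_usa_covid19; (renames the returned column to county_cnts)

select count(distinct county) as county_cnts from t_usa_covid19; (also removes duplicates before counting)

select count(county) from t_usa_covid19 where state = "California"; (counts only the rows that satisfy the condition)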
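The three forms of count differ only in what they skip, which is easiest to see when they run in one query. A sketch on the same table; the column aliases are illustrative names.

```sql
SELECT
  count(*)               AS total_rows,         -- every selected row
  count(county)          AS non_null_counties,  -- rows where county is not NULL
  count(DISTINCT county) AS distinct_counties,  -- distinct non-NULL county values
  sum(deaths)            AS total_deaths,
  max(cases)             AS max_cases,
  avg(cases)             AS avg_cases
FROM t_usa_covid19
WHERE count_date = '2021-01-28';
```

Because there is no GROUP BY, the whole filtered table is one group and the query returns exactly one row.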

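Having: filtering after grouping

where cannot contain aggregate functions

having can

For example, where sum(city) > 1 is not allowed

where filters rows before grouping; having filters groups after grouping

Order: filter before grouping (where), then group and aggregate together, then filter after grouping (having)

The HAVING clause filters the groups produced by grouping, and aggregate functions can be used in it, because by then where and group by have finished and the result set is fixed.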

select state,sum(deaths) from t_usa_covid19 where count_date = "2021-01-28" group by state having sum(deaths) > 10000;

Give the aggregate an alias: sum(deaths) as mi

Then filter on the alias: having mi > 10000 (one fewer computation, so more efficient)
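Written out in full, the alias version of the query above looks like the sketch below. Hive accepts a select alias in HAVING; this is not portable to every SQL dialect.

```sql
SELECT state, sum(deaths) AS mi
FROM t_usa_covid19
WHERE count_date = '2021-01-28'
GROUP BY state
HAVING mi > 10000;   -- refers to the alias instead of repeating sum(deaths)
```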

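Join: relational queries

join syntax rules

A join combines data from two or more tables according to the relationship between their columns.

The two most important joins are inner join and left join.

join syntax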

join_table:
table_reference [INNER] JOIN table_factor [join_condition]
| table_reference {LEFT} [OUTER] JOIN table_reference join_condition

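table_reference: a table name used in the join query
table_factor: the same as table_reference, a table name used in the join query

join_condition: the condition that relates the joined tables; when the join needs more than one condition, combine them with the AND keyword

inner join

inner can be omitted: inner join == join. Only rows for which both joined tables contain data matching the join condition are kept.

select e.id,e.name,e_a.city,e_a.street from employee e inner join employee_address e_a on e.id = e_a.id; (employee is aliased as e and employee_address as e_a; returns the rows whose id matches in both tables)

Implicit join notation

select e.id,e.name,e_a.city,e_a.street from employee e, employee_address e_a where e.id = e_a.id;

left join

The join is driven by the left table, with the right table matched against it. All rows of the left table are returned; right-table columns show the matched values where a match exists and null where it does not.

select e.id,e.name,e_conn.phno,e_conn.email from employee e left join employee_connection e_conn on e.id = e_conn.id;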
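Because a left join fills unmatched right-table columns with null, it can also find the rows that have no match at all. The sketch below reuses the employee, employee_address and employee_connection tables from the queries above; the extra condition in the second query is only an illustration of combining join conditions with AND.

```sql
-- Employees that have no row in employee_connection:
-- the right-side id is null exactly when nothing matched.
SELECT e.id, e.name
FROM employee e
LEFT JOIN employee_connection e_conn ON e.id = e_conn.id
WHERE e_conn.id IS NULL;

-- Inner join with two conditions combined by AND.
SELECT e.id, e.name, e_a.city
FROM employee e
JOIN employee_address e_a
  ON e.id = e_a.id AND e_a.city IS NOT NULL;
```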

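Functions

show functions lists all functions currently available

describe function extended function_name shows how a function is used

Function categories

Built-in functions and user-defined functions (UDF)

Built-in functions can be divided into numeric functions, date functions, string functions, collection functions, conditional functions, and so on

User-defined functions fall into 3 classes according to the number of input and output rows: UDF, UDAF, UDTF

UDF: ordinary function, one row in, one row out

UDAF: aggregate function, many rows in, one row out

UDTF: table-generating function, one row in, many rows out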
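The same three classes also describe the built-in functions, so each can be illustrated without writing any custom code. A small sketch: upper, count and explode are built-in Hive functions, and the students table is the one created earlier in these notes.

```sql
-- One in, one out (UDF-style): one value per input row.
SELECT upper('hive');

-- Many in, one out (UDAF-style): all rows collapse into one value.
SELECT count(*) FROM students;

-- One in, many out (UDTF-style): one array becomes three rows.
SELECT explode(array('a', 'b', 'c'));

-- Inspect any of them.
DESCRIBE FUNCTION EXTENDED explode;
```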

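Built-in functions

Official reference: LanguageManual UDF - Apache Hive - Apache Software Foundation

String functions

Functions for processing strings

select length("itcast");   -- length of the string
select reverse("itcast");   -- reverses the string
select concat("angela","baby");   -- concatenates strings
select concat_ws('.', 'www', array('itcast', 'cn'));   -- concatenates with '.' as the separator; also accepts arrays
select substr("angelababy",-2);   -- takes the substring from index -2 (second from the end) to the end
select substr("angelababy",2,2);   -- takes two characters starting at index 2 (ng)
select split('apache hive', ' ');   -- splits on a space and returns an array
select split('apache hive', ' ')[0];   -- picks an element by index; string indexes start at 1, array indexes start at 0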
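These functions are usually nested rather than used one at a time. A few sketches built only from the functions listed above; the input strings are made-up sample values.

```sql
-- Part after the @: split returns an array, and array indexes start at 0.
SELECT split('tom@itcast.cn', '@')[1];                 -- itcast.cn

-- Re-join the pieces of a split with a different separator.
SELECT concat_ws('-', split('2021 01 28', ' '));       -- 2021-01-28

-- Last 4 characters, two ways: a negative start index, or reverse + substr + reverse.
SELECT substr('angelababy', -4);                       -- baby
SELECT reverse(substr(reverse('angelababy'), 1, 4));   -- baby
```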

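Date functions

Timestamp: the number of seconds elapsed since 1970-01-01, increasing by one every second

select current_date();   -- current date
select unix_timestamp();   -- current timestamp
select unix_timestamp("2011-12-07 13:01:03");   -- converts the given date-time to a timestamp
select from_unixtime(1618238391);   -- converts a timestamp to a date-time

select datediff('2012-12-08','2012-05-09');   -- number of days between the two dates
select date_add('2012-02-18',10);   -- the date 10 days after the given date
select date_sub('2012-01-01',10);   -- the date 10 days before the given date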
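A short sketch that combines the date functions above. The two-argument form of from_unixtime takes a format pattern; the dates are the sample values already used in these notes.

```sql
-- Round trip: date-time string -> timestamp -> formatted date string.
SELECT from_unixtime(unix_timestamp('2011-12-07 13:01:03'), 'yyyy-MM-dd');   -- 2011-12-07

-- Day arithmetic.
SELECT datediff('2012-12-08', '2012-05-09');   -- 213
SELECT date_add('2012-02-18', 10);             -- 2012-02-28
SELECT date_sub('2012-01-01', 10);             -- 2011-12-22

-- Days elapsed between a fixed date and today.
SELECT datediff(current_date(), '2021-01-28');
```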

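Math functions

select round(3.1415926);   -- rounds to the nearest integer
select round(3.1415926,4);   -- keeps 4 decimal places
select rand();   -- random number between 0 and 1
select rand(3);   -- with a seed: running it again returns the same random number

Conditional functions

if condition: if(cond, a, b) returns a when cond is true, otherwise b

select if(1=2,100,200);
select if(sex='男','m','w') from student limit 3;   -- '男' means male; returns one m or w per row

Conditional conversion function: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END

select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end;
select case sex when '男' then 'male' else 'female' end from student limit 3;

when ... then ... end

select nvl(null,"itcast");   -- returns the first argument if it is not null, otherwise the second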
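Conditional functions become more useful when their result feeds a grouping or replaces missing values. This sketch assumes the student table has the sex and dept columns of the students table defined earlier; the label 'unknown' is an arbitrary placeholder.

```sql
-- Count rows per derived label. The expression is repeated in GROUP BY
-- so that the grouping key is exactly the displayed value.
SELECT if(sex = '男', 'male', 'female') AS gender, count(*) AS cnt
FROM student
GROUP BY if(sex = '男', 'male', 'female');

-- Replace a missing department with a placeholder.
SELECT name, nvl(dept, 'unknown') AS dept
FROM student
LIMIT 3;
```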
