1. Two ways to load data into a table
Copying a data file created on Linux into a Hive table (on HDFS):
-----run from the Linux shell
hadoop fs -put '<local data file>' <HDFS path of the Hive table>
Loading a data file (local or already on HDFS) into a Hive table:
------run from the Hive CLI
Syntax:
load data [local] inpath '<path to the data file>' [overwrite] into table table_name;
local: the path refers to the local filesystem of the Hive client; without local, it refers to a path on HDFS
overwrite: replaces the data already in the table instead of appending
Example:
load data inpath '/user/hive/warehouse/student001' overwrite into table student;
2. Inserting data
1) Inserting the result of a query into a table
Syntax:
insert (into | overwrite) table table_name select_statement;
Example:
insert into table student1 select * from student;
2) Inserting literal rows with values
Syntax:
insert into table table_name values (value1, value2, ...), ...;
Example:
insert into table student values("yy", 20, "China", 123);
3) Writing query results to a target directory
Syntax:
insert overwrite [local] directory '<target path>' row format serde ... select_statement;
Example:
insert overwrite local directory '/opt/module/hive/datas/student001' select * from student;
3. export | import
export writes a Hive table (its data and metadata) to a path on HDFS
import loads previously exported data into a Hive table
suited to transferring data between two different Hive clusters
1) export
Syntax:
export table table_name to '<target export path>';
Example:
export table default.student to '/user/hive/export/student_001';
2) import
Syntax:
import table table_name from '<source path of the exported data>';
Example:
import table student2 from '/user/hive/export/student_001';
4. Querying data
Syntax:
select [all | distinct] select_expr, ... from table_name [where <filter condition>] [group by colu_name] [having <filter condition applied after grouping>] [order by colu_name] [limit number];
Example:
select * from student where id < 20; or: select name, id from student where id < 20;
(examples combining aggregate functions with group by appear under 3) Aggregate functions below)
1) Relational operators
<, >, =, <=, >=, !=, [not] like, between A and B, in, is [not] null, ...
Example:
select * from student where id in (19, 20);
2) Logical operators
and, or, not
select * from student where id = 19 or id = 20;
3) Aggregate functions
count(*): number of rows in the table
count(colu_name): number of rows where the column is not null
sum(colu_name): sum of the column's values
max(): maximum value
min(): minimum value
avg(): average value (rows with null are excluded)
Example: grouping
select count(*) as a from student group by id;  ## counts the rows in each group
or: select avg(id) from student;
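A minimal sketch of the count(*) vs count(colu_name) difference, assuming a hypothetical table t with a single column x holding the values 1, null, and 3:

```sql
-- t is a hypothetical one-column table containing the values 1, null, 3
select count(*) from t;  -- 3: every row is counted, including the null row
select count(x) from t;  -- 2: the row where x is null is skipped
select avg(x)   from t;  -- 2.0: (1 + 3) / 2, the null row is excluded
```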
4) having (usually used together with group by)
Syntax: group by colu_name having <filter on the grouped/aggregated results>
Example:
select request, count(*) from ods_weblog_origin group by request having count(*) > 50;
5. join
1) join (inner join)
Syntax: from table_name1 a join table_name2 b  ## a and b are aliases for the tables
on a.colu_name = b.colu_name
Example: joining two tables
select dept.*, emp.* from dept join emp on dept.id = emp.id;
2) Left join
Syntax: from table_name1 [a]
left [outer] join table_name2 [b]
on a.colu_name = b.colu_name
## every row of the left table appears in the result; where the right table has no match, its columns show null
Example:
select dept.deptno, dept.ename, emp.* from dept left join emp on dept.id = emp.id;
3) Right join
Syntax: from table_name1 [a]
right [outer] join table_name2 [b]
on a.colu_name = b.colu_name
## every row of the right table appears in the result; where the left table has no match, its columns show null
Example:
select dept.deptno, dept.ename, emp.* from dept right join emp on dept.id = emp.id;
4) Full (outer) join
Syntax: from table_name1 [a]
full [outer] join table_name2 [b]
on a.colu_name = b.colu_name
Example:
select dept.deptno, dept.ename, emp.* from dept full join emp on dept.id = emp.id;
5) Multi-table join
Syntax: from table_name1 [a] join table_name2 [b]
on a.colu_name = b.colu_name
join table_name3 [c]
on b.no = c.no
Example:
select * from ...
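The elided example above could be filled in along these lines; the third table location and all column names other than the dept/emp join key are illustrative assumptions, not from the original notes:

```sql
-- emp joins dept on deptno, and dept joins a hypothetical location table on loc
select e.ename, d.dname, l.loc_name
from emp e
join dept d
  on e.deptno = d.deptno
join location l
  on d.loc = l.loc;
```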
6. Sorting
1) order by (global sort)
asc: ascending (the default)
desc: descending
Syntax: select * from table_name order by colu_name;
## order by sorts in a single reducer, so it is usually combined with limit to reduce the load on that reducer
Example 1:
select * from emp order by sal desc limit 3;
2) sort by (sorts within each reducer)
Set the number of reducers: set mapreduce.job.reduces = 3;
Syntax: select * from table_name sort by colu_name desc;
Example 2:
set mapreduce.job.reduces = 3;
select * from emp sort by deptno desc;
or, to inspect the sorted output of each reducer:
insert overwrite local directory '/opt/module/datas/hive_demo/sort_result' select * from emp sort by deptno desc;
3) distribute by (partitioned sort)
Distributes rows among reducers by the given column; set the number of reducers first, and usually combine it with sort by
Syntax: select * from table_name distribute by colu_name sort by colu_name desc;
Example 3:
insert overwrite local directory '/opt/module/datas/hive_demo/distribute_result' select * from emp distribute by deptno sort by sal desc;
4) cluster by
Shorthand for distribute by colu_name sort by colu_name when both use the same column; cluster by only sorts in ascending order
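A minimal sketch reusing the emp table from the earlier examples; the two queries below are equivalent, since cluster by only sorts in ascending order:

```sql
set mapreduce.job.reduces = 3;

-- long form: distribute and sort by the same column
select * from emp distribute by deptno sort by deptno;

-- equivalent shorthand
select * from emp cluster by deptno;
```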