Data Warehouse (Hive): DML Operations and Queries (Part 1)

Author: 唉.

Category: Big Data Technologies. Tags: Data Warehouse (Hive): DML Operations and Queries (Part 1)

Article link: https://blog.csdn.net/weixin_44240370/article/details/91481425

III. DML Operations

1. Data Import

<1> Loading data into a table with LOAD

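Syntax:
hive> load data [local] inpath 'path' [overwrite] into table student [partition (partcol1=val1, ...)];
(1) load data: load data into a table
(2) local: load from the local file system into the Hive table; without it, load from HDFS
(3) inpath: the path of the data to load
(4) into table: which table to load into
(5) student: the name of the target table
(6) overwrite: replace the data already in the table; without it, the new data is appended
(7) partition: load into the specified partition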

Worked example:
Create a table

hive (default)> create table student(id string, name string) row format delimited fields terminated by '\t';

Load a local file into Hive

hive (default)> load data local inpath '/opt/module/datas/student.txt' into table default.student;

Load an HDFS file into Hive

Upload the file to HDFS
hive (default)> dfs -put /opt/module/datas/student.txt /user/1/hive;

Load the data from HDFS
hive (default)> load data inpath '/user/1/hive/student.txt' into table default.student;

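Load data and overwrite the data already in the table. Loading from HDFS moves the source file into the table's directory, so upload student.txt to HDFS again before running this statement.

hive (default)> load data inpath '/user/1/hive/student.txt' overwrite into table default.student;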
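The partition clause from the syntax list is not exercised in the example above. A minimal sketch of it follows; the table name student_part is hypothetical and exists only for this illustration, and the local file is assumed to be the same student.txt used earlier.

```sql
-- Hypothetical partitioned table, used only for this sketch.
create table if not exists student_part(id string, name string)
partitioned by (month string)
row format delimited fields terminated by '\t';

-- Load the local file into a single partition.
-- overwrite replaces only the contents of partition month='201709'.
load data local inpath '/opt/module/datas/student.txt'
overwrite into table student_part partition (month='201709');

-- The partition column shows up as an ordinary column in query results.
select id, name, month from student_part where month = '201709';
```

Because the file is local, it is copied rather than moved, so the same load can be repeated without uploading anything again.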

<2> Inserting data into a table with a query

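Create a partitioned table. This reuses the name student, so drop the non-partitioned table created in <1> first.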

hive (default)> create table student(id string, name string) partitioned by (month string) row format delimited fields terminated by '\t';

Basic insert

hive (default)> insert into table student partition(month='201709') values('1004','zhangsi');

Basic mode insert (from the result of a query on a single table)

hive (default)> insert overwrite table student partition(month='201708') select id, name from student where month='201709';

Multi-insert mode (scan the source table once and write the results to several partitions)

hive (default)> from student
              insert overwrite table student partition(month='201707')
              select id, name where month='201709'
              insert overwrite table student partition(month='201706')
              select id, name where month='201709';
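After the inserts above, it can help to confirm which partitions now exist and how many rows each one holds. The following is a small verification sketch, assuming the partitioned student table from this section.

```sql
-- List the partitions created by the insert statements.
show partitions student;

-- Count rows per partition; 201706, 201707 and 201708 should each
-- mirror the row count of the 201709 partition they were copied from.
select month, count(*) cnt
from student
group by month;
```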

<3> Creating a table and loading data in one query (CREATE TABLE ... AS SELECT)

create table if not exists student3 as select id, name from student;

<4> Specifying the data path with LOCATION when creating a table

Create the table and specify its location on HDFS

hive (default)> create table if not exists student5(
              id int, name string
              )
              row format delimited fields terminated by '\t'
              location '/user/hive/warehouse/student5';

Upload the data to HDFS

hive (default)> dfs -put /opt/module/datas/student.txt  /user/hive/warehouse/student5;

Query the data

hive (default)> select * from student5;

<5> Importing data into a specified Hive table with IMPORT

The data must first be exported with EXPORT; only then can it be imported

hive (default)> import table student2 partition(month='201709') from '/user/hive/warehouse/export/student';
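The two steps can be sketched end to end as follows. This assumes the export directory does not exist yet (or is empty) and that student2 either does not exist or is empty; both are assumptions of this sketch rather than something the example above states.

```sql
-- Step 1: export the table (data plus metadata) to an HDFS directory.
export table default.student to '/user/hive/warehouse/export/student';

-- Step 2: import one partition from that export into student2.
import table student2 partition(month='201709')
from '/user/hive/warehouse/export/student';

-- Check the imported rows.
select * from student2 where month = '201709';
```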

2. Data Export

<1> Exporting with INSERT

(1) Export query results to the local file system

hive (default)> insert overwrite local directory '/opt/module/datas/export/student' select * from student;

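Exported this way, the columns are separated by Hive's default field delimiter (\001, Ctrl-A), which shows up as garbled characters when the output is opened as a CSV or text file. That is what (2) addresses.

(2) Export query results to the local file system with formatting

hive (default)> insert overwrite local directory '/opt/module/datas/export/student1'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
             select * from student;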

(3) Export query results to HDFS

hive (default)> insert overwrite directory '/user/atguigu/hive/warehouse/student2'
             ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
             select * from student;

<2> Exporting to the local file system with Hadoop commands

hive (default)> dfs -get /user/hive/warehouse/student/month=201709/000000_0 /opt/module/datas/export/student3.txt;

<3> Exporting with the Hive shell

[1@hadoop1 hive]$ bin/hive -e 'select * from default.student;' > /opt/module/datas/export/student4.txt;

<4> Exporting to HDFS with EXPORT

hive (default)> export table default.student to '/user/hive/warehouse/export/student';

3. Clearing the Data in a Table

hive (default)> truncate table student;

TRUNCATE only works on managed tables; it cannot delete the data of an external table.
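Before truncating, one way to check whether a table is managed or external is to inspect its metadata. The sketch below assumes the partitioned student table from the import section; the partition-level truncate is an optional variant, not something the original example uses.

```sql
-- Look for the "Table Type:" row in the output:
-- MANAGED_TABLE can be truncated, EXTERNAL_TABLE cannot.
describe formatted student;

-- Optional variant: clear a single partition instead of the whole table.
truncate table student partition (month='201709');
```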

IV. Queries (Part 1)

1. Basic Queries

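<1> Column aliases

An alias gives a column a shorter or more meaningful name, which makes it easier to use in calculations. I first thought about this while learning HBase: in a column-oriented store (which Hive is not), overly long column names waste memory and cause unnecessary trouble. In Hive an alias only renames the column in the query result, but it is still very convenient, so here is how it works.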

hive (default)> select ename AS name, deptno dn from emp;

The AS keyword is optional.

<2> Arithmetic operators

[Image not preserved in the source: arithmetic operator table]
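Since the operator table itself is not available, here is a short illustrative sketch of arithmetic in a select list. It assumes the emp table used throughout this section, with a numeric sal column; the alias names are made up for the example.

```sql
-- Arithmetic operators can be applied directly to numeric columns.
select ename,
       sal,
       sal + 1  as sal_plus_one,   -- addition
       sal * 12 as annual_sal,     -- multiplication
       sal / 2  as half_sal        -- division
from emp;
```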

<3> Common functions

(1) count: total number of rows

hive (default)> select count(*) cnt from emp;

(2) max: maximum salary

hive (default)> select max(sal) max_sal from emp;

(3) min: minimum salary

hive (default)> select min(sal) min_sal from emp;

(4) sum: total of all salaries

hive (default)> select sum(sal) sum_sal from emp;

(5) avg: average salary

hive (default)> select avg(sal) avg_sal from emp;
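The five aggregates can also be computed in a single pass over the table. A minimal sketch, assuming the same emp table with the sal and comm columns used elsewhere in this article:

```sql
-- All aggregates in one query, so emp is scanned only once.
select count(*)    cnt,       -- counts every row
       count(comm) comm_cnt,  -- counts only rows where comm is not null
       max(sal)    max_sal,
       min(sal)    min_sal,
       sum(sal)    sum_sal,
       avg(sal)    avg_sal
from emp;
```

The difference between count(*) and count(comm) is worth keeping in mind whenever a column contains nulls.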

<4> The LIMIT clause

A typical query returns many rows; the LIMIT clause restricts how many rows are returned.

hive (default)> select * from emp limit 5;

2. The WHERE Clause

[Image not preserved in the source]

<1> Comparison operators (BETWEEN / IN / IS NULL)

(1) BETWEEN ... AND ...

Selects rows whose value lies between the two bounds (inclusive); used after WHERE.

hive (default)> select * from emp where sal between 500 and 1000;

(2) IN

hive (default)> select * from emp where sal IN (1500, 5000);

(3) IS NULL

Select rows where the value is null

hive (default)> select * from emp where comm is null;

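<2> LIKE and RLIKE

LIKE selects values that match a pattern.
The pattern may contain literal characters or digits plus two wildcards: % matches zero or more characters, and _ matches exactly one character.
RLIKE is a Hive extension of this feature that lets you specify the match condition with Java regular expressions, a more powerful pattern language.

Find employees whose salary starts with 2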

hive (default)> select * from emp where sal LIKE '2%';

Find employees whose salary has 2 as its second digit

hive (default)> select * from emp where sal LIKE '_2%';

Find employees whose salary contains a 2

hive (default)> select * from emp where sal RLIKE '[2]';
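To show how the two pattern styles relate, the LIKE queries above can be restated with RLIKE and regular-expression anchors. This is a sketch on the same emp table and, like the queries above, it assumes sal is matched as a string.

```sql
-- Salary starts with 2: comparable to LIKE '2%'.
select * from emp where sal RLIKE '^2';

-- Second character is 2: comparable to LIKE '_2%'.
select * from emp where sal RLIKE '^.2';

-- Salary contains a 2 anywhere: same as RLIKE '[2]'.
select * from emp where sal RLIKE '2';
```

Unlike LIKE, an RLIKE pattern matches anywhere in the value unless it is anchored with ^ or $.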

<3> Logical operators (AND / OR / NOT)

[Image not preserved in the source: logical operator table]
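As the operator table is missing, the three operators can be illustrated with a few queries. The sketch assumes the emp table with the sal and deptno columns used elsewhere in this article; the specific values are arbitrary.

```sql
-- AND: both conditions must hold.
select * from emp where sal > 1000 and deptno = 30;

-- OR: at least one condition must hold.
select * from emp where sal > 1000 or deptno = 30;

-- NOT: negates a condition, here combined with IN.
select * from emp where deptno not in (30, 20);
```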

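3. Grouping

<1> GROUP BY (important)

GROUP BY is usually used together with aggregate functions: it groups the result by one or more columns and then applies the aggregation to each group.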

hive (default)> select t.deptno, avg(t.sal) avg_sal from emp t group by t.deptno;

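<2> HAVING

Differences between HAVING and WHERE:
(1) WHERE acts on the columns of the table; HAVING acts on the columns of the query result and filters on them.
(2) Aggregate functions cannot follow WHERE; they can follow HAVING.
(3) HAVING is used only in GROUP BY aggregation statements.

Find the departments whose average salary is greater than 2000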

hive (default)> select deptno, avg(sal) avg_sal from emp group by deptno having avg_sal > 2000;
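The two filters can be combined in one statement, which makes the difference concrete: WHERE removes rows before grouping, and HAVING removes groups afterwards. A minimal sketch on the same emp table; the comm is null condition is chosen only for illustration.

```sql
select deptno, avg(sal) avg_sal
from emp
where comm is null        -- row-level filter, applied before grouping
group by deptno
having avg(sal) > 2000;   -- group-level filter, applied after aggregation
```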
