Hive (Part 2): Data Import, Queries, Functions, and More

千里草竹

Category: Big Data Series

Article link: https://blog.csdn.net/u012848709/article/details/83241100

Preface

Notes from learning Hive.

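Data Operations

Data Import

Loading data into a table (load)

Syntax: load data [local] inpath 'path' [overwrite] into table table_name [partition (partcol1=val1,...)]

load data: loads data into a table
local: loads from the local filesystem into Hive; without it, the data is loaded from HDFS into the Hive table
inpath: the path of the data to load
into table: the table to load into
overwrite: overwrites the data already in the table; without it, the data is appended
partition: loads into the specified partition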
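
To make the optional clauses concrete, here is a minimal Python sketch that assembles a load statement from the pieces described above. The helper name `build_load_statement` and its parameters are hypothetical, made up for this illustration. It only builds the statement text, does not talk to Hive, and does not escape quotes in its arguments.

```python
def build_load_statement(path, table, local=False, overwrite=False, partition=None):
    """Assemble a Hive load data statement as a string (hypothetical helper)."""
    parts = ["load data"]
    if local:
        # local: read from the local filesystem instead of HDFS
        parts.append("local")
    parts.append(f"inpath '{path}'")
    if overwrite:
        # overwrite: replace existing data instead of appending
        parts.append("overwrite")
    parts.append(f"into table {table}")
    if partition:
        # partition: dict mapping partition column -> value
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"partition ({spec})")
    return " ".join(parts) + ";"


print(build_load_statement("/opt/module/data/stu.txt", "student", local=True))
print(build_load_statement("/stu.txt", "stu", overwrite=True, partition={"month": "201809"}))
```

The first call reproduces the local load used in this post; the second shows where overwrite and partition would go for a partitioned table.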

1 Create the table

create table student(id string ,name string) row format delimited fields terminated by '\t' ;

2 Load local data

load data local inpath '/opt/module/data/stu.txt' into table student ;

`Output on Linux`

Time taken: 0.318 seconds, Fetched: 3 row(s)
hive (db_hive)> create table student(id string ,name string) row format delimited fields terminated by '\t' ;
OK
Time taken: 1.87 seconds
hive (db_hive)> load data local inpath '/opt/module/data/stu.txt' into table student ;
Loading data to table db_hive.student
Table db_hive.student stats: [numFiles=1, totalSize=54]
OK
Time taken: 18.68 seconds
hive (db_hive)> select * from student;
OK
student.id	student.name
1	grq
2	sunjie
3	taoshuai
4	zhucaixi
5	sunjiege
Time taken: 0.899 seconds, Fetched: 5 row(s)
hive (db_hive)>
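
The `fields terminated by '\t'` clause means each line of stu.txt is split on tab characters and the pieces are assigned to the columns in order. The Python sketch below imitates that for the two-column student table. The function name `parse_line` is hypothetical, and the handling of short and long lines (missing fields become NULL, extra fields are dropped) reflects my understanding of Hive's default text format, so treat it as an assumption to verify on your own version.

```python
COLUMNS = ["id", "name"]  # columns of the student table


def parse_line(line, columns=COLUMNS, delimiter="\t"):
    """Split one text line into a row dict, roughly as a delimited text table would."""
    fields = line.rstrip("\n").split(delimiter)
    # Missing trailing fields become None (NULL); extra fields are ignored.
    padded = fields[: len(columns)] + [None] * (len(columns) - len(fields))
    return dict(zip(columns, padded))


# The last two lines are deliberately malformed: one has no name,
# the other uses a space instead of a tab.
sample = "1\tgrq\n2\tsunjie\n3\n4 zhucaixi\n"
for line in sample.splitlines():
    print(parse_line(line))
```

If the file uses spaces instead of tabs, the whole line ends up in id and name comes back empty, which is worth checking first when a freshly loaded table shows NULL columns.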

3 Load a file from HDFS into Hive

# Upload the file to HDFS
[grq@hadoop102 data]$ hadoop fs -put /opt/module/data/stu.txt /
# Load the data from HDFS
hive (db_hive)> load data inpath '/stu.txt' into table student;

Inserting data into a table via a query (insert)

1 Create a partitioned table

hive (db_hive)> create table stu(id string ,name string ) partitioned by (month string) row format  delimited fields terminated by '\t';

2 Insert data

# Basic insert (runs a MapReduce job)
hive (db_hive)> insert into table stu partition(month='201809') values('0001','grq');

# Basic insert from the result of a single-table query
hive (db_hive)> insert  overwrite table stu partition(month='201810') select * from student;
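
The difference between insert into and insert overwrite on a partitioned table can be sketched in a few lines of Python. This is a toy in-memory model, not how Hive stores data: the dictionary stands in for the stu table keyed by its month partition, the function names are hypothetical, and the row ('0002', 'sunjie') is made up for the example.

```python
from collections import defaultdict

# partition value -> list of (id, name) rows; stands in for the stu table
stu = defaultdict(list)


def insert_into(table, month, rows):
    """insert into: append rows to the given partition."""
    table[month].extend(rows)


def insert_overwrite(table, month, rows):
    """insert overwrite: replace the contents of the given partition only."""
    table[month] = list(rows)


student = [("1", "grq"), ("2", "sunjie")]

insert_into(stu, "201809", [("0001", "grq")])
insert_overwrite(stu, "201810", student)
insert_overwrite(stu, "201810", student)  # running it again does not duplicate rows
insert_into(stu, "201809", [("0002", "sunjie")])  # appends to the existing partition

for month in sorted(stu):
    print(month, stu[month])
```

Note that the overwrite touches only the month='201810' partition; the rows in month='201809' are left alone.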

Creating a table and loading data from a query (as select)

hive (db_hive)> create table if not exists stud  as select * from stu;

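Specifying the data path with location at table creation

1 Create the table and specify its location on HDFS

# Note: location is expected to name an HDFS directory that holds the data files, not a single file
hive (db_hive)> create table if not exists stude (id string ,name string) row format delimited fields terminated by '\t' location '/stu.txt';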

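Importing data into a specified Hive table (import)

# This did not work when I tested it and is still unresolved.
# A likely cause: import expects a directory written by export, while '/stu.txt' is a plain text file.
import table stu partition(month='201806') from '/stu.txt';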

Data Export

Export with insert

Export query results to the local filesystem

-- Runs a MapReduce job
hive (db_hive)> insert overwrite local directory '/opt/module/data/export/stu'
              > select * from stu;

Export formatted query results to the local filesystem

The target directory is created automatically.

hive (db_hive)> insert overwrite local directory '/opt/module/export/stu1'
              > row format delimited fields terminated by '\t'
              > collection items terminated by '\n'
              > select * from stu;

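Export query results to HDFS (without local)

-- Caution: overwrite replaces whatever is already in the target directory, so a dedicated export directory is safer than a shared one such as '/user/'
hive (db_hive)> insert overwrite directory '/user/'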
              > row format delimited fields terminated by '\t'
              > collection items terminated by '\n'
              > select * from stu;

Export to the local filesystem with Hadoop commands

hive (db_hive)> dfs -get  /user/hive/warehouse/db_hive.db/stud/000000_0 /opt/module/data/export/stu_hadoop.txt;

Export with the hive shell command

Usage: hive -e '<statement>' > file, or hive -f <script file> > file

hive -e 'select * from db_hive.stu;' > /opt/module/data/export/stu_hive.txt

Export to HDFS with export

hive (db_hive)> export table db_hive.stu to '/user/hive/warehouse/export/stu';

Queries

Common functions

-- count: total number of rows
select count(*) from stu;

-- max, min, sum, and average
hive (db_hive)> select max(id),min(id),sum(id),avg(id) from stu;

-- limit: restrict the number of rows returned
hive (default)> select * from emp limit 5;

-- where: conditional filtering
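
One detail about the max/min/sum/avg query above: id is declared as string in stu. To my understanding, max and min on a string column compare values as text, while sum and avg convert them to numbers, so the results can look inconsistent once the ids have different lengths. The Python sketch below shows the same effect with made-up ids; it illustrates the comparison rule and is not Hive output.

```python
ids = ["1", "2", "10"]  # hypothetical string ids of different lengths

# Text comparison: "2" sorts after "10" because "2" > "1" character by character.
print(max(ids), min(ids))

# Numeric comparison after an explicit conversion.
numbers = [int(i) for i in ids]
print(max(numbers), min(numbers), sum(numbers), sum(numbers) / len(numbers))
```

In Hive, an explicit conversion such as max(cast(id as int)) should avoid the surprise when the ids are meant to be numeric.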

group by and having

-- group by

---- Average salary per department (runs a MapReduce job)
hive (default)> select t.deptno,avg(t.sal) from emp t group by t.deptno;
---- Highest salary for each job within each department
hive (default)> select t.deptno ,t.job,max(t.sal) from emp t group by t.deptno,t.job;


-- having (only used with group by aggregate statements)
---- Departments whose average salary is greater than 200
hive (default)> select deptno,avg(sal)  dep_sal from emp group by deptno having dep_sal>200;
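
To show the order in which group by and having take effect, here is a small Python sketch of the last query: group the rows, aggregate each group, then filter the groups. The emp rows are hypothetical sample data, since the post does not list the contents of emp.

```python
from collections import defaultdict

# Hypothetical emp rows: (deptno, job, sal)
emp = [
    (10, "CLERK", 150.0),
    (10, "MANAGER", 300.0),
    (20, "CLERK", 100.0),
    (20, "ANALYST", 250.0),
    (30, "CLERK", 500.0),
]

# group by deptno: collect the salaries of each department
salaries = defaultdict(list)
for deptno, _job, sal in emp:
    salaries[deptno].append(sal)

# avg(sal) for each group
avg_sal = {deptno: sum(vals) / len(vals) for deptno, vals in salaries.items()}

# having dep_sal > 200: filter the groups after aggregation
result = {deptno: avg for deptno, avg in avg_sal.items() if avg > 200}
print(result)
```

The filter runs on the aggregated value, which is why it belongs in having; a where clause is applied to individual rows before they are grouped.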