Hive

亮子zl

Article link: https://blog.csdn.net/zhaoliang831214/article/details/109128319

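Hive is used for data statistics and analysis.
Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides SQL-like query capabilities.
In essence, Hive translates HQL into MapReduce programs.
1. The data Hive processes is stored in HDFS.
2. The underlying data analysis is implemented with MapReduce.
3. The resulting programs run on YARN.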
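
To make "HQL becomes MapReduce" more concrete, here is a minimal Python sketch of how a query such as select name, count(*) from student group by name could be split into map, shuffle, and reduce phases. It only illustrates the idea and is not what Hive's compiler actually generates; the function names and the sample rows are made up for this example.

```python
from collections import defaultdict

# Hypothetical sample rows of a student(id, name) table.
rows = [(1, "aa"), (2, "bb"), (3, "aa")]

def map_phase(rows):
    # Emit (group key, 1) for every input row.
    return [(name, 1) for _id, name in rows]

def shuffle_phase(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the values of each key, which yields count(*) per group.
    return {key: sum(values) for key, values in grouped.items()}

counts = reduce_phase(shuffle_phase(map_phase(rows)))
print(counts)
```

In a real cluster the map and reduce steps run as distributed tasks scheduled by YARN, and the rows are read from files in HDFS rather than from a Python list.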

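Hive data types
DDL (data definition)
DML (data manipulation)
Queries

Enterprise-level tuning
9.1 Fetch
9.2 Local mode
9.3 Table optimization
9.4 Data skew
9.5 Parallel execution
9.6 Strict mode
9.7 JVM reuse
9.8 Speculative execution
9.9 Compression
9.10 Execution plan (Explain)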

Hive interactive commands
-- List the databases
show databases;
-- Switch to a database
use default;
-- List the tables
show tables;
create table student(id int,name string);

Command-line usage
bin/hive --help
create table student(id int,name string) row format delimited fields terminated by '\t';
-- Load data
load data local inpath '/opt/module/datas/student.txt' into table default.student;
# Run a SQL statement directly
bin/hive -e "select * from student;"
# Run the SQL statements stored in a script file
bin/hive -f /opt/module/datas/hivef.sql
# Write the query results to hive_result.txt
bin/hive -f /opt/module/datas/hivef.sql > /opt/module/datas/hive_result.txt
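
The clause row format delimited fields terminated by '\t' tells Hive to split each line of the data file on tab characters. The Python sketch below imitates that for student(id int, name string). The helper parse_line and the sample lines are hypothetical; None stands in for the NULL that Hive returns when a field is missing or cannot be read as the declared type.

```python
def parse_line(line):
    # Split one line of student.txt on tabs into (id, name).
    fields = line.rstrip("\n").split("\t")
    raw_id = fields[0]
    name = fields[1] if len(fields) > 1 else None
    try:
        student_id = int(raw_id)
    except ValueError:
        student_id = None  # stands in for NULL
    return student_id, name

# The third line is separated by a space instead of a tab.
lines = ["1\taa\n", "2\tbb\n", "3 cc\n"]
parsed = [parse_line(line) for line in lines]
print(parsed)
```

The last sample line shows a common pitfall: if the file is separated by spaces while the table expects tabs, the whole line lands in the first field and the columns come back empty.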

Other Hive commands
Exit the Hive shell
hive (default)> exit;
hive (default)> quit;

Browse the HDFS file system from the Hive CLI
Recursive listing (the older -lsr form is deprecated in favor of -ls -R):
hive> dfs -ls -R /;
hive> dfs -ls /;

Browse the local file system from the Hive CLI

hive> ! ls /opt;

hive> ! cat /opt/module/datas/student.txt;
1	aa
2	bb
(and so on)

View the history of all commands entered in Hive
[aa@hadoop101 ~]$ cat .hivehistory

Hive data types

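=================================Partitioned tables========================================
Partitions:
Each partition of a partitioned table corresponds to a separate directory on HDFS, and that directory holds all of the partition's data files. Partitioning in Hive therefore means splitting into directories: a large dataset is divided into smaller ones according to business needs.
At query time, expressions in the WHERE clause select only the partitions the query needs, which makes the query considerably more efficient.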
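
The link between partitions and directories can be sketched in a few lines of Python. This is only a model of the idea, not Hive code: partition_path and prune are hypothetical helpers, and the warehouse path is the /user/hive/warehouse location used elsewhere in these notes.

```python
import posixpath

WAREHOUSE = "/user/hive/warehouse"  # warehouse directory assumed from these notes

def partition_path(table, spec):
    # spec is an ordered list of (partition column, value) pairs.
    parts = [f"{col}={val}" for col, val in spec]
    return posixpath.join(WAREHOUSE, table, *parts)

def prune(partitions, wanted):
    # Keep only the partitions whose values match the WHERE filter.
    return [p for p in partitions
            if all(dict(p).get(col) == val for col, val in wanted.items())]

partitions = [[("month", "201707")], [("month", "201708")], [("month", "201709")]]
selected = prune(partitions, {"month": "201709"})
paths = [partition_path("dept_partition", p) for p in selected]
print(paths)
```

With the filter month='201709', only one of the three directories has to be read, which is where the efficiency gain comes from.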

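1) Syntax for creating a partitioned table
The partitioned by clause is what makes this a partitioned table; here the data is partitioned by month.
hive (default)> create table dept_partition(
deptno int, dname string, loc string
)
partitioned by(month string)
row format delimited fields terminated by '\t';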

Load data into the partitioned table
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition partition(month='201712');

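Query data in the partitioned table
Single-partition query
hive (default)> select * from dept_partition where month='201709';

Multi-partition union query
hive (default)> select * from dept_partition where month='201709'
union
select * from dept_partition where month='201708'
union
select * from dept_partition where month='201707';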

Add partitions
Create a single partition
hive (default)> alter table dept_partition add partition(month='201706');
Create several partitions at once
hive (default)> alter table dept_partition add partition(month='201705') partition(month='201704');

Drop partitions
hive (default)> alter table dept_partition drop partition(month='201707');

Drop several partitions at once
hive (default)> alter table dept_partition drop partition(month='201705'),partition(month='201704');

List the partitions of a partitioned table
hive (default)> show partitions dept_partition;

View the structure of a partitioned table
hive (default)> desc formatted dept_partition;

Notes on partitioned tables
1) Create a table with two levels of partitions
hive (default)> create table dept_partition2(
deptno int,dname string,loc string
)
partitioned by(month string,day string)
row format delimited fields terminated by '\t';

2) Load data the normal way (into a two-level partitioned table)
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201712',day='23');
hive (default)> select * from dept_partition2 where month='201712' and day='23';

3) Upload data directly to a partition directory and associate it with the partitioned table

Method 1: upload the data, then repair the table
Create the directory first:
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=12;
Upload the data:
hive (default)> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=12;
Run the repair:
hive (default)> msck repair table dept_partition2;

Method 2: upload the data, then add the partition
Create the directory first:
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=11;
Upload the data:
hive (default)> dfs -put /opt/module/datas/dept.txt /user/hive/warehouse/dept_partition2/month=201709/day=11;
Add the partition:
hive (default)> alter table dept_partition2 add partition(month='201709',day='11');

Method 3: create the directory, then load data into the partition
Create the directory:
hive (default)> dfs -mkdir -p /user/hive/warehouse/dept_partition2/month=201709/day=10;
Load the data:
hive (default)> load data local inpath '/opt/module/datas/dept.txt' into table dept_partition2 partition(month='201709',day='10');
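
Why a repair or an explicit add partition is needed after a raw upload can be shown with a small model: the directory exists on the file system, but the metastore does not know about it yet. The Python sketch below imitates that comparison on a temporary local directory. It is a rough approximation of what msck repair table does conceptually, not its implementation; discover_partitions and the known list are hypothetical.

```python
import os
import tempfile

def discover_partitions(table_dir):
    # Walk the table directory and turn leaf paths such as
    # month=201709/day=12 into {"month": "201709", "day": "12"}.
    found = []
    for root, dirs, _files in os.walk(table_dir):
        if dirs:
            continue  # not a leaf directory
        rel = os.path.relpath(root, table_dir)
        if rel == ".":
            continue  # the table directory itself
        spec = dict(part.split("=", 1) for part in rel.split(os.sep))
        found.append(spec)
    return found

with tempfile.TemporaryDirectory() as table_dir:
    for day in ("11", "12"):
        os.makedirs(os.path.join(table_dir, "month=201709", f"day={day}"))
    # Partitions the (pretend) metastore already knows about.
    known = [{"month": "201709", "day": "11"}]
    missing = [p for p in discover_partitions(table_dir) if p not in known]

print(missing)
```

The partitions reported as missing are the ones a repair would register; until then, queries against the table do not see the uploaded files.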

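======================Altering tables===========================
Altering tables
1) Rename a table
Syntax: ALTER TABLE table_name RENAME TO new_table_name
hive (default)> alter table dept_partition2 rename to dept_partition3;

2) Add, modify, and drop table partitions
See the partitioned tables section above.

3) Add, modify, or replace column definitions
Syntax:
Change a column: ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
Add or replace columns: ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
Note: ADD appends new columns after all existing columns (but before the partition columns); REPLACE replaces every column of the table.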
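
The difference between ADD and REPLACE can be modeled on a plain list of columns. This Python sketch is only an illustration of the note above; add_columns and replace_columns are hypothetical helpers, and the column lists follow the dept_partition table from these notes.

```python
def add_columns(columns, new_columns):
    # ADD COLUMNS appends after the existing data columns.
    return columns + new_columns

def replace_columns(columns, new_columns):
    # REPLACE COLUMNS discards the old column list entirely.
    return list(new_columns)

data_columns = [("deptno", "int"), ("dname", "string"), ("loc", "string")]
partition_columns = [("month", "string")]

added = add_columns(data_columns, [("deptdesc", "string")])
replaced = replace_columns(added, [("deptno", "string"), ("dname", "string")])

# Partition columns always come after the data columns in the table layout.
schema_after_add = added + partition_columns
print(schema_after_add)
print(replaced + partition_columns)
```

Both operations change only the table metadata in this model; the partition column month keeps its place at the end either way.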

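Hands-on example:
1) View the table structure:
hive> desc dept_partition;

2) Add a column:
hive> alter table dept_partition add columns(deptdesc string);

3) Change a column (deptdesc is the old column name, desc is the new one):
hive> alter table dept_partition change column deptdesc desc int;

4) Replace columns (replaces all of them):
hive> alter table dept_partition replace columns(deptno string,dname string,loc string);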

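======================DML data manipulation==========================================
====================Importing data==================================
1) Loading data into a table (Load)
1.1 Syntax
hive> load data [local] inpath '/opt/module/datas/student.txt' [overwrite] into table student [partition (partcol1=val1,...)];
(1) load data: load data
(2) local: load from the local file system into the Hive table; without it, the data is loaded from HDFS
(3) inpath: the path of the data to load
(4) overwrite: overwrite the data already in the table; without it, the data is appended
(5) into table: which table to load into
(6) student: the specific table
(7) partition: load into the specified partition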
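
The effect of the local and overwrite options can be modeled on lists of file names. The Python sketch below is a simplified illustration, not Hive code: load_data is a hypothetical helper, and it keeps file names unchanged even when the same name is loaded twice. It relies on the fact that a local source file is copied, whereas an HDFS source file is moved into the table directory.

```python
def load_data(source_dir, table_dir, name, local=False, overwrite=False):
    """Model LOAD DATA on two lists of file names."""
    if overwrite:
        table_dir.clear()        # overwrite: drop the table's existing files
    table_dir.append(name)       # the new file lands in the table directory
    if not local:
        source_dir.remove(name)  # an HDFS source is moved, not copied

local_disk = ["student.txt"]
hdfs_dir = ["student.txt"]
table = []

load_data(local_disk, table, "student.txt", local=True)  # copy from local disk
load_data(hdfs_dir, table, "student.txt")                 # move from HDFS, append
after_append = list(table)

hdfs_dir.append("student.txt")                            # upload the file again
load_data(hdfs_dir, table, "student.txt", overwrite=True)
print(local_disk, hdfs_dir, after_append, table)
```

Because the HDFS source disappears after a load, the file has to be uploaded again before it can be loaded from HDFS a second time.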
1.2 Hands-on example
(1) Create a table
create table student(id string,name string) row format delimited fields terminated by '\t';
(2) Load a local file into Hive
load data local inpath '/opt/module/datas/student.txt' into table default.student;
(3) Load an HDFS file into Hive
Upload the file to HDFS
       hive> dfs -put /opt/module/datas/student.txt /user/zhaoliang/hive;
       Load the data from HDFS
       hive> load data inpath '/user/zhaoliang/hive/student.txt' into table default.student;
(4) Load data and overwrite what is already in the table
Upload the file to HDFS
       hive> dfs -put /opt/module/datas/student.txt /user/zhaoliang/hive;
       Load the data, overwriting the existing data in the table
       hive> load data inpath '/user/zhaoliang/hive/student.txt' overwrite into table default.student;

2) Inserting data into a table with a query (Insert)
2.1 Create a partitioned table
   hive> create table student(id int,name string) partitioned by (month string) row format delimited fields terminated by '\t';
2.2 Basic insert
   hive> insert into table student partition(month='201709') values(1,'wangwu');
2.3 Basic-mode insert (from the query result of a single table)
   hive> insert overwrite table student partition(month='201708') select id,name from student where month='201709';
2.4 Multi-insert mode (several inserts from one scan of the source table)
   hive> from student
         insert overwrite table student partition(month='201707')
         select id,name where month='201709'
         insert overwrite table student partition(month='201706')
         select id,name where month='201709';

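3) Creating a table and loading data from a query (As Select)
Create a table from a query result (the result rows are inserted into the newly created table)
   create table if not exists student3 as select id,name from student;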

4) Specifying the data path with Location when creating a table
4.1 Create the table and specify its location on HDFS
   hive> create table if not exists student5(
         id int, name string
         )
         row format delimited fields terminated by '\t'
         location '/user/hive/warehouse/student5';

4.2 Upload the data to HDFS
   hive> dfs -put /opt/module/datas/student.txt /user/hive/warehouse/student5;
4.3 Query the data
   hive> select * from student5;

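5) Importing data into a specified Hive table (Import)
The source directory must contain data that was previously written by export.
hive> import table student2 partition(month='201709') from '/user/hive/warehouse/student';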

====================Exporting data==================================
1) Export with Insert
1.1 Export the query result to the local file system
   hive> insert overwrite local directory '/opt/module/datas/export/student' select * from student;
1.2 Export the query result to the local file system with formatting
   hive> insert overwrite local directory '/opt/module/datas/export/student' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from student;
1.3 Export the query result to HDFS (without the local keyword)
   hive> insert overwrite directory '/user/zhaoliang/student2' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from student;

2) Export to the local file system with a Hadoop command
hive> dfs -get /user/hive/warehouse/student/month=201709/000000_0 /opt/module/datas/export/student3.txt;

3) Export with a Hive shell command
$ bin/hive -e 'select * from default.student;' > /opt/module/datas/export/student4.txt

4) Export to HDFS with Export
hive> export table default.student to '/user/hive/warehouse/export/student';

5) Export with Sqoop

Clearing the data in a table (Truncate)
hive> truncate table student;
Note: Truncate can only delete data in managed tables; it cannot delete data in external tables.

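====================Queries==================================
1) Basic queries (select ... from)
1.1 Full-table and specific-column queries
Full-table query
       hive> select * from table_name;
       Query specific columns
       hive> select id,name from table_name;
1.2 Column aliases

1.3 Arithmetic operators

1.4 Common functions

1.5 Limit clause

2) Where clause

3) Grouping

4) Join statements

5) Sorting

6) Bucketing and sampling queries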

spark

Create an external table
hive (default)> create external table dept(deptno int,dname string,loc int) row format delimited fields terminated by '\t';

View the table type (dept is the table name)
hive (default)> desc formatted dept;
Table Type: EXTERNAL_TABLE (external table)
MANAGED_TABLE (internal table, also called managed table)
