Basic Hive Usage (Part 2)

Show the current database and column headers in the CLI by adding the following configuration (e.g. in hive-site.xml):

<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
</property>

<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
</property>

-》Built-in aggregate functions:

min, max, sum, avg
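A minimal sketch of these aggregates against the `student` table shown later in this section (columns `id`, `name`); aggregating over `id` is just illustrative:

```sql
-- Aggregate over the whole table; returns a single result row.
select min(id), max(id), sum(id), avg(id) from student;
```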

-》Range filter:

between xxx and xxx;

-》Fuzzy matching:

in, like (% matches any number of characters, _ matches exactly one character)
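A short sketch of these filters, assuming the `student` table used below (columns `id`, `name`):

```sql
-- Rows whose id falls in the closed interval [1, 3].
select * from student where id between 1 and 3;

-- Rows whose id is one of the listed values.
select * from student where id in (1, 2);

-- Names starting with 'A' (% = any run of characters).
select * from student where name like 'A%';

-- Three-character names ending in 'my' (_ = exactly one character).
select * from student where name like '_my';
```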

-》RLIKE matches against a regular expression

select * from student where name  rlike '[m]';
OK
student.id      student.name
2       Amy
Time taken: 0.742 seconds, Fetched: 1 row(s)

-》After GROUP BY, filter conditions cannot go in WHERE; use HAVING
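A minimal sketch using the `student` table from this section; the count filter must sit in HAVING because it is evaluated after grouping:

```sql
-- Keep only ids that occur more than once.
select id, count(*) as cnt
from student
group by id
having count(*) > 1;
```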

-》JOIN ... ON operations

select * from student3;
OK
student3.id     student3.name
1       Alex
2       Amy
3       Mia
Time taken: 0.113 seconds, Fetched: 3 row(s)
hive (mongdb)> select * from student_5;
OK
student_5.id    student_5.name
2       Amy
Time taken: 0.099 seconds, Fetched: 1 row(s)
hive (mongdb)> select * from student;
OK
student.id      student.name
4       Tonny
1       Alex
2       Amy
3       Mia
NULL    NULL

select * from student a join  student3 b on a.id=b.id;

Result: a MapReduce job is launched

Total MapReduce CPU Time Spent: 9 seconds 560 msec
OK
a.id    a.name  b.id    b.name
1       Alex    1       Alex
2       Amy     2       Amy
3       Mia     3       Mia

Time taken: 150.25 seconds, Fetched: 3 row(s)

After adding more rows, student3 contains:

select * from student3;
OK
student3.id     student3.name
1       Alex
2       Amy
3       Mia
8       Apple
23      James
24      Kobe
Time taken: 0.125 seconds, Fetched: 6 row(s)

-》RIGHT JOIN keeps every row of the right table; where no match exists on the left, the left columns show NULL

select * from student a right join  student3 b on a.id=b.id;

Result:

Total MapReduce CPU Time Spent: 11 seconds 60 msec
OK
a.id    a.name  b.id    b.name
1       Alex    1       Alex
2       Amy     2       Amy
3       Mia     3       Mia
NULL    NULL    8       Apple
NULL    NULL    23      James
NULL    NULL    24      Kobe

Time taken: 84.236 seconds, Fetched: 6 row(s)

To join more than two tables, chain multiple JOIN clauses:

join ... on ... join ... on ...

Whether to use a left join or a right join depends mainly on which table drives the result (and the relative sizes of the tables).
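A sketch of chaining joins across the three tables shown above (`student`, `student3`, `student_5`); each additional table gets its own JOIN ... ON clause:

```sql
select a.id, a.name, b.name, c.name
from student a
join student3 b on a.id = b.id
join student_5 c on a.id = c.id;
```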

-》Cartesian product:

To guard against such unreasonable queries, strict mode should be enabled; otherwise a join with no join condition produces a Cartesian product.

 select id,name from student,student3;
FAILED: SemanticException Column id Found in more than One Tables/Subqueries
hive (mongdb)> set hive.mapred.mode = strict;

select star,name from student, movie_world;

 set hive.mapred.mode = strict;
hive (mongdb)> set hive.strict.checks.cartesian.product = true;

hive (mongdb)> select star,id from student, movie_world;
FAILED: SemanticException Cartesian products are disabled for safety reasons. If you know what you are doing, please sethive.strict.checks.cartesian.product to false and that hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.
hive (mongdb)> 

Enabling strict mode: hive.mapred.mode = strict // Deprecated

hive.strict.checks.large.query = true
This setting disallows: 1. ORDER BY without a LIMIT clause
         2. querying a partitioned table without specifying a partition
         3. (the check looks at the query pattern, not at the actual data volume)

hive.strict.checks.type.safety = true
Strict type safety; this property disallows: 1. comparisons between bigint and string
                  2. comparisons between bigint and double

hive.strict.checks.cartesian.product = true
This property disallows Cartesian-product joins

-》Limit the query to the first few rows:

limit <number of rows>

 select * from student limit 2;
OK
student.id      student.name
1       Alex
2       Amy
Time taken: 0.162 seconds, Fetched: 2 row(s)

Bucketed tables

-》Create the table

 create table asiainfo(deptno int, name string, address string)
             > clustered by(deptno) into 4 buckets
             > row format
             > delimited fields
             > terminated by ' ';
OK
Time taken: 1.892 seconds

-》Enable the bucketing property

If this value is not changed, the data is written as in an ordinary table and HDFS shows no bucket files:

set hive.enforce.bucketing = true;

-》Insert data

Bucketed tables are populated via a subquery (INSERT ... SELECT)
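A minimal sketch, assuming a hypothetical staging table `asiainfo_tmp` with the same columns as `asiainfo` (the staging-table name is illustrative):

```sql
-- Rows are hashed on deptno into the 4 buckets declared at create time.
insert into table asiainfo
select deptno, name, address from asiainfo_tmp;
```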

-》Delete table data

truncate table <table name>;

-》Sorting: ORDER BY

Ascending (asc) by default; desc sorts descending

Difference between partitioning and bucketing:

A partition corresponds to a storage directory (path) for the data

A bucket corresponds to a data file
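To contrast with the bucketed `asiainfo` table above, a hypothetical partitioned table (the name and partition column are illustrative): each distinct partition value becomes its own HDFS subdirectory, while each bucket becomes a file inside the table or partition directory.

```sql
-- Each month value maps to a directory such as .../asiainfo_part/month=201903/
create table asiainfo_part(deptno int, name string, address string)
partitioned by (month string)
row format delimited fields terminated by ' ';
```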

hive (mongdb)> select * from huawei order by deptno;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190331111320_c4b279eb-ba30-4aed-bc1e-537ba62b0f37
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1554001877312_0001, Tracking URL = http://Master:8088/proxy/application_1554001877312_0001/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job  -kill job_1554001877312_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-03-31 11:13:45,130 Stage-1 map = 0%,  reduce = 0%
2019-03-31 11:14:02,706 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.9 sec
2019-03-31 11:14:07,925 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.46 sec
MapReduce Total cumulative CPU time: 3 seconds 460 msec
Ended Job = job_1554001877312_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.46 sec   HDFS Read: 7900 HDFS Write: 368 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 460 msec
OK
huawei.deptno   huawei.name     huawei.address
1               Alex
2               Amy
3               Mia
4       kobe    los
5       James   los
6       meizi   wuhan
7       yun     taiyuan
8       mayun   alibaba
9       mahuateng       shenzhen
10      yuhongming      beijing
11      abao    shanghai
Time taken: 48.963 seconds, Fetched: 11 row(s)