Basic Hive Usage (Part 2)

Show the current database and column headers in the CLI by adding the following configuration (e.g. in hive-site.xml):

<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
</property>

<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
</property>

-》Built-in aggregate functions:

min, max, sum, avg
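A minimal sketch of these aggregates against the `student` table shown later in this section (columns `id`, `name`); aggregating over `id` is just illustrative:

```sql
-- Aggregate over the whole table; returns a single result row.
select min(id), max(id), sum(id), avg(id) from student;
```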

-》Range filter:

between xxx and xxx;

-》Fuzzy matching:

in, like (% matches any number of characters, _ matches exactly one character)
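A short sketch of these filters, assuming the `student` table used below (columns `id`, `name`):

```sql
-- Rows whose id falls in the closed interval [1, 3].
select * from student where id between 1 and 3;

-- Rows whose id is one of the listed values.
select * from student where id in (1, 2);

-- Names starting with 'A' (% = any run of characters).
select * from student where name like 'A%';

-- Three-character names ending in 'my' (_ = exactly one character).
select * from student where name like '_my';
```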

-》RLIKE matches against a regular expression

select * from student where name  rlike '[m]';
OK
student.id      student.name
2       Amy
Time taken: 0.742 seconds, Fetched: 1 row(s)

-》After GROUP BY, filter conditions cannot go in WHERE; use HAVING
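A minimal sketch using the `student` table from this section; the count filter must sit in HAVING because it is evaluated after grouping:

```sql
-- Keep only ids that occur more than once.
select id, count(*) as cnt
from student
group by id
having count(*) > 1;
```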

-》JOIN ... ON operations

select * from student3;
OK
student3.id     student3.name
1       Alex
2       Amy
3       Mia
Time taken: 0.113 seconds, Fetched: 3 row(s)
hive (mongdb)> select * from student_5;
OK
student_5.id    student_5.name
2       Amy
Time taken: 0.099 seconds, Fetched: 1 row(s)
hive (mongdb)> select * from student;
OK
student.id      student.name
4       Tonny
1       Alex
2       Amy
3       Mia
NULL    NULL

select * from student a join  student3 b on a.id=b.id;

Result: a MapReduce job is launched

Total MapReduce CPU Time Spent: 9 seconds 560 msec
OK
a.id    a.name  b.id    b.name
1       Alex    1       Alex
2       Amy     2       Amy
3       Mia     3       Mia

Time taken: 150.25 seconds, Fetched: 3 row(s)

After adding more rows, student3 contains:

select * from student3;
OK
student3.id     student3.name
1       Alex
2       Amy
3       Mia
8       Apple
23      James
24      Kobe
Time taken: 0.125 seconds, Fetched: 6 row(s)

-》RIGHT JOIN keeps every row of the right table; where no match exists on the left, the left columns show NULL

select * from student a right join  student3 b on a.id=b.id;

Result:

Total MapReduce CPU Time Spent: 11 seconds 60 msec
OK
a.id    a.name  b.id    b.name
1       Alex    1       Alex
2       Amy     2       Amy
3       Mia     3       Mia
NULL    NULL    8       Apple
NULL    NULL    23      James
NULL    NULL    24      Kobe

Time taken: 84.236 seconds, Fetched: 6 row(s)

To join more than two tables, chain multiple JOIN clauses:

join ... on ... join ... on ...

Whether to use a left join or a right join depends mainly on which table drives the result (and the relative sizes of the tables).
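A sketch of chaining joins across the three tables shown above (`student`, `student3`, `student_5`); each additional table gets its own JOIN ... ON clause:

```sql
select a.id, a.name, b.name, c.name
from student a
join student3 b on a.id = b.id
join student_5 c on a.id = c.id;
```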

-》Cartesian product:

To guard against such unreasonable queries, strict mode should be enabled; otherwise a join with no join condition produces a Cartesian product.

 select id,name from student,student3;
FAILED: SemanticException Column id Found in more than One Tables/Subqueries
hive (mongdb)> set hive.mapred.mode = strict;

select star,name from student, movie_world;

 set hive.mapred.mode = strict;
hive (mongdb)> set hive.strict.checks.cartesian.product = true;

hive (mongdb)> select star,id from student, movie_world;
FAILED: SemanticException Cartesian products are disabled for safety reasons. If you know what you are doing, please sethive.strict.checks.cartesian.product to false and that hive.mapred.mode is not set to 'strict' to proceed. Note that if you may get errors or incorrect results if you make a mistake while using some of the unsafe features.
hive (mongdb)> 

Enabling strict mode: hive.mapred.mode = strict // Deprecated

hive.strict.checks.large.query = true
This setting disallows: 1. ORDER BY without a LIMIT clause
         2. querying a partitioned table without specifying a partition
         3. (the check looks at the query pattern, not at the actual data volume)

hive.strict.checks.type.safety = true
Strict type safety; this property disallows: 1. comparisons between bigint and string
                  2. comparisons between bigint and double

hive.strict.checks.cartesian.product = true
This property disallows Cartesian-product joins

-》Limit the query to the first few rows:

limit <number of rows>

 select * from student limit 2;
OK
student.id      student.name
1       Alex
2       Amy
Time taken: 0.162 seconds, Fetched: 2 row(s)

Bucketed tables

-》Create the table

 create table asiainfo(deptno int, name string, address string)
             > clustered by(deptno) into 4 buckets
             > row format
             > delimited fields
             > terminated by ' ';
OK
Time taken: 1.892 seconds

-》Enable the bucketing property

If this value is not changed, the data is written as in an ordinary table and HDFS shows no bucket files:

set hive.enforce.bucketing = true;

-》Insert data

Bucketed tables are populated via a subquery (INSERT ... SELECT)
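A minimal sketch, assuming a hypothetical staging table `asiainfo_tmp` with the same columns as `asiainfo` (the staging-table name is illustrative):

```sql
-- Rows are hashed on deptno into the 4 buckets declared at create time.
insert into table asiainfo
select deptno, name, address from asiainfo_tmp;
```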

-》Delete table data

truncate table <table name>;

-》Sorting: ORDER BY

Ascending (asc) by default; desc sorts descending

Difference between partitioning and bucketing:

A partition corresponds to a storage directory (path) for the data

A bucket corresponds to a data file
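To contrast with the bucketed `asiainfo` table above, a hypothetical partitioned table (the name and partition column are illustrative): each distinct partition value becomes its own HDFS subdirectory, while each bucket becomes a file inside the table or partition directory.

```sql
-- Each month value maps to a directory such as .../asiainfo_part/month=201903/
create table asiainfo_part(deptno int, name string, address string)
partitioned by (month string)
row format delimited fields terminated by ' ';
```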

hive (mongdb)> select * from huawei order by deptno;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20190331111320_c4b279eb-ba30-4aed-bc1e-537ba62b0f37
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1554001877312_0001, Tracking URL = http://Master:8088/proxy/application_1554001877312_0001/
Kill Command = /opt/hadoop/hadoop-2.6.5/bin/hadoop job  -kill job_1554001877312_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-03-31 11:13:45,130 Stage-1 map = 0%,  reduce = 0%
2019-03-31 11:14:02,706 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.9 sec
2019-03-31 11:14:07,925 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.46 sec
MapReduce Total cumulative CPU time: 3 seconds 460 msec
Ended Job = job_1554001877312_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.46 sec   HDFS Read: 7900 HDFS Write: 368 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 460 msec
OK
huawei.deptno   huawei.name     huawei.address
1               Alex
2               Amy
3               Mia
4       kobe    los
5       James   los
6       meizi   wuhan
7       yun     taiyuan
8       mayun   alibaba
9       mahuateng       shenzhen
10      yuhongming      beijing
11      abao    shanghai
Time taken: 48.963 seconds, Fetched: 11 row(s)