Data Warehouse Hive: DML and Queries (Part 1)

III. DML Operations

1. Data Import

1.1 Load: Loading Data into a Table

Syntax: the statement is built from the following parts (assembled below):
[1] load data: load data
[2] local: load from the local filesystem; if omitted, load from HDFS
[3] inpath: the path of the data to load
[4] overwrite: overwrite the data already in the table or partition
[5] into table: load into the table
[6] short_Table: the specific table name
[7] partition: specify the target partition
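Assembled, the full statement shape (per the Hive language manual) is:

load data [local] inpath 'filepath' [overwrite] into table tablename [partition (partcol1=val1, ...)];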

hive> load data local inpath '/home/centos01/modules/apache-hive-1.2.2-bin/examples/files/x.txt' overwrite into table shortTable partition(year='2010',month='04');
Loading data to table default.shorttable partition (year=2010, month=04)
Partition default.shorttable{year=2010, month=04} stats: [numFiles=1, numRows=0, totalSize=13, rawDataSize=0]
OK
Time taken: 0.616 seconds

1.2 Inserting Data into a Table via Query Statements

First, create a partitioned table:

hive> create table stu1(tdate bigint,interest double,seven_interest double) partitioned by (year string) row format delimited fields terminated by ',';
OK
Time taken: 0.137 seconds

An ordinary insert of literal values:

hive> insert into table stu1 partition(year = '2010') values (2010,1.025,9.999);
Query ID = centos01_20190223155025_f28003cd-2808-47a9-82a0-a97a760ee1a0
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-02-23 15:50:28,030 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local120146978_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://linux01:8020/user/hive/warehouse/stu1/year=2010/.hive-staging_hive_2019-02-23_15-50-25_921_6506809753036083134-1/-ext-10000
Loading data to table default.stu1 partition (year=2010)
Partition default.stu1{year=2010} stats: [numFiles=1, numRows=1, totalSize=17, rawDataSize=16]
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 9946 HDFS Write: 49077 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.357 seconds

Now use the basic-mode insert statement (insert ... select):

hive> insert overwrite table stu1 partition(year='2010') select tdate,interest,seven_interest from stu1  where year='2010';
Query ID = centos01_20190223160342_25f36789-e5b1-4b25-9fc2-9d970d51b106
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-02-23 16:03:44,018 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local487572732_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://linux01:8020/user/hive/warehouse/stu1/year=2010/.hive-staging_hive_2019-02-23_16-03-42_190_3152532333837736856-1/-ext-10000
Loading data to table default.stu1 partition (year=2010)
Partition default.stu1{year=2010} stats: [numFiles=1, numRows=1, totalSize=17, rawDataSize=16]
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 10058 HDFS Write: 49172 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.198 seconds

Insert once more, this time selecting from the year='2011' partition. That partition is empty at this point, so the overwrite leaves the target partition with nothing (note numRows=0 and totalSize=0 in the stats below); afterwards, describe the table:

hive> insert overwrite table stu1 partition(year='2010') select tdate,interest,seven_interest from stu1  where year='2011';
Query ID = centos01_20190223162020_17a24fcf-1db8-4945-9373-560448678eba
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-02-23 16:20:21,712 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local1592452816_0010
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://linux01:8020/user/hive/warehouse/stu1/year=2010/.hive-staging_hive_2019-02-23_16-20-20_378_2894671489011544195-1/-ext-10000
Loading data to table default.stu1 partition (year=2010)
Partition default.stu1{year=2010} stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 59332 HDFS Write: 88626 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.034 seconds
hive> desc stu1;
OK
tdate                   bigint                                      
interest                double                                      
seven_interest          double                                      
year                    string                                      
                 
# Partition Information          
# col_name              data_type               comment             
                 
year                    string                                      
Time taken: 0.03 seconds, Fetched: 9 row(s)
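Hive can also populate several partitions from a single source scan with a multi-insert statement; a minimal sketch, assuming a hypothetical stu2 table with the same columns and partitioning as stu1:

hive> from stu1
    > insert overwrite table stu2 partition(year='2010') select tdate,interest,seven_interest where year='2010'
    > insert overwrite table stu2 partition(year='2011') select tdate,interest,seven_interest where year='2011';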

2. Data Export

2.1 Insert Export

2.1.1 Exporting Query Results to the Local Filesystem
hive> insert overwrite local directory '/home/centos01/modules/apache-hive-1.2.2-bin/iotmp/outprint' select * from stu1;
Query ID = centos01_20190223165232_50f55171-8490-4b3c-92f9-fa327e038867
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-02-23 16:52:33,561 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local643035792_0012
Copying data to local directory /home/centos01/modules/apache-hive-1.2.2-bin/iotmp/outprint
Copying data to local directory /home/centos01/modules/apache-hive-1.2.2-bin/iotmp/outprint
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 157852 HDFS Write: 196798 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.308 seconds

If we export this way, the file is written with Hive's default field delimiter (the non-printing \001 character), so opening it as a CSV shows garbled text. That is what section 2.1.2 is for.

2.1.2 Exporting Formatted Query Results to the Local Filesystem
hive> insert overwrite local directory '/home/centos01/modules/apache-hive-1.2.2-bin/iotmp/outprint' row format delimited fields terminated by ',' select * from stu1;
Query ID = centos01_20190223165555_eab0d523-271f-41ce-ba3d-66cce2b40bf6
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-02-23 16:55:57,185 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local1800033420_0013
Copying data to local directory /home/centos01/modules/apache-hive-1.2.2-bin/iotmp/outprint
Copying data to local directory /home/centos01/modules/apache-hive-1.2.2-bin/iotmp/outprint
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 177398 HDFS Write: 196798 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 1.312 seconds
2.1.3 Exporting Formatted Query Results to HDFS
hive> insert overwrite directory 'user/hive/warehouse/outprint' row format delimited fields terminated by ','select * from stu1;
Query ID = centos01_20190223170255_f1b5f343-cf17-47c3-a757-92e164404f4a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-02-23 17:02:57,085 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_local229905877_0014
Stage-3 is filtered out by condition resolver.
Stage-2 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Job running in-process (local Hadoop)
2019-02-23 17:02:58,321 Stage-2 map = 100%,  reduce = 0%
Ended Job = job_local1230355508_0015
Moving data to: user/hive/warehouse/outprint
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 196944 HDFS Write: 219734 SUCCESS
Stage-Stage-2:  HDFS Read: 109940 HDFS Write: 121335 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 2.79 seconds

Then go to HDFS and follow the path to find the file. Success.
(Note: the directory in the statement above has no leading '/', so HDFS resolves it relative to the user's home directory rather than as an absolute path.)

2.2 Exporting to the Local Filesystem with a Hadoop Command

hive> dfs -get /user/hive/warehouse/outprint/000000_0 /home/centos01/modules/apache-hive-1.2.2-bin/iotmp/test.csv;

2.3 Exporting with a Hive Shell Command

[centos01@linux01 apache-hive-1.2.2-bin]$ bin/hive -e 'select * from default.stu1;' > /home/centos01/modules/apache-hive-1.2.2-bin/iotmp/outprint/test2.csv
Logging initialized using configuration in file:/home/centos01/modules/apache-hive-1.2.2-bin/conf/hive-log4j.properties
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/centos01/modules/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/centos01/modules/hbase-1.3.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
OK
Time taken: 1.944 seconds, Fetched: 429 row(s)

Export succeeded.
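Besides -e, the Hive CLI's -f flag runs a script file, which is handier for longer queries; a sketch (query.sql is a hypothetical file holding the select statement):

[centos01@linux01 apache-hive-1.2.2-bin]$ bin/hive -f query.sql > /home/centos01/modules/apache-hive-1.2.2-bin/iotmp/outprint/test3.csv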

2.4 Export to HDFS

hive> export table default.stu1 to '/user/hive/warehouse/outprint/testprint.csv';
Copying data from file:/tmp/centos01/0dceefbc-1723-4afe-8154-449feedfe263/hive_2019-02-23_21-44-30_221_8295514754594839775-1/-local-10000/_metadata
Copying file: file:/tmp/centos01/0dceefbc-1723-4afe-8154-449feedfe263/hive_2019-02-23_21-44-30_221_8295514754594839775-1/-local-10000/_metadata
Copying data from hdfs://linux01:8020/user/hive/warehouse/stu1/year=2010
Copying file: hdfs://linux01:8020/user/hive/warehouse/stu1/year=2010/mfd_day_share_interest.csv
Copying data from hdfs://linux01:8020/user/hive/warehouse/stu1/year=2011
Copying file: hdfs://linux01:8020/user/hive/warehouse/stu1/year=2011/000000_0
OK
Time taken: 1.534 seconds

Export succeeded. (Despite the .csv suffix, the target is a directory: as the log shows, export writes a _metadata file plus the table's partition data files.)
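The counterpart import statement can load an exported table back in; a sketch (stu1_copy is a hypothetical new table name):

hive> import table stu1_copy from '/user/hive/warehouse/outprint/testprint.csv';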

3. Deleting Data from a Table

hive> truncate table stu;
OK
Time taken: 0.106 seconds

Note: truncate can only clear data from managed (internal) tables; it cannot be used on external tables.
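Truncate can also target a single partition instead of the whole table; a minimal sketch against stu1:

hive> truncate table stu1 partition(year='2010');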

IV. Queries

1. Basic Queries

1.1 Column Aliases

Renaming a column is convenient for calculations. This actually came to mind back when studying HBase: in a column-oriented store (which Hive is not, strictly speaking), overly long column names waste space and cause needless trouble, so a quick as alias is handy. An example:

hive> select tdate t,interest as i from stu1;

Writing as when aliasing is optional; it is a matter of personal habit.

1.2 Arithmetic Operators

Operator    Description
A+B         A plus B
A-B         A minus B
A*B         A multiplied by B
A/B         A divided by B
A%B         remainder of A divided by B
A&B         bitwise AND of A and B
A^B         bitwise XOR of A and B
~A          bitwise NOT of A
A|B         bitwise OR of A and B
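These can be used directly in the select list. For example, since tdate holds yyyyMMdd values as bigint, % can peel off the day of month; a sketch:

hive> select tdate, tdate % 100 day_num, interest + seven_interest total from stu1 limit 3;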

1.3 Common Functions

1.3.1 count
hive> select count(*) from stu1;
Query ID = centos01_20190223222341_07b9de4b-07c5-446f-9710-f72e4c3c8233
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-02-23 22:23:43,461 Stage-1 map = 100%,  reduce = 0%
2019-02-23 22:23:44,475 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1064054946_0001
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 87957 HDFS Write: 40383 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
429
Time taken: 3.308 seconds, Fetched: 1 row(s)
1.3.2 max
hive> select max(tdate) sumNum from stu1;
Query ID = centos01_20190224085856_03d4ed01-3491-44ae-bd7a-878db173ca80
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-02-24 08:58:57,897 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1109629230_0002
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 58638 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
20140831
Time taken: 1.459 seconds, Fetched: 1 row(s)
1.3.3 min
hive> select min(tdate) sumNum from stu1;
Query ID = centos01_20190224085918_e8284ec5-ad16-4313-b302-b1cb065453e8
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-02-24 08:59:19,748 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1173673069_0003
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 87957 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
20130701
Time taken: 1.48 seconds, Fetched: 1 row(s)
1.3.4 sum
hive> select sum(tdate) sumNum from stu1;
Query ID = centos01_20190224085754_367acdef-2f0e-4707-9f5c-7ee23321d623
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-02-24 08:58:04,335 Stage-1 map = 0%,  reduce = 0%
2019-02-24 08:58:06,446 Stage-1 map = 100%,  reduce = 0%
2019-02-24 08:58:07,473 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local489812945_0001
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 29319 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
8618362131
Time taken: 12.605 seconds, Fetched: 1 row(s)
1.3.5 avg
hive> select avg(tdate) sumNum from stu1;
Query ID = centos01_20190224085943_ef097957-7f9a-4c0c-b911-d79e8a182032
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-02-24 08:59:45,237 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1920915701_0004
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 117276 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
2.013636011915888E7
Time taken: 1.478 seconds, Fetched: 1 row(s)

1.4 The Limit Clause

When a query would return many rows, limit restricts the number of rows returned:

hive> select * from stu1 limit 5;
OK
NULL    NULL    NULL    2010
20130701        1.5787  6.307   2010
20130701        1.5787  6.307   2010
20130702        1.5461  6.174   2010
20130703        1.467   6.034   2010
Time taken: 0.095 seconds, Fetched: 5 row(s)

2. The Where Clause

Operator (supported data types): description

A=B (primitive types): TRUE if A equals B, otherwise FALSE.
A<=>B (primitive types): same result as = for non-NULL operands; TRUE if A and B are both NULL; FALSE if exactly one of them is NULL.
A<>B, A!=B (primitive types): NULL if A or B is NULL; TRUE if A does not equal B, otherwise FALSE.
A<B (primitive types): NULL if A or B is NULL; TRUE if A is less than B, otherwise FALSE.
A<=B (primitive types): NULL if A or B is NULL; TRUE if A is less than or equal to B, otherwise FALSE.
A>B (primitive types): NULL if A or B is NULL; TRUE if A is greater than B, otherwise FALSE.
A>=B (primitive types): NULL if A or B is NULL; TRUE if A is greater than or equal to B, otherwise FALSE.
A [NOT] BETWEEN B AND C (primitive types): NULL if any of A, B, or C is NULL; TRUE if A is greater than or equal to B and less than or equal to C, otherwise FALSE. NOT reverses the result.
A IS NULL (all types): TRUE if A is NULL, otherwise FALSE.
A IS NOT NULL (all types): TRUE if A is not NULL, otherwise FALSE.
IN(value1, value2, ...) (all types): TRUE if the value matches any value in the list.
A [NOT] LIKE B (STRING): B is a simple SQL pattern; TRUE if A matches it, otherwise FALSE. 'x%' means A must start with 'x', '%x' means A must end with 'x', and '%x%' means A contains 'x' at the start, the end, or anywhere in between. NOT reverses the result.
A RLIKE B, A REGEXP B (STRING): B is a Java regular expression; TRUE if A matches it, otherwise FALSE. Matching is implemented with the JDK regex interface, and the pattern matches if it matches any substring of A; it does not need to match the whole string.
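Note the practical consequence of the NULL rules: a plain = or != never matches a NULL tdate, while the NULL-safe <=> does; a sketch:

hive> select * from stu1 where tdate <=> null;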

2.1 Comparison Operators (Between…and…, Null, In)

2.1.1 Between…and…

Queries data between two values; used after where.

hive> select * from stu1 where tdate between 20140821 and 20140831;
OK
20140821        1.1168  4.158   2010
20140822        1.104   4.153   2010
20140823        1.0997  4.146   2010
20140824        1.0993  4.139   2010
20140825        1.1122  4.139   2010
20140826        1.1085  4.131   2010
20140827        1.1079  4.123   2010
20140828        1.1043  4.116   2010
20140829        1.1169  4.123   2010
20140830        1.1206  4.134   2010
20140831        1.1204  4.146   2010
2.1.2 is Null

Query the null-valued rows:

hive> select * from stu1 where tdate is NUll;
OK
NULL    NULL    NULL    2010
Time taken: 0.081 seconds, Fetched: 1 row(s)
2.1.3 In(n1,n2)

Queries rows whose value is n1 or n2:

hive> select * from stu1 where tdate in (20140401,20140831);
OK
20140401        1.4368  5.356   2010
20140831        1.1204  4.146   2010
Time taken: 0.086 seconds, Fetched: 2 row(s)

2.2 Like and RLike

The like operator selects values resembling a pattern.
The pattern can contain characters or digits (% matches any number of characters, _ matches exactly one character).
The RLike clause is Hive's extension of this feature: it matches conditions with the more powerful language of Java regular expressions.
Examples:
Query the interest rows whose value starts with 1:

hive> select * from stu1 where interest like '1%';
OK
20130701        1.5787  6.307   2010
20130701        1.5787  6.307   2010
20130702        1.5461  6.174   2010
20130703        1.467   6.034   2010

Query the interest rows containing a 2:

hive> select * from stu1 where interest like '%2%';

Query the interest rows whose second character is 3:

hive> select * from stu1 where interest like '_3%';
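No RLike example appears above; the "starts with 1" query can be rewritten with a Java regex anchor, a sketch (relying on the implicit double-to-string cast of interest):

hive> select * from stu1 where interest rlike '^1';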

2.3 Logical Operators (and: logical AND; or: logical OR; not: logical NOT)
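These combine predicates in a where clause; a quick sketch against stu1:

hive> select * from stu1 where interest > 1.1 and interest < 1.5 and tdate is not null limit 3;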

3. Grouping

※※※3.1 Group By

Group by is usually used together with aggregate functions: it groups the result set by one or more columns, then runs the aggregation within each group.

hive> select avg(interest),tdate from stu1 group by tdate;
Query ID = centos01_20190224094703_82c686d7-36d1-4fd7-a041-4d2fb2e53f74
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-02-24 09:47:04,787 Stage-1 map = 100%,  reduce = 0%
2019-02-24 09:47:05,793 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1477191845_0005
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 364116 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK

3.2 Having

where vs. having:
where: takes effect while the query runs (before grouping); aggregate (group) functions cannot follow it.
having: takes effect on the grouped query results; aggregate functions can follow it; only used in group by statements.
hive> select avg(interest),tdate from stu1 group by tdate having avg(interest)> 1.5;
Query ID = centos01_20190224095607_6fe9db90-2c79-470c-abf6-d8eee70e0b19
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-02-24 09:56:09,270 Stage-1 map = 100%,  reduce = 0%
2019-02-24 09:56:10,277 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1381061535_0006
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 393435 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK