Hive简单SQL语句描述

hive是基于Hadoop的一个数据仓库工具,可以将结构化的数据文件映射为一张数据库表,并提供简单的sql查询功能,可以将sql语句转换为MapReduce任务进行运行。 其优点是学习成本低,可以通过类SQL语句快速实现简单的MapReduce统计,不必开发专门的MapReduce应用,十分适合数据仓库的统计分析。(该段话来自于百度百科)

下面进行一些简单的Hive命令,供Hive小白看。


1.准备测试数据

新建文件ht_p.txt,然后输入一组数据:

vi /ht_p.txt
数据之间以Tab符号隔开。

01      lgh     24
02      zmy     25
03      zyx     23
04      lgq     19
05      zly     28


2.建表

首先 输入hive命令进入hive模式,然后通过以下命令建表:
hive> create table ht_person(id int, name string, age int) row format delimited fields terminated by '\t';
OK
Time taken: 3.77 seconds
建表完成后,输入show tables查询表:
hive> show tables;
OK
ht_person
Time taken: 0.04 seconds, Fetched: 1 row(s)

3.数据导入

将/ht_p.txt文本数据导入到hive库中,SQL代码如下:
hive> load data local inpath '/ht_p.txt' overwrite into table ht_person;
Copying data from file:/ht_p.txt
Copying file: file:/ht_p.txt
Loading data to table default.ht_person
Table default.ht_person stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 50, raw_data_size: 0]
OK
Time taken: 1.679 seconds

4.查询数据

查询表格中的所有数据:
hive> select * from ht_person;
OK
1       lgh     24
2       zmy     25
3       zyx     23
4       lgq     19
5       zly     28
Time taken: 0.431 seconds, Fetched: 5 row(s)
查询表结构:
hive> desc ht_person;
OK
id                      int                     None
name                    string                  None
age                     int                     None
Time taken: 0.11 seconds, Fetched: 3 row(s)


5.修改表

5.1 增加一个列

hive> alter table ht_person add columns (place string);
OK
Time taken: 0.834 seconds
插入完成后,查询表结构:
hive> desc ht_person;
OK
id                      int                     None
name                    string                  None
age                     int                     None
place                   string                  None
Time taken: 0.08 seconds, Fetched: 4 row(s)

5.2 修改表名

hive> alter table ht_person rename to ht_persons;
OK
Time taken: 0.763 seconds
修改完成后,查看表如下:
hive> show tables;
OK
ht_persons
Time taken: 0.035 seconds, Fetched: 1 row(s)

5.3 删除表(最好别删除,后面例子用)

hive> drop table ht_persons;
OK
Time taken: 0.763 seconds
删除完成后,查看表如下:
hive> show tables;
OK
Time taken: 0.763 seconds

6.用已有表创建新表,并复制原有表

如果在创建过程中出现各种错误可参见: http://blog.csdn.net/ghuil/article/details/45170993
hive> create table ht_p2 as select * from ht_persons;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1429240731064_0011, Tracking URL = http://compute-24-13.local:8088/proxy/application_1429240731064_0011/
Kill Command = /home/hbase/install/hadoop-2.3.0-cdh5.1.0/bin/hadoop job  -kill job_1429240731064_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-04-21 16:00:12,105 Stage-1 map = 0%,  reduce = 0%
2015-04-21 16:00:17,307 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.46 sec
2015-04-21 16:00:18,344 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.46 sec
2015-04-21 16:00:19,377 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.46 sec
MapReduce Total cumulative CPU time: 1 seconds 460 msec
Ended Job = job_1429240731064_0011
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://compute-24-13.local:9000/tmp/hive-root/hive_2015-04-21_16-00-01_866_8125068571975125560-1/-ext-10001
Moving data to: hdfs://compute-24-13.local:9000/user/hive/warehouse/ht_p2
Table default.ht_p2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 60, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.46 sec   HDFS Read: 275 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 460 msec
OK
Time taken: 20.57 seconds
hive> show tables;
OK
ht_p2
ht_persons
Time taken: 0.088 seconds, Fetched: 2 row(s)
创建完成后,通过select查询如下:
hive> select * from ht_p2;
OK
1       lgh     24      NULL
2       zmy     25      NULL
3       zyx     23      NULL
4       lgq     19      NULL
5       zly     28      NULL
Time taken: 0.314 seconds, Fetched: 5 row(s)
可见创建成功。

7. 创建新表时仅复制表结构,不要数据

hive> create table ht_p3 like ht_persons;
OK
Time taken: 1.839 seconds
创建完成后,通过select查询如下:
hive> select * from ht_p3;
OK
Time taken: 0.279 seconds
可见表中并无数据。

8.复合查询

8.1 查询年龄23岁以上的人:
hive> from (select name, age as big from ht_persons) t
    > select t.name, t.big
    > where t.big > 23
    > limit 3;
结果如下:
Total MapReduce CPU Time Spent: 1 seconds 770 msec
OK
lgh     24
zmy     25
zly     28
Time taken: 22.579 seconds, Fetched: 3 row(s)

8.2 JOIN链接查询
hive> select t1.name, t1.age
    > from ht_persons t1 join ht_p2 t2 on t1.age=t2.age
    > where t1.age>23;
结果如下:
Total MapReduce CPU Time Spent: 4 seconds 930 msec
OK
lgh     24
zmy     25
zly     28
Time taken: 24.229 seconds, Fetched: 3 row(s)

8.3 聚合查询
hive> select count(*), avg(t.age)
    > from ht_persons t;
结果如下:
Total MapReduce CPU Time Spent: 3 seconds 220 msec
OK
5       23.8
Time taken: 24.699 seconds, Fetched: 1 row(s)



全文完,转载请注明出处: http://blog.csdn.net/ghuil/article/details/45170245


  • 1
    点赞
  • 2
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值