hive是基于Hadoop的一个数据仓库工具,可以将结构化的数据文件映射为一张数据库表,并提供简单的sql查询功能,可以将sql语句转换为MapReduce任务进行运行。 其优点是学习成本低,可以通过类SQL语句快速实现简单的MapReduce统计,不必开发专门的MapReduce应用,十分适合数据仓库的统计分析。(该段话来自于百度百科)
下面进行一些简单的Hive命令,供Hive小白看。
1.准备测试数据
新建文件ht_p.txt,然后输入一组数据:
vi /ht_p.txt
数据之间以Tab符号隔开。
01 lgh 24
02 zmy 25
03 zyx 23
04 lgq 19
05 zly 28
2.建表
首先
输入hive命令进入hive模式,然后通过以下命令建表:
hive> create table ht_person(id int, name string, age int) row format delimited fields terminated by '\t';
OK
Time taken: 3.77 seconds
建表完成后,输入show tables查询表:
hive> show tables;
OK
ht_person
Time taken: 0.04 seconds, Fetched: 1 row(s)
3.数据导入
将/ht_p.txt文本数据导入到hive库中,SQL代码如下:
hive> load data local inpath '/ht_p.txt' overwrite into table ht_person;
Copying data from file:/ht_p.txt
Copying file: file:/ht_p.txt
Loading data to table default.ht_person
Table default.ht_person stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 50, raw_data_size: 0]
OK
Time taken: 1.679 seconds
4.查询数据
查询表格中的所有数据:
hive> select * from ht_person;
OK
1 lgh 24
2 zmy 25
3 zyx 23
4 lgq 19
5 zly 28
Time taken: 0.431 seconds, Fetched: 5 row(s)
查询表结构:
hive> desc ht_person;
OK
id int None
name string None
age int None
Time taken: 0.11 seconds, Fetched: 3 row(s)
5.修改表
5.1 增加一个列
hive> alter table ht_person add columns (place string);
OK
Time taken: 0.834 seconds
插入完成后,查询表结构:
hive> desc ht_person;
OK
id int None
name string None
age int None
place string None
Time taken: 0.08 seconds, Fetched: 4 row(s)
5.2 修改表名
hive> alter table ht_person rename to ht_persons;
OK
Time taken: 0.763 seconds
修改完成后,查看表如下:
hive> show tables;
OK
ht_persons
Time taken: 0.035 seconds, Fetched: 1 row(s)
5.3 删除表(最好别删除,后面例子用)
hive> drop table ht_persons;
OK
Time taken: 0.763 seconds
删除完成后,查看表如下:
hive> show tables;
OK
Time taken: 0.763 seconds
6.用已有表创建新表,并复制原有表
如果在创建过程中出现各种错误可参见:
http://blog.csdn.net/ghuil/article/details/45170993
hive> create table ht_p2 as select * from ht_persons;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1429240731064_0011, Tracking URL = http://compute-24-13.local:8088/proxy/application_1429240731064_0011/
Kill Command = /home/hbase/install/hadoop-2.3.0-cdh5.1.0/bin/hadoop job -kill job_1429240731064_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-04-21 16:00:12,105 Stage-1 map = 0%, reduce = 0%
2015-04-21 16:00:17,307 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.46 sec
2015-04-21 16:00:18,344 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.46 sec
2015-04-21 16:00:19,377 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.46 sec
MapReduce Total cumulative CPU time: 1 seconds 460 msec
Ended Job = job_1429240731064_0011
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://compute-24-13.local:9000/tmp/hive-root/hive_2015-04-21_16-00-01_866_8125068571975125560-1/-ext-10001
Moving data to: hdfs://compute-24-13.local:9000/user/hive/warehouse/ht_p2
Table default.ht_p2 stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 60, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.46 sec HDFS Read: 275 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 460 msec
OK
Time taken: 20.57 seconds
hive> show tables;
OK
ht_p2
ht_persons
Time taken: 0.088 seconds, Fetched: 2 row(s)
创建完成后,通过select查询如下:
hive> select * from ht_p2;
OK
1 lgh 24 NULL
2 zmy 25 NULL
3 zyx 23 NULL
4 lgq 19 NULL
5 zly 28 NULL
Time taken: 0.314 seconds, Fetched: 5 row(s)
可见创建成功。
7. 创建新表时仅复制表结构,不要数据
hive> create table ht_p3 like ht_persons;
OK
Time taken: 1.839 seconds
创建完成后,通过select查询如下:
hive> select * from ht_p3;
OK
Time taken: 0.279 seconds
可见表中并无数据。
8.复合查询
8.1 查询年龄23岁以上的人:
hive> from (select name, age as big from ht_persons) t
> select t.name, t.big
> where t.big > 23
> limit 3;
结果如下:
Total MapReduce CPU Time Spent: 1 seconds 770 msec
OK
lgh 24
zmy 25
zly 28
Time taken: 22.579 seconds, Fetched: 3 row(s)
8.2 JOIN链接查询
hive> select t1.name, t1.age
> from ht_persons t1 join ht_p2 t2 on t1.age=t2.age
> where t1.age>23;
结果如下:
Total MapReduce CPU Time Spent: 4 seconds 930 msec
OK
lgh 24
zmy 25
zly 28
Time taken: 24.229 seconds, Fetched: 3 row(s)
8.3 聚合查询
hive> select count(*), avg(t.age)
> from ht_persons t;
结果如下:
Total MapReduce CPU Time Spent: 3 seconds 220 msec
OK
5 23.8
Time taken: 24.699 seconds, Fetched: 1 row(s)
全文完,转载请注明出处:
http://blog.csdn.net/ghuil/article/details/45170245