Extract hive-0.10.0-bin.tar.gz into the hadoop directory, then edit hive/bin/hive-config.sh with the following configuration:
export HIVE_HOME=/home/ssy/hadoop/hive
export HADOOP_HOME=/home/ssy/hadoop
export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-amd64
Run bin/hive to check that Hive works:
hive> show tables;
hive> create table log(day int, bytes int, tag string, user string);
hive> describe log;
hive> drop table log;
Create a sample data file test.log:
20121221 04567 user s00001
20121221 75531 user s00003
20121222 52369 user s00002
20121222 01297 user s00001
20121223 61223 user s00002
20121223 33121 user s00003
Import the data into Hive:
hive> create table log(day int, bytes int, tag string, user string) row format delimited fields terminated by ' ';
hive> load data local inpath '../test.log' into table log;
// To overwrite any previously loaded data in the table:
// load data local inpath '../test.log' overwrite into table log;
// The field delimiter must be declared as ' ', otherwise every column loads as NULL:
//hive> select * from log;
//OK
//NULL NULL NULL NULL
//NULL NULL NULL NULL
//NULL NULL NULL NULL
//NULL NULL NULL NULL
//NULL NULL NULL NULL
//NULL NULL NULL NULL
hive> select * from log;
OK
20121221 4567 user s00001
20121221 75531 user s00003
20121222 52369 user s00002
20121222 1297 user s00001
20121223 61223 user s00002
20121223 33121 user s00003
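The parsing behavior above can be sanity-checked locally. Splitting each line on the declared delimiter ' ' yields the four columns of the CREATE TABLE statement, and the int columns drop their leading zeros (04567 becomes 4567), which matches the output Hive shows. A minimal Python sketch, illustrative only and not how Hive actually executes:

```python
# Mimic Hive's "fields terminated by ' '" parsing of test.log.
# Column names mirror the CREATE TABLE statement (day, bytes, tag, user).
sample = """20121221 04567 user s00001
20121221 75531 user s00003
20121222 52369 user s00002
20121222 01297 user s00001
20121223 61223 user s00002
20121223 33121 user s00003"""

rows = []
for line in sample.splitlines():
    day, nbytes, tag, user = line.split(' ')  # the declared delimiter
    rows.append((int(day), int(nbytes), tag, user))  # int cast drops leading zeros

print(rows[0])  # (20121221, 4567, 'user', 's00001')
```

If the delimiter is not declared, Hive uses its default field separator (Ctrl-A, '\001'), which never matches these space-separated lines, so every typed column comes back NULL — exactly the failure shown in the comments above.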
The data is now stored in the Hadoop cluster without any manual fs file operations. Hive keeps its metadata in the local metastore_db directory, so the data can be queried much like a regular database.
Find the daily maximum:
hive> select day, max(bytes) from log group by day;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201305221738_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305221738_0002
Kill Command = /root/hadoop/libexec/../bin/hadoop job -kill job_201305221738_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-22 18:45:04,552 Stage-1 map = 0%, reduce = 0%
2013-05-22 18:45:07,586 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:08,596 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:09,608 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:10,620 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:11,628 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:12,640 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:13,648 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:14,665 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.96 sec
2013-05-22 18:45:15,673 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 0.96 sec
2013-05-22 18:45:16,686 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.3 sec
2013-05-22 18:45:17,698 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.3 sec
2013-05-22 18:45:18,713 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.3 sec
MapReduce Total cumulative CPU time: 3 seconds 300 msec
Ended Job = job_201305221738_0002
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.3 sec HDFS Read: 371 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 300 msec
OK
20121221 75531
20121222 52369
20121223 61223
Time taken: 21.278 seconds
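For a data set this small the result is easy to verify by hand. This Python sketch reproduces the `select day, max(bytes) ... group by day` aggregation over the sample rows — a local sanity check only, not how Hive's MapReduce job computes it:

```python
# Reproduce "select day, max(bytes) from log group by day" over the sample data.
data = [
    (20121221, 4567), (20121221, 75531),
    (20121222, 52369), (20121222, 1297),
    (20121223, 61223), (20121223, 33121),
]

max_by_day = {}
for day, nbytes in data:
    # Keep the largest bytes value seen for each day.
    max_by_day[day] = max(nbytes, max_by_day.get(day, 0))

for day in sorted(max_by_day):
    print(day, max_by_day[day])
# 20121221 75531
# 20121222 52369
# 20121223 61223
```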
Compute the daily total:
hive> select day, sum(bytes) from log group by day;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201305221738_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201305221738_0003
Kill Command = /root/hadoop/libexec/../bin/hadoop job -kill job_201305221738_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2013-05-22 18:46:03,892 Stage-1 map = 0%, reduce = 0%
2013-05-22 18:46:06,911 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:07,919 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:08,928 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:09,935 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:10,943 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:11,952 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:12,960 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:13,967 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.93 sec
2013-05-22 18:46:14,974 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 0.93 sec
2013-05-22 18:46:15,983 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.38 sec
2013-05-22 18:46:16,990 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.38 sec
2013-05-22 18:46:18,004 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.38 sec
MapReduce Total cumulative CPU time: 3 seconds 380 msec
Ended Job = job_201305221738_0003
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 Cumulative CPU: 3.38 sec HDFS Read: 371 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 380 msec
OK
20121221 80098
20121222 53666
20121223 94344
Time taken: 20.743 seconds
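The sums can be checked locally the same way. This sketch mirrors `select day, sum(bytes) ... group by day` over the sample rows (again just a Python sanity check, not Hive's execution path):

```python
# Reproduce "select day, sum(bytes) from log group by day" over the sample data.
data = [
    (20121221, 4567), (20121221, 75531),
    (20121222, 52369), (20121222, 1297),
    (20121223, 61223), (20121223, 33121),
]

sum_by_day = {}
for day, nbytes in data:
    # Accumulate the bytes total per day.
    sum_by_day[day] = sum_by_day.get(day, 0) + nbytes

for day in sorted(sum_by_day):
    print(day, sum_by_day[day])
# 20121221 80098
# 20121222 53666
# 20121223 94344
```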
For simple operations like these there is no need to write MapReduce jobs by hand; Hive runs the aggregation directly, which is very convenient. To save a result to a local file, run:
bin/hive -e "select day, sum(bytes) from log group by day" >> res.csv
Also, this version of Hive seems to allow only one terminal session at a time; if a second terminal enters hive, it reports the error below. This is a limitation of the default embedded Derby metastore, which accepts only a single connection; configuring an external metastore database (such as MySQL) lifts the restriction.
FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
See also: Hadoop: The Definitive Guide, Chapter 12 (Hive).