First, a conclusion up front: whether Hive runs on a pseudo-distributed Hadoop setup or on a fully distributed Hadoop cluster, its installation and configuration are exactly the same. We have already covered this in detail in a dedicated article; see "Installing and Testing Hive". This post simply verifies, on a Hadoop cluster, that Hive still works. Let's go straight to the test.
For how to create the database and table, and how to load data from the local Linux filesystem into a Hive table, see the earlier article "A Hive example: analyzing query volume per month"; here I will just run the SQL directly.
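For readers who skipped that article, the setup looks roughly like this. Note that the column names, delimiter, and local file path below are illustrative assumptions, not taken from this post; only the table name t_searchword appears in the query later.

```sql
-- Hypothetical sketch: a simple search-word table; adjust columns/path to your data.
create database if not exists testdb;
use testdb;

create table if not exists t_searchword (
  dt   string,   -- date of the query, e.g. '2018-01-19' (assumed column)
  word string    -- the search keyword (assumed column)
)
row format delimited
fields terminated by ',';

-- Load a local file into the table (path is an example).
load data local inpath '/root/searchword.txt' into table t_searchword;
```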
hive> select count(*) from t_searchword;
Query ID = root_20180119032121_086e9c67-3578-4ff0-8959-24b0ea5a7845
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1516341173755_0002, Tracking URL = http://node113:8088/proxy/application_1516341173755_0002/
Kill Command = /usr/local/hadoop/hadoop-2.6.5/bin/hadoop job -kill job_1516341173755_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-01-19 03:21:36,191 Stage-1 map = 0%, reduce = 0%
2018-01-19 03:21:46,068 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.58 sec
2018-01-19 03:21:56,996 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.36 sec
MapReduce Total cumulative CPU time: 3 seconds 360 msec
Ended Job = job_1516341173755_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1  Cumulative CPU: 3.36 sec  HDFS Read: 7563  HDFS Write: 3  SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 360 msec
OK
34
Time taken: 37.009 seconds, Fetched: 1 row(s)
hive>
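The hints printed in the output above show how to tune the reducer count before running a query. As a sketch (the numbers here are illustrative, not recommendations):

```sql
-- Example only: these session-level settings control reducer parallelism.
set hive.exec.reducers.bytes.per.reducer=256000000;  -- ~256 MB of input per reducer
set hive.exec.reducers.max=10;                       -- upper bound on reducers
set mapreduce.job.reduces=2;                         -- pin an exact reducer count
```

For a global count(*) like the one above, Hive compiles the job down to a single reducer anyway, so these settings matter mostly for larger group-by or join queries.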
The query ran successfully. Hive was originally configured against a pseudo-distributed Hadoop setup; after switching from pseudo-distributed to a fully distributed Hadoop cluster, Hive turned out to be unaffected. The only catch is that the databases created earlier can no longer be used, because the HDFS files of the old pseudo-distributed cluster were wiped.
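If you want to check which databases actually survived a migration like this, you can list them from the Hive shell and inspect where each one lives in HDFS (testdb below is a placeholder name):

```sql
show databases;
-- Shows the hdfs:// warehouse location backing the database.
describe database extended testdb;
```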