Before using Hive, start Hadoop.
Change into the Hadoop installation path: /opt/hadoop/hadoop-2.7.3/sbin
Run the following commands:
./start-dfs.sh
./start-yarn.sh
./hadoop-daemon.sh start datanode
./mr-jobhistory-daemon.sh start historyserver
After the commands finish, verify with jps:
[root@tiancunPC sbin]# jps
4992 org.eclipse.equinox.launcher_1.3.100.v20150511-1540.jar
16066 JobHistoryServer
15237 DataNode
12837 ResourceManager
16102 Jps
12427 NameNode
12620 SecondaryNameNode
15503 RunJar
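As a quick sanity check, the jps output above can be fed to a small shell function that reports any daemon this setup needs but that is not running (a sketch; the daemon list matches this particular walkthrough):

```shell
# Sketch: check that the daemons this setup needs all appear in jps output.
check_daemons() {
  local missing=""
  for d in NameNode DataNode SecondaryNameNode ResourceManager JobHistoryServer; do
    grep -q "$d" <<<"$1" || missing="$missing $d"
  done
  echo "missing:${missing:- none}"
}

# Feed it the jps output captured above:
jps_out="16066 JobHistoryServer
15237 DataNode
12837 ResourceManager
12427 NameNode
12620 SecondaryNameNode"
check_daemons "$jps_out"    # prints: missing: none
```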
Start Hive: change into the bin directory of the Hive installation and run ./hive
This drops you into the Hive CLI:
hive>
1. List the databases:
hive> show databases;
OK
default
Time taken: 0.016 seconds, Fetched: 1 row(s)
hive>
2. Switch to a database:
hive> use default;
OK
Time taken: 0.041 seconds
hive>
3. List the tables:
hive> show tables;
OK
Time taken: 0.256 seconds
hive>
4. Create a table:
hive> create table student(id int,name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 5.467 seconds
hive>
Here ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' tells Hive how to parse each row: the fields within a row are separated by a tab character.
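The effect of the tab delimiter can be illustrated with awk, which splits a line the same way the table definition above asks Hive to (a rough sketch, not Hive's actual parsing code):

```shell
# Sketch: splitting one tab-delimited row into (id, name), roughly what
# Hive does with FIELDS TERMINATED BY '\t'.
line=$'1001\tzhangsan'
awk -F'\t' '{ printf "id=%s name=%s\n", $1, $2 }' <<<"$line"
# prints: id=1001 name=zhangsan
```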
5. Load a few rows into the table
Create a directory /opt/datas to hold the student table's data, and create a file student.txt inside it.
Edit student.txt:
1001 zhangsan
1002 lisi
1003 wangwu
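A common pitfall is separating the columns with spaces instead of real tab characters, which makes every column load as NULL. One way to build the file with guaranteed tabs (a sketch using a scratch directory; substitute the actual /opt/datas path):

```shell
# Sketch: write student.txt with real tab separators.
# datadir is a scratch directory here; the walkthrough above uses /opt/datas.
datadir=$(mktemp -d)
printf '%s\t%s\n' 1001 zhangsan 1002 lisi 1003 wangwu > "$datadir/student.txt"
cat -A "$datadir/student.txt"   # ^I marks each tab, $ marks end of line
```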
Now load the data from student.txt into the student table:
hive> load data local inpath '/opt/hadoop/datas/student.txt' into table student;
Running this command may fail with an error like:
Failed with exception File /tmp/hive-root/hive_2016-11-12_15-03-17_165_7992996511195552405-1/-ext-10000/student.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
The cause is that the DataNode is not running; rerun ./hadoop-daemon.sh start datanode to bring it back up.
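That recovery step can be wrapped in a small check so the restart command is only issued when jps actually shows the DataNode missing (a sketch; SBIN is an assumed variable matching the install path used in this walkthrough):

```shell
# Sketch: restart the DataNode only if jps output shows it missing.
# SBIN is an assumption matching the sbin path used earlier.
SBIN=/opt/hadoop/hadoop-2.7.3/sbin
ensure_datanode() {
  if grep -q 'DataNode' <<<"$1"; then
    echo "DataNode running"
  else
    echo "DataNode down -> run: $SBIN/hadoop-daemon.sh start datanode"
  fi
}
ensure_datanode "15237 DataNode"    # prints: DataNode running
```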
6. Queries
select * from student; works fine, but select id from student; fails:
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1478941270145_0012, Tracking URL = http://tiancunPC:8088/proxy/application_1478941270145_0012/
Kill Command = /opt/hadoop/hadoop-2.7.3/bin/hadoop job -kill job_1478941270145_0012
Interrupting... Be patient, this might take some time.
Press Ctrl+C again to kill JVM
killing job with: job_1478941270145_0012
Exiting the JVM
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2016-11-12 22:59:56,214 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1478941270145_0012 with errors
Error during job, obtaining debugging information...
killing job with: job_1478941270145_0012
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 FAIL
Fix (adapted from https://www.iteblog.com/archives/831):
Explanation:
When you select a specific column from a table, Hive by default launches a MapReduce job to answer the query, as in the following session:
hive> SELECT id, money FROM m limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 235105473) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0229, Tracking URL = http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job -kill job_1384246387966_0229
hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-11-13 11:35:16,167 Stage-1 map = 0%, reduce = 0%
2013-11-13 11:35:21,327 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
2013-11-13 11:35:22,377 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
MapReduce Total cumulative CPU time: 1 seconds 260 msec
Ended Job = job_1384246387966_0229
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.26 sec   HDFS Read: 8388865 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 260 msec
OK
1	122
1	185
1	231
1	292
1	316
1	329
1	355
1	356
1	362
1	364
Time taken: 16.802 seconds, Fetched: 10 row(s)
Launching a MapReduce job carries real overhead. Starting with Hive 0.10.0, simple queries of the form SELECT <col> FROM <table> LIMIT n that need no aggregation can skip MapReduce entirely and fetch the data directly through a fetch task. This can be enabled in any of the following ways:
Method 1:
hive> set hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m limit 10;
OK
1	122
1	185
1	231
1	292
1	316
1	329
1	355
1	356
1	362
1	364
Time taken: 0.138 seconds, Fetched: 10 row(s)
The set hive.fetch.task.conversion=more; above enables the fetch task, so simple column queries like this one no longer launch a MapReduce job.
Method 2:
bin/hive --hiveconf hive.fetch.task.conversion=more
Method 3:
Both methods above enable the fetch task only for the current session. To enable it permanently, add the following to ${HIVE_HOME}/conf/hive-site.xml:
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task
    minimizing latency. Currently the query should be single
    sourced not having any subquery and should not have
    any aggregations or distincts (which incurs RS),
    lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
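A quick way to confirm the property actually made it into hive-site.xml is a grep over the file (sketched against a scratch copy here; in practice point SITE at ${HIVE_HOME}/conf/hive-site.xml):

```shell
# Sketch: verify hive.fetch.task.conversion is set to "more" in the XML.
# SITE is a scratch copy here; use ${HIVE_HOME}/conf/hive-site.xml for real.
SITE=$(mktemp)
cat > "$SITE" <<'EOF'
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
</property>
EOF
grep -A1 '<name>hive.fetch.task.conversion</name>' "$SITE" \
  | grep -o '<value>[^<]*</value>'
# prints: <value>more</value>
```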