Basic Hive Operations

Start Hadoop before using Hive.

Go to the Hadoop installation path: /opt/hadoop/hadoop-2.7.3/sbin

Run the following commands:

./start-dfs.sh

./start-yarn.sh

./hadoop-daemon.sh start datanode

./mr-jobhistory-daemon.sh start historyserver

After the commands finish, check the running daemons with jps:

[root@tiancunPC sbin]# jps
4992 org.eclipse.equinox.launcher_1.3.100.v20150511-1540.jar
16066 JobHistoryServer
15237 DataNode
12837 ResourceManager
16102 Jps
12427 NameNode
12620 SecondaryNameNode
15503 RunJar
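
As an extra sanity check (a minimal sketch, not part of the original steps; hdfs dfsadmin -report is a standard HDFS command, and this assumes the hadoop bin directory is on your PATH), you can ask HDFS directly how many DataNodes are alive:

# The report should list at least one live DataNode;
# 0 live DataNodes will make the load step below fail.
hdfs dfsadmin -report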


To start Hive, go to the bin directory of the Hive installation

and run: ./hive

This takes you to the Hive command line:

hive>

1. List the databases:

hive> show databases;
OK
default
Time taken: 0.016 seconds, Fetched: 1 row(s)
hive>
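
Only the built-in default database exists at this point. As an illustrative sketch (testdb is a hypothetical name, not part of this walkthrough), creating and removing your own database looks like this:

hive> create database if not exists testdb;
hive> show databases;
hive> drop database testdb;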


2. Switch to a database:

hive> use default;
OK
Time taken: 0.041 seconds
hive>
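
To confirm which database the current session is pointed at, Hive 0.13 and later provide the current_database() function (an optional check, not from the original post):

hive> select current_database();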


3. List the tables:

hive> show tables;
OK
Time taken: 0.256 seconds
hive>


4. Create a table:

hive> create table student(id int,name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 5.467 seconds
hive>

The ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' clause tells Hive that the fields within each row are separated by a tab character.
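
To verify how the table was defined, including the field delimiter, desc formatted prints the table's metadata (an optional check using standard HiveQL):

hive> desc formatted student;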


5. Put a few rows into the table

Create a directory /opt/hadoop/datas to hold the data for the student table, and create a file student.txt in it.

Edit student.txt (the two columns must be separated by a tab, to match the table's field delimiter):

1001    zhangsan

1002    lisi

1003    wangwu

Now load the data from student.txt into the student table:

hive> load data local inpath '/opt/hadoop/datas/student.txt' into table student;

Executing this command may report an error like:

Failed with exception File /tmp/hive-root/hive_2016-11-12_15-03-17_165_7992996511195552405-1/-ext-10000/student.txt could only be replicated to 0 nodes instead of minReplication (=1).  There are 0 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)


This means the DataNode is not running; re-run ./hadoop-daemon.sh start datanode to bring it up.
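
Once the DataNode is running, the load succeeds. Two common variants of LOAD DATA are worth knowing (a sketch; the HDFS path below is hypothetical): without LOCAL the file is taken from HDFS rather than the local filesystem, and OVERWRITE replaces the table's existing rows instead of appending.

-- read the file from HDFS instead of the local filesystem (hypothetical path)
hive> load data inpath '/user/root/datas/student.txt' into table student;
-- replace existing rows instead of appending to them
hive> load data local inpath '/opt/hadoop/datas/student.txt' overwrite into table student;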


6. Query statements

select * from student; works fine, but select id from student; fails:

Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1478941270145_0012, Tracking URL = http://tiancunPC:8088/proxy/application_1478941270145_0012/
Kill Command = /opt/hadoop/hadoop-2.7.3/bin/hadoop job  -kill job_1478941270145_0012
Interrupting... Be patient, this might take some time.
Press Ctrl+C again to kill JVM
killing job with: job_1478941270145_0012
Exiting the JVM
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2016-11-12 22:59:56,214 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1478941270145_0012 with errors
Error during job, obtaining debugging information...
killing job with: job_1478941270145_0012
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0:  HDFS Read: 0 HDFS Write: 0 FAIL


Solution (reposted from https://www.iteblog.com/archives/831):

Explanation:

If you query specific columns of a table, Hive by default launches a MapReduce job to do the work, as shown below:

hive> SELECT id, money FROM m limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 235105473) is larger than
hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0229, Tracking URL =
http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job
-kill job_1384246387966_0229
Hadoop job information for Stage-1: number of mappers: 1;
number of reducers: 0
2013-11-13 11:35:16,167 Stage-1 map = 0%,  reduce = 0%
2013-11-13 11:35:21,327 Stage-1 map = 100%,  reduce = 0%,
  Cumulative CPU 1.26 sec
2013-11-13 11:35:22,377 Stage-1 map = 100%,  reduce = 0%,
  Cumulative CPU 1.26 sec
MapReduce Total cumulative CPU time: 1 seconds 260 msec
Ended Job = job_1384246387966_0229
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.26 sec
HDFS Read: 8388865 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 260 msec
OK
1       122
1       185
1       231
1       292
1       316
1       329
1       355
1       356
1       362
1       364
Time taken: 16.802 seconds, Fetched: 10 row(s)

As we all know, launching a MapReduce job carries overhead. Starting with Hive 0.10.0, a simple query of the form SELECT <col> FROM <table> LIMIT n that needs no aggregation can skip the MapReduce job and fetch the data directly through a Fetch task. There are several ways to enable this:
Method 1:

hive> set hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m limit 10;
OK
1       122
1       185
1       231
1       292
1       316
1       329
1       355
1       356
1       362
1       364
Time taken: 0.138 seconds, Fetched: 10 row(s)

The set hive.fetch.task.conversion=more; statement above enables the Fetch task, so this simple column query no longer launches a MapReduce job.
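
To confirm the setting took effect, entering set with just the property name prints its current value (standard Hive CLI behavior):

hive> set hive.fetch.task.conversion;
hive.fetch.task.conversion=more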
Method 2:

bin/hive --hiveconf hive.fetch.task.conversion=more
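
This applies the property for that CLI session from startup. Combined with the CLI's -e option, it can run a one-off query under the setting (a sketch reusing the student table from step 4):

bin/hive --hiveconf hive.fetch.task.conversion=more -e "select id from student;"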

Method 3:
Both methods above enable the Fetch task, but only temporarily. To keep it enabled permanently, add the following to ${HIVE_HOME}/conf/hive-site.xml:

<property>
   <name>hive.fetch.task.conversion</name>
   <value>more</value>
   <description>
     Some select queries can be converted to single FETCH task
     minimizing latency. Currently the query should be single
     sourced not having any subquery and should not have
     any aggregations or distincts (which incurs RS),
     lateral views and joins.
     1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
     2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
   </description>
</property>
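
With any of the three methods in place, the column query that failed in step 6 should now return through a Fetch task instead of a MapReduce job (illustrative output, assuming the three rows loaded in step 5):

hive> select id from student;
OK
1001
1002
1003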
