Before using Hive, start Hadoop.
Change into the Hadoop installation path: /opt/hadoop/hadoop-2.7.3/sbin
Run the following commands:
./start-dfs.sh
./start-yarn.sh
./hadoop-daemon.sh start datanode
./mr-jobhistory-daemon.sh start historyserver
After the commands finish, verify with jps:
[root@tiancunPC sbin]# jps
4992 org.eclipse.equinox.launcher_1.3.100.v20150511-1540.jar
16066 JobHistoryServer
15237 DataNode
12837 ResourceManager
16102 Jps
12427 NameNode
12620 SecondaryNameNode
15503 RunJar
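As a quick sanity check, the jps output above can be fed to a small shell function that reports any daemon this setup needs but that is not running (a sketch; the daemon list matches this particular walkthrough):

```shell
# Sketch: check that the daemons this setup needs all appear in jps output.
check_daemons() {
  local missing=""
  for d in NameNode DataNode SecondaryNameNode ResourceManager JobHistoryServer; do
    grep -q "$d" <<<"$1" || missing="$missing $d"
  done
  echo "missing:${missing:- none}"
}

# Feed it the jps output captured above:
jps_out="16066 JobHistoryServer
15237 DataNode
12837 ResourceManager
12427 NameNode
12620 SecondaryNameNode"
check_daemons "$jps_out"    # prints: missing: none
```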
Start Hive: change into the bin directory of the Hive installation and run ./hive
This drops you into the Hive CLI:
hive>
1. List the databases:
hive> show databases;
OK
default
Time taken: 0.016 seconds, Fetched: 1 row(s)
hive>
2. Switch to a database:
hive> use default;
OK
Time taken: 0.041 seconds
hive>
3. List the tables:
hive> show tables;
OK
Time taken: 0.256 seconds
hive>
4. Create a table:
hive> create table student(id int,name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 5.467 seconds
hive>
Here ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' tells Hive how to parse each row: the fields within a row are separated by a tab character.
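The effect of the tab delimiter can be illustrated with awk, which splits a line the same way the table definition above asks Hive to (a rough sketch, not Hive's actual parsing code):

```shell
# Sketch: splitting one tab-delimited row into (id, name), roughly what
# Hive does with FIELDS TERMINATED BY '\t'.
line=$'1001\tzhangsan'
awk -F'\t' '{ printf "id=%s name=%s\n", $1, $2 }' <<<"$line"
# prints: id=1001 name=zhangsan
```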
5. Load a few rows into the table
Create a directory /opt/datas to hold the student table's data, and create a file student.txt inside it.
Edit student.txt:
1001 zhangsan
1002 lisi
1003 wangwu
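A common pitfall is separating the columns with spaces instead of real tab characters, which makes every column load as NULL. One way to build the file with guaranteed tabs (a sketch using a scratch directory; substitute the actual /opt/datas path):

```shell
# Sketch: write student.txt with real tab separators.
# datadir is a scratch directory here; the walkthrough above uses /opt/datas.
datadir=$(mktemp -d)
printf '%s\t%s\n' 1001 zhangsan 1002 lisi 1003 wangwu > "$datadir/student.txt"
cat -A "$datadir/student.txt"   # ^I marks each tab, $ marks end of line
```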
Now load the data from student.txt into the student table:
hive> load data local inpath '/opt/hadoop/datas/student.txt' into table student;
Running this command may fail with an error like:
Failed with exception File /tmp/hive-root/hive_2016-11-12_15-03-17_165_7992996511195552405-1/-ext-10000/student.txt could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1571)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3107)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3031)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:725)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
The cause is that the DataNode is not running; rerun ./hadoop-daemon.sh start datanode to bring it back up.
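That recovery step can be wrapped in a small check so the restart command is only issued when jps actually shows the DataNode missing (a sketch; SBIN is an assumed variable matching the install path used in this walkthrough):

```shell
# Sketch: restart the DataNode only if jps output shows it missing.
# SBIN is an assumption matching the sbin path used earlier.
SBIN=/opt/hadoop/hadoop-2.7.3/sbin
ensure_datanode() {
  if grep -q 'DataNode' <<<"$1"; then
    echo "DataNode running"
  else
    echo "DataNode down -> run: $SBIN/hadoop-daemon.sh start datanode"
  fi
}
ensure_datanode "15237 DataNode"    # prints: DataNode running
```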
6. Queries
select * from student; works fine, but select id from student; fails:
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1478941270145_0012, Tracking URL = http://tiancunPC:8088/proxy/application_1478941270145_0012/
Kill Command = /opt/hadoop/hadoop-2.7.3/bin/hadoop job -kill job_1478941270145_0012
Interrupting... Be patient, this might take some time.
Press Ctrl+C again to kill JVM
killing job with: job_1478941270145_0012
Exiting the JVM
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2016-11-12 22:59:56,214 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1478941270145_0012 with errors
Error during job, obtaining debugging information...
killing job with: job_1478941270145_0012
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: HDFS Read: 0 HDFS Write: 0 FAIL
Fix (adapted from https://www.iteblog.com/archives/831):
Explanation:
When you select a specific column from a table, Hive by default launches a MapReduce job to answer the query, as in the following session:
hive> SELECT id, money FROM m limit 10;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Cannot run job locally: Input Size (= 235105473) is larger than hive.exec.mode.local.auto.inputbytes.max (= 134217728)
Starting Job = job_1384246387966_0229, Tracking URL = http://l-datalogm1.data.cn1:9981/proxy/application_1384246387966_0229/
Kill Command = /home/q/hadoop-2.2.0/bin/hadoop job -kill job_1384246387966_0229
hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-11-13 11:35:16,167 Stage-1 map = 0%, reduce = 0%
2013-11-13 11:35:21,327 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
2013-11-13 11:35:22,377 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.26 sec
MapReduce Total cumulative CPU time: 1 seconds 260 msec
Ended Job = job_1384246387966_0229
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.26 sec   HDFS Read: 8388865 HDFS Write: 60 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 260 msec
OK
1	122
1	185
1	231
1	292
1	316
1	329
1	355
1	356
1	362
1	364
Time taken: 16.802 seconds, Fetched: 10 row(s)
Launching a MapReduce job carries real overhead. Starting with Hive 0.10.0, simple queries of the form SELECT <col> FROM <table> LIMIT n that need no aggregation can skip MapReduce entirely and fetch the data directly through a fetch task. This can be enabled in any of the following ways:
Method 1:
hive> set hive.fetch.task.conversion=more;
hive> SELECT id, money FROM m limit 10;
OK
1	122
1	185
1	231
1	292
1	316
1	329
1	355
1	356
1	362
1	364
Time taken: 0.138 seconds, Fetched: 10 row(s)
The set hive.fetch.task.conversion=more; above enables the fetch task, so simple column queries like this one no longer launch a MapReduce job.
Method 2:
bin/hive --hiveconf hive.fetch.task.conversion=more
Method 3:
Both methods above enable the fetch task only for the current session. To enable it permanently, add the following to ${HIVE_HOME}/conf/hive-site.xml:
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
  <description>
    Some select queries can be converted to single FETCH task
    minimizing latency. Currently the query should be single
    sourced not having any subquery and should not have
    any aggregations or distincts (which incurs RS),
    lateral views and joins.
    1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
    2. more    : SELECT, FILTER, LIMIT only (+TABLESAMPLE, virtual columns)
  </description>
</property>
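A quick way to confirm the property actually made it into hive-site.xml is a grep over the file (sketched against a scratch copy here; in practice point SITE at ${HIVE_HOME}/conf/hive-site.xml):

```shell
# Sketch: verify hive.fetch.task.conversion is set to "more" in the XML.
# SITE is a scratch copy here; use ${HIVE_HOME}/conf/hive-site.xml for real.
SITE=$(mktemp)
cat > "$SITE" <<'EOF'
<property>
  <name>hive.fetch.task.conversion</name>
  <value>more</value>
</property>
EOF
grep -A1 '<name>hive.fetch.task.conversion</name>' "$SITE" \
  | grep -o '<value>[^<]*</value>'
# prints: <value>more</value>
```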