Hive表中四种不同数据导出方式以及如何自定义导出列分隔符

最新推荐文章于 2022-10-25 16:34:31 发布

小朋友,你是否有很多问号?

最新推荐文章于 2022-10-25 16:34:31 发布

阅读量3.2k

点赞数

本文详细介绍了Hive中导出数据的四种方法，并重点解释了如何使用hive-e和hive-f参数执行查询语句，同时提供了自定义导出文件列分隔符的方法。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

文章转载于http://blog.csdn.net/niityzu/article/details/42238483

问题导读：

1、Hive表数据四种导出方式是？

2、导出命令中LOCAL的作用及有无的区别？

3、导出命令中是否可以向导入命令一样使用INTO？

4、如何自定义导出文件的列分隔符？

5、hive的-e和-f参数的作用及如何使用其来导出数据？

6、hive shell环境中source命令的作用？

我们以test1表为例演示几种数据导出方式，test1表中的数据内容如下：

[html]view plaincopy 
   
 hive> select * from test1;  
 OK  
 Tom     24.0    NanJing Nanjing University  
 Jack    29.0    NanJing Southeast China University  
 Mary Kake       21.0    SuZhou  Suzhou University  
 John Doe        24.0    YangZhou        YangZhou University  
 Bill King       23.0    XuZhou  Xuzhou Normal University  
 Time taken: 0.064 seconds, Fetched: 5 row(s)  

一、导出数据至本地文件系统

(1) 导出数据

[html]view plaincopy 
   
 hive> insert overwrite local directory "/home/hadoopUser/data"  
     > select name,age,address  
     > from test1;  

/home/hadoopUser/data是一个目录，一个文件或者多个文件将会被写到该目录下，具体个数取决于调用的reducer的个数。此处的local表示导出到本地文件系统，如果不加local，则表示导出到分布式文件系统。

(2) 查看导出的数据

[html]view plaincopy 
   
 Tom^A24.0^ANanJing  
 Jack^A29.0^ANanJing  
 Mary Kake^A21.0^ASuZhou  
 John Doe^A24.0^AYangZhou  
 Bill King^A23.0^AXuZhou  

在本地文件系统目录/home/hadoopUser/data中生成了一个000000_0文件，里面的内容如上所示。每个列以^A(八进制\001)分隔。

二、导出数据至分布式文件系统

(1) 导出数据

[html]view plaincopy 
   
 hive> insert overwrite  directory "/output"  
     > select name,age,address  
     > from test1;  

此处与第一种方法的不同之处就在于没有local关键字，目录/output为分布式文件系统的一个目录，也可以写全：hdfs://master-server/output

(2) 查看导出的数据

结果和第一种方法一样

注意：如果我们将overwrite改成into，就像数据导入表一样，结果会怎么样。

[html]view plaincopy 
   
 hive> insert into local directory "/home/hadoopUser/data"  
     > select name,age,address  
     > from test1;  
 NoViableAltException(85@[184:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );])  
         at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)  
         at org.antlr.runtime.DFA.predict(DFA.java:116)  
         at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:4945)  
         at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:40208)  
         at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.tableOrPartition(HiveParser_IdentifiersParser.java:10233)  
         at org.apache.hadoop.hive.ql.parse.HiveParser.tableOrPartition(HiveParser.java:40210)  
         at org.apache.hadoop.hive.ql.parse.HiveParser.insertClause(HiveParser.java:39685)  
         at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:37647)  
         at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:36898)  
         at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:36774)  
         at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1338)  
         at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)  
         at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)  
         at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)  
         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)  
         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)  
         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)  
         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)  
         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)  
         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)  
         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)  
         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)  
         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)  
         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)  
         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)  
         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)  
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
         at java.lang.reflect.Method.invoke(Method.java:606)  
         at org.apache.hadoop.util.RunJar.main(RunJar.java:212)  
 FAILED: ParseException line 1:12 missing TABLE at 'local' near 'into' in table name  

和Hive数据导入不一样，不能使用INSERT INTO命令导出表中数据。

三、将表中数据导出至另外一个表中

此种方法与我写的另外一篇博客内容相似，此处不再累赘。数据导入表中的一种方法，即如何将查询结果插入表中，附上链接：通过查询语句向表中插入数据

四、hive -e和-f参数的使用及数据导出

(1) 使用hive -e数据导出

用户可能有时候期望执行一个或者多个查询（使用分号分隔），执行结束后hive CLI立即退出。hive使用-e 命令在终端执行一条或者多条语句，语句使用引号引起来，还可以将查询的结果重定向到一个文件中。

注意：不需要在Hive shell中执行，直接在终端执行即可。

需要先使用"use hive"进入对应的数据库，否则会报错。如下

[html]view plaincopy 
   
 [hadoopUser@secondmgt ~]$ hive -e "use hive;select * from test1" > /home/hadoopUser/data/myquery  
 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces  
 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize  
 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative  
 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node  
 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive  
 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack  
 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize  
 14/12/29 18:53:18 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed  
   
 Logging initialized using configuration in file:/home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties  
 OK  
 Time taken: 0.597 seconds  
 OK  
 Time taken: 0.66 seconds, Fetched: 5 row(s)  

查看导出文件结果

[html]view plaincopy 
   
 Tom     24.0    NanJing Nanjing University  
 Jack    29.0    NanJing Southeast China University  
 Mary Kake       21.0    SuZhou  Suzhou University  
 John Doe        24.0    YangZhou        YangZhou University  
 Bill King       23.0    XuZhou  Xuzhou Normal University  

注意：默认是以\t作为列分隔符的。

(2) 使用hive -f 执行查询文件

有时候我们将多条查询语句写入到一个文件中，使用hive -f 可以执行该文件中的一条或者多条语句。按照惯例，一般把这些Hive查询文件保存为具有.q或者.hql后缀名的文件。

编辑query.hql，内容如下：

[html]view plaincopy 
   
 use hive;  
 select name,age from test1;  

执行命令及过程如下：

[html]view plaincopy 
   
 [hadoopUser@secondmgt ~]$ hive -f data/query.hql  
 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces  
 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize  
 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative  
 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node  
 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive  
 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack  
 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize  
 14/12/29 19:07:38 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed  
   
 Logging initialized using configuration in file:/home/hadoopUser/cloud/hive/apache-hive-0.13.1-bin/conf/hive-log4j.properties  
 OK  
 Time taken: 0.564 seconds  
 Total jobs = 1  
 Launching Job 1 out of 1  
 Number of reduce tasks is set to 0 since there's no reduce operator  
 Starting Job = job_1419317102229_0038, Tracking URL = http://secondmgt:8088/proxy/application_1419317102229_0038/  
 Kill Command = /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job  -kill job_1419317102229_0038  
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0  
 2014-12-29 19:07:55,168 Stage-1 map = 0%,  reduce = 0%  
 2014-12-29 19:08:06,671 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.93 sec  
 MapReduce Total cumulative CPU time: 2 seconds 930 msec  
 Ended Job = job_1419317102229_0038  
 MapReduce Jobs Launched:  
 Job 0: Map: 1   Cumulative CPU: 2.93 sec   HDFS Read: 415 HDFS Write: 63 SUCCESS  
 Total MapReduce CPU Time Spent: 2 seconds 930 msec  
 OK  
 Tom     24.0  
 Jack    29.0  
 Mary Kake       21.0  
 John Doe        24.0  
 Bill King       23.0  
 Time taken: 27.17 seconds, Fetched: 5 row(s)  

(3) 在Hive shell中使用source命令执行一个脚本文件

[html]view plaincopy 
   
 hive> source /home/hadoopUser/data/query.hql;  
 OK  
 Time taken: 0.394 seconds  
 Total jobs = 1  
 Launching Job 1 out of 1  
 Number of reduce tasks is set to 0 since there's no reduce operator  
 Starting Job = job_1419317102229_0039, Tracking URL = http://secondmgt:8088/proxy/application_1419317102229_0039/  
 Kill Command = /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0/bin/hadoop job  -kill job_1419317102229_0039  
 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0  
 2014-12-29 19:13:03,718 Stage-1 map = 0%,  reduce = 0%  
 2014-12-29 19:13:15,459 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.7 sec  
 MapReduce Total cumulative CPU time: 2 seconds 700 msec  
 Ended Job = job_1419317102229_0039  
 MapReduce Jobs Launched:  
 Job 0: Map: 1   Cumulative CPU: 2.7 sec   HDFS Read: 415 HDFS Write: 63 SUCCESS  
 Total MapReduce CPU Time Spent: 2 seconds 700 msec  
 OK  
 Tom     24.0  
 Jack    29.0  
 Mary Kake       21.0  
 John Doe        24.0  
 Bill King       23.0  
 Time taken: 27.424 seconds, Fetched: 5 row(s)  

附：

自定义导出文件列分隔符

在默认情况下，从Hive表中导出的文件是以^A作为列分隔符的，但是有时候此方法分隔显示的不直观，为此Hive允许指定列的分隔符。我们还是以上面第一种方法导出表中数据为例。

(1) 导出表中数据

[html]view plaincopy 
   
 hive> insert overwrite local directory "/home/hadoopUser/data"  
     > row format delimited  
     > fields terminated by '\t'  
     > select * from test1;  

此处是以\t作为列的分隔符的。

(2) 查看导出结果

[html]view plaincopy 
   
 Tom     24.0    NanJing Nanjing University  
 Jack    29.0    NanJing Southeast China University  
 Mary Kake       21.0    SuZhou  Suzhou University  
 John Doe        24.0    YangZhou        YangZhou University  
 Bill King       23.0    XuZhou  Xuzhou Normal University