Hive commands fall into two groups: commands that run inside the Hive interactive shell, and commands that Hive provides for execution from the operating system shell.
1. Commands available inside the Hive shell
Command | Description
---|---
quit, exit | Use quit or exit to leave the interactive shell.
reset | Resets configuration properties to their default values, i.e. properties set with the set command or with -hiveconf on the hive command line are reverted. For historical reasons, properties set via set hiveconf:key=value; are not reset.
set <key>=<value> | Sets the value of a configuration property. Note that a misspelled property name is accepted silently, with no warning.
set | Prints all configuration properties overridden by the user or by Hive.
set -v | Prints all Hadoop and Hive configuration properties.
add FILE[S] <filepath> <filepath>* <br> add JAR[S] <filepath> <filepath>* <br> add ARCHIVE[S] <filepath> <filepath>* | Adds one or more files, jars, or archives to the distributed cache. add FILES can add every file under a directory at once when given just the directory name; add JARS and add ARCHIVES behave the same way.
add FILE[S] <ivyurl> <ivyurl>* <br> add JAR[S] <ivyurl> <ivyurl>* <br> add ARCHIVE[S] <ivyurl> <ivyurl>* | Same as the previous row, but resources are identified by Ivy URLs of the form ivy://group:module:version?query_string. Requires Internet access from the server.
list FILE[S] <br> list JAR[S] <br> list ARCHIVE[S] | Lists the files, jars, and archives already added; useful for checking whether a resource is in the distributed cache.
delete FILE[S] <filepath>* <br> delete JAR[S] <filepath>* <br> delete ARCHIVE[S] <filepath>* | Removes resources from the distributed cache.
delete FILE[S] <ivyurl> <ivyurl>* <br> delete JAR[S] <ivyurl> <ivyurl>* <br> delete ARCHIVE[S] <ivyurl> <ivyurl>* | Removes resources that were added as <ivyurl>s.
! <command> | Executes a shell command from the Hive shell.
dfs <dfs command> | Executes a dfs command from the Hive shell.
<query string> | Executes a Hive query and prints the result to standard output.
source FILE <filepath> | Executes a script file.
compile <groovy string> AS GROOVY NAMED <name> | Compiles an inline Groovy source string into a class that can be registered as a temporary UDF (see the example below).
Example:

```sql
compile `import org.apache.hadoop.hive.ql.exec.UDF \;
public class Madd extends UDF {
  public double evaluate(double a, double b){
    return a + b \;
  }
} ` AS GROOVY NAMED Madd.groovy;
CREATE TEMPORARY FUNCTION Madd AS 'Madd';
SELECT Madd(3, 4);
DROP TEMPORARY FUNCTION Madd;
```
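A few of the other commands above, shown in a short interactive session (the jar path is hypothetical):

```
hive> set hive.cli.print.header=true;
hive> add JAR /tmp/my-udfs.jar;
hive> list JARS;
hive> !pwd;
hive> dfs -ls /tmp;
```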
2. Commands Hive provides in the operating system shell
2.1 --help
Running hive --help prints the following:
```
[houzhizhen@localhost ~]$ hive --help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cleardanglingscratchdir cli fixacidkeyindex help hiveburninclient hiveserver2 hplsql jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool strictmanagedmigration tokentool version
Parameters parsed:
  --auxpath : Auxiliary jars
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
For help on a particular service:
  ./hive --service serviceName --help
Debug help: ./hive --debug --help
```
The --help flag can follow any service name to show that service's help. For example:

```
hive --service cli --help
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/houzhizhen/software/hive/apache-hive-3.1.3-bin/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/houzhizhen/software/hadoop/hadoop-3.2.3/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = d644b5ac-8244-4177-a01e-6f8c69ade60f
usage: hive
 -d,--define <key=value>          Variable substitution to apply to Hive
                                  commands. e.g. -d A=B or --define A=B
    --database <databasename>     Specify the database to use
 -e <quoted-query-string>         SQL from command line
 -f <filename>                    SQL from files
 -H,--help                        Print help information
    --hiveconf <property=value>   Use value for given property
    --hivevar <key=value>         Variable substitution to apply to Hive
                                  commands. e.g. --hivevar A=B
 -i <filename>                    Initialization SQL file
 -S,--silent                      Silent mode in interactive shell
 -v,--verbose                     Verbose mode (echo executed SQL to the
                                  console)
```
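These options can be combined. A few typical invocations (the database, table, and script names are hypothetical):

```
# run a query from the command line, suppressing progress output
hive -S -e 'SELECT COUNT(*) FROM mydb.t1;'

# substitute a variable into a script file;
# inside daily_report.sql it is referenced as ${hivevar:day}
hive --hivevar day=2024-01-01 -f daily_report.sql
```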
2.2 hive --debug
Running hive --debug --help shows the debug options:
```
[houzhizhen@localhost ~]$ hive --debug --help
Allows to debug Hive by connecting to it via JDI API
Usage: hive --debug[:comma-separated parameters list]
Parameters:
recursive=<y|n>      Should child JVMs also be started in debug mode. Default: y
port=<port_number>   Port on which main JVM listens for debug connection. Default: 8000
mainSuspend=<y|n>    Should main JVM wait with execution for the debugger to connect. Default: y
childSuspend=<y|n>   Should child JVMs wait with execution for the debugger to connect. Default: n
swapSuspend          Swaps suspend options between main and child JVMs
```
--debug can be added to any service; for example, hive --service beeline --debug starts beeline in debug mode. The JVM then waits for a remote debugger to connect:

```
[houzhizhen@localhost ~]$ hive --service beeline --debug
Listening for transport dt_socket at address: 8000
```
2.3 beeline
beeline can act as a JDBC client to connect to a remote HiveServer2 or to other databases. Running beeline directly has the same effect as hive --service beeline; in fact the beeline script is simply:

```
#!/usr/bin/env bash
# Omit some copyright information
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
. "$bin"/hive --service beeline "$@"
```
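A typical connection to a remote HiveServer2 looks like this (the host, port, database, and user name are placeholders):

```
beeline -u jdbc:hive2://hs2-host:10000/default -n hive_user

# or run a single statement non-interactively:
beeline -u jdbc:hive2://hs2-host:10000/default -e 'SHOW DATABASES;'
```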
2.4 cleardanglingscratchdir
hive --service cleardanglingscratchdir runs the class org.apache.hadoop.hive.ql.session.ClearDanglingScratchDir. Its javadoc explains how it works:

```
/**
 * A tool to remove dangling scratch directory. A scratch directory could be left behind
 * in some cases, such as when vm restarts and leave no chance for Hive to run shutdown hook.
 * The tool will test a scratch directory is use, if not, remove it.
 * We rely on HDFS write lock for to detect if a scratch directory is in use:
 * 1. A HDFS client open HDFS file ($scratchdir/inuse.lck) for write and only close
 *    it at the time the session is closed
 * 2. cleardanglingscratchDir can try to open $scratchdir/inuse.lck for write. If the
 *    corresponding HiveCli/HiveServer2 is still running, we will get exception.
 *    Otherwise, we know the session is dead
 * 3. If the HiveCli/HiveServer2 dies without closing the HDFS file, NN will reclaim the
 *    lease after 10 min, ie, the HDFS file hold by the dead HiveCli/HiveServer2 is writable
 *    again after 10 min. Once it become writable, cleardanglingscratchDir will be able to
 *    remove it
 */
```
2.5 cli
Running hive --service cli is identical to running hive: when the hive command is given no service, cli is the default. Accordingly, hive --service cli --help prints the same usage message shown in section 2.1.
2.6 fixacidkeyindex
It can check one or more ORC files or directories:

```
hive --service fixacidkeyindex --help
usage ./hive --service fixacidkeyindex [-h] --check-only|--recover [--backup-path <new-path>] <path_to_orc_file_or_directory>
  --check-only               Check acid orc file for valid acid key index and exit without fixing
  --recover                  Fix the acid key index for acid orc file if it requires fixing
  --backup-path <new_path>   Specify a backup path to store the corrupted files (default: /tmp)
  --help (-h)                Print help message
```

The class executed is org.apache.hadoop.hive.ql.io.orc.FixAcidKeyIndex. Its javadoc:

```
/**
 * Utility to check and fix the ACID key index of an ORC file if it has been written incorrectly
 * due to HIVE-18817.
 * The condition that will be checked in the ORC file will be if the number of stripes in the
 * acid key index matches the number of stripes in the ORC StripeInformation.
 */
```
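Using the flags from the usage message above, a check-then-repair run might look like this (the warehouse path is a placeholder):

```
# inspect without modifying anything
hive --service fixacidkeyindex --check-only /warehouse/tablespace/managed/hive/t1

# repair, keeping backups of corrupted files under /tmp/orc-backup
hive --service fixacidkeyindex --recover --backup-path /tmp/orc-backup /warehouse/tablespace/managed/hive/t1
```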
2.8 hiveburninclient
Executes a test workload for a specified number of loops.
2.9 hiveserver2
```
hive --service hiveserver2 --help
usage: hiveserver2
    --deregister <versionNumber>   Deregister all instances of given
                                   version from dynamic service discovery
    --failover <workerIdentity>    Manually failover Active HS2 instance
                                   to passive standby mode
 -H,--help                         Print help information
    --hiveconf <property=value>    Use value for given property
    --listHAPeers                  List all HS2 instances when running in
                                   Active Passive HA mode
```
2.10 hplsql
Write an HPL/SQL script, e.g. a.hql:

```
CREATE FUNCTION hello(text STRING)
 RETURNS STRING
BEGIN
 RETURN 'Hello, ' || text || '!';
END;
PRINT hello('world')
```

Then run it:

```
hplsql -f a.hql
Hello, world!
```
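HPL/SQL statements can also be passed inline instead of through a file:

```
hplsql -e "PRINT 'Hello, world!'"
```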
2.11 jar
Runs a class from a user-supplied jar with the Hive and Hadoop classpath set up:

```
./hive --service jar <yourjar> <yourclass> HIVE_OPTS <your_args>
```
2.12 lineage
Prints the input tables of a query:

```
hive --service lineage 'select sr_customer_sk as ctr_customer_sk,sr_store_sk as ctr_store_sk,sum(SR_FEE) as ctr_total_return from tpcds_hdfs_orc_3.store_returns,tpcds_hdfs_orc_3.date_dim where sr_returned_date_sk = d_date_sk and d_year =2000 group by sr_customer_sk ,sr_store_sk'
InputTable=tpcds_hdfs_orc_3.date_dim
InputTable=tpcds_hdfs_orc_3.store_returns
```
2.13 llapdump
2.14 llap
2.15 llapstatus
2.16 metastore

```
hive --service metastore --help
usage: hivemetastore
 -h,--help                      Print help information
    --hiveconf <property=value> Use value for given property
 -p <port>                      Hive Metastore port number, default:9083
 -v,--verbose                   Verbose mode
```
2.17 metatool
Running metatool with no arguments prints the usage:

```
metatool
Initializing HiveMetaTool..
HiveMetaTool:Parsing failed. Reason: Invalid arguments:
usage: metatool
 -dryRun                                 Perform a dry run of
                                         updateLocation changes.When run
                                         with the dryRun option
                                         updateLocation changes are
                                         displayed but not persisted.
                                         dryRun is valid only with the
                                         updateLocation option.
 -executeJDOQL <query-string>            execute the given JDOQL query
 -help                                   print this message
 -listFSRoot                             print the current FS root
                                         locations
 -prepareAcidUpgrade <find-compactions>  Generates a set Compaction
                                         commands to run to prepare for
                                         Hive 2.x to 3.0 upgrade
 -serdePropKey <serde-prop-key>          Specify the key for serde
                                         property to be updated.
                                         serdePropKey option is valid
                                         only with updateLocation option.
 -tablePropKey <table-prop-key>          Specify the key for table
                                         property to be updated.
                                         tablePropKey option is valid
                                         only with updateLocation option.
 -updateLocation <new-loc> <old-loc>     Update FS root location in the
                                         metastore to new location.Both
                                         new-loc and old-loc should be
                                         valid URIs with valid host names
                                         and schemes.When run with the
                                         dryRun option changes are
                                         displayed but are not persisted.
                                         When run with the
                                         serdepropKey/tablePropKey option
                                         updateLocation looks for the
                                         serde-prop-key/table-prop-key
                                         that is specified and updates
                                         its value if found.
```
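The most common operations, using the flags from the usage above (the NameNode addresses are placeholders):

```
# print the current filesystem root of the metastore
hive --service metatool -listFSRoot

# preview an HDFS root migration without persisting any changes
hive --service metatool -updateLocation hdfs://new-nn:8020/warehouse hdfs://old-nn:8020/warehouse -dryRun
```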