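Configuration Files
Configuration of WebHCat (Templeton) merges the normal Hadoop configuration with WebHCat-specific variables. Because WebHCat is designed to connect services that are not normally connected, its configuration is considerably more complex than usual.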
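The WebHCat-specific configuration is split into two layers:
- webhcat-default.xml: All the configuration variables that WebHCat needs. This file sets the defaults that ship with WebHCat and should only be changed by WebHCat developers. Do not copy this file or change it to maintain local installation settings. Because webhcat-default.xml is packaged inside the WebHCat war file, editing a local copy of it will not change the configuration.
- webhcat-site.xml: The (possibly empty) configuration file in which the system administrator sets variables for their Hadoop cluster. Create this file and maintain entries in it for the configuration variables whose defaults must be overridden for your local installation.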
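A webhcat-site.xml uses the same configuration/property/name/value XML layout as other Hadoop *-site.xml files. The following Python sketch (standard library only; build_site_xml is a hypothetical helper, not part of WebHCat) generates such a file containing only the overrides you need. The property values in the example are illustrative, not recommendations.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def build_site_xml(overrides: dict[str, str]) -> str:
    """Return a Hadoop-style <configuration> document that holds only the
    variables whose defaults need to be overridden locally."""
    root = ET.Element("configuration")
    for name, value in overrides.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # encoding="unicode" makes ET.tostring return str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only; pick the ones that fit your cluster.
    print(build_site_xml({
        "templeton.port": "50111",
        "templeton.zookeeper.hosts": "zk1:2181,zk2:2181",
    }))
```

Writing the output to webhcat-site.xml keeps local settings out of webhcat-default.xml, as the text above requires.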
Note
The WebHCat server must be restarted after any change to its configuration.
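The WebHCat configuration files are loaded in order, with later files overriding earlier ones, so values set in webhcat-site.xml take precedence over those in webhcat-default.xml.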
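The override rule can be illustrated with a small Python sketch that reads Hadoop-style configuration XML and merges documents in load order. It is not WebHCat's own loader (which builds on Hadoop's Configuration class): it ignores Hadoop features such as final properties and variable expansion, and the function names read_properties and merge_configs are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def read_properties(xml_text: str) -> dict[str, str]:
    """Parse a Hadoop-style <configuration> document into a name -> value dict."""
    props: dict[str, str] = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def merge_configs(*xml_texts: str) -> dict[str, str]:
    """Merge documents in load order: a later document overrides earlier ones."""
    merged: dict[str, str] = {}
    for text in xml_texts:
        merged.update(read_properties(text))
    return merged
```

Calling merge_configs(default_xml, site_xml) yields the site value for any variable defined in both files and the default value for everything else.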
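To find a configuration file, WebHCat first tries to load it from the classpath and then looks in the directory named by the TEMPLETON_HOME environment variable.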
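The lookup order can be sketched in Python as follows. Python has no Java classpath, so the sketch models it as a hypothetical list of directories (classpath_dirs); real classpath entries can also be jar files, which this does not handle. It also assumes the file sits directly inside the TEMPLETON_HOME directory, as the text above describes. locate_config is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def locate_config(fname: str,
                  classpath_dirs: Sequence[str],
                  env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the path where fname would be found, or None if it is absent.

    Classpath entries are modelled as plain directories (a simplification)."""
    # 1. Search the classpath first, in classpath order.
    for entry in classpath_dirs:
        candidate = Path(entry) / fname
        if candidate.is_file():
            return candidate
    # 2. Fall back to the directory named by TEMPLETON_HOME, if it is set.
    home = env.get("TEMPLETON_HOME")
    if home:
        candidate = Path(home) / fname
        if candidate.is_file():
            return candidate
    return None
```

A copy found on the classpath therefore shadows a copy in TEMPLETON_HOME, which is worth checking when an edited file seems to have no effect.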
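Values in the configuration files can refer to any environment variable through the special env prefix, written ${env.NAME}. For example, the Pig executable could be specified as: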
${env.PIG_HOME}/bin/pig
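A minimal Python approximation of this substitution, assuming references take the form ${env.NAME} with NAME made of letters, digits and underscores. It is not WebHCat code; references to unset variables are simply left unexpanded, and expand_env is a hypothetical name.

```python
import os
import re
from typing import Mapping

# Matches ${env.NAME}; group 1 captures NAME.
_ENV_REF = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace each ${env.NAME} with the value of environment variable NAME.

    References to variables missing from env are left as written."""
    return _ENV_REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)
```

For example, with PIG_HOME set to /opt/pig, expand_env("${env.PIG_HOME}/bin/pig") gives /opt/pig/bin/pig.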
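Configuration variables that take a filesystem path try to have reasonable defaults. However, if there is any uncertainty, it is always safe to specify the full path.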
Log File Locations
The webhcat-log4j.properties file sets the location of the log files created by WebHCat and other properties of the logging system.
Configuration Variables
Name | Description |
templeton.frame.options.filter | Adds web server protection from clickjacking using X-Frame-Options header. The possible values are DENY, SAMEORIGIN, ALLOW-FROM <uri>. Versions: Hive 3.0.0 and later. |
templeton.override.enabled | Enable the override path in templeton.override.jars. |
templeton.exec.timeout | How long in milliseconds a program is allowed to run on the WebHCat box. |
templeton.callback.retry.interval | How long to wait between callback retry attempts in milliseconds. |
templeton.callback.retry.attempts | How many times to retry the callback. |
templeton.libjars | Jars to add to the classpath. |
templeton.override.jars | Jars to add to the |
templeton.controller.mr.child.opts | Java options to be passed to the WebHCat controller map task. |
templeton.hadoop.queue.name | The MapReduce queue to which WebHCat map-only jobs are submitted. Can be used to avoid a deadlock where all map slots in the cluster are taken over by Templeton launcher tasks. Versions: Hive 0.12.0 and later. |
templeton.hive.properties | Properties to set when running Hive (during job submission). This is expected to be a comma-separated prop=value list. If some value is itself a comma-separated list, the escape character is '\' (from Hive 0.13.1 onward). To use it in a cluster with Kerberos security enabled, set |
templeton.storage.class | The class to use as storage. |
templeton.exec.encoding | The encoding of the stdout and stderr data. |
templeton.exec.envs | The environment variables passed through to exec. |
templeton.streaming.jar | The HDFS path to the Hadoop streaming jar file. |
templeton.port | The HTTP port for the main server. |
templeton.kerberos.principal | The Kerberos principal to be used by the server. As stated by the Kerberos SPNEGO specification, it should be |
templeton.kerberos.keytab | The keytab file containing the credentials for the Kerberos principal. |
templeton.hdfs.cleanup.maxage | The maximum age of a WebHCat job. |
templeton.zookeeper.cleanup.maxage | The maximum age of a WebHCat job. |
templeton.hdfs.cleanup.interval | The maximum delay between a thread's cleanup checks. |
templeton.zookeeper.cleanup.interval | The maximum delay between a thread's cleanup checks. |
templeton.exec.max-output-bytes | The maximum number of bytes from stdout or stderr stored in RAM. |
templeton.exec.max-procs | The maximum number of processes allowed to run at once. |
templeton.storage.root | The path to the directory to use for storage. |
templeton.hadoop.config.dir | The path to the Hadoop configuration. |
templeton.hadoop | The path to the Hadoop executable. |
templeton.hcat | The path to the HCatalog executable. |
templeton.hive.archive | The path to the Hive archive. |
templeton.hive.path | The path to the Hive executable. |
templeton.pig.archive | The path to the Pig archive. |
templeton.pig.path | The path to the Pig executable. |
Obsolete: templeton.jar | The path to the WebHCat jar file. (Not used in recent releases, so removed in Hive 0.14.0.) |
templeton.kerberos.secret | The secret used to sign the HTTP cookie value. The default value is a random value. Unless multiple WebHCat instances need to share the secret, the random value is adequate. |
templeton.mapper.memory.mb | WebHCat controller job's Launch mapper's memory limit in megabytes. When submitting a controller job, WebHCat will overwrite Versions: Hive 0.14.0 and later. |
templeton.zookeeper.hosts | ZooKeeper servers, as comma-separated host:port pairs. |
templeton.zookeeper.session-timeout | ZooKeeper session timeout in milliseconds. |
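The escaping rule for templeton.hive.properties can be illustrated with a small Python parser that splits the value on unescaped commas and treats a backslash as escaping the next character. This is an assumption-based sketch: WebHCat's own tokenizer may differ in edge cases (for example, a trailing backslash or an escaped '=' sign), split_hive_properties is a hypothetical name, and the property values in the usage example are illustrative.

```python
def split_hive_properties(spec: str) -> dict:
    """Split a comma-separated prop=value list in which '\\,' stands for a
    literal comma inside a value."""
    items, current, i = [], [], 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\" and i + 1 < len(spec):
            current.append(spec[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch == ",":
            items.append("".join(current))  # an unescaped comma ends one entry
            current = []
        else:
            current.append(ch)
        i += 1
    items.append("".join(current))

    props = {}
    for item in items:
        if item.strip():
            key, _, value = item.partition("=")  # split on the first '=' only
            props[key.strip()] = value.strip()
    return props
```

For example, "hive.metastore.local=false,hive.metastore.uris=thrift://h1:9083\,thrift://h2:9083" yields two properties, the second of which keeps both metastore URIs in one comma-separated value.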
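Default Values
Some default values of the WebHCat configuration variables depend on the release number. For the defaults in the Hive release you are using, see that release's webhcat-default.xml file, which can be found in the SVN repository at:
http://svn.apache.org/repos/asf/hive/branches/branch-<release_number>/hcatalog/webhcat/svr/src/main/config/webhcat-default.xml
where <release_number> is 0.11, 0.12, and so on (before Hive 0.11, WebHCat was in the Apache Incubator). For example: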
- Hive 0.12.0: http://svn.apache.org/repos/asf/hive/branches/branch-0.12/hcatalog/webhcat/svr/src/main/config/webhcat-default.xml
- Hive 0.13.0: http://svn.apache.org/repos/asf/hive/branches/branch-0.13/hcatalog/webhcat/svr/src/main/config/webhcat-default.xml
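Because only the branch number changes, the URL pattern above can be filled in mechanically. A minimal Python sketch (webhcat_default_url is a hypothetical helper): it rejects releases before 0.11, whose defaults are documented in HCatalog 0.5.0 instead, and it does not check whether the SVN paths still resolve.

```python
# URL pattern from the text above; {release} is the short branch number, e.g. "0.12".
_TEMPLATE = ("http://svn.apache.org/repos/asf/hive/branches/branch-{release}"
             "/hcatalog/webhcat/svr/src/main/config/webhcat-default.xml")


def webhcat_default_url(release: str) -> str:
    """Build the SVN URL of webhcat-default.xml for Hive 0.11 or later.

    Accepts "0.12" or "0.12.0"; only the first two components are used."""
    major, minor = (int(part) for part in release.split(".")[:2])
    if (major, minor) < (0, 11):
        raise ValueError("defaults before Hive 0.11 are in the HCatalog 0.5.0 docs")
    return _TEMPLATE.format(release=f"{major}.{minor}")
```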
Default values for releases before Hive 0.11 are listed in the HCatalog 0.5.0 documentation:
- HCatalog 0.5.0: WebHCat Configuration Variables
Reference:
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Configure