Hadoop Cluster Configuration

http://hadoop.apache.org/common/docs/stable/cluster_setup.html

Configuration

The following sections describe how to configure a Hadoop cluster.

Configuration Files

Hadoop configuration is driven by two types of important configuration files:

  1. Read-only default configuration - src/core/core-default.xml, src/hdfs/hdfs-default.xml and src/mapred/mapred-default.xml.
  2. Site-specific configuration - conf/core-site.xml, conf/hdfs-site.xml and conf/mapred-site.xml.

To learn more about how the Hadoop framework is controlled by these configuration files, look here.

Additionally, you can control the Hadoop scripts found in the bin/ directory of the distribution by setting site-specific values via conf/hadoop-env.sh.

Site Configuration

To configure the Hadoop cluster you will need to configure the environment in which the Hadoop daemons execute, as well as the configuration parameters for the Hadoop daemons.

The Hadoop daemons are NameNode/DataNode and JobTracker/TaskTracker.

Configuring the Environment of the Hadoop Daemons

Administrators should use the conf/hadoop-env.sh script to do site-specific customization of the Hadoop daemons' process environment.

At the very least you should specify JAVA_HOME so that it is correctly defined on each remote node.
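
For instance, a minimal customization might look like the following sketch; the JDK path is an assumed example location, not a value mandated by the documentation, and should be adjusted to match the installation on your nodes:

# conf/hadoop-env.sh -- minimal sketch; the path below is an assumed
# example location, adjust it to wherever Java is installed on your nodes.
export JAVA_HOME=/usr/lib/jvm/java-6-sun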

Administrators can configure individual daemons using the configuration options HADOOP_*_OPTS. The various options available are shown in the table below.

Daemon               Configure Options
NameNode             HADOOP_NAMENODE_OPTS
DataNode             HADOOP_DATANODE_OPTS
SecondaryNamenode    HADOOP_SECONDARYNAMENODE_OPTS
JobTracker           HADOOP_JOBTRACKER_OPTS
TaskTracker          HADOOP_TASKTRACKER_OPTS

For example, to configure the NameNode to use parallelGC, the following statement should be added in hadoop-env.sh:

export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"


Other useful configuration parameters that you can customize include:

  • HADOOP_LOG_DIR - The directory where the daemons' log files are stored. It is automatically created if it doesn't exist.
  • HADOOP_HEAPSIZE - The maximum amount of heap to use, in MB, e.g. 1000. This is used to configure the heap size for the Hadoop daemons. By default, the value is 1000MB. Both parameters are illustrated in the sketch below.
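
As a sketch, both parameters can be set in conf/hadoop-env.sh; the directory and heap size below are illustrative assumptions, not recommended values:

# conf/hadoop-env.sh -- illustrative settings (assumed values)
export HADOOP_LOG_DIR=/var/log/hadoop   # assumed log directory; created if absent
export HADOOP_HEAPSIZE=2000             # daemon heap size in MB (default is 1000)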

Configuring the Hadoop Daemons

This section deals with important parameters to be specified in the following files:

conf/core-site.xml:

Parameter: fs.default.name
Value:     URI of NameNode.
Notes:     hdfs://hostname/
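
For illustration, a minimal conf/core-site.xml might look like the sketch below; the hostname and port are placeholders, not values prescribed by the documentation:

<?xml version="1.0"?>
<!-- conf/core-site.xml: minimal sketch; hostname and port are placeholders -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000/</value>
  </property>
</configuration>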


conf/hdfs-site.xml:

Parameter: dfs.name.dir
Value:     Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.
Notes:     If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.

Parameter: dfs.data.dir
Value:     Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.
Notes:     If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
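
A corresponding conf/hdfs-site.xml sketch follows, with assumed local paths; note the comma-delimited lists, which replicate the name table and spread blocks over two assumed devices:

<?xml version="1.0"?>
<!-- conf/hdfs-site.xml: sketch; all paths are assumed examples -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/srv/hadoop/name1,/srv/hadoop/name2</value>   <!-- name table replicated in both -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/hadoop/data,/disk2/hadoop/data</value> <!-- blocks spread over devices -->
  </property>
</configuration>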


conf/mapred-site.xml:

Parameter: mapred.job.tracker
Value:     Host or IP and port of JobTracker.
Notes:     host:port pair.

Parameter: mapred.system.dir
Value:     Path on the HDFS where the MapReduce framework stores system files, e.g. /hadoop/mapred/system/.
Notes:     This is in the default filesystem (HDFS) and must be accessible from both the server and client machines.

Parameter: mapred.local.dir
Value:     Comma-separated list of paths on the local filesystem where temporary MapReduce data is written.
Notes:     Multiple paths help spread disk i/o.

Parameter: mapred.tasktracker.{map|reduce}.tasks.maximum
Value:     The maximum number of MapReduce tasks, which are run simultaneously on a given TaskTracker, individually.
Notes:     Defaults to 2 (2 maps and 2 reduces), but vary it depending on your hardware.

Parameter: dfs.hosts/dfs.hosts.exclude
Value:     List of permitted/excluded DataNodes.
Notes:     If necessary, use these files to control the list of allowable DataNodes.

Parameter: mapred.hosts/mapred.hosts.exclude
Value:     List of permitted/excluded TaskTrackers.
Notes:     If necessary, use these files to control the list of allowable TaskTrackers.

Parameter: mapred.queue.names
Value:     Comma-separated list of queues to which jobs can be submitted.
Notes:     The MapReduce system always supports at least one queue with the name default. Hence, this parameter's value should always contain the string default. Some job schedulers supported in Hadoop, like the Capacity Scheduler, support multiple queues. If such a scheduler is being used, the list of configured queue names must be specified here. Once queues are defined, users can submit jobs to a queue using the property name mapred.job.queue.name in the job configuration. There could be a separate configuration file for configuring properties of these queues that is managed by the scheduler. Refer to the documentation of the scheduler for more information.

Parameter: mapred.acls.enabled
Value:     Boolean, specifying whether checks for queue ACLs and job ACLs are to be done for authorizing users to perform queue operations and job operations.
Notes:     If true, queue ACLs are checked while submitting and administering jobs, and job ACLs are checked for authorizing viewing and modification of jobs. Queue ACLs are specified using the configuration parameters of the form mapred.queue.queue-name.acl-name, defined below under mapred-queue-acls.xml. Job ACLs are described at Job Authorization.
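
Putting a few of these parameters together, a conf/mapred-site.xml sketch might read as follows; the host, port, and paths are placeholders, not prescribed values:

<?xml version="1.0"?>
<!-- conf/mapred-site.xml: sketch; host, port, and paths are placeholders -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/disk1/mapred/local,/disk2/mapred/local</value>
  </property>
</configuration>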


conf/mapred-queue-acls.xml

Parameter: mapred.queue.queue-name.acl-submit-job
Value:     List of users and groups that can submit jobs to the specified queue-name.
Notes:     The lists of users and groups are both comma-separated lists of names. The two lists are separated by a blank. Example: user1,user2 group1,group2. If you wish to define only a list of groups, provide a blank at the beginning of the value.

Parameter: mapred.queue.queue-name.acl-administer-jobs
Value:     List of users and groups that can view job details, change the priority of, or kill jobs that have been submitted to the specified queue-name.
Notes:     The lists of users and groups are both comma-separated lists of names. The two lists are separated by a blank. Example: user1,user2 group1,group2. If you wish to define only a list of groups, provide a blank at the beginning of the value. Note that the owner of a job can always change the priority of or kill his/her own job, irrespective of the ACLs.
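
As a sketch, granting submission rights on the default queue to two assumed users and one assumed group could look like this; note the single blank separating the user list from the group list:

<?xml version="1.0"?>
<!-- conf/mapred-queue-acls.xml: sketch; user and group names are assumed -->
<configuration>
  <property>
    <name>mapred.queue.default.acl-submit-job</name>
    <value>alice,bob datascience</value>
  </property>
</configuration>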

Typically all the above parameters are marked as final to ensure that they cannot be overridden by user applications.
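
Marking a parameter as final is done with the <final> element inside the property, as in this sketch for fs.default.name (the hostname is again a placeholder):

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000/</value>
  <final>true</final>  <!-- prevents user applications from overriding this value -->
</property>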


