Hadoop Cluster Configuration

http://hadoop.apache.org/common/docs/stable/cluster_setup.html

Configuration

The following sections describe how to configure a Hadoop cluster.

Configuration Files

Hadoop configuration is driven by two types of important configuration files:

  1. Read-only default configuration - src/core/core-default.xml, src/hdfs/hdfs-default.xml and src/mapred/mapred-default.xml.
  2. Site-specific configuration - conf/core-site.xml, conf/hdfs-site.xml and conf/mapred-site.xml.

To learn more about how the Hadoop framework is controlled by these configuration files, look here.

Additionally, you can control the Hadoop scripts found in the bin/ directory of the distribution by setting site-specific values via conf/hadoop-env.sh.

Site Configuration

To configure the Hadoop cluster you will need to configure the environment in which the Hadoop daemons execute, as well as the configuration parameters for the Hadoop daemons.

The Hadoop daemons are NameNode/DataNode and JobTracker/TaskTracker.

Configuring the Environment of the Hadoop Daemons

Administrators should use the conf/hadoop-env.sh script to do site-specific customization of the Hadoop daemons' process environment.

At the very least you should specify JAVA_HOME so that it is correctly defined on each remote node.
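
For instance, a minimal customization might look like the following sketch; the JDK path is an assumed example location, not a value mandated by the documentation, and should be adjusted to match the installation on your nodes:

# conf/hadoop-env.sh -- minimal sketch; the path below is an assumed
# example location, adjust it to wherever Java is installed on your nodes.
export JAVA_HOME=/usr/lib/jvm/java-6-sun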

Administrators can configure individual daemons using the configuration options HADOOP_*_OPTS. The various options available are shown in the table below.

Daemon               Configure Options
NameNode             HADOOP_NAMENODE_OPTS
DataNode             HADOOP_DATANODE_OPTS
SecondaryNamenode    HADOOP_SECONDARYNAMENODE_OPTS
JobTracker           HADOOP_JOBTRACKER_OPTS
TaskTracker          HADOOP_TASKTRACKER_OPTS

For example, to configure the NameNode to use parallelGC, the following statement should be added in hadoop-env.sh:

export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"


Other useful configuration parameters that you can customize include:

  • HADOOP_LOG_DIR - The directory where the daemons' log files are stored. It is automatically created if it doesn't exist.
  • HADOOP_HEAPSIZE - The maximum amount of heap to use, in MB, e.g. 1000. This is used to configure the heap size for the Hadoop daemons. By default, the value is 1000MB. Both parameters are illustrated in the sketch below.
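
As a sketch, both parameters can be set in conf/hadoop-env.sh; the directory and heap size below are illustrative assumptions, not recommended values:

# conf/hadoop-env.sh -- illustrative settings (assumed values)
export HADOOP_LOG_DIR=/var/log/hadoop   # assumed log directory; created if absent
export HADOOP_HEAPSIZE=2000             # daemon heap size in MB (default is 1000)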

Configuring the Hadoop Daemons

This section deals with important parameters to be specified in the following files:

conf/core-site.xml:

Parameter: fs.default.name
Value:     URI of NameNode.
Notes:     hdfs://hostname/
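
For illustration, a minimal conf/core-site.xml might look like the sketch below; the hostname and port are placeholders, not values prescribed by the documentation:

<?xml version="1.0"?>
<!-- conf/core-site.xml: minimal sketch; hostname and port are placeholders -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000/</value>
  </property>
</configuration>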


conf/hdfs-site.xml:

Parameter: dfs.name.dir
Value:     Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.
Notes:     If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.

Parameter: dfs.data.dir
Value:     Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.
Notes:     If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
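
A corresponding conf/hdfs-site.xml sketch follows, with assumed local paths; note the comma-delimited lists, which replicate the name table and spread blocks over two assumed devices:

<?xml version="1.0"?>
<!-- conf/hdfs-site.xml: sketch; all paths are assumed examples -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/srv/hadoop/name1,/srv/hadoop/name2</value>   <!-- name table replicated in both -->
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/disk1/hadoop/data,/disk2/hadoop/data</value> <!-- blocks spread over devices -->
  </property>
</configuration>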


conf/mapred-site.xml:

Parameter: mapred.job.tracker
Value:     Host or IP and port of JobTracker.
Notes:     host:port pair.

Parameter: mapred.system.dir
Value:     Path on the HDFS where the MapReduce framework stores system files, e.g. /hadoop/mapred/system/.
Notes:     This is in the default filesystem (HDFS) and must be accessible from both the server and client machines.

Parameter: mapred.local.dir
Value:     Comma-separated list of paths on the local filesystem where temporary MapReduce data is written.
Notes:     Multiple paths help spread disk i/o.

Parameter: mapred.tasktracker.{map|reduce}.tasks.maximum
Value:     The maximum number of MapReduce tasks, which are run simultaneously on a given TaskTracker, individually.
Notes:     Defaults to 2 (2 maps and 2 reduces), but vary it depending on your hardware.

Parameter: dfs.hosts/dfs.hosts.exclude
Value:     List of permitted/excluded DataNodes.
Notes:     If necessary, use these files to control the list of allowable DataNodes.

Parameter: mapred.hosts/mapred.hosts.exclude
Value:     List of permitted/excluded TaskTrackers.
Notes:     If necessary, use these files to control the list of allowable TaskTrackers.

Parameter: mapred.queue.names
Value:     Comma-separated list of queues to which jobs can be submitted.
Notes:     The MapReduce system always supports at least one queue with the name default. Hence, this parameter's value should always contain the string default. Some job schedulers supported in Hadoop, like the Capacity Scheduler, support multiple queues. If such a scheduler is being used, the list of configured queue names must be specified here. Once queues are defined, users can submit jobs to a queue using the property name mapred.job.queue.name in the job configuration. There could be a separate configuration file for configuring properties of these queues that is managed by the scheduler. Refer to the documentation of the scheduler for more information.

Parameter: mapred.acls.enabled
Value:     Boolean, specifying whether checks for queue ACLs and job ACLs are to be done for authorizing users to perform queue operations and job operations.
Notes:     If true, queue ACLs are checked while submitting and administering jobs, and job ACLs are checked for authorizing viewing and modification of jobs. Queue ACLs are specified using the configuration parameters of the form mapred.queue.queue-name.acl-name, defined below under mapred-queue-acls.xml. Job ACLs are described at Job Authorization.
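
Putting a few of these parameters together, a conf/mapred-site.xml sketch might read as follows; the host, port, and paths are placeholders, not prescribed values:

<?xml version="1.0"?>
<!-- conf/mapred-site.xml: sketch; host, port, and paths are placeholders -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:9001</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/disk1/mapred/local,/disk2/mapred/local</value>
  </property>
</configuration>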


conf/mapred-queue-acls.xml

Parameter: mapred.queue.queue-name.acl-submit-job
Value:     List of users and groups that can submit jobs to the specified queue-name.
Notes:     The lists of users and groups are both comma-separated lists of names. The two lists are separated by a blank. Example: user1,user2 group1,group2. If you wish to define only a list of groups, provide a blank at the beginning of the value.

Parameter: mapred.queue.queue-name.acl-administer-jobs
Value:     List of users and groups that can view job details, change the priority of, or kill jobs that have been submitted to the specified queue-name.
Notes:     The lists of users and groups are both comma-separated lists of names. The two lists are separated by a blank. Example: user1,user2 group1,group2. If you wish to define only a list of groups, provide a blank at the beginning of the value. Note that the owner of a job can always change the priority of or kill his/her own job, irrespective of the ACLs.
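
As a sketch, granting submission rights on the default queue to two assumed users and one assumed group could look like this; note the single blank separating the user list from the group list:

<?xml version="1.0"?>
<!-- conf/mapred-queue-acls.xml: sketch; user and group names are assumed -->
<configuration>
  <property>
    <name>mapred.queue.default.acl-submit-job</name>
    <value>alice,bob datascience</value>
  </property>
</configuration>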

Typically all the above parameters are marked as final to ensure that they cannot be overridden by user applications.
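
Marking a parameter as final is done with the <final> element inside the property, as in this sketch for fs.default.name (the hostname is again a placeholder):

<property>
  <name>fs.default.name</name>
  <value>hdfs://namenode.example.com:9000/</value>
  <final>true</final>  <!-- prevents user applications from overriding this value -->
</property>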


