Cluster configuration items from the official documentation, with explanations

Configuring Environment of Hadoop Daemons

Administrators should use the conf/hadoop-env.sh and conf/yarn-env.sh scripts to do site-specific customization of the Hadoop daemons' process environment.

At the very least you should specify the JAVA_HOME so that it is correctly defined on each remote node.

In most cases you should also specify HADOOP_PID_DIR and HADOOP_SECURE_DN_PID_DIR to point to directories that can only be written to by the users that are going to run the hadoop daemons. Otherwise there is the potential for a symlink attack.
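
As a minimal sketch of the settings described above, the lines below could go in conf/hadoop-env.sh. The JDK path and the PID directory are assumed values for illustration and do not come from the original text; substitute the locations used on your own nodes.

```shell
# Sketch for conf/hadoop-env.sh; both paths below are assumptions.
# JAVA_HOME must resolve to a real JDK on every node.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk

# Use a PID directory that only the daemon users can write to,
# which avoids the symlink attack possible in a world-writable directory.
export HADOOP_PID_DIR=/var/run/hadoop
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}
```

The directory itself still has to be created with restrictive ownership and permissions on each node; the exports alone do not do that.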

Administrators can configure individual daemons using the configuration options shown below in the table:

Daemon | Environment Variable
NameNode | HADOOP_NAMENODE_OPTS
DataNode | HADOOP_DATANODE_OPTS
Secondary NameNode | HADOOP_SECONDARYNAMENODE_OPTS
ResourceManager | YARN_RESOURCEMANAGER_OPTS
NodeManager | YARN_NODEMANAGER_OPTS
WebAppProxy | YARN_PROXYSERVER_OPTS
Map Reduce Job History Server | HADOOP_JOB_HISTORYSERVER_OPTS

For example, to configure the NameNode to use ParallelGC, add the following statement to hadoop-env.sh:

  export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"

Other useful configuration parameters that you can customize include:

  • HADOOP_LOG_DIR / YARN_LOG_DIR - The directory where the daemons' log files are stored. They are automatically created if they don't exist.
  • HADOOP_HEAPSIZE / YARN_HEAPSIZE - The maximum heap size to use, in MB; e.g. if the variable is set to 1000, the heap will be set to 1000 MB. This is used to configure the heap size for the daemon. By default, the value is 1000. To configure the value separately for each daemon, use the following variables:
    Daemon | Environment Variable
    ResourceManager | YARN_RESOURCEMANAGER_HEAPSIZE
    NodeManager | YARN_NODEMANAGER_HEAPSIZE
    WebAppProxy | YARN_PROXYSERVER_HEAPSIZE
    Map Reduce Job History Server | HADOOP_JOB_HISTORYSERVER_HEAPSIZE
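
The log directory and heap size variables could be set as in the sketch below, in conf/hadoop-env.sh and conf/yarn-env.sh. The paths and the numbers are illustrative assumptions; only the variable names come from the text above.

```shell
# Illustrative values only; size heaps and pick paths for your own nodes.
export HADOOP_LOG_DIR=/var/log/hadoop        # assumed path
export YARN_LOG_DIR=/var/log/hadoop-yarn     # assumed path

# Default heap for the daemons, in MB.
export HADOOP_HEAPSIZE=1000
export YARN_HEAPSIZE=1000

# Per-daemon overrides, in MB.
export YARN_RESOURCEMANAGER_HEAPSIZE=2048
export YARN_NODEMANAGER_HEAPSIZE=1024
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
```
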
Configuring the Hadoop Daemons in Non-Secure Mode

This section deals with important parameters to be specified in the given configuration files:

  • conf/core-site.xml
    Parameter | Value | Notes
    fs.defaultFS | NameNode URI | hdfs://host:port/
    io.file.buffer.size | 131072 | Size of read/write buffer used in SequenceFiles.
  • conf/hdfs-site.xml
    • Configurations for NameNode:
      Parameter | Value | Notes
      dfs.namenode.name.dir | Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently. | If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.
      dfs.namenode.hosts / dfs.namenode.hosts.exclude | List of permitted/excluded DataNodes. | If necessary, use these files to control the list of allowable DataNodes.
      dfs.blocksize | 268435456 | HDFS block size of 256 MB for large file systems.
      dfs.namenode.handler.count | 100 | More NameNode server threads to handle RPCs from a large number of DataNodes.
    • Configurations for DataNode:
      Parameter | Value | Notes
      dfs.datanode.data.dir | Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks. | If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
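
The dfs.blocksize value above is a byte count. A quick check of the arithmetic, as a sketch that assumes nothing beyond POSIX shell arithmetic expansion:

```shell
# 256 MB expressed in bytes, as used for dfs.blocksize.
blocksize_bytes=$((256 * 1024 * 1024))
echo "dfs.blocksize = ${blocksize_bytes}"

# On a configured cluster, the effective value can be read back with:
#   hdfs getconf -confKey dfs.blocksize
```
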
  • conf/yarn-site.xml
    • Configurations for ResourceManager and NodeManager:
      Parameter | Value | Notes
      yarn.acl.enable | true / false | Enable ACLs? Defaults to false.
      yarn.admin.acl | Admin ACL | ACL to set admins on the cluster. ACLs are of the form comma-separated-users, a space, then comma-separated-groups. Defaults to the special value of *, which means anyone. The special value of just a space means no one has access.
      yarn.log-aggregation-enable | false | Configuration to enable or disable log aggregation.
    • Configurations for ResourceManager:
      Parameter | Value | Notes
      yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port
      yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources. | host:port
      yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port
      yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port
      yarn.resourcemanager.webapp.address | ResourceManager web-ui host:port. | host:port
      yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler
      yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the ResourceManager. | In MB
      yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the ResourceManager. | In MB
      yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowable NodeManagers.
    • Configurations for NodeManager:
      Parameter | Value | Notes
      yarn.nodemanager.resource.memory-mb | Resource, i.e. available physical memory, in MB, for the given NodeManager. | Defines total available resources on the NodeManager to be made available to running containers.
      yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory. | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.
      yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk I/O.
      yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk I/O.
      yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled.
      yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled.
      yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log-aggregation is enabled.
      yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for Map Reduce applications.
    • Configurations for History Server (Needs to be moved elsewhere):
      Parameter | Value | Notes
      yarn.log-aggregation.retain-seconds | -1 | How long to keep aggregation logs before deleting them. -1 disables. Be careful: set this too small and you will spam the NameNode.
      yarn.log-aggregation.retain-check-interval-seconds | -1 | Time between checks for aggregated log retention. If set to 0 or a negative value, then the value is computed as one-tenth of the aggregated log retention time. Be careful: set this too small and you will spam the NameNode.
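
Three of the yarn-site.xml rules above are simple derivations, and the sketch below works them through in shell. All inputs are example values rather than recommendations: the 2.1 ratio, the user name alice, and the 7-day retention are assumptions, and the ratio is read here as a multiplier on a container's physical memory.

```shell
# 1. Virtual memory ceiling for one container.
container_mb=1536          # physical memory granted to the container
vmem_ratio_x10=21          # yarn.nodemanager.vmem-pmem-ratio of 2.1, scaled by 10
vmem_limit_mb=$((container_mb * vmem_ratio_x10 / 10))   # integer division
echo "vmem limit: ${vmem_limit_mb} MB"

# 2. Where aggregated logs land in HDFS.
remote_app_log_dir=/logs   # yarn.nodemanager.remote-app-log-dir
suffix=logs                # yarn.nodemanager.remote-app-log-dir-suffix
user=alice                 # hypothetical submitting user
aggregated_dir="${remote_app_log_dir}/${user}/${suffix}"
echo "aggregated logs: ${aggregated_dir}"

# 3. Retention check interval when left at 0 or a negative value.
retain_seconds=604800      # assumed 7-day retention
check_interval=-1          # yarn.log-aggregation.retain-check-interval-seconds
if [ "$check_interval" -le 0 ]; then
  check_interval=$((retain_seconds / 10))   # one-tenth of the retention time
fi
echo "check interval: ${check_interval} s"
```
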
  • conf/mapred-site.xml
    • Configurations for MapReduce Applications:
      Parameter | Value | Notes
      mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN.
      mapreduce.map.memory.mb | 1536 | Larger resource limit for maps.
      mapreduce.map.java.opts | -Xmx1024M | Larger heap size for child JVMs of maps.
      mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces.
      mapreduce.reduce.java.opts | -Xmx2560M | Larger heap size for child JVMs of reduces.
      mapreduce.task.io.sort.mb | 512 | Higher memory limit while sorting data for efficiency.
      mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files.
      mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps.
    • Configurations for MapReduce JobHistory Server:
      Parameter | Value | Notes
      mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020.
      mapreduce.jobhistory.webapp.address | MapReduce JobHistory Server Web UI host:port | Default port is 19888.
      mapreduce.jobhistory.intermediate-done-dir | /mr-history/tmp | Directory where history files are written by MapReduce jobs.
      mapreduce.jobhistory.done-dir | /mr-history/done | Directory where history files are managed by the MR JobHistory Server.
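
In the mapred-site.xml values listed above, each task's JVM heap (-Xmx) is smaller than its container limit (memory.mb), which leaves room for non-heap memory. The sketch below computes that headroom for the listed values; the helper xmx_mb is a hypothetical name and only handles the plain -Xmx<N>M form.

```shell
# Extract the megabyte value from a "-Xmx<N>M" option string.
xmx_mb() {
  value=${1#-Xmx}    # drop the leading "-Xmx"
  value=${value%M}   # drop the trailing "M"
  echo "$value"
}

# Container limits and heap options as listed for mapred-site.xml.
map_container_mb=1536
map_heap_mb=$(xmx_mb "-Xmx1024M")
reduce_container_mb=3072
reduce_heap_mb=$(xmx_mb "-Xmx2560M")

# Memory left in each container for everything outside the JVM heap.
map_headroom_mb=$((map_container_mb - map_heap_mb))
reduce_headroom_mb=$((reduce_container_mb - reduce_heap_mb))

echo "map headroom: ${map_headroom_mb} MB"
echo "reduce headroom: ${reduce_headroom_mb} MB"
```

If either headroom came out zero or negative, the heap setting would not fit inside the container limit and one of the two values would need adjusting.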

  • Source: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html