Hadoop 2.x Integration Guide [1] -- Hadoop 2.x Installation and Configuration

Prerequisites

  1. This guide assumes prior experience installing and deploying Hadoop 1.x.
  2. As with 1.x, passwordless SSH must be configured, the firewall disabled, and the required Java environment in place.
  3. Keep the Hadoop cluster network-isolated; service and data requests should not access the cluster nodes directly.

Installation Steps

1. Core configuration files and their settings (taken from the official documentation)

  • conf/core-site.xml
    Parameter | Value | Notes
    fs.defaultFS | NameNode URI | hdfs://host:port/
    io.file.buffer.size | 131072 | Size of read/write buffer used in SequenceFiles.
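As a sketch, the two core-site.xml parameters above could be written as follows; the hostname `master` and port 9000 are placeholder assumptions, not values from this guide:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- Placeholder NameNode URI; substitute your own host and port -->
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <!-- 128 KB read/write buffer for SequenceFiles -->
    <value>131072</value>
  </property>
</configuration>
```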
  • conf/hdfs-site.xml
    • Configurations for NameNode:
      Parameter | Value | Notes
      dfs.namenode.name.dir | Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently. | If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
      dfs.namenode.hosts / dfs.namenode.hosts.exclude | List of permitted/excluded DataNodes. | If necessary, use these files to control the list of allowable DataNodes.
      dfs.blocksize | 268435456 | HDFS blocksize of 256MB for large file-systems.
      dfs.namenode.handler.count | 100 | More NameNode server threads to handle RPCs from a large number of DataNodes.
    • Configurations for DataNode:
      Parameter | Value | Notes
      dfs.datanode.data.dir | Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks. | If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
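A minimal hdfs-site.xml sketch using the parameters above; the local paths under /data are hypothetical examples:

```xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <!-- Hypothetical path; a comma-delimited list replicates the name table -->
    <value>/data/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <!-- Hypothetical paths, ideally on different physical devices -->
    <value>/data/hdfs/data1,/data/hdfs/data2</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <!-- 256 MB block size for large file-systems -->
    <value>268435456</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
</configuration>
```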
  • conf/yarn-site.xml
    • Configurations for ResourceManager and NodeManager:
      Parameter | Value | Notes
      yarn.acl.enable | true / false | Enable ACLs? Defaults to false.
      yarn.admin.acl | Admin ACL | ACL to set admins on the cluster. ACLs are of the form comma-separated-users space comma-separated-groups. Defaults to the special value of * which means anyone. The special value of just space means no one has access.
      yarn.log-aggregation-enable | false | Configuration to enable or disable log aggregation.
    • Configurations for ResourceManager:
      Parameter | Value | Notes
      yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port
      yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to the Scheduler to obtain resources. | host:port
      yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port
      yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port
      yarn.resourcemanager.webapp.address | ResourceManager web-ui host:port. | host:port
      yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler
      yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the ResourceManager. | In MBs
      yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the ResourceManager. | In MBs
      yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowable NodeManagers.
    • Configurations for NodeManager:
      Parameter | Value | Notes
      yarn.nodemanager.resource.memory-mb | Resource, i.e. available physical memory, in MB, for a given NodeManager. | Defines total available resources on the NodeManager to be made available to running containers.
      yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory. | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.
      yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk I/O.
      yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk I/O.
      yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled.
      yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory to which application logs are moved on application completion. Appropriate permissions must be set. Only applicable if log aggregation is enabled.
      yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}. Only applicable if log aggregation is enabled.
      yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for MapReduce applications.
    • Configurations for History Server (needs to be moved elsewhere):
      Parameter | Value | Notes
      yarn.log-aggregation.retain-seconds | -1 | How long to keep aggregated logs before deleting them. -1 disables deletion. Be careful: set this too small and you will spam the NameNode.
      yarn.log-aggregation.retain-check-interval-seconds | -1 | Time between checks for aggregated log retention. If set to 0 or a negative value, the value is computed as one-tenth of the aggregated log retention time. Be careful: set this too small and you will spam the NameNode.
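A minimal yarn-site.xml sketch covering a few of the parameters above; the hostname `master` and the 8 GB memory figure are placeholder assumptions:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <!-- Placeholder host; 8032 is the default client port -->
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <!-- Required shuffle service for MapReduce applications -->
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <!-- Hypothetical: 8 GB of physical memory offered to containers -->
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <!-- 2.1 is the default ratio -->
    <value>2.1</value>
  </property>
</configuration>
```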
  • conf/mapred-site.xml
    • Configurations for MapReduce Applications:
      Parameter | Value | Notes
      mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN.
      mapreduce.map.memory.mb | 1536 | Larger resource limit for maps.
      mapreduce.map.java.opts | -Xmx1024M | Larger heap-size for child JVMs of maps.
      mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces.
      mapreduce.reduce.java.opts | -Xmx2560M | Larger heap-size for child JVMs of reduces.
      mapreduce.task.io.sort.mb | 512 | Higher memory limit while sorting data, for efficiency.
      mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files.
      mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from a very large number of maps.
    • Configurations for MapReduce JobHistory Server:
      Parameter | Value | Notes
      mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020.
      mapreduce.jobhistory.webapp.address | MapReduce JobHistory Server Web UI host:port | Default port is 19888.
      mapreduce.jobhistory.intermediate-done-dir | /mr-history/tmp | Directory where history files are written by MapReduce jobs.
      mapreduce.jobhistory.done-dir | /mr-history/done | Directory where history files are managed by the MR JobHistory Server.
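A minimal mapred-site.xml sketch using the parameters above; the hostname `master` is a placeholder assumption, while the memory values match the table:

```xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <!-- Run MapReduce jobs on YARN -->
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <!-- Child JVM heap must fit inside the container memory above -->
    <value>-Xmx1024M</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <!-- Placeholder host; 10020 is the default port -->
    <value>master:10020</value>
  </property>
</configuration>
```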

  • The masters and slaves files are also required. For ease of operations, use hostnames rather than IP addresses in them. If the LAN assigns IPs dynamically and hostnames cannot be resolved reliably, add the corresponding entries to each node's hosts file.
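For illustration, a slaves file and matching static hosts entries might look like the following; all hostnames and IP addresses here are placeholders:

```text
# $HADOOP_CONF_DIR/slaves -- one worker hostname per line
slave1
slave2

# /etc/hosts on every node -- static mappings so hostnames resolve
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2
```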
2. Operation commands:

Startup commands:

Format the DFS filesystem:

$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>

Start the NameNode:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode

Start a DataNode on each slave:

$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start datanode

Start YARN by launching the ResourceManager:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager

Start a NodeManager on each slave:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start nodemanager

Start the WebAppProxy server:

$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh start proxyserver --config $HADOOP_CONF_DIR

Start the MapReduce JobHistory Server:

$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver --config $HADOOP_CONF_DIR
The stop commands mirror the start commands:
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop nodemanager
$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_DIR
$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
3. Monitoring:

HDFS status: http://<master address>:50070

MapReduce job history: http://<master address>:19888

Applications and cluster nodes (ResourceManager web UI): http://<master address>:8088/cluster (8088 is the default yarn.resourcemanager.webapp.address port)
