Environment: CentOS 6.4, 64-bit
1. Disable SELinux and turn off the firewall (required on all nodes)
Disable SELinux:
$vim /etc/sysconfig/selinux
Set the SELINUX property to: SELINUX=disabled
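The same edit can be scripted; a small sketch (sed rewrites the line in place, setenforce makes the change effective right away, without a reboot):
$sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/sysconfig/selinux #persistent, takes effect after reboot
$setenforce 0 #switch to permissive mode immediately for the current session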
Turn off the firewall:
$service iptables stop #stops the firewall for now; it comes back after a reboot
$chkconfig iptables off #keeps the firewall off across reboots, i.e. disables it permanently
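To verify both changes took effect:
$service iptables status #should report that iptables is not running
$chkconfig --list iptables #every runlevel should show off
$getenforce #prints Permissive after setenforce 0, Disabled after the next reboot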
2. Set the hostname (required on all nodes)
$vim /etc/sysconfig/network #takes effect only after a reboot; set the HOSTNAME property to the desired name, e.g. HOSTNAME=master
$hostname master #changes the hostname for the current session, no reboot needed
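After the edit, /etc/sysconfig/network on the master looks like this (on the slaves, HOSTNAME is slave1, slave2, and so on):
NETWORKING=yes
HOSTNAME=master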
3. Edit /etc/hosts (distribute this file so it is identical on all nodes)
$vim /etc/hosts
One entry per line, in the format [IP address, space or tab, hostname], e.g.:
192.168.1.1 master
192.168.1.2 slave1
192.168.1.3 slave2
4. Set up SSH (required on all nodes)
$vim /etc/ssh/sshd_config #remove the # from the line: #AuthorizedKeysFile .ssh/authorized_keys
$service sshd restart #restart the SSH service
$ssh-keygen #generates the RSA key pair SSH needs; on a first run just press Enter at every prompt; if a key already exists the dialog looks like this:
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): #press Enter
/root/.ssh/id_rsa already exists. #appears only if a key was generated before
Overwrite (y/n)? #type y, press Enter
Enter passphrase (empty for no passphrase): #press Enter
Enter same passphrase again: #press Enter
The output ends with something like:
The key's randomart image is:
+--[ RSA 2048]----+
| o. |
| . + . |
| E * * |
| . B = + |
| S = O . |
| . * B |
| * |
| |
| |
+-----------------+
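If the prompts are a nuisance, the same key pair can be generated in one non-interactive call (-N '' means an empty passphrase, -f names the key file; this only runs prompt-free when no key exists yet):
$ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa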
$cd ~/.ssh/
$ll
total 16
-rw-r--r-- 1 root root 395 Apr 27 05:36 authorized_keys #collected public keys allowed to log in without a password
-rw------- 1 root root 1675 Apr 28 20:43 id_rsa #private key
-rw-r--r-- 1 root root 395 Apr 28 20:43 id_rsa.pub #public key
-rw-r--r-- 1 root root 1579 Apr 27 05:37 known_hosts #fingerprints of known hosts; after a machine is reinstalled its fingerprint changes and ssh refuses to connect until the old entry is deleted here!
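Instead of editing known_hosts by hand, ssh-keygen can delete a stale fingerprint directly (slave1 here stands for whichever host was reinstalled):
$ssh-keygen -R slave1 #removes slave1's old entry from ~/.ssh/known_hosts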
$cat id_rsa.pub >> authorized_keys #append this node's own public key (id_rsa.pub, not the private key)
#copy each other machine's id_rsa.pub to this node with scp, then append it to authorized_keys, e.g.:
#log in to slave1
Run ssh-keygen there to generate its key pair, then:
$scp ~/.ssh/id_rsa.pub root@master:~/.ssh/id_rsa.slave1.pub
Back on master:
$cat ~/.ssh/id_rsa.slave1.pub >> ~/.ssh/authorized_keys
Repeat this on every other machine, so that every node's id_rsa.pub ends up appended to master's authorized_keys; see the sketch below for pushing the finished file back out.
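For passwordless logins from master to the slaves (not just toward master), the aggregated authorized_keys must also reach every slave. A minimal sketch using the hostnames from the /etc/hosts example above (scp asks for each slave's password one last time):
$for h in slave1 slave2; do scp ~/.ssh/authorized_keys root@$h:~/.ssh/; done
$ssh slave1 hostname #should print slave1 without asking for a password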
5. Unpack Hadoop
$tar -xzvf hadoop-2.2.0.tar.gz
$cd hadoop-2.2.0/etc/hadoop #enter Hadoop's configuration directory
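Optionally, add Hadoop to the PATH so the start scripts in the later steps can be called by name; this sketch assumes the tarball was unpacked under /usr/local (adjust the path to your actual layout):
$vim /etc/profile #append the two lines below, then run: source /etc/profile
export HADOOP_HOME=/usr/local/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin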
6. Edit hadoop-env.sh (distribute this file so it is identical on all nodes)
$vim hadoop-env.sh
Point export JAVA_HOME=${JAVA_HOME} at the actual JDK directory,
e.g.: export JAVA_HOME=/usr/java/jdk1.6.0_31
7. Edit core-site.xml (distribute this file so it is identical on all nodes)
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:8020/</value>
  <description>The name of the default file system. A URI whose scheme
  and authority determine the FileSystem implementation. The uri's scheme
  determines the config property (fs.SCHEME.impl) naming the FileSystem
  implementation class. The uri's authority is used to determine the host,
  port, etc. for a filesystem.</description>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
  <description>The size of buffer for use in sequence files. The size of
  this buffer should probably be a multiple of hardware page size (4096 on
  Intel x86), and it determines how much data is buffered during read and
  write operations.</description>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoopneed/</value>
  <description>A base for other temporary directories.</description>
</property>
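Because hadoop.tmp.dir is the base for the HDFS name and data directories configured in the next step, create it on every node before the first start:
$mkdir -p /usr/local/hadoopneed #run on all nodes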
8. Edit hdfs-site.xml (distribute this file so it is identical on all nodes)
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/name</value>
  <description>Determines where on the local filesystem the DFS name node
  should store the name table (fsimage). If this is a comma-delimited list
  of directories then the name table is replicated in all of the
  directories, for redundancy.</description>
</property>
<property>
  <name>dfs.namenode.hosts</name>
  <value></value>
  <description>If necessary, use this file to control the list of
  permitted datanodes.</description>
</property>
<property>
  <name>dfs.namenode.hosts.exclude</name>
  <value></value>
  <description>If necessary, use this file to control the list of
  excluded datanodes.</description>
</property>
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
  <description>The number of server threads for the namenode.</description>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file://${hadoop.tmp.dir}/dfs/data</value>
  <description>Determines where on the local filesystem a DFS data node
  should store its blocks. If this is a comma-delimited list of
  directories, then data will be stored in all named directories, typically
  on different devices. Directories that do not exist are ignored.</description>
</property>
9. Edit yarn-site.xml (distribute this file so it is identical on all nodes)
<!-- Site specific YARN configuration properties -->
<property>
  <description>Are acls enabled.</description>
  <name>yarn.acl.enable</name>
  <value>false</value>
</property>
<property>
  <description>ACL of who can be admin of the YARN cluster.</description>
  <name>yarn.admin.acl</name>
  <value>*</value>
</property>
<property>
  <description>Whether to enable log aggregation.</description>
  <name>yarn.log-aggregation-enable</name>
  <value>false</value>
</property>
<property>
  <description>The hostname of the RM. On a multi-node cluster this must
  resolve to the ResourceManager node (0.0.0.0 would leave the slaves
  unable to find it).</description>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<property>
  <description>The address of the applications manager interface in the RM.</description>
  <name>yarn.resourcemanager.address</name>
  <value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
  <description>The address of the scheduler interface.</description>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
  <description>The address of the RM admin interface.</description>
  <name>yarn.resourcemanager.admin.address</name>
  <value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
  <description>The http address of the RM web application.</description>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
  <description>The class to use as the resource scheduler.</description>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
  <description>The minimum allocation for every container request at the RM,
  in MBs. Memory requests lower than this won't take effect, and the
  specified value will get allocated at minimum.</description>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <description>The maximum allocation for every container request at the RM,
  in MBs. Memory requests higher than this won't take effect, and will get
  capped to this value.</description>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
<property>
  <description>Path to file with nodes to include.</description>
  <name>yarn.resourcemanager.nodes.include-path</name>
  <value></value>
</property>
<property>
  <description>Path to file with nodes to exclude.</description>
  <name>yarn.resourcemanager.nodes.exclude-path</name>
  <value></value>
</property>
<property>
  <description>Amount of physical memory, in MB, that can be allocated
  for containers.</description>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>
<property>
  <description>Ratio between virtual memory to physical memory when setting
  memory limits for containers. Container allocations are expressed in terms
  of physical memory, and virtual memory usage is allowed to exceed this
  allocation by this ratio.</description>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<property>
  <description>List of directories to store localized files in. An
  application's localized file directory will be found in:
  ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
  Individual containers' work directories, called container_${contid}, will
  be subdirectories of this.</description>
  <name>yarn.nodemanager.local-dirs</name>
  <value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>
<property>
  <description>Where to store container logs. An application's localized log
  directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
  Individual containers' log directories will be below this, in directories
  named container_${contid}. Each container directory will contain the files
  stderr, stdin, and syslog generated by that container.</description>
  <name>yarn.nodemanager.log-dirs</name>
  <value>${yarn.log.dir}/userlogs</value>
</property>
<property>
  <description>Time in seconds to retain user logs. Only applicable if log
  aggregation is disabled.</description>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>10800</value>
</property>
<property>
  <description>Where to aggregate logs to.</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>
<property>
  <description>The remote log dir will be created at
  ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam}.</description>
  <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
  <value>logs</value>
</property>
<property>
  <description>A valid service name should only contain a-zA-Z0-9_ and
  cannot start with numbers.</description>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <description>How long to keep aggregation logs before deleting them.
  -1 disables. Be careful: set this too small and you will spam the name
  node.</description>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>-1</value>
</property>
<property>
  <description>How long to wait between aggregated log retention checks. If
  set to 0 or a negative value then the value is computed as one-tenth of
  the aggregated log retention time. Be careful: set this too small and you
  will spam the name node.</description>
  <name>yarn.log-aggregation.retain-check-interval-seconds</name>
  <value>-1</value>
</property>
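A quick sanity check on the memory numbers: each NodeManager offers 8192 MB to YARN, and the CapacityScheduler rounds every request up to a multiple of yarn.scheduler.minimum-allocation-mb (1024 MB), so one node can run at most 8192 / 1024 = 8 containers at a time.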
10. Edit mapred-site.xml (distribute this file so it is identical on all nodes)
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <description>The runtime framework for executing MapReduce jobs. Can be
  one of local, classic or yarn.</description>
</property>
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1536</value>
  <description>Larger resource limit for maps.</description>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx1024M</value>
  <description>Larger heap-size for child jvms of maps.</description>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>3072</value>
  <description>Larger resource limit for reduces.</description>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx2560M</value>
  <description>Larger heap-size for child jvms of reduces.</description>
</property>
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>100</value>
  <description>The total amount of buffer memory to use while sorting files,
  in megabytes. By default, gives each merge stream 1MB, which should
  minimize seeks.</description>
</property>
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>10</value>
  <description>The number of streams to merge at once while sorting files.
  This determines the number of open file handles.</description>
</property>
<property>
  <name>mapreduce.reduce.shuffle.parallelcopies</name>
  <value>5</value>
  <description>The default number of parallel transfers run by reduce during
  the copy (shuffle) phase.</description>
</property>
<!-- jobhistory properties -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>0.0.0.0:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>0.0.0.0:19888</value>
  <description>MapReduce JobHistory Server Web UI host:port</description>
</property>
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
</property>
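Note how each java.opts heap stays below its container limit (-Xmx1024M inside a 1536 MB map container, -Xmx2560M inside a 3072 MB reduce container), leaving headroom for JVM overhead; also, under the 1024 MB minimum allocation above, a 1536 MB request is actually rounded up to a 2048 MB container.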
11. Start HDFS
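A typical first-start sequence for Hadoop 2.2.0, run on master from the hadoop-2.2.0 directory (start-dfs.sh reaches the DataNodes through the hostnames listed in etc/hadoop/slaves, so put slave1 and slave2 in that file first):
$bin/hdfs namenode -format #first start only: initializes dfs.namenode.name.dir
$sbin/start-dfs.sh #starts the NameNode here and a DataNode on each slave
$jps #NameNode (and SecondaryNameNode) should be listed here, DataNode on the slaves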
12. Start YARN
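Correspondingly for YARN, also on master:
$sbin/start-yarn.sh #starts the ResourceManager here and a NodeManager on each slave
$jps #ResourceManager here, NodeManager on the slaves
The ResourceManager web UI should then answer at http://master:8088, matching the yarn.resourcemanager.webapp.address configured above.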