Machine preparation
Assume there are five machines:
192.168.8.10
192.168.8.11
192.168.8.12
192.168.8.13
192.168.8.14
Environment setup
Creating the user: run the following commands on every machine
useradd -d /home/hadoop -s /bin/bash -m hadoop #create the new user hadoop
passwd hadoop #set the hadoop user's password
Generating SSH keys
The master must be able to ssh to every worker without a password, or the start scripts will fail. On the namenode machine, run the following as the hadoop user
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
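Key-based login silently fails on many systems unless ~/.ssh and authorized_keys have strict permissions. A non-interactive variant of the two commands above, with the permission fixes (the empty `-N ""` passphrase is an assumption; use a real passphrase if your policy requires one):

```shell
# Create ~/.ssh if missing, generate a passphrase-less RSA key pair without
# prompts, and authorize it for the local hadoop account.
mkdir -p ~/.ssh
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# sshd rejects keys when these files are group/world writable.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```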
Editing /etc/hosts
Add the following lines to the hosts file
192.168.8.10 hadoop-master
192.168.8.11 hadoop1
192.168.8.12 hadoop2
192.168.8.13 hadoop3
192.168.8.14 hadoop4
Copy the authorized_keys file from the master to the other machines
scp ~/.ssh/authorized_keys hadoop@192.168.8.11:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@192.168.8.12:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@192.168.8.13:/home/hadoop/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hadoop@192.168.8.14:/home/hadoop/.ssh/authorized_keys
Copy the hosts file to the corresponding location on the other machines in the same way.
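The per-machine copies can be scripted with a loop. A dry-run sketch that only prints the commands (delete the `echo` to execute them; note that writing /etc/hosts on the remote side needs root, so in practice copy to a temporary path and move it into place with sudo):

```shell
# Print one scp command per worker node; drop "echo" to run them for real.
for ip in 192.168.8.11 192.168.8.12 192.168.8.13 192.168.8.14; do
    echo scp /etc/hosts hadoop@"$ip":/etc/hosts
done
```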
Installing Hadoop
Download and unpack Hadoop 2.2
wget http://apache.mirrors.lucidnetworks.net/hadoop/common/stable2/hadoop-2.2.0.tar.gz
tar -xvf hadoop-2.2.0.tar.gz
mv hadoop-2.2.0 /home/hadoop/
Configuring Hadoop environment variables
Switch to root, edit /etc/profile, and append the following
export HADOOP_HOME=/home/hadoop/hadoop-2.2.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
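The new variables only take effect in a fresh login shell (or after `source /etc/profile`). A quick way to confirm the expansion is correct; here the two key exports are written to a scratch file so the check is side-effect free:

```shell
# Write the two key exports to a scratch file, source it, and confirm that
# HADOOP_CONF_DIR expands to the expected path.
cat > /tmp/hadoop-env-check.sh <<'EOF'
export HADOOP_HOME=/home/hadoop/hadoop-2.2.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
EOF
. /tmp/hadoop-env-check.sh
echo "$HADOOP_CONF_DIR"
```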
Configuring Hadoop
Go to $HADOOP_HOME/etc/hadoop
Edit slaves
Add the following lines
hadoop1
hadoop2
hadoop3
hadoop4
Edit hadoop-env.sh
Add the following line
export JAVA_HOME=/usr/share/jdk1.7.0_51 #otherwise Hadoop may later complain that JAVA_HOME is not found
Edit core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.2.0/mytmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>hadoop-master</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
Edit hdfs-site.xml. The properties below are a typical minimal set; adjust the directories and the replication factor to your cluster
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop-2.2.0/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop-2.2.0/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
Copy mapred-site.xml.template to mapred-site.xml (cp mapred-site.xml.template mapred-site.xml) and edit it
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-master:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
</configuration>
Edit yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-master:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-master:18030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-master:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop-master:18041</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop-master:8088</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/mynode/my</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/home/hadoop/mynode/logs</value>
</property>
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
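A malformed site file makes the daemons die at startup with a cryptic SAXParseException, so it is worth confirming each edited file is well-formed XML before distributing it. A sketch assuming python3 is on the PATH (any XML parser would do):

```shell
# Report OK/BROKEN for each of the four site files under the conf directory.
CONF=${HADOOP_CONF_DIR:-/home/hadoop/hadoop-2.2.0/etc/hadoop}
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    if python3 -c 'import sys, xml.dom.minidom as m; m.parse(sys.argv[1])' "$CONF/$f" 2>/dev/null; then
        echo "$f OK"
    else
        echo "$f BROKEN"
    fi
done
```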
Use scp to copy core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml from the namenode to the corresponding location on each datanode.
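That copy is four files times four nodes, so a loop keeps it consistent. A dry-run sketch that only prints the commands (remove the `echo` to copy for real; it assumes the same install path on every node):

```shell
# Print the scp command for every (worker, config file) pair.
CONF=/home/hadoop/hadoop-2.2.0/etc/hadoop
for host in hadoop1 hadoop2 hadoop3 hadoop4; do
    for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
        echo scp "$CONF/$f" hadoop@"$host":"$CONF/"
    done
done
```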
Hadoop is now fully configured.
Using Hadoop
Starting Hadoop
hadoop namenode -format #format the namenode (first start only; reformatting wipes HDFS)
$HADOOP_HOME/sbin/start-all.sh #start Hadoop
Run jps on the namenode; you should see output like the following
27825 ResourceManager
28080 Jps
27667 SecondaryNameNode
27406 NameNode
Run jps on a datanode; you should see output like the following
6079 NodeManager
5908 DataNode
6388 Jps
To stop Hadoop, use the following command
$HADOOP_HOME/sbin/stop-all.sh