Introduction
Hadoop basics: setting up a Hadoop cluster environment. If you want to learn Hadoop, building a cluster is a good place to start before gradually working toward understanding and using it.
1. Environment
- Machines
192.168.1.21 (namenode, secondarynamenode)
192.168.1.22 (datanode)
192.168.1.23 (datanode)
- Software
hadoop-2.6.4-tar.gz
jdk1.7.tar.gz (the tar.gz package is recommended; with an rpm install it is never clear where things end up)
2. Configure the hosts file
Hadoop nodes must reach each other by hostname; using raw IPs will not work.
vi /etc/hosts
# Add the following entries to the hosts file
192.168.1.21 node21
192.168.1.22 node22
192.168.1.23 node23
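The three entries above can also be appended in one pass; a minimal sketch, demonstrated on a temp file so the real /etc/hosts is untouched (point `HOSTS` at /etc/hosts, as root, to apply it):

```shell
# Sketch: append this guide's cluster entries to a hosts file.
# HOSTS is a temp copy here; set HOSTS=/etc/hosts to apply for real.
HOSTS=$(mktemp)
for entry in "192.168.1.21 node21" "192.168.1.22 node22" "192.168.1.23 node23"; do
  echo "$entry" >> "$HOSTS"
done
cat "$HOSTS"
```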
3. Configure passwordless SSH access
- Setup
Run the following commands on 192.168.1.21:
ssh-keygen -t rsa (press Enter at every prompt)
ssh-copy-id 0.0.0.0 (enter the password when prompted)
ssh-copy-id 192.168.1.22 (enter the password when prompted)
ssh-copy-id 192.168.1.23 (enter the password when prompted)
- Test
From 192.168.1.21, ssh into the other two machines and confirm that no password is required.
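The login test can be scripted; a dry-run sketch assuming the node22/node23 hostnames from the hosts file. `-o BatchMode=yes` makes ssh fail instead of prompting, so a missing key surfaces as an error rather than a password prompt. Pipe the printed lines to `sh` (or drop the echo) to run them for real.

```shell
# Build the list of check commands; printed rather than executed here.
plan=$(for host in node22 node23; do
  echo "ssh -o BatchMode=yes $host hostname"
done)
echo "$plan"
```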
4. Install the JDK and Hadoop, and configure environment variables
- JDK
mkdir -p /opt/java
tar -xzvf jdk1.7.tar.gz -C /opt/java
ln -s /opt/java/jdk1.7XXXXX /opt/java/jdk
- HADOOP
Copy the Hadoop tarball to your chosen location and extract it there; mine is /home/hadoop
- Environment variables
export JAVA_HOME=/opt/java/jdk
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/home/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
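A quick sanity check for the exports above (run after `source /etc/profile`); both bin and sbin must end up on PATH for the `hdfs` and `start-dfs.sh` commands used later to resolve:

```shell
# Restate the exports from the profile, then verify PATH actually
# picked up the sbin directory.
export JAVA_HOME=/opt/java/jdk
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/home/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
case ":$PATH:" in
  *":$HADOOP_HOME/sbin:"*) echo "sbin on PATH" ;;
  *) echo "sbin MISSING from PATH" ;;
esac
```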
5. Hadoop configuration files
Add the following settings to these files under hadoop/etc/hadoop/:
- core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://node21:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/tmp</value>
</property>
</configuration>
- hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node21:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
- yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>node21:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>node21:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>node21:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>node21:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>node21:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
- mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>node21:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node21:19888</value>
</property>
</configuration>
- slaves
node22
node23
- In hadoop-env.sh, set JAVA_HOME=/opt/java/jdk
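Setting JAVA_HOME in hadoop-env.sh can be done without opening an editor; a sketch demonstrated on a temp file containing the stock placeholder line (point `ENV_FILE` at /home/hadoop/etc/hadoop/hadoop-env.sh to apply it):

```shell
# ENV_FILE stands in for hadoop-env.sh; the single quotes keep the
# placeholder literal, mimicking the shipped file.
ENV_FILE=$(mktemp)
echo 'export JAVA_HOME=${JAVA_HOME}' > "$ENV_FILE"
# Rewrite the JAVA_HOME line in place to the JDK symlink from step 4.
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/opt/java/jdk|' "$ENV_FILE"
grep JAVA_HOME "$ENV_FILE"
```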
6. Copy everything to the other machines
- Copy the JDK, /etc/profile, Hadoop, and the hosts file
scp /etc/profile 192.168.1.22:/etc
scp /etc/profile 192.168.1.23:/etc
scp -r /opt/java 192.168.1.22:/opt
scp -r /opt/java 192.168.1.23:/opt
scp -r /home/hadoop 192.168.1.22:/home
scp -r /home/hadoop 192.168.1.23:/home
scp /etc/hosts 192.168.1.22:/etc
scp /etc/hosts 192.168.1.23:/etc
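The eight scp commands above can be collapsed into one loop over the two worker IPs; printed as a dry run here (drop the echoes, i.e. run the printed lines, to actually copy):

```shell
# Build the copy plan for both workers; printed rather than executed.
plan=$(for ip in 192.168.1.22 192.168.1.23; do
  echo "scp /etc/profile $ip:/etc"
  echo "scp -r /opt/java $ip:/opt"
  echo "scp -r /home/hadoop $ip:/home"
  echo "scp /etc/hosts $ip:/etc"
done)
echo "$plan"
```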
- Run source /etc/profile on each machine
- Test
Run java to test the JDK
Run hdfs to test Hadoop
7. Reboot all machines so the hostnames take effect
8. Start Hadoop on the master
- Format HDFS
hdfs namenode -format
- Start HDFS
/home/hadoop/sbin/start-dfs.sh
- Start YARN
/home/hadoop/sbin/start-yarn.sh
Check the running services:
yarn:http://192.168.1.21:8088
hdfs:http://192.168.1.21:50070
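Once the daemons are up, the two web UIs can be probed from the shell; a dry-run sketch assuming the addresses above (run the printed lines to probe, expecting HTTP 200 from each):

```shell
# Build reachability probes for the YARN and HDFS web UIs; the curl
# flags print only the HTTP status code. Printed, not executed, here.
plan=$(for url in http://192.168.1.21:8088 http://192.168.1.21:50070; do
  echo "curl -s -o /dev/null -w %{http_code} $url"
done)
echo "$plan"
```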