I. Environment
1. Virtualization platform: VMware 11
2. Linux version: Ubuntu 14.10
3. JDK: jdk1.8.0_25
4. Hadoop version: 2.6.0
5. Cluster nodes: 3 in total, named Master, Slave1, and Slave2
Note: longke173 is the VM user name used throughout this article.
II. Preparation
1. Install the virtualization platform and create a new Ubuntu VM; call it Master.
2. Install the JDK on Master.
Installation steps:
First download the Linux JDK tarball from Oracle to the desktop: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html.
After downloading, extract it:
longke173@Master:~$ cd Desktop/
longke173@Master:~/Desktop$ sudo mkdir /usr/lib/jvm
longke173@Master:~/Desktop$ tar zvxf jdk-8u25-linux-x64.tar.gz
longke173@Master:~/Desktop$ sudo mv jdk1.8.0_25/ /usr/lib/jvm/
Set the JDK environment variables:
Edit /etc/profile and add the following:
#set jdk env
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_25
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
# Save and exit, then run the following command to apply the changes immediately (source the file that was actually edited, /etc/profile):
longke173@Master:~$ source /etc/profile
Verify the JDK:
longke173@Master:~$ java -version
java version "1.8.0_25"
Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
If OpenJDK was already installed on the system, update the defaults as follows:
# Make the newly installed JDK the default Java version
longke173@Master:~$ sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.8.0_25/bin/java 300
update-alternatives: using /usr/lib/jvm/jdk1.8.0_25/bin/java to provide /usr/bin/java (java) in auto mode
longke173@Master:~$ sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.8.0_25/bin/javac 300
update-alternatives: using /usr/lib/jvm/jdk1.8.0_25/bin/javac to provide /usr/bin/javac (javac) in auto mode
longke173@Master:~$ sudo update-alternatives --config java
longke173@Master:~$ sudo update-alternatives --config javac
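As a quick sanity check after switching alternatives (not part of the original write-up), confirm both tools now report the new version:
java -version
javac -version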
3. Install SSH
You can first check whether SSH is already present, or skip the check and install directly:
ps -ef | grep ssh
If only ssh-agent shows up, openssh-server still needs to be installed:
sudo apt-get install ssh openssh-server
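After the install, the SSH daemon should be running; on Ubuntu 14.10 (upstart) this can be checked with:
service ssh status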
4. Generate the SSH key pair
ssh-keygen -t rsa -P ""
Note: -P sets the key's passphrase. Leave it empty ("") so that the key-based logins configured in Part III never prompt for a passphrase.
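The generated pair lands in ~/.ssh/ (a quick check, not in the original steps):
ls ~/.ssh
# expected: id_rsa  id_rsa.pub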
5. Map IPs to the Hadoop node names
Edit the /etc/hosts file and add the IP-to-hostname mappings for the Hadoop nodes:
sudo gedit /etc/hosts
Add the following (use the actual IPs you assign to each node):
192.168.47.*** Master
192.168.47.*** Slave1
192.168.47.*** Slave2
6. Clone the Ubuntu VM twice to create the other two Hadoop nodes.
7. Set the hostname of each of the three Ubuntu VMs (a verification sketch follows below):
sudo gedit /etc/hostname
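Each /etc/hostname file should contain exactly one line: Master, Slave1, or Slave2, matching the /etc/hosts entries above. A minimal way to apply and verify (assuming all three VMs are up):
sudo hostname Master   # replace with the node's own name, or simply reboot
hostname               # should print the new name
ping -c 1 Slave1       # confirms the /etc/hosts mapping resolves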
III. Configure passwordless SSH login
The goal is to let Master log in to Slave1 and Slave2 without a password. Steps:
1. Create the authorized_keys file on Master
Go into the ~/.ssh/ directory and run "ls -a"; initially there is no authorized_keys file. It can be created in either of two ways:
(1) Append id_rsa.pub to the authorized_keys file:
$ cat id_rsa.pub >> authorized_keys
(2) Copy id_rsa.pub to authorized_keys:
$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
Once this is done, you can log in to this machine without a password; try logging in to localhost:
$ ssh localhost
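If ssh localhost still prompts for a password, the usual cause is file permissions: sshd ignores keys whose files are group- or world-accessible. The standard fix (an OpenSSH requirement, not specific to this setup):
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys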
2. Copy the authorized_keys file to the same directory on Slave1 and Slave2
$ scp authorized_keys longke173@Slave1:~/.ssh/authorized_keys
$ scp authorized_keys longke173@Slave2:~/.ssh/authorized_keys
The copy itself prompts for a password; once it completes, Master can log in to Slave1 and Slave2 without one.
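To confirm, log in to each slave from Master; neither should prompt for a password:
$ ssh Slave1
$ exit
$ ssh Slave2
$ exit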
IV. Install Hadoop 2.6.0 and build the cluster
1. Download and extract Hadoop
First download the hadoop-2.6.0 tarball from the Apache website, extract it to /opt/, and rename the extracted folder to hadoop.
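A minimal sketch of this step (the archive URL is one possible source; any Apache mirror works, and the chown assumes longke173 will run Hadoop):
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
sudo tar zxf hadoop-2.6.0.tar.gz -C /opt/
sudo mv /opt/hadoop-2.6.0 /opt/hadoop
sudo chown -R longke173:longke173 /opt/hadoop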
2. Configure the Hadoop environment variables
vi /etc/profile
Add the following:
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
export HADOOP_HOME HADOOP_CONF_DIR PATH
Apply the environment variables:
$ source /etc/profile
Before editing the config files, create the following directories on the local filesystem, matching the paths used in the XML files below: /opt/hadoop/tmp, /opt/hadoop/dfs/name, /opt/hadoop/dfs/data, /opt/hadoop/dfs/namesecondary.
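For example (bash brace expansion, assuming the /opt/hadoop layout above):
mkdir -p /opt/hadoop/tmp /opt/hadoop/dfs/{name,data,namesecondary}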
(1) Config file 1: hadoop-env.sh
Set the JAVA_HOME value (export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_25)
(2) Config file 2: yarn-env.sh
Set the JAVA_HOME value (export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_25)
(3) Config file 3: slaves
Slave1
Slave2
(4) Config file 4: core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
<description>NameNode URI.</description>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
<description>Size of read/write buffer used in SequenceFiles.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.longke173.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.longke173.groups</name>
<value>*</value>
</property>
</configuration>
(5) Config file 5: hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:9001</value>
<description>The secondary namenode http server address and port.</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///opt/hadoop/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///opt/hadoop/dfs/namesecondary</value>
<description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
</property>
</configuration>
(6) Config file 6: mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>Master:10020</value>
<description>MapReduce JobHistoryServer IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Master:19888</value>
<description>MapReduce JobHistoryServer Web UI host:port</description>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>Master:50030</value>
<description>MapReduce JobTracker Web UI host:port (legacy MR1 setting).</description>
</property>
</configuration>
(7) Config file 7: yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
<description>Shuffle service that needs to be set for Map Reduceapplications.</description>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>Master:8032</value>
<description>The hostname of theRM.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>Master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>Master:8035</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>Master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>Master:8088</value>
</property>
</configuration>
3. Copy the hadoop folder to Slave1 and Slave2.
scp -r /opt/hadoop longke173@Slave1:~/
scp -r /opt/hadoop longke173@Slave2:~/
The configuration above assumes the tree lives at /opt/hadoop on every node, so after copying, move the folder from the home directory into /opt/ on each slave, as shown below.
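Since /opt is normally root-owned, one workable pattern (an assumption about your sudo setup, not part of the original steps) is to run the move over SSH with a terminal so sudo can prompt:
ssh -t longke173@Slave1 "sudo mv ~/hadoop /opt/"
ssh -t longke173@Slave2 "sudo mv ~/hadoop /opt/"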
V. Verify and run
All of the component start/stop scripts live under /opt/hadoop/sbin; the NameNode is normally formatted once before starting Hadoop for the first time. The commands are as follows:
Enter the install directory: cd /opt/hadoop/
Format the NameNode: ./bin/hdfs namenode -format
Start Hadoop: ./sbin/start-all.sh
The processes now running on Master are: NameNode, SecondaryNameNode, and ResourceManager.
The processes running on Slave1 and Slave2 are: DataNode and NodeManager.
Check the cluster status: ./bin/hdfs dfsadmin -report
Inspect the file blocks: ./bin/hdfs fsck / -files -blocks
View HDFS: http://Master:50070
View the ResourceManager: http://Master:8088
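A quick smoke test once everything is up (the HDFS path is just an example), run from /opt/hadoop:
./bin/hdfs dfs -mkdir -p /user/longke173
./bin/hdfs dfs -put etc/hadoop/core-site.xml /user/longke173/
./bin/hdfs dfs -ls /user/longke173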