Setting Up a Fully Distributed Hadoop 2.6.2 Environment
0. Environment
0.1 Hardware
CPU: Intel(R) Core(TM) i5-4590 @ 3.30GHz, 4 cores, 64-bit
RAM: 8GB
Since the CPU and memory are reasonably capable, the DataNodes are deployed in virtual machines.
0.2 Software
Host machine (Ubuntu 12.04: 192.168.100.5)
Virtual machine 1 (Ubuntu 12.04: 192.168.100.6)
Virtual machine 2 (Ubuntu 12.04: 192.168.100.7)
The NameNode runs on the host; datanode1 and datanode2 run in the two virtual machines.
1. Host configuration
vi /etc/hostname (set the hostname on each machine)
vi /etc/hosts (set the hostname-to-IP mappings on each machine)
The hostnames of the three machines are: namenode, datanode1, datanode2.
All three machines get the same entries in /etc/hosts:
127.0.0.1 localhost
192.168.100.5 namenode
192.168.100.6 datanode1
192.168.100.7 datanode2
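Rather than editing /etc/hosts by hand on each machine, the mappings above can be applied with a small idempotent script (a sketch using the IPs and hostnames from this guide; run it as root against /etc/hosts on each machine):

```shell
#!/bin/sh
# Sketch: append the cluster's host mappings to a hosts file, skipping
# any hostname that is already mapped (safe to run repeatedly).
# Usage: append_hosts /etc/hosts
append_hosts() {
    target=$1
    while read -r ip name; do
        # only append if no existing line already ends with this hostname
        grep -q "[[:space:]]$name\$" "$target" 2>/dev/null || \
            printf '%s %s\n' "$ip" "$name" >> "$target"
    done <<'EOF'
127.0.0.1 localhost
192.168.100.5 namenode
192.168.100.6 datanode1
192.168.100.7 datanode2
EOF
}
```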
2. Create the user and group
addgroup hadoop (create the group on each machine)
adduser --ingroup hadoop hadoop (create the user on each machine)
Edit /etc/sudoers to grant the hadoop user sudo privileges: after the line
"root ALL=(ALL:ALL) ALL" add:
"hadoop ALL=(ALL:ALL) ALL"
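The edit above is normally made with visudo, which syntax-checks the file before saving. If you script it instead, guard against adding the rule twice; a sketch:

```shell
#!/bin/sh
# Sketch: append the hadoop sudoers rule only if it is not already present.
# In practice, prefer editing /etc/sudoers via visudo.
# Usage: add_sudoers_rule /etc/sudoers
add_sudoers_rule() {
    file=$1
    rule='hadoop ALL=(ALL:ALL) ALL'
    # -F: match the rule as a fixed string, not a regex
    grep -qF "$rule" "$file" 2>/dev/null || printf '%s\n' "$rule" >> "$file"
}
```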
3. Install OpenSSH on all three machines
sudo apt-get install openssh-server openssh-client
4. Configure SSH
On namenode, run:
ssh-keygen -t rsa (empty passphrase, default path)
This command creates a .ssh directory under the user's home directory containing two files: id_rsa and id_rsa.pub. id_rsa is the RSA private key; keep it safe and never disclose it. id_rsa.pub is the matching public key and may be shared freely.
cp .ssh/id_rsa.pub .ssh/authorized_keys
Copy the public key into the authorized_keys file on each of the other machines:
hadoop@namenode:~$ scp ~/.ssh/id_rsa.pub hadoop@datanode1:~/.ssh/authorized_keys
hadoop@namenode:~$ scp ~/.ssh/id_rsa.pub hadoop@datanode2:~/.ssh/authorized_keys
namenode can now log in to datanode1 and datanode2 over SSH without a password:
hadoop@namenode:~$ ssh hadoop@datanode1
hadoop@namenode:~$ ssh hadoop@datanode2
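Note that scp overwrites any existing authorized_keys on the target, whereas ssh-copy-id (shipped with openssh-client) appends. The append-and-fix-permissions logic sshd expects can be sketched as:

```shell
#!/bin/sh
# Sketch: append a public key to an authorized_keys file, creating the
# directory and file with the permissions sshd requires
# (.ssh must be 700, authorized_keys must be 600).
# Usage: install_pubkey <pubkey-file> <target .ssh dir>
install_pubkey() {
    pubkey_file=$1
    ssh_dir=$2
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # append only if this exact key is not already present
    grep -qF "$(cat "$pubkey_file")" "$ssh_dir/authorized_keys" || \
        cat "$pubkey_file" >> "$ssh_dir/authorized_keys"
    chmod 600 "$ssh_dir/authorized_keys"
}
```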
5. Runtime environment
Install the JDK on all three machines with JAVA_HOME set to /usr/lib/jvm/java, identical on every machine (details omitted).
6. Hadoop setup
On namenode:
6.1 Download the Hadoop tarball: hadoop-2.6.2.tar.gz
6.2 Extract it:
hadoop@namenode:~$ tar -zxvf hadoop-2.6.2.tar.gz -C /home/hadoop/
Hadoop is extracted to /home/hadoop/hadoop-2.6.2.
6.3 Configure environment variables
hadoop@namenode:~$ sudo gedit /etc/profile
Append the following:
# HADOOP VARIABLES START
export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
export HADOOP_INSTALL=/home/hadoop/hadoop-2.6.2
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
Apply the changes:
hadoop@namenode:~$ source /etc/profile
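After sourcing the profile it is worth sanity-checking that the derived paths resolved as intended; the same settings as a standalone sketch, with a quick check of the values:

```shell
#!/bin/sh
# Sketch: reproduce the profile settings above and print the derived paths.
export HADOOP_HOME=/home/hadoop/hadoop-2.6.2
export HADOOP_INSTALL=$HADOOP_HOME
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native

# both hadoop command directories should now be on PATH
echo "bin:    $HADOOP_INSTALL/bin"
echo "sbin:   $HADOOP_INSTALL/sbin"
echo "native: $HADOOP_COMMON_LIB_NATIVE_DIR"
```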
6.4 Edit $HADOOP_HOME/etc/hadoop/core-site.xml
Add the following (fs.defaultFS is the Hadoop 2.x name for the deprecated fs.default.name):
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop-2.6.2/data</value>
</property>
</configuration>
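When this setup has to be repeated, the file can also be generated from a heredoc instead of edited by hand (a sketch; fs.defaultFS is the non-deprecated Hadoop 2.x spelling of fs.default.name, and the values match the ones in this guide):

```shell
#!/bin/sh
# Sketch: generate core-site.xml so the config can be scripted to all nodes.
# Usage: write_core_site $HADOOP_HOME/etc/hadoop/core-site.xml
write_core_site() {
    cat > "$1" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hadoop-2.6.2/data</value>
  </property>
</configuration>
EOF
}
```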
6.5 Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml
Add the following:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
</configuration>
6.6 Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml
There is no mapred-site.xml by default; copy mapred-site.xml.template to create it:
cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
Add the following:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
</configuration>
6.7 Configure $HADOOP_HOME/etc/hadoop/hdfs-site.xml
Left empty for now:
<configuration>
</configuration>
6.8 Configure slaves
This tells Hadoop which machines are the slave nodes, so that starting the master automatically starts the DataNodes on the other machines.
Edit $HADOOP_HOME/etc/hadoop/slaves with the following contents:
datanode1
datanode2
7. Copy the Hadoop directory to the two slave machines
hadoop@namenode:~$ scp -r /home/hadoop/hadoop-2.6.2 hadoop@datanode1:/home/hadoop/hadoop-2.6.2
hadoop@namenode:~$ scp -r /home/hadoop/hadoop-2.6.2 hadoop@datanode2:/home/hadoop/hadoop-2.6.2
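With more DataNodes, the two scp commands generalize to a loop over the slaves file. A dry-run sketch that only prints the commands (drop the echo to actually execute them):

```shell
#!/bin/sh
# Sketch: print one scp command per host listed in a slaves file.
# Usage: push_hadoop <slaves-file> <hadoop-dir>
push_hadoop() {
    slaves_file=$1
    hadoop_dir=$2
    while read -r host; do
        # skip blank lines in the slaves file
        [ -n "$host" ] || continue
        echo scp -r "$hadoop_dir" "hadoop@$host:$hadoop_dir"
    done < "$slaves_file"
}
```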
8. Format HDFS
hadoop@namenode:~/hadoop-2.6.2/bin$ ./hdfs namenode -format
9. Start Hadoop
hadoop@namenode:~/hadoop-2.6.2/sbin$ ./start-all.sh
(start-all.sh is deprecated in Hadoop 2.x; running ./start-dfs.sh followed by ./start-yarn.sh is equivalent and preferred.)
10. Verification
10.1 Create an input directory on HDFS:
hadoop@namenode:~/hadoop-2.6.2$ bin/hadoop fs -mkdir -p input
10.2 Copy README.txt from the Hadoop directory into the new input directory on HDFS:
hadoop@namenode:~/hadoop-2.6.2$ bin/hadoop fs -copyFromLocal README.txt input
10.3 Run WordCount (use the compiled examples jar; the -sources jar contains only .java files and cannot be run):
hadoop@namenode:~/hadoop-2.6.2$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.2.jar wordcount input output
10.4 When the job finishes, view the word counts:
hadoop@namenode:~/hadoop-2.6.2$ bin/hadoop fs -cat output/*