Hadoop 2.5.2 Fully Distributed Cluster Setup

This post walks through setting up a fully distributed Hadoop environment. (It assumes Ubuntu is already installed, that the virtual machines can ping each other, and that they have Internet access.)

Prerequisites:

jdk-7u51-linux-x64.tar

hadoop-2.5.2


Step 1:

1. Set a root password: sudo passwd root

2. Add a new user: sudo adduser hadoop

3. Switch to root:

su root 

Then:

Give sudoers write permission: chmod u+w /etc/sudoers

Edit the sudoers file: nano /etc/sudoers   (vi works too; I was too lazy to install it, so I just used nano)
Below the line root ALL=(ALL) ALL, add: hadoop ALL=(ALL) NOPASSWD:ALL
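After the edit, the relevant part of /etc/sudoers should look roughly like this (a minimal sketch; the surrounding defaults are left untouched):

root    ALL=(ALL) ALL
hadoop  ALL=(ALL) NOPASSWD:ALL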

Remove the write permission again: chmod u-w /etc/sudoers

4. Edit the hostname: sudo nano /etc/hostname and change it to this node's hostname (master, slaver1, or slaver2).

  Then sudo nano /etc/hosts, comment out the 127.0.1.1 line,

  and add the IP and hostname of every node in the cluster.

My cluster looks like this:

IP                       hostname

192.168.218.130          master

192.168.218.131          slaver1

192.168.218.132          slaver2

(I suggest configuring an odd number of nodes; it makes setting up ZooKeeper later a bit easier. The resulting /etc/hosts is sketched below.)
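With the cluster above, /etc/hosts on every node ends up looking roughly like this (a sketch; 127.0.1.1 is the line that was commented out):

127.0.0.1       localhost
#127.0.1.1      <original hostname entry, commented out>
192.168.218.130 master
192.168.218.131 slaver1
192.168.218.132 slaver2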

Restart networking: sudo /etc/init.d/networking restart

Perform the same operations on every node!


Log out and log back in as the hadoop user.

Step 2

 

Install the JDK and Hadoop

Install the JDK:

sudo tar -xzvf /home/hadoop/jdk-7u51-linux-x64.tar -C /usr/lib/jvm/



Install Hadoop:

tar -xzvf /home/hadoop/hadoop-2.5.2.tar.gz


Install SSH (the most troublesome part):

sudo apt-get install openssh-server

If the installation fails:

sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak    # make a backup first

sudo gedit /etc/apt/sources.list

 

Replace its contents with the following:

 

deb http://ubuntu.uestc.edu.cn/ubuntu/ precise main restricted universe multiverse
deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-backports main restricted universe multiverse
deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-proposed main restricted universe multiverse
deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-security main restricted universe multiverse
deb http://ubuntu.uestc.edu.cn/ubuntu/ precise-updates main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-backports main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-proposed main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-security main restricted universe multiverse
deb-src http://ubuntu.uestc.edu.cn/ubuntu/ precise-updates main restricted universe multiverse

 

After replacing the file, run sudo apt-get update.


(A problem I ran into: my Ubuntu machine printed the following error.

Reading package lists... Error!
E: Encountered a section with no Package: header
E: Problem with MergeList /var/lib/apt/lists/ftp.sjtu.edu.cn_ubuntu_dists_precise-security_restricted_binary-i386_Packages
E: The package lists or status file could not be parsed or opened.

I have no idea what caused it, but Google suggested the following fix, which worked; recording it here.)
sudo rm /var/lib/apt/lists/* -vf
sudo apt-get update

Then continue: sudo apt-get install ssh

 

After that:

ssh-keygen -t rsa

(the .ssh directory is created at /home/hadoop/.ssh)

 

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
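At this point you can do a quick local check that key-based login works (a minimal sketch; the first connection will ask you to accept the host key):

ssh localhost      # should not prompt for a password
exit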

 

 

A side note:

1) Fix the permissions of "authorized_keys" (as the hadoop user):
chmod 600 ~/.ssh/authorized_keys

 2) Adjust the SSH configuration.
Log in as root and edit the following entries in the SSH configuration file /etc/ssh/sshd_config:

RSAAuthentication yes                       # enable RSA authentication
PubkeyAuthentication yes                    # enable public/private key authentication
AuthorizedKeysFile .ssh/authorized_keys     # path to the public key file (the one generated above)


After changing the settings, remember to restart the SSH service so they take effect:
service ssh restart    # the service is named ssh on Ubuntu (sshd on some other distributions)

 

Go into the .ssh directory on master (as hadoop@master):

scp authorized_keys hadoop@slaver1:~/.ssh/authorized_keys_from_master

scp authorized_keys hadoop@slaver2:~/.ssh/authorized_keys_from_master


Go into the .ssh directory on slaver1 (as hadoop@slaver1):

scp authorized_keys hadoop@master:~/.ssh/authorized_keys_from_slaver1

scp authorized_keys hadoop@slaver2:~/.ssh/authorized_keys_from_slaver1


Go into the .ssh directory on slaver2 (as hadoop@slaver2):

scp authorized_keys hadoop@master:~/.ssh/authorized_keys_from_slaver2

scp authorized_keys hadoop@slaver1:~/.ssh/authorized_keys_from_slaver2


Then go back to master,

in the directory /home/hadoop/.ssh:

cat authorized_keys_from_slaver1  >>  authorized_keys

cat authorized_keys_from_slaver2  >>  authorized_keys

On slaver1,

in the directory /home/hadoop/.ssh:

cat authorized_keys_from_master  >>  authorized_keys

cat authorized_keys_from_slaver2  >>  authorized_keys


On slaver2,

in the directory /home/hadoop/.ssh:

cat authorized_keys_from_master  >>  authorized_keys

cat authorized_keys_from_slaver1 >>  authorized_keys

Put plainly, this third step simply copies each virtual machine's public key into every machine's ~/.ssh/authorized_keys.

All machines must be running for ssh <hostname> to log in without a password. (A shorter alternative is sketched below.)
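As an alternative to manually scp-ing and concatenating the key files, ssh-copy-id (part of the standard OpenSSH client tools) achieves the same result; this is a sketch, not what the original post used:

# run on each node as the hadoop user, after ssh-keygen -t rsa
ssh-copy-id hadoop@master
ssh-copy-id hadoop@slaver1
ssh-copy-id hadoop@slaver2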

 

Step 4:

Set the Java environment variables (adjust the paths to match your own install directory):

hadoop@master:~$ sudo nano /etc/profile

and add:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51

export JRE_HOME=/usr/lib/jvm/jdk1.7.0_51/jre

export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin


hadoop@master:~$ sudo nano /etc/environment

and add:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51

export JRE_HOME=/usr/lib/jvm/jdk1.7.0_51/jre

export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
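To apply the profile changes in the current shell without rebooting, you can source it and check the result (a minimal sketch):

source /etc/profile
echo $JAVA_HOME      # should print /usr/lib/jvm/jdk1.7.0_51
java -version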

 

Copy the JDK to the slave nodes (note that scp needs -r to copy a directory):

scp -r /usr/lib/jvm/jdk1.7.0_51 hadoop@slaver1:/usr/lib/jvm/

scp -r /usr/lib/jvm/jdk1.7.0_51 hadoop@slaver2:/usr/lib/jvm/


Then set the corresponding environment variables on the slave nodes as well.

 

Check with java -version or javac.

sudo ufw disable turns off the firewall; you will need this command often when starting the cluster later.

 

Step 5:

 

Configure Hadoop:

hadoop@master:~/hadoop-2.5.2$ sudo mkdir hdfs

hadoop@master:~/hadoop-2.5.2$ sudo mkdir hdfs/name

hadoop@master:~/hadoop-2.5.2$ sudo mkdir hdfs/data

hadoop@master:~/hadoop-2.5.2$ sudo mkdir tmp

 

Change ownership so that everything is operated on as the hadoop user:

hadoop@master:~/hadoop-2.5.2$ sudo chown -R hadoop:hadoop hdfs

hadoop@master:~/hadoop-2.5.2$ sudo chown -R hadoop:hadoop tmp
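Equivalently, the same directories and ownership can be set up in two commands (a sketch, run from ~/hadoop-2.5.2):

sudo mkdir -p hdfs/name hdfs/data tmp     # -p also creates the parent hdfs directory
sudo chown -R hadoop:hadoop hdfs tmp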

 

 

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano hadoop-env.sh

Change the JAVA_HOME inside it.

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano yarn-env.sh

Change the JAVA_HOME inside it as well (the line to set is sketched below).
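In both hadoop-env.sh and yarn-env.sh, the JAVA_HOME line should point at the JDK installed earlier; a minimal sketch using the path from this post:

export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_51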

 

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano slaves

(this file lists all the slave nodes; see the sketch below)
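With the hostnames used in this post, the slaves file would simply contain one slave per line:

slaver1
slaver2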

 

hadoop@master:~/hadoop-2.5.2$ nano etc/hadoop/core-site.xml

The contents are as follows:

<configuration>

        <property>

               <name>fs.defaultFS</name>

                <value>hdfs://master:8020</value>

        </property>

        <property>

               <name>io.file.buffer.size</name>

               <value>131072</value>

        </property>

        <property>

               <name>hadoop.tmp.dir</name>

                <value>file:/home/hadoop/hadoop-2.5.2/tmp</value>

                <description>A base for other temporary directories.</description>

        </property>

        <property>

               <name>hadoop.proxyuser.hadoop.hosts</name>

                <value>*</value>

        </property>

        <property>

               <name>hadoop.proxyuser.hadoop.groups</name>

                <value>*</value>

        </property>

</configuration>

 

Edit mapred-site.xml (you first have to copy mapred-site.xml.template and rename the copy to mapred-site.xml; a command for this is sketched below):
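For example (assuming you are in the Hadoop home directory ~/hadoop-2.5.2):

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml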

hadoop@master:~/hadoop-2.5.2$ nano etc/hadoop/mapred-site.xml

 

<configuration>

        <property>

                <name>mapreduce.framework.name</name>

                <value>yarn</value>

        </property>

        <property>

               <name>mapreduce.jobhistory.address</name>

               <value>master:10020</value>

        </property>

        <property>

               <name>mapreduce.jobhistory.webapp.address</name>

               <value>master:19888</value>

        </property>

</configuration>

 

 

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano yarn-site.xml

 

<configuration>

<property>

<name>yarn.nodemanager.aux-services</name>

    <value>mapreduce_shuffle</value>

</property>

<property>                                                                   

 

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

     <value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<property>

     <name>yarn.resourcemanager.address</name>

     <value>master:8032</value>

</property>

<property>

      <name>yarn.resourcemanager.scheduler.address</name>

      <value>master:8030</value>

</property>

<property>

      <name>yarn.resourcemanager.resource-tracker.address</name>

      <value>master:8031</value>

</property>

<property>

      <name>yarn.resourcemanager.admin.address</name>

       <value>master:8033</value>

</property>

<property>

       <name>yarn.resourcemanager.webapp.address</name>

       <value>master:8088</value>

</property>

</configuration>

 

 

hadoop@master:~/hadoop-2.5.2/etc/hadoop$ nano hdfs-site.xml

 

The contents are as follows:

 

<configuration>

        <property>

               <name>dfs.namenode.secondary.http-address</name>

               <value>master:9001</value>

        </property>

        <property>

               <name>dfs.namenode.name.dir</name>

                <value>file:/home/hadoop/hadoop-2.5.2/hdfs/name</value>

        </property>

        <property>

               <name>dfs.datanode.data.dir</name>

               <value>file:/home/hadoop/hadoop-2.5.2/hdfs/data</value>

        </property>

        <property>

                <name>dfs.replication</name>

                <value>3</value>

        </property>

        <property>

               <name>dfs.webhdfs.enabled</name>

                <value>true</value>

        </property>

</configuration>

 

 

 

Copy the configured Hadoop directory to the slave nodes:

 scp -r /home/hadoop/hadoop-2.5.2 hadoop@slaver1:/home/hadoop
 scp -r /home/hadoop/hadoop-2.5.2 hadoop@slaver2:/home/hadoop


 

 

hadoop@master:~/hadoop-2.5.2$ bin/hdfs namenode -format   (or bin/hadoop namenode -format)

hadoop@master:~/hadoop-2.5.2$ sbin/start-all.sh

You can find all the available commands under /home/hadoop/hadoop-2.5.2/bin and /home/hadoop/hadoop-2.5.2/sbin.
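As a quick sanity check after start-all.sh (a sketch; it assumes the daemons came up cleanly), run jps on each node and open the web UIs:

jps
# On master, expect roughly: NameNode, SecondaryNameNode, ResourceManager
# On slaver1/slaver2, expect: DataNode, NodeManager
#
# Web UIs:
#   YARN ResourceManager: http://master:8088   (yarn.resourcemanager.webapp.address set above)
#   HDFS NameNode:        http://master:50070  (default NameNode web port in Hadoop 2.x)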

PS: I did this install a while ago and suddenly felt like writing it up, so I dashed this post off in a couple of hours. Please point out anything wrong or missing! I followed essentially this configuration and the cluster started up normally.

 
