Hadoop 2.2.0 Installation (4 CentOS Virtual Machines)


1. Master Node

1.1 Preliminary setup

1)        Edit /etc/sysconfig/network:

NETWORKING=yes

HOSTNAME=master

GATEWAY=192.168.178.254

2)        Edit /etc/hosts:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.178.181 master

192.168.178.182 slave1

192.168.178.183 slave2

192.168.178.184 slave3

3)        Install vsftpd (requires network access)

1)        yum -y install vsftpd

2)        touch /var/log/vsftpd.log

3)        chkconfig --list |grep vsftpd

4)        chkconfig vsftpd on

5)        vim /etc/vsftpd/vsftpd.conf    (edit as described in the Xiapi Hadoop installation guide)

6)        getsebool -a | grep ftp        (check the SELinux booleans for FTP)

7)        setsebool -P ftp_home_dir 1

8)        setsebool -P allow_ftpd_full_access 1

9)        service vsftpd restart

4)        Disable iptables (the firewall)

Run: chkconfig iptables off

1.2 Passwordless SSH login

1)        Set up passwordless SSH

a)        (as the hadoop user) ssh-keygen -t rsa -P ''

b)        cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

c)        chmod 600 ~/.ssh/authorized_keys

d)        vim /etc/ssh/sshd_config

RSAAuthentication yes

PubkeyAuthentication yes

AuthorizedKeysFile     .ssh/authorized_keys

e)        service sshd restart

f)        Verify: ssh localhost

2)        Configure passwordless login from master to the slaves

a)        scp ~/.ssh/id_rsa.pub hadoop@slave1:~/

b)        (hadoop@slave)  cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

c)        chmod 600 ~/.ssh/authorized_keys

d)        vim /etc/ssh/sshd_config

RSAAuthentication yes

PubkeyAuthentication yes

AuthorizedKeysFile     .ssh/authorized_keys

e)        service sshd restart

3)        Configure passwordless login from the slaves to master

a)        (hadoop@slave)  ssh-keygen -t rsa -P ''

b)        scp ~/.ssh/id_rsa.pub hadoop@master:~/

c)        (hadoop@master)  cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
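Once the keys have been exchanged, a quick way to confirm that every direction works without a password prompt is a short loop (a sketch, assuming the hostnames defined in /etc/hosts above):
# run on master as the hadoop user; each command should print the remote hostname without asking for a password
for h in slave1 slave2 slave3; do ssh hadoop@$h hostname; done
# and from any slave, back to the master
ssh hadoop@master hostname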

1.3 JDK installation

1)        Remove any existing JDK:

a)        rpm -qa | grep java

b)        yum -y remove java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64

2)        Install the JDK (from the RPM package)

a)        rpm -ivh jdk-7u51-linux-x64.rpm

b)        vim /etc/profile

#set java environment

export JAVA_HOME=/usr/java/jdk1.7.0_51

export PATH=$PATH:$JAVA_HOME/bin

export CLASSPATH=$JAVA_HOME/lib/*.jar:$JAVA_HOME/jre/lib/*.jar

 

# set hadoop path

export HADOOP_HOME=/usr/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HADOOP_LOG_DIR=/usr/hadoop/logs

export YARN_LOG_DIR=$HADOOP_LOG_DIR
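After editing /etc/profile, the settings can be checked in the current shell (a quick sanity check; Hadoop itself is only unpacked in section 1.4):
source /etc/profile
java -version                     # should report 1.7.0_51
echo $JAVA_HOME $HADOOP_HOME      # should print /usr/java/jdk1.7.0_51 /usr/hadoop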

1.4 Install Hadoop 2.2.0

1)        mv hadoop-2.2.0.tar.gz /usr/

2)        cd /usr

3)        tar -zxvf hadoop-2.2.0.tar.gz

4)        mv hadoop-2.2.0 hadoop

5)        chown -R hadoop:hadoop hadoop

6)        rm -rf hadoop-2.2.0.tar.gz

7)        mkdir /usr/hadoop/tmp

8)        mkdir /usr/hadoop/dfs

9)        mkdir /usr/hadoop/dfs/name

10)    vim /etc/profile    (configure the Hadoop path variables; see 1.3 -> 2) -> b))

11)    For the configuration files, see: http://download.csdn.net/detail/zzzzzqf/7019251
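The steps above only install Hadoop on master. The original notes do not show copying it to the slaves; a minimal sketch, assuming the same /usr/hadoop layout and a hadoop user on every node (run as root on master):
for h in slave1 slave2 slave3; do
    scp -r /usr/hadoop root@$h:/usr/                     # copy the whole install tree
    ssh root@$h "chown -R hadoop:hadoop /usr/hadoop"     # restore ownership on the slave
done
# /etc/profile on each slave also needs the JAVA_HOME/HADOOP_HOME entries from 1.3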

1.5 Time synchronization

1)        On the server (master)

a)        Install ntp:

b)        yum -y install ntp

2)        On the slaves that need to be synchronized

a)        (as root, manually) /usr/sbin/ntpdate master

b)        (automatically) # vi /var/spool/cron/root

0 1 * * * /usr/sbin/ntpdate master

For details, see: http://cyr520.blog.51cto.com/714067/746905
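The server side also needs /etc/ntp.conf to allow queries from the LAN. A minimal sketch for this 192.168.178.0/24 network, using the local clock as a fallback source (an assumption, not part of the original notes):
# /etc/ntp.conf on master (sketch)
restrict 192.168.178.0 mask 255.255.255.0 nomodify notrap   # let the slaves query time
server 127.127.1.0                                          # local clock as a fallback source
fudge  127.127.1.0 stratum 10
# then start the daemon:
service ntpd start
chkconfig ntpd on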

2. Slave Nodes

2.1 Preliminary setup

1)        Edit /etc/sysconfig/network:

NETWORKING=yes

HOSTNAME=slave1

2)        Edit /etc/hosts:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4

::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.178.181 master

192.168.178.182 slave1

192.168.178.183 slave2

192.168.178.184 slave3

3. Hadoop Configuration

First, in the Hadoop install directory (/usr/hadoop from section 1.4):

mkdir temp    (the directory name must match the value of the hadoop.tmp.dir property in core-site.xml)

mkdir -p dfs/name

mkdir -p dfs/data

Seven configuration files need to be modified in total: slaves, yarn-env.sh, hadoop-env.sh, yarn-site.xml, mapred-site.xml, hdfs-site.xml, and core-site.xml. The paths in the XML files below assume the /usr/hadoop install directory used in section 1.4.

3.1 slaves

slave1

slave2

slave3

3.2 yarn-env.sh

Append at the end:

# set java environment

export JAVA_HOME=/usr/java/jdk1.7.0_51

3.3 hadoop-env.sh

Append at the end:

# set java environment

export JAVA_HOME=/usr/java/jdk1.7.0_51

3.4 core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/temp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

3.5 yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>

3.6 mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>

3.7 hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/usr/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/usr/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
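Before running the tests in the next section, HDFS has to be formatted and the daemons started. A minimal sketch, run as the hadoop user in /usr/hadoop on master (standard Hadoop 2.2.0 commands):
./bin/hdfs namenode -format        # only once, before the first start
./sbin/start-dfs.sh                # starts NameNode, SecondaryNameNode and the DataNodes
./sbin/start-yarn.sh               # starts ResourceManager and the NodeManagers
jps                                # list the Java daemons on each node to verify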

4. Testing Hadoop

4.1 Useful commands

4.1.1 Web UIs

1)        On the master node, open master:50070 in a browser to view the HDFS status

2)        master:8088 is the YARN ResourceManager web UI

3)        On other Windows machines, add the following to C:\Windows\System32\drivers\etc\hosts:

192.168.178.181 master

192.168.178.182 slave1

192.168.178.183 slave2

192.168.178.184 slave3

This fixes the common problem of the "BROWSE THE FILESYSTEM" link not opening.

4.1.2 Other commands

1)        Leave safe mode: ./bin/hdfs dfsadmin -safemode leave

2)        Create a directory on HDFS: ./bin/hadoop dfs -mkdir /input

3)        List the HDFS root directory: ./bin/hadoop dfs -ls /

4)        Copy a local file to HDFS: ./bin/hdfs dfs -copyFromLocal /usr/hadoop/input/qing.txt /input

5)        Run a test program:

a)        ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter input

6)        watch -n 1 "/sbin/ifconfig eth0 | grep bytes"    (monitor network traffic in real time)

4.2 Tests

4.2.1 WordCount

1)        Create an HDFS directory:

./bin/hdfs dfs -mkdir /input

2)        Upload files:

./bin/hdfs dfs -copyFromLocal /usr/hadoop/input/qing.txt /input

./bin/hdfs dfs -copyFromLocal /usr/hadoop/input/feng.txt /input

3)        Run the job:

./bin/hadoop jar /usr/hadoop/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.2.0-sources.jar org.apache.hadoop.examples.WordCount /input /output

4)        View the results:

./bin/hdfs dfs -cat /output/part-r-00000

For details, see: http://blog.csdn.net/bamuta/article/details/14226243

4.2.2 Computing pi

1)        ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 2  

2)        The two arguments after pi are the number of map tasks and the number of samples per map.

4.2.3 Writing random data

1)        ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter random-data

Additional Tips

To print debug-level messages to the console for troubleshooting:

export HADOOP_ROOT_LOGGER=DEBUG,console
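The variable only affects the current shell; a typical use is to export it and rerun the failing command, for example (an illustrative command, not from the original notes):
export HADOOP_ROOT_LOGGER=DEBUG,console
./bin/hdfs dfs -ls /        # now prints DEBUG output to the console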

 
