Installation of Hadoop-1.2.1 in Pseudo-Distributed Mode on CentOS 7

1 Hadoop Versions

   On the official Apache website there are various Hadoop releases, from 0.10.1 up to 2.7.2 (the most recent at the time of writing). Compared with the early releases, Hadoop 2.x introduced the global ResourceManager and the per-application ApplicationMaster, the core components of the so-called YARN framework. Besides MapReduce, a large number of other parallel computing models, such as in-memory computing, stream computing, iterative computing and graph computing, can run on the new system. Meanwhile, Apache offers platform-specific packages (.rpm/.deb) as well as compressed archives (-bin.tar.gz/.tar.gz) for each release, so choosing a suitable package for your OS can be tricky.

   At first, I chose an .rpm Hadoop package for my CentOS 7 machine. A problem occurred when I tried to install it with the rpm package management tool:

       rpm -ivh  ./hadoop***.rpm

        The error message showed that the package's default installation directory (hadoop/bin) conflicts with the system root directory /bin, so you have to use extra arguments (--relocate or --prefix) to specify an alternative installation directory:

      rpm -ivh --relocate /=/opt/temp xxx.rpm;   or   rpm -ivh --prefix=/opt/temp xxx.rpm

  By contrast, the .tar.gz version of Hadoop is much more convenient to handle: just extract it with the tar tool and move it to any sensible location, as sketched below.
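  For example, a minimal sketch (the /opt target directory is an arbitrary choice, and the hadoop user and hadoop-user group are the ones created in the next section):

          $tar -xzf hadoop-1.2.1.tar.gz                            ------ unpack the release

          $sudo mv hadoop-1.2.1 /opt/                              ------ move it to the chosen location

          $sudo chown -R hadoop:hadoop-user /opt/hadoop-1.2.1      ------ hand it over to the hadoop user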

2 Prerequisites for Installation

  It is suggested that you create a new Linux user for Hadoop and grant it elevated privileges by modifying the sudoers file in the /etc directory. Remember to restore the file's read-only attribute after you have finished:

     1)Create a new user for hadoop

           groupadd hadoop-user    ----- create a user group

            useradd -g hadoop-user hadoop   ----- create the new user hadoop in that group

           passwd hadoop                ----- set up a password for your new user.

     2)Modify permission for the new user

            Switch to root, then grant write permission on /etc/sudoers (the file is read-only by default):

                 #chmod u+w  /etc/sudoers

           Edit the sudoers file and add a new line:

                  hadoop    ALL=(ALL)   NOPASSWD: ALL      or      hadoop    ALL=(ALL)   ALL

           Finally, restore the sudoers file to read-only mode.

                 #chmod u-w  /etc/sudoers

  Since Hadoop runs on Java, you need Java 1.6 or higher on your machine. Fortunately, recent CentOS releases ship with OpenJDK 1.8. You can also choose an official Java release from Oracle instead. If you install Java from a .rpm package, there is no need to set the environment variable yourself; otherwise, add JAVA_HOME to your profile configuration, for example as sketched below.
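  A minimal sketch for a manually unpacked JDK (the path /usr/java/jdk1.8.0 is illustrative; substitute your actual installation directory):

          $echo 'export JAVA_HOME=/usr/java/jdk1.8.0' >> ~/.bashrc

          $echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc

          $source ~/.bashrc           ------ reload the profile

          $java -version              ------ verify the installation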

  Communication between the nodes of a cluster happens via SSH. In a multi-node cluster, SSH connects the individual nodes; in a single-node cluster, localhost acts as the server. The concrete configuration is as below:

          $ssh-keygen -t rsa          ------ generate an RSA key pair

          $cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys      ------- append the public key to the authorized keys

          $ssh localhost                 ------ test the password-less connection

  If the connection should fail, these general tips might help:

          Enable debugging with ssh -vvv localhost and investigate the error in detail.

          Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication and AllowUsers. If you made any changes to the SSH server configuration file, you can force a configuration reload; on CentOS 7 this is done with sudo systemctl reload sshd.
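          Another frequent culprit is overly permissive file modes, since sshd silently ignores keys when ~/.ssh or authorized_keys is writable by others. Tightening the permissions (a standard SSH requirement) usually cures it:

                 $chmod 700 ~/.ssh

                 $chmod 600 ~/.ssh/authorized_keys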

3 Formal deployment

  It is not recommended to add HADOOP_HOME to the environment variables, because Hadoop 1.2.1 prints a "$HADOOP_HOME is deprecated" warning when it is set. You need several steps to finish the deployment:

  Step1: Configuring the Hadoop environment

          Just append the relevant content to the respective four files in hadoop-**/conf (minimal examples for the three XML files are sketched after this list):

          1)hadoop-env.sh

             export JAVA_HOME=/path/to/your/java/home     ------ a line inside hadoop-env.sh, not a shell command

          2)core-site.xml

          3)hdfs-site.xml

          4)mapred-site.xml
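          For a single-node setup, minimal contents for the three XML files might look as follows (the port numbers 9000/9001 and the hadoop.tmp.dir path are conventional choices rather than requirements; adjust them to your environment).

          core-site.xml:

             <configuration>
                <property>
                   <name>fs.default.name</name>
                   <value>hdfs://localhost:9000</value>
                </property>
                <property>
                   <name>hadoop.tmp.dir</name>
                   <value>/home/hadoop/tmp</value>
                </property>
             </configuration>

          hdfs-site.xml (the replication factor must be 1 here, since there is only one DataNode):

             <configuration>
                <property>
                   <name>dfs.replication</name>
                   <value>1</value>
                </property>
             </configuration>

          mapred-site.xml:

             <configuration>
                <property>
                   <name>mapred.job.tracker</name>
                   <value>localhost:9001</value>
                </property>
             </configuration>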

  Step2: Running Hadoop

       1)Formatting the NameNode

           $bin/hadoop namenode -format

       2)Starting Hadoop

            You can do a two-stage start-up to verify the cluster configuration more easily, or just start everything at once:

           $bin/start-dfs.sh

           $bin/start-mapred.sh

           or:

           $bin/start-all.sh

        3)Checking the started hadoop process

            Normally, if everything went right, the 'jps' command will show five Hadoop processes besides the Jps process itself: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.
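            For illustration, a healthy pseudo-distributed node prints something like this (the PIDs are of course arbitrary):

               $jps
               2817 NameNode
               2963 DataNode
               3102 SecondaryNameNode
               3189 JobTracker
               3330 TaskTracker
               3415 Jps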

           There are several tips if some of the processes failed to start:

           a. Check the four configuration files in the Hadoop installation directory hadoop-***/conf, and make sure the directories you specified for 'tmp', 'namenode' and 'datanode' actually exist.

           b. Verify that you have permission to manage the directories specified above.

           c. Look for the relevant error logs through the Hadoop web UI; in Hadoop 1.x the NameNode UI listens on http://localhost:50070 and the JobTracker UI on http://localhost:50030 by default.

   Step3: Testing instance

         Here we use one of Hadoop's bundled examples, which estimates PI: the first parameter sets the number of map tasks to run, and the second one the number of samples each map task draws. Run the command from the Hadoop installation directory (the jar is referenced by a relative path, since HADOOP_HOME is deliberately not set):

         $bin/hadoop jar hadoop-examples-1.2.1.jar pi 2 5
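         If the job runs correctly, the console shows the map and reduce progress climbing to 100% and the run ends with a line of the form "Estimated value of Pi is ...". With only 2 x 5 = 10 samples the estimate is very coarse; increase both parameters for a better approximation.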


        
