Notes on Setting Up a Fully Distributed Hadoop Environment

0. Environment Overview

  On a CentOS 6.3 system I installed xen-4.1.2 + linux-2.6.31.8 to get a Xen-enabled host. For testing, I then started four virtual machines, centos1, centos2, centos3, and centos4, to simulate a distributed cluster and built a fully distributed Hadoop environment on them. Without that many physical machines this is the only way to practice, and it conveniently gives a taste of virtualization and cloud computing.

1. Preparation

  (1) Node assignment:
         Physical host     192.168.77.88     monitoring machine (browser)
         centos1           192.168.77.89     namenode
         centos2           192.168.77.90     datanode
         centos3           192.168.77.91     datanode
         centos4           192.168.77.92     datanode

    
  (2) Edit the /etc/hosts file on every machine in the deployment.
         Physical host:
        [root@as img]# cat /etc/hosts
        127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
        ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
        127.0.0.1   as as.localdomanin
        192.168.77.89   centos1
        192.168.77.90   centos2
        192.168.77.91   centos3
        192.168.77.92   centos4
        Virtual machines (all four use the same configuration):
        [root@centos1 ~]# cat /etc/hosts
        #127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
        #::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
        #127.0.0.1   as as.localdomanin
        192.168.77.89 centos1
        192.168.77.90 centos2
        192.168.77.91 centos3
        192.168.77.92 centos4
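
  Before moving on, it is worth confirming that every hostname resolves on every machine. A minimal check of my own (run on each host):

        for h in centos1 centos2 centos3 centos4; do ping -c 1 $h; done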
  
2. set-SSH

  Configure SSH so that the namenode [centos1] can log in to the three datanodes [centos2, centos3, centos4] without a password.
       (1) Cluster assignment:
           192.168.77.89 centos1    // namenode
           192.168.77.90 centos2    // datanode
           192.168.77.91 centos3    // datanode
           192.168.77.92 centos4    // datanode
       
       (2) Generate a key pair on each of centos1, centos2, centos3, and centos4 (press Enter or answer "y" at every prompt):

       [root@centos1 ~]# ssh-keygen -t rsa
       Generating public/private rsa key pair.
       Enter file in which to save the key (/root/.ssh/id_rsa):
       Enter passphrase (empty for no passphrase):
       Enter same passphrase again:
       Your identification has been saved in /root/.ssh/id_rsa.
       Your public key has been saved in /root/.ssh/id_rsa.pub.
       The key fingerprint is:
       68:a7:db:bf:13:9b:f3:e8:65:f9:92:0b:a9:f1:16:16 root@centos1
       The key's randomart image is:
       +--[ RSA 2048]----+
       |                 |
       |                 |
       |                 |
       |       .  E      |
       |      o S  .     |
       |     . o  +. .   |
       |      . ..o=+.   |
       |       o +*=o.   |
       |      . o+*=oo.  |
       +-----------------+
     
       (3) Run the same command, ssh-keygen -t rsa, on centos2, centos3, and centos4.
      
       (4) On centos1, copy the public key into authorized_keys:
      [root@centos1 ~]# cd /root/.ssh
      [root@centos1 .ssh]# ls
      id_rsa    id_rsa.pub  known_hosts
      [root@centos1 .ssh]# cp id_rsa.pub authorized_keys
      [root@centos1 .ssh]# ls
      authorized_keys  id_rsa  id_rsa.pub  known_hosts
   
     
  (5) Distribute centos1's public key to the datanodes [centos2, centos3, centos4]:

       [root@centos1 .ssh]# scp authorized_keys centos2:/root/.ssh
       [root@centos1 .ssh]# scp authorized_keys centos3:/root/.ssh
       [root@centos1 .ssh]# scp authorized_keys centos4:/root/.ssh
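
       As an aside, ssh-copy-id can do the same distribution; unlike the scp above, it appends to the remote authorized_keys instead of replacing it, which matters if a node already holds other keys. An alternative sketch of my own:

       [root@centos1 .ssh]# for h in centos2 centos3 centos4; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@$h; done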

       (6) Fix the file permissions; run this on each of centos1, centos2, centos3, and centos4 (in /root/.ssh/ in every case):

       [root@centos1 .ssh]# chmod 644 authorized_keys
       [root@centos2 .ssh]# chmod 644 authorized_keys
       [root@centos3 .ssh]# chmod 644 authorized_keys
       [root@centos4 .ssh]# chmod 644 authorized_keys


       Now when centos1 opens an SSH connection to centos2, centos3, or centos4, only the very first connection prompts for confirmation (accepting the remote host key); after that it logs in without a password.
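
       A quick way to verify (a sketch of my own): each ssh call below should print the remote hostname with no password prompt.

       [root@centos1 ~]# for h in centos2 centos3 centos4; do ssh $h hostname; done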
    
      
 
3. set-hadoop

  The configuration can be identical on the namenode and all datanodes (and is in this example); for real work and testing, adjust it per node as needed (see the official documentation for the details).
  All of the following operations take place on the namenode, centos1, in /usr/hadoop/conf/; my Hadoop installation lives under /usr/.
     
  (1) Download the latest Hadoop from the official site; I used hadoop-0.21.0.tar.gz.
      Reference: http://hadoop.apache.org/
      After downloading, put it in /usr/ on centos1, extract it, and rename the directory to hadoop.

      [root@centos1 usr]# tar xvzf hadoop-0.21.0.tar.gz
      [root@centos1 usr]# mv hadoop-0.21.0 hadoop
      [root@centos1 usr]# cd /usr/hadoop/conf/

       (2) Configure the hadoop-env.sh file
         The main task is setting JAVA_HOME to the JDK path on this system. If the JDK path differs among the four virtual machines, this file has to be configured separately on each of them.
         [root@centos1 conf]# cat hadoop-env.sh
        # Set Hadoop-specific environment variables here.
       
        # The only required environment variable is JAVA_HOME.  All others are
        # optional.  When running a distributed configuration it is best to
        # set JAVA_HOME in this file, so that it is correctly defined on
        # remote nodes.

        # The java implementation to use.  Required.
        # export JAVA_HOME=/usr/lib/j2sdk1.6-sun
        export JAVA_HOME=/usr/java/jdk1.7.0_05

        # Extra Java CLASSPATH elements.  Optional.
        # export HADOOP_CLASSPATH=
          ...........................
          ...........................
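
         Since a single shared hadoop-env.sh assumes the JDK sits at the same path everywhere, a quick check of my own (using the JDK path set above) can confirm this before the configuration is copied around:

         [root@centos1 conf]# for h in centos1 centos2 centos3 centos4; do ssh $h ls -d /usr/java/jdk1.7.0_05; done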

    
  (3) Configure the core-site.xml file
       [root@centos1 conf]# cat core-site.xml
       <?xml version="1.0"?>
       <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
       <!-- Put site-specific property overrides in this file. -->
       <configuration>
           <property>
                 <name>fs.default.name</name>
                 <value>hdfs://192.168.77.89:9000/</value>
           </property>
          <property>
                 <name>hadoop.tmp.dir</name>
                 <value>/usr/local/hadoop/hadooptmp</value>
          </property>
       </configuration>

    
  (4) Configure the mapred-site.xml file
       [root@centos1 conf]# cat mapred-site.xml
       <?xml version="1.0"?>
       <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
       <!-- Put site-specific property overrides in this file. -->
       <configuration>
           <property>
                  <name>mapred.job.tracker</name>
                  <value>192.168.77.89:9001</value>
           </property>
           <property>
                  <name>mapred.local.dir</name>
                  <value>/usr/local/hadoop/mapred/local</value>
           </property>
           <property> 
                  <name>mapred.system.dir</name>
                  <value>/tmp/hadoop/mapred/system</value>
           </property>
       </configuration>


    
  (5) Configure hdfs-site.xml
       [root@centos1 conf]# cat hdfs-site.xml
       <?xml version="1.0"?>
       <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
       <!-- Put site-specific property overrides in this file. -->
       <configuration>
            <property>
                   <name>dfs.name.dir</name>
                   <value>/usr/local/hadoop/hdfs/name</value>
            </property>
            <property>
                   <name>dfs.data.dir</name>
                   <value>/usr/local/hadoop/hdfs/data</value>
            </property>
            <property>
                   <name>dfs.replication</name>
                   <value>3</value>
            </property>
        </configuration>
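
       The name, data, and tmp paths above live on each node's local disk. Hadoop creates most of them on demand, but creating them up front on every node is a harmless precaution (an optional step of my own):

       [root@centos1 conf]# for h in centos1 centos2 centos3 centos4; do ssh $h mkdir -p /usr/local/hadoop/hdfs/name /usr/local/hadoop/hdfs/data /usr/local/hadoop/hadooptmp; done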
    
     
  (6) Configure the masters and slaves files
      (Despite the name, the masters file lists the host(s) that run the secondary namenode; the slaves file lists the datanode/tasktracker hosts.)
        [root@centos1 conf]# cat masters
        #localhost
        192.168.77.89  #centos1
        [root@centos1 conf]# cat slaves
        #localhost
        192.168.77.90 #centos2
        192.168.77.91 #centos3
        192.168.77.92 #centos4

     
  (7) Distribute the configured Hadoop to the datanodes, into /usr/ so the layout matches centos1:

          [root@centos1 usr]# scp -r hadoop centos2:/usr/
          [root@centos1 usr]# scp -r hadoop centos3:/usr/
          [root@centos1 usr]# scp -r hadoop centos4:/usr/
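
          The same distribution written as a loop (an equivalent sketch, assuming /usr/ is laid out identically on every node):

          [root@centos1 usr]# for h in centos2 centos3 centos4; do scp -r /usr/hadoop $h:/usr/; done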

 
4. run-hadoop
     Before running Hadoop, temporarily stop the firewall on the physical host and all four virtual machines with service iptables stop; it can also be disabled permanently (see below).
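     Both options on CentOS 6:

     service iptables stop        # stops the firewall for the current session
     chkconfig iptables off       # optional: keeps it disabled across reboots
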
      (1) Format the cluster's distributed filesystem:
[root@centos1 hadoop]# bin/hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

12/07/31 02:13:49 INFO namenode.NameNode: STARTUP_MSG:
12/07/31 02:13:49 WARN common.Util: Path /usr/local/hadoop/hdfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
12/07/31 02:13:49 WARN common.Util: Path /usr/local/hadoop/hdfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
12/07/31 02:13:49 INFO namenode.FSNamesystem: defaultReplication = 3
12/07/31 02:13:49 INFO namenode.FSNamesystem: maxReplication = 512
12/07/31 02:13:49 INFO namenode.FSNamesystem: minReplication = 1
12/07/31 02:13:49 INFO namenode.FSNamesystem: maxReplicationStreams = 2
12/07/31 02:13:49 INFO namenode.FSNamesystem: shouldCheckForEnoughRacks = false
12/07/31 02:13:49 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/07/31 02:13:49 INFO namenode.FSNamesystem: fsOwner=root
12/07/31 02:13:49 INFO namenode.FSNamesystem: supergroup=supergroup
12/07/31 02:13:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/07/31 02:13:49 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/07/31 02:13:50 INFO common.Storage: Image file of size 110 saved in 0 seconds.
12/07/31 02:13:50 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
12/07/31 02:13:50 INFO namenode.NameNode: SHUTDOWN_MSG:



      (2) Start Hadoop:
[root@centos1 hadoop-0.21.0]# bin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-mapred.sh
starting namenode, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-namenode-centos1.out
192.168.77.90: starting datanode, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-datanode-centos2.out
192.168.77.92: starting datanode, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-datanode-centos4.out
192.168.77.91: starting datanode, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-datanode-centos3.out
The authenticity of host '192.168.77.89 (192.168.77.89)' can't be established.
RSA key fingerprint is 2b:e9:15:76:32:35:6b:d5:c4:29:2c:40:6f:5b:30:25.
Are you sure you want to continue connecting (yes/no)? yes
192.168.77.89: Warning: Permanently added '192.168.77.89' (RSA) to the list of known hosts.
192.168.77.89: starting secondarynamenode, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-secondarynamenode-centos1.out
starting jobtracker, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-jobtracker-centos1.out
192.168.77.90: starting tasktracker, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-tasktracker-centos2.out
192.168.77.92: starting tasktracker, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-tasktracker-centos4.out
192.168.77.91: starting tasktracker, logging to /usr/hadoop-0.21.0/bin/../logs/hadoop-root-tasktracker-centos3.out
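
To confirm that all three datanodes actually registered with the namenode, an optional check of my own is the dfsadmin report, which should list three live datanodes:

[root@centos1 hadoop]# bin/hdfs dfsadmin -report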


      (3) Check the Java processes now running with jps:
[root@centos1 hadoop]# jps
1869 NameNode
2058 SecondaryNameNode
2154 JobTracker
2258 Jps

[root@centos2 hadoop]# jps
1806 DataNode
1933 Jps
1892 TaskTracker

centos3 and centos4 show the same jps processes as centos2.


At this point the cluster's overall status can already be viewed from a browser on the physical host:

centos1 Hadoop Map/Reduce Administration:
http://192.168.77.89:50030/jobtracker.jsp
OR: http://centos1:50030/jobtracker.jsp

NameNode 'centos1:9000':
http://192.168.77.89:50070/dfshealth.jsp
OR: http://centos1:50070/dfshealth.jsp


      (4) Test the cluster with the wordcount example
List the files currently in the cluster:
[root@centos1 hadoop]# bin/hadoop fs -ls
12/08/01 10:08:56 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/08/01 10:08:56 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

Prepare the input files:
[root@centos1 hadoop]# mkdir in
[root@centos1 hadoop]# cp conf/*xml  in
[root@centos1 hadoop]# ls in/
capacity-scheduler.xml    fair-scheduler.xml  hdfs-site.xml      mapred-site.xml
core-site.xml        hadoop-policy.xml   mapred-queues.xml


Upload the files to HDFS:
[root@centos1 hadoop]# bin/hadoop fs -put in in
12/08/01 10:10:50 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/08/01 10:10:50 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id

[root@centos1 hadoop]# bin/hadoop fs -ls
12/08/01 10:10:57 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/08/01 10:10:57 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
 
Found 1 items
drwxr-xr-x   - root supergroup          0 2012-08-01 10:10 /user/root/in


Count the frequency of each word in the files:
[root@centos1 hadoop]# bin/hadoop jar hadoop-mapred-examples-0.21.0.jar wordcount in out
12/08/01 10:11:41 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/08/01 10:11:41 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
 
12/08/01 10:11:41 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/08/01 10:11:42 INFO input.FileInputFormat: Total input paths to process : 7
12/08/01 10:11:42 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
12/08/01 10:11:42 INFO mapreduce.JobSubmitter: number of splits:7
12/08/01 10:11:42 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null
12/08/01 10:11:42 INFO mapreduce.Job: Running job: job_201208010835_0001
12/08/01 10:11:43 INFO mapreduce.Job:  map 0% reduce 0%
12/08/01 10:11:53 INFO mapreduce.Job:  map 57% reduce 0%
12/08/01 10:11:54 INFO mapreduce.Job:  map 85% reduce 0%
12/08/01 10:11:59 INFO mapreduce.Job:  map 100% reduce 0%
12/08/01 10:12:05 INFO mapreduce.Job:  map 100% reduce 100%
12/08/01 10:12:07 INFO mapreduce.Job: Job complete: job_201208010835_0001
12/08/01 10:12:07 INFO mapreduce.Job: Counters: 33
 
    FileInputFormatCounters
        BYTES_READ=12940
    FileSystemCounters
        FILE_BYTES_READ=10491
        FILE_BYTES_WRITTEN=21242
        HDFS_BYTES_READ=13783
        HDFS_BYTES_WRITTEN=5894
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    Job Counters
        Data-local map tasks=7
        Total time spent by all maps waiting after reserving slots (ms)=0
        Total time spent by all reduces waiting after reserving slots (ms)=0
        SLOTS_MILLIS_MAPS=26944
        SLOTS_MILLIS_REDUCES=9322
        Launched map tasks=7
        Launched reduce tasks=1
    Map-Reduce Framework
        Combine input records=1477
        Combine output records=630
        Failed Shuffles=0
        GC time elapsed (ms)=326
        Map input records=332
        Map output bytes=17598
        Map output records=1477
        Merged Map outputs=7
        Reduce input groups=416
        Reduce input records=630
        Reduce output records=416
        Reduce shuffle bytes=10527
        Shuffled Maps =7
        Spilled Records=1260
        SPLIT_RAW_BYTES=843


View the results:
[root@centos1 hadoop]# bin/hadoop fs -ls
12/08/01 10:13:07 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/08/01 10:13:07 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
 
Found 2 items
drwxr-xr-x   - root supergroup          0 2012-08-01 10:10 /user/root/in
drwxr-xr-x   - root supergroup          0 2012-08-01 10:12 /user/root/out
[root@centos1 hadoop]# bin/hadoop fs -cat out/*
12/08/01 10:13:16 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
12/08/01 10:13:16 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
 
"*"    10
 
"AS    2
 
"License");    2
 
"alice,bob    10
 
'*',    2
 
':'    1
 
'aclsEnabled'    1

A long stretch of further results follows; omitted here.
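
To read the full result outside HDFS, the output directory can be copied to the local filesystem (an optional step of my own; part-r-00000 is the usual name of the single reducer's output file in this example):

[root@centos1 hadoop]# bin/hadoop fs -get out ./wordcount-out
[root@centos1 hadoop]# more ./wordcount-out/part-r-00000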