Hadoop Installation and Configuration

1. System initialization

1) Software preparation

1) VMware-workstation-full-8.0.4-744019.exe
   (license key: NF64A-DMJ92-TZNQO-9VCXK-AAJJT)
2) ubuntu-11.10-server-i386.iso
3) hadoop-1.0.3.tar.gz (1.0.3)
4) SSH Secure Shell

 

2) Install VMware Workstation

 

3) Install Red Hat (the guest OS)

 

4) Install SSH Secure Shell

This makes it possible to log in to the virtual machine from the host machine.

 

5) Set the hostname and hosts file

vi /etc/sysconfig/network
Change: HOSTNAME=redhat1
$> hostname redhat1    (takes effect without a reboot)
$> hostname            (prints the new hostname)

 

vi /etc/hosts
Change:
127.0.0.1        localhost
192.168.229.128  redhat1
#192.168.229.129 redhat2
#192.168.229.130 redhat3
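A quick check that the new hostname and hosts entries took effect (a minimal sketch; redhat1 and 192.168.229.128 are the values configured above):

hostname             # should print redhat1
ping -c 1 redhat1    # should resolve to 192.168.229.128 via /etc/hosts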

 

6) Install SSH and set up passwordless SSH

1) sudo apt-get install ssh    (Ubuntu)
   yum install openssh-server  (CentOS)

2) Generate a key pair: ssh-keygen -t rsa
   Press Enter at every prompt; the files are saved under /root/.ssh
3) Go into the .ssh directory and run:
   cp id_rsa.pub authorized_keys
   ssh localhost
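The same setup as a single non-interactive sequence, including the permission tightening that some sshd configurations require (a sketch; assumes you work as root and put no passphrase on the key):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa          # generate the key pair without prompts
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys   # authorize the public key for this account
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost                                     # should log in without asking for a password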

  

7) Install and configure the JDK

1. Run ./jdk-6u26-linux-i586.bin
   Then run java, javac, and java -version; if version information is printed, the JDK is in place.

2. Install vim to make editing files easier later: apt-get install vim
   (strongly recommended, since plain vi is awkward to use)

3. Configure the Java environment:
   1) vim /etc/profile    (edit the profile file)
   2) Append the following at the end of the file:
export JAVA_HOME=/usr/java/jdk1.6.0_26
export JRE_HOME=/usr/java/jdk1.6.0_26/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
   3) When finished, save and quit with :wq
   4) source /etc/profile
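A quick check that the new environment variables are active in the current shell (a small sketch using the paths configured above):

echo $JAVA_HOME    # should print /usr/java/jdk1.6.0_26
java -version      # should report version 1.6.0_26
which java         # should resolve to a path under $JAVA_HOME/bin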

 

8) Install Hadoop

1. tar -zxvf hadoop-1.0.3.tar.gz
   mv hadoop-1.0.3 hadoop

 

2. Edit the conf/hadoop-env.sh file under the Hadoop installation directory:
   vim conf/hadoop-env.sh
   export JAVA_HOME=/usr/java/jdk1.6.0_26

3. reboot    (restart the machine)
   Run hadoop version; if version information is printed, the installation is complete.
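Note that hadoop version only works from any directory if the Hadoop bin directory is on the PATH, which the steps above do not set up. A minimal sketch for /etc/profile, assuming Hadoop was unpacked and renamed to /usr/hadoop (adjust to the directory you actually used):

export HADOOP_HOME=/usr/hadoop          # assumed install path; adjust as needed
export PATH=$HADOOP_HOME/bin:$PATH

Then run source /etc/profile (or open a new shell) and try hadoop version again.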




2. Three operating modes

Ø  Standalone (local) mode

Standalone mode is Hadoop's default mode. When the Hadoop package is first unpacked, Hadoop knows nothing about the hardware environment, so it conservatively falls back to a minimal configuration: in this default state all three XML configuration files are empty.

With empty configuration files, Hadoop runs entirely locally. Because there is no interaction with other nodes, standalone mode uses neither HDFS nor any Hadoop daemons (there is no need to run any of the .sh scripts such as start-all.sh). This mode is mainly used to develop and debug the application logic of MapReduce programs without involving the daemons, which avoids extra complexity.

Without any configuration at all (not even the JRE in hadoop-env.sh needs to be set), running bin/hadoop jar hadoop-examples-x.y.z.jar wordcount input output is the standalone way of running a job; both its input and its output are local files.

 

[root@localhost hadoop]# mkdir input
[root@localhost hadoop]# cd input
[root@localhost input]# ls
[root@localhost input]# echo "hello lsr" > test1.txt
[root@localhost input]# ls
test1.txt
[root@localhost input]# echo "hello hadoop" > test2.txt
[root@localhost input]# ls
test1.txt  test2.txt
[root@localhost input]# cd ../
[root@localhost hadoop]# bin/hadoop jar hadoop-examples-1.0.3.jar wordcount input output

12/10/23 10:34:03 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/10/23 10:34:03 INFO input.FileInputFormat: Total input paths to process : 2
12/10/23 10:34:03 WARN snappy.LoadSnappy: Snappy native library not loaded
12/10/23 10:34:03 INFO mapred.JobClient: Running job: job_local_0001
12/10/23 10:34:04 INFO util.ProcessTree: setsid exited with exit code 0
12/10/23 10:34:04 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@cd5f8b
12/10/23 10:34:04 INFO mapred.MapTask: io.sort.mb = 100
12/10/23 10:34:04 INFO mapred.JobClient:  map 0% reduce 0%
12/10/23 10:34:09 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/23 10:34:09 INFO mapred.MapTask: record buffer = 262144/327680
12/10/23 10:34:09 INFO mapred.MapTask: Starting flush of map output
12/10/23 10:34:09 INFO mapred.MapTask: Finished spill 0
12/10/23 10:34:09 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/10/23 10:34:10 INFO mapred.LocalJobRunner:
12/10/23 10:34:10 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
12/10/23 10:34:10 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@c4fe76
12/10/23 10:34:10 INFO mapred.MapTask: io.sort.mb = 100
12/10/23 10:34:10 INFO mapred.MapTask: data buffer = 79691776/99614720
12/10/23 10:34:10 INFO mapred.MapTask: record buffer = 262144/327680
12/10/23 10:34:10 INFO mapred.MapTask: Starting flush of map output
12/10/23 10:34:10 INFO mapred.MapTask: Finished spill 0
12/10/23 10:34:10 INFO mapred.Task: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/10/23 10:34:10 INFO mapred.JobClient:  map 100% reduce 0%
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Task: Task 'attempt_local_0001_m_000001_0' done.
12/10/23 10:34:13 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@e28b9
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Merger: Merging 2 sorted segments
12/10/23 10:34:13 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 51 bytes
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Task: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/10/23 10:34:13 INFO mapred.LocalJobRunner:
12/10/23 10:34:13 INFO mapred.Task: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/10/23 10:34:13 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output
12/10/23 10:34:16 INFO mapred.LocalJobRunner: reduce > reduce
12/10/23 10:34:16 INFO mapred.Task: Task 'attempt_local_0001_r_000000_0' done.
12/10/23 10:34:16 INFO mapred.JobClient:  map 100% reduce 100%
12/10/23 10:34:16 INFO mapred.JobClient: Job complete: job_local_0001
12/10/23 10:34:16 INFO mapred.JobClient: Counters: 20
12/10/23 10:34:16 INFO mapred.JobClient:   File Output Format Counters
12/10/23 10:34:16 INFO mapred.JobClient:     Bytes Written=35
12/10/23 10:34:16 INFO mapred.JobClient:   FileSystemCounters
12/10/23 10:34:16 INFO mapred.JobClient:     FILE_BYTES_READ=428713
12/10/23 10:34:16 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=528398
12/10/23 10:34:16 INFO mapred.JobClient:   File Input Format Counters
12/10/23 10:34:16 INFO mapred.JobClient:     Bytes Read=23
12/10/23 10:34:16 INFO mapred.JobClient:   Map-Reduce Framework
12/10/23 10:34:16 INFO mapred.JobClient:     Map output materialized bytes=59
12/10/23 10:34:16 INFO mapred.JobClient:     Map input records=2
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/10/23 10:34:16 INFO mapred.JobClient:     Spilled Records=8
12/10/23 10:34:16 INFO mapred.JobClient:     Map output bytes=39
12/10/23 10:34:16 INFO mapred.JobClient:     Total committed heap usage (bytes)=548130816
12/10/23 10:34:16 INFO mapred.JobClient:     CPU time spent (ms)=0
12/10/23 10:34:16 INFO mapred.JobClient:     SPLIT_RAW_BYTES=200
12/10/23 10:34:16 INFO mapred.JobClient:     Combine input records=4
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce input records=4
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce input groups=3
12/10/23 10:34:16 INFO mapred.JobClient:     Combine output records=4
12/10/23 10:34:16 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/10/23 10:34:16 INFO mapred.JobClient:     Reduce output records=3
12/10/23 10:34:16 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/10/23 10:34:16 INFO mapred.JobClient:     Map output records=4

[root@localhost hadoop]# cd output
[root@localhost output]# ls
part-r-00000  _SUCCESS
[root@localhost output]# cat part-r-00000
hadoop  1
hello   2
lsr     1
[root@localhost output]#



Ø  Pseudo-distributed mode

1) Unpack the archive

tar xzvf hadoop-1.0.3.tar.gz

 

2) Edit the configuration files

1. hadoop-env.sh
Add:

export JAVA_HOME=/soft/java/jdk1.6.0_13

2. core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ubuntu:9000</value>
  </property>
</configuration>

3. hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/hadoopdata/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/hadoopdata/fs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/hadoopdata/fs/data</value>
  </property>
</configuration>

4. mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>ubuntu:9001</value>
  </property>
</configuration>

 

3) Format the file system
bin/hadoop namenode -format

4) Start the daemons

root@ubuntu:/_nosql/hadoop# bin/start-all.sh

 

root@ubuntu:/_nosql/hadoop# jps

9117 JobTracker

8927 DataNode

9228 TaskTracker

9266 Jps

9037 SecondaryNameNode

8814 NameNode

 

5) Verify the installation

http://ubuntu:50030    (MapReduce web UI)
http://ubuntu:50070    (HDFS web UI)
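Once the daemons are up, a small end-to-end check is to run the same wordcount example against HDFS instead of the local file system (a sketch; the /user/root paths are an assumption, and it presumes the local input/ directory from the standalone test still exists and that the commands are run from the Hadoop installation directory):

bin/hadoop fs -mkdir /user/root                       # create a working directory in HDFS
bin/hadoop fs -put input /user/root/input             # upload the local input files into HDFS
bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /user/root/input /user/root/output
bin/hadoop fs -cat /user/root/output/part-r-00000     # inspect the result stored in HDFS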



Ø  Fully distributed (cluster) mode

Node layout:

Node type             IP              hostname
master node           192.168.40.4    master
slave nodes           192.168.40.5    slave1
                      192.168.40.6    slave2
                      192.168.40.7    slave3
secondary NameNode    192.168.40.4

 

Configuration steps:

Step 1. Install one virtual machine following the system-initialization steps in the first part of this chapter.

Step 2. Use VMware (Clone in the VM manager) to clone three more virtual machines.

Step 3. Set the hostname and hosts file on every virtual machine.
vi /etc/sysconfig/network
Change: HOSTNAME=master
$> hostname master    (takes effect without a reboot)
$> hostname           (prints the new hostname)

 

vi /etc/hosts
Change:
127.0.0.1      localhost
192.168.40.4   master
192.168.40.5   slave1
192.168.40.6   slave2
192.168.40.7   slave3

Step 4. Set up and test SSH on the four virtual machines.
Generate a key and configure passwordless SSH login (on the master host):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Copy the authorized_keys file to the slave hosts:
scp authorized_keys slave1:~/.ssh/
scp authorized_keys slave2:~/.ssh/
scp authorized_keys slave3:~/.ssh/
Check that master can log in to the slaves without a password:
ssh slave1    (run on master; if the login succeeds, the setup works; exit returns from slave1 to master)
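The same check can be run against all slaves in one loop (a small sketch, using the hostnames from /etc/hosts above):

for h in slave1 slave2 slave3; do
  ssh "$h" hostname    # should print each slave's hostname without asking for a password
done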

Step 5. Turn off the firewall on all four virtual machines.
Turn off the firewall:
service iptables stop
Note: the firewall starts again after the machine reboots.

 

Turning the Red Hat firewall off/on:
/* turn the firewall off */
service iptables stop
/* turn the firewall on */
service iptables start
/* keep the firewall off at boot */
chkconfig iptables off
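To confirm the resulting state (assuming a RHEL/CentOS 6-style init system where these service/chkconfig commands exist):

service iptables status      # should report that iptables is not running
chkconfig --list iptables    # shows, per runlevel, whether iptables starts at boot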

Step 6. Configure Hadoop

1. hadoop-env.sh
Add:

export JAVA_HOME=/_work/jdk

2. core-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

3. hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/_work/hadoop/hadoopdata/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/_work/hadoop/hadoopdata/fs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/_work/hadoop/hadoopdata/fs/data</value>
  </property>
</configuration>

4. mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>

5. masters
   master

6. slaves
   slave1
   slave2
   slave3
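The steps above only touch the master's copy; in a fully distributed cluster every node needs the same Hadoop installation and configuration (and the JDK at the same path). A minimal way to push the configured directory out, as a sketch assuming Hadoop lives in /_work/hadoop as in the paths above and passwordless SSH already works:

for h in slave1 slave2 slave3; do
  scp -r /_work/hadoop "$h":/_work/    # copy the Hadoop directory, including conf/, to each slave
done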

Step 7. Start Hadoop

hadoop namenode -format

start-all.sh (or run start-dfs.sh first and then start-mapred.sh)

Step 8. Test

On the master node:
[root@master ~]# jps

25429 SecondaryNameNode

25500 JobTracker

25201 NameNode

18474 Jps

 

On a slave node:
[root@slave1 ~]# jps

4469 TaskTracker

4388 DataNode

29622 Jps

 

hadoop fs -ls /

hadoop fs -mkdir /newDir

[root@slave1 hadoop-0.20.2]# hadoop jar hadoop-0.20.2-examples.jar pi 4 2

Number of Maps  = 4

Samples per Map = 2

Wrote input for Map #0

Wrote input for Map #1

Wrote input for Map #2

Wrote input for Map #3

Starting Job

12/05/20 09:45:19 INFO mapred.FileInputFormat: Total input paths to process : 4
12/05/20 09:45:19 INFO mapred.JobClient: Running job: job_201205190417_0005
12/05/20 09:45:20 INFO mapred.JobClient:  map 0% reduce 0%
12/05/20 09:45:30 INFO mapred.JobClient:  map 50% reduce 0%
12/05/20 09:45:31 INFO mapred.JobClient:  map 100% reduce 0%
12/05/20 09:45:45 INFO mapred.JobClient:  map 100% reduce 100%
12/05/20 09:45:47 INFO mapred.JobClient: Job complete: job_201205190417_0005
12/05/20 09:45:47 INFO mapred.JobClient: Counters: 18
12/05/20 09:45:47 INFO mapred.JobClient:   Job Counters
12/05/20 09:45:47 INFO mapred.JobClient:     Launched reduce tasks=1
12/05/20 09:45:47 INFO mapred.JobClient:     Launched map tasks=4
12/05/20 09:45:47 INFO mapred.JobClient:     Data-local map tasks=4
12/05/20 09:45:47 INFO mapred.JobClient:   FileSystemCounters
12/05/20 09:45:47 INFO mapred.JobClient:     FILE_BYTES_READ=94
12/05/20 09:45:47 INFO mapred.JobClient:     HDFS_BYTES_READ=472
12/05/20 09:45:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=334
12/05/20 09:45:47 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
12/05/20 09:45:47 INFO mapred.JobClient:   Map-Reduce Framework
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce input groups=8
12/05/20 09:45:47 INFO mapred.JobClient:     Combine output records=0
12/05/20 09:45:47 INFO mapred.JobClient:     Map input records=4
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce shuffle bytes=112
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce output records=0
12/05/20 09:45:47 INFO mapred.JobClient:     Spilled Records=16
12/05/20 09:45:47 INFO mapred.JobClient:     Map output bytes=72
12/05/20 09:45:47 INFO mapred.JobClient:     Map input bytes=96
12/05/20 09:45:47 INFO mapred.JobClient:     Combine input records=0
12/05/20 09:45:47 INFO mapred.JobClient:     Map output records=8
12/05/20 09:45:47 INFO mapred.JobClient:     Reduce input records=8
Job Finished in 28.952 seconds
Estimated value of Pi is 3.50000000000000000000

 

Notes:

How to deploy the NameNode and the SecondaryNameNode on separate machines:

1. The masters file:
Write the hostname or IP address of the machine that should run the SecondaryNameNode directly into the masters file.
Note: the masters file does not determine which node is the NameNode; it determines the SecondaryNameNode (what determines the NameNode is the fs.default.name parameter in core-site.xml).

2. Add one parameter to hdfs-site.xml:
<property>
  <name>dfs.http.address</name>
  <value>(NameNode hostname or IP):50070</value>
</property>

3. Add two parameters to core-site.xml (the defaults are usually fine):
fs.checkpoint.period controls how often a checkpoint of the HDFS image is taken; the default is one hour.
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
</property>
fs.checkpoint.size is the edit-log size that forces a checkpoint even before the period has elapsed; the default is 64 MB.
<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
</property>

4. Verification
Run jps on the target machine and check that the SecondaryNameNode process has started.
Go to the directory configured by fs.checkpoint.dir (by default ${hadoop.tmp.dir}/dfs/namesecondary), list its contents with ll, then cd current and list again.
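As a concrete sketch of that check (the checkpoint path assumes the default location under the hadoop.tmp.dir configured earlier, /_work/hadoop/hadoopdata/tmp; adjust it to your own setting):

jps | grep SecondaryNameNode                                    # the process should be running on the target machine
ls -l /_work/hadoop/hadoopdata/tmp/dfs/namesecondary            # checkpoint directory (assumed default location)
ls -l /_work/hadoop/hadoopdata/tmp/dfs/namesecondary/current    # the checkpointed fsimage and edits files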

