Hadoop v2 (YARN) cluster configuration (Ubuntu 12.04)

Hadoop version: 2.0.2-alpha (YARN)

Add a user

  sudo addgroup hadoop

  sudo adduser --ingroup hadoop hadoop

  sudo vim /etc/sudoers

  Add the line: hadoop ALL=(ALL:ALL) ALL

Set environment variables

sudo vim /etc/profile

export JAVA_HOME=/usr/lib/jvm/java-7-sun

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

export HADOOP_PREFIX=/home/hadoop/hadoop-2.0.2-alpha

export HADOOP_HOME=$HADOOP_PREFIX
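
A wrong JAVA_HOME or HADOOP_PREFIX usually only shows up later, when the daemons fail to start, so it can help to check the variables right after editing /etc/profile. The following is a minimal sketch; check_dir_var is a hypothetical helper name, not something shipped with Hadoop.

```shell
# check_dir_var is a hypothetical helper (not part of Hadoop).
# It reports whether the directory named by an environment variable exists.
check_dir_var() {
    name=$1
    eval "value=\${$name:-}"   # indirect lookup that also works in plain sh
    if [ -n "$value" ] && [ -d "$value" ]; then
        echo "$name ok: $value"
    else
        echo "$name is unset or not a directory: '$value'" >&2
        return 1
    fi
}
```

After reloading the profile with `. /etc/profile`, it could be called as `check_dir_var JAVA_HOME` and `check_dir_var HADOOP_PREFIX`.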

Edit hosts

sudo vim /etc/hosts

192.168.0.1  mcw-cc-nachuang
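
Every node has to resolve the master's hostname the same way, so it is worth confirming that the entry really is in the hosts file. A small sketch, assuming the usual "IP name [aliases...]" layout; host_in_file is a hypothetical helper name.

```shell
# host_in_file is a hypothetical helper: succeed if NAME appears as a
# hostname or alias on a non-comment line of the given hosts file.
host_in_file() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # stop at a trailing comment
                if ($i == n) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

For example, `host_in_file mcw-cc-nachuang && echo "entry present"`. On a running system, `getent hosts mcw-cc-nachuang` shows what the resolver actually returns.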

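Set up passwordless SSH login:

  1. Install openssh-server:

    sudo apt-get install openssh-server

  2. Generate an RSA key pair with an empty passphrase:

    ssh-keygen -t rsa -P ""

    Go to the ~/.ssh/ directory and append id_rsa.pub to the authorized_keys file:

    cat id_rsa.pub >> authorized_keys

    Then copy id_rsa.pub to the other slaves and add it to their authorized_keys as well.

    Repeat the same steps on every slave, so that the slaves and the master can log in to each other without a password.

  3. Disable the firewall with sudo ufw disable (optional, but recommended for beginners).

After configuration, use ssh to verify that every slave and the master can log in to each other without a password.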
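
The SSH check can be scripted instead of done by hand. The sketch below relies on ssh's BatchMode option, which makes ssh fail rather than prompt for a password. check_ssh and the SSH_BIN override are hypothetical names; the override exists only so the loop can be tried without a cluster.

```shell
# check_ssh is a hypothetical helper. With BatchMode=yes ssh never prompts,
# so a host that still asks for a password is reported as a failure.
check_ssh() {
    failed=0
    for host in "$@"; do
        if ${SSH_BIN:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: passwordless login FAILED"
            failed=1
        fi
    done
    return $failed
}
```

On the master it could be run as `check_ssh mcw-cc-nachuang` followed by the slave hostnames, and then repeated from each slave.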


Configuration files under etc/hadoop/:

1. hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-sun

2. core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://mcw-cc-nachuang:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/tmp</value>
</property>
</configuration>

3. hdfs-site.xml

<configuration>
<property>  
<name>dfs.name.dir</name>  
<value>/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/hdfs/name</value>  
</property>  
<property>  
<name>dfs.data.dir</name>  
<value>/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/hdfs/data</value>
</property>  
<property>  
<name>dfs.replication</name>  
<value>1</value> 
</property>
<property>  
<name>dfs.block.size</name>  
<value>268435456</value>  
</property>
</configuration>
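
The dfs.block.size value above is given in bytes. A quick arithmetic check shows that 268435456 corresponds to a 256 MiB block; mib_to_bytes is a hypothetical helper name.

```shell
# mib_to_bytes is a hypothetical helper: convert MiB to bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 256    # prints 268435456, the value used for dfs.block.size
```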
 

4. mapred-site.xml

<configuration>  
<property>  
<name>mapreduce.framework.name</name>  
<value>yarn</value>  
</property>  
<property>  
<name>mapreduce.job.tracker</name>  
<value>hdfs://mcw-cc-nachuang:9001</value>  
<final>true</final>  
</property>  
<property>
<name>mapred.job.map.memory.mb</name>
<value>1700</value>
</property>

<property>
<name>mapred.job.reduce.memory.mb</name>
<value>1700</value>
</property>

<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1400m</value>
</property>
<property>  
<name>mapreduce.task.io.sort.mb</name>  
<value>400</value>  
</property>  
<property>  
<name>mapreduce.task.io.sort.factor</name>  
<value>10</value>  
</property>  
<property>  
<name>mapred.system.dir</name>  
<value>file:/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/mapred/system</value>  
<final>true</final>  
</property>  
<property>  
<name>mapred.local.dir</name>  
<value>file:/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/mapred/local</value>  
<final>true</final>  
</property>  
<property>
<name>mapred.reduce.slowstart.completed.maps</name>
<value>1</value>
</property>
</configuration>  
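
The memory values above depend on each other: the task JVM heap (-Xmx1400m) has to fit inside the 1700 MB requested per task, and the sort buffer (mapreduce.task.io.sort.mb) is taken from that heap. A rough sanity check, with the numbers copied from this file; fits is a hypothetical helper name.

```shell
container_mb=1700   # mapred.job.map.memory.mb / mapred.job.reduce.memory.mb
heap_mb=1400        # -Xmx1400m in mapred.child.java.opts
sort_mb=400         # mapreduce.task.io.sort.mb

# fits is a hypothetical helper: true when the first value is below the second.
fits() { [ "$1" -lt "$2" ]; }

fits "$heap_mb" "$container_mb" &&
    echo "heap leaves $(( container_mb - heap_mb )) MB for non-heap memory"
fits "$sort_mb" "$heap_mb" &&
    echo "sort buffer leaves $(( heap_mb - sort_mb )) MB of heap"
```

With these values the heap leaves 300 MB of headroom in the container and the sort buffer leaves 1000 MB of heap.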

5. yarn-site.xml

<configuration>  
<property>  
<name>yarn.resourcemanager.address</name>  
<value>mcw-cc-nachuang:9080</value>  
</property>  
<property>  
<name>yarn.resourcemanager.scheduler.address</name>  
<value>mcw-cc-nachuang:9081</value>  
</property>  
<property>  
<name>yarn.resourcemanager.resource-tracker.address</name>  
<value>mcw-cc-nachuang:9082</value>  
</property>  
<property>  
<name>yarn.nodemanager.aux-services</name>  
<value>mapreduce.shuffle</value>  
</property>  
<property>  
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>  
<value>org.apache.hadoop.mapred.ShuffleHandler</value>  
</property>  
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>6</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10240</value>
</property>
</configuration>  
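
The last two properties bound how many tasks a node can run at once. The estimate below is only an upper bound under simple assumptions: it ignores the ApplicationMaster's container and any rounding the scheduler applies to memory requests, so the real number may be lower.

```shell
node_mb=10240       # yarn.nodemanager.resource.memory-mb
container_mb=1700   # per-task memory requested in mapred-site.xml
vmem_ratio=6        # yarn.nodemanager.vmem-pmem-ratio

echo "containers per node (upper bound): $(( node_mb / container_mb ))"
echo "virtual memory limit per container: $(( container_mb * vmem_ratio )) MB"
```

For these settings that works out to at most 6 task containers per node, each allowed up to 10200 MB of virtual memory.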

Verification:

Run the following from the hadoop-2.0.2-alpha directory:

bin/hadoop namenode -format 

sbin/start-all.sh

If jps then lists the following 5 services (in addition to the Jps process itself), the cluster started successfully:

19166 SecondaryNameNode
19566 NodeManager
20254 Jps
19321 ResourceManager
18610 NameNode
18850 DataNode
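
Checking the jps output by eye is easy to get wrong, partly because "NameNode" is also a substring of "SecondaryNameNode". The sketch below compares whole process names; missing_daemons is a hypothetical helper name, and the list of daemons is the one shown above for a single-node setup.

```shell
# missing_daemons is a hypothetical helper: read jps output on stdin and
# print each expected daemon that is absent. Exit status is non-zero if
# anything is missing.
missing_daemons() {
    output=$(cat)
    status=0
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # compare against the whole second field so that "NameNode"
        # does not match "SecondaryNameNode"
        if ! printf '%s\n' "$output" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit (ok ? 0 : 1) }'; then
            echo "$daemon"
            status=1
        fi
    done
    return $status
}
```

Usage: `jps | missing_daemons && echo "all daemons running"`.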

If startup succeeded, the HDFS web UI can be opened in a browser at: http://mcw-cc-nachuang:50070

Generate the input data for terasort:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.2-alpha.jar teragen 1000000 teradata/input100m
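
The first teragen argument is a row count, not a size. Each generated row is 100 bytes, which is why 1000000 rows match the "100m" in the directory name. teragen_bytes is a hypothetical helper name.

```shell
# teragen_bytes is a hypothetical helper: teragen writes 100-byte rows,
# so the data size in bytes is rows * 100.
teragen_bytes() {
    echo $(( $1 * 100 ))
}

teragen_bytes 1000000    # prints 100000000, i.e. 100 MB
```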

Run terasort:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.2-alpha.jar terasort teradata/input100m teradata/output100m

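Note: with the default settings in Hadoop 2.0.3, running the terasort cases fails with java.lang.OutOfMemoryError: Java heap space. To fix this, edit the line export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS" in hadoop-env.sh (the default maximum JVM heap set there is 128m) and change it to 256m.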
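
The heap change can also be scripted. This is a minimal sketch that assumes hadoop-env.sh contains the HADOOP_CLIENT_OPTS line exactly as quoted above; raise_client_heap is a hypothetical helper name, and it keeps a .bak copy of the original file.

```shell
# raise_client_heap is a hypothetical helper. It rewrites only the -Xmx
# value on the HADOOP_CLIENT_OPTS line and leaves a backup in FILE.bak.
raise_client_heap() {
    file=$1
    new=${2:-256m}
    cp "$file" "$file.bak"
    sed "/HADOOP_CLIENT_OPTS=/ s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx$new/" "$file.bak" > "$file"
}
```

From the hadoop-2.0.2-alpha directory it could be run as `raise_client_heap etc/hadoop/hadoop-env.sh`, with an optional second argument such as `512m` for a different limit.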
