Hadoop version: 2.0.2 (YARN)
Add a user
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
sudo vim /etc/sudoers
Add the line: hadoop ALL=(ALL:ALL) ALL
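The Hadoop-specific steps further down (SSH keys, config files, starting the daemons) are assumed to run as this new user; switch to it when needed:
su - hadoop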
Set environment variables
vim /etc/profile
export JAVA_HOME=/usr/lib/jvm/java-7-sun
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export HADOOP_PREFIX=/home/hadoop/hadoop-2.0.2-alpha
export HADOOP_HOME=$HADOOP_PREFIX
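Reload the profile so the variables take effect in the current shell:
source /etc/profile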
Edit the hosts file
vim /etc/hosts
192.168.0.1 mcw-cc-nachuang
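If the cluster has slave nodes, every node's /etc/hosts should list all hosts in the cluster; a sketch with a hypothetical slave entry (the slave hostname and IP below are assumptions):
192.168.0.1 mcw-cc-nachuang
192.168.0.2 slave1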
Set up passwordless SSH login:
1. Install openssh-server
sudo apt-get install openssh-server
2. ssh-keygen -t rsa -P ""
Go into the ~/.ssh/ directory and append id_rsa.pub to the authorized_keys file:
cat id_rsa.pub >> authorized_keys
Then import id_rsa.pub into each of the slaves as well (a sketch follows below).
The slaves do the same in return, so that the slaves and the master can all log in to each other without a password.
3. Disable the firewall: sudo ufw disable (optional, but recommended for beginners)
After everything is configured, verify over ssh that every slave and the master can log in to each other without a password.
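A minimal sketch of pushing the master's key to a slave, assuming a hypothetical slave host named slave1 and the hadoop user on both machines:
ssh-copy-id hadoop@slave1
# or, if ssh-copy-id is not available:
cat ~/.ssh/id_rsa.pub | ssh hadoop@slave1 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
ssh hadoop@slave1   # should now log in without a password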
Configuration files under etc/hadoop/:
1. hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-sun
2. core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://mcw-cc-nachuang:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/tmp</value>
</property>
</configuration>
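Note: fs.default.name is the legacy key; in 2.x the preferred key is fs.defaultFS, but the old name is still accepted (with a deprecation warning).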
3. hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<value>/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.block.size</name>
<value>268435456</value>
</property>
</configuration>
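dfs.block.size here is in bytes (268435456 = 256 MB). Optionally, the local directories referenced in core-site.xml and hdfs-site.xml can be created ahead of time (format/start will normally create them, but pre-creating avoids permission surprises):
mkdir -p /home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/tmp
mkdir -p /home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/hdfs/name
mkdir -p /home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/hdfs/data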
4. mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.job.tracker</name>
<value>hdfs://mcw-cc-nachuang:9001</value>
<final>true</final>
</property>
<property>
<name>mapred.job.map.memory.mb</name>
<value>1700</value>
</property>
<property>
<name>mapred.job.reduce.memory.mb</name>
<value>1700</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1400m</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>400</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>10</value>
</property>
<property>
<name>mapred.system.dir</name>
<value>file:/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/mapred/system</value>
<final>true</final>
</property>
<property>
<name>mapred.local.dir</name>
<value>file:/home/nachuang/Workspace/Hadoop/hadoop-2.0.2-alpha/mapred/local</value>
<final>true</final>
</property>
<property>
<name>mapred.reduce.slowstart.completed.maps</name>
<value>1</value>
</property>
</configuration>
5. yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>mcw-cc-nachuang:9080</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>mcw-cc-nachuang:9081</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>mcw-cc-nachuang:9082</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>6</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>10240</value>
</property>
</configuration>
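A quick sanity check of the memory numbers: each map/reduce task asks for 1700 MB while a NodeManager offers 10240 MB, i.e. roughly 10240 / 1700 ≈ 6 containers per node (possibly fewer in practice, since YARN may round each request up to the scheduler's minimum-allocation increment), and the -Xmx1400m child heap leaves headroom inside the 1700 MB container.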
Verification:
In the hadoop-2.0.2-alpha directory, run:
bin/hadoop namenode -format
sbin/start-all.sh
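In 2.x, start-all.sh is just a wrapper (it may print a deprecation notice); the HDFS and YARN daemons can also be started separately:
sbin/start-dfs.sh
sbin/start-yarn.sh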
If jps then shows the following five services (plus the Jps process itself), the startup succeeded:
19166 SecondaryNameNode
19566 NodeManager
20254 Jps
19321 ResourceManager
18610 NameNode
18850 DataNode
If startup succeeded, the HDFS web UI can be opened in a browser at http://mcw-cc-nachuang:50070
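Unless yarn.resourcemanager.webapp.address is set explicitly, the YARN ResourceManager web UI should also be reachable at its default port: http://mcw-cc-nachuang:8088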
Generate the input data for TeraSort:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.2-alpha.jar teragen 1000000 teradata/input100m
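Each TeraGen row is 100 bytes, so 1,000,000 rows come to roughly 100 MB, matching the input100m directory name. The generated files can be listed with:
bin/hadoop fs -ls teradata/input100m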
Run TeraSort:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.2-alpha.jar terasort teradata/input100m teradata/output100m
Note: with the default settings (observed on Hadoop 2.0.3), running the terasort case throws java.lang.OutOfMemoryError: Java heap space. Edit the line export HADOOP_CLIENT_OPTS="-Xmx128m $HADOOP_CLIENT_OPTS" in hadoop-env.sh (the client JVM's default maximum heap is 128 MB) and raise it to 256m.
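The change is a single line in etc/hadoop/hadoop-env.sh:
export HADOOP_CLIENT_OPTS="-Xmx256m $HADOOP_CLIENT_OPTS"
The sorted output can then be checked with the teravalidate example from the same jar (the report directory name below is only an illustration):
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.0.2-alpha.jar teravalidate teradata/output100m teradata/validate100m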