My Hadoop: Hadoop 0.23 setup

1 Download 

Choose a mirror: http://www.apache.org/dyn/closer.cgi/hadoop/core/

For 0.23, download hadoop-0.23.0.tar.gz (I used the renren mirror).

1.1 untar 

tar zxfv hadoop-0.23.0.tar.gz
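
The rest of the commands in this guide use relative paths like bin/hadoop, so cd into the extracted directory first:

cd hadoop-0.23.0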

2 Run first hadoop program (locally)

2.1 compute pi

bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar pi -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory -libjars modules/hadoop-mapreduce-client-jobclient-0.23.0.jar 16 10000


Job Finished in 6.014 seconds
Estimated value of Pi is 3.14127500000000000000

2.2 word count

bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar wordcount -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory -libjars modules/hadoop-mapreduce-client-jobclient-0.23.0.jar LICENSE.txt output

The result is in the output dir.


Congratulations, you have just run your first MapReduce program.

Hadoop, of course, is built for parallel/distributed computing, so next let's set it up one node at a time.

3 Setup the first node (master)

3.1 SSH

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys


id_dsa.pub is the public key of localhost.

authorized_keys holds all the public keys trusted on the current host.

Append the localhost public key to authorized_keys, and you can ssh to localhost without a passphrase.


Similarly, you can append id_dsa.pub to another host's authorized_keys file; then you can ssh to that host without a passphrase, as sketched below.
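
A minimal sketch of copying the key over (assuming the other host, e.g. 172.16.100.130 from the cluster section, still allows password logins):

# Append the local public key to the remote authorized_keys
cat ~/.ssh/id_dsa.pub | ssh 172.16.100.130 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'

# Verify the passphraseless login works
ssh 172.16.100.130 hostname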

3.2 Config HDFS

etc/hadoop/core-site.xml (defaults are listed in core-default.xml)

<configuration>
     <property>
         <name>fs.defaultFS</name>
         <value>hdfs://172.16.100.122:9000</value>
     </property>
</configuration>

etc/hadoop/hdfs-site.xml (defaults are listed in hdfs-default.xml)

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
     <property>
         <name>dfs.namenode.name.dir</name>
         <value>file:/home/tntuser/hadoop-0.23.0/data/hdfs/namenode</value>
     </property>
     <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:/home/tntuser/hadoop-0.23.0/data/hdfs/datanode</value>
     </property>
</configuration>

A full URI is required for the name dir and data dir, e.g. file:/home/tntuser/hadoop-0.23.0/data/hdfs/namenode rather than a bare path.

3.3 Format HDFS

mkdir -p data/hdfs/namenode
mkdir -p data/hdfs/datanode
bin/hdfs namenode -format

3.4 Start HDFS

sbin/hadoop-daemon.sh start|stop namenode
sbin/hadoop-daemon.sh start|stop datanode


Check

jps should show NameNode and DataNode.
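
One quick way to check, since jps ships with the JDK:

# Both daemons should appear in the jps listing
jps | grep -E 'NameNode|DataNode'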


Run a few HDFS commands:

bin/hadoop fs -ls

bin/hadoop fs -mkdir test

bin/hadoop fs -rm -r test
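
As a fuller smoke test, you can round-trip a file through HDFS (a sketch using the LICENSE.txt that ships in the tarball):

# Upload a local file, read it back, then clean up
bin/hadoop fs -mkdir test
bin/hadoop fs -put LICENSE.txt test/
bin/hadoop fs -cat test/LICENSE.txt | head
bin/hadoop fs -rm -r test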

3.5 Config MapReduce

etc/hadoop/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
     <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
     </property>
</configuration>

conf/yarn-site.xml

<?xml version="1.0"?>
<configuration>
     <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce.shuffle</value>
     </property>
     <property>
         <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
         <value>org.apache.hadoop.mapred.ShuffleHandler</value>
     </property>
     <property>
         <name>yarn.resourcemanager.resource-tracker.address</name>
         <value>172.16.100.122:8025</value>
     </property>
     <property>
         <name>yarn.resourcemanager.scheduler.address</name>
         <value>172.16.100.122:8030</value>
     </property>
     <property>
         <name>yarn.resourcemanager.address</name>
         <value>172.16.100.122:8040</value>
     </property>
</configuration>

conf/yarn-env.sh

export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$YARN_HOME/etc/hadoop}"
export HADOOP_COMMON_HOME="${HADOOP_COMMON_HOME:-$YARN_HOME}"
export HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$YARN_HOME}"


The conf directory that comes with Hadoop is no longer the default configuration directory. Rather, Hadoop looks in etc/hadoop for configuration files.
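
If a daemon still picks up the wrong directory, one option (a sketch, using the install path from the configs above) is to export the location explicitly before starting anything:

# Point the hadoop/yarn scripts at the etc/hadoop config directory
export HADOOP_CONF_DIR=/home/tntuser/hadoop-0.23.0/etc/hadoop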

sbin/hadoop-daemon.sh calls hdfs-config.sh, which in turn calls $HADOOP_COMMON_HOME/libexec/hadoop-config.sh.

3.6 Start MapReduce (YARN) Daemon

bin/yarn-daemon.sh start resourcemanager
bin/yarn-daemon.sh start nodemanager
bin/yarn-daemon.sh start historyserver

The NodeManager may fail to start because port 8080 is already in use (e.g., by Tomcat). In that case, change the shuffle port:

conf/yarn-site.xml

<property>
  <name>mapreduce.shuffle.port</name>
  <value>8090</value>
</property>
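
To confirm the port conflict before changing anything, a quick sketch (netstat flags vary by distribution; lsof -i :8080 works too):

# Show which process is listening on port 8080
netstat -tlnp | grep ':8080'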

4 Run the Hadoop program on a single node

MapReduce JobHistory Server: http://jhs_host:port/ (default HTTP port 19888). In the job details you can see which node executed each task.

NameNode: http://nn_host:port/ (default HTTP port 50070); browse HDFS and the HDFS nodes.

ResourceManager: http://rm_host:port/ (default HTTP port 8088); browse the MapReduce nodes.
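
A scripted check of the three web UIs, as a sketch (assuming the daemons run on this host and use the default ports above):

# A 200 or a redirect code means the daemon is answering
for port in 50070 8088 19888; do
  echo -n "port $port: "
  curl -s -o /dev/null -w '%{http_code}\n' "http://localhost:$port/"
done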

5 Setup the slave node

5.1 untar on the slave

5.2 copy config from master

scp 172.16.100.122:/home/tntuser/hadoop-0.23.0/etc/hadoop/*.xml etc/hadoop

scp 172.16.100.122:/home/tntuser/hadoop-0.23.0/conf/yarn-* conf

5.3 (re)format HDFS on the master

Shut down the daemons on the master first.

bin/hdfs namenode -format -clusterid hadoop_cluster

5.4 add slave hosts

conf/slaves

172.16.100.122
172.16.100.130

The first entry is the master, which also runs slave daemons.

5.5 Start Master Daemons

sbin/hadoop-daemon.sh start|stop namenode
sbin/hadoop-daemon.sh start|stop datanode

bin/yarn-daemon.sh start resourcemanager
bin/yarn-daemon.sh start nodemanager
bin/yarn-daemon.sh start historyserver

5.6 Start Slave Daemons

sbin/hadoop-daemon.sh start|stop datanode

bin/yarn-daemon.sh start nodemanager

6 Run the Hadoop program in the cluster

issue 1: the temp directory already exists


 hdfs://172.16.100.122:9000/user/tntuser/QuasiMonteCarlo_TMP_3_141592654 already exists.  Please remove it first.

bin/hadoop fs -rm -r QuasiMonteCarlo_TMP_3_141592654
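
The pi example builds the temp directory name from its arguments, so if you have several leftovers a glob cleans them all up (a sketch; make sure nothing else matches the pattern):

# Remove all leftover pi-example temp directories
bin/hadoop fs -rm -r 'QuasiMonteCarlo_TMP_*'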


issue 2: FileNotFoundException on the reduce output

java.io.FileNotFoundException: File does not exist: hdfs://172.16.100.122:9000/user/tntuser/QuasiMonteCarlo_TMP_3_141592654/out/reduce-out
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:764)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1614)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1638)
	at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
	at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:351)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
	at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:360)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:4

1) Check the DNS config in /etc/resolv.conf; make sure the nameserver is right.

2) Add the master and slave hostnames to each other's /etc/hosts:

172.16.100.122          dev122
172.16.100.130          dev130

3) Check the Hadoop slaves config file conf/slaves; make sure each hostname or IP is right. You can verify name resolution as sketched below.
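
A sketch of verifying the fixes from each node (getent consults /etc/hosts as well as DNS; run it on both master and slave):

# Each lookup should return the address configured above
getent hosts dev122
getent hosts dev130

# The nodes should also reach each other
ping -c 1 dev130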

