A rare free weekend, so I spent some time learning Hadoop and took these notes.
Test environment:
Debian 6.0
VirtualBox 4.1.4
Hadoop 0.21.0
Linux configuration:
/etc/hosts
127.0.0.1    localhost
192.168.1.11 node1
192.168.1.12 node2
192.168.1.13 node3
192.168.1.14 node4
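As an optional sanity check, you can ping each node by name from any of the VMs to confirm the mappings work:

ping -c 1 node2
ping -c 1 node3
ping -c 1 node4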
/etc/resolv.conf
nameserver 8.8.8.8
nameserver 8.8.4.4
/etc/hostname
node1 # node1 ~ node4
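The new name only takes effect after a reboot; to apply it immediately you can also set it by hand on each VM:

hostname node1   # node2 ~ node4 on the other machines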
/etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 192.168.1.11   # 11 ~ 14
    netmask 255.255.255.0
    network 192.168.1.0
    broadcast 192.168.1.255
    gateway 192.168.1.1
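For the static address to take effect, restart networking on each VM (or simply reboot it):

/etc/init.d/networking restart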
Hadoop environment:
NameNode: node1
JobTracker: node2
DataNode & TaskTracker: node3 & node4
# conf/slaves
192.168.1.13
192.168.1.14
On all four virtual machines:

apt-get install ssh
apt-get install rsync
ssh-keygen   # just press Enter at every prompt

On the host machine (a little more convenient):

scp root@192.168.1.11:/root/.ssh/id_rsa.pub node1_pub
scp root@192.168.1.12:/root/.ssh/id_rsa.pub node2_pub
cat node1_pub node2_pub >> authorized_keys
scp authorized_keys root@192.168.1.13:/root/.ssh/
scp authorized_keys root@192.168.1.14:/root/.ssh/
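Before going any further, it is worth checking that node1 and node2 can now log in to the slaves without a password, because start-dfs.sh and start-mapred.sh rely on exactly that. For example, from node1:

ssh root@192.168.1.13 hostname   # should print node3 without asking for a password
ssh root@192.168.1.14 hostname   # likewise node4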
Still on the host machine: download and unpack hadoop-0.21.0.tar.gz, then edit the configuration files.
conf/hadoop-env.sh: just change JAVA_HOME (adjust to your own setup)
export JAVA_HOME=/root/jdk1.6.0_29
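A quick way to confirm the path actually points at a working JDK (run on any node where the JDK is unpacked):

/root/jdk1.6.0_29/bin/java -version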
conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.11:5161/</value>
  </property>
</configuration>
conf/hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/root/hadoop-0.21.0/var/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/root/hadoop-0.21.0/var/data</value>
  </property>
  <property>
    <!-- 134217728 bytes = 128 MB -->
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
</configuration>
conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>192.168.1.12:5162</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>/root/hadoop-0.21.0/var/system</value>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/root/hadoop-0.21.0/var/local</value>
  </property>
</configuration>
My /root/.bashrc is attached below as well. Note the HADOOP_HOME setting, otherwise you may get errors at startup.
PATH=$PATH:/root/jdk1.6.0_29/bin:/root/jdk1.6.0_29/jre/bin
JAVA_HOME=/root/jdk1.6.0_29
JRE_HOME=/root/jdk1.6.0_29/jre
CLASSPATH=.:/root/jdk1.6.0_29/lib/tools.jar:/root/jdk1.6.0_29/lib/dt.jar
HADOOP_HOME=/root/hadoop-0.21.0
export PATH
export JRE_HOME
export JAVA_HOME
export CLASSPATH
export HADOOP_HOME
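HADOOP_HOME has to be set on every machine that actually runs the daemons, so this .bashrc (or an equivalent one) needs to end up on all four VMs as well. A simple way to push it out, assuming you keep it at /root/.bashrc on the host:

for ip in 192.168.1.11 192.168.1.12 192.168.1.13 192.168.1.14; do
    scp /root/.bashrc root@$ip:/root/
done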
Copy the configured Hadoop to the virtual machines:
scp -r hadoop-0.21.0 root@192.168.1.11:/root/
scp -r hadoop-0.21.0 root@192.168.1.12:/root/
scp -r hadoop-0.21.0 root@192.168.1.13:/root/
scp -r hadoop-0.21.0 root@192.168.1.14:/root/
Start the services
Start HDFS on node1:
bin/hadoop namenode -format   # only the first time
bin/start-dfs.sh
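To confirm HDFS really came up, jps should show a NameNode process on node1 and a DataNode on node3/node4, and a simple filesystem command should succeed:

jps
bin/hadoop fs -ls /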
Start MapReduce on node2:
bin/start-mapred.sh
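Likewise, jps on node2 should now show a JobTracker, and a TaskTracker on node3/node4. As an end-to-end smoke test you can run one of the bundled example jobs (the jar name below is my assumption of what 0.21.0 ships with; adjust if yours differs):

bin/hadoop jar hadoop-mapred-examples-0.21.0.jar pi 2 10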
Lessons learned
1. Startup hung. The logs showed an exception with a name along the lines of "unresolved hostname" (I forget the exact name); the cause was forgetting to map the hostnames in /etc/hosts.
2. "Hadoop common not found": this is mostly caused by the HADOOP_HOME environment variable not being set.
3. There is actually no need to authorize the slaves on the master (slave-to-master passwordless login); it basically never gets used, since the services are started directly from the master.