Installation environment:
Hyper-V 2008 R2, RHEL 5.5, Hadoop 2.2
Plan:
Build a one-master, two-slave cluster providing distributed storage (HDFS) and MapReduce computation:
Master (namenode + resourcemanager): 16.158.49.120, h1.dssdev
Slave 1 (datanode + nodemanager): 16.158.49.121, h2.dssdev
Slave 2 (datanode + nodemanager): 16.158.49.123, h3.dssdev
Steps:
- Set up the proxy server:
vim /etc/profile
>http_proxy=proxy.houston.hp.com:8080
>https_proxy=proxy.houston.hp.com:8080
>ftp_proxy=proxy.houston.hp.com:8080
>no_proxy=127.0.0.1,localhost
>export http_proxy https_proxy ftp_proxy no_proxy
source /etc/profile
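The exports above can be sanity-checked in the current shell; a minimal sketch (the proxy value is this guide's HP-internal one, an assumption to adjust for your site):

```shell
# Set the proxy variables and confirm they are visible to child processes.
# proxy.houston.hp.com:8080 is this guide's HP-internal proxy; adjust for your site.
export http_proxy=proxy.houston.hp.com:8080
export no_proxy=127.0.0.1,localhost
env | grep '_proxy'
```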
- Create a hadoop user account on h1.dssdev:
useradd hadoop
passwd hadoop
- Edit the hosts file (otherwise Hadoop logs "metrics.MetricsUtil: Unable to obtain hostName"):
vim /etc/hosts
>16.158.49.120 h1.dssdev h1
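Since the master later reaches the slaves by hostname (scp/ssh), /etc/hosts on every node should map all three machines; a sketch using this guide's addresses:

```
16.158.49.120 h1.dssdev h1
16.158.49.121 h2.dssdev h2
16.158.49.123 h3.dssdev h3
```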
- Remove the GCJ-based JDK that ships with the OS, and install the official JDK
List the bundled JDK packages:
rpm -qa | grep gcj
The output looks like:
libgcj-4.1.2-44.el5
java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
Remove what was found with rpm -e --nodeps:
rpm -e --nodeps java-1.4.2-gcj-compat-1.4.2.0-40jpp.115
Download and install JDK 1.6 (install the full JDK, not just the JRE; the JRE lacks the jps tool)
- Download the Hadoop 2.2.0 binary release
http://www.apache.org/dyn/closer.cgi/hadoop/common/
Fix the tarball's ownership and permissions:
sudo chown hadoop:hadoop hadoop-2.2.0.tar.gz
sudo chmod 775 hadoop-2.2.0.tar.gz
Extract it to /usr/local/hadoop and set the environment variables:
vim /etc/profile
>export JAVA_HOME=/usr/java/jdk1.6.0_45
>export HADOOP_HOME=/usr/local/hadoop/hadoop-2.2.0
>export HADOOP_DEV_HOME=$HADOOP_HOME
>export HADOOP_MAPRED_HOME=${HADOOP_DEV_HOME}
>export HADOOP_COMMON_HOME=${HADOOP_DEV_HOME}
>export HADOOP_HDFS_HOME=${HADOOP_DEV_HOME}
>export HADOOP_PREFIX=${HADOOP_DEV_HOME}
>export YARN_HOME=${HADOOP_DEV_HOME}
>export HADOOP_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
>export HDFS_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
>export YARN_CONF_DIR=${HADOOP_DEV_HOME}/etc/hadoop
>export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
>export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_DEV_HOME/bin:$HADOOP_DEV_HOME/sbin
source /etc/profile
- Configure the Hadoop nodes (the files below live in $HADOOP_CONF_DIR)
core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://h1.dssdev:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>16.158.49.120</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoop/hdfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoop/hdfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.federation.nameservice.id</name>
<value>ns1</value>
</property>
<property>
<name>dfs.namenode.backup.address.ns1</name>
<value>16.158.49.120:50100</value>
</property>
<property>
<name>dfs.namenode.backup.http-address.ns1</name>
<value>16.158.49.120:50105</value>
</property>
<property>
<name>dfs.federation.nameservices</name>
<value>ns1</value>
</property>
<property>
<name>dfs.namenode.rpc-address.ns1</name>
<value>16.158.49.120:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.ns1</name>
<value>16.158.49.120:23001</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address.ns1</name>
<value>16.158.49.120:23002</value>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.address</name>
<value>16.158.49.120:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>16.158.49.120:18030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>16.158.49.120:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>16.158.49.120:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>16.158.49.120:18141</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
masters (file etc/hadoop/masters):
16.158.49.120
slaves (file etc/hadoop/slaves):
16.158.49.121
16.158.49.123
~~~~~~ Repeat the steps above on each of the other nodes ~~~~~~
- Set up passwordless SSH from the master to the slaves
First generate a key pair (one public key, one private key) on h1.dssdev and copy the public key to every slave (h2.dssdev & h3.dssdev). When the master later connects to a slave over SSH, the slave generates a random number, encrypts it with the master's public key, and sends it to the master. The master decrypts it with its private key and returns the result; once the slave confirms the decrypted value is correct, it allows the master to connect without a password.
1. Run ssh-keygen -t rsa and press Enter through all the prompts, then inspect the newly generated key pair: cd ~/.ssh and run ll
2. Append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
3. Fix the permissions: chmod 600 ~/.ssh/authorized_keys
4. Make sure /etc/ssh/sshd_config contains the following
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
If you had to change it, restart the SSH service afterwards so it takes effect: service sshd restart
5. Copy the public key to each slave machine: scp ~/.ssh/id_rsa.pub hadoop@16.158.49.121:~/ then answer yes and enter the slave machine's password
6. On the slave, create the .ssh folder: mkdir ~/.ssh then chmod 700 ~/.ssh (not needed if the folder already exists)
7. Append the key to the authorized_keys file: cat ~/id_rsa.pub >> ~/.ssh/authorized_keys then chmod 600 ~/.ssh/authorized_keys
8. Repeat step 4 on the slave
9. Verify: on the master run ssh 16.158.49.121; if the shell prompt's hostname changes from h1 to h2, it works. Finally delete the copied key file: rm ~/id_rsa.pub
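Steps 1 through 3 can be run non-interactively; a minimal sketch for the master, assuming the hadoop user (-N '' makes a passphrase-less key, which is what passwordless login requires):

```shell
# Generate a passphrase-less key pair (skipped if one already exists)
# and authorize it locally; the same append is repeated on each slave.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa -q
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```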
- Start the cluster:
On the master, format HDFS first (one time only, before the first start):
/usr/local/hadoop/hadoop-2.2.0/bin/hdfs namenode -format
Then start the daemons:
cd /usr/local/hadoop/hadoop-2.2.0/sbin
./start-dfs.sh
or
./hadoop-daemon.sh start namenode
./hadoop-daemons.sh start datanode
./start-yarn.sh
or
./yarn-daemon.sh start resourcemanager
./yarn-daemons.sh start nodemanager
./mr-jobhistory-daemon.sh start historyserver
Run jps on the master:
29043 JobHistoryServer
28625 NameNode
28761 ResourceManager
2869 NodeManager
2710 DataNode
2902 Jps
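On each slave, jps should show only the worker daemons (PIDs omitted here; yours will differ):

```
DataNode
NodeManager
Jps
```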
- Verification
mkdir -p /usr/local/hadoop/hadoop-2.2.0/input
cd /usr/local/hadoop/hadoop-2.2.0
cat > input/file.txt    (type the two lines below, then press Ctrl-D)
This is one line
This is another one
cd bin
./hdfs dfs -mkdir -p /user/input
./hdfs dfs -copyFromLocal /usr/local/hadoop/hadoop-2.2.0/input/file.txt /user/input/
./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar grep /user/input /user/output 'i'
./hdfs dfs -cat /user/output/*
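The grep example job counts matches of the given regex ('i' here) in the input; a quick local cross-check of that count with plain grep:

```shell
# Count occurrences of 'i' in the same sample text with plain grep;
# the MapReduce grep job over /user/input should report the same total.
printf 'This is one line\nThis is another one\n' > file.txt
grep -o 'i' file.txt | wc -l    # 5
rm file.txt
```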
- Web interfaces
1. http://h1.dssdev:23001/dfshealth.jsp (dfs.namenode.http-address as configured above; the default would be 50070)
2. http://h1.dssdev:18088/cluster (yarn.resourcemanager.webapp.address as configured above; the default would be 8088)
3. http://h1.dssdev:19888/jobhistory