Hadoop 2.2.0 Installation
1. Master Node
1.1 Preliminary Setup
1) /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=master
GATEWAY=192.168.178.254
2) /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.178.181 master
192.168.178.182 slave1
192.168.178.183 slave2
192.168.178.184 slave3
3) Install vsftpd (requires network access)
1) yum -y install vsftpd
2) touch /var/log/vsftpd.log
3) chkconfig --list |grep vsftpd
4) chkconfig vsftpd on
5) vim /etc/vsftpd/vsftpd.conf (for the changes, see the 虾皮 (Xiapi) Hadoop installation guide)
6) getsebool -a | grep ftp (check the SELinux booleans for FTP)
7) setsebool -P ftp_home_dir 1
8) setsebool -P allow_ftpd_full_access 1
9) service vsftpd restart
4) Disable iptables (the firewall)
Run: chkconfig iptables off (and service iptables stop to stop it immediately)
1.2 Passwordless SSH Login
1) SSH key setup
a) (as the hadoop user) ssh-keygen -t rsa -P ''
b) cat ~/.ssh/id_rsa.pub >>~/.ssh/authorized_keys
c) chmod 600 ~/.ssh/authorized_keys
d) vim /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
e) service sshd restart
f) Verify: ssh localhost
2) Configure passwordless login from master to the slaves
a) scp ~/.ssh/id_rsa.pub hadoop@slave1:~/
b) (hadoop@slave) cat ~/id_rsa.pub >>~/.ssh/authorized_keys
c) chmod 600 ~/.ssh/authorized_keys
d) vim /etc/ssh/sshd_config
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile .ssh/authorized_keys
e) service sshd restart
3) Configure passwordless login from the slaves to master (a consolidated sketch follows this step)
a) (hadoop@slave) ssh-keygen -t rsa -P ''
b) scp ~/.ssh/id_rsa.pub hadoop@master:~/
c) (hadoop@master) cat ~/id_rsa.pub >>~/.ssh/authorized_keys
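If ssh-copy-id is available (it ships with openssh-clients on CentOS), steps 2) and 3) can be condensed into one loop per direction. A minimal sketch, assuming the hadoop user exists on every node and the hostnames from /etc/hosts above:
# On master, as hadoop: push the public key to every slave in one pass
for host in slave1 slave2 slave3; do
  ssh-copy-id hadoop@$host   # appends ~/.ssh/id_rsa.pub to the remote authorized_keys
done
ssh slave1 hostname          # verify: should print "slave1" with no password prompt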
1.3 JDK Installation
1) Remove any existing JDK:
a) rpm -qa | grep java
b) yum -y remove java-1.7.0-openjdk-1.7.0.9-2.3.4.1.el6_3.x86_64
2) Install the JDK (RPM package)
a) rpm -ivh jdk-7u51-linux-x64.rpm
b) vim /etc/profile
#set java environment
export JAVA_HOME=/usr/java/jdk1.7.0_51
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# set hadoop path
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_LOG_DIR=/usr/hadoop/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
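After editing /etc/profile, reload it and confirm that both environments resolve. A quick check, assuming the paths above:
source /etc/profile
java -version      # should report java version "1.7.0_51"
echo $JAVA_HOME    # /usr/java/jdk1.7.0_51
echo $HADOOP_HOME  # /usr/hadoop (installed in 1.4)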
1.4 Installing Hadoop 2.2.0
1) mv hadoop-2.2.0.tar.gz /usr/
2) cd /usr
3) tar -zxvf hadoop-2.2.0.tar.gz
4) mv hadoop-2.2.0 hadoop
5) chown -R hadoop:hadoop hadoop
6) rm -rf hadoop-2.2.0.tar.gz
7) mkdir /usr/hadoop/tmp
8) mkdir /usr/hadoop/dfs
9) mkdir /usr/hadoop/dfs/name
10) vim /etc/profile to set the Hadoop paths (see 1.3 -> 2) -> b))
11) For the configuration files, see: http://download.csdn.net/detail/zzzzzqf/7019251
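To confirm the unpacked tree is usable before moving on, a quick check as the hadoop user (assuming the layout created above):
su - hadoop
cd /usr/hadoop
./bin/hadoop version   # should print "Hadoop 2.2.0"
ls etc/hadoop          # the configuration files edited in section 3 live here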
1.5 Hadoop Time Synchronization
1) On the server (master)
a) Install ntp
b) yum -y install ntp
2) On the slaves that need to be synchronized
a) (as root) /usr/sbin/ntpdate master (manual)
b) (automatic) # vi /var/spool/cron/root
0 1 * * * /usr/sbin/ntpdate master
For details, see: http://cyr520.blog.51cto.com/714067/746905
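For the slaves to sync from master, the master's ntpd must be running and serving the cluster subnet. A minimal sketch, assuming the 192.168.178.0/24 network used throughout this guide:
# On master, add to /etc/ntp.conf (assumed subnet; adjust to your network):
#   restrict 192.168.178.0 mask 255.255.255.0 nomodify notrap
service ntpd start
chkconfig ntpd on
# On a slave, test the manual sync from step 2):
/usr/sbin/ntpdate master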
2. Slave Nodes
2.1 Preliminary Setup
1) /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=slave1
2) /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.178.181 master
192.168.178.182 slave1
192.168.178.183 slave2
192.168.178.184 slave3
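Since /etc/hosts is identical on every node, it can be pushed from the master rather than edited by hand on each slave. A sketch, run as root on master and assuming root SSH access to the slaves:
for host in slave1 slave2 slave3; do
  scp /etc/hosts root@$host:/etc/hosts   # overwrite the slave's hosts file with the master's copy
done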
3. Hadoop Configuration
First, in the <hadoop directory>:
mkdir tmp (the directory name must match the value of the hadoop.tmp.dir property in core-site.xml)
mkdir -p dfs/name
mkdir -p dfs/data
Seven configuration files need to be modified in total: slaves, yarn-env.sh, hadoop-env.sh, yarn-site.xml, mapred-site.xml, hdfs-site.xml, and core-site.xml (see the note below on where they live).
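In a 2.2.0 tarball layout, all of these files live under <hadoop directory>/etc/hadoop:
cd /usr/hadoop/etc/hadoop
ls slaves yarn-env.sh hadoop-env.sh yarn-site.xml hdfs-site.xml core-site.xml
# mapred-site.xml does not exist yet; it is created from its template in 3.6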
3.1 slaves
slave1
slave2
slave3
3.2 yarn-env.sh
Append at the end:
# set java environment
export JAVA_HOME=/usr/java/jdk1.7.0_51
3.3 hadoop-env.sh
Append at the end:
# set java environment
export JAVA_HOME=/usr/java/jdk1.7.0_51
3.4 core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
3.5 yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
</configuration>
3.6 mapred-site.xml
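The 2.2.0 distribution ships only a template for this file, so create it first (in etc/hadoop):
cp mapred-site.xml.template mapred-site.xml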
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
3.7 hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
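With all seven files edited, the same configuration must reach every slave, and HDFS must be formatted once before the first start. A sketch of the remaining steps, run as hadoop on master and assuming the /usr/hadoop layout from 1.4 exists on every node:
# Push the configuration directory to each slave:
for host in slave1 slave2 slave3; do
  scp -r /usr/hadoop/etc/hadoop hadoop@$host:/usr/hadoop/etc/
done
cd /usr/hadoop
./bin/hdfs namenode -format   # first start only; reformatting erases HDFS metadata
./sbin/start-dfs.sh           # starts NameNode, SecondaryNameNode, DataNodes
./sbin/start-yarn.sh          # starts ResourceManager, NodeManagers
jps                           # verify the expected daemons are running on each node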
4. Testing Hadoop
4.1 Relevant Commands
4.1.1 Web Interfaces
1) On the master node, open master:50070 in a browser to view the HDFS status.
2) master:8088 is the Hadoop resource manager (YARN) web UI.
3) On other Windows machines, add the following to C:\Windows\System32\drivers\etc\hosts:
192.168.178.181 master
192.168.178.182 slave1
192.168.178.183 slave2
192.168.178.184 slave3
This fixes the common problem of the BROWSE THE FILESYSTEM link failing to open.
4.1.2 Other Commands
1) Leave safe mode: ./bin/hdfs dfsadmin -safemode leave
2) Create a directory on HDFS: ./bin/hadoop dfs -mkdir /input
3) List the HDFS root directory: ./bin/hadoop dfs -ls /
4) ./bin/hdfs dfs -copyFromLocal /usr/hadoop/input/qing.txt /input
5) Test program:
a) ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter input (here "input" is the output directory for the random data)
6) watch -n 1 "/sbin/ifconfig eth0 | grep bytes" (monitor network traffic in real time)
4.2 Tests
4.2.1 WordCount
1) Create the HDFS directory:
./bin/hdfs dfs -mkdir /input
2) Upload the files:
./bin/hdfs dfs -copyFromLocal /usr/hadoop/input/qing.txt /input
./bin/hdfs dfs -copyFromLocal /usr/hadoop/input/feng.txt /input
3) Run:
./bin/hadoop jar /usr/hadoop/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.2.0-sources.jar org.apache.hadoop.examples.WordCount /input /output
4) View the results:
./bin/hdfs dfs -cat /output/part-r-00000
For details, see: http://blog.csdn.net/bamuta/article/details/14226243
4.2.2 Computing Pi
1) ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 2
2) The two arguments after pi are the number of map tasks and the number of samples per map.
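More maps and more samples per map give a more accurate estimate at the cost of a longer run, for example:
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 10 100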
4.2.3 Writing Random Data
1) ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar randomwriter random-data
Additional Tips
To display debug-level error messages:
export HADOOP_ROOT_LOGGER=DEBUG,console
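For example, to trace an HDFS command, export the variable in the same shell, rerun the command, then unset it to restore normal logging:
export HADOOP_ROOT_LOGGER=DEBUG,console
./bin/hdfs dfs -ls /        # now prints DEBUG-level client logs to the console
unset HADOOP_ROOT_LOGGER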