First, a gripe: the Hadoop binary tarball on the Apache site is 32-bit only, which cost me ages chasing errors before I realized I had to build a 64-bit version myself.
1. Install the JDK
I installed Oracle's HotSpot JDK 1.7. A plain JRE will not do here, because it does not ship tools.jar, which the Hadoop build needs.
Download the JDK RPM package from the Oracle site and install it:
rpm -ivh xxxxx.rpm
After installation, the default install location is /usr/java/xxxx.
Configure the environment variables (the full list is in section 4).
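A quick sanity check after the install (a sketch; the exact 1.7.0_xx build number will vary):
ls /usr/java/                              # confirm the actual install path, e.g. jdk1.7.0_71
/usr/java/jdk1.7.0_71/bin/javac -version   # javac only exists in a full JDK, which the Hadoop build needs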
2. Install Maven
Download a binary tarball from the Maven site, unpack it, and move it to /usr/local/maven.
To speed up dependency downloads, configure the OSChina mirror in Maven's settings.xml: the <mirror> block goes inside <mirrors>, and the <profile> block inside <profiles>.
<mirror>
  <id>nexus-osc</id>
  <mirrorOf>*</mirrorOf>
  <name>Nexusosc</name>
  <url>http://maven.oschina.net/content/groups/public/</url>
</mirror>

<profile>
  <id>jdk-1.7</id>
  <activation>
    <jdk>1.7</jdk>
  </activation>
  <repositories>
    <repository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases>
        <enabled>true</enabled>
      </releases>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </pluginRepository>
  </pluginRepositories>
</profile>
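Once MAVEN_HOME and PATH are set (the full environment variable list is in section 4), a quick sanity check, just a sketch:
mvn -version                   # should report Maven running on the 1.7 JDK installed above
mvn help:effective-settings    # shows whether the mirror and profile were actually picked up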
3. Install protobuf
Download protobuf 2.5.0. It must be exactly this version; a newer release will make the Hadoop build fail (I learned that the hard way).
Unpack it, then build and install:
./configure --prefix=/usr/local/protoc/
make
make install
After installation, configure the environment variables (PROTOBUF_HOME, see section 4).
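A quick check that the right protoc ends up on the PATH (after the environment variables from section 4 are in place):
protoc --version    # should print: libprotoc 2.5.0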
4. Build and install Hadoop
Download the 2.6.0 source tarball from the Apache site and unpack it.
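Building the native (64-bit) libraries needs a C/C++ toolchain, cmake, and a couple of development packages. A hedged install line for CentOS 6 (these are the usual suspects; adjust to your environment):
yum install -y gcc gcc-c++ make cmake zlib-devel openssl-devel
With those in place, run the build from the top of the source tree: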
mvn package -Pdist,native -DskipTests -Dtar
The OSChina mirror was flaky during the dependency downloads and the build died several times with network errors; just rerun the command above and Maven picks up from its local cache.
Once the build succeeds, the packaged distribution is under hadoop-dist/target/hadoop-2.6.0. Copy it into place (from hadoop-dist/target):
cp -r hadoop-2.6.0 /usr/local/hadoop/
cd /usr/local/hadoop/
./bin/hadoop version
This prints the Hadoop version information.
file lib/native/*
This shows whether the native libraries are 32-bit or 64-bit.
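Another way to verify the native build, as a sketch:
./bin/hadoop checknative -a    # lists the native libraries (hadoop, zlib, snappy, ...) and whether they load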
Configure the environment variables.
Together with the ones needed by the earlier steps, the full set is:
export JAVA_HOME=/usr/java/jdk1.7.0_71
export HADOOP_HOME=/usr/local/hadoop
export MAVEN_HOME=/usr/local/maven
export PROTOBUF_HOME=/usr/local/protoc
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$MAVEN_HOME/bin:$PROTOBUF_HOME/bin:
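I append these to /etc/profile (any file that gets sourced for the root login works), then reload; a sketch:
source /etc/profile
hadoop version    # quick check that the new PATH resolves the Hadoop binaries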
5. Configure the cluster
Set up passwordless SSH login between the nodes.
First set each node's hostname. A quick look at my environment:
192.168.36.130 Master.Hadoop
192.168.36.131 Slave1.Hadoop
192.168.36.132 Slave2.Hadoop
On CentOS 6.6 the hostname is set up like this:
vim /etc/sysconfig/network
vim /etc/hosts
hostname XXX.Hadoop
service network restart
Also turn off the firewall:
service iptables stop
chkconfig iptables off
Now generate the public/private key pair:
ssh-keygen -t rsa
Just press Enter at every prompt.
Afterwards two new files (id_rsa and id_rsa.pub) appear under /root/.ssh/.
Do the same on the other two nodes; I run everything as root here.
Then create a file named authorized_keys in that same directory, put the public keys of all three nodes into it, and copy it to the same directory on the other two nodes (a sketch of one way to do this follows the sample keys below).
For reference, my authorized_keys looks like this:
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA4mJOu/de6KmSI7XP9LdxOlldnDI1olDM7GalikiUK3zCSkvUdXCkql7I2b1FU1OopT2keiXZptNJ8DlJi/LCkfi/+zysOmX5ppl5D4Zm9aIzyx1JYUB0pKT5mmYLuCsuHok+rPub1kzwHsWtzoYqAPgmxqnlEtgqxZj+YcaJJp9C2rF9zTaD/1sip/AguCQ2vdQc+yQYc7K33rPZXArnBfNVankIU2o2DsqdovtMCnFPU87+57S3hfT50HyLxXEMiroFypYGTNm84v3gAoCB/IpS0BwPdtHun2YrYtGKTaW0EjgG2J8lYbDUSe1eFWNidWHiDYtvzYR6vORXvMOq6Q== root@6.6.1
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAtTFmf3Qpms17fsZuxYIFfTY1fGk9e1T7RJOMQIbV/nwBEiy0MDYkGwFhUi1ASWoxGnoPRt6soOE+tluaQOOfAY9HcdyS22ZHcxq4269VdTwZetANrhbI2F0LJgnS9B5D3wQPGqIMiujGria0J9iDpDhXDGWFK+RXzJDsKWTYfVeKVAiGzasebSKsyJKcxzBNzHV0AMKFPuy15DFtC+E82n1gMoPelp3iNpOBCIRzC1koeGvdPG9lu3Y22mpagn7JGw8ozt2j2tVZHl47sZ/rD0LvYK9DRwHFlzUp1h0A55SQwe6D/DVKTwdlKSLasYKlxgqV0ckNynptEvwu/KxoZQ== root@6.6.2
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAuZjZ+sWkr+9P6/NtEUzxWZyNPbJKAt0W18Cy0gUePFAbXQd9Rv3LngbCScbsNDM7Fsaao+gop87bk2BRsmN9QPzY8KevFMvN4UtysoqgFT7UUWGXRvizLH2EWKi056gu5rw493k9MDbDDFtT03v5PbKen23ILbZ/q2fKe7cyY6xRXNwxTsKm80EOqh4KrU40PkrcEkDL2BA8HGhwdsb7R6nPwcuFkKqIdVEKESHxrrLYApu6Iu5R3WJKGXJXqx7mHZnFOFkTw60BEOalONdg1XXedxCrIUtlbCGiz4xJ+mnCNPDOFoGte/E+WdyPYMqRYEk23E7xRx3a1lLBZ6FsdQ== root@6.6.3
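A hedged sketch of one way to assemble and distribute the file, run from Master.Hadoop (it assumes the key pairs already exist on all three nodes; the remote commands still prompt for passwords at this point):
cd /root/.ssh
cat id_rsa.pub > authorized_keys                                     # Master's own key
ssh root@Slave1.Hadoop cat /root/.ssh/id_rsa.pub >> authorized_keys
ssh root@Slave2.Hadoop cat /root/.ssh/id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
scp authorized_keys root@Slave1.Hadoop:/root/.ssh/
scp authorized_keys root@Slave2.Hadoop:/root/.ssh/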
Finally, test it:
ssh Slave1.Hadoop
ssh Slave2.Hadoop
Test from the other two nodes as well; if you can log straight in without a password prompt, the setup works.
Next comes Hadoop's own configuration:
cd /usr/local/hadoop/etc/hadoop
Set JAVA_HOME in hadoop-env.sh and yarn-env.sh.
In yarn-env.sh, also change HADOOP_PID_DIR to the Hadoop install directory.
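Concretely (a sketch; the paths match the earlier steps):
# in etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_71
# in etc/hadoop/yarn-env.sh, per the note above
export HADOOP_PID_DIR=/usr/local/hadoop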
Configure core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master.Hadoop:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
</configuration>
Configure hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop-cluster1</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Master.Hadoop:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
Configure mapred-site.xml (the two jobtracker entries are MapReduce 1 leftovers and are ignored when the framework is yarn, but they do no harm):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>Master.Hadoop:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Master.Hadoop:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Master.Hadoop:19888</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://Master.Hadoop:9001</value>
  </property>
</configuration>
Configure yarn-site.xml:
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master.Hadoop</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master.Hadoop:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master.Hadoop:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master.Hadoop:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Master.Hadoop:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.36.130:8088</value>
  </property>
</configuration>
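A few steps the walkthrough glosses over: etc/hadoop/slaves must list every machine that should run a DataNode/NodeManager (the startup output below shows all three), the /usr/local/hadoop tree has to exist on the slaves as well, and HDFS needs a one-time format before the first start. A sketch, run on Master.Hadoop:
# list the worker nodes (Master also runs a DataNode/NodeManager in this setup)
cat > /usr/local/hadoop/etc/hadoop/slaves <<EOF
Master.Hadoop
Slave1.Hadoop
Slave2.Hadoop
EOF
# copy the Hadoop install (including etc/hadoop) to the slaves
scp -r /usr/local/hadoop root@Slave1.Hadoop:/usr/local/
scp -r /usr/local/hadoop root@Slave2.Hadoop:/usr/local/
# one-time format of the NameNode before the first start
/usr/local/hadoop/bin/hdfs namenode -format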
6. Start the cluster
On Master.Hadoop, run /usr/local/hadoop/sbin/start-all.sh.
It prints the following:
[root@Master sbin]# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [Master.Hadoop]
Master.Hadoop: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-Master.Hadoop.out
Slave1.Hadoop: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-Slave1.Hadoop.out
Slave2.Hadoop: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-Slave2.Hadoop.out
Master.Hadoop: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-Master.Hadoop.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-Master.Hadoop.out
Slave2.Hadoop: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-Slave2.Hadoop.out
Slave1.Hadoop: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-Slave1.Hadoop.out
Master.Hadoop: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-Master.Hadoop.out
This means all the daemons came up successfully.
Check cluster status: ./bin/hdfs dfsadmin -report
Check file and block layout: ./bin/hdfs fsck / -files -blocks
Check the DataNodes in the NameNode web UI: http://192.168.36.130:50070
Check cluster and application status in the ResourceManager web UI: http://192.168.36.130:8088
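jps on each node is another quick confirmation (a sketch; the exact set depends on which daemons that node runs):
jps                      # on Master.Hadoop: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager
ssh Slave1.Hadoop jps    # on the slaves: DataNode, NodeManager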
7. Test
echo "hello,world" >> file
hdfs dfs -mkdir /test
hdfs dfs -put file /test
hdfs dfs -cat /test/file
hdfs dfs -get /test/file file1
hdfs dfs -rm /test/file
hdfs dfs -ls /test
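To exercise YARN and MapReduce as well, the example jar bundled with the 2.6.0 distribution can be run against the test file (a sketch):
hdfs dfs -put file /test
yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /test /test-out
hdfs dfs -cat /test-out/part-r-00000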
OK, everything works. Done!