1. The installation environment is what set up this painful journey:
[root@dev-47 hadoop]# cat /etc/issue
CentOS release 6.4 (Final)
Kernel \r on an \m
[root@dev-47 hadoop]# java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
The official Hadoop release only ships 32-bit binaries, so anything that touches the native libraries fails on a 64-bit system; you have to download the source and compile it yourself. Fortunately the official docs do cover this, although they are buried in an awkward spot: building.txt under the src directory. (Oh, and it is best to compile on the machine the build will actually run on.)
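Before rebuilding, you can confirm that the bundled native library really is 32-bit. A minimal check, assuming the /root/hadoop install location used later in this post:

```shell
# Check the architecture of the bundled native library; the path is
# an assumption based on the install location used in this post.
NATIVE_LIB="${HADOOP_HOME:-/root/hadoop}/lib/native/libhadoop.so"
if [ -f "$NATIVE_LIB" ]; then
  # A 32-bit build prints "ELF 32-bit ..."; after recompiling it
  # should print "ELF 64-bit ...".
  ARCH=$(file "$NATIVE_LIB")
else
  ARCH="$NATIVE_LIB not found; run this on the Hadoop node"
fi
echo "$ARCH"
```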
2. 编译步骤
a) wget http://psg.mtu.edu/pub/apache/hadoop/common/stable/hadoop-2.2.0-src.tar.gz
b) wget http://apache.dataguru.cn/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
c) Install the tools needed for the build:
yum install gcc gcc-c++ cmake ncurses-devel openssl-devel
d) wget http://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
Extract this one and install it with the classic three-step: ./configure && make && make install (you may find some tools missing along the way, such as autoconf or libtool; install them as needed)
e) Start the build and go chat for a while; expect about 20 minutes, less if you have a Maven mirror configured.
mvn package -Pdist,native -DskipTests -Dtar (the native profile is the whole point)
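Steps a) through e) can be collected into one script. This is only a sketch of the sequence above; the versions and mirror URLs are the ones from this post and may have moved since:

```shell
# Sketch of the full native build, using the URLs from steps a)-d).
# Defined here but not run automatically; invoke build_hadoop_native
# on the build machine.
build_hadoop_native() {
  set -e
  wget http://psg.mtu.edu/pub/apache/hadoop/common/stable/hadoop-2.2.0-src.tar.gz
  wget http://apache.dataguru.cn/maven/maven-3/3.1.1/binaries/apache-maven-3.1.1-bin.tar.gz
  wget http://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
  yum install -y gcc gcc-c++ cmake ncurses-devel openssl-devel autoconf libtool
  # protobuf: the classic three-step
  tar xzf protobuf-2.5.0.tar.gz
  (cd protobuf-2.5.0 && ./configure && make && make install)
  # Maven: binary distribution, just put it on PATH
  tar xzf apache-maven-3.1.1-bin.tar.gz
  export PATH="$PWD/apache-maven-3.1.1/bin:$PATH"
  # Hadoop itself, with the native profile
  tar xzf hadoop-2.2.0-src.tar.gz
  (cd hadoop-2.2.0-src && mvn package -Pdist,native -DskipTests -Dtar)
}
```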
3. Installation steps
a) Extract the tarball
b) core-site.xml
<configuration>
<!-- global properties -->
<!-- Set the HDFS nameservice; must match dfs.nameservices in hdfs-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://jkbcluster</value>
</property>
<!-- Hadoop temp directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>dev-44:2181,dev-45:2181,dev-46:2181</value>
</property>
</configuration>
c) hdfs-site.xml
<configuration>
<property>
<name>dfs.nameservices</name>
<value>jkbcluster</value>
</property>
<property>
<name>dfs.ha.namenodes.jkbcluster</name>
<value>nn44,nn45</value>
</property>
<property>
<name>dfs.namenode.rpc-address.jkbcluster.nn44</name>
<value>dev-44:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.jkbcluster.nn44</name>
<value>dev-44:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.jkbcluster.nn45</name>
<value>dev-45:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.jkbcluster.nn45</name>
<value>dev-45:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://dev-44:8485;dev-45:8485;dev-46:8485/jkbcluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/root/hadoop/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.jkbcluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
</configuration>
d) yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>dev-44</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
e) mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
f) slaves
dev-44
dev-45
dev-46
dev-47
dev-48
g) Use scp to push the installation to every node
h) Environment variables (append to the profile on each node):
# Hadoop
export HADOOP_HOME=/root/hadoop
export HADOOP_PREFIX="$HADOOP_HOME/"
export YARN_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME="$HADOOP_HOME"
export HADOOP_COMMON_HOME="$HADOOP_HOME"
export HADOOP_HDFS_HOME="$HADOOP_HOME"
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
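A quick sanity check that the profile took effect, assuming the paths above:

```shell
# Re-export the two variables that matter for the check (same values
# as the profile above), then verify PATH actually contains the bin dir.
export HADOOP_HOME=/root/hadoop
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) RESULT="PATH OK" ;;
  *) RESULT="PATH is missing $HADOOP_HOME/bin" ;;
esac
echo "$RESULT"
```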
i) On dev-44 run: hdfs namenode -format, then copy the generated metadata directory (here: tmp) to dev-45 (the standby NN).
j) On dev-44 run start-all.sh
k) Test with hdfs dfs -ls /
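The first-start sequence in steps i) through k) can be sketched as one script. Note that the JournalNodes have to be up before formatting (see error b) in section 4), and that with automatic failover enabled an `hdfs zkfc -formatZK` is normally also needed even though it is not listed in the steps above. Hostnames and paths are the ones from this cluster:

```shell
# Sketch of the first HA start, run from dev-44. Assumes passwordless
# ssh to every node and the config above. Defined but not run here;
# invoke first_start on dev-44.
first_start() {
  set -e
  # 1. JournalNodes must be running or the format cannot reach port 8485
  for h in dev-44 dev-45 dev-46; do
    ssh "$h" "$HADOOP_HOME/sbin/hadoop-daemon.sh start journalnode"
  done
  # 2. Format the active NameNode; formatZK is an addition for the
  #    automatic-failover setup configured in hdfs-site.xml
  hdfs namenode -format
  hdfs zkfc -formatZK
  # 3. Copy the fresh metadata (hadoop.tmp.dir) to the standby NameNode
  scp -r /root/hadoop/tmp dev-45:/root/hadoop/
  # 4. Bring everything up and smoke-test
  start-all.sh
  hdfs dfs -ls /
}
```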
4. Errors encountered
a) Hadoop does not really support underscores, so keep them out of hostnames and the like; otherwise formatting fails with errors such as: Does not contain a valid host:port authority: dev_44:8082
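One way to catch this before formatting is to scan the slaves file (or /etc/hosts) for underscores. The slaves file below is a made-up example to show the check:

```shell
# Hypothetical slaves file with one bad hostname, to demonstrate.
printf 'dev-44\ndev-45\ndev_46\n' > /tmp/slaves.example
# grep -n prints line number and the offending hostname
BAD=$(grep -n '_' /tmp/slaves.example || true)
if [ -n "$BAD" ]; then
  echo "rename these hosts before formatting: $BAD"
else
  echo "hostnames OK"
fi
```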
b) The journalnode processes have little to do with ZooKeeper; they belong to Hadoop itself. They must be started on every machine configured for them, or the format step cannot reach their port 8485 and errors out.
c) Every Hadoop command kept printing the warning: an attempt to override final parameter. Taken literally, some final parameter is being overridden; it seems you can no longer copy the default config files wholesale the way you could in 1.x.
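To find which file is marking a parameter final, you can grep the config directory; the path is assumed from the install used in this post:

```shell
# Search the config directory for parameters marked final; any hit is
# a candidate source of the "attempt to override final parameter"
# warning.
CONF_DIR="${HADOOP_CONF_DIR:-/root/hadoop/etc/hadoop}"
HITS=$(grep -rn '<final>true</final>' "$CONF_DIR" 2>/dev/null || true)
if [ -n "$HITS" ]; then
  echo "$HITS"
else
  echo "no final parameters found under $CONF_DIR"
fi
```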