After more than a year of PaaS development I'm moving over to big data, so I'm backing up some installation notes here.
Example cluster:
hadoop-001 192.168.204.55 NameNode, SecondaryNameNode, ResourceManager
hadoop-002 192.168.204.56 DataNode, NodeManager
hadoop-003 192.168.204.57 DataNode, NodeManager
hadoop-004 192.168.204.58 DataNode, NodeManager
Hadoop version: CDH 4.4.0
CentOS version: 6.3
I. Preparation
1. JDK 1.7
http://download.oracle.com/otn-pub/java/jdk/7u45-b18/jdk-7u45-linux-x64.rpm
sudo rpm -ivh jdk-7u45-linux-x64.rpm
alternatives --install /usr/bin/java java /usr/java/jdk1.7.0_45/bin/java 300
alternatives --install /usr/bin/javac javac /usr/java/jdk1.7.0_45/bin/javac 300
alternatives --config java
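To confirm the alternatives-registered JDK is the one actually in use:
java -version                 # should report java version "1.7.0_45"
alternatives --display java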
2. Set the hostname
vim /etc/sysconfig/network   # set HOSTNAME= on each server; takes effect after a reboot
Add all nodes to /etc/hosts:
192.168.204.55 hadoop-001
192.168.204.56 hadoop-002
192.168.204.57 hadoop-003
192.168.204.58 hadoop-004
3. Disable the firewall
service iptables status
service iptables stop
chkconfig iptables off
4. Disable SELinux
# set SELINUX=disabled
vim /etc/selinux/config
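The config file change only takes effect after a reboot; to check the current mode and relax enforcement immediately:
getenforce          # Enforcing, Permissive, or Disabled
sudo setenforce 0   # Permissive until the next reboot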
5. Create the hadoop user and make it a sudoer
adduser hadoop
passwd hadoop
sudo vim /etc/sudoers
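For reference, this is the line to add beneath root's entry (visudo is the safer way to edit this file):
hadoop  ALL=(ALL)       ALL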
6. Passwordless SSH
# switch to the hadoop user
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
scp the authorized_keys file to the other slave servers, as sketched below.
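A minimal sketch of that distribution step, assuming the hadoop user and its ~/.ssh directory already exist on each slave (running ssh-keygen there once creates the directory):
for host in hadoop-002 hadoop-003 hadoop-004; do
  scp ~/.ssh/authorized_keys hadoop@$host:~/.ssh/
done
ssh hadoop-002 hostname   # should print hadoop-002 with no password prompt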
II. Installation
1. Download the CDH 4.4 tarball
mkdir cdh4.4.0
cd cdh4.4.0
wget http://archive.cloudera.com/cdh4/cdh/4/hadoop-2.0.0-cdh4.4.0.tar.gz
tar -xvzf hadoop-2.0.0-cdh4.4.0.tar.gz
2. Set environment variables
Edit /etc/profile or ~/.bashrc; I edit .bashrc here, but either works:
export JAVA_HOME=/usr/java/jdk1.7.0_45
export HADOOP_HOME=/home/hadoop/cdh4.4.0/hadoop-2.0.0-cdh4.4.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_LIB=$HADOOP_HOME/lib
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar:$HADOOP_LIB/native/libhadoop.so
libhadoop.so is only actually needed later, when installing Impala.
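After editing, reload the file and sanity-check the result:
source ~/.bashrc
echo $HADOOP_HOME
hadoop version   # should print Hadoop 2.0.0-cdh4.4.0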
3. Configuration files (all of the following live in $HADOOP_CONF_DIR, i.e. $HADOOP_HOME/etc/hadoop)
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-001:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
  <property>
    <name>fs.trash.checkpoint.interval</name>
    <value>10080</value>
  </property>
  <!--
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  -->
  <!-- OOZIE -->
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>hadoop-001</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>hadoop</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!--
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
  -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>hadoop-001:50070</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>hadoop-001:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- for impala
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hadoop-hdfs/dn._PORT</value>
  </property>
  <property>
    <name>dfs.client.file-block-storage-locations.timeout</name>
    <value>3000</value>
  </property>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
  -->
</configuration>
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop-001:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop-001:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop-001:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop-001:18141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop-001:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-001:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-001:19888</value>
  </property>
  <property>
    <name>mapreduce.job.tracker</name>
    <value>hadoop-001:8021</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>file:/hadoop/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>file:/hadoop/mapred/local</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.child.env</name>
    <value>LD_LIBRARY_PATH=/usr/local/lib</value>
  </property>
  <!--
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
  -->
</configuration>
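One file this step should not skip: for start-all.sh on hadoop-001 to reach the workers, list them in the slaves file under $HADOOP_CONF_DIR, one hostname per line:
# $HADOOP_CONF_DIR/slaves
hadoop-002
hadoop-003
hadoop-004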
4. Prepare the HDFS file paths
# needed on every node; /hadoop/name is used by the NameNode, /hadoop/data by the DataNodes
sudo mkdir -p /hadoop/tmp /hadoop/mapred/system /hadoop/mapred/local /hadoop/name /hadoop/data
sudo chown -R hadoop:hadoop /hadoop
5. scp the CDH 4.4 tree to the slave nodes
scp -r cdh4.4.0/ hadoop-002:~/.
scp -r cdh4.4.0/ hadoop-003:~/.
scp -r cdh4.4.0/ hadoop-004:~/.
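The environment variables from step 2 are needed on the slaves too; a minimal sketch (assuming bash sources ~/.bashrc for ssh sessions, as it does by default on CentOS):
for host in hadoop-002 hadoop-003 hadoop-004; do
  scp ~/.bashrc $host:~/
done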
III. Startup
1. Format the filesystem
# on the master node, hadoop-001
cd cdh4.4.0/hadoop-2.0.0-cdh4.4.0/bin
./hdfs namenode -format   # look for "successfully formatted" in the output
2. Start the daemons
cd cdh4.4.0/hadoop-2.0.0-cdh4.4.0/sbin
./start-all.sh   # deprecated in Hadoop 2 but still works; it runs start-dfs.sh and then start-yarn.sh
Run jps and check that the expected processes are there.
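Given the role layout at the top, this is roughly what to expect:
jps   # hadoop-001: NameNode, SecondaryNameNode, ResourceManager
      # hadoop-002/003/004: DataNode, NodeManager
The web UIs configured earlier are another quick check: http://hadoop-001:50070 for HDFS and http://hadoop-001:8088 for YARN.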
IV. Problems encountered
Weibo: http://weibo.com/kingjames3