Step 8: Install CDH5
a. Download the RPM package
1. Change to the download directory /usr/tool/:
2. Download:
wget http://archive.cloudera.com/cdh5/one-click-install/redhat/6/x86_64/cloudera-cdh-5-0.x86_64.rpm
(If the Linux version is CentOS 5.x, change the 6 in the URL path to 5; the same applies to the URLs below.)
3. Disable the GPG signature check and install the local package:
yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
4. Import the Cloudera repository GPG key:
rpm --import http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
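The commands above can be sketched as a small script parameterized by the RHEL/CentOS major version, which makes the 6-to-5 substitution explicit. The actual wget/yum/rpm calls are left commented out so the script can be dry-run anywhere; the URLs are the ones from this guide.

```shell
# Step 8a, parameterized by the EL major version (6 here; use 5 on CentOS 5.x).
EL_MAJOR=6
RPM_URL="http://archive.cloudera.com/cdh5/one-click-install/redhat/${EL_MAJOR}/x86_64/cloudera-cdh-5-0.x86_64.rpm"
GPG_URL="http://archive.cloudera.com/cdh5/redhat/${EL_MAJOR}/x86_64/cdh/RPM-GPG-KEY-cloudera"
echo "$RPM_URL"
echo "$GPG_URL"
# cd /usr/tool && wget "$RPM_URL"
# yum --nogpgcheck localinstall cloudera-cdh-5-0.x86_64.rpm
# rpm --import "$GPG_URL"
```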
b. Install the Hadoop packages
1. On master, install namenode, resourcemanager, nodemanager, datanode, mapreduce, historyserver, proxyserver, and hadoop-client:
yum install hadoop hadoop-hdfs hadoop-client hadoop-doc hadoop-debuginfo hadoop-hdfs-namenode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce hadoop-mapreduce-historyserver hadoop-yarn-proxyserver -y
2. On slave1 and slave2, install yarn, nodemanager, datanode, mapreduce, and hadoop-client:
yum install hadoop hadoop-hdfs hadoop-client hadoop-doc hadoop-debuginfo hadoop-yarn hadoop-hdfs-datanode hadoop-yarn-nodemanager hadoop-mapreduce -y
3. Install HttpFS:
yum install hadoop-httpfs -y
4. Install the Secondary NameNode (optional):
Pick one machine to serve as the Secondary NameNode and install the package:
yum install hadoop-hdfs-secondarynamenode -y
Add the following configuration to /etc/hadoop/conf/hdfs-site.xml:
<property>
<name>dfs.namenode.checkpoint.check.period</name>
<value>60</value>
</property>
<property>
<name>dfs.namenode.checkpoint.txns</name>
<value>1000000</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>file:///data/cache1/dfs/namesecondary</value>
</property>
<property>
<name>dfs.namenode.num.checkpoints.retained</name>
<value>2</value>
</property>
<!-- Make slave1 the SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>slave1:50090</value>
</property>
For the full list of properties, see: http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
To set up multiple Secondary NameNodes, see: http://blog.cloudera.com/blog/2009/02/multi-host-secondarynamenode-configuration/
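The per-role package sets from the yum commands above can be sketched as a small selector script; `role` is a hypothetical variable here, and master / slave / secondarynamenode are the roles used in this guide. The final command is echoed rather than executed so the script is safe to dry-run.

```shell
# Select the CDH5 package set by node role (package names from the yum commands above).
role=master   # assumption: set to master, slave, or secondarynamenode
common="hadoop hadoop-hdfs hadoop-client hadoop-doc hadoop-debuginfo"
case "$role" in
  master)
    pkgs="$common hadoop-hdfs-namenode hadoop-yarn-resourcemanager hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce hadoop-mapreduce-historyserver hadoop-yarn-proxyserver" ;;
  slave)
    pkgs="$common hadoop-yarn hadoop-hdfs-datanode hadoop-yarn-nodemanager hadoop-mapreduce" ;;
  secondarynamenode)
    pkgs="hadoop-hdfs-secondarynamenode" ;;
esac
echo "yum install $pkgs -y"   # echoed rather than executed for a dry run
```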
Step 9: Create directories
a. On master:
mkdir -p /data/cache1/dfs/nn
chown -R hdfs:hadoop /data/cache1/dfs/nn
chmod 700 /data/cache1/dfs/nn
b. On slave1 and slave2:
mkdir -p /data/cache1/dfs/dn
mkdir -p /data/cache1/dfs/mapred/local
chown -R hdfs:hadoop /data/cache1/dfs/dn
usermod -a -G mapred hadoop
chown -R mapred:hadoop /data/cache1/dfs/mapred/local
c. On HDFS (run these only after the Hadoop cluster has been fully set up and started):
hdfs dfs -mkdir -p /user/hadoop/{done,tmp}
sudo -u hdfs hadoop fs -chown mapred:hadoop /user/hadoop/*
hdfs dfs -mkdir -p /var/log/hadoop-yarn/apps
sudo -u hdfs hadoop fs -chown hadoop:hdfs /var/log/hadoop-yarn/apps
hdfs dfs -mkdir -p /user/hive/warehouse
sudo -u hdfs hadoop fs -chown hive /user/hive/warehouse
sudo -u hdfs hadoop fs -chmod 1777 /user/hive/warehouse
hdfs dfs -mkdir /tmp/hive
sudo -u hdfs hadoop fs -chmod 777 /tmp/hive
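The local directory layout from this step can be sketched as one script. It is rooted at a demo directory (`ROOT` is an assumption introduced here) so it can be tried without root privileges; on real nodes use the absolute paths and run the commented chown lines as root.

```shell
# Local directory layout from Step 9, under a configurable demo root.
ROOT="${ROOT:-/tmp/cdh-dirs-demo}"              # assumption: demo root; real nodes use /
mkdir -p "$ROOT/data/cache1/dfs/nn"             # master: NameNode metadata
mkdir -p "$ROOT/data/cache1/dfs/dn"             # slaves: DataNode block storage
mkdir -p "$ROOT/data/cache1/dfs/mapred/local"   # slaves: MapReduce local dir
chmod 700 "$ROOT/data/cache1/dfs/nn"
# On real nodes, also fix ownership (requires root):
# chown -R hdfs:hadoop /data/cache1/dfs/nn /data/cache1/dfs/dn
# chown -R mapred:hadoop /data/cache1/dfs/mapred/local
ls -d "$ROOT"/data/cache1/dfs/*
```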
Step 10: Configure environment variables
a. Edit /etc/profile and add the following environment variables:
export HADOOP_HOME=/usr/lib/hadoop
export HIVE_HOME=/usr/lib/hive
export HBASE_HOME=/usr/lib/hbase
export HADOOP_HDFS_HOME=/usr/lib/hadoop-hdfs
export HADOOP_MAPRED_HOME=/usr/lib/hadoop-mapreduce
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HDFS_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_YARN_HOME=/usr/lib/hadoop-yarn
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin:$PATH
b. Run the following command to make them take effect:
source /etc/profile
Step 11: Modify the Hadoop configuration files
a. Configuration file overview:
Configuration file | Type | Description |
hadoop-env.sh | Bash script | Hadoop runtime environment variables |
core-site.xml | XML | Hadoop core settings, e.g. I/O |
hdfs-site.xml | XML | HDFS daemon settings: NN, JN, DN |
yarn-env.sh | Bash script | YARN runtime environment variables |
yarn-site.xml | XML | YARN framework settings |
mapred-site.xml | XML | MapReduce property settings |
capacity-scheduler.xml | XML | YARN scheduler property settings |
container-executor.cfg | cfg | YARN container settings |
mapred-queues.xml | XML | MapReduce queue settings |
hadoop-metrics.properties | Java properties | Hadoop metrics settings |
hadoop-metrics2.properties | Java properties | Hadoop metrics2 settings |
slaves | plain text | DataNode host list |
exclude | plain text | DataNode removal (decommission) list |
log4j.properties | Java properties | system logging settings |
configuration.xsl | XSL | stylesheet for rendering the XML configuration files |
b. Edit the following configuration files on master, then scp them to the corresponding directory on each slave:
/etc/hadoop/conf/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>master</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>hdfs</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.mapred.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>httpfs-host.foo.com</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
/etc/hadoop/conf/hdfs-site.xml
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/cache1/dfs/nn/</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/cache1/dfs/dn/</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hdfs</value>
</property>
/etc/hadoop/conf/mapred-site.xml
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.joblist.cache.size</name>
<value>50000</value>
</property>
<!-- The directories created on HDFS earlier -->
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/user/hadoop/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/user/hadoop/tmp</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
/etc/hadoop/conf/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>List of directories to store localized files in.</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
</property>
<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>/var/log/hadoop-yarn/containers</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://master:9000/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>master:8088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>master:8033</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,
$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,
$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,
$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,
$HADOOP_YARN_HOME/lib/*
</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>master:54315</value>
</property>
c. List all the slaves in /etc/hadoop/conf/slaves:
slave1
slave2
d. Finally, sync the modified files to the slaves:
scp -r /etc/hadoop/conf root@slave1:/etc/hadoop/
scp -r /etc/hadoop/conf root@slave2:/etc/hadoop/
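The two scp commands above can be generalized to a loop over the slave hostnames used in this guide; `echo` is kept in front so the loop is a dry run until you remove it.

```shell
# Push the edited config to every slave (hostnames from this guide).
for h in slave1 slave2; do
  echo scp -r /etc/hadoop/conf "root@$h:/etc/hadoop/"   # drop `echo` to actually copy
done
```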
Step 12: Enable the trash feature (optional)
Add the following two parameters to /etc/hadoop/conf/core-site.xml:
1. fs.trash.interval: a time interval in minutes; the default is 0, which disables the trash feature. It controls how long deleted files are kept in the trash. If it is set on the server side, any client-side setting is ignored; if it is disabled on the server side, the client-side setting (if any) is checked.
2. fs.trash.checkpoint.interval: the interval in minutes between trash checkpoints; the default is 0. It must be no larger than fs.trash.interval and is set on the server side. If set to 0, the value of fs.trash.interval is used.
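For example, to keep deleted files in the trash for one day with hourly checkpoints (the values below are illustrative, in minutes), the core-site.xml entries would look like:

```xml
<property>
<name>fs.trash.interval</name>
<value>1440</value>
</property>
<property>
<name>fs.trash.checkpoint.interval</name>
<value>60</value>
</property>
```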
Step 13: Configure LZO (optional)
a. Download the repo file to /etc/yum.repos.d/ on traceMaster:
wget http://archive.cloudera.com/gplextras5/redhat/6/x86_64/gplextras/cloudera-gplextras5.repo
b. Install LZO: yum install hadoop-lzo* impala-lzo -y
c. Add the following configuration to /etc/hadoop/conf/core-site.xml:
<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.BZip2Codec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
To have MapReduce compress intermediate (map output) data with LZO as well, add the following to /etc/hadoop/conf/mapred-site.xml:
<property>
<name>mapred.map.output.compression.codec</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
d. After configuring, run a test:
hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar com.hadoop.compression.lzo.LzoIndexer hdfs://master:9000/user/hadoop/workflows/shellTest/workflow.xml
Step 14: Start the services
a. Start on master:
hdfs namenode -format
/etc/init.d/hadoop-hdfs-namenode init
/etc/init.d/hadoop-hdfs-namenode start
/etc/init.d/hadoop-yarn-resourcemanager start
/etc/init.d/hadoop-yarn-proxyserver start
/etc/init.d/hadoop-mapreduce-historyserver start
b. Start on slave1 and slave2:
/etc/init.d/hadoop-hdfs-datanode start
/etc/init.d/hadoop-yarn-nodemanager start
If a service fails to start during the steps above, follow the error message to the corresponding log file and check the details. The vast majority of failures are caused by missing file permissions and can be resolved by running chmod -R 777 on the directory in question.
c. Checks after startup:
HDFS | |
http://192.168.157.130:8088 | ResourceManager (YARN) |
http://192.168.157.130:8088/cluster/nodes | live nodes |
http://192.168.157.130:8042 | NodeManager |
http://192.168.157.130:19888/ | JobHistory |
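The web UI checks above can be sketched as a loop over the ports; the IP is the one used in this guide (replace it with your master's address), and the curl call is commented out so the loop dry-runs anywhere.

```shell
# Probe the web UIs listed above after startup.
MASTER=192.168.157.130   # assumption: master's address from this guide
for port in 8088 8042 19888; do
  echo "check http://$MASTER:$port"
  # curl -s -o /dev/null -w '%{http_code}\n' "http://$MASTER:$port"   # run on the cluster
done
```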