Hadoop 2.8.0 + CentOS7.3
I. Install JDK 1.8
tar zxvf jdk-8u65-linux-x64.tar.gz
mv jdk1.8.0_65 /usr/src/jdk
Append the following to /etc/profile:
JAVA_HOME=/usr/src/jdk/
PATH=$JAVA_HOME/bin:/usr/local/xtrabackup/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME
export PATH
II. Install Hadoop
tar zxvf hadoop-2.8.0.tar.gz
mv hadoop-2.8.0 /usr/src/hadoop
Add the following environment variables to /etc/profile:
export CLASSPATH
HADOOP_LOG_DIR=/usr/src/hadoop/logs
HADOOP_PREFIX=/usr/src/hadoop
export HADOOP_LOG_DIR
export HADOOP_PREFIX
export HADOOP_HOME=/usr/src/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native/:$LD_LIBRARY_PATH
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
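Before continuing, it is worth checking that the profile additions above actually put the Hadoop bin/sbin directories on PATH. A minimal standalone sketch; it re-declares the two variables itself so it can run anywhere, assuming the /usr/src/hadoop install path used above:

```shell
# Re-declare the relevant profile lines so this snippet is self-contained.
export HADOOP_HOME=/usr/src/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# Verify both directories are on PATH before trying to run hadoop/hdfs.
for dir in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
  case ":$PATH:" in
    *":$dir:"*) echo "on PATH: $dir" ;;
    *)          echo "MISSING: $dir" ;;
  esac
done
```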
III. Server setup
1. /etc/hosts
[root@centos128 hadoop]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.44.128 centos128
192.168.44.129 centos129
192.168.44.130 centos130
centos128 is the master
centos129 and centos130 are the slaves
2. Disable firewalld
systemctl stop firewalld
systemctl disable firewalld
3. Set up SSH trust (passwordless login)
ssh-keygen -t rsa
ssh-keygen -t dsa
cd ~/.ssh
ssh-copy-id -i id_rsa.pub centos128
ssh-copy-id -i id_dsa.pub centos128
ssh-copy-id -i id_rsa.pub centos129
ssh-copy-id -i id_dsa.pub centos129
ssh-copy-id -i id_rsa.pub centos130
ssh-copy-id -i id_dsa.pub centos130
Repeat the same key distribution on the other servers.
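The six ssh-copy-id calls above can be collapsed into one loop. A sketch with the host list hard-coded from this cluster; the echo prefix prints each command instead of running it, so remove echo to actually push the keys (each push still prompts for that host's password):

```shell
# Print (echo) the key-distribution command for every node and both keys.
for host in centos128 centos129 centos130; do
  for key in ~/.ssh/id_rsa.pub ~/.ssh/id_dsa.pub; do
    echo ssh-copy-id -i "$key" "$host"   # remove 'echo' to execute
  done
done
```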
4. Create the storage directories
mkdir /data/hadoop/name -p
mkdir /data/hadoop/tmp -p
mkdir /Data1 -p
mkdir /Data2 -p
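The same four directories can be created in one pass. A sketch that uses a scratch prefix so it is safe to run anywhere; set ROOT to the empty string on the real nodes to create the paths at /:

```shell
# Scratch prefix for a dry run; use ROOT="" on the actual servers.
ROOT=$(mktemp -d)
for d in data/hadoop/name data/hadoop/tmp Data1 Data2; do
  mkdir -p "$ROOT/$d"
done
ls "$ROOT"
```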
IV. Edit the configuration files
The main files are:
a. In hadoop-env.sh, change
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME=/usr/src/jdk
b. The site files:
etc/hadoop/core-site.xml: NameNode URI
etc/hadoop/hdfs-site.xml: NameNode and DataNode settings
etc/hadoop/yarn-site.xml: ResourceManager, NodeManager, and History Server settings
etc/hadoop/mapred-site.xml: MapReduce applications and the MapReduce JobHistory Server
etc/hadoop/slaves: list of slave hostnames
1. etc/hadoop/core-site.xml:
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://centos128:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>10080</value>
</property>
</configuration>
Here hdfs://centos128:9000 is the NameNode URI.
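fs.trash.interval above is given in minutes; quick shell arithmetic confirms that 10080 minutes keeps deleted files in the trash for 7 days:

```shell
# 10080 minutes -> days
echo $((10080 / 60 / 24))   # prints 7
```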
2. etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/name</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>268435456</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<!--
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
-->
<property>
<name>dfs.datanode.data.dir</name>
<value>/Data1,/Data2</value>
</property>
</configuration>
dfs.namenode.name.dir: local path where the NameNode stores its metadata
dfs.replication: number of block replicas (defaults to 3)
dfs.datanode.data.dir: local paths where DataNodes store blocks
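dfs.blocksize above is specified in bytes; quick shell arithmetic confirms that 268435456 is a 256 MB block (twice the 128 MB default in Hadoop 2.x):

```shell
# 268435456 bytes -> megabytes
echo $((268435456 / 1024 / 1024))   # prints 256
```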
3. etc/hadoop/yarn-site.xml:
For the meaning of each property, see: http://blog.csdn.net/u010719917/article/details/73917217
<!-- Site specific YARN configuration properties -->
<configuration>
<!--
ResourceManager
-->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!--Configurations for ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>centos128</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<!--
<property>
<name>yarn.resourcemanager.resource-tracker.client.thread-count</name>
<value>50</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.client.thread-count</name>
<value>50</value>
</property>
-->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>0</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>512</value>
</property>
<!--
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
<property>
<name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
<value>1000</value>
</property>
-->
<!--
nodemanager
-->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>${hadoop.tmp.dir}/nm-local-dir</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>${yarn.log.dir}/userlogs</value>
</property>
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>10800</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!--
History Server
-->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>-1</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>-1</value>
</property>
<!--
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
-->
</configuration>
4. etc/hadoop/mapred-site.xml:
For the meaning of each property, see: http://blog.csdn.net/u010719917/article/details/73917217
<configuration>
<!--
MapReduce Applications
-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>
<!--
MapReduce JobHistory Server
-->
<property>
<name>mapreduce.jobhistory.address</name>
<value>centos128:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>centos128:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/tmp</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/mr-history/done</value>
</property>
</configuration>
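Note that the container sizes requested in mapred-site.xml must fit inside yarn.scheduler.maximum-allocation-mb from yarn-site.xml, or the ResourceManager will never grant them; with the values shown here (a 512 MB maximum versus 1536/3072 MB requests) MapReduce jobs can hang unscheduled, so either raise the YARN maximum or lower the MR requests. A small sanity-check sketch using the values from the XML listings above:

```shell
# Values copied from the yarn-site.xml / mapred-site.xml listings above.
MAX_ALLOC=512      # yarn.scheduler.maximum-allocation-mb
MAP_MB=1536        # mapreduce.map.memory.mb
REDUCE_MB=3072     # mapreduce.reduce.memory.mb

# Warn about any container request YARN could never satisfy.
for mb in $MAP_MB $REDUCE_MB; do
  if [ "$mb" -gt "$MAX_ALLOC" ]; then
    echo "WARN: ${mb}MB request exceeds maximum-allocation-mb=${MAX_ALLOC}"
  fi
done
```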
5. etc/hadoop/slaves:
[root@centos128 hadoop]# cat slaves
centos129
centos130
6. Log directory:
[root@centos128 logs]# pwd
/usr/src/hadoop/logs
[root@centos128 logs]# ll
total 456
-rw-r--r-- 1 root root 135437 Jul 1 14:06 hadoop-root-namenode-centos128.log
-rw-r--r-- 1 root root 5069 Jul 1 13:19 hadoop-root-namenode-centos128.out
-rw-r--r-- 1 root root 22419 Jul 1 12:54 hadoop-root-secondarynamenode-centos128.log
-rw-r--r-- 1 root root 716 Jul 1 12:54 hadoop-root-secondarynamenode-centos128.out
-rw-r--r-- 1 root root 34891 Jul 1 14:07 mapred-root-historyserver-centos128.log
-rw-r--r-- 1 root root 1477 Jul 1 13:18 mapred-root-historyserver-centos128.out
-rw-r--r-- 1 root root 0 Jul 1 12:53 SecurityAuth-root.audit
-rw-r--r-- 1 root root 20165 Jul 1 13:06 yarn-root-proxyserver-centos128.log
-rw-r--r-- 1 root root 702 Jul 1 13:06 yarn-root-proxyserver-centos128.out
-rw-r--r-- 1 root root 87905 Jul 1 14:06 yarn-root-resourcemanager-centos128.log
-rw-r--r-- 1 root root 1524 Jul 1 13:04 yarn-root-resourcemanager-centos128.out
-rw-r--r-- 1 root root 702 Jul 1 13:00 yarn-root-resourcemanager-centos128.out.1
-rw-r--r-- 1 root root 702 Jul 1 12:58 yarn-root-resourcemanager-centos128.out.2
V. Install Hadoop on the other servers
Pack the configured Hadoop tree, the JDK, /etc/profile and /etc/hosts, then copy the archive to centos129 and centos130:
cd /
tar zcvf hd.tar.gz /usr/src/hadoop/ /usr/src/jdk/ /etc/profile /etc/hosts
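A sketch of the distribution step for the two slaves. GNU tar strips the leading / on create, so extracting with -C / restores the original paths; the echo prefix prints the commands instead of running them (remove it once the host names and archive path are confirmed):

```shell
# Print the copy/extract commands for each slave node.
for host in centos129 centos130; do
  echo scp /hd.tar.gz root@"$host":/
  echo ssh root@"$host" "tar xvf /hd.tar.gz -C /"
done
```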
VI. Hadoop scripts and web pages
Hadoop Startup
To bring up HDFS, first format the filesystem on the NameNode:
[hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
Start the HDFS NameNode:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs start namenode
Start the HDFS DataNodes:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs start datanode
If etc/hadoop/slaves and passwordless SSH are configured (see Single Node Setup), all of the HDFS daemons can instead be started with a single utility script, run as hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/start-dfs.sh
Start the YARN ResourceManager:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager
Start the YARN NodeManagers:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR start nodemanager
Start a standalone WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start proxyserver
If etc/hadoop/slaves and passwordless SSH are configured, all of the YARN daemons can instead be started with a single utility script, run as yarn:
[yarn]$ $HADOOP_PREFIX/sbin/start-yarn.sh
Start the MapReduce JobHistory Server:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR start historyserver
Hadoop Shutdown
Stop the HDFS NameNode:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script hdfs stop namenode
Stop the HDFS DataNodes:
[hdfs]$ $HADOOP_PREFIX/sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --script hdfs stop datanode
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the HDFS processes may be stopped with a utility script. As hdfs:
[hdfs]$ $HADOOP_PREFIX/sbin/stop-dfs.sh
Stop the YARN ResourceManager:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop resourcemanager
Stop the YARN NodeManagers:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemons.sh --config $HADOOP_CONF_DIR stop nodemanager
If etc/hadoop/slaves and ssh trusted access is configured (see Single Node Setup), all of the YARN processes can be stopped with a utility script. As yarn:
[yarn]$ $HADOOP_PREFIX/sbin/stop-yarn.sh
Stop the WebAppProxy server. Run on the WebAppProxy server as yarn. If multiple servers are used with load balancing it should be run on each of them:
[yarn]$ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR stop proxyserver
Stop the MapReduce JobHistory Server:
[mapred]$ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh --config $HADOOP_CONF_DIR stop historyserver
Web Interfaces
Once the Hadoop cluster is up and running check the web-ui of the components as described below:
Daemon                        Web Interface           Notes
NameNode                      http://nn_host:port/    Default HTTP port is 50070.
ResourceManager               http://rm_host:port/    Default HTTP port is 8088.
MapReduce JobHistory Server   http://jhs_host:port/   Default HTTP port is 19888.
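Once everything is up, the three web UIs can be probed from the master with curl; a sketch assuming the centos128 hostname and the default ports listed above (the echo prefix prints the probes instead of issuing them):

```shell
# Print a curl probe per daemon UI; remove 'echo' to actually poll.
for url in http://centos128:50070/ http://centos128:8088/ http://centos128:19888/; do
  echo curl -s -o /dev/null -w '%{http_code}\n' "$url"
done
```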