Basic environment setup
- Disable the firewall
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl status firewalld.service
- vi /etc/hostname on each machine and set the hostnames to hadoop101, hadoop102, and hadoop103 respectively
- vi /etc/hosts and add on each machine:
192.168.99.101 hadoop101
192.168.99.102 hadoop102
192.168.99.103 hadoop103
- Create the hadoop user and set its password
useradd hadoop
passwd hadoop
- Give the hadoop user root privileges so commands can later be run with sudo: vi /etc/sudoers (or, more safely, visudo, which syntax-checks the file) and add this line below the %wheel line:
hadoop ALL=(ALL) NOPASSWD:ALL
- Create two directories under /opt: mkdir /opt/module and mkdir /opt/software
- Change the owner and group of module and software to the hadoop user
chown -R hadoop:hadoop /opt/module
chown -R hadoop:hadoop /opt/software
- On Windows, add to C:\Windows\System32\drivers\etc\hosts:
192.168.99.101 hadoop101
192.168.99.102 hadoop102
192.168.99.103 hadoop103
Passwordless SSH between the machines
- ssh localhost (answering yes also creates the ~/.ssh directory if it does not exist yet)
- Enter the .ssh directory: cd ~/.ssh
- Generate a key pair: ssh-keygen -t rsa
- Distribute the public key: ssh-copy-id hadoop102, ssh-copy-id hadoop103 (also run ssh-copy-id hadoop101 so each node can reach itself, and repeat these steps on all three nodes)
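The distribution step above can be sketched as a small loop. To stay on the safe side it only prints the ssh-copy-id commands for review; pipe the output to sh to actually run them. The hostnames are the ones defined in /etc/hosts above.

```shell
# Run on each of the three nodes as the hadoop user, after ssh-keygen.
# distribute_keys only PRINTS the commands so they can be reviewed first.
distribute_keys() {
    for h in hadoop101 hadoop102 hadoop103; do
        printf 'ssh-copy-id %s\n' "$h"
    done
}
distribute_keys          # preview the three commands
# distribute_keys | sh   # uncomment to execute (each host prompts for the password once)
```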
JDK setup
- Remove the preinstalled Java: rpm -qa | grep -i java | xargs -n1 rpm -e --nodeps
- Unpack the JDK: tar -zxvf jdk-8u212-linux-x64.tar.gz -C /opt/module/
- vi /etc/profile.d/my_env.sh and add:
#JAVA_HOME
export JAVA_HOME=/opt/module/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
- source /etc/profile so the new PATH takes effect
- Reboot the machine: reboot
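One minor hazard with the plain `export PATH=$PATH:$JAVA_HOME/bin` line above: re-sourcing /etc/profile appends a duplicate entry each time. A hedged sketch of an idempotent alternative (path_append is a hypothetical helper, using the same JDK path as above):

```shell
# Append a directory to PATH only if it is not already present,
# so sourcing the env file repeatedly does not grow PATH.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH, do nothing
        *) PATH="$PATH:$1" ;;     # otherwise append
    esac
}
path_append /opt/module/jdk1.8.0_212/bin
path_append /opt/module/jdk1.8.0_212/bin   # second call is a no-op
```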
Hadoop configuration
- Download hadoop-3.1.3.tar.gz from https://archive.apache.org/dist/hadoop/common
- tar -zxvf hadoop-3.1.3.tar.gz -C /opt/module/
- Add environment variables: vi /etc/profile.d/my_env.sh, append the following, then run source /etc/profile to apply:
#HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
- Set the Java home in hadoop-3.1.3/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/opt/module/jdk1.8.0_212
- Edit hadoop-3.1.3/etc/hadoop/workers (one hostname per line, no trailing whitespace):
hadoop101
hadoop102
hadoop103
- Edit core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml under hadoop-3.1.3/etc/hadoop/ (see the configuration details section below)
- Format HDFS on the NameNode host (hadoop101, first start only): ./bin/hdfs namenode -format
- Copy Hadoop to hadoop102 and hadoop103
scp -r /opt/module/hadoop-3.1.3 hadoop@192.168.99.102:/opt/module/
scp -r /opt/module/hadoop-3.1.3 hadoop@192.168.99.103:/opt/module/
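Instead of typing the scp command once per worker node, the copy can be generated by a loop; this sketch prints the commands (remove the echo to actually execute them), using the hostnames and paths from the steps above.

```shell
# Print one scp command per worker node; drop "echo" to run them for real.
for h in hadoop102 hadoop103; do
    echo scp -r /opt/module/hadoop-3.1.3 "hadoop@$h:/opt/module/"
done
```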
Running Hadoop
- On hadoop101: start-dfs.sh
- On hadoop102: mapred --daemon start historyserver
- On hadoop103: start-yarn.sh
- Check the processes with jps (per the workers file and yarn-site.xml above, every node runs a DataNode and a NodeManager):
hadoop101: NameNode, SecondaryNameNode, DataNode, NodeManager
hadoop102: DataNode, NodeManager, JobHistoryServer
hadoop103: DataNode, NodeManager, ResourceManager
- NameNode web UI: http://hadoop101:9870
- YARN ResourceManager web UI: http://hadoop103:8088
- JobHistory web UI: http://hadoop102:19888
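The jps check above can be done from one terminal by looping over the nodes via the passwordless SSH configured earlier. This sketch prints the ssh commands for review (remove the echo to execute them).

```shell
# Print one "ssh <host> jps" command per node; drop "echo" to run them.
for h in hadoop101 hadoop102 hadoop103; do
    echo ssh "$h" jps
done
```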
Additional notes
- mwget (if wget downloads too slowly, mwget can be used instead)
wget http://jaist.dl.sourceforge.net/project/kmphpfm/mwget/0.1/mwget_0.1.0.orig.tar.bz2
yum install bzip2 gcc-c++ openssl-devel intltool -y
bzip2 -d mwget_0.1.0.orig.tar.bz2
tar -xvf mwget_0.1.0.orig.tar
cd mwget_0.1.0.orig
./configure
make
make install
- Press the right Ctrl key (the VirtualBox host key) to release the mouse from the VirtualBox window
Configuration details
- core-site.xml
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101:8020</value>
    </property>
    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
    <!-- Static user for the HDFS web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>hadoop</value>
    </property>
</configuration>
- hdfs-site.xml
<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop101:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop101:9868</value>
    </property>
</configuration>
- mapred-site.xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop102:10020</value>
    </property>
    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop102:19888</value>
    </property>
</configuration>
- yarn-site.xml
<configuration>
    <!-- Use the MapReduce shuffle auxiliary service -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log server address -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop102:19888/jobhistory/logs</value>
    </property>
    <!-- Keep aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
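The 604800 retention value above is simply 7 days expressed in seconds, which is easy to verify:

```shell
# 7 days * 24 hours * 60 minutes * 60 seconds = yarn.log-aggregation.retain-seconds
echo $((7 * 24 * 60 * 60))
```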