Prepare three virtual machines
First, configure the network
Configure a static IP:
vim /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.200.101
NETMASK=255.255.255.0
GATEWAY=192.168.200.2
DNS1=192.168.200.2
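As a sanity check on the values above: IPADDR and GATEWAY must fall in the same subnet under NETMASK. A small sketch reusing the values from this config (the `net_of` helper is just for illustration):

```shell
IPADDR=192.168.200.101
NETMASK=255.255.255.0
GATEWAY=192.168.200.2

# AND an IPv4 address with the netmask, octet by octet,
# to get the network address
net_of() {
  local ip=$1 mask=$2
  local IFS=.
  set -- $ip
  local a=$1 b=$2 c=$3 d=$4
  set -- $mask
  echo "$((a & $1)).$((b & $2)).$((c & $3)).$((d & $4))"
}

if [ "$(net_of "$IPADDR" "$NETMASK")" = "$(net_of "$GATEWAY" "$NETMASK")" ]; then
  echo "same subnet"
else
  echo "subnet mismatch"
fi
```

After editing the config file, restart the network service (e.g. `systemctl restart network` on CentOS 7) for the change to take effect.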
In the Virtual Network Editor, open the NAT settings and confirm the NAT gateway matches the GATEWAY configured above.
Set up the Java environment; JDK 1.8 is used here.
After extracting the JDK, configure the environment variables:
#JAVA_HOME
export JAVA_HOME=/opt/bdp/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin
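A quick way to check that the variables took effect after `source /etc/profile` (a sketch; the path matches the JDK directory above):

```shell
export JAVA_HOME=/opt/bdp/jdk1.8.0_212
export PATH=$PATH:$JAVA_HOME/bin

# Confirm $JAVA_HOME/bin made it onto PATH
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JAVA_HOME is on PATH" ;;
  *)                    echo "JAVA_HOME is missing from PATH" ;;
esac
```

On a real node, `java -version` should then report a 1.8 build.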
Configure passwordless SSH login
ssh-keygen -t rsa
Press Enter three times; this generates two files: id_rsa (the private key) and id_rsa.pub (the public key).
Copy the public key to every machine you want to log in to without a password:
ssh-copy-id hadoop101
ssh-copy-id hadoop102
ssh-copy-id hadoop103
You also need to configure passwordless login from hadoop102, using the root account, to hadoop101, hadoop102, and hadoop103;
and from hadoop103, using the bawei account, to the hadoop101, hadoop102, and hadoop103 servers as well.
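The steps above can be sketched as a loop, run once under each account that needs passwordless access (root on hadoop102, bawei on hadoop103); ssh-copy-id prompts for each host's password the first time:

```shell
# Sketch: copy the current account's public key to all three nodes
for host in hadoop101 hadoop102 hadoop103; do
  ssh-copy-id "$host"
done
```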
Then extract the Hadoop archive (hadoop-3.1.3 here).
Cluster plan
|      | hadoop101          | hadoop102                    | hadoop103                   |
|------|--------------------|------------------------------|-----------------------------|
| HDFS | NameNode, DataNode | DataNode                     | DataNode, SecondaryNameNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                 |
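Per the plan above, all three nodes run a DataNode and a NodeManager. In Hadoop 3.x the worker nodes are declared in etc/hadoop/workers (one hostname per line, no trailing spaces); a sketch matching this plan:

```
hadoop101
hadoop102
hadoop103
```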
core-site.xml
<!-- Specify the NameNode address -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop101:8020</value>
</property>
<!-- Specify the Hadoop data storage directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/bdp/hadoop-3.1.3/data</value>
</property>
<!-- Set the static user for HDFS web UI login to demo -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>demo</value>
</property>
hadoop-env.sh
Set the Java environment variable, e.g.:
export JAVA_HOME=/opt/bdp/jdk1.8.0_212
hdfs-site.xml
<!-- NameNode web UI address -->
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop101:9870</value>
</property>
<!-- SecondaryNameNode web UI address -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop103:9868</value>
</property>
Configure mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop101:10020</value>
</property>
<!-- JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop101:19888</value>
</property>
Configure yarn-site.xml
<!-- Route MR through the shuffle service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Specify the ResourceManager address -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop102</value>
</property>
<!-- Environment variable inheritance -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- Log aggregation server URL -->
<property>
<name>yarn.log.server.url</name>
<value>http://hadoop101:19888/jobhistory/logs</value>
</property>
<!-- Retain aggregated logs for 7 days -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
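The retention value is just 7 days expressed in seconds, which can be sanity-checked:

```shell
# 7 days x 24 hours x 60 minutes x 60 seconds
echo $((7 * 24 * 60 * 60))   # prints 604800
```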
Add the following at the top of sbin/start-yarn.sh:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Add the following at the top of sbin/stop-yarn.sh:
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Configure sbin/start-dfs.sh (add at the top):
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Configure sbin/stop-dfs.sh (add at the top):
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
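With all of the configuration above in place, a typical first-start sequence looks like the following (a sketch; it assumes the hostnames from the cluster plan and that Hadoop's bin/ and sbin/ are on PATH):

```shell
# One-time only, on hadoop101: format the NameNode
hdfs namenode -format

# On hadoop101: start HDFS (NameNode, DataNodes, SecondaryNameNode)
start-dfs.sh

# On hadoop102 (the ResourceManager node): start YARN
start-yarn.sh

# On hadoop101: start the JobHistory server
mapred --daemon start historyserver
```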
After the setup above, you may still hit errors at run time, for example classpath-related errors when submitting jobs. If so, add the following to the mapred-site.xml and yarn-site.xml config files:
<property>
<name>yarn.application.classpath</name>
<value>
/opt/bdp/hadoop-3.1.3/etc/hadoop,
/opt/bdp/hadoop-3.1.3/share/hadoop/common/lib/*,
/opt/bdp/hadoop-3.1.3/share/hadoop/common/*,
/opt/bdp/hadoop-3.1.3/share/hadoop/hdfs/*,
/opt/bdp/hadoop-3.1.3/share/hadoop/hdfs/lib/*,
/opt/bdp/hadoop-3.1.3/share/hadoop/mapreduce/*,
/opt/bdp/hadoop-3.1.3/share/hadoop/mapreduce/lib/*,
/opt/bdp/hadoop-3.1.3/share/hadoop/yarn/*,
/opt/bdp/hadoop-3.1.3/share/hadoop/yarn/lib/*
</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
If jobs are still killed at run time with memory-limit errors, add the following to yarn-site.xml:
<!-- Whether a thread checks the physical memory each task is using and kills tasks that exceed their allocation; default is true -->
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<!-- Whether a thread checks the virtual memory each task is using and kills tasks that exceed their allocation; default is true -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
Set up LZO
First install Maven and Git (the build environment). Maven 3.6.1 is used here.
Extract Maven:
tar -zxvf apache-maven-3.6.1-bin.tar.gz
Configure the environment variables in vim /etc/profile.d/maven.sh:
export MAVEN_HOME=/opt/software/apache-maven-3.6.1
export PATH=$PATH:$MAVEN_HOME/bin
export MAVEN_OPTS=-Xmx2048m
Then edit the settings.xml file (under apache-maven-3.6.1/conf) to add the Aliyun mirror:
<mirrors>
<mirror>
<id>nexus-aliyun</id>
<mirrorOf>central</mirrorOf>
<name>Nexus aliyun</name>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
</mirror>
<mirror>
<id>CN</id>
<name>OSChina Central</name>
<url>http://maven.oschina.net/content/groups/public/</url>
<mirrorOf>central</mirrorOf>
</mirror>
<mirror>
<id>alimaven</id>
<mirrorOf>central</mirrorOf>
<name>aliyun maven</name>
<url>https://maven.aliyun.com/nexus/content/repositories/central/</url>
</mirror>
<mirror>
<id>jboss-public-repository-group</id>
<mirrorOf>central</mirrorOf>
<name>JBoss Public Repository Group</name>
<url>https://repository.jboss.org/nexus/content/groups/public</url>
</mirror>
</mirrors>
Install Git:
yum install -y git
To compile and install LZO, the following packages are required:
gcc-c++
zlib-devel
autoconf
automake
libtool
They can all be installed via yum:
yum -y install gcc-c++ lzo-devel zlib-devel autoconf automake libtool
Download, compile, and install LZO:
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
tar -zxvf lzo-2.10.tar.gz
cd lzo-2.10
./configure --prefix=/usr/local/hadoop/lzo/
make
make install
Compile the hadoop-lzo source
Download the hadoop-lzo source from https://github.com/twitter/hadoop-lzo/archive/master.zip
wget https://github.com/twitter/hadoop-lzo/archive/master.zip
After extracting, edit pom.xml:
<hadoop.current.version>3.1.3</hadoop.current.version>
Declare two temporary environment variables:
export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
export LIBRARY_PATH=/usr/local/hadoop/lzo/lib
To compile, enter hadoop-lzo-master and run the Maven build command:
mvn package -Dmaven.test.skip=true
In target/, hadoop-lzo-0.4.21-SNAPSHOT.jar is the compiled hadoop-lzo component.
Copy the compiled hadoop-lzo-0.4.21-SNAPSHOT.jar into hadoop-3.1.3/share/hadoop/common/
Then sync hadoop-lzo-0.4.21-SNAPSHOT.jar to hadoop102 and hadoop103:
xsync.sh hadoop-lzo-0.4.21-SNAPSHOT.jar
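xsync.sh is a distribution helper that this guide uses but does not show; a minimal hypothetical sketch (the name and target host list are assumptions), relying on the passwordless SSH configured earlier:

```shell
#!/usr/bin/env bash
# Hypothetical xsync.sh: push each argument to the same absolute path
# on the other two nodes via rsync
for host in hadoop102 hadoop103; do
  for f in "$@"; do
    abs=$(readlink -f "$f")
    rsync -av "$abs" "$host:$(dirname "$abs")/"
  done
done
```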
Add configuration to core-site.xml to support LZO compression:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>io.compression.codecs</name>
<value>
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec,
org.apache.hadoop.io.compress.SnappyCodec,
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec
</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
</configuration>
Sync core-site.xml to hadoop102 and hadoop103:
xsync.sh core-site.xml
Then run a job yourself to test it.
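As a smoke test, one option is to run the bundled wordcount example with LZO output compression turned on (a sketch; the /input and /output HDFS paths are placeholders you must create and choose yourself):

```shell
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount \
  -Dmapreduce.output.fileoutputformat.compress=true \
  -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec \
  /input /output
```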