Environment: Ubuntu 18.04 LTS
Virtual machine: VirtualBox, chosen because it is relatively light on resources
Install three virtual machines with hostnames master, slave01, and slave02, all with the username hadoop
Network: bridged mode, so the machines are reachable across the LAN
Once the systems are installed, update them:
sudo apt update
sudo apt upgrade
and install vim as well:
sudo apt install vim
Edit hosts
Edit the hosts file at /etc/hosts, adding the IP addresses and hostnames of master, slave01, and slave02 on every machine:
192.168.1.22 master
192.168.1.23 slave01
192.168.1.24 slave02
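A quick sanity check, assuming the addresses above: every node should be able to reach the others by hostname.
# run on each node in turn; every host should answer
ping -c 1 master
ping -c 1 slave01
ping -c 1 slave02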
Give the hadoop user root privileges: switch to root and run visudo.
Below the line
root ALL=(ALL) ALL
add
hadoop ALL=(ALL) ALL
Configure passwordless SSH login
Install openssh-server:
sudo apt install openssh-server
Run ssh-keygen -t rsa on each of the three machines, pressing Enter at every prompt to accept the defaults. Then, on master, go into the .ssh folder in the home directory (~/.ssh) and run:
cat id_rsa.pub >> authorized_keys
scp authorized_keys hadoop@slave01:~/.ssh
scp authorized_keys hadoop@slave02:~/.ssh
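If a password prompt still appears, the usual cause is permissions on ~/.ssh; the checks below are a minimal sketch.
# on each machine: tighten permissions, since sshd rejects writable key files
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# from master: both commands should print the hostname without asking for a password
ssh hadoop@slave01 hostname
ssh hadoop@slave02 hostname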
Configure JDK 1.8
Hadoop 3.1.0 requires JDK 8 at minimum. Download the archive from the official site, extract it to any directory, and add the environment variables (here appended to /etc/profile):
export JAVA_HOME=/home/hadoop/tools/jdk8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
The JDK setup above must be carried out on each of the three machines in the same way.
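To verify on each node (assuming the variables were appended to /etc/profile as above):
source /etc/profile
java -version
# should report a 1.8.0_* version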
Configure Hadoop 3.1.0
Configure the Hadoop environment variables:
# hadoop
export HADOOP_HOME=/home/hadoop/tools/hadoop3
export PATH=$PATH:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
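A quick check that Hadoop is now on the PATH:
source /etc/profile
hadoop version
# should report Hadoop 3.1.0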
Go into the directory /home/hadoop/tools/hadoop3/etc/hadoop and work through the following configuration files.
**core-site.xml**
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:///home/hadoop/tools/hadoop3/tmp</value>
    </property>
</configuration>
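It is safest to create the hadoop.tmp.dir directory referenced above by hand, on all three nodes:
mkdir -p /home/hadoop/tools/hadoop3/tmp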
**hdfs-site.xml**
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/tools/hadoop3/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/tools/hadoop3/hdfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave01:9001</value>
    </property>
</configuration>
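Likewise, create the NameNode and DataNode directories referenced above before the first start:
mkdir -p /home/hadoop/tools/hadoop3/hdfs/name /home/hadoop/tools/hadoop3/hdfs/data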
**yarn-site.xml**
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8040</value>
    </property>
</configuration>
**mapred-site.xml**
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            /home/hadoop/tools/hadoop3/etc/hadoop,
            /home/hadoop/tools/hadoop3/share/hadoop/common/*,
            /home/hadoop/tools/hadoop3/share/hadoop/common/lib/*,
            /home/hadoop/tools/hadoop3/share/hadoop/hdfs/*,
            /home/hadoop/tools/hadoop3/share/hadoop/hdfs/lib/*,
            /home/hadoop/tools/hadoop3/share/hadoop/mapreduce/*,
            /home/hadoop/tools/hadoop3/share/hadoop/mapreduce/lib/*,
            /home/hadoop/tools/hadoop3/share/hadoop/yarn/*,
            /home/hadoop/tools/hadoop3/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
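As a sanity check, the entries above should cover the same jars that the hadoop command itself reports; if your directory layout differs, adjust mapred-site.xml accordingly:
hadoop classpath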
**workers**
Note that the file to edit here is workers in the same directory (Hadoop 3 renamed the old slaves file); add the two slave hostnames:
slave01
slave02
**hadoop-env.sh**
Add the following to this file:
export JAVA_HOME=/home/hadoop/tools/jdk8
**yarn-env.sh**
Add the following to this file:
export JAVA_HOME=/home/hadoop/tools/jdk8
Configure Scala
# scala (append at the end of /etc/profile)
export SCALA_HOME=/home/hadoop/tools/scala
export PATH=$PATH:${SCALA_HOME}/bin
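Verify:
source /etc/profile
scala -version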
Configure Spark
Environment variables:
# spark
export SPARK_HOME=/home/hadoop/tools/spark2
export PATH=$PATH:${SPARK_HOME}/bin:${SPARK_HOME}/sbin
Configure ${SPARK_HOME}/conf/spark-env.sh:
cp spark-env.sh.template spark-env.sh
# add the following:
export SCALA_HOME=/home/hadoop/tools/scala
export JAVA_HOME=/home/hadoop/tools/jdk8
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/home/hadoop/tools/hadoop3/etc/hadoop
Configure ${SPARK_HOME}/conf/slaves (master is listed as well, so a Worker also runs on the master node):
cp slaves.template slaves
master
slave01
slave02
Configure the environment variables in /etc/profile on slave01 and slave02 as well.
Finally, copy the configured hadoop3, spark2, scala, and jdk8 directories over to the slave01 and slave02 nodes:
scp -r ~/tools/hadoop3 ~/tools/spark2 ~/tools/scala ~/tools/jdk8 hadoop@slave01:~/tools
scp -r ~/tools/hadoop3 ~/tools/spark2 ~/tools/scala ~/tools/jdk8 hadoop@slave02:~/tools
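With everything distributed, a first start might look like the sketch below, run on master with the paths configured above. Note that formatting the NameNode is a one-time step that wipes HDFS metadata.
# one-time only: initialize the NameNode metadata under hdfs/name
hdfs namenode -format
# start HDFS and YARN
start-dfs.sh
start-yarn.sh
# start the Spark standalone cluster; use the full path, because Hadoop's
# sbin also provides a start-all.sh
${SPARK_HOME}/sbin/start-all.sh
# jps on master should show NameNode, ResourceManager, Master and Worker;
# on the slaves, DataNode, NodeManager and Worker (plus SecondaryNameNode on slave01)
jps
If everything came up, the default web UIs are at http://master:9870 (HDFS), http://master:8088 (YARN), and http://master:8080 (Spark).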