A Quick Command Guide to Setting Up Hadoop and Spark (Ubuntu 18.04)

Hadoop

I. Single-node cluster

1. Install the JDK
java -version    #check the current Java version
sudo apt-get update
sudo apt install openjdk-8-jdk-headless    #JDK 1.8 is recommended
or: sudo apt-get install default-jdk
update-alternatives --display java    #check the Java installation path
Note: every node in the cluster must use the same JDK version, otherwise Spark will fail at runtime!

2. Passwordless SSH login
sudo apt-get install ssh
sudo apt-get install rsync
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    #generate an SSH key pair
ls -l ~/.ssh    #check the generated keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    #append the public key to authorized_keys so this host accepts passwordless logins
ssh-copy-id user@host    #passwordless login to a datanode
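A quick sanity check that passwordless login actually works before moving on (the data1 check only applies once the datanodes from the multi-node section exist):

ssh localhost hostname    #should print the hostname without asking for a password
ssh data1 hostname        #same check against a datanode, after ssh-copy-id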

3. Download and install Hadoop
3.1 Download:
Official site: the Hadoop download page.
How to get the link: open the folder for the version you want, find hadoop-x.y.z.tar.gz, right-click and copy the link address.
Download: wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz    #the rest of this guide uses Hadoop 2.7.5

3.2 Install:
Extract: sudo tar -zxvf hadoop-2.7.5.tar.gz
Move: sudo mv hadoop-2.7.5 /usr/local/hadoop
Check: ls -l /usr/local/hadoop/

4. Set environment variables
4.1 Open .bashrc: sudo vim ~/.bashrc
4.2 Edit:

#HADOOP ENV VAR
#set jdk path
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
#set hadoop install path
export HADOOP_HOME=/usr/local/hadoop
#set path
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
#set other env path
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
#set native library paths
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH

4.3 Apply: source ~/.bashrc
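A minimal check that the variables took effect; the exact version banner depends on the Hadoop release you downloaded:

echo $JAVA_HOME      #should print /usr/lib/jvm/java-8-openjdk-amd64
echo $HADOOP_HOME    #should print /usr/local/hadoop
hadoop version       #should print the Hadoop version banner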

5. Edit the configuration files (single-node cluster)
5.1 hadoop-env.sh
Command: sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Edit:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

5.2 core-site.xml
Command: sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml
Edit:

<configuration>
<property>
	<name>fs.default.name</name>
	<value>hdfs://localhost:9000</value>
</property>
</configuration>

5.3 yarn-site.xml
Command: sudo vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
Edit:

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>
<property>
	<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
	<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>

5.4 mapred-site.xml
Commands: 1: sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
2: sudo vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
Edit:

<configuration>
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>
</configuration>

5.5 hdfs-site.xml
Command: sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Edit:

<configuration>
<property>
	<name>dfs.replication</name>
	<value>2</value>
</property>
<property>
	<name>dfs.namenode.name.dir</name>
	<value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
	<name>dfs.datanode.data.dir</name>
	<value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
</configuration>

6. Format HDFS
6.1 Create the directories:
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode
6.2 Change the owner:
sudo chown miugod:miugod -R /usr/local/hadoop
6.3 Format HDFS:
hadoop namenode -format

7. Start Hadoop
Option 1:
start-dfs.sh
start-yarn.sh
Option 2:
start-all.sh
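After starting, jps is the quickest way to see whether the daemons came up; on a working single-node setup you would typically see the processes listed below:

jps
#Typical output: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, Jps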

8. Open the Hadoop ResourceManager web UI
http://localhost:8088

9. NameNode HDFS web UI
http://localhost:50070

II. Multi-node cluster

1. Clone a VM: clone one machine from the single-node setup and name it data1.

2. Set up the network adapters:
If everything runs on one physical host, set the VM's second adapter to Host-Only: this creates an internal network connecting the VMs and the host.
If you want a small LAN Hadoop cluster, set the second adapter to Bridged: machines can then ping each other across hosts.

Hostnames and IP addresses used as a reference (changing the IP address is covered below):

192.168.50.100    master
192.168.50.101    data1
192.168.50.102    data2
192.168.50.103    data3

Note: the IP mappings in /etc/hosts must be correct! If both "127.0.0.1 master" and "192.168.1.201 master" are present, the first entry wins by default.
3. Set up the DataNode:
3.1 Edit the network configuration:
Purpose: assign a static IP address.
Note: since Ubuntu 17, network configuration via /etc/network/interfaces is deprecated in favor of netplan; the config file is
/etc/netplan/01-network-manager-all.yaml
Commands: 1. sudo vim /etc/netplan/01-network-manager-all.yaml
2. sudo netplan apply
(no reboot needed)
Content:

network:
  version: 2
  renderer: NetworkManager
  ethernets:
    enp0s8:
      dhcp4: no
      addresses: [192.168.50.101/24]
      gateway4: 192.168.50.1
      nameservers:
        addresses: [8.8.8.8,8.8.4.4]
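After sudo netplan apply, check that the static address was actually assigned (enp0s8 is the adapter name used in this example and may differ on your VM):

ip addr show enp0s8        #should list 192.168.50.101/24
ping -c 3 192.168.50.100   #check that master is reachable once it is configured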

Reference: https://blog.csdn.net/liuqun69/article/details/88888892

3.2 Edit the hostname:
Purpose: give this machine its own hostname (data1), matching the /etc/hosts mapping below.
Modify: sudo vim /etc/hostname
Apply: sudo hostname $(cat /etc/hostname)
Content:

data1

3.3 Edit hosts:
Purpose: let every machine on the network know the other machines' names and IPs, i.e. map hostnames to IP addresses.
Command: sudo vim /etc/hosts
Content:

192.168.50.100 master
192.168.50.101 data1
192.168.50.102 data2
192.168.50.103 data3
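A short check that the name-to-IP mapping is in effect; run it on each machine after editing /etc/hosts:

getent hosts master data1 data2 data3    #should print the four static addresses, not 127.0.x.x
ping -c 1 data1                          #hostname-based ping should work from master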

3.4 Edit the configuration files:
(hadoop-env.sh needs no further changes; it was already configured in the single-node setup.)
3.4.1 core-site.xml
Command: sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml
Edit:

<configuration>
<property>
	<name>fs.default.name</name>
	<value>hdfs://master:9000</value>
</property>
</configuration>

3.4.2 yarn-site.xml
Command: sudo vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
Edit:

<configuration>
<!-- Site specific YARN configuration properties -->
<property>
	<name>yarn.resourcemanager.resource-tracker.address</name>
	<value>master:8025</value>
</property>
<property>
	<name>yarn.resourcemanager.scheduler.address</name>
	<value>master:8030</value>
</property>
<property>
	<name>yarn.resourcemanager.address</name>
	<value>master:8050</value>
</property>
</configuration>

3.4.3 mapred-site.xml
#If the file does not exist, create it from the template first: sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
Command: sudo vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
Edit:

<configuration>
<property>
	<name>mapred.job.tracker</name>
	<value>master:54311</value>
</property>
</configuration>

3.4.4 hdfs-site.xml
Command: sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Edit:

<configuration>
<property>
	<name>dfs.replication</name>
	<value>3</value>
</property>
<property>
	<name>dfs.datanode.data.dir</name>
	<value>file:/usr/local/hadoop/hadoop_data/hdfs/datanode</value>
</property>
<property>
	<name>dfs.http.address</name>
	<value>master:50070</value>
</property>
</configuration>

4. Set up the NameNode (same procedure as the DataNode above):
4.1 Edit the network configuration:
Purpose: assign a static IP address.
Note: since Ubuntu 17, network configuration via /etc/network/interfaces is deprecated in favor of netplan;
the config file is /etc/netplan/01-network-manager-all.yaml
Commands:
1. sudo vim /etc/netplan/01-network-manager-all.yaml
2. sudo netplan apply
Content:
network:
  version: 2
  renderer: NetworkManager
  ethernets:
    enp0s8:
      dhcp4: no
      addresses: [192.168.50.100/24]
      gateway4: 192.168.50.1
      nameservers:
        addresses: [8.8.8.8,8.8.4.4]

4.2 Edit the hostname:
Purpose: give this machine the hostname that its IP maps to.
Modify: sudo vim /etc/hostname
Apply: sudo hostname $(cat /etc/hostname)
Content:

master

4.3 Edit hosts:
Purpose: let every machine on the network know the other machines' names and IPs, i.e. map hostnames to IP addresses.
Command: sudo vim /etc/hosts
Content:

192.168.50.100    master 
192.168.50.101    data1
192.168.50.102    data2     
192.168.50.103   data3    

4.4 core-site.xml: #reference: https://www.zhihu.com/question/31239901
Note: add hadoop.tmp.dir so the NameNode does not disappear after a reboot (by default its data lives under /tmp, which gets cleared).
Command: sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml
Edit:

<configuration>
<property>
	<name>fs.default.name</name>
	<value>hdfs://master:9000</value>
</property>
<property>
	<name>hadoop.tmp.dir</name>
	<value>/home/miugod/hadoop_tmp</value>
	<description>A base for other temporary directories.</description>
</property>
</configuration>
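Optionally, create the hadoop.tmp.dir path up front so the value above definitely exists and is writable by the Hadoop user (miugod in this guide). Hadoop normally creates it during formatting, so treat this as a precaution rather than a required step:

mkdir -p /home/miugod/hadoop_tmp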

4.5 hdfs-site.xml:
#dfs.http.address lets machines outside the cluster reach the web UI on port 50070
Command: sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Edit:

<configuration>
<property>
	<name>dfs.replication</name>
	<value>3</value>
</property>
<property>
	<name>dfs.namenode.name.dir</name>
	<value>file:/usr/local/hadoop/hadoop_data/hdfs/namenode</value>
</property>
<property>
	<name>dfs.http.address</name>
	<value>master:50070</value>
</property>
</configuration>

4.6 Edit the masters file
Command: sudo vim /usr/local/hadoop/etc/hadoop/masters
Content:

master

4.7 Edit the slaves file
Command: sudo vim /usr/local/hadoop/etc/hadoop/slaves
Content:

data1
data2
data3

5. From the NameNode, SSH into each DataNode and create its HDFS directory; repeat once per DataNode!
5.1 ssh data1    #connect to data1 (same public key, since the VM was cloned)
or ssh user@host    #(different public key, i.e. the machine was not cloned)
5.2 sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs    #delete the old directory
#Note: the old directory must be deleted because the datanode's current/ directory contains a VERSION file
#holding that datanode's clusterID. If it still exists when you reformat, the namenode's clusterID is
#regenerated but the datanode's is not, the two no longer match, and the datanode fails to start.
5.3 mkdir -p /usr/local/hadoop/hadoop_data/hdfs/datanode    #create the new hdfs directory
5.4 sudo chown -R miugod:miugod /usr/local/hadoop    #change the owner
5.5 exit    #log out

*Note: to inspect the VERSION file:
cd /usr/local/hadoop/hadoop_data/hdfs/datanode/current
cat VERSION
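If a datanode still refuses to start after a reformat, comparing the two clusterIDs confirms the mismatch described above. A quick check, assuming the directory layout used in this guide:

grep clusterID /usr/local/hadoop/hadoop_data/hdfs/namenode/current/VERSION    #run on master
ssh data1 grep clusterID /usr/local/hadoop/hadoop_data/hdfs/datanode/current/VERSION
#If the two IDs differ, delete the datanode directory as in 5.2 and repeat the format.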

6. Create and format the NameNode's HDFS directory
6.1 Create:
sudo rm -rf /usr/local/hadoop/hadoop_data/hdfs    #delete the old directory
sudo mkdir -p /usr/local/hadoop/hadoop_data/hdfs/namenode    #create the new hdfs directory
sudo chown -R miugod:miugod /usr/local/hadoop    #change the owner
6.2 Format:
hadoop namenode -format

7. Start the cluster
Command: start-all.sh
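A few quick checks that the multi-node cluster actually came up; a sketch of what to run, assuming the hostnames above:

jps                       #on master, expect NameNode, SecondaryNameNode, ResourceManager
ssh data1 jps             #on each datanode, expect DataNode and NodeManager
hdfs dfsadmin -report     #should report three live datanodes
yarn node -list           #should list the three NodeManagers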

III. Quick version (remote copy)

1. Configure Hadoop fully on one machine.
2. Copy it to the other nodes:
2.1 SSH to the node first, create the directory and change its owner:

ssh data1
sudo mkdir /usr/local/hadoop
sudo chown miugod:miugod /usr/local/hadoop 
exit

2.2 Copy remotely:

scp -r /usr/local/hadoop/ data1:/usr/local

3. Or alternatively (create /home/miugod/hadoop on data1 first):

scp -r /usr/local/hadoop/* data1:/home/miugod/hadoop
sudo mv hadoop /usr/local/    #run this on data1, from /home/miugod
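With several datanodes, the per-node steps above can be wrapped in a small loop. A sketch, assuming passwordless SSH and that sudo does not prompt for a password on the remote nodes (otherwise run the mkdir/chown part interactively as in 2.1):

for node in data1 data2 data3; do
    ssh $node "sudo mkdir -p /usr/local/hadoop && sudo chown miugod:miugod /usr/local/hadoop"
    scp -r /usr/local/hadoop/* $node:/usr/local/hadoop/
done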

Spark:

I. Standalone (single machine)

1. Install Scala
Download page: the Scala download page.
Extract: tar xvf scala-2.12.11.tgz
Move: sudo mv scala-2.12.11 /usr/local/scala
Environment variables:
Open: sudo vim ~/.bashrc
Edit:

#SCALA Variables
export SCALA_HOME=/usr/local/scala
export PATH=$PATH:$SCALA_HOME/bin
#SCALA Variables

Apply: source ~/.bashrc
Launch: scala

2. Install Spark
Download page: the Spark download page.
Extract: tar zxf spark-3.0.0-bin-hadoop2.7.tgz
Move: sudo mv spark-3.0.0-bin-hadoop2.7 /usr/local/spark
Environment variables:
Open: sudo vim ~/.bashrc
Edit:

#SPARK Variables
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/bin
#SPARK Variables

Apply: source ~/.bashrc
Launch: spark-shell
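Beyond just opening the shell, Spark ships a SparkPi example that makes a quick smoke test (run-example lives in $SPARK_HOME/bin, which is already on PATH):

run-example SparkPi 10    #look for a line like "Pi is roughly 3.14..." in the output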

3. Reduce spark-shell log output

cd /usr/local/spark/conf
cp log4j.properties.template log4j.properties
sudo vim log4j.properties

#Change INFO to WARN on the log4j.rootCategory line (second line of the third block)
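If you prefer not to edit the file by hand, the same change can be made with sed; this assumes the template's default line is log4j.rootCategory=INFO, console, which is what recent Spark templates ship with:

sed -i 's/^log4j.rootCategory=INFO, console/log4j.rootCategory=WARN, console/' /usr/local/spark/conf/log4j.properties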

II. Spark standalone

1. Install standalone Spark on master as above. ✓
2. Configure spark-env.sh on master
Copy the template: cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
Edit the config: sudo vim /usr/local/spark/conf/spark-env.sh
Add the following (note: newer Spark releases prefer SPARK_MASTER_HOST over the deprecated SPARK_MASTER_IP):

export SPARK_MASTER_IP=master
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=800m
export SPARK_WORKER_INSTANCES=1

3. Copy Spark to the workers
ssh data1
sudo mkdir /usr/local/spark
sudo chown miugod:miugod /usr/local/spark    #change the owner
exit
scp -r /usr/local/spark/ data1:/usr/local    #remote copy

4. Edit slaves on master
sudo vim /usr/local/spark/conf/slaves
Add (one worker per line):

data1
data2
data3

5. Run spark-shell on the standalone cluster
/usr/local/spark/sbin/start-all.sh    #start the Spark master and workers
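Once the standalone daemons are up, point spark-shell at the master. The URL below assumes the default standalone port 7077; the master web UI at http://master:8080 should list the workers:

spark-shell --master spark://master:7077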

III. Running on YARN

1. Error: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment
Fix:
First set HADOOP_CONF_DIR, which is what the error is asking for: add export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop to ~/.bashrc (or spark-env.sh), then source ~/.bashrc.
Copy the jar packages to HDFS:
hdfs dfs -mkdir -p /hadoop/spark_jars
hdfs dfs -put /usr/local/spark/jars/* /hadoop/spark_jars/
Add the configuration:
cp /usr/local/spark/conf/spark-defaults.conf.template /usr/local/spark/conf/spark-defaults.conf
sudo vim /usr/local/spark/conf/spark-defaults.conf
Add:

spark.yarn.jars hdfs://master:9000/hadoop/spark_jars/*

Copy the file to the datanodes:
scp /usr/local/spark/conf/spark-defaults.conf data1:/usr/local/spark/conf/
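With HADOOP_CONF_DIR set and spark.yarn.jars pointing at HDFS, a job can be launched on YARN like this (the examples jar path uses a glob because the exact file name depends on the Spark build):

spark-shell --master yarn
spark-submit --master yarn --class org.apache.spark.examples.SparkPi /usr/local/spark/examples/jars/spark-examples_*.jar 10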

2. Error: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Fix: configure Hadoop's yarn-site.xml (on every machine in the cluster)
Command: sudo vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add:

<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

Restart Hadoop, then restart Spark.
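The restart I would do, in order: stop Spark's standalone daemons first, then Hadoop, then bring both back up. The unqualified scripts are Hadoop's (on PATH via ~/.bashrc); Spark's live under its own sbin:

/usr/local/spark/sbin/stop-all.sh     #stop Spark standalone daemons, if running
stop-all.sh                           #stop Hadoop (HDFS + YARN)
start-all.sh                          #start Hadoop again
/usr/local/spark/sbin/start-all.sh    #start Spark again (only needed for standalone mode; YARN jobs just need Hadoop)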
References:
https://www.imooc.com/article/29065?block_id=tuijian_wz
https://www.cnblogs.com/devilmaycry812839668/p/6932960.html
