Apache Hadoop deployment notes: single-node setup
Environment: JDK 1.8.0_181 + Hadoop 3.2.2 + CentOS 7.0
1. Official site: http://hadoop.apache.org/
Documentation: https://hadoop.apache.org/docs/r3.2.2/hadoop-project-dist/hadoop-common/SingleCluster.html
2. Hadoop core components:
1. HDFS: Hadoop Distributed File System
2. YARN: Yet Another Resource Negotiator, the resource management and scheduling system
3. MapReduce: the distributed computation framework
3. JDK installation: download from https://www.oracle.com/java/technologies/javase-downloads.html
Download jdk-8u181-linux-x64.tar.gz (matching the JAVA_HOME path below) and extract it to /soft.
To configure the JDK, open the profile:
vim /etc/profile
Press i to enter insert mode and append at the end:
export JAVA_HOME=/soft/jdk1.8.0_181
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Press Esc, then type :wq to save and quit.
Apply the environment variables:
source /etc/profile
Check the Java version:
java -version
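The `java -version` output can also be checked programmatically, so setup scripts fail fast if the wrong JDK is on the PATH. A minimal sketch; `jdk_major` is a hypothetical helper, not part of the JDK:

```shell
#!/usr/bin/env bash
# jdk_major: extract the JDK major version from a `java -version` line
# (hypothetical helper). For "1.x" releases the major version is the
# second dotted field; for 9+ it is the first.
jdk_major() {
  local ver
  ver=$(echo "$1" | sed -n 's/.*"\([0-9][0-9.]*\).*/\1/p')
  case "$ver" in
    1.*) echo "$ver" | cut -d. -f2 ;;
    *)   echo "$ver" | cut -d. -f1 ;;
  esac
}

jdk_major 'java version "1.8.0_181"'   # prints 8
jdk_major 'openjdk version "11.0.2"'   # prints 11
```

A check like `[ "$(jdk_major "$(java -version 2>&1 | head -n1)")" = 8 ]` could then guard the rest of the setup.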
4. Download Hadoop 3.2.2
Download URL: https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
mkdir /soft
Upload hadoop-3.2.2.tar.gz to /soft,
or download it from the shell: wget --content-disposition https://mirrors.bfsu.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
Extract it into /soft and configure the environment variables:
tar -zxvf hadoop-3.2.2.tar.gz
Open the profile:
vi /etc/profile
Set the Hadoop location:
export HADOOP_HOME=/soft/hadoop-3.2.2
export PATH=${PATH}:${HADOOP_HOME}/bin
Reload the profile and check the Hadoop version:
source /etc/profile
hadoop version
5. Edit the configuration files
1. In /soft/hadoop-3.2.2/etc/hadoop, edit hadoop-env.sh:
export JAVA_HOME=/soft/jdk1.8.0_181
2. In /soft/hadoop-3.2.2/etc/hadoop, edit core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.189.19:9002</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/root/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>
3. In /soft/hadoop-3.2.2/etc/hadoop, edit hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Allow non-root users to write files to HDFS -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value> <!-- disables HDFS permission checks (not the firewall); avoid in production -->
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>192.168.189.19:9870</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/root/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/root/hadoop/dfs/data</value>
<description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
</configuration>
4. Edit start-dfs.sh and add the following at the top (required when running the daemons as root):
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
5. Edit stop-dfs.sh and add the same variables:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
6. Edit start-yarn.sh and add at the top:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
7. Edit stop-yarn.sh and add the same variables:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
8. Create the data directories (use -p so parent directories are created as needed):
mkdir -p /root/hadoop/tmp
mkdir -p /root/hadoop/var
mkdir -p /root/hadoop/dfs/name
mkdir -p /root/hadoop/dfs/data
9. Edit mapred-site.xml (mapred.job.tracker and mapred.local.dir are legacy Hadoop 1 property names; mapreduce.framework.name=yarn is what actually routes jobs to YARN):
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>192.168.189.19:9001</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/root/hadoop/var</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
10. Edit yarn-site.xml:
<configuration>
<!-- Hostname of the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.189.19</value>
</property>
<!-- Address the RM exposes to clients; clients submit applications to the RM through it -->
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<!-- Address the RM exposes to ApplicationMasters, used to request and release resources -->
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<!-- HTTP address of the RM web UI, for viewing cluster information in a browser -->
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<!-- HTTPS address of the RM web UI -->
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<!-- Address the RM exposes to NodeManagers, used for heartbeats and task assignment -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<!-- Address through which administrators send management commands to the RM -->
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<!-- Auxiliary service run on the NodeManager; must be set to mapreduce_shuffle for MapReduce jobs to run -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Maximum memory allocation per container request, in MB -->
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1024</value>
<description>Maximum memory a single container may be allocated, in MB (default 8192)</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>1024</value>
</property>
<!-- yarn.nodemanager.vmem-check-enabled=false disables the virtual-memory check. On a virtual machine
this setting is very useful and avoids failures later; on a physical machine with plenty of memory it can be removed -->
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
</configuration>
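The three memory settings above interact: the node offers yarn.nodemanager.resource.memory-mb in total, a single container may request at most yarn.scheduler.maximum-allocation-mb, and YARN kills any container whose virtual memory exceeds its physical allocation times yarn.nodemanager.vmem-pmem-ratio. A quick sketch of that last limit, for a container allocated the full 1024 MB configured above:

```shell
#!/usr/bin/env bash
# Virtual-memory cap YARN enforces per container =
# physical allocation * yarn.nodemanager.vmem-pmem-ratio.
PMEM_MB=1024   # the container's physical allocation (values from the config above)
RATIO=2.1      # yarn.nodemanager.vmem-pmem-ratio above
VMEM_MB=$(awk -v p="$PMEM_MB" -v r="$RATIO" 'BEGIN { printf "%d", p * r }')
echo "vmem limit: ${VMEM_MB} MB"   # vmem limit: 2150 MB
```

On a small VM, a JVM's address-space reservations can easily exceed 2150 MB of virtual memory even with modest heap use, which is why disabling the vmem check (the last property above) is common in single-node setups.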
11. Edit /etc/selinux/config (vi /etc/selinux/config) and change SELINUX=enforcing to SELINUX=disabled.
6. Passwordless SSH login
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost
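It is worth verifying that passwordless login actually works before going further, because start-dfs.sh launches every daemon over ssh. A small sketch; `check_cmd` is a hypothetical helper, and BatchMode makes ssh fail instead of prompting for a password:

```shell
#!/usr/bin/env bash
# check_cmd: run a command quietly and report OK/FAIL (hypothetical helper).
check_cmd() {
  if "$@" >/dev/null 2>&1; then echo OK; else echo FAIL; fi
}

# With BatchMode=yes, ssh exits non-zero rather than asking for a password,
# so FAIL here means key-based login is not set up correctly.
check_cmd ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true
```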
7. Edit /etc/hosts
echo "192.168.189.19 server1" >> /etc/hosts
Restart the network: service network restart
8. Running HDFS
1. Format the filesystem:
cd /soft/hadoop-3.2.2/bin
./hdfs namenode -format
2. Start the NameNode and DataNode daemons:
cd ../sbin
./start-dfs.sh
To stop: ./stop-dfs.sh
3. Check the daemons with jps; the following four entries should appear:
5108 SecondaryNameNode
4809 NameNode
5306 Jps
4927 DataNode
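The jps check can be scripted. A sketch, where `check_daemons` is a hypothetical helper that greps a jps listing for each required daemon name:

```shell
#!/usr/bin/env bash
# check_daemons: report any daemon name missing from a jps listing.
# Hypothetical helper: $1 is the jps output, remaining args are required
# daemon names. Returns non-zero if anything is missing.
check_daemons() {
  local jps_out=$1; shift
  local missing=0
  local d
  for d in "$@"; do
    echo "$jps_out" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
  done
  return $missing
}

check_daemons "$(jps 2>/dev/null)" NameNode DataNode SecondaryNameNode \
  && echo "HDFS daemons up" || echo "some HDFS daemons are missing"
```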
4. Browse the NameNode web UI:
http://192.168.189.19:9870/
5. Start the ResourceManager and NodeManager daemons:
./start-yarn.sh
To stop them: ./stop-yarn.sh
6. Browse the ResourceManager and NodeManager web UIs:
http://192.168.189.19:8088/cluster
http://192.168.189.19:8042/node
7. Check the daemons with jps again:
5108 SecondaryNameNode
4809 NameNode
6026 Jps
4927 DataNode
Two new entries appear in addition to the previous ones:
5551 ResourceManager
5671 NodeManager
9. Open the firewall ports (or stop the firewall entirely: systemctl stop firewalld; check its status: systemctl status firewalld):
firewall-cmd --zone=public --add-port=9870/tcp --permanent
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --zone=public --add-port=8032/tcp --permanent
firewall-cmd --zone=public --add-port=9002/tcp --permanent
firewall-cmd --reload
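The repeated firewall-cmd calls can be generated from a single port list. A sketch; `gen_firewall_cmds` is a hypothetical helper, and the ports are the ones this note's config uses (9002 fs.defaultFS, 9870 NameNode UI, 8088 RM UI, 8032 RM client port). Review the generated commands, then pipe them to sh to apply:

```shell
#!/usr/bin/env bash
# gen_firewall_cmds: print the firewall-cmd invocations for each port,
# followed by a reload (hypothetical helper; it only generates commands,
# run them with `gen_firewall_cmds ... | sh`).
gen_firewall_cmds() {
  local port
  for port in "$@"; do
    echo "firewall-cmd --zone=public --add-port=${port}/tcp --permanent"
  done
  echo "firewall-cmd --reload"
}

gen_firewall_cmds 9002 9870 8088 8032
```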
10. Fixing the fs.defaultFS port conflict in core-site.xml
The original port 9000 was already in use. There are two fixes. First: find the process occupying the port and kill it. Second, if that process genuinely needs port 9000: edit core-site.xml and change 9000 to another port such as 9001 (the config above uses 9002).
I used the first method: find the process occupying port 9000, then kill it.
netstat -anp | grep 9000
List all listening ports: netstat -tpnl
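Finding the process on a busy port can be scripted around the netstat output. A sketch; `pid_from_netstat` is a hypothetical helper, and the parsing assumes the pid/program column that netstat prints with -p:

```shell
#!/usr/bin/env bash
# pid_from_netstat: extract the listening PID for a TCP port from
# `netstat -anp` output (hypothetical helper; $1 = netstat output,
# $2 = port). Field 4 is the local address, field 7 is "pid/program".
pid_from_netstat() {
  echo "$1" | awk -v p=":$2" \
    '$4 ~ p"$" && $6 == "LISTEN" { split($7, a, "/"); print a[1]; exit }'
}

pid=$(pid_from_netstat "$(netstat -anp 2>/dev/null)" 9000)
if [ -n "$pid" ]; then
  echo "port 9000 is held by PID $pid; run: kill $pid"
fi
```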