Preparations
1. Create three virtual machines.
2. Configure the hostname and IP address on each server (all three machines).
3. Edit /etc/hosts on every host and add the IP-to-hostname mappings (all three machines).
4. Set up passwordless SSH login from the management node to the slave nodes.
5. Install and configure JDK 1.8 (all three machines).
6. Disable the firewall (all three machines, permanently).
7. Disable SELinux (all three machines):
vi /etc/selinux/config
SELINUX=enforcing --> SELINUX=disabled
Then reboot the system.
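The SELinux change in step 7 can also be done non-interactively with sed. A minimal sketch, demonstrated on a temporary copy of the file so it can run anywhere; on a real node, point the sed at /etc/selinux/config itself:

```shell
# Demonstrated on a temp file standing in for /etc/selinux/config.
cfg=$(mktemp)
printf 'SELINUX=enforcing\n' > "$cfg"
# Flip enforcing -> disabled in place (step 7 of the preparations).
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' "$cfg"
grep '^SELINUX=' "$cfg"
```

On the actual machines, run the sed against /etc/selinux/config on all three nodes, then reboot for the change to take effect.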
Installation workflow
1. Upload the software package to the management node, then extract and configure it there.
2. Remote-copy the configured package to all slave nodes.
3. Start the software.
Procedure
- Upload and extract the software package
a) tar -zxvf hadoop-2.6.0-cdh5.14.0-with-centos6.9.tar.gz -C /export/servers/
- Check Hadoop's supported compression codecs and native libraries
a) cd /export/servers/hadoop-2.6.0-cdh5.14.0
bin/hadoop checknative
If openssl is reported as false, install openssl online on all machines:
b) Install openssl:
yum -y install openssl-devel
Then re-run the check: bin/hadoop checknative
- Edit the configuration files
① Edit core-site.xml:
cd /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
vim core-site.xml
Add the following to the file:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas</value>
  </property>
  <!-- Buffer size; in production, tune this to the server's capacity -->
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <!-- Enable the HDFS trash mechanism so deleted data can be recovered; unit: minutes -->
  <property>
    <name>fs.trash.interval</name>
    <value>10080</value>
  </property>
</configuration>
② Edit hdfs-site.xml:
cd /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
vim hdfs-site.xml
Add the following to the file:
<configuration>
  <!-- Path where the NameNode stores its metadata. In production, decide the disk
       mount points first, then separate multiple directories with commas -->
  <!-- Dynamic decommissioning/recommissioning of cluster nodes
  <property>
    <name>dfs.hosts</name>
    <value>/export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop/accept_host</value>
  </property>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop/deny_host</value>
  </property>
  -->
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node01:50090</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/namenodeDatas</value>
  </property>
  <!-- Where DataNodes store their data. In production, decide the disk mount
       points first, then separate multiple directories with commas -->
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/datanodeDatas</value>
  </property>
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>file:///export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/nn/edits</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/snn/name</value>
  </property>
  <property>
    <name>dfs.namenode.checkpoint.edits.dir</name>
    <value>file:///export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/nn/snn/edits</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
  </property>
</configuration>
③ hadoop-env.sh: no changes needed.
④ Edit mapred-site.xml:
cd /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
i. cp mapred-site.xml.template mapred-site.xml
ii. vim mapred-site.xml and add the following to the file:
<configuration>
  <!-- Execution framework -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- JVM reuse (uber task mode) -->
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node01:19888</value>
  </property>
</configuration>
⑤ Edit yarn-site.xml:
cd /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
vim yarn-site.xml
Add the following to the file:
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node01</value>
  </property>
  <!-- Auxiliary service on the NodeManager; MapReduce only runs when this is
       set to mapreduce_shuffle -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
⑥ Edit slaves:
cd /export/servers/hadoop-2.6.0-cdh5.14.0/etc/hadoop
vim slaves
Add the following to the file:
node01
node02
node03
- Create the data directories
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/tempDatas
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/namenodeDatas
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/datanodeDatas
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/nn/edits
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/snn/name
mkdir -p /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas/dfs/nn/snn/edits
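The six mkdir commands can be collapsed into a loop. A sketch; for portability it runs under a temporary base directory, whereas on the real nodes base would be /export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas:

```shell
# Demo base; on the cluster: base=/export/servers/hadoop-2.6.0-cdh5.14.0/hadoopDatas
base=$(mktemp -d)
# Same six directories as the mkdir commands above.
for d in tempDatas namenodeDatas datanodeDatas dfs/nn/edits dfs/snn/name dfs/nn/snn/edits; do
  mkdir -p "$base/$d"
done
ls "$base"
```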
- Distribute the package
Run from /export/servers on the management node:
scp -r hadoop-2.6.0-cdh5.14.0/ node02:$PWD
scp -r hadoop-2.6.0-cdh5.14.0/ node03:$PWD
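The two scp commands generalize to a loop over the slave hostnames, which scales if more slaves are added later. A sketch shown as a dry run (the echo prints each command instead of executing it; drop the echo and run from /export/servers to actually copy):

```shell
targets=""
for host in node02 node03; do
  # Dry run: prints the command; remove `echo` to perform the real copy.
  echo scp -r hadoop-2.6.0-cdh5.14.0/ "$host:$PWD"
  targets="$targets$host "
done
```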
- Configure the Hadoop environment variables
a) Create /etc/profile.d/hadoop.sh with the following content:
export HADOOP_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
export PATH=$PATH:$HADOOP_HOME/bin
b) Reload the profile: source /etc/profile
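The effect of /etc/profile.d/hadoop.sh can be checked in the current shell. A sketch with the same two exports; adding $HADOOP_HOME/sbin to PATH as well is a common optional variation (not part of this guide) that avoids changing into the sbin directory for the start scripts later:

```shell
# Same exports as /etc/profile.d/hadoop.sh above.
export HADOOP_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
export PATH=$PATH:$HADOOP_HOME/bin
# After `source /etc/profile` on a real node, `hadoop` resolves from this PATH entry.
echo "$PATH" | grep -q "$HADOOP_HOME/bin" && echo "hadoop bin on PATH"
```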
- Format the cluster
Run on the first node only (formatting initializes the NameNode metadata, so do this once, before the first start):
hadoop namenode -format
- Start the cluster
cd /export/servers/hadoop-2.6.0-cdh5.14.0/sbin
Option 1: start each daemon individually:
1. Start the NameNode on node01: ./hadoop-daemon.sh start namenode
2. Start the DataNodes on node01, node02, node03: ./hadoop-daemon.sh start datanode
3. Start the ResourceManager on node01: ./yarn-daemon.sh start resourcemanager
4. Start the NodeManagers on node01, node02, node03: ./yarn-daemon.sh start nodemanager
Option 2: start each subsystem separately:
1. HDFS: ./start-dfs.sh (stop with ./stop-dfs.sh)
2. YARN: ./start-yarn.sh (stop with ./stop-yarn.sh)
Option 3: start everything at once:
Start: ./start-all.sh
Stop: ./stop-all.sh
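After starting, it is worth confirming which daemons run on which node (for example by running jps, which ships with the JDK, on each host). This sketch only prints the layout implied by the start commands above; note that start-dfs.sh / start-all.sh additionally launch a SecondaryNameNode on node01, per dfs.namenode.secondary.http-address:

```shell
# Expected daemon layout per node, derived from the start commands above.
layout=$(for host in node01 node02 node03; do
  case $host in
    node01) echo "$host: NameNode DataNode ResourceManager NodeManager" ;;
    *)      echo "$host: DataNode NodeManager" ;;
  esac
done)
echo "$layout"
```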
- View the web UIs in a browser
HDFS (NameNode): http://<master-node-ip>:50070
YARN (ResourceManager): http://<master-node-ip>:8088