Part 1: Preparation
- Install the virtual machine and Linux (omitted)
- Configure the network address (NAT) (omitted)
[root@hadoopNode1 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
Note: confirm the IP address and network segment.
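For reference, a minimal static-IP sketch of ifcfg-ens33; the addresses are illustrative and must be adapted to your own NAT subnet and gateway:

```shell
# Example /etc/sysconfig/network-scripts/ifcfg-ens33 (illustrative values)
TYPE=Ethernet
BOOTPROTO=static        # static address instead of DHCP
NAME=ens33
DEVICE=ens33
ONBOOT=yes              # bring the interface up at boot
IPADDR=192.168.100.200  # example address inside the NAT segment
NETMASK=255.255.255.0
GATEWAY=192.168.100.2   # typical VMware NAT gateway; check your own setup
DNS1=192.168.100.2
```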
- Modify the hostname configuration (omitted)
[root@hadoopNode1 ~]# vi /etc/hostname
Note: the hostname should consist of English letters; once set, it should not be changed casually.
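On CentOS 7 (assumed here, given the ifcfg-ens33 path above), the hostname can also be set with hostnamectl instead of editing the file by hand; a small sketch:

```shell
# Set and verify the hostname (CentOS 7)
hostnamectl set-hostname hadoopNode1
hostname
```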
- Modify the hosts mapping (omitted)
For example: 192.168.138.100 HadoopNode1
- Disable the firewall (omitted)
Stop it now:
[root@hadoopNode1 ~]# systemctl stop firewalld
Disable it at boot:
[root@hadoopNode1 ~]# systemctl disable firewalld
Reboot the operating system for the changes to take effect.
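To confirm the firewall is really stopped and disabled, a quick check such as the following can be used:

```shell
# "inactive (dead)" and "disabled" indicate the firewall is off and won't start at boot
systemctl status firewalld
# Or simply:
firewall-cmd --state    # prints "not running" once firewalld is stopped
```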
- Create the user ambow and set its password to ambow (omitted)
[root@hadoopNode1 ~]# useradd ambow
[root@hadoopNode1 ~]# passwd ambow
- Give the ambow user root privileges via sudo
As the root user, edit the /etc/sudoers file, find the line below, and add a matching line for ambow under the root entry, as shown:
[root@hadoopNode1 ~]# vi /etc/sudoers
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
ambow ALL=(ALL) ALL
Once the change is made, you can log in with the ambow account and run commands that require root privileges through sudo.
- Install the JDK
Extract the tar package:
[ambow@hadoopNode1 ~]$ pwd
/home/ambow
[ambow@hadoopNode1 ~]$ mkdir soft
[ambow@hadoopNode1 ~]$ mkdir app
[ambow@hadoopNode1 ~]$ ls
app  soft
[ambow@hadoopNode1 ~]$ tree .
.
├── app
└── soft
    ├── hadoop-2.7.3.tar.gz
    ├── jdk-8u121-linux-x64.tar.gz
    └── zookeeper-3.4.6.tar.gz

2 directories, 3 files
[ambow@hadoopNode1 ~]$ pwd
/home/ambow
[ambow@hadoopNode1 ~]$ tar -zxvf ./soft/jdk-8u121-linux-x64.tar.gz -C ./app/
Configure the JDK environment variables:
[ambow@hadoopNode1 jdk1.8.0_121]$ vi ~/.bash_profile
[ambow@hadoopNode1 jdk1.8.0_121]$ cat ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
JAVA_HOME=/home/ambow/app/jdk1.8.0_121
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin
export PATH
export JAVA_HOME
[ambow@hadoopNode1 jdk1.8.0_121]$
[ambow@hadoopNode1 jdk1.8.0_121]$ source ~/.bash_profile
Running source ~/.bash_profile makes the updated profile take effect in the current shell.
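A quick way to verify the JDK is picked up from the new PATH (the exact build string may differ):

```shell
[ambow@hadoopNode1 ~]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
[ambow@hadoopNode1 ~]$ echo $JAVA_HOME
/home/ambow/app/jdk1.8.0_121
```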
- Reboot the operating system
reboot
Hadoop can be installed in three modes:
1. Local mode: used for development and debugging.
2. Pseudo-distributed mode: simulates a small cluster; a single host runs the NameNode, DataNode, ResourceManager and NodeManager.
3. Cluster mode (production environment): multiple hosts, acting as NameNode, DataNodes, and so on.
Hadoop local mode installation:
- Extract the Hadoop archive
[ambow@hadoopNode1 sbin]$ tar -zxvf ~/soft/hadoop-2.7.3.tar.gz -C ~/app/
- Configure the Hadoop environment variables
```shell
[ambow@hadoopNode1 hadoop-2.7.3]$ vi ~/.bash_profile
[ambow@hadoopNode1 hadoop-2.7.3]$ cat ~/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
JAVA_HOME=/home/ambow/app/jdk1.8.0_121
HADOOP_HOME=/home/ambow/app/hadoop-2.7.3
PATH=$PATH:$HOME/.local/bin:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export PATH
export JAVA_HOME
export HADOOP_HOME
```
- Make the environment variables take effect
[ambow@hadoopNode1 hadoop-2.7.3]$ source ~/.bash_profile
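A quick sanity check that the Hadoop binaries are now on the PATH (the trailing version details are omitted here):

```shell
[ambow@hadoopNode1 ~]$ hadoop version
Hadoop 2.7.3
...
[ambow@hadoopNode1 ~]$ echo $HADOOP_HOME
/home/ambow/app/hadoop-2.7.3
```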
- Test
Create a test data file: ~/data/mydata.txt
Test command syntax:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar <class-name> <input> <output-dir>
[ambow@hadoopNode1 mydata.out]$ hadoop jar ~/app/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount ~/data/mydata.txt ~/data/mydata.out2
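For reference, a sketch of preparing the input file and inspecting the result; in local mode the output stays on the local filesystem, and the part-r-00000 file name follows the usual MapReduce convention:

```shell
# Create a small test file (contents are arbitrary)
[ambow@hadoopNode1 ~]$ mkdir -p ~/data
[ambow@hadoopNode1 ~]$ echo "hello hadoop hello world" > ~/data/mydata.txt
# After the job finishes, inspect the word counts
[ambow@hadoopNode1 ~]$ cat ~/data/mydata.out2/part-r-00000
hadoop  1
hello   2
world   1
```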
Pseudo-distributed mode configuration:
- Install the JDK
- Install Hadoop
- Configure Hadoop's $HADOOP_HOME/etc/hadoop/core-site.xml (properties: fs.defaultFS, hadoop.tmp.dir)
[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<!-- Default filesystem. Default NameNode RPC port: 9820 in Hadoop 3.x, 8020 in Hadoop 2.x, 9000 in Hadoop 1.x; pseudo-distributed mode usually uses localhost:8020 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:8020</value>
</property>
<!-- Directory where Hadoop stores its runtime files; created automatically; keeping the default location is not recommended -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ambow/hdfs/data</value>
</property>
</configuration>
- Configure hdfs-site.xml
dfs.replication sets the number of block replicas; in pseudo-distributed mode it must be 1 (the default is 3).
[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<!-- Number of replicas for each block; the default is 3; use 1 on a single node. Setting it too high actually hurts performance rather than helping -->
<name>dfs.replication</name>
<!-- Pseudo-distributed mode can only keep 1 replica -->
<value>1</value>
</property>
</configuration>
- Format the NameNode
[ambow@hadoopNode1 ~]$ hadoop namenode -format
Formatting is normally done only once. If you must format again, delete the data on every DataNode first; otherwise the DataNode and NameNode cluster IDs will no longer match and the daemons will fail to start.
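If a second format really is required, a cautious sketch of clearing the old data first (the path below is the hadoop.tmp.dir configured above; double-check it before deleting anything):

```shell
# Stop the HDFS daemons, remove the old data, then re-format
hadoop-daemon.sh stop datanode
hadoop-daemon.sh stop namenode
rm -rf /home/ambow/hdfs/data/*   # hadoop.tmp.dir from core-site.xml
hadoop namenode -format
```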
6. Start and stop the daemons
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop datanode
7. Check the running processes
jps
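With the NameNode and DataNode started, jps should list something like the following (the PIDs will differ):

```shell
[ambow@hadoopNode1 ~]$ jps
3348 NameNode
3461 DataNode
3680 Jps
```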
- Log files
$HADOOP_HOME/logs (i.e. ~/app/hadoop-2.7.3/logs)
9. View the web UI:
http://192.168.100.100:50070/
- To run MapReduce on YARN, two more configuration files must be set up
Configure mapred-site.xml
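In Hadoop 2.x the etc/hadoop directory ships only mapred-site.xml.template, so the file is usually created from the template before editing; a small sketch:

```shell
[ambow@hadoopNode1 hadoop]$ cd $HADOOP_HOME/etc/hadoop
[ambow@hadoopNode1 hadoop]$ cp mapred-site.xml.template mapred-site.xml
```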
[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<!-- Run MapReduce on the YARN resource-management framework -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- Configure yarn-site.xml (properties: yarn.resourcemanager.hostname, yarn.nodemanager.aux-services)
<configuration>
<property>
<!-- Hostname of the node that runs the ResourceManager -->
<name>yarn.resourcemanager.hostname</name>
<value>hadoopNode1</value>
</property>
<property>
<!-- Use the mapreduce_shuffle auxiliary service -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
- Start YARN
[ambow@hadoopNode1 data]$ yarn-daemon.sh start resourcemanager
[ambow@hadoopNode1 data]$ yarn-daemon.sh start nodemanager
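jps should now also show the two YARN daemons alongside HDFS (PIDs will differ):

```shell
[ambow@hadoopNode1 data]$ jps
3348 NameNode
3461 DataNode
4102 ResourceManager
4388 NodeManager
4551 Jps
```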
- Test a MapReduce job
Upload the local file ~/data/mydata.txt into the /user/ambow directory on HDFS:
[ambow@hadoopNode1 data]$ hadoop dfs -put ~/data/mydata.txt /user/ambow
Run wordcount on the HDFS file through YARN:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /user/ambow/mydata.txt /user/ambow/output/wc/
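To inspect the result stored on HDFS afterwards, something like the following works:

```shell
# List the output directory and print the word counts
[ambow@hadoopNode1 data]$ hdfs dfs -ls /user/ambow/output/wc/
[ambow@hadoopNode1 data]$ hdfs dfs -cat /user/ambow/output/wc/part-r-00000
```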
Distributed cluster installation
1. Modify /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.200 hadoopNode1
192.168.100.201 hadoopNode2
192.168.100.202 hadoopNode3
192.168.100.203 hadoopNode4
2. Clone two virtual machines from the pseudo-distributed node
3. On each cloned node, configure the IP address, hostname, and hosts mapping:
[root@hadoopNode2 ~]# vi /etc/hostname
[root@hadoopNode2 ~]# vi /etc/sysconfig/network-scripts/ifcfg-ens33
[root@hadoopNode2 ~]# vi /etc/hosts
4. Verify the configuration
[root@hadoopNode2 ~]# ping hadoopNode1
PING hadoopNode1 (192.168.100.200) 56(84) bytes of data.
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=1 ttl=64 time=0.190 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=2 ttl=64 time=0.230 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=3 ttl=64 time=0.263 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=4 ttl=64 time=0.227 ms
^C64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=5 ttl=64 time=0.195 ms
64 bytes from hadoopNode1 (192.168.100.200): icmp_seq=6 ttl=64 time=0.268 ms
^C
--- hadoopNode1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.190/0.228/0.268/0.035 ms
[root@hadoopNode2 ~]# ping hadoopNode2
PING hadoopNode2 (192.168.100.201) 56(84) bytes of data.
64 bytes from hadoopNode2 (192.168.100.201): icmp_seq=1 ttl=64 time=0.011 ms
64 bytes from hadoopNode2 (192.168.100.201): icmp_seq=2 ttl=64 time=0.022 ms
^C
--- hadoopNode2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.011/0.016/0.022/0.006 ms
[root@hadoopNode2 ~]# ping hadoopNode3
PING hadoopNode3 (192.168.100.202) 56(84) bytes of data.
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=1 ttl=64 time=0.246 ms
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=2 ttl=64 time=0.218 ms
64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=3 ttl=64 time=0.218 ms
^C64 bytes from hadoopNode3 (192.168.100.202): icmp_seq=4 ttl=64 time=0.227 ms
^C
--- hadoopNode3 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 0.218/0.227/0.246/0.015 ms
Master-slave architecture
- Set up passwordless SSH login on the master node
1) Generate the master node's public/private key pair
ssh-keygen -t rsa
2) Distribute the public key to every node
ssh-copy-id localhost
ssh-copy-id hadoopNode1
ssh-copy-id hadoopNode2
ssh-copy-id hadoopNode3
3) Verify: from the master node, log in to each node and check that no password is required
ssh hadoopNode2
ssh hadoopNode3
6. Configure the core file core-site.xml
[ambow@hadoopNode1 hadoop]$ vi core-site.xml
<configuration>
<!-- Default filesystem. Default NameNode RPC port: 9820 in Hadoop 3.x, 8020 in Hadoop 2.x, 9000 in Hadoop 1.x; here the cluster NameNode host hadoopNode1 is used on port 8020 -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoopNode1:8020</value>
</property>
<!-- Directory where Hadoop stores its runtime files; created automatically; keeping the default location is not recommended -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/ambow/hdfs/data</value>
</property>
</configuration>
- hdfs-site.xml
<configuration>
<property>
<!-- Number of replicas for each block; the default is 3; use 1 on a single node. Setting it too high actually hurts performance rather than helping -->
<name>dfs.replication</name>
<!-- 3 replicas for the cluster (pseudo-distributed mode would use only 1) -->
<value>3</value>
</property>
<property>
<!-- Secondary NameNode (2NN) host -->
<name>dfs.namenode.secondary.http-address</name>
<value>hadoopNode2:50090</value>
</property>
<property>
<!-- Checkpoint directory -->
<name>dfs.namenode.checkpoint.dir</name>
<value>/home/ambow/hdfs/namesecondary</value>
</property>
</configuration>
8. mapred-site.xml
<configuration>
<property>
<!-- Run MapReduce on the YARN resource-management framework -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
- yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<!-- Hostname of the node that runs the ResourceManager -->
<name>yarn.resourcemanager.hostname</name>
<value>hadoopNode1</value>
</property>
<property>
<!-- Use the mapreduce_shuffle auxiliary service -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
10. Edit the slaves file to specify which nodes in the cluster are DataNodes; add each node's hostname to the file
[ambow@hadoopNode1 hadoop]$ vi $HADOOP_HOME/etc/hadoop/slaves
hadoopNode1
hadoopNode2
hadoopNode3
11. Distribute the configuration to the other nodes
Note: stop all services before distributing.
Network copy syntax: scp -r <source-dir> <user>@<host>:<target-path>
-r copies recursively
[ambow@hadoopNode1 hadoop]$ scp -r $HADOOP_HOME/etc/hadoop ambow@hadoopNode2:$HADOOP_HOME/etc/
[ambow@hadoopNode1 hadoop]$ scp -r $HADOOP_HOME/etc/hadoop ambow@hadoopNode3:$HADOOP_HOME/etc/
Reminder: keep all services stopped until the distribution is finished.
After distributing, format the NameNode.
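A sketch of the re-format step; as noted earlier, any old data under hadoop.tmp.dir should be cleared on every node first, and the format itself runs only on the NameNode host:

```shell
# On every node: clear old HDFS data (path is the hadoop.tmp.dir configured above)
rm -rf /home/ambow/hdfs/data/*
# On hadoopNode1 only, with all daemons stopped: format the NameNode
hadoop namenode -format
```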
12. Test
start-all.sh (start everything)
stop-all.sh (stop everything)
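After start-all.sh, running jps on each node should roughly match the roles configured above (a sketch; PIDs omitted):

```shell
# hadoopNode1: NameNode, DataNode, ResourceManager, NodeManager
# hadoopNode2: SecondaryNameNode, DataNode, NodeManager
# hadoopNode3: DataNode, NodeManager
[ambow@hadoopNode1 ~]$ jps
[ambow@hadoopNode2 ~]$ jps
[ambow@hadoopNode3 ~]$ jps
```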