Create the virtual machine cluster
Reference: https://blog.csdn.net/weixin_44371237/article/details/123974335
Differences between Hadoop 1.x, 2.x, and 3.x
Configure the Hadoop environment
Go to /etc/profile.d and create my_env.sh, adjusting the paths below to your own installation:
export JAVA_HOME=/usr/java/jdk1.8.0_311-amd64
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
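After saving my_env.sh, re-login (or run `source /etc/profile`) so the variables take effect. A minimal sanity-check sketch, using the install paths assumed above (substitute your own):

```shell
# Simulate the exports from my_env.sh and confirm the Hadoop dirs land on PATH.
# The install locations below are the ones assumed in this guide.
export JAVA_HOME=/usr/java/jdk1.8.0_311-amd64
export HADOOP_HOME=/opt/module/hadoop-3.1.3
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
case ":$PATH:" in
  *":$HADOOP_HOME/sbin:"*) echo "PATH ok" ;;
  *)                       echo "PATH missing Hadoop dirs" ;;
esac
```

On a machine where Hadoop is actually installed, `hadoop version` and `java -version` are the definitive checks.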
Configuration files
The user-facing configuration files all live in /opt/module/hadoop-3.1.3/etc/hadoop.
The full default configurations ship inside the install package, e.g. under hadoop-3.1.3\share\doc\hadoop\hadoop-mapreduce-client\hadoop-mapreduce-client-core
core-site.xml
<configuration>
    <!-- NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop100:8020</value>
    </property>
    <!-- Hadoop data storage directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-3.1.3/data</value>
    </property>
</configuration>
hdfs-site.xml
<configuration>
    <!-- NameNode web UI address -->
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop100:9870</value>
    </property>
    <!-- SecondaryNameNode web UI address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop102:9868</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <!-- Use the shuffle auxiliary service for MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop101</value>
    </property>
    <!-- Environment variables inherited by containers -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Log server address (the JobHistory server's log page) -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop100:19888/jobhistory/logs</value>
    </property>
    <!-- Retain aggregated logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
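The retention value is in seconds; 604800 is exactly seven days:

```shell
# 7 days expressed in seconds, matching yarn.log-aggregation.retain-seconds above.
echo $((7 * 24 * 60 * 60))   # prints 604800
```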
mapred-site.xml
<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop100:10020</value>
    </property>
    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop100:19888</value>
    </property>
</configuration>
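A missing closing tag in any of these files makes the daemons fail at startup with a parse error, so a quick well-formedness check before distributing them can save a restart cycle. A minimal sketch, assuming python3 is available (the sample file is written to /tmp purely for illustration; in practice point the parse at the real files under etc/hadoop):

```shell
# Write a sample core-site.xml and confirm it parses as XML before syncing it out.
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop100:8020</value>
  </property>
</configuration>
EOF
python3 -c "import xml.dom.minidom as m; m.parse('/tmp/core-site.xml')" \
  && echo "core-site.xml OK"
```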
Configure the startup scripts
Go to /opt/module/hadoop-3.1.3/sbin and add the following near the top of each file. These run the daemons as root; note that Hadoop 3.x may warn that HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER.
start-dfs.sh and stop-dfs.sh
#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
start-yarn.sh and stop-yarn.sh
#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
Configure workers
In /opt/module/hadoop-3.1.3/etc/hadoop, edit workers and add the following (the file should contain only hostnames, with no blank lines or trailing spaces):
hadoop100
hadoop101
hadoop102
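The same workers file can later drive a config-distribution loop. A sketch of the iteration only; the echo stands in for an actual rsync/scp, which needs the passwordless SSH set up in the link at the end of this document. /tmp/workers stands in for $HADOOP_HOME/etc/hadoop/workers so the sketch is self-contained:

```shell
# Iterate over the hosts listed in the workers file.
printf 'hadoop100\nhadoop101\nhadoop102\n' > /tmp/workers
while read -r host; do
  # real version: rsync -av $HADOOP_HOME/etc/hadoop/ "$host":$HADOOP_HOME/etc/hadoop/
  echo "would sync configs to $host"
done < /tmp/workers
```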
Configure the Windows hosts file
On the Windows host, edit the hosts file in C:\Windows\System32\drivers\etc
Note: map the hostnames to the host-only adapter IPs, not the NAT adapter IPs; otherwise the host machine cannot reach the VMs.
192.168.56.100 hadoop100
192.168.56.101 hadoop101
192.168.56.102 hadoop102
Configure the Linux hosts file
On each Linux VM, add to /etc/hosts:
10.0.2.15 hadoop100
10.0.2.4 hadoop101
10.0.2.5 hadoop102
Configure ifcfg-eth0
On each Linux VM, edit /etc/sysconfig/network-scripts/ifcfg-eth0 (the NAT interface, using DHCP):
DEVICE="eth0"
BOOTPROTO="dhcp"
ONBOOT="yes"
TYPE="Ethernet"
PERSISTENT_DHCLIENT="yes"
Configure ifcfg-eth1
On each Linux VM, edit /etc/sysconfig/network-scripts/ifcfg-eth1 (the host-only interface, with a static IP unique to each machine):
NM_CONTROLLED=yes
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.56.101
NETMASK=255.255.255.0
DEVICE=eth1
PEERDNS=no
Configure ifcfg-lo
On each Linux VM, edit /etc/sysconfig/network-scripts/ifcfg-lo:
DEVICE=lo
IPADDR=127.0.0.1
NETMASK=255.0.0.0
NETWORK=127.0.0.0
BROADCAST=127.255.255.255
ONBOOT=yes
NAME=loopback
Initialize
On the first machine, hadoop100, go to the Hadoop install directory:
cd /opt/module/hadoop-3.1.3
Format the NameNode; afterwards two new directories, data and logs, appear under the install directory:
hdfs namenode -format
Start HDFS
On hadoop100, from the Hadoop install directory, run:
sbin/start-dfs.sh
Then open the web UI at http://hadoop100:9870/
Start the ResourceManager
On the second machine, hadoop101 (where yarn.resourcemanager.hostname points), go to the Hadoop install directory:
cd /opt/module/hadoop-3.1.3
Start YARN; the web UI is at http://hadoop101:8088/
sbin/start-yarn.sh
Start the history server
On hadoop100, from the Hadoop install directory:
bin/mapred --daemon start historyserver
Stop the history server
On hadoop100, from the Hadoop install directory:
bin/mapred --daemon stop historyserver
Note: running hdfs namenode -format more than once gives the NameNode a new clusterID while the DataNodes keep the old one, so their namespaceID/clusterID no longer match and the DataNodes fail to register; see
https://blog.csdn.net/m0_54849873/article/details/124479234
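What the mismatch looks like: the NameNode's and a stale DataNode's VERSION files carry different clusterID values. A self-contained sketch with made-up IDs under /tmp (the real files live under the data directory configured via hadoop.tmp.dir, e.g. data/dfs/name/current/VERSION and data/dfs/data/current/VERSION):

```shell
# Fake VERSION files with diverging clusterIDs, mimicking a re-formatted NameNode.
mkdir -p /tmp/name/current /tmp/data/current
echo "clusterID=CID-aaaa-1111" > /tmp/name/current/VERSION
echo "clusterID=CID-bbbb-2222" > /tmp/data/current/VERSION
if ! diff -q /tmp/name/current/VERSION /tmp/data/current/VERSION >/dev/null; then
  echo "clusterID mismatch: stop HDFS, clear data/ and logs/ on every node, format once"
fi
```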
HDFS shell operations
List all commands: hadoop fs
Create a directory
bin/hadoop fs -mkdir /myinput
Check the result at http://hadoop100:9870/explorer.html#/
Upload a file; -put is equivalent to -copyFromLocal, and put is the more common form in production
bin/hadoop fs -put /chen.txt /myinput
The actual on-disk path of the uploaded blocks:
/opt/module/hadoop-3.1.3/data/dfs/data/current/BP-1431786564-192.168.56.100-1658914351556/current/finalized/subdir0/subdir0
Append to an existing file
bin/hadoop fs -appendToFile /chen2.txt /myinput/chen.txt
Run the word-count example; the result is written to part-r-00000 under /myoutput (view it with bin/hadoop fs -cat /myoutput/part-r-00000)
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /myinput /myoutput
Set up passwordless SSH across the Linux cluster
https://blog.csdn.net/weixin_44371237/article/details/125969677
Hadoop cluster start/stop scripts
https://blog.csdn.net/weixin_44371237/article/details/126040977