Pseudo-distributed mode
Preliminaries
- Make sure the JDK and Hadoop environment variables are configured
- Make sure the firewall is off:
  systemctl status firewalld
  should report inactive; stop it with systemctl stop firewalld
  and keep it from starting on boot with systemctl disable firewalld
  - Error: -bash: /usr/bin/systemctl: Permission denied
  - Fix:
    sudo chmod -R 750 /usr/bin/systemctl
  - Error: ERROR:systemctl:Unit firewalld.service could not be found.
  - Fix:
    yum install firewalld firewall-config
- Make sure the VM uses NAT mode and a static IP (reference: centos7虚拟机静态ip设置详细教程(超全超详细亲测有效))
  vim /etc/sysconfig/network-scripts/ifcfg-ens33
  - Change BOOTPROTO="dhcp" to BOOTPROTO="static"
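  For reference, a minimal static-IP block might look like the sketch below. The address, gateway, and DNS values are assumptions based on the 192.168.211.x subnet used later; match them to your own VMware NAT settings.
    # /etc/sysconfig/network-scripts/ifcfg-ens33 (illustrative values)
    BOOTPROTO="static"
    ONBOOT="yes"
    IPADDR=192.168.211.101   # the IP mapped in /etc/hosts below
    NETMASK=255.255.255.0
    GATEWAY=192.168.211.2    # assumption: VMware NAT gateways are usually x.x.x.2
    DNS1=192.168.211.2       # assumption: use the gateway as DNS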
- Make sure /etc/hosts maps the IP to the hostname
  vim /etc/hosts
  - Append at the end:
    192.168.211.101 <hostname>
  - (optional: restart the network with systemctl restart network)
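  A quick check that the mapping took effect (assuming the hostname is lanr, as in the config files below):
    hostname        # prints this machine's hostname
    ping -c 1 lanr  # should resolve to 192.168.211.101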
- Make sure passwordless login to localhost works
  ssh-keygen -t rsa
  (press Enter at every prompt), then cd ~/.ssh/
  ssh-copy-id <hostname>
  (answer yes, then enter the password)
  - Verify:
    ssh <hostname>
    should log in without a password; run exit to leave the session
  - (sshd service commands: systemctl restart/start/status/stop sshd)
Main configuration
- Edit the configuration files (to be continued):
  - core-site.xml
    cd $HADOOP_HOME/etc/hadoop/
    vim core-site.xml
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://lanr:9820</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-3.3.1/tmp</value>
      </property>
    </configuration>
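    You can confirm the value is picked up from the config files with hdfs getconf:
      hdfs getconf -confKey fs.defaultFS   # expect hdfs://lanr:9820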
  - hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>lanr:9868</value>
      </property>
      <property>
        <name>dfs.namenode.http-address</name>
        <value>lanr:9870</value>
      </property>
    </configuration>
  - hadoop-env.sh
    export JAVA_HOME=/usr/local/jdk1.8.0_321
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
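    A quick sanity check that Hadoop can find the JDK (the command fails with a JAVA_HOME error if the path above is wrong):
      hadoop version   # prints the Hadoop version when JAVA_HOME resolves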
- Format the cluster
  - First make sure there is no tmp folder under the hadoop-3.3.1 directory
    hdfs namenode -format
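  If you are re-formatting after a failed attempt, a sketch of the clean-up (the tmp path comes from hadoop.tmp.dir in core-site.xml above):
    rm -rf /usr/local/hadoop-3.3.1/tmp   # remove stale NameNode/DataNode data
    hdfs namenode -format                # then format again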
- Start the cluster
  start-dfs.sh
  then run jps to check the processes
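  For a pseudo-distributed HDFS started with start-dfs.sh, jps should list roughly these processes (the PIDs will differ):
    jps
    # 12345 NameNode
    # 12346 DataNode
    # 12347 SecondaryNameNode
    # 12348 Jps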
- Check the cluster status in the web UI
  - http://192.168.211.101:9870
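  The same status is also available from the command line:
    hdfs dfsadmin -report   # live DataNodes, capacity, usage, etc.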
Demo: wordcount
- Prepare the data
  cd ~
  mkdir input && cd input
  echo "hello world hadoop linux hadoop" >> file1
  echo "hadoop linux world hadoop linux hadoop" >> file1
  echo "hello world hadoop linux hadoop" >> file1
  echo "hello world hadoop linux hadoop" >> file1
  echo "hello good programmer hadoop linux hadoop" >> file2
  echo "hello world hadoop linux hadoop ok nice" >> file2
- Upload to the cluster and run the job
  cd ~
  hdfs dfs -put input/ /
  hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar wordcount /input /output
  # /input and /output live on HDFS, not the local Linux filesystem,
  # and the output path must not already exist
  hdfs dfs -cat /output/*   # view the result
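  Given the six echo lines above, the result should look roughly like this (wordcount emits tab-separated word/count pairs, sorted by key):
    good        1
    hadoop      13
    hello       5
    linux       7
    nice        1
    ok          1
    programmer  1
    world       5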