Hadoop Environment Setup
Common Commands
[liqiang@Gargantua ~]$ cd $HADOOP_HOME;pwd
/home/liqiang/app/hadoop
【Start / Stop】
[liqiang@Gargantua ~]$ cd $HADOOP_HOME/sbin
# start HDFS, start YARN
start-dfs.sh
start-yarn.sh
start-all.sh
# stop
stop-all.sh
stop-dfs.sh
stop-yarn.sh
【HDFS file operations】
[liqiang@Gargantua ~]$ cd $HADOOP_HOME/bin
# path may be absolute or relative; with no path the command operates on the current user's HDFS home directory
hdfs dfs -ls # list the user's HDFS home directory
hdfs dfs -ls / # list the HDFS root
hdfs dfs -ls /input 【or: hadoop fs -ls (hadoop dfs is deprecated)】
hdfs dfs -cat /input/wc.data 【hadoop fs -cat】
hdfs dfs -text /input/data.lzo [also handles compressed files, without garbled output]
hdfs dfs -mkdir /input 【hadoop fs -mkdir】
hdfs dfs -put wc.log /input 【hadoop fs -put】
hdfs dfs -get /input ~/data 【hadoop fs -get】
hdfs dfs -rm [-r] [-f] <uri> # delete a file or directory; -r and -f cannot be combined as -rf
hdfs dfs -rm -r -f /test # delete the test directory under the root
hdfs dfs -rmdir /test # remove a directory: only works if it is empty
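A quick round trip through these commands (wc.data here is a hypothetical local file):
hdfs dfs -mkdir -p /demo # -p creates parent directories, like Linux mkdir
hdfs dfs -put wc.data /demo # local -> HDFS
hdfs dfs -ls /demo # confirm the upload
hdfs dfs -cat /demo/wc.data # print the file
hdfs dfs -get /demo/wc.data ~/wc.copy # HDFS -> local
hdfs dfs -rm -r /demo # recursive delete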
【Run a jar】
bin/hadoop jar xxx.jar grep input output 'dfs[a-z.]+'
[once the environment variables are set, simply:]
yarn jar xxx.jar wordcount /input /output
More commands are summarized in: HDFS Common Commands
Directories under HADOOP_HOME
The bin directory holds the executables, e.g. the hdfs dfs and yarn commands used above
$HADOOP_HOME/bin 【/home/liqiang/app/hadoop/bin】
-rwxr-xr-x 1 liqiang liqiang 8707 Jan 3 2021 hadoop 【cd hadoop: Not a directory】
-rwxr-xr-x 1 liqiang liqiang 11274 Jan 3 2021 hdfs
-rwxr-xr-x 1 liqiang liqiang 6237 Jan 3 2021 mapred
-rwxr-xr-x 1 liqiang liqiang 12112 Jan 3 2021 yarn
The sbin directory holds the start/stop scripts
$HADOOP_HOME/sbin 【/home/liqiang/app/hadoop/sbin】
-rwxr-xr-x 1 liqiang liqiang 2756 Jan 3 2021 distribute-exclude.sh
drwxr-xr-x 4 liqiang liqiang 4096 Jan 3 2021 FederationStateStore
-rwxr-xr-x 1 liqiang liqiang 1983 Jan 3 2021 hadoop-daemon.sh
-rwxr-xr-x 1 liqiang liqiang 2522 Jan 3 2021 hadoop-daemons.sh
-rwxr-xr-x 1 liqiang liqiang 1542 Jan 3 2021 httpfs.sh
-rwxr-xr-x 1 liqiang liqiang 1500 Jan 3 2021 kms.sh
-rwxr-xr-x 1 liqiang liqiang 1841 Jan 3 2021 mr-jobhistory-daemon.sh
-rwxr-xr-x 1 liqiang liqiang 2086 Jan 3 2021 refresh-namenodes.sh
-rwxr-xr-x 1 liqiang liqiang 2221 Jan 3 2021 start-all.sh 【start everything】
-rwxr-xr-x 1 liqiang liqiang 1880 Jan 3 2021 start-balancer.sh
-rwxr-xr-x 1 liqiang liqiang 5170 Jan 3 2021 start-dfs.sh 【start HDFS】
-rwxr-xr-x 1 liqiang liqiang 1793 Jan 3 2021 start-secure-dns.sh
-rwxr-xr-x 1 liqiang liqiang 3342 Jan 3 2021 start-yarn.sh 【start YARN】
-rwxr-xr-x 1 liqiang liqiang 2166 Jan 3 2021 stop-all.sh 【stop everything】
-rwxr-xr-x 1 liqiang liqiang 1783 Jan 3 2021 stop-balancer.sh
-rwxr-xr-x 1 liqiang liqiang 3898 Jan 3 2021 stop-dfs.sh
-rwxr-xr-x 1 liqiang liqiang 1756 Jan 3 2021 stop-secure-dns.sh
-rwxr-xr-x 1 liqiang liqiang 3083 Jan 3 2021 stop-yarn.sh
-rwxr-xr-x 1 liqiang liqiang 1982 Jan 3 2021 workers.sh
-rwxr-xr-x 1 liqiang liqiang 1814 Jan 3 2021 yarn-daemon.sh
-rwxr-xr-x 1 liqiang liqiang 2328 Jan 3 2021 yarn-daemons.sh
The etc/hadoop directory holds the Hadoop configuration files
$HADOOP_HOME/etc/hadoop 【/home/liqiang/app/hadoop/etc/hadoop】
# Commonly used:
-rw-r--r-- 1 liqiang liqiang 16356 Dec 28 02:13 hadoop-env.sh 【JAVA_HOME, HADOOP_PID_DIR】
-rw-r--r-- 1 liqiang liqiang 634 Dec 31 01:20 core-site.xml 【fs.defaultFS, the ip:port HDFS exposes; NameNode data directory, ...】【set the RPC port as needed (9000 here); 9870 is the web UI port and can stay at its default】
-rw-r--r-- 1 liqiang liqiang 1881 Jan 9 20:51 hdfs-site.xml 【dfs.replication, the number of block replicas】
-rw-r--r-- 1 liqiang liqiang 10 Dec 28 01:09 workers 【lists the hosts where DataNodes start (a file, not a directory)】
-rw-r--r-- 1 liqiang liqiang 1764 Jan 3 2021 mapred-env.sh
-rw-r--r-- 1 liqiang liqiang 519 Dec 28 20:15 mapred-site.xml 【MR framework, application classpath, etc.】
-rw-r--r-- 1 liqiang liqiang 6272 Jan 3 2021 yarn-env.sh
-rw-r--r-- 1 liqiang liqiang 1456 Dec 28 20:18 yarn-site.xml 【YARN web UI port, default 8088 (changed to 8123 here)】
-rw-r--r-- 1 liqiang liqiang 9213 Jan 3 2021 capacity-scheduler.xml
-rw-r--r-- 1 liqiang liqiang 1335 Jan 3 2021 configuration.xsl
-rw-r--r-- 1 liqiang liqiang 1940 Jan 3 2021 container-executor.cfg
-rw-r--r-- 1 liqiang liqiang 3321 Jan 3 2021 hadoop-metrics2.properties
-rw-r--r-- 1 liqiang liqiang 11392 Jan 3 2021 hadoop-policy.xml
-rw-r--r-- 1 liqiang liqiang 3414 Jan 3 2021 hadoop-user-functions.sh.example
-rw-r--r-- 1 liqiang liqiang 1484 Jan 3 2021 httpfs-env.sh
-rw-r--r-- 1 liqiang liqiang 1657 Jan 3 2021 httpfs-log4j.properties
-rw-r--r-- 1 liqiang liqiang 21 Jan 3 2021 httpfs-signature.secret
-rw-r--r-- 1 liqiang liqiang 620 Jan 3 2021 httpfs-site.xml
-rw-r--r-- 1 liqiang liqiang 3518 Jan 3 2021 kms-acls.xml
-rw-r--r-- 1 liqiang liqiang 1351 Jan 3 2021 kms-env.sh
-rw-r--r-- 1 liqiang liqiang 1860 Jan 3 2021 kms-log4j.properties
-rw-r--r-- 1 liqiang liqiang 682 Jan 3 2021 kms-site.xml
-rw-r--r-- 1 liqiang liqiang 14713 Jan 3 2021 log4j.properties
drwxr-xr-x 2 liqiang liqiang 4096 Jan 3 2021 shellprofile.d
-rw-r--r-- 1 liqiang liqiang 2316 Jan 3 2021 ssl-client.xml.example
-rw-r--r-- 1 liqiang liqiang 2697 Jan 3 2021 ssl-server.xml.example
-rw-r--r-- 1 liqiang liqiang 2642 Jan 3 2021 user_ec_policies.xml.template
-rw-r--r-- 1 liqiang liqiang 4113 Jan 3 2021 mapred-queues.xml.template
-rw-r--r-- 1 liqiang liqiang 2591 Jan 3 2021 yarnservice-log4j.properties
The share/hadoop directory holds the jars for hdfs, mr, yarn, ...
drwxr-xr-x 2 liqiang liqiang 4096 Jan 3 2021 client
drwxr-xr-x 6 liqiang liqiang 4096 Jan 3 2021 common
drwxr-xr-x 6 liqiang liqiang 4096 Jan 3 2021 hdfs
drwxr-xr-x 6 liqiang liqiang 4096 Jan 3 2021 mapreduce
drwxr-xr-x 6 liqiang liqiang 4096 Jan 3 2021 tools
drwxr-xr-x 8 liqiang liqiang 4096 Jan 3 2021 yarn
As mentioned below, find ./ -name '*example*' locates the official MR examples jar, which lives at
~/app/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar
The logs directory holds the Hadoop logs
E.g. if the NameNode fails to start after deployment:
[liqiang@Gargantua ~]$ cd $HADOOP_HOME/logs;ll
-rw-rw-r-- 1 liqiang liqiang 1256356 Jan 9 21:56 hadoop-liqiang-namenode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang 1294353 Jan 9 21:56 hadoop-liqiang-secondarynamenode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang 623236 Jan 9 20:55 hadoop-liqiang-datanode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang 731136 Jan 9 22:16 hadoop-liqiang-nodemanager-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang 551680 Jan 9 22:06 hadoop-liqiang-resourcemanager-Gargantua.log
Viewing the NameNode log
# load the whole file and keep 10 lines of context around each ERROR 【for large files there are more efficient tools, e.g. less; see below】
cat hadoop-liqiang-namenode-Gargantua.log|grep ERROR -C10
# follow in real time
tail -200f hadoop-liqiang-namenode-Gargantua.log
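Two more efficient variants for large logs: grep can read the file directly without cat, and less searches without loading everything at once.
grep -C10 ERROR hadoop-liqiang-namenode-Gargantua.log # same filter, one fewer process
less +/ERROR hadoop-liqiang-namenode-Gargantua.log # opens at the first ERROR; press n for the next match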
The deployment below follows the official documentation
https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
Configure the JDK
Hadoop depends on a JDK.
JDK setup is covered in the previous post: https://editor.csdn.net/md/?articleId=121432910
JAVA_HOME on this machine: /usr/java/jdk1.8.0_121
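A quick sanity check that the JDK is usable before continuing (path as configured on this machine):
/usr/java/jdk1.8.0_121/bin/java -version # should print java version "1.8.0_121"
echo $JAVA_HOME # should print /usr/java/jdk1.8.0_121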
Installation & Configuration
Upload Hadoop
Download hadoop-3.2.2.tar.gz (version 3.2.2) from hadoop.apache.org.
Upload it to /tmp on the server with rz or xftp. [/tmp is cleaned periodically; unused files are removed after 30 days by default.]
Create the user and working directories
useradd liqiang
id liqiang
su - liqiang
mkdir sourcecode software app log data lib tmp
Move and extract
[root@Gargantua tmp]# mv /tmp/hadoop-3.2.2.tar.gz /home/liqiang/software/
[root@Gargantua tmp]# tar -zxvf /home/liqiang/software/hadoop-3.2.2.tar.gz -C /home/liqiang/app/
【-C extracts into the given directory】
[root@Gargantua app]# ln -s hadoop-3.2.2/ hadoop
【The extraction and symlink were done as root, so fix the ownership】
[root@Gargantua app]# chown -R liqiang:liqiang /home/liqiang/app
【-R recurses into the whole tree; without it only the top-level entries change owner. Note chown follows symlinks by default, so the hadoop link's target is covered too.】
Layout of the extracted Hadoop directory
bin # Hadoop command-line tools
etc # configuration files
include
lib # Hadoop native libraries (compression/decompression support)
libexec
sbin # service start/stop scripts 【sbin/start-dfs.sh, sbin/start-yarn.sh】
share # bundled jars, documentation, and the official examples
logs # log files (created on first start)
Configure SSH: passwordless login
[liqiang@Gargantua ~]$ ssh
ssh ssh-add ssh-agent ssh-copy-id sshd sshd-keygen ssh-keygen ssh-keyscan
[liqiang@Gargantua ~]$ ssh-keygen 【a hyphen between ssh and keygen, no space】
【press Enter three times to generate the key pair】
Your identification has been saved in /home/liqiang/.ssh/id_rsa.
Your public key has been saved in /home/liqiang/.ssh/id_rsa.pub.
Append the public key to ~/.ssh/authorized_keys
[liqiang@Gargantua ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Fix the permissions, otherwise ssh will still prompt for a password
[liqiang@Gargantua ~]$ chmod 0600 ~/.ssh/authorized_keys
# test
ssh Gargantua # the first connection asks for a yes to confirm the host key
# if a password is still required, the ssh configuration or the 600 permission is wrong.
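If the password prompt persists, the usual culprit is permissions: sshd ignores authorized_keys when ~/.ssh or the home directory is group- or world-writable. A quick check:
ls -ld ~ ~/.ssh ~/.ssh/authorized_keys # home not group/world writable, .ssh should be 700, authorized_keys 600
chmod 700 ~/.ssh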
Configure Pseudo-Distributed Mode
Configure JAVA_HOME
Hadoop does not pick up JAVA_HOME from /etc/profile; it must be set again in hadoop-env.sh
[root@Gargantua /]# su - liqiang
[liqiang@Gargantua ~]$ cd app/hadoop/etc/hadoop
[liqiang@Gargantua hadoop]$ vi hadoop-env.sh
# add the following
export JAVA_HOME=/usr/java/jdk1.8.0_121
export HADOOP_PID_DIR=/home/liqiang/tmp
- Why configure HADOOP_PID_DIR:
- A look at /tmp shows that several pieces of Hadoop data and state are stored there by default, and /tmp is cleaned out periodically, which is dangerous.
What /tmp looks like when HADOOP_PID_DIR (hadoop-env.sh) and hadoop.tmp.dir (core-site.xml) are left unconfigured:
[liqiang@Gargantua hadoop]$ ll /tmp/
drwxr-xr-x 3 liqiang liqiang 4096 Dec 27 22:57 hadoop
drwxrwxr-x 4 liqiang liqiang 4096 Dec 27 22:57 hadoop-liqiang 【# the default data directory hadoop.tmp.dir; changed in core-site.xml】
-rw-rw-r-- 1 liqiang liqiang 5 Dec 28 00:27 hadoop-liqiang-datanode.pid 【# the default pid file location; changed in hadoop-env.sh】
-rw-rw-r-- 1 liqiang liqiang 5 Dec 28 00:27 hadoop-liqiang-namenode.pid
-rw-rw-r-- 1 liqiang liqiang 5 Dec 28 00:27 hadoop-liqiang-secondarynamenode.pid
These pid files record the pid of each running daemon. When sbin/stop-dfs.sh or stop-all.sh runs, Hadoop looks up each daemon's pid in its pid file and kills that process. If a pid file is lost, the stop command does not actually stop the daemon, which then cannot be restarted to pick up new configuration. (Run stop-all.sh before editing hadoop-env.sh; otherwise, at stop time the pid files can no longer be found.)
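Before the first start, make sure the pid directory exists; and for reference, a rough sketch of what the stop scripts do with a pid file (using the default /tmp location shown above):
mkdir -p /home/liqiang/tmp # HADOOP_PID_DIR must exist and be writable
cat /tmp/hadoop-liqiang-namenode.pid # contains a single pid
kill $(cat /tmp/hadoop-liqiang-namenode.pid) # roughly what stop-dfs.sh does per daemon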
Configure the nodes
All Hadoop configuration files live under $HADOOP_HOME/etc/hadoop:
[liqiang@Gargantua hadoop]$ pwd
/home/liqiang/app/hadoop/etc/hadoop
[liqiang@Gargantua hadoop]$ vi core-site.xml
[liqiang@Gargantua hadoop]$ vi hdfs-site.xml
core-site.xml
<configuration>
<!-- NameNode endpoint -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://Gargantua:9000</value>
</property>
<!-- Hadoop NameNode data directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/liqiang/tmp/hadoop-${user.name}</value>
</property>
</configuration>
Notes:
- fs.defaultFS starts the NameNode on host Gargantua [make sure /etc/hosts maps Gargantua to this machine's internal IP]. For the DataNode, change localhost to Gargantua in ~/app/hadoop/etc/hadoop/workers. Starting NN, SNN, and DN with the same hostname rather than an IP means that if the IP ever changes, only the hosts file needs a single edit.
- hadoop.tmp.dir is best configured before Hadoop's first start; otherwise the NameNode data goes to the default location under /tmp, whose contents are periodically deleted, which is dangerous.
If Hadoop has already been started on the defaults and you then change only this property, the NameNode will fail to start: each daemon generates a version file on startup, and after the first start there must be exactly one (after changing the config, copy the existing files over too). 【1. stop-all.sh, 2. edit core-site.xml, 3. copy the Hadoop data to the new location, 4. start-all.sh; sketched below】
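A minimal sketch of that migration, assuming the data is still in the default /tmp/hadoop-liqiang and the new hadoop.tmp.dir from core-site.xml above:
stop-all.sh # 1. stop everything first, while the pid files still resolve
vi $HADOOP_HOME/etc/hadoop/core-site.xml # 2. set hadoop.tmp.dir
cp -a /tmp/hadoop-liqiang /home/liqiang/tmp/ # 3. carry the existing data (incl. the version file) over
start-all.sh # 4. restart on the new location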
hdfs-site.xml
<configuration>
<!-- number of block replicas, default 3 -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<!-- Secondary NameNode endpoint (http) -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Gargantua:9868</value>
</property>
<!-- Secondary NameNode endpoint (https) -->
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>Gargantua:9869</value>
</property>
<!-- if the machine mounts several data disks, also configure:
<property>
<name>dfs.datanode.data.dir</name>
<value>/data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn</value>
</property>
-->
</configuration>
Notes:
- The Secondary NameNode also starts on Gargantua.
- If a machine mounts several physical disks, dfs.datanode.data.dir must be configured accordingly.
For example: one disk writes at 30 MB/s, so a machine with 10 disks can write at roughly 300 MB/s in aggregate, and the same data is written far faster. Multiple disks provide both more capacity and higher I/O throughput. In production, dfs.datanode.data.dir must therefore be set according to the machine's actual disks.
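The effective values can be checked without re-reading the XML, via hdfs getconf:
hdfs getconf -confKey fs.defaultFS # hdfs://Gargantua:9000
hdfs getconf -confKey dfs.replication # 1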
Startup
Before starting: if a daemon fails to start (logs)
If a daemon fails to start, look for its log file under $HADOOP_HOME/logs
[liqiang@Gargantua ~]$ cd $HADOOP_HOME/logs;ll
-rw-rw-r-- 1 liqiang liqiang 145676 Dec 28 02:14 hadoop-liqiang-datanode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang 692 Dec 28 02:14 hadoop-liqiang-datanode-Gargantua.out
-rw-rw-r-- 1 liqiang liqiang 692 Dec 28 02:06 hadoop-liqiang-datanode-Gargantua.out.1
-rw-rw-r-- 1 liqiang liqiang 183747 Dec 28 02:15 hadoop-liqiang-namenode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang 692 Dec 28 02:14 hadoop-liqiang-namenode-Gargantua.out
-rw-rw-r-- 1 liqiang liqiang 692 Dec 28 02:06 hadoop-liqiang-namenode-Gargantua.out.1
-rw-rw-r-- 1 liqiang liqiang 154067 Dec 28 02:15 hadoop-liqiang-secondarynamenode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang 692 Dec 28 02:14 hadoop-liqiang-secondarynamenode-Gargantua.out
-rw-rw-r-- 1 liqiang liqiang 692 Dec 28 02:06 hadoop-liqiang-secondarynamenode-Gargantua.out.1
E.g. if the NameNode fails to start:
cat hadoop-liqiang-namenode-Gargantua.log|grep ERROR -C10
or
tail -200f hadoop-liqiang-namenode-Gargantua.log
1. Format the HDFS filesystem
[liqiang@Gargantua hadoop]$ pwd
/home/liqiang/app/hadoop
[liqiang@Gargantua hadoop]$ bin/hdfs namenode -format
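Note that -format is normally run exactly once. Re-formatting writes a new clusterID into the NameNode's version file, and previously registered DataNodes then refuse to start with an "Incompatible clusterIDs" error. On a throwaway dev box the blunt fix is to wipe the data directory and format again (this destroys all HDFS data); the path assumes the hadoop.tmp.dir configured above:
stop-all.sh
rm -rf /home/liqiang/tmp/hadoop-liqiang # all HDFS data is lost
bin/hdfs namenode -format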
2. Start the NameNode and DataNode
NameNode: stores the metadata about the data, e.g. file names, paths, sizes.
DataNode: stores the data itself.
[liqiang@Gargantua hadoop]$ sbin/start-dfs.sh
Starting namenodes on [Gargantua]
Starting datanodes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Starting secondary namenodes [Gargantua]
【after a successful start, check with jps or ps -ef|grep hadoop】
[liqiang@Gargantua hadoop]$ jps
5425 SecondaryNameNode
5205 DataNode
5558 Jps
5087 NameNode
Web UI: http://<public-ip>:9870
Hadoop 2.x HDFS web UI default port: 50070
Hadoop 3.x default port: 9870
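Two quick checks from the shell that HDFS is actually serving (ports as configured above):
hdfs dfsadmin -report # live DataNodes, capacity, remaining space
curl -s http://Gargantua:9870/ | head # the NameNode web UI answers on 9870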
Working with HDFS
[liqiang@Gargantua hadoop]$ pwd
/home/liqiang/app/hadoop
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -mkdir /user
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -ls /
drwxr-xr-x - liqiang supergroup 0 2021-12-27 22:42 /user
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -mkdir /user/liqiang
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -mkdir input 【a relative path is created under /user/liqiang by default】
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -ls /user/liqiang
drwxr-xr-x - liqiang supergroup 0 2021-12-27 22:45 /user/liqiang/input
Double-check the permissions of the xml files about to be copied
[liqiang@Gargantua hadoop]$ ll etc/hadoop/*.xml
-rw-r--r-- 1 liqiang liqiang 9213 Jan 3 2021 etc/hadoop/capacity-scheduler.xml
-rw-r--r-- 1 liqiang liqiang 884 Dec 27 22:17 etc/hadoop/core-site.xml
-rw-r--r-- 1 liqiang liqiang 11392 Jan 3 2021 etc/hadoop/hadoop-policy.xml
-rw-r--r-- 1 liqiang liqiang 867 Dec 27 22:24 etc/hadoop/hdfs-site.xml
-rw-r--r-- 1 liqiang liqiang 620 Jan 3 2021 etc/hadoop/httpfs-site.xml
-rw-r--r-- 1 liqiang liqiang 3518 Jan 3 2021 etc/hadoop/kms-acls.xml
-rw-r--r-- 1 liqiang liqiang 682 Jan 3 2021 etc/hadoop/kms-site.xml
-rw-r--r-- 1 liqiang liqiang 758 Jan 3 2021 etc/hadoop/mapred-site.xml
-rw-r--r-- 1 liqiang liqiang 690 Jan 3 2021 etc/hadoop/yarn-site.xml
Copy them to HDFS
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -ls /user/liqiang/input/ 【list the files in input】
Run some of the examples provided: try a compute example
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'
Download the contents of output from HDFS and inspect them, or cat them directly
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -get output output
[liqiang@Gargantua hadoop]$ cat output/*
1 dfsadmin
1 dfs.replication
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication
Start YARN
Preparation
All Hadoop configuration files live under $HADOOP_HOME/etc/hadoop:
[liqiang@Gargantua hadoop]$ pwd
/home/liqiang/app/hadoop/etc/hadoop
[liqiang@Gargantua hadoop]$ vi core-site.xml
[liqiang@Gargantua hadoop]$ vi hdfs-site.xml
[liqiang@Gargantua hadoop]$ vi mapred-site.xml
[liqiang@Gargantua hadoop]$ vi yarn-site.xml
mapred-site.xml
<configuration>
<!-- run MR jobs on the YARN framework -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- classpath for MR applications at runtime -->
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- auxiliary service run by each NodeManager -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- whitelist of environment variables that containers inherit -->
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<!-- YARN web UI address: the default port 8088 is a frequent target of cryptomining worms, so change it (8123 here) -->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>Gargantua:8123</value>
</property>
</configuration>
Start YARN
[liqiang@Gargantua hadoop]$ pwd
/home/liqiang/app/hadoop
[liqiang@Gargantua hadoop]$ sbin/start-yarn.sh
Web UI: http://<public-ip>:8123/cluster
(the default 8088 was changed to 8123 in yarn-site.xml above)
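To confirm the ResourceManager and NodeManager came up:
jps # should now also list ResourceManager and NodeManager
yarn node -list # registered NodeManagers and their state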
WordCount example
Environment variables
[liqiang@Gargantua ~]$ vi .bashrc
export HADOOP_HOME=/home/liqiang/app/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
[liqiang@Gargantua ~]$ . .bashrc
[liqiang@Gargantua ~]$ which hadoop
~/app/hadoop/bin/hadoop
Prepare a file and upload it to HDFS
[liqiang@Gargantua ~]$ vi wc.log
jepson
ruoze
xingxing
a b c
b a c
jepson
gargantua a b c
[liqiang@Gargantua ~]$ hdfs dfs -mkdir /input
[liqiang@Gargantua ~]$ hdfs dfs -put wc.log /input
[liqiang@Gargantua ~]$ hdfs dfs -cat /input/wc.log
Locate the official examples jar
[liqiang@Gargantua ~]$ find ./ -name '*example*'
./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar
Work out the command to run the jar
[liqiang@Gargantua ~]$ hadoop --help
jar <jar> run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
[liqiang@Gargantua ~]$ yarn jar ./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar
【# RunJar jarFile [mainClass] args... : the program name (main class) still has to be supplied】
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
Run the jar
wordcount /input /output : the program name (wordcount) plus the input/output directories
[liqiang@Gargantua ~]$ yarn jar ./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input /output
The job should now run:
2021-11-26 22:43:01,129 INFO mapreduce.JobSubmitter: number of splits:1 【one input split, per the split rules】
// ...
File System Counters
Job Counters
Launched map tasks=1 【1 map task】
Launched reduce tasks=1 【1 reduce task】
Map-Reduce Framework
// ...
Shuffle Errors
// ...
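The job can also be tracked from the command line while it runs, in addition to the 8123 web UI:
yarn application -list # currently running applications
yarn application -list -appStates FINISHED # completed ones, with their tracking URLs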
View the WordCount result
[liqiang@Gargantua ~]$ hdfs dfs -cat /output/part-r-00000
a 3
b 3
c 3
gargantua 1
jepson 2
ruoze 1
xingxing 1
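MapReduce refuses to overwrite an existing output directory; re-running the same command fails with a FileAlreadyExistsException. Delete /output first:
hdfs dfs -rm -r /output
yarn jar ./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input /output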
If you hit
/bin/bash: /bin/java: No such file or directory
try creating a symlink pointing at the java binary under JAVA_HOME (the mysql.sock symlink for MySQL is the same kind of stopgap):
ln -s /usr/java/jdk1.8.0_121/bin/java /bin/java
Usually this is a bug where something executes a hard-coded /bin/java instead of reading JAVA_HOME from the environment,
or some default path still points into /tmp and the system has since deleted it.