Notes on Setting Up a Hadoop Environment

Hadoop Environment Setup

Common commands
[liqiang@Gargantua ~]$ cd $HADOOP_HOME;pwd
/home/liqiang/app/hadoop

【Start / Stop】
[liqiang@Gargantua ~]$ cd $HADOOP_HOME/sbin     

# start HDFS / start YARN / start both
 start-dfs.sh
 start-yarn.sh
 start-all.sh
# stop
 stop-all.sh
 stop-dfs.sh
 stop-yarn.sh

【Working with HDFS files】
[liqiang@Gargantua ~]$ cd $HADOOP_HOME/bin   

# path can be absolute or relative; with no path, the command operates on the current user's HDFS home directory

 hdfs dfs -ls       # list the user's HDFS home directory
 hdfs dfs -ls /     # list the HDFS root directory

 hdfs dfs -ls /input             【hadoop dfs -ls】
 hdfs dfs -cat /input/wc.data    【hadoop dfs -cat】
 hdfs dfs -text /input/data.lzo  [decompresses while reading, so compressed files are not shown as garbage]
 
 hdfs dfs -mkdir /input          【hadoop dfs -mkdir】
 hdfs dfs -put wc.log /input     【hadoop dfs -put】
 hdfs dfs -get /input ~/data     【hadoop dfs -get】
 
 hdfs dfs -rm [-r] [-f] <uri>  # delete a file or directory; -r and -f cannot be combined into -rf
 hdfs dfs -rm -r -f /test      # delete the /test directory under the root
 hdfs dfs -rmdir /test         # delete a directory (only works on empty directories)
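
# A few more read-only checks that often come in handy here (a sketch; the paths follow the examples above):
 hdfs dfs -ls -R /input                        # recursive listing
 hdfs dfs -du -h /input                        # sizes in human-readable form
 hdfs dfs -test -e /input/wc.data && echo ok   # exit status tells whether the path exists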

【Running a jar】
 bin/hadoop jar xxx.jar grep input output 'dfs[a-z.]+'
[once the environment variables are set, the short form also works]
 yarn jar xxx.jar wordcount /input /output

For more commands, see the HDFS common commands summary.

Directories under HADOOP_HOME

bin: the executables used above, e.g. the hdfs command (hdfs dfs ...) and the yarn command
$HADOOP_HOME/bin 【/home/liqiang/app/hadoop/bin】 

-rwxr-xr-x 1 liqiang liqiang   8707 Jan  3  2021 hadoop   【cd hadoop: Not a directory】
-rwxr-xr-x 1 liqiang liqiang  11274 Jan  3  2021 hdfs
-rwxr-xr-x 1 liqiang liqiang   6237 Jan  3  2021 mapred
-rwxr-xr-x 1 liqiang liqiang  12112 Jan  3  2021 yarn
sbin: the start and stop scripts
$HADOOP_HOME/sbin 【/home/liqiang/app/hadoop/sbin】 

-rwxr-xr-x 1 liqiang liqiang 2756 Jan  3  2021 distribute-exclude.sh
drwxr-xr-x 4 liqiang liqiang 4096 Jan  3  2021 FederationStateStore
-rwxr-xr-x 1 liqiang liqiang 1983 Jan  3  2021 hadoop-daemon.sh
-rwxr-xr-x 1 liqiang liqiang 2522 Jan  3  2021 hadoop-daemons.sh
-rwxr-xr-x 1 liqiang liqiang 1542 Jan  3  2021 httpfs.sh
-rwxr-xr-x 1 liqiang liqiang 1500 Jan  3  2021 kms.sh
-rwxr-xr-x 1 liqiang liqiang 1841 Jan  3  2021 mr-jobhistory-daemon.sh
-rwxr-xr-x 1 liqiang liqiang 2086 Jan  3  2021 refresh-namenodes.sh
-rwxr-xr-x 1 liqiang liqiang 2221 Jan  3  2021 start-all.sh       【start everything】
-rwxr-xr-x 1 liqiang liqiang 1880 Jan  3  2021 start-balancer.sh
-rwxr-xr-x 1 liqiang liqiang 5170 Jan  3  2021 start-dfs.sh       【start HDFS】
-rwxr-xr-x 1 liqiang liqiang 1793 Jan  3  2021 start-secure-dns.sh
-rwxr-xr-x 1 liqiang liqiang 3342 Jan  3  2021 start-yarn.sh      【start YARN】
-rwxr-xr-x 1 liqiang liqiang 2166 Jan  3  2021 stop-all.sh        【stop everything】
-rwxr-xr-x 1 liqiang liqiang 1783 Jan  3  2021 stop-balancer.sh
-rwxr-xr-x 1 liqiang liqiang 3898 Jan  3  2021 stop-dfs.sh
-rwxr-xr-x 1 liqiang liqiang 1756 Jan  3  2021 stop-secure-dns.sh
-rwxr-xr-x 1 liqiang liqiang 3083 Jan  3  2021 stop-yarn.sh
-rwxr-xr-x 1 liqiang liqiang 1982 Jan  3  2021 workers.sh
-rwxr-xr-x 1 liqiang liqiang 1814 Jan  3  2021 yarn-daemon.sh
-rwxr-xr-x 1 liqiang liqiang 2328 Jan  3  2021 yarn-daemons.sh
etc/hadoop: Hadoop's configuration files
$HADOOP_HOME/etc/hadoop  【/home/liqiang/app/hadoop/etc/hadoop】
# Commonly used:
-rw-r--r-- 1 liqiang liqiang 16356 Dec 28 02:13 hadoop-env.sh   【JAVA_HOME, HADOOP_PID_DIR】
-rw-r--r-- 1 liqiang liqiang   634 Dec 31 01:20 core-site.xml   【fs.defaultFS, i.e. the ip:port HDFS exposes; NameNode data directory, ...】【the RPC port (9000 here) is set as needed; 9870 is the web UI port and can stay at its default】
-rw-r--r-- 1 liqiang liqiang  1881 Jan  9 20:51 hdfs-site.xml   【dfs.replication, the number of block replicas】
-rw-r--r-- 1 liqiang liqiang    10 Dec 28 01:09 workers         【the hosts on which DataNodes start (a file, not a directory)】

-rw-r--r-- 1 liqiang liqiang  1764 Jan  3  2021 mapred-env.sh
-rw-r--r-- 1 liqiang liqiang   519 Dec 28 20:15 mapred-site.xml 【MapReduce framework, runtime classpath, etc.】
-rw-r--r-- 1 liqiang liqiang  6272 Jan  3  2021 yarn-env.sh
-rw-r--r-- 1 liqiang liqiang  1456 Dec 28 20:18 yarn-site.xml   【YARN web UI port, default 8088 (changed to 8123 here)】

-rw-r--r-- 1 liqiang liqiang  9213 Jan  3  2021 capacity-scheduler.xml
-rw-r--r-- 1 liqiang liqiang  1335 Jan  3  2021 configuration.xsl
-rw-r--r-- 1 liqiang liqiang  1940 Jan  3  2021 container-executor.cfg
-rw-r--r-- 1 liqiang liqiang  3321 Jan  3  2021 hadoop-metrics2.properties
-rw-r--r-- 1 liqiang liqiang 11392 Jan  3  2021 hadoop-policy.xml
-rw-r--r-- 1 liqiang liqiang  3414 Jan  3  2021 hadoop-user-functions.sh.example
-rw-r--r-- 1 liqiang liqiang  1484 Jan  3  2021 httpfs-env.sh
-rw-r--r-- 1 liqiang liqiang  1657 Jan  3  2021 httpfs-log4j.properties
-rw-r--r-- 1 liqiang liqiang    21 Jan  3  2021 httpfs-signature.secret
-rw-r--r-- 1 liqiang liqiang   620 Jan  3  2021 httpfs-site.xml
-rw-r--r-- 1 liqiang liqiang  3518 Jan  3  2021 kms-acls.xml
-rw-r--r-- 1 liqiang liqiang  1351 Jan  3  2021 kms-env.sh
-rw-r--r-- 1 liqiang liqiang  1860 Jan  3  2021 kms-log4j.properties
-rw-r--r-- 1 liqiang liqiang   682 Jan  3  2021 kms-site.xml
-rw-r--r-- 1 liqiang liqiang 14713 Jan  3  2021 log4j.properties
drwxr-xr-x 2 liqiang liqiang  4096 Jan  3  2021 shellprofile.d
-rw-r--r-- 1 liqiang liqiang  2316 Jan  3  2021 ssl-client.xml.example
-rw-r--r-- 1 liqiang liqiang  2697 Jan  3  2021 ssl-server.xml.example
-rw-r--r-- 1 liqiang liqiang  2642 Jan  3  2021 user_ec_policies.xml.template

-rw-r--r-- 1 liqiang liqiang  4113 Jan  3  2021 mapred-queues.xml.template
-rw-r--r-- 1 liqiang liqiang  2591 Jan  3  2021 yarnservice-log4j.properties
share/hadoop: the various jars for HDFS, MapReduce, YARN, ...
drwxr-xr-x 2 liqiang liqiang 4096 Jan  3  2021 client
drwxr-xr-x 6 liqiang liqiang 4096 Jan  3  2021 common
drwxr-xr-x 6 liqiang liqiang 4096 Jan  3  2021 hdfs
drwxr-xr-x 6 liqiang liqiang 4096 Jan  3  2021 mapreduce
drwxr-xr-x 6 liqiang liqiang 4096 Jan  3  2021 tools
drwxr-xr-x 8 liqiang liqiang 4096 Jan  3  2021 yarn

As mentioned further below, the official MapReduce example jar found with find ./ -name '*example*' lives at

~/app/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar
logs: Hadoop's log files

If, for example, the NameNode fails to start after deployment:

[liqiang@Gargantua ~]$ cd $HADOOP_HOME/logs;ll
-rw-rw-r-- 1 liqiang liqiang 1256356 Jan  9 21:56 hadoop-liqiang-namenode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang 1294353 Jan  9 21:56 hadoop-liqiang-secondarynamenode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang  623236 Jan  9 20:55 hadoop-liqiang-datanode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang  731136 Jan  9 22:16 hadoop-liqiang-nodemanager-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang  551680 Jan  9 22:06 hadoop-liqiang-resourcemanager-Gargantua.log

Viewing the NameNode log

# load the whole file and keep 10 lines of context around every ERROR (for large files, less is more efficient; see the sketch below)
cat hadoop-liqiang-namenode-Gargantua.log|grep ERROR -C10

# follow in real time
tail  -200f  hadoop-liqiang-namenode-Gargantua.log
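
# A sketch of paging through a large log without loading it all at once, using the same file name as above:
less +/ERROR hadoop-liqiang-namenode-Gargantua.log    # opens at the first ERROR; press n / N to jump between matches
grep -C 10 ERROR hadoop-liqiang-namenode-Gargantua.log | less   # same filter as above, but paged instead of dumped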

Deployment follows the official documentation

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

Configuring the JDK

Hadoop requires a JDK.
JDK installation is covered in the previous post: https://editor.csdn.net/md/?articleId=121432910
JAVA_HOME on this machine: /usr/java/jdk1.8.0_121
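
A quick way to confirm the JDK is in place before going further (a sketch, using this machine's JAVA_HOME):

[liqiang@Gargantua ~]$ echo $JAVA_HOME          # should print /usr/java/jdk1.8.0_121
[liqiang@Gargantua ~]$ $JAVA_HOME/bin/java -version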

Installation & configuration

Upload Hadoop

Download hadoop-3.2.2.tar.gz (version 3.2.2) from hadoop.apache.org.
Upload it to /tmp on the server with rz or Xftp. [ /tmp is cleaned up periodically; unused files are removed after 30 days by default. ]

Create the user and working directories

useradd liqiang
id liqiang
su - liqiang
mkdir sourcecode software app log data lib tmp

Move and extract
[root@Gargantua tmp]# mv /tmp/hadoop-3.2.2.tar.gz /home/liqiang/software/

[root@Gargantua tmp]# tar -zxvf /home/liqiang/software/hadoop-3.2.2.tar.gz -C /home/liqiang/app/
【-C extracts into the given directory】

[root@Gargantua app]# ln -s hadoop-3.2.2/ hadoop

【The extraction and the symlink were done as root, so ownership needs to be fixed; a recursive variant is sketched below】
[root@Gargantua app]# chown liqiang:liqiang hadoop
[root@Gargantua app]# chown liqiang:liqiang hadoop/*
[root@Gargantua hadoop]# chown liqiang:liqiang app/*
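
# A recursive variant, in case ownership is still wrong deeper inside the tree (a sketch; -h changes the symlink itself rather than its target):
[root@Gargantua app]# chown -R liqiang:liqiang hadoop-3.2.2/
[root@Gargantua app]# chown -h liqiang:liqiang hadoop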

Layout of the extracted Hadoop directory

bin      # Hadoop commands
etc      # configuration files
include
lib      # Hadoop native libraries (compression/decompression support)
libexec
sbin    # start/stop scripts for the Hadoop daemons 【sbin/start-dfs.sh, sbin/start-yarn.sh】
share   # Hadoop's dependency jars, documentation, and official examples
logs   # log files

Configure SSH: passwordless login

[liqiang@Gargantua ~]$ ssh
ssh    ssh-add     ssh-agent    ssh-copy-id  sshd    sshd-keygen  ssh-keygen 	ssh-keyscan  
[liqiang@Gargantua ~]$ ssh-keygen 【a hyphen between ssh and keygen, no space】

【press Enter three times; this produces the public and private key】
Your identification has been saved in /home/liqiang/.ssh/id_rsa.
Your public key has been saved in /home/liqiang/.ssh/id_rsa.pub.

Append the public key to ~/.ssh/authorized_keys

[liqiang@Gargantua ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Fix the permissions, otherwise ssh will still prompt for a password

[liqiang@Gargantua ~]$ chmod 0600 ~/.ssh/authorized_keys
# test
ssh Gargantua # the first connection asks you to confirm with yes
# if it still asks for a password, the ssh setup or the 600 permissions are wrong.
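
# A non-interactive check of the key-based login (a sketch): BatchMode forbids password prompts,
# so the command fails immediately if key authentication is not actually working.
ssh -o BatchMode=yes Gargantua hostname   # prints Gargantua if passwordless login works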

Configure pseudo-distributed mode

Configure JAVA_HOME

Hadoop does not reliably pick up JAVA_HOME from /etc/profile, so it has to be set again in hadoop-env.sh

[root@Gargantua /]# su - liqiang
[liqiang@Gargantua ~]$ cd app/hadoop/etc/hadoop
[liqiang@Gargantua hadoop]$ vi hadoop-env.sh

# add the following settings
export JAVA_HOME=/usr/java/jdk1.8.0_121
export HADOOP_PID_DIR=/home/liqiang/tmp
  • Why configure HADOOP_PID_DIR:
  • Looking at /tmp shows that several Hadoop data directories and files land there by default, and /tmp is cleaned up periodically, which is dangerous.
    If HADOOP_PID_DIR (hadoop-env.sh) and hadoop.tmp.dir (core-site.xml) are left unconfigured:
    [liqiang@Gargantua hadoop]$ ll /tmp/
    drwxr-xr-x 3 liqiang liqiang 4096 Dec 27 22:57 hadoop
    drwxrwxr-x 4 liqiang liqiang 4096 Dec 27 22:57 hadoop-liqiang 【# the default data directory hadoop.tmp.dir; change it in core-site.xml】
    -rw-rw-r-- 1 liqiang liqiang 5 Dec 28 00:27 hadoop-liqiang-datanode.pid 【# the default pid file location; change it in hadoop-env.sh】
    -rw-rw-r-- 1 liqiang liqiang 5 Dec 28 00:27 hadoop-liqiang-namenode.pid
    -rw-rw-r-- 1 liqiang liqiang 5 Dec 28 00:27 hadoop-liqiang-secondarynamenode.pid
  • The pid files record the pid of each running daemon. When sbin/stop-dfs.sh or stop-all.sh is executed, Hadoop reads each daemon's pid from these files and kills that process. If a pid file is lost, the stop command leaves the daemon running, and a restart will not pick up the new configuration. (Run stop-all.sh before editing hadoop-env.sh; otherwise the stop scripts will already be unable to find the pid files.) A quick check that the pid files really moved is sketched below.
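
# After restarting with the new settings, a sketch of checking that the pid files now land in the HADOOP_PID_DIR configured above:
[liqiang@Gargantua ~]$ ls -l /home/liqiang/tmp/*.pid                 # hadoop-liqiang-namenode.pid, -datanode.pid, ...
[liqiang@Gargantua ~]$ cat /home/liqiang/tmp/hadoop-liqiang-namenode.pid ; jps | grep NameNode   # the two pids should match
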
Configure the nodes to start

Hadoop's configuration files are under $HADOOP_HOME/etc/hadoop:

[liqiang@Gargantua hadoop]$ pwd
/home/liqiang/app/hadoop/etc/hadoop
[liqiang@Gargantua hadoop]$ vi core-site.xml
[liqiang@Gargantua hadoop]$ vi hdfs-site.xml

core-site.xml
<configuration>
    <!-- NameNode endpoint -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Gargantua:9000</value>
    </property>
    <!-- Hadoop data directory (the NameNode data lives under it) -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/liqiang/tmp/hadoop-${user.name}</value>
    </property>
</configuration>

Notes:

  • fs.defaultFS makes the NameNode start as Gargantua [make sure /etc/hosts already maps Gargantua to this machine's private IP]. For the DataNode, change localhost to Gargantua in ~/app/hadoop/etc/hadoop/workers. Starting the NN, SNN and DN all on the same hostname rather than on an IP means that if the IP ever changes, only the hosts file has to be updated.

  • hadoop.tmp.dir is best set before Hadoop is started for the first time; otherwise the NameNode data directory defaults to a location under /tmp, whose contents are cleaned up periodically, which is dangerous.
    If Hadoop has already been started without this setting, changing only the configuration afterwards makes the NameNode fail to start: each daemon writes version metadata at startup, and apart from the very first start there must be exactly one copy of it, so the existing data has to be moved together with the configuration change. 【1. stop-all.sh, 2. edit core-site.xml, 3. copy the Hadoop data directory to its new location, 4. start-all.sh; a sketch follows below】
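
# A sketch of the relocation procedure described above, run from $HADOOP_HOME, assuming the data had been left
# at the default /tmp/hadoop-liqiang and the new hadoop.tmp.dir shown in core-site.xml:
 sbin/stop-all.sh
 vi etc/hadoop/core-site.xml                      # point hadoop.tmp.dir at /home/liqiang/tmp/hadoop-${user.name}
 cp -a /tmp/hadoop-liqiang /home/liqiang/tmp/     # carry the existing NameNode/DataNode data across
 sbin/start-all.sh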

hdfs-site.xml
<configuration>
    <!-- number of block replicas, default 3 -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Secondary NameNode HTTP endpoint -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Gargantua:9868</value>
    </property>
    <!-- Secondary NameNode HTTPS endpoint -->
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>Gargantua:9869</value>
    </property>
    <!-- If the machine has multiple data disks mounted, configure them as follows:
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/data01/dfs/dn,/data02/dfs/dn,/data03/dfs/dn</value>
        </property>
    -->
</configuration>

Notes:

  • The Secondary NameNode also starts as Gargantua.
  • If the machine has multiple physical disks mounted, dfs.datanode.data.dir must list all of them.
    For example, if a single disk writes at roughly 30 MB/s, ten disks together give roughly 300 MB/s, so writing the same data is far faster. Multiple disks give both more capacity and more aggregate read/write throughput, so in production dfs.datanode.data.dir must be configured according to the machine's actual disk layout. (A quick way to see the mounted disks is sketched below.)
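
# A sketch of checking which data disks are actually mounted before filling in dfs.datanode.data.dir:
 df -h | grep -E '^/dev'   # mounted filesystems and their mount points, e.g. /data01, /data02, ...
 lsblk                     # block devices and how they are partitioned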

Startup

Before starting: if something fails to start, check the logs

If a daemon does not come up, look for its log file under $HADOOP_HOME/logs

[liqiang@Gargantua ~]$ cd $HADOOP_HOME/logs;ll

-rw-rw-r-- 1 liqiang liqiang 145676 Dec 28 02:14 hadoop-liqiang-datanode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang    692 Dec 28 02:14 hadoop-liqiang-datanode-Gargantua.out
-rw-rw-r-- 1 liqiang liqiang    692 Dec 28 02:06 hadoop-liqiang-datanode-Gargantua.out.1

-rw-rw-r-- 1 liqiang liqiang 183747 Dec 28 02:15 hadoop-liqiang-namenode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang    692 Dec 28 02:14 hadoop-liqiang-namenode-Gargantua.out
-rw-rw-r-- 1 liqiang liqiang    692 Dec 28 02:06 hadoop-liqiang-namenode-Gargantua.out.1

-rw-rw-r-- 1 liqiang liqiang 154067 Dec 28 02:15 hadoop-liqiang-secondarynamenode-Gargantua.log
-rw-rw-r-- 1 liqiang liqiang    692 Dec 28 02:14 hadoop-liqiang-secondarynamenode-Gargantua.out
-rw-rw-r-- 1 liqiang liqiang    692 Dec 28 02:06 hadoop-liqiang-secondarynamenode-Gargantua.out.1

For example, if the NameNode fails to start:

cat hadoop-liqiang-namenode-Gargantua.log|grep ERROR -C10

tail -200f hadoop-liqiang-namenode-Gargantua.log

1. Format the HDFS filesystem

[liqiang@Gargantua hadoop]$ pwd
/home/liqiang/app/hadoop
[liqiang@Gargantua hadoop]$ bin/hdfs namenode -format
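
# If the format succeeded, the new metadata directory exists under hadoop.tmp.dir
# (a sketch, assuming the default dfs.namenode.name.dir of ${hadoop.tmp.dir}/dfs/name):
[liqiang@Gargantua hadoop]$ ls /home/liqiang/tmp/hadoop-liqiang/dfs/name/current/   # should contain VERSION, seen_txid and an fsimage_* file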

2. Start the NameNode and DataNodes

NameNode: stores the metadata of the data, e.g. file names, paths, sizes.
DataNode: stores the data itself.

[liqiang@Gargantua hadoop]$ sbin/start-dfs.sh
Starting namenodes on [Gargantua]
Starting datanodes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Starting secondary namenodes [Gargantua]

【after a successful start, check with jps or ps -ef | grep hadoop】

[liqiang@Gargantua hadoop]$ jps
5425 SecondaryNameNode
5205 DataNode
5558 Jps
5087 NameNode
Web UI: http://<public IP>:9870

The HDFS web UI listens on port 50070 by default in Hadoop 2.x
and on port 9870 in Hadoop 3.x.
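
# A quick way to confirm the daemons are listening on the expected ports (a sketch; use netstat -lntp if ss is not installed):
[liqiang@Gargantua hadoop]$ ss -lntp | grep -E ':(9000|9870)'   # 9000 = fs.defaultFS RPC, 9870 = HDFS web UI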

Working with HDFS

[liqiang@Gargantua hadoop]$ pwd
/home/liqiang/app/hadoop

[liqiang@Gargantua hadoop]$ bin/hdfs dfs -mkdir /user
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -ls /
drwxr-xr-x - liqiang supergroup 0 2021-12-27 22:42 /user

[liqiang@Gargantua hadoop]$ bin/hdfs dfs -mkdir /user/liqiang
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -mkdir input 【with no leading slash, input is created under the user's home directory /user/liqiang】

[liqiang@Gargantua hadoop]$ bin/hdfs dfs -ls /user/liqiang
drwxr-xr-x - liqiang supergroup 0 2021-12-27 22:45 /user/liqiang/input

Double-check the permissions of the xml files that will be copied over

[liqiang@Gargantua hadoop]$ ll etc/hadoop/*.xml

-rw-r--r-- 1 liqiang liqiang  9213 Jan  3  2021 etc/hadoop/capacity-scheduler.xml
-rw-r--r-- 1 liqiang liqiang   884 Dec 27 22:17 etc/hadoop/core-site.xml
-rw-r--r-- 1 liqiang liqiang 11392 Jan  3  2021 etc/hadoop/hadoop-policy.xml
-rw-r--r-- 1 liqiang liqiang   867 Dec 27 22:24 etc/hadoop/hdfs-site.xml
-rw-r--r-- 1 liqiang liqiang   620 Jan  3  2021 etc/hadoop/httpfs-site.xml
-rw-r--r-- 1 liqiang liqiang  3518 Jan  3  2021 etc/hadoop/kms-acls.xml
-rw-r--r-- 1 liqiang liqiang   682 Jan  3  2021 etc/hadoop/kms-site.xml
-rw-r--r-- 1 liqiang liqiang   758 Jan  3  2021 etc/hadoop/mapred-site.xml
-rw-r--r-- 1 liqiang liqiang   690 Jan  3  2021 etc/hadoop/yarn-site.xml

Copy them to HDFS

[liqiang@Gargantua hadoop]$ bin/hdfs dfs -put etc/hadoop/*.xml input
[liqiang@Gargantua hadoop]$ bin/hdfs dfs -ls /user/liqiang/input/ 【list the files in input】

Run some of the examples provided: try one of the bundled example jobs

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar grep input output 'dfs[a-z.]+'

Download the output directory from HDFS and look at it, or cat it directly

[liqiang@Gargantua hadoop]$ bin/hdfs dfs -get output output
[liqiang@Gargantua hadoop]$ cat output/*
1 dfsadmin
1 dfs.replication

[liqiang@Gargantua hadoop]$ bin/hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication

Starting YARN

Preparation

Hadoop's configuration files are under $HADOOP_HOME/etc/hadoop:

[liqiang@Gargantua hadoop]$ pwd
/home/liqiang/app/hadoop/etc/hadoop
[liqiang@Gargantua hadoop]$ vi core-site.xml
[liqiang@Gargantua hadoop]$ vi hdfs-site.xml

[liqiang@Gargantua hadoop]$ vi mapred-site.xml
[liqiang@Gargantua hadoop]$ vi yarn-site.xml
mapred-site.xml
<configuration>
    <!-- run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- classpath for MapReduce applications -->
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <!-- auxiliary service run on each NodeManager -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- environment variable whitelist -->
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
    <!-- YARN web UI address; the default port 8088 is a frequent target of cryptomining attacks, so change it -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Gargantua:8088</value>
    </property>
</configuration>
Start YARN
 [liqiang@Gargantua hadoop]$ pwd
 /home/liqiang/app/hadoop 
 [liqiang@Gargantua hadoop]$ sbin/start-yarn.sh
Web UI: http://<public IP>:8088/cluster

(8088 has been changed to 8123 here)

WordCount example

Environment variables

[liqiang@Gargantua ~]$ vi .bashrc

export HADOOP_HOME=/home/liqiang/app/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

[liqiang@Gargantua ~]$ . .bashrc
[liqiang@Gargantua ~]$ which hadoop
~/app/hadoop/bin/hadoop
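
# With PATH set, a quick sanity check that the right installation is picked up (a sketch):
[liqiang@Gargantua ~]$ hadoop version   # should report 3.2.2
[liqiang@Gargantua ~]$ which yarn       # should also point into ~/app/hadoop/bin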

Prepare a file and upload it to HDFS

[liqiang@Gargantua ~]$ vi wc.log

jepson
ruoze
xingxing
a b c
b a c
jepson
gargantua a b c

[liqiang@Gargantua ~]$ hdfs dfs -mkdir /input
[liqiang@Gargantua ~]$ hdfs dfs -put wc.log /input
[liqiang@Gargantua ~]$ hdfs dfs -cat /input/wc.log

Locate the official examples jar

[liqiang@Gargantua ~]$ find ./ -name '*example*'

./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar

Find the command that runs this jar
[liqiang@Gargantua ~]$ hadoop --help

jar <jar>     run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.

[liqiang@Gargantua ~]$ yarn jar ./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar

【# RunJar jarFile [mainClass] args... : the jar still has to be told which example program to run】

An example program must be given as the first argument.
Valid program names are:
	aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
	aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
	bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
	dbcount: An example job that count the pageview counts from a database.
	distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
	grep: A map/reduce program that counts the matches of a regex in the input.
	join: A job that effects a join over sorted, equally partitioned datasets
	multifilewc: A job that counts words from several files.
	pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
	pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
	randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
	randomwriter: A map/reduce program that writes 10GB of random data per node.
	secondarysort: An example defining a secondary sort to the reduce.
	sort: A map/reduce program that sorts the data written by the random writer.
	sudoku: A sudoku solver.
	teragen: Generate data for the terasort
	terasort: Run the terasort
	teravalidate: Checking results of terasort
	wordcount: A map/reduce program that counts the words in the input files.
	wordmean: A map/reduce program that counts the average length of the words in the input files.
	wordmedian: A map/reduce program that counts the median length of the words in the input files.
	wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
Run the jar

wordcount /input /output: selects the wordcount example program and the input/output directories

[liqiang@Gargantua ~]$ yarn jar ./app/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input /output

Normally the job then runs:

2021-11-26 22:43:01,129 INFO mapreduce.JobSubmitter: number of splits:1  【1 input split】
// ...
	 File System Counters
	 Job Counters 
            Launched map tasks=1    【1 map task】
            Launched reduce tasks=1 【1 reduce task】
	 Map-Reduce Framework
		// ...
	 Shuffle Errors
// ...
View the wordcount result

[liqiang@Gargantua ~]$ hdfs dfs -cat /output/part-r-00000

a	3
b	3
c	3
gargantua	1
jepson	2
ruoze	2
xingxing	1

If you hit
/bin/bash: /bin/java: No such file or directory
try creating a symlink pointing at the java under JAVA_HOME (the same kind of stop-gap as the mysql.sock trick with MySQL):
ln -s /usr/java/jdk1.8.0_121/bin/java /bin/java

This usually means some script does not read JAVA_HOME from the system environment when it should, so java cannot be found;
or some default path is still under /tmp and has since been deleted by the system.
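
# A sketch of confirming what the shell actually resolves java to, before and after creating the link:
which java                 # the java found on PATH (may differ from $JAVA_HOME/bin/java)
ls -l /bin/java            # "No such file or directory" before the symlink, the link target after
echo $JAVA_HOME            # should still be /usr/java/jdk1.8.0_121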
