Setting up a hadoop-3.3.0 cluster on Ubuntu 18.04

Reference post: https://blog.csdn.net/sunxiaoju/article/details/85222290?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522162443354216780261915061%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=162443354216780261915061&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2allbaidu_landing_v2~default-7-85222290.pc_search_result_control_group&utm_term=ubuntu%E6%90%AD%E5%BB%BAhadoop%E9%9B%86%E7%BE%A4&spm=1018.2226.3001.4187

Background

Date: June 2021.

The company wanted to branch into big data, so I put together a simple hadoop cluster for testing. Three machines were bought in total; two of them are used here to build a minimal test cluster, both running Ubuntu 18.04.

This guide installs the hadoop cluster as part of an apache kylin project experiment; a working Java environment must be installed and configured before installing hadoop.

Cluster layout

System/version notes:

Software    Version
OS          Ubuntu 18.04
JDK         1.8.0_291
hadoop      3.3.0

Cluster roles:

Before building the cluster, all machines must be on the same LAN. Since this experiment uses only two machines, I connected them directly with a single network cable. After connecting, assign each machine an IP address; once done, verify with ping that the two machines can reach each other.

            Master node                 Slave node
Role        NameNode, DataNode          DataNode
IP address  192.168.1.7                 192.168.1.8
Hostname    user-ThinkServer-TS80X      watertek-thinkserver-ts80x
Domain      master.watertek.com         slave1.watertek.com
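On Ubuntu 18.04, static addresses like the ones in the table are typically configured through netplan. A minimal sketch, with the interface name enp3s0 as an assumption (check yours with ip link); the file is written to /tmp here, but on a real node it would go under /etc/netplan/ followed by sudo netplan apply:

```shell
# Sketch: a netplan stanza for master (assumption: interface enp3s0;
# no gateway since the two machines are directly cabled; slave1 would
# use 192.168.1.8/24 instead)
cat > /tmp/01-hadoop-static.yaml <<'EOF'
network:
  version: 2
  renderer: networkd
  ethernets:
    enp3s0:
      addresses: [192.168.1.7/24]
EOF
cat /tmp/01-hadoop-static.yaml
```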

Installing tools (network required)

A freshly installed Ubuntu system is missing some development tools. Before configuring anything, connect both machines to the network and install the following:

Tool            Install command                       Purpose
vim             sudo apt-get install vim              Terminal editor; the stock vi is awkward to use, vim is much more comfortable
gcc, g++        sudo apt-get install build-essential  C/C++ compilers, for building programs
net-tools       sudo apt-get install net-tools        Network tools; provides ifconfig and related commands
openssh-server  sudo apt-get install openssh-server   SSH server for remote terminal access, so everything can be driven from a single machine

Network configuration

  1. The hostname can be changed by editing /etc/hostname; give each machine a distinct name.
  2. Domain names are defined in /etc/hosts; the listing below adds two lines. Note that they must not go at the very bottom: everything from the comment line down configures IPv6, so add the entries above that comment. In my testing there had to be a single tab between the IP and the domain name, with no trailing spaces after the name, or ping would fail. Both machines must carry identical entries to ping each other.

[screenshot: master-hosts]

#127.0.0.1  localhost
#127.0.1.1  user-ThinkServer-TS80X
# added
192.168.1.7 master.watertek.com
192.168.1.8 slave1.watertek.com

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
  3. After configuring, restart the networking service.
sudo /etc/init.d/networking restart

After that, the two machines can ping each other.

[screenshot: hadoop-ping]
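The trailing-space gotcha in /etc/hosts can be caught mechanically before reaching for ping. A sketch over a throwaway sample file (on a real node, point the grep at /etc/hosts); the second entry below deliberately carries a trailing space:

```shell
# Sketch: flag hosts entries ending in whitespace, which this guide found
# breaks ping resolution (the sample file is a stand-in for /etc/hosts)
printf '192.168.1.7\tmaster.watertek.com\n192.168.1.8\tslave1.watertek.com \n' > /tmp/hosts.sample
grep -n ' $' /tmp/hosts.sample   # prints any line with a trailing space
```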

Passwordless SSH login

ssh keys can be generated in two ways, rsa and dsa; rsa is the default and is used here.

  1. On the hadoop master (192.168.1.7, hereafter simply master), create an ssh key using rsa. (-P is uppercase; the empty "" means no passphrase)
ssh-keygen -t rsa -P ""
  2. On the hadoop slave slave1 (192.168.1.8, hereafter simply slave1), generate a key with the same command.
  3. The command in step 1 produces a matched pair of files, id_rsa and id_rsa.pub, under ~/.ssh; you can cd into ~/.ssh to check.
  4. Copy slave1's id_rsa.pub over to master (run this on slave1):
# on slave1, enter ~/.ssh
cd ~/.ssh

# copy id_rsa.pub across; master already has its own id_rsa.pub, so a different name is needed
scp id_rsa.pub master.watertek.com:~/.ssh/slave1_id_rsa.pub
  5. On master, append id_rsa.pub and slave1_id_rsa.pub to the authorized_keys file (which does not exist at first).
cat *.pub >> authorized_keys
  6. Copy master's authorized_keys file to slave1.
scp authorized_keys slave1.watertek.com:~/.ssh
  7. Check the permissions on authorized_keys; -rw-rw-r-- means they are fine as-is, otherwise adjust them.
# list permission details (ll is two lowercase l's, an alias of ls -l)
ll

# fix the permissions
sudo chmod 664 authorized_keys

[screenshot: authorized_keys]

  8. From master, test a passwordless ssh login into slave1.

[screenshot: ssh-login]

  9. From slave1, test a passwordless ssh login into master.

[screenshot: slave1-ssh-login]

  10. If login fails, check the permissions on the user directory under /home: 751 is the minimum; they may be stricter than 751, never looser.
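The authorized_keys permission check above can be rehearsed locally. A small sketch under an assumed throwaway path (/tmp/demo), not the real ~/.ssh:

```shell
# Sketch: create a stand-in authorized_keys and verify its mode is 664,
# matching the -rw-rw-r-- this guide expects (the path is a throwaway example)
mkdir -p /tmp/demo/.ssh
touch /tmp/demo/.ssh/authorized_keys
chmod 664 /tmp/demo/.ssh/authorized_keys
stat -c '%a' /tmp/demo/.ssh/authorized_keys
```

Note that many setups use the tighter chmod 600 instead: with StrictModes enabled, sshd can reject a group-writable authorized_keys, so if passwordless login still fails it is worth trying 600.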

hadoop installation and configuration

Some other guides create a dedicated hadoop user. In this experiment it turned out that one is not needed (though creating one does no harm), so this guide skips that step; readers who want it can follow the reference post.

All the steps below are performed on master; once finished, the configured hadoop is simply copied over to slave1.

Download and extract
# 1. extract the tarball into /usr/local (the tarball is under /home/user/下载, the Downloads folder)
sudo tar -zxvf hadoop-3.3.0.tar.gz -C /usr/local/  

# 2. enter /usr/local
cd /usr/local  

# 3. fix the folder's ownership (user is the current username)
sudo chown -R user:user hadoop-3.3.0
Configure environment variables
# 1. open ~/.bashrc with vim
vi ~/.bashrc

# 2. press Shift + G to jump to the end of the file, then add the hadoop environment variables
# hadoop
export HADOOP_HOME=/usr/local/hadoop-3.3.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_CLASSPATH=$HADOOP_CONF_DIR
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# 3. save and quit, then source ~/.bashrc so the variables take effect
source ~/.bashrc
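As a quick sanity check after sourcing, the variables should expand as expected; a minimal sketch (the exports are repeated so the snippet is self-contained — on a real node, hadoop version should then also print 3.3.0):

```shell
# Sketch: re-apply the exports and confirm the derived path expands correctly
export HADOOP_HOME=/usr/local/hadoop-3.3.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
echo "$HADOOP_CONF_DIR"
```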
Modify the configuration files

The files to be modified are all under hadoop-3.3.0/etc/hadoop:

  1. core-site.xml
  2. hadoop-env.sh
  3. hdfs-site.xml
  4. mapred-site.xml
  5. yarn-site.xml
  6. workers
# enter /usr/local/hadoop-3.3.0/etc/hadoop
cd /usr/local/hadoop-3.3.0/etc/hadoop
  1. Edit core-site.xml
vi core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master.watertek.com:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-3.3.0/hdfs/tmp</value>
    </property>
</configuration>
  2. Edit hadoop-env.sh
vi hadoop-env.sh
# set JAVA_HOME in hadoop-env.sh; an absolute path is required
export JAVA_HOME=/usr/local/jdk1.8.0_291
  3. Edit hdfs-site.xml
vi hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value><!-- both master and slave1 act as datanodes, hence 2 -->
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop-3.3.0/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop-3.3.0/hdfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>slave1.watertek.com:9001</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>master.watertek.com:50070</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>staff</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
</configuration>
  4. Edit mapred-site.xml

If the directory contains mapred-site.xml.template instead of mapred-site.xml, first rename or copy that file.

cp mapred-site.xml.template mapred-site.xml
# or
mv mapred-site.xml.template mapred-site.xml

# open mapred-site.xml
vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
  5. Edit yarn-site.xml
vi yarn-site.xml
<configuration>

<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master.watertek.com</value>
    </property>
</configuration>
  6. Edit workers

If there is no workers file in the directory, simply create one and edit it. Versions before hadoop-3.0.0 used a slaves file; from hadoop-3.0.0 on, the file is workers.

vi workers
master.watertek.com
slave1.watertek.com
  7. Copy the configured hadoop from master over to slave1.
# enter /usr/local
cd /usr/local

# transfer; permissions prevent copying straight into /usr/local on slave1
scp -r hadoop-3.3.0 slave1.watertek.com:~/

# on slave1, move hadoop into /usr/local
sudo mv hadoop-3.3.0 /usr/local

# fix ownership (user is the slave1 username)
sudo chown -R user:user hadoop-3.3.0 
  8. Format the namenode on master.
hdfs namenode -format
  9. Start the hadoop cluster.
# thanks to the environment variables configured earlier, the scripts can be run directly
start-all.sh
# stop
stop-all.sh

# alternatively, enter /usr/local/hadoop-3.3.0/sbin first and start from there
cd /usr/local/hadoop-3.3.0/sbin
./start-all.sh
# stop
./stop-all.sh
  10. Check that the hadoop cluster is running properly.
# inspect the hadoop processes with jps
jps

[screenshot: master-jps]

[screenshot: slave1-jps]

[screenshot: hadoop-50070]

[screenshot: hadoop-8088]

Caveats

After configuration you may run into quite a few problems; each has to be tracked down and resolved in turn.

master starts fine, but the datanode on slave1 will not start

Symptom: this problem troubled me for several days; I went through several posts and re-did the hadoop configuration files again and again. All processes on master started and ran normally, but slave1 only started the SecondaryNameNode process, while DataNode and NodeManager never came up.

Cause: I had initially been using the slaves file under /usr/local/hadoop-3.3.0/etc/hadoop, but from hadoop-3.0.0 on, the slaves configuration file was renamed to workers.

Fix: rename slaves to workers; this must be done on slave1 as well.

# enter /usr/local/hadoop-3.3.0/etc/hadoop
cd /usr/local/hadoop-3.3.0/etc/hadoop

# rename the file
mv slaves workers
datanode on master will not start

Symptom: right after configuring hadoop, everything on master ran fine; after a re-format, the datanode would no longer start.

Cause: re-formatting changes the stored version information (the clusterID), and the resulting mismatch prevents the datanode from starting.

Fix: there are two options. One is to delete the /usr/local/hadoop-3.3.0/hdfs folder and re-format; the other is to fix the version information by hand.

  1. The hdfs folder under /usr/local/hadoop-3.3.0 was created from the configuration; delete the whole folder, format again with hdfs namenode -format, and startup is back to normal.
  2. Make the clusterID in /usr/local/hadoop-3.3.0/hdfs/name/current/VERSION identical to the one in /usr/local/hadoop-3.3.0/hdfs/data/current/VERSION.
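The second option can be scripted. A sketch using stand-in files under /tmp (substitute the real VERSION paths from the dfs.namenode.name.dir and dfs.datanode.data.dir settings configured earlier):

```shell
# Sketch: copy the namenode's clusterID into the datanode's VERSION file.
# The /tmp paths are stand-ins for the real hdfs/name and hdfs/data trees.
mkdir -p /tmp/hdfs/name/current /tmp/hdfs/data/current
echo 'clusterID=CID-after-reformat' > /tmp/hdfs/name/current/VERSION
echo 'clusterID=CID-stale'          > /tmp/hdfs/data/current/VERSION
CID=$(grep '^clusterID=' /tmp/hdfs/name/current/VERSION)
sed -i "s/^clusterID=.*/$CID/" /tmp/hdfs/data/current/VERSION
cat /tmp/hdfs/data/current/VERSION
```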

Runtime error

Symptom: running the test example program fails with the following error:

2021-06-23 11:52:11,747 INFO mapreduce.Job: Job job_1624343986613_0002 failed with state FAILED due to: Application application_1624343986613_0002 failed 2 times due to AM Container for appattempt_1624343986613_0002_000002 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2021-06-23 11:52:11.139]Exception from container-launch.
Container id: container_1624343986613_0002_02_000001
Exit code: 1

[2021-06-23 11:52:11.141]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster


[2021-06-23 11:52:11.141]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster


For more detailed output, check the application tracking page: http://master.watertek.com:8088/cluster/app/application_1624343986613_0002 Then click on links to logs of each attempt.
. Failing the application.
2021-06-23 11:52:11,765 INFO mapreduce.Job: Counters: 0

Fix:

Edit the yarn-site.xml configuration file under /usr/local/hadoop-3.3.0/etc/hadoop and add the yarn.application.classpath property; its value is the output of the hadoop classpath command.

# command
hadoop classpath

# output
/usr/local/hadoop-3.3.0/etc/hadoop:/usr/local/hadoop-3.3.0/share/hadoop/common/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/common/*:/usr/local/hadoop-3.3.0/share/hadoop/hdfs:/usr/local/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/hdfs/*:/usr/local/hadoop-3.3.0/share/hadoop/mapreduce/*:/usr/local/hadoop-3.3.0/share/hadoop/yarn:/usr/local/hadoop-3.3.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/yarn/*
<!-- add this property to yarn-site.xml -->
<property>
        <name>yarn.application.classpath</name>
        <value>/usr/local/hadoop-3.3.0/etc/hadoop:/usr/local/hadoop-3.3.0/share/hadoop/common/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/common/*:/usr/local/hadoop-3.3.0/share/hadoop/hdfs:/usr/local/hadoop-3.3.0/share/hadoop/hdfs/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/hdfs/*:/usr/local/hadoop-3.3.0/share/hadoop/mapreduce/*:/usr/local/hadoop-3.3.0/share/hadoop/yarn:/usr/local/hadoop-3.3.0/share/hadoop/yarn/lib/*:/usr/local/hadoop-3.3.0/share/hadoop/yarn/*</value>
    </property>

Testing the hadoop cluster

Test example 1
hadoop jar /usr/local/hadoop-3.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 10 10
Number of Maps  = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2021-06-23 14:27:35,336 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at master.watertek.com/192.168.1.7:8032
2021-06-23 14:27:35,641 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/user/.staging/job_1624428808391_0002
2021-06-23 14:27:35,819 INFO input.FileInputFormat: Total input files to process : 10
2021-06-23 14:27:35,999 INFO mapreduce.JobSubmitter: number of splits:10
2021-06-23 14:27:36,189 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1624428808391_0002
2021-06-23 14:27:36,189 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-06-23 14:27:36,312 INFO conf.Configuration: resource-types.xml not found
2021-06-23 14:27:36,313 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-06-23 14:27:36,366 INFO impl.YarnClientImpl: Submitted application application_1624428808391_0002
2021-06-23 14:27:36,401 INFO mapreduce.Job: The url to track the job: http://master.watertek.com:8088/proxy/application_1624428808391_0002/
2021-06-23 14:27:36,401 INFO mapreduce.Job: Running job: job_1624428808391_0002
2021-06-23 14:27:41,490 INFO mapreduce.Job: Job job_1624428808391_0002 running in uber mode : false
2021-06-23 14:27:41,492 INFO mapreduce.Job:  map 0% reduce 0%
2021-06-23 14:27:46,627 INFO mapreduce.Job:  map 20% reduce 0%
2021-06-23 14:27:54,701 INFO mapreduce.Job:  map 100% reduce 0%
2021-06-23 14:27:55,712 INFO mapreduce.Job:  map 100% reduce 100%
2021-06-23 14:27:56,735 INFO mapreduce.Job: Job job_1624428808391_0002 completed successfully
2021-06-23 14:27:56,820 INFO mapreduce.Job: Counters: 54
	File System Counters
		FILE: Number of bytes read=226
		FILE: Number of bytes written=2913955
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2730
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=45
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
		HDFS: Number of bytes read erasure-coded=0
	Job Counters 
		Launched map tasks=10
		Launched reduce tasks=1
		Data-local map tasks=10
		Total time spent by all maps in occupied slots (ms)=89286
		Total time spent by all reduces in occupied slots (ms)=7107
		Total time spent by all map tasks (ms)=89286
		Total time spent by all reduce tasks (ms)=7107
		Total vcore-milliseconds taken by all map tasks=89286
		Total vcore-milliseconds taken by all reduce tasks=7107
		Total megabyte-milliseconds taken by all map tasks=91428864
		Total megabyte-milliseconds taken by all reduce tasks=7277568
	Map-Reduce Framework
		Map input records=10
		Map output records=20
		Map output bytes=180
		Map output materialized bytes=280
		Input split bytes=1550
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=280
		Reduce input records=20
		Reduce output records=0
		Spilled Records=40
		Shuffled Maps =10
		Failed Shuffles=0
		Merged Map outputs=10
		GC time elapsed (ms)=2898
		CPU time spent (ms)=6310
		Physical memory (bytes) snapshot=3237498880
		Virtual memory (bytes) snapshot=29165371392
		Total committed heap usage (bytes)=3401056256
		Peak Map Physical memory (bytes)=333475840
		Peak Map Virtual memory (bytes)=2654306304
		Peak Reduce Physical memory (bytes)=230047744
		Peak Reduce Virtual memory (bytes)=2658615296
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=1180
	File Output Format Counters 
		Bytes Written=97
Job Finished in 21.584 seconds
Estimated value of Pi is 3.20000000000000000000

The final line of the output, Estimated value of Pi is 3.20000000000000000000, shows the run succeeded.
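For intuition, the quarter-circle sampling idea behind the pi example can be reproduced locally. This awk one-liner is a plain Monte Carlo sketch, not Hadoop's actual implementation (the example jar uses a Halton quasi-random sequence, which is why a mere 10x10 samples already lands on 3.2):

```shell
# Illustrative only: crude Monte Carlo estimate of pi (not Hadoop's code)
awk 'BEGIN {
  srand(1); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) inside++   # point falls inside the quarter circle
  }
  printf "%.2f\n", 4 * inside / n  # approaches 3.14 as n grows
}'
```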

Test example 2
  1. hadoop ships with a wordcount example that counts word occurrences. First a folder has to be created in hdfs. The hdfs filesystem can be inspected with hadoop fs -ls; in the figure below, word_count_input and word_count_output are left over from my test runs:

[screenshot: hadoop-fs-ls]

  2. Remove the earlier test results from hdfs with hadoop fs -rm -r word_count_input word_count_output.

[screenshot: hadoop-fs-rm]

  3. Create the folder in hdfs with hadoop fs -mkdir word_count_input.

[screenshot: hadoop-fs-mkdir]

  4. Create two local files: in this example, make a folder word_count_test under /home/user, and inside /home/user/word_count_test create the files file1.txt and file2.txt.
vi file1.txt
vi file2.txt

# file1.txt contents
hello hadoop
hello hive
hello ljj
hello hadoop
good morning
hello hbase
hello hadoop

# file2.txt contents
linux window
hello linux
hello window

From the file contents, the expected counts are: hello 8, hadoop 3, hive 1, ljj 1, good 1, morning 1, hbase 1, linux 2, window 2.
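Those expected counts are easy to double-check locally before involving MapReduce; a one-pipeline sketch over the same text:

```shell
# Sketch: count words in the two sample files with plain shell tools
printf '%s\n' 'hello hadoop' 'hello hive' 'hello ljj' 'hello hadoop' \
              'good morning' 'hello hbase' 'hello hadoop' \
              'linux window' 'hello linux' 'hello window' \
  | tr ' ' '\n' | sort | uniq -c | sort -rn
```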

  5. Upload the local files into the word_count_input folder in hdfs.
hadoop fs -put *.txt word_count_input

[screenshot: hadoop-fs-put]

  6. Run wordcount with the command below; word_count_output is the results directory.
hadoop jar /usr/local/hadoop-3.3.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount word_count_input word_count_output

Run output:

2021-06-23 14:52:51,680 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at master.watertek.com/192.168.1.7:8032
2021-06-23 14:52:52,313 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/user/.staging/job_1624428808391_0003
2021-06-23 14:52:52,622 INFO input.FileInputFormat: Total input files to process : 2
2021-06-23 14:52:52,819 INFO mapreduce.JobSubmitter: number of splits:2
2021-06-23 14:52:52,991 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1624428808391_0003
2021-06-23 14:52:52,991 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-06-23 14:52:53,132 INFO conf.Configuration: resource-types.xml not found
2021-06-23 14:52:53,132 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-06-23 14:52:53,182 INFO impl.YarnClientImpl: Submitted application application_1624428808391_0003
2021-06-23 14:52:53,214 INFO mapreduce.Job: The url to track the job: http://master.watertek.com:8088/proxy/application_1624428808391_0003/
2021-06-23 14:52:53,215 INFO mapreduce.Job: Running job: job_1624428808391_0003
2021-06-23 14:52:59,382 INFO mapreduce.Job: Job job_1624428808391_0003 running in uber mode : false
2021-06-23 14:52:59,383 INFO mapreduce.Job:  map 0% reduce 0%
2021-06-23 14:53:03,458 INFO mapreduce.Job:  map 50% reduce 0%
2021-06-23 14:53:04,469 INFO mapreduce.Job:  map 100% reduce 0%
2021-06-23 14:53:09,504 INFO mapreduce.Job:  map 100% reduce 100%
2021-06-23 14:53:09,516 INFO mapreduce.Job: Job job_1624428808391_0003 completed successfully
2021-06-23 14:53:09,593 INFO mapreduce.Job: Counters: 54
	File System Counters
		FILE: Number of bytes read=126
		FILE: Number of bytes written=793862
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=389
		HDFS: Number of bytes written=72
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
		HDFS: Number of bytes read erasure-coded=0
	Job Counters 
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=5645
		Total time spent by all reduces in occupied slots (ms)=3123
		Total time spent by all map tasks (ms)=5645
		Total time spent by all reduce tasks (ms)=3123
		Total vcore-milliseconds taken by all map tasks=5645
		Total vcore-milliseconds taken by all reduce tasks=3123
		Total megabyte-milliseconds taken by all map tasks=5780480
		Total megabyte-milliseconds taken by all reduce tasks=3197952
	Map-Reduce Framework
		Map input records=10
		Map output records=20
		Map output bytes=203
		Map output materialized bytes=132
		Input split bytes=266
		Combine input records=20
		Combine output records=10
		Reduce input groups=9
		Reduce shuffle bytes=132
		Reduce input records=10
		Reduce output records=9
		Spilled Records=20
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=222
		CPU time spent (ms)=1800
		Physical memory (bytes) snapshot=747245568
		Virtual memory (bytes) snapshot=7958106112
		Total committed heap usage (bytes)=799539200
		Peak Map Physical memory (bytes)=288419840
		Peak Map Virtual memory (bytes)=2650775552
		Peak Reduce Physical memory (bytes)=216190976
		Peak Reduce Virtual memory (bytes)=2657652736
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=123
	File Output Format Counters 
		Bytes Written=72

Result:

Listing the directory again shows a new word_count_output folder containing two files: _SUCCESS is empty and indicates the computation succeeded, while part-r-00000 holds the computed results.

[screenshot: wordcount_out]
