WordCount: A Hands-On Example
1. Requirement
Count the total number of occurrences of each word in a given text file.
(1) Input data
Create a folder named input on the D: drive and place a file hello.txt in it.
Contents of hello.txt:
atguigu atguigu
ss ss
cls cls
jiao
banzhang
xue
hadoop
(2) Expected output
atguigu 2
banzhang 1
cls 2
hadoop 1
jiao 1
ss 2
xue 1
2. Requirement analysis
Following the MapReduce programming conventions, write a Mapper, a Reducer, and a Driver.
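Before the Hadoop code, the underlying idea can be sketched in plain Java with no Hadoop dependencies: the map step turns each token into a (word, 1) pair, and the reduce step sums the pairs that share a key. This is only an illustrative miniature (class and method names are made up for the sketch), not part of the actual job:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // "map" + "reduce" in miniature: tokenize each line, then sum per word
    static Map<String, Integer> count(String[] lines) {
        // TreeMap keeps keys sorted, mirroring the sorted output of the real job
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // map step: each token conceptually contributes a (word, 1) pair...
            for (String word : line.split(" ")) {
                // ...reduce step: pairs with the same key are summed
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"atguigu atguigu", "ss ss", "cls cls",
                          "jiao", "banzhang", "xue", "hadoop"};
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Run on the sample input above, this prints the expected output table (atguigu 2, banzhang 1, and so on). The real MapReduce job does the same thing, but distributes the map and reduce steps across tasks.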
3. Steps in IDEA
1. Create a new Maven project.
2. Add the following dependencies to pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.zpark</groupId>
    <artifactId>mapreduce</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.8.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.8.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.8.1</version>
        </dependency>
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.8</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
    </dependencies>
</project>
3. In the project's src/main/resources directory, create a file named "log4j.properties" with the following content:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
4. Create the classes described below.
5. Write the Mapper class:
package com.zpark.wordcount;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Text word = new Text();
    private IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // get this line of input
        String line = value.toString();
        // split the line on spaces
        String[] words = line.split(" ");
        // walk the array and hand each word to the framework as a (word, 1) pair
        for (String word : words) {
            this.word.set(word);
            context.write(this.word, this.one);
        }
    }
}
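To see what the map method above emits for one input line, here is a tiny stand-alone illustration of the same tokenization, with the Hadoop types replaced by plain strings (the class and method are hypothetical, used only for this sketch):

```java
import java.util.ArrayList;
import java.util.List;

public class MapStepSketch {
    // mirrors WcMapper.map(): split on single spaces, emit one (word, 1) pair per token
    static List<String> emit(String line) {
        List<String> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            pairs.add("(" + word + ", 1)");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // a repeated word yields two separate pairs; the reducer later sums them
        System.out.println(emit("atguigu atguigu")); // [(atguigu, 1), (atguigu, 1)]
    }
}
```

Note that `split(" ")` splits on single spaces only; if the input could contain tabs or runs of spaces, `split("\\s+")` would be the more robust choice.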
6. Write the Reducer class:
package com.zpark.wordcount;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class Wcreducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // accumulate the counts for this key
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // wrap the result and write it out
        total.set(sum);
        context.write(key, total);
    }
}
7. Write the Driver class:
package com.zpark.wordcount;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class WcDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get a Job instance
        Job job = Job.getInstance(new Configuration());
        // 2. Set the classpath via the driver class
        job.setJarByClass(WcDriver.class);
        // 3. Set the Mapper and Reducer
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(Wcreducer.class);
        // 4. Set the output types of the Mapper and the Reducer
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 5. Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // 6. Submit the job and wait for it to finish
        boolean b = job.waitForCompletion(true);
        System.out.println(b ? 0 : 1);
    }
}
Result:
After the run, an output directory also appears on the D: drive. Console output:
"C:\Program Files\Java\jdk1.8.0_144\bin\java.exe" -Dfile.encoding=UTF-8 -classpath "<JDK, project, and Maven-repository jars omitted>" com.zpark.wordcount.WcDriver d:\input d:\output
2020-02-01 17:41:06,554 INFO [org.apache.hadoop.conf.Configuration.deprecation] - session.id is deprecated. Instead, use dfs.metrics.session-id
2020-02-01 17:41:06,556 INFO [org.apache.hadoop.metrics.jvm.JvmMetrics] - Initializing JVM Metrics with processName=JobTracker, sessionId=
2020-02-01 17:41:07,708 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-02-01 17:41:07,757 WARN [org.apache.hadoop.mapreduce.JobResourceUploader] - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2020-02-01 17:41:07,882 INFO [org.apache.hadoop.mapreduce.lib.input.FileInputFormat] - Total input files to process : 1
2020-02-01 17:41:07,938 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - number of splits:1
2020-02-01 17:41:08,054 INFO [org.apache.hadoop.mapreduce.JobSubmitter] - Submitting tokens for job: job_local996914008_0001
2020-02-01 17:41:08,283 INFO [org.apache.hadoop.mapreduce.Job] - The url to track the job: http://localhost:8080/
2020-02-01 17:41:08,284 INFO [org.apache.hadoop.mapreduce.Job] - Running job: job_local996914008_0001
2020-02-01 17:41:08,294 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter set in config null
2020-02-01 17:41:08,304 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-02-01 17:41:08,304 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-02-01 17:41:08,305 INFO [org.apache.hadoop.mapred.LocalJobRunner] - OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2020-02-01 17:41:08,366 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for map tasks
2020-02-01 17:41:08,368 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local996914008_0001_m_000000_0
2020-02-01 17:41:08,412 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-02-01 17:41:08,412 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-02-01 17:41:08,424 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2020-02-01 17:41:08,480 INFO [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@116b94af
2020-02-01 17:41:08,487 INFO [org.apache.hadoop.mapred.MapTask] - Processing split: file:/d:/input/hello.TXT:0+60
2020-02-01 17:41:08,577 INFO [org.apache.hadoop.mapred.MapTask] - (EQUATOR) 0 kvi 26214396(104857584)
2020-02-01 17:41:08,578 INFO [org.apache.hadoop.mapred.MapTask] - mapreduce.task.io.sort.mb: 100
2020-02-01 17:41:08,578 INFO [org.apache.hadoop.mapred.MapTask] - soft limit at 83886080
2020-02-01 17:41:08,578 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufvoid = 104857600
2020-02-01 17:41:08,578 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396; length = 6553600
2020-02-01 17:41:08,587 INFO [org.apache.hadoop.mapred.MapTask] - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2020-02-01 17:41:08,606 INFO [org.apache.hadoop.mapred.LocalJobRunner] -
2020-02-01 17:41:08,606 INFO [org.apache.hadoop.mapred.MapTask] - Starting flush of map output
2020-02-01 17:41:08,606 INFO [org.apache.hadoop.mapred.MapTask] - Spilling map output
2020-02-01 17:41:08,606 INFO [org.apache.hadoop.mapred.MapTask] - bufstart = 0; bufend = 95; bufvoid = 104857600
2020-02-01 17:41:08,606 INFO [org.apache.hadoop.mapred.MapTask] - kvstart = 26214396(104857584); kvend = 26214360(104857440); length = 37/6553600
2020-02-01 17:41:08,621 INFO [org.apache.hadoop.mapred.MapTask] - Finished spill 0
2020-02-01 17:41:08,640 INFO [org.apache.hadoop.mapred.Task] - Task:attempt_local996914008_0001_m_000000_0 is done. And is in the process of committing
2020-02-01 17:41:08,648 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map
2020-02-01 17:41:08,648 INFO [org.apache.hadoop.mapred.Task] - Task 'attempt_local996914008_0001_m_000000_0' done.
2020-02-01 17:41:08,648 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local996914008_0001_m_000000_0
2020-02-01 17:41:08,648 INFO [org.apache.hadoop.mapred.LocalJobRunner] - map task executor complete.
2020-02-01 17:41:08,651 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Waiting for reduce tasks
2020-02-01 17:41:08,652 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Starting task: attempt_local996914008_0001_r_000000_0
2020-02-01 17:41:08,662 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - File Output Committer Algorithm version is 1
2020-02-01 17:41:08,662 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-02-01 17:41:08,664 INFO [org.apache.hadoop.yarn.util.ProcfsBasedProcessTree] - ProcfsBasedProcessTree currently is supported only on Linux.
2020-02-01 17:41:08,737 INFO [org.apache.hadoop.mapred.Task] - Using ResourceCalculatorProcessTree : org.apache.hadoop.yarn.util.WindowsBasedProcessTree@44d8a0be
2020-02-01 17:41:08,745 INFO [org.apache.hadoop.mapred.ReduceTask] - Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@41eed66b
2020-02-01 17:41:08,775 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - MergerManager: memoryLimit=1311663744, maxSingleShuffleLimit=327915936, mergeThreshold=865698112, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2020-02-01 17:41:08,782 INFO [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - attempt_local996914008_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2020-02-01 17:41:08,829 INFO [org.apache.hadoop.mapreduce.task.reduce.LocalFetcher] - localfetcher#1 about to shuffle output of map attempt_local996914008_0001_m_000000_0 decomp: 117 len: 121 to MEMORY
2020-02-01 17:41:08,838 INFO [org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput] - Read 117 bytes from map-output for attempt_local996914008_0001_m_000000_0
2020-02-01 17:41:08,840 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - closeInMemoryFile -> map-output of size: 117, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->117
2020-02-01 17:41:08,841 INFO [org.apache.hadoop.mapreduce.task.reduce.EventFetcher] - EventFetcher is interrupted.. Returning
2020-02-01 17:41:08,843 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2020-02-01 17:41:08,844 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2020-02-01 17:41:08,854 INFO [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2020-02-01 17:41:08,855 INFO [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 107 bytes
2020-02-01 17:41:08,858 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merged 1 segments, 117 bytes to disk to satisfy reduce memory limit
2020-02-01 17:41:08,859 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 1 files, 121 bytes from disk
2020-02-01 17:41:08,860 INFO [org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl] - Merging 0 segments, 0 bytes from memory into reduce
2020-02-01 17:41:08,860 INFO [org.apache.hadoop.mapred.Merger] - Merging 1 sorted segments
2020-02-01 17:41:08,861 INFO [org.apache.hadoop.mapred.Merger] - Down to the last merge-pass, with 1 segments left of total size: 107 bytes
2020-02-01 17:41:08,861 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2020-02-01 17:41:08,866 INFO [org.apache.hadoop.conf.Configuration.deprecation] - mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2020-02-01 17:41:08,886 INFO [org.apache.hadoop.mapred.Task] - Task:attempt_local996914008_0001_r_000000_0 is done. And is in the process of committing
2020-02-01 17:41:08,888 INFO [org.apache.hadoop.mapred.LocalJobRunner] - 1 / 1 copied.
2020-02-01 17:41:08,888 INFO [org.apache.hadoop.mapred.Task] - Task attempt_local996914008_0001_r_000000_0 is allowed to commit now
2020-02-01 17:41:08,892 INFO [org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter] - Saved output of task 'attempt_local996914008_0001_r_000000_0' to file:/d:/output/_temporary/0/task_local996914008_0001_r_000000
2020-02-01 17:41:08,893 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce > reduce
2020-02-01 17:41:08,893 INFO [org.apache.hadoop.mapred.Task] - Task 'attempt_local996914008_0001_r_000000_0' done.
2020-02-01 17:41:08,893 INFO [org.apache.hadoop.mapred.LocalJobRunner] - Finishing task: attempt_local996914008_0001_r_000000_0
2020-02-01 17:41:08,894 INFO [org.apache.hadoop.mapred.LocalJobRunner] - reduce task executor complete.
2020-02-01 17:41:09,287 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local996914008_0001 running in uber mode : false
2020-02-01 17:41:09,289 INFO [org.apache.hadoop.mapreduce.Job] - map 100% reduce 100%
2020-02-01 17:41:09,290 INFO [org.apache.hadoop.mapreduce.Job] - Job job_local996914008_0001 completed successfully
2020-02-01 17:41:09,304 INFO [org.apache.hadoop.mapreduce.Job] - Counters: 30
	File System Counters
		FILE: Number of bytes read=680
		FILE: Number of bytes written=635263
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
	Map-Reduce Framework
		Map input records=7
		Map output records=10
		Map output bytes=95
		Map output materialized bytes=121
		Input split bytes=89
		Combine input records=0
		Combine output records=0
		Reduce input groups=7
		Reduce shuffle bytes=121
		Reduce input records=10
		Reduce output records=7
		Spilled Records=20
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=0
		Total committed heap usage (bytes)=464519168
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=60
	File Output Format Counters
		Bytes Written=66
0
Process finished with exit code 0