Data statistics
1. Methods a custom Writable must implement:
write(DataOutput out) and readFields(DataInput in)
Word.java
package test;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
public class Word implements Writable {
    private String name;   // the word itself
    private int num;       // count parsed from a single input line
    private int count;     // total count produced by the reducer

    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
    public int getNum() {
        return num;
    }
    public void setNum(int num) {
        this.num = num;
    }
    public int getCount() {
        return count;
    }
    public void setCount(int count) {
        this.count = count;
    }

    // Serialize the fields; the order here must match readFields().
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(num);
        out.writeInt(count);
    }

    // Deserialize in exactly the same order as write().
    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        num = in.readInt();
        count = in.readInt();
    }

    @Override
    public String toString() {
        return name + " " + count;
    }
}
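The only contract here is that readFields() consumes bytes in exactly the order write() produced them. A minimal, JDK-only sketch of that round trip (plain DataOutputStream/DataInputStream, no Hadoop dependency; the class name and sample values are illustrative):

```java
import java.io.*;

public class RoundTrip {
    public static void main(String[] args) throws IOException {
        // Serialize in the same order Word.write() uses: UTF name, then two ints.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeUTF("hello");
        out.writeInt(3);   // num
        out.writeInt(0);   // count
        out.flush();

        // Deserialize in exactly the same order, as readFields() must.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        String name = in.readUTF();
        int num = in.readInt();
        int count = in.readInt();
        System.out.println(name + " " + num + " " + count); // hello 3 0
    }
}
```

If the read order ever drifts from the write order, deserialization fails (or silently misreads fields), which is the most common bug in hand-written Writables.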
2. Main class (Mapper and Reducer)
package test;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class Main {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(Main.class);
        // Mapper output: word text -> Word record
        job.setMapperClass(WordMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Word.class);
        // Reducer output: Word record as the key, no value
        job.setReducerClass(WordReduce.class);
        job.setOutputKeyClass(Word.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPaths(job, args[0]);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    public static class WordMapper extends Mapper<LongWritable, Text, Text, Word> {
        @Override
        public void map(LongWritable key, Text text, Context context)
                throws IOException, InterruptedException {
            // Each input line looks like "name num"; split on whitespace.
            String[] arr = text.toString().split("\\s+");
            Word word = new Word();
            word.setName(arr[0]);
            word.setNum(Integer.parseInt(arr[1]));
            context.write(new Text(arr[0]), word);
        }
    }

    public static class WordReduce extends Reducer<Text, Word, Word, NullWritable> {
        @Override
        public void reduce(Text key, Iterable<Word> words, Context context)
                throws IOException, InterruptedException {
            // Sum the per-line counts for one word.
            int sum = 0;
            String name = null;
            for (Word word : words) {
                name = word.getName();
                sum += word.getNum();
            }
            Word word = new Word();
            word.setName(name);
            word.setCount(sum);
            context.write(word, NullWritable.get());
        }
    }
}
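The map/reduce logic above can be checked locally with plain Java before running on the cluster — a sketch with no Hadoop dependency, using made-up sample lines in the same "name num" layout the mapper expects:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LocalCheck {
    public static void main(String[] args) {
        // Made-up sample input in the "name num" layout the mapper parses.
        String[] lines = {"hello 1", "world 2", "hello 3"};

        // The shuffle phase groups map outputs by key; a map of running sums
        // mimics what the reducer then computes per word.
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (String line : lines) {
            String[] arr = line.split("\\s+");              // mapper's parse step
            sums.merge(arr[0], Integer.parseInt(arr[1]), Integer::sum);
        }

        // Each output line matches Word.toString(): "name count".
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the three sample lines this prints "hello 4" and "world 2", which is the shape of output the real job writes to HDFS.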
3. Upload the jar to the cluster
4. Run the command:
hadoop jar wc.jar /word /out01
5. If output like the following appears, the job succeeded:
[root@wpy apps]# hadoop jar wc.jar /word.txt /hh03
19/07/10 17:31:05 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/07/10 17:31:06 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/07/10 17:31:06 INFO input.FileInputFormat: Total input paths to process : 1
19/07/10 17:31:06 INFO mapreduce.JobSubmitter: number of splits:1
19/07/10 17:31:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1562749459938_0003
19/07/10 17:31:07 INFO impl.YarnClientImpl: Submitted application application_1562749459938_0003
19/07/10 17:31:07 INFO mapreduce.Job: The url to track the job: http://wpy:8088/proxy/application_1562749459938_0003/
19/07/10 17:31:07 INFO mapreduce.Job: Running job: job_1562749459938_0003
19/07/10 17:31:15 INFO mapreduce.Job: Job job_1562749459938_0003 running in uber mode : false
19/07/10 17:31:15 INFO mapreduce.Job: map 0% reduce 0%
19/07/10 17:31:22 INFO mapreduce.Job: map 100% reduce 0%
19/07/10 17:31:31 INFO mapreduce.Job: map 100% reduce 100%
19/07/10 17:31:32 INFO mapreduce.Job: Job job_1562749459938_0003 completed successfully
19/07/10 17:31:32 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=175
		FILE: Number of bytes written=213491
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=165
		HDFS: Number of bytes written=41
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=4808
		Total time spent by all reduces in occupied slots (ms)=7508
		Total time spent by all map tasks (ms)=4808
		Total time spent by all reduce tasks (ms)=7508
		Total vcore-milliseconds taken by all map tasks=4808
		Total vcore-milliseconds taken by all reduce tasks=7508
		Total megabyte-milliseconds taken by all map tasks=4923392
		Total megabyte-milliseconds taken by all reduce tasks=7688192
	Map-Reduce Framework
		Map input records=7
		Map output records=7
		Map output bytes=155
		Map output materialized bytes=175
		Input split bytes=92
		Combine input records=0
		Combine output records=0
		Reduce input groups=4
		Reduce shuffle bytes=175
		Reduce input records=7
		Reduce output records=4
		Spilled Records=14
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=120
		CPU time spent (ms)=1010
		Physical memory (bytes) snapshot=323358720
		Virtual memory (bytes) snapshot=1685254144
		Total committed heap usage (bytes)=136056832
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=73
	File Output Format Counters
		Bytes Written=41
6. View the results:
Command: hdfs dfs -cat /out01/*
7. The results are as follows: