Computing each student's average score
First, create a couple of input files on the cluster.
- First file: vim score.txt
[hadoop@master mapreduce]$ cd
[hadoop@master ~]$ ls
hadoop-2.7.7.master.tar.gz hadoop-2.7.7.tar.gz
[hadoop@master ~]$ vim score.txt
linli math 95
linli chinese 90
linli english 100
liming math 78
liming chinese 86
liming english 90
me math 90
me chinese 90
me english 90
- Second file: vim score1.txt
[hadoop@master ~]$ vim score1.txt
root math 67
root chinese 89
root english 78
hadoop math 90
hadoop chinese 93
hadoop english 89
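Once the files are written, upload them to the distributed file system (HDFS).
[hadoop@master ~]$ hadoop fs -mkdir /score // as before, I create a directory to hold them first
[hadoop@master ~]$ hadoop fs -lsr / // there are a lot of files, so I only copied the relevant entries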
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hadoop supergroup 0 2020-04-19 21:38 /data
-rw-r--r-- 3 hadoop supergroup 51 2020-04-19 21:38 /data/1.txt
-rw-r--r-- 3 hadoop supergroup 53 2020-04-19 21:38 /data/2.txt
drwxr-xr-x - hadoop supergroup 0 2020-04-19 23:45 /out-jar
-rw-r--r-- 3 hadoop supergroup 0 2020-04-19 23:45 /out-jar/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 78 2020-04-19 23:45 /out-jar/part-r-00000
drwxr-xr-x - hadoop supergroup 0 2020-04-19 21:41 /out-word
-rw-r--r-- 3 hadoop supergroup 0 2020-04-19 21:41 /out-word/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 78 2020-04-19 21:41 /out-word/part-r-00000
drwxr-xr-x - hadoop supergroup 0 2020-04-20 00:29 /score
[hadoop@master ~]$ hadoop fs -put score.txt score1.txt /score/
[hadoop@master ~]$ hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - hadoop supergroup 0 2020-04-19 21:38 /data
-rw-r--r-- 3 hadoop supergroup 51 2020-04-19 21:38 /data/1.txt
-rw-r--r-- 3 hadoop supergroup 53 2020-04-19 21:38 /data/2.txt
drwxr-xr-x - hadoop supergroup 0 2020-04-19 23:45 /out-jar
-rw-r--r-- 3 hadoop supergroup 0 2020-04-19 23:45 /out-jar/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 78 2020-04-19 23:45 /out-jar/part-r-00000
drwxr-xr-x - hadoop supergroup 0 2020-04-19 21:41 /out-word
-rw-r--r-- 3 hadoop supergroup 0 2020-04-19 21:41 /out-word/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 78 2020-04-19 21:41 /out-word/part-r-00000
drwxr-xr-x - hadoop supergroup 0 2020-04-20 00:30 /score
-rw-r--r-- 3 hadoop supergroup 139 2020-04-20 00:30 /score/score.txt
-rw-r--r-- 3 hadoop supergroup 96 2020-04-20 00:30 /score/score1.txt
[hadoop@master ~]$
Write the Java program Score.java
package com.hadoop.ComputerScore;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Score {
    // Mapper: emits (student name, score) for every input line
    public static class MyMapper extends Mapper<Object, Text, Text, FloatWritable>
    {
        @Override
        protected void map(Object key, Text value, Mapper<Object, Text, Text, FloatWritable>.Context context)
                throws IOException, InterruptedException {
            String val = value.toString();
            // Split on exactly one space. The delimiter must match the file:
            // with two spaces here, a line that only contains single spaces is
            // not split at all, vals holds the whole line as its only element,
            // and vals[2] below throws ArrayIndexOutOfBoundsException.
            String[] vals = val.split(" ");
            float sc = Float.parseFloat(vals[2]);
            context.write(new Text(vals[0]), new FloatWritable(sc));
        }
    }

    // Reducer: receives all scores of one student, e.g. liming {90, 80},
    // and emits (student name, average score)
    public static class MyReducer extends Reducer<Text, FloatWritable, Text, FloatWritable>
    {
        @Override
        protected void reduce(Text key, Iterable<FloatWritable> values,
                Reducer<Text, FloatWritable, Text, FloatWritable>.Context context)
                throws IOException, InterruptedException {
            float sum = 0;
            int count = 0;
            for (FloatWritable value : values)
            {
                sum += value.get();
                count++;
            }
            // Average = total score / number of scores for this student
            float average = sum / count;
            context.write(key, new FloatWritable(average));
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException
    {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (-D, -files, ...) and keep the rest
        String[] arg = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (arg.length < 2)
        {
            System.err.println("Usage: Score <input path> <output path>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "score");      // create the job from the configuration
        job.setJarByClass(Score.class);                // driver class, used to locate the jar
        job.setMapperClass(MyMapper.class);            // set the Mapper class
        job.setReducerClass(MyReducer.class);          // set the Reducer class
        job.setOutputKeyClass(Text.class);             // output key type
        job.setOutputValueClass(FloatWritable.class);  // output value type
        FileInputFormat.addInputPath(job, new Path(arg[0]));   // input path
        FileOutputFormat.setOutputPath(job, new Path(arg[1])); // output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
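The mapper's note about split(" ") points at a wider fragility: any line that does not consist of exactly three fields separated by single spaces crashes the task. Below is a minimal sketch of a more tolerant parser in plain Java, with no Hadoop dependency. The class name ScoreLineParser and the method parse are hypothetical and not part of the original program. It splits on any run of whitespace and returns null for malformed lines, so a caller could skip them instead of failing.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original Score.java.
public class ScoreLineParser {

    // Parses a "name subject score" line into (name, score).
    // Returns null when the line does not have exactly three fields
    // or when the third field is not a number.
    public static Map.Entry<String, Float> parse(String line) {
        if (line == null) {
            return null;
        }
        // Split on any run of whitespace, so double spaces and tabs are tolerated.
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 3) {
            return null;
        }
        try {
            return new SimpleEntry<>(fields[0], Float.parseFloat(fields[2]));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("linli math 95"));    // well-formed line
        System.out.println(parse("linli  math\t95"));  // extra whitespace is still accepted
        System.out.println(parse("broken line"));      // malformed line, yields null
    }
}
```

Inside a mapper, one could call such a helper on value.toString() and simply return when it yields null. Whether to skip bad lines silently or count them is a design choice this sketch leaves open.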
Package it as a jar and upload it to the cluster
[hadoop@master ~]$ ls
?? hadoop-2.7.7.master.tar.gz hadoop-2.7.7.tar.gz score1.txt score.txt
[hadoop@master ~]$ rz
rz waiting to receive.
Starting zmodem transfer. Press Ctrl+C to cancel.
100% 8 KB 8 KB/s 00:00:01 0 Errors
[hadoop@master ~]$ ls
?? hadoop-2.7.7.master.tar.gz score1.txt
ComputerScore.jar hadoop-2.7.7.tar.gz score.txt
With the jar built and uploaded, run the job
[hadoop@master mapreduce]$ hadoop jar computerScore.jar /score/ /score/out
Check the results
[hadoop@master mapreduce]$ hadoop fs -lsr /score/out/
lsr: DEPRECATED: Please use 'ls -R' instead.
-rw-r--r-- 3 hadoop supergroup 0 2020-04-20 03:17 /score/out/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 63 2020-04-20 03:17 /score/out/part-r-00000
[hadoop@master mapreduce]$ hadoop fs -cat /score/out/part-r-00000
hadoop 90.666664
liming 84.666664
linli 95.0
me 90.0
root 78.0
[hadoop@master mapreduce]$
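As a sanity check on the job output, the same averaging can be reproduced in plain Java without a cluster. The sketch below hard-codes the lines of score.txt and score1.txt, groups the scores by student in a TreeMap (so names come out in sorted order, as in part-r-00000), and averages them with float arithmetic, as the reducer does. The class name LocalAverage and the method average are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local check; mimics the map and reduce steps without Hadoop.
public class LocalAverage {

    // Groups scores by student name, then averages each group with float arithmetic.
    public static Map<String, Float> average(List<String> lines) {
        // "Map" step: collect (name, score) pairs, grouped by name.
        Map<String, List<Float>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(" ");
            grouped.computeIfAbsent(fields[0], k -> new ArrayList<>())
                   .add(Float.parseFloat(fields[2]));
        }
        // "Reduce" step: sum each student's scores and divide by their count.
        Map<String, Float> result = new TreeMap<>();
        for (Map.Entry<String, List<Float>> entry : grouped.entrySet()) {
            float sum = 0;
            for (float score : entry.getValue()) {
                sum += score;
            }
            result.put(entry.getKey(), sum / entry.getValue().size());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "linli math 95", "linli chinese 90", "linli english 100",
            "liming math 78", "liming chinese 86", "liming english 90",
            "me math 90", "me chinese 90", "me english 90",
            "root math 67", "root chinese 89", "root english 78",
            "hadoop math 90", "hadoop chinese 93", "hadoop english 89");
        for (Map.Entry<String, Float> entry : average(lines).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

The value 90.666664 for hadoop is the nearest float to 272 / 3. Switching the accumulator and output type to double would print more digits, at the cost of changing the job's output value class.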