title: Lab 8 - MapReduce Programming Exercises
date: 2023-10-10 13:35:49
categories:
- mapreduce
tags:
Environment
- Linux: Ubuntu 16.04 LTS
- Hadoop 2.7

Preparation

Getting Started
1.1 Prepare the dataset
- Check which delimiter the code expects: a CSV file uses ",", while a txt file uses spaces (or tabs).
- The file encoding must be UTF-8.
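Before running the full job, it can help to try the field-splitting logic on one sample line to confirm the delimiter. A minimal sketch, assuming a hypothetical tab-separated record (the sample line and field layout are made up, not from the lab dataset):

```java
import java.nio.charset.StandardCharsets;

public class DelimiterCheck {
    // Split one record with the delimiter the MapReduce code expects.
    // The limit of -1 keeps trailing empty fields, so the field count is stable.
    static String[] splitRecord(String line, String delimiter) {
        return line.split(delimiter, -1);
    }

    public static void main(String[] args) {
        // Hypothetical tab-separated record with three fields.
        String tsvLine = "1001\tAlice\tF";
        String[] fields = splitRecord(tsvLine, "\t");
        System.out.println(fields.length + " fields, last field = " + fields[2]);

        // When reading the file yourself, pass the charset explicitly, e.g.
        // Files.readAllLines(path, StandardCharsets.UTF_8)
        System.out.println("expected encoding: " + StandardCharsets.UTF_8);
    }
}
```

If the reported field count does not match what the Mapper checks for, the delimiter (or the encoding) is wrong.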
1.2 Prepare the Java environment
- Import any one of the projects, for example the ch11ppsgs project (counting brand manufacturers).

> **Note:** the following JAR packages must be imported:
> - 1. all JARs under "/usr/local/hadoop/share/hadoop/yarn" (not including the lib, sources, and test directories);
> - 2. all JARs under "/usr/local/hadoop/share/hadoop/yarn/lib".

1.3 Start Hadoop and clear any previous files in input and output (optional, since you can also run the project locally)

```bash
(base) hadoop@ubuntu:~$ cd /usr/local/hadoop
(base) hadoop@ubuntu:/usr/local/hadoop$ ./sbin/start-dfs.sh
(base) hadoop@ubuntu:/usr/local/hadoop$ ./bin/hdfs dfs -rm -r input
(base) hadoop@ubuntu:/usr/local/hadoop$ ./bin/hdfs dfs -rm -r output
```
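When running locally, the same cleanup can be done programmatically. The sample job below clears its HDFS output path via `FileSystem.delete`; a plain-Java analogue for a local directory, as a sketch (the directory name matches the sample job's hard-coded output path, but any stale local output directory works the same way):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class OutputCleanup {
    // Recursively delete a local output directory if it exists,
    // mirroring what FileSystem.delete(outputPath, true) does on HDFS.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Sort deepest-first so files are removed before their parent directories.
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        // "output/ageCount" is the output path the sample job writes to.
        Path out = Paths.get("output/ageCount");
        deleteRecursively(out);
        System.out.println("exists after cleanup: " + Files.exists(out));
    }
}
```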
Solving the Problems

Note: before starting
- You can import all of the projects into the virtual machine at once. As shown in the figure, I imported every project, so I do not need to configure the environment (import the JAR packages) for each one individually.

1.1 The project structure is shown in the figure:

1.2 Code example: this one computes the male/female ratio. It runs locally and reads a local path.
```java
package demo6;

// Computes the male/female ratio from the dataset (local run).
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SexCount {

    public static void main(String[] args) throws Exception {
        // Hard-coded local input and output paths.
        args = new String[] { "inputdatas", "output/ageCount" };
        Configuration conf = new Configuration();
        if (args == null || args.length != 2) { // null check must come first
            System.err.println("Please Input Full Path!");
            System.exit(1);
        }
        // For an HDFS run, uncomment the following line:
        // conf.set("fs.defaultFS", "hdfs://localhost:9000");
        Job job = Job.getInstance(conf, "xbxl");
        job.setJarByClass(SexCount.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapperClass(SexMap.class);
        job.setReducerClass(SexReduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Delete the output directory so a second run does not fail.
        FileSystem.get(conf).delete(new Path(args[1]), true);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    static class SexMap extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            String[] lines = value.toString().split("\t");
            // Field 38 (0-based) holds the sex; only count well-formed rows.
            if (lines.length == 39 && lines[38] != null) {
                context.write(new Text(lines[38]), one);
            }
        }
    }

    static class SexReduce extends Reducer<Text, IntWritable, Text, DoubleWritable> {
        private final Map<String, Integer> maps = new HashMap<String, Integer>();
        private double total = 0;

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws java.io.IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : values) {
                sum = sum + count.get();
            }
            total = total + sum; // running total across all sexes
            maps.put(key.toString(), sum);
        }

        @Override
        protected void cleanup(Context context)
                throws java.io.IOException, InterruptedException {
            for (String str : maps.keySet()) {
                int value = maps.get(str);
                // Compute the ratio; total is a double, so this is not integer division.
                double percent = value / total;
                context.write(new Text(str), new DoubleWritable(percent));
            }
        }
    }
}
```
1.3 Code explanation
- This line specifies the input and output paths:
  `args = new String[] { "inputdatas", "output/ageCount" };`
- When running locally, be sure to comment out the HDFS connection; otherwise the paths are resolved on HDFS instead of the local filesystem:
  `// conf.set("fs.defaultFS", "hdfs://localhost:9000");`
- Finally, just click Run to produce the output.
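One detail worth calling out: the reducer's cleanup step divides each per-sex count by the running total, and because `total` is declared as `double`, Java performs floating-point division. A minimal standalone sketch of this behavior (the counts here are made up for illustration):

```java
public class RatioDemo {
    // Mirrors the reducer's cleanup: count / total, where total is a double.
    static double ratio(int count, double total) {
        return count / total; // the int is promoted to double, so no truncation
    }

    public static void main(String[] args) {
        // Hypothetical counts: 30 male, 70 female.
        double total = 30 + 70;
        System.out.println("M = " + ratio(30, total));
        System.out.println("F = " + ratio(70, total));
        // Had total been an int, 30 / 100 would be integer division and yield 0.
        System.out.println("int division: " + (30 / 100));
    }
}
```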
1.4 If you want to export a JAR package and run it
- Run the project.
- After it finishes, export a runnable JAR package.
- Import the data into input and output.
- Run:

```bash
./bin/hadoop jar ./myapp/yourjarname.jar input output
```