title: Lab 8 - MapReduce Programming Exercises
date: 2023-10-10 13:35:49
categories:
- mapreduce
tags:
Environment
- Linux: Ubuntu 16.04 LTS
- Hadoop 2.7

Preparation

Getting Started
1.1 Prepare the dataset
- Check which delimiter the code expects: a CSV file uses ",", while a txt file uses spaces (or tabs).
- The file encoding must be UTF-8.
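Before running the full job, it can help to try the field-splitting logic on one sample line to confirm the delimiter. A minimal sketch, assuming a hypothetical tab-separated record (the sample line and field layout are made up, not from the lab dataset):

```java
import java.nio.charset.StandardCharsets;

public class DelimiterCheck {
    // Split one record with the delimiter the MapReduce code expects.
    // The limit of -1 keeps trailing empty fields, so the field count is stable.
    static String[] splitRecord(String line, String delimiter) {
        return line.split(delimiter, -1);
    }

    public static void main(String[] args) {
        // Hypothetical tab-separated record with three fields.
        String tsvLine = "1001\tAlice\tF";
        String[] fields = splitRecord(tsvLine, "\t");
        System.out.println(fields.length + " fields, last field = " + fields[2]);

        // When reading the file yourself, pass the charset explicitly, e.g.
        // Files.readAllLines(path, StandardCharsets.UTF_8)
        System.out.println("expected encoding: " + StandardCharsets.UTF_8);
    }
}
```

If the reported field count does not match what the Mapper checks for, the delimiter (or the encoding) is wrong.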
1.2 Prepare the Java environment
- Import any one of the projects, for example the ch11ppsgs project (counting brand manufacturers).

> **Note:** the following JAR packages must be imported:
> - 1. all JARs under "/usr/local/hadoop/share/hadoop/yarn" (not including the lib, sources, and test directories);
> - 2. all JARs under "/usr/local/hadoop/share/hadoop/yarn/lib".

1.3 Start Hadoop and clear any previous files in input and output (optional, since you can also run the project locally)

```bash
(base) hadoop@ubuntu:~$ cd /usr/local/hadoop
(base) hadoop@ubuntu:/usr/local/hadoop$ ./sbin/start-dfs.sh
(base) hadoop@ubuntu:/usr/local/hadoop$ ./bin/hdfs dfs -rm -r input
(base) hadoop@ubuntu:/usr/local/hadoop$ ./bin/hdfs dfs -rm -r output
```
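When running locally, the same cleanup can be done programmatically. The sample job below clears its HDFS output path via `FileSystem.delete`; a plain-Java analogue for a local directory, as a sketch (the directory name matches the sample job's hard-coded output path, but any stale local output directory works the same way):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

public class OutputCleanup {
    // Recursively delete a local output directory if it exists,
    // mirroring what FileSystem.delete(outputPath, true) does on HDFS.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Sort deepest-first so files are removed before their parent directories.
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        // "output/ageCount" is the output path the sample job writes to.
        Path out = Paths.get("output/ageCount");
        deleteRecursively(out);
        System.out.println("exists after cleanup: " + Files.exists(out));
    }
}
```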
Solving the Problems

Note: before starting
- You can import all of the projects into the virtual machine at once. As shown in the figure, I imported every project, so I do not need to configure the environment (import the JAR packages) for each one individually.

1.1 The project structure is shown in the figure:

1.2 Code example: this one computes the male/female ratio. It runs locally and reads a local path.
```java
package demo6;

// Computes the male/female ratio from the dataset (local run).
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SexCount {

    public static void main(String[] args) throws Exception {
        // Hard-coded local input and output paths.
        args = new String[] { "inputdatas", "output/ageCount" };
        Configuration conf = new Configuration();
        if (args == null || args.length != 2) { // null check must come first
            System.err.println("Please Input Full Path!");
            System.exit(1);
        }
        // For an HDFS run, uncomment the following line:
        // conf.set("fs.defaultFS", "hdfs://localhost:9000");
        Job job = Job.getInstance(conf, "xbxl");
        job.setJarByClass(SexCount.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapperClass(SexMap.class);
        job.setReducerClass(SexReduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Delete the output directory so a second run does not fail.
        FileSystem.get(conf).delete(new Path(args[1]), true);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    static class SexMap extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable one = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            String[] lines = value.toString().split("\t");
            // Field 38 (0-based) holds the sex; only count well-formed rows.
            if (lines.length == 39 && lines[38] != null) {
                context.write(new Text(lines[38]), one);
            }
        }
    }

    static class SexReduce extends Reducer<Text, IntWritable, Text, DoubleWritable> {
        private final Map<String, Integer> maps = new HashMap<String, Integer>();
        private double total = 0;

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws java.io.IOException, InterruptedException {
            int sum = 0;
            for (IntWritable count : values) {
                sum = sum + count.get();
            }
            total = total + sum; // running total across all sexes
            maps.put(key.toString(), sum);
        }

        @Override
        protected void cleanup(Context context)
                throws java.io.IOException, InterruptedException {
            for (String str : maps.keySet()) {
                int value = maps.get(str);
                // Compute the ratio; total is a double, so this is not integer division.
                double percent = value / total;
                context.write(new Text(str), new DoubleWritable(percent));
            }
        }
    }
}
```
1.3 Code explanation
- This line specifies the input and output paths:
  `args = new String[] { "inputdatas", "output/ageCount" };`
- When running locally, be sure to comment out the HDFS connection; otherwise the paths are resolved on HDFS instead of the local filesystem:
  `// conf.set("fs.defaultFS", "hdfs://localhost:9000");`
- Finally, just click Run to produce the output.
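One detail worth calling out: the reducer's cleanup step divides each per-sex count by the running total, and because `total` is declared as `double`, Java performs floating-point division. A minimal standalone sketch of this behavior (the counts here are made up for illustration):

```java
public class RatioDemo {
    // Mirrors the reducer's cleanup: count / total, where total is a double.
    static double ratio(int count, double total) {
        return count / total; // the int is promoted to double, so no truncation
    }

    public static void main(String[] args) {
        // Hypothetical counts: 30 male, 70 female.
        double total = 30 + 70;
        System.out.println("M = " + ratio(30, total));
        System.out.println("F = " + ratio(70, total));
        // Had total been an int, 30 / 100 would be integer division and yield 0.
        System.out.println("int division: " + (30 / 100));
    }
}
```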
1.4 If you want to export a JAR package and run it
- Run the project.
- After it finishes, export a runnable JAR package.
- Import the data into input and output.
- Run:

```bash
./bin/hadoop jar ./myapp/yourjarname.jar input output
```