Implementing Matrix Transpose with MapReduce
In large-scale data processing we often rely on Hadoop's distributed processing model. Matrix transposition is a core step in matrix multiplication algorithms, and matrix multiplication in turn shows up in the math behind many algorithms. This post shows how to implement matrix transposition with MapReduce.
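The input file stores one matrix row per line: the row number, a tab, then comma-separated `column_value` pairs. Using the sample row from the mapper comment below as a minimal, illustrative input, the job transposes it like this:

```
input  (row <TAB> column_value pairs; row 1 = [0, 3, -1, 2, -3]):
1	1_0,2_3,3_-1,4_2,5_-3

output (one line per column of the original matrix):
1	1_0
2	1_3
3	1_-1
4	1_2
5	1_-3
```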
Write the mapper class:
```java
package me.timlong.step1;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Mapper1 extends Mapper<LongWritable, Text, Text, Text> {

    private Text outKey = new Text();
    private Text outValue = new Text();

    /*
     * key   : byte offset of the line in the input file (unused)
     * value : one matrix row, e.g. "1\t1_0,2_3,3_-1,4_2,5_-3"
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] rowAndLine = value.toString().split("\t");
        // row number of the matrix
        String row = rowAndLine[0];
        // e.g. ["1_0", "2_3", "3_-1", "4_2", "5_-3"]
        String[] lines = rowAndLine[1].split(",");
        for (int i = 0; i < lines.length; i++) {
            String column = lines[i].split("_")[0];
            String valueStr = lines[i].split("_")[1];
            // key: column number, value: rowNumber_value
            outKey.set(column);
            outValue.set(row + "_" + valueStr);
            context.write(outKey, outValue);
        }
    }
}
```
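For the sample line above, this mapper emits the pairs (1, 1_0), (2, 1_3), (3, 1_-1), (4, 1_2) and (5, 1_-3): the key is the column number, and the value packs the original row number together with the cell value. The shuffle phase then groups all values that share a column number, which is exactly the material for one row of the transposed matrix.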
Write the reducer class:
```java
package me.timlong.step1;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reducer1 extends Reducer<Text, Text, Text, Text> {

    private Text outKey = new Text();
    private Text outValue = new Text();

    // key: column number, values: [rowNumber_value, rowNumber_value, ...]
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for (Text text : values) {
            // text: rowNumber_value
            sb.append(text.toString()).append(",");
        }
        // drop the trailing comma; guard against an empty value list
        String line = sb.length() > 0 ? sb.substring(0, sb.length() - 1) : "";
        outKey.set(key);
        outValue.set(line);
        context.write(outKey, outValue);
    }
}
```
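Continuing the example, the reducer receives one `rowNumber_value` entry per row of the original matrix for each column key (here just one, e.g. key 2 with values [1_3]), joins them with commas, strips the trailing comma, and writes the line out. Each output line is thus one row of the transposed matrix, in the same `row<TAB>column_value,...` format as the input.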
Write the driver class, which contains a run() method invoked from main():
```java
package me.timlong.step1;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MR1 {

    // input file path on HDFS
    private static String inPath = "/matrix/step1_input/matrix2.txt";
    // output path on HDFS
    private static String outPath = "/matrix/step1_output";
    // HDFS address; change this to match your cluster
    private static String hdfs = "hdfs://10.255.248.61:9000";

    public int run() {
        try {
            // create the job configuration
            Configuration conf = new Configuration();
            // point the configuration at HDFS
            conf.set("fs.defaultFS", hdfs);
            // create a job instance
            Job job = Job.getInstance(conf, "step1");
            // set the job's main class
            job.setJarByClass(MR1.class);
            // set the job's Mapper and Reducer classes
            job.setMapperClass(Mapper1.class);
            job.setReducerClass(Reducer1.class);
            // set the mapper output types
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            // set the reducer output types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileSystem fs = FileSystem.get(conf);
            // set the input and output paths
            Path inputPath = new Path(inPath);
            if (fs.exists(inputPath)) {
                FileInputFormat.addInputPath(job, inputPath);
            }
            Path outputPath = new Path(outPath);
            // the output directory must not already exist, so remove any leftover from a previous run
            fs.delete(outputPath, true);
            FileOutputFormat.setOutputPath(job, outputPath);
            return job.waitForCompletion(true) ? 1 : -1;
        } catch (IOException | ClassNotFoundException | InterruptedException e) {
            e.printStackTrace();
        }
        return -1;
    }

    public static void main(String[] args) {
        int result = new MR1().run();
        if (1 == result) {
            System.out.println("step1 finished successfully...");
        } else {
            System.out.println("step1 failed...");
        }
    }
}
```
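To submit the job, package the three classes into a jar and run it with the standard `hadoop jar` command after uploading the input file. A minimal sketch, assuming the jar is named matrix-step1.jar (the name is a placeholder):

```
hadoop fs -mkdir -p /matrix/step1_input
hadoop fs -put matrix2.txt /matrix/step1_input/
hadoop jar matrix-step1.jar me.timlong.step1.MR1
```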
After a successful run, a folder "step1_output" containing a text file with the transposed matrix will appear under the matrix directory on HDFS. Give it a try yourself!
Hoping to improve a little every day!