Implementing Matrix Transpose with MapReduce
In large-scale data processing we often rely on Hadoop's distributed processing model. Matrix transposition is a core step in matrix multiplication algorithms, and matrix multiplication in turn shows up in the math behind many algorithms. This post shows how to implement matrix transposition with MapReduce.
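The input file stores one matrix row per line: the row number, a tab, then comma-separated `column_value` pairs. Using the sample row from the mapper comment below as a minimal, illustrative input, the job transposes it like this:

```
input  (row <TAB> column_value pairs; row 1 = [0, 3, -1, 2, -3]):
1	1_0,2_3,3_-1,4_2,5_-3

output (one line per column of the original matrix):
1	1_0
2	1_3
3	1_-1
4	1_2
5	1_-3
```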
Write the mapper class:
```java
package me.timlong.step1;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class Mapper1 extends Mapper<LongWritable, Text, Text, Text> {

    private Text outKey = new Text();
    private Text outValue = new Text();

    /*
     * key   : byte offset of the line in the input file (unused)
     * value : one matrix row, e.g. "1\t1_0,2_3,3_-1,4_2,5_-3"
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] rowAndLine = value.toString().split("\t");
        // row number of the matrix
        String row = rowAndLine[0];
        // e.g. ["1_0", "2_3", "3_-1", "4_2", "5_-3"]
        String[] lines = rowAndLine[1].split(",");
        for (int i = 0; i < lines.length; i++) {
            String column = lines[i].split("_")[0];
            String valueStr = lines[i].split("_")[1];
            // key: column number, value: rowNumber_value
            outKey.set(column);
            outValue.set(row + "_" + valueStr);
            context.write(outKey, outValue);
        }
    }
}
```
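For the sample line above, this mapper emits the pairs (1, 1_0), (2, 1_3), (3, 1_-1), (4, 1_2) and (5, 1_-3): the key is the column number, and the value packs the original row number together with the cell value. The shuffle phase then groups all values that share a column number, which is exactly the material for one row of the transposed matrix.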
Write the reducer class:
```java
package me.timlong.step1;

import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class Reducer1 extends Reducer<Text, Text, Text, Text> {

    private Text outKey = new Text();
    private Text outValue = new Text();

    // key: column number, values: [rowNumber_value, rowNumber_value, ...]
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        StringBuilder sb = new StringBuilder();
        for (Text text : values) {
            // text: rowNumber_value
            sb.append(text.toString()).append(",");
        }
        // drop the trailing comma; guard against an empty value list
        String line = sb.length() > 0 ? sb.substring(0, sb.length() - 1) : "";
        outKey.set(key);
        outValue.set(line);
        context.write(outKey, outValue);
    }
}
```
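Continuing the example, the reducer receives one `rowNumber_value` entry per row of the original matrix for each column key (here just one, e.g. key 2 with values [1_3]), joins them with commas, strips the trailing comma, and writes the line out. Each output line is thus one row of the transposed matrix, in the same `row<TAB>column_value,...` format as the input.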
Write the driver class, which contains a run() method invoked from main():
```java
package me.timlong.step1;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MR1 {

    // input file path on HDFS
    private static String inPath = "/matrix/step1_input/matrix2.txt";
    // output path on HDFS
    private static String outPath = "/matrix/step1_output";
    // HDFS address; change this to match your cluster
    private static String hdfs = "hdfs://10.255.248.61:9000";

    public int run() {
        try {
            // create the job configuration
            Configuration conf = new Configuration();
            // point the configuration at HDFS
            conf.set("fs.defaultFS", hdfs);
            // create a job instance
            Job job = Job.getInstance(conf, "step1");
            // set the job's main class
            job.setJarByClass(MR1.class);
            // set the job's Mapper and Reducer classes
            job.setMapperClass(Mapper1.class);
            job.setReducerClass(Reducer1.class);
            // set the mapper output types
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);
            // set the reducer output types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileSystem fs = FileSystem.get(conf);
            // set the input and output paths
            Path inputPath = new Path(inPath);
            if (fs.exists(inputPath)) {
                FileInputFormat.addInputPath(job, inputPath);
            }
            Path outputPath = new Path(outPath);
            // the output directory must not already exist, so remove any leftover from a previous run
            fs.delete(outputPath, true);
            FileOutputFormat.setOutputPath(job, outputPath);
            return job.waitForCompletion(true) ? 1 : -1;
        } catch (IOException | ClassNotFoundException | InterruptedException e) {
            e.printStackTrace();
        }
        return -1;
    }

    public static void main(String[] args) {
        int result = new MR1().run();
        if (1 == result) {
            System.out.println("step1 finished successfully...");
        } else {
            System.out.println("step1 failed...");
        }
    }
}
```
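To submit the job, package the three classes into a jar and run it with the standard `hadoop jar` command after uploading the input file. A minimal sketch, assuming the jar is named matrix-step1.jar (the name is a placeholder):

```
hadoop fs -mkdir -p /matrix/step1_input
hadoop fs -put matrix2.txt /matrix/step1_input/
hadoop jar matrix-step1.jar me.timlong.step1.MR1
```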
After a successful run, a folder "step1_output" containing a text file with the transposed matrix will appear under the matrix directory on HDFS. Give it a try yourself!
Hoping to improve a little every day!