Getting Started with Hadoop (10): DistributedCache, the Distributed Cache

Latest recommended article published 2024-07-19 00:13:26

oifengo

Category: Getting Started with Hadoop. Tags: Hadoop

Article link: https://blog.csdn.net/weixin_39381833/article/details/81294955

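How DistributedCache works

When a MapReduce job runs, its Mappers may need to share some information. If that information is small, it can be loaded from HDFS into memory. This is Hadoop's distributed cache mechanism.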

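Example: counting how often the words in a word list occur

1 Load the word list into the cache

2 Read an input line and compare it, word by word, against the words in the list

3 Output the result of the comparison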
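
The three steps above can be sketched in plain Java, without Hadoop, to show the logic a Mapper would run against the cached list. This is only a minimal sketch under assumptions: the class name `WordListCounter`, the method `countListedWords`, and whitespace-separated input lines are made up for illustration and do not come from the original example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: simulates the cached word list and the per-line comparison.
class WordListCounter {

    // Steps 2 and 3: compare every word of every input line with the list and count the hits.
    public static Map<String, Integer> countListedWords(Set<String> wordList, List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            // Assumption: words are separated by whitespace.
            for (String word : line.trim().split("\\s+")) {
                if (wordList.contains(word)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Step 1: the word list held in memory, as it would be after loading the cached file.
        Set<String> wordList = new HashSet<>(Arrays.asList("hadoop", "cache"));
        List<String> input = Arrays.asList("hadoop uses a cache", "cache miss");
        System.out.println(countListedWords(wordList, input));
    }
}
```

In a real job the set would be filled once per task from the cached file, and each call to map would handle one input line.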

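How to use DistributedCache

1 In the main method, register the HDFS path of the shared file. The path may point to a directory or to a file. To refer to it by an alias in the map phase, append "#" plus the alias to the end of the path.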

String cache = "hdfs://10.105.xxx.xxx:8082/cache/file"; // directory or file
cache = cache + "#myfile"; // myfile is the alias of the file
job.addCacheFile(new URI(cache)); // register with the job; java.net.URI keeps the "#myfile" fragment

2 In the setup method of the Mapper or Reducer class, open the cached file with an input stream.
In the Mapper class's setup method:

protected void setup(Context context) throws IOException,
          InterruptedException {
        FileReader reader = new FileReader("myfile"); // the path argument is the alias
        BufferedReader br = new BufferedReader(reader);
        ......
        }

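The setup method runs once per task, before the first call to map. Each worker node caches its own copy of the same shared data. If the shared data is too large, cache it in batches and run the job repeatedly.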

Matrix multiplication with MapReduce

Matrix multiplication in mathematics
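
For reference, here is the standard definition of the product. The symbols A, B, C and the dimensions m, n, p are introduced here for illustration only: A is the m×n left matrix, B is the n×p right matrix, and C is the m×p result.

```latex
C = AB, \qquad c_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}, \qquad 1 \le i \le m,\ 1 \le j \le p
```

Each entry of C is therefore the dot product of one row of A with one column of B.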
Canonical representation of a matrix
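
Judging from the sample line in the Mapper code of this article (row 1 written as `1_0,2_3,3_-1,4_2,5_3` after the row number), each matrix row appears to be stored as one text line: the row number, a tab, then comma-separated `column_value` pairs. The sketch below encodes and decodes that layout. The class `MatrixLineFormat` and its methods are hypothetical names, and it assumes 1-based numbering with every column present.

```java
import java.util.Arrays;
import java.util.StringJoiner;

// Hypothetical helper for the "row<TAB>col_value,col_value,..." line layout.
class MatrixLineFormat {

    // Encode one matrix row; rows and columns are numbered from 1 (assumption).
    public static String encodeRow(int rowNumber, int[] values) {
        StringJoiner entries = new StringJoiner(",");
        for (int i = 0; i < values.length; i++) {
            entries.add((i + 1) + "_" + values[i]);
        }
        return rowNumber + "\t" + entries;
    }

    // Decode a line back into its values; assumes columns 1..n are all listed.
    public static int[] decodeRow(String line) {
        String[] entries = line.split("\t")[1].split(",");
        int[] values = new int[entries.length];
        for (String entry : entries) {
            String[] columnAndValue = entry.split("_");
            values[Integer.parseInt(columnAndValue[0]) - 1] = Integer.parseInt(columnAndValue[1]);
        }
        return values;
    }

    public static void main(String[] args) {
        String line = encodeRow(1, new int[] {0, 3, -1, 2, 3});
        System.out.println(line);
        System.out.println(Arrays.toString(decodeRow(line)));
    }
}
```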

Approach

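1 Transpose the matrix (the right matrix, so that its columns become rows)

2 Load the entire right matrix into the distributed cache

3 Use the left matrix as the Map input

Before map runs, put the cached right matrix into a list, one row per element.
During map, take every row from the list and multiply it with the input row.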
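
The map-side computation described above can be sketched in plain Java. It is a minimal sketch, not the article's actual Mapper: `RowMultiplier`, `parseValues` and `multiplyRow` are hypothetical names, the cached rows are passed in as a `List<String>` instead of being read in setup, and the output layout `row<TAB>column_value,...` is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one left-matrix row times every cached row of the transposed right matrix.
class RowMultiplier {

    // Parse "row<TAB>col_value,..." into a dense array; assumes columns 1..n are all listed.
    static int[] parseValues(String line) {
        String[] entries = line.split("\t")[1].split(",");
        int[] values = new int[entries.length];
        for (String entry : entries) {
            String[] columnAndValue = entry.split("_");
            values[Integer.parseInt(columnAndValue[0]) - 1] = Integer.parseInt(columnAndValue[1]);
        }
        return values;
    }

    // Returns one row of the product in the same text layout.
    public static String multiplyRow(String leftLine, List<String> cachedRightRows) {
        String leftRow = leftLine.split("\t")[0];
        int[] left = parseValues(leftLine);
        List<String> entries = new ArrayList<>();
        for (String rightLine : cachedRightRows) {
            // A row number of the transposed matrix is a column number of the original right matrix.
            String resultColumn = rightLine.split("\t")[0];
            int[] right = parseValues(rightLine);
            int sum = 0;
            for (int k = 0; k < left.length; k++) {
                sum += left[k] * right[k];
            }
            entries.add(resultColumn + "_" + sum);
        }
        return leftRow + "\t" + String.join(",", entries);
    }

    public static void main(String[] args) {
        // Right matrix [[5, 6], [7, 8]] stored transposed, one row per list element.
        List<String> cached = Arrays.asList("1\t1_5,2_7", "2\t1_6,2_8");
        System.out.println(multiplyRow("1\t1_1,2_2", cached));
    }
}
```

Because the right matrix is transposed first, each entry of the product is a row-by-row dot product, so no column of the right matrix has to be assembled during map.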

Code

Step 1: transpose, Mapper class

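package step1;

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper takes four type parameters: input key type (the line's byte offset with the
// default TextInputFormat), input value type, output key type, output value type
public class Mapper1 extends Mapper<LongWritable, Text, Text, Text> {
    // Output key and value
    private Text outKey = new Text();
    private Text outValue = new Text();

    /**
     * key: 1
     * value: 1 1_0,2_3,3_-1,4_2,5_3 (row number, a tab, then column_value pairs)
     */
    // Override the map method
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Convert value to a String and split the row number from the rest of the line
        String[] rowAndLine = value.toString().split("\t");

        // Row number of the matrix
        String row = rowAndLine[0];
        // Split the rest of the line into column_value entries
        String[] lines = rowAndLine[1].split(",");

        for (int i = 0; i < lines.length; i++) {
            // Column number
            String column = lines[i].split("_")[0];
            // Value
            String valueStr = lines[i].split("_")[1];

            // Emit key: column number, value: row_value
            outKey.set(column);
            outValue.set(row + "_" + valueStr);
            context.write(outKey, outValue);
        }
    }
}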

Step 2: Reducer class

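package step1;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The Mapper's output types must match the Reducer's input types
// Reducer<input key type, input value type, output key type, output value type>
public class Reducer1 extends Reducer<Text, Text, Text, Text> {
    private Text outKey = new Text();
    private Text outValue = new Text();

    // Override the reduce method
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Join all row_value entries of this column with commas
        StringBuilder sb = new StringBuilder();
        for (Text text : values) {
            sb.append(text + ",");
        }
        String line = sb.toString();
        // Drop the trailing comma
        if (line.endsWith(",")) {
            line = sb.substring(0, sb.length() - 1);
        }
        // Set the output key (column number) and value (the joined entries)
        outKey.set(key);
        outValue.set(line);

        // Emit
        context.write(outKey, outValue);
    }
}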
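
To see what the Mapper and Reducer produce together, the same logic can be simulated in memory without Hadoop. This is a rough sketch only: `TransposeSimulation` is a hypothetical class, a `TreeMap` stands in for the shuffle, and keys are sorted numerically with values kept in input order. A real job sorts `Text` keys as bytes and does not guarantee the order of values within a key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the step 1 job (map, shuffle, reduce).
class TransposeSimulation {

    public static List<String> transpose(List<String> lines) {
        // Map phase: emit (column, "row_value"); the TreeMap groups by column like the shuffle.
        Map<Integer, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] rowAndLine = line.split("\t");
            String row = rowAndLine[0];
            for (String entry : rowAndLine[1].split(",")) {
                String[] columnAndValue = entry.split("_");
                grouped.computeIfAbsent(Integer.parseInt(columnAndValue[0]), k -> new ArrayList<>())
                       .add(row + "_" + columnAndValue[1]);
            }
        }
        // Reduce phase: join the values of each key with commas.
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : grouped.entrySet()) {
            result.add(group.getKey() + "\t" + String.join(",", group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Matrix [[5, 6], [7, 8]], one text line per row.
        for (String line : transpose(Arrays.asList("1\t1_5,2_6", "2\t1_7,2_8"))) {
            System.out.println(line);
        }
    }
}
```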

Step 3: write a driver class that creates the job

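package step1;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MR1 {
    // Input and output paths (left empty in the original; set them before running)
    private static String inPath = "";
    private static String outPath = "";

    // HDFS address
    private static String hdfs = "hdfs://master:9000";

    public int run() {
        try {
            // Create the job configuration
            Configuration conf = new Configuration();
            // Set the HDFS address
            conf.set("fs.defaultFS", hdfs);
            // Create a job instance
            Job job = Job.getInstance(conf, "step1");
            // Set the job's main class
            job.setJarByClass(MR1.class);
            // Set the job's Mapper and Reducer classes
            job.setMapperClass(Mapper1.class);
            job.setReducerClass(Reducer1.class);

            // Set the Mapper output types
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(Text.class);

            // Set the Reducer output types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);

            FileSystem fs = FileSystem.get(conf);
            // Input and output paths
            Path inputPath = new Path(inPath);
            Path outputPath = new Path(outPath);

            // Add the input path only if it exists
            if (fs.exists(inputPath)) {
                FileInputFormat.addInputPath(job, inputPath);
            }
            // Delete any previous output so the job can write to outputPath
            fs.delete(outputPath, true);

            FileOutputFormat.setOutputPath(job, outputPath);
            // Return the run status
            return job.waitForCompletion(true) ? 1 : -1;
        } catch (IOException | ClassNotFoundException | InterruptedException e) {
            e.printStackTrace();
        }
        return -1;
    }

    public static void main(String[] args) {
        int result = new MR1().run();
        if (result == 1) {
            System.out.println("step ran successfully");
        } else if (result == -1) {
            System.out.println("step failed");
        }
    }
}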