MapReduce: Intelligent Email Marketing with a Markov Model (Part 4)


This post continues from the previous one, MapReduce: Intelligent Email Marketing with a Markov Model (Part 3). Using the state sequences generated there, we now build the Markov state transition matrix.

Generating the Markov State Transition Matrix with MapReduce

The goal of this MapReduce phase is to generate a Markov state transition matrix. Its input is the state sequences, one per line, in the format:

    customer-id,State_1,State_2,...,State_n

The output is an N x N matrix, where N is the number of states in the Markov chain model (here N = 9). Each entry of the matrix indicates the probability of moving from one state to another. This MapReduce phase only counts the number of state-transition instances; since N = 9, there are up to 81 possible transitions.
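As a plain-Java illustration (independent of Hadoop), the counting this phase performs can be sketched as follows. The state alphabet and the sample sequences are hypothetical stand-ins; the real model uses 9 named states:

```java
import java.util.Arrays;
import java.util.List;

public class TransitionMatrixSketch {
    // Hypothetical 3-state alphabet; the real model has 9 states
    static final List<String> STATES = Arrays.asList("A", "B", "C");

    // Count transitions between adjacent states across all sequences
    static int[][] countTransitions(List<String[]> sequences) {
        int n = STATES.size();
        int[][] counts = new int[n][n];
        for (String[] seq : sequences) {
            for (int i = 0; i < seq.length - 1; i++) {
                counts[STATES.indexOf(seq[i])][STATES.indexOf(seq[i + 1])]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String[]> sequences = Arrays.asList(
                new String[]{"A", "B", "A", "C"},
                new String[]{"B", "A", "B"});
        // → [[0, 2, 1], [2, 0, 0], [0, 0, 0]]
        System.out.println(Arrays.deepToString(countTransitions(sequences)));
    }
}
```

The MapReduce job below computes exactly these counts, but distributed: each (fromState, toState) cell of the matrix becomes one reducer key.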

Mapper task

This step processes the state transitions: for each customer's state sequence, the mapper takes every adjacent ("from state", "to state") pair and emits a <(State_i, State_{i+1}), 1> key-value pair.

Mapper code

package com.deng.MarkovState;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MarkovStateTransitionModelMapper extends Mapper<LongWritable, Text,PairOfStrings, IntWritable> {
    private PairOfStrings reducerKey = new PairOfStrings();
    private static final IntWritable ONE  = new IntWritable(1);
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        String[] items = line.split(",");
        // items[0] is the customer-id; every adjacent pair of states
        // (items[i], items[i+1]) is one observed transition
        if (items.length > 2) {
            for (int i = 1; i < items.length - 1; i++) {
                reducerKey.set(items[i], items[i + 1]);
                context.write(reducerKey, ONE);
            }
        }
    }
}
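To make the mapper's behavior concrete, here is a Hadoop-free sketch of the keys it emits for one input line; the customer-id and state names are made-up examples:

```java
import java.util.ArrayList;
import java.util.List;

public class MapperEmissionDemo {
    // Build a "(fromState,toState)" key for every adjacent state pair,
    // skipping items[0] (the customer-id) -- mirrors the mapper's loop
    static List<String> emit(String line) {
        String[] items = line.split(",");
        List<String> keys = new ArrayList<>();
        if (items.length > 2) {
            for (int i = 1; i < items.length - 1; i++) {
                keys.add("(" + items[i] + "," + items[i + 1] + ")");
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical input line: customer-id followed by its state sequence
        System.out.println(emit("C1,SL,SE,SL")); // → [(SL,SE), (SE,SL)]
    }
}
```

Each emitted key is paired with the count 1 in the real mapper; a line with fewer than two states produces nothing, matching the `items.length > 2` guard.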

The custom PairOfStrings class is defined as follows:

package com.deng.MarkovState;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class PairOfStrings implements Writable,WritableComparable<PairOfStrings> {
    private String leftElement;
    private String rightElement;

    public PairOfStrings(){}

    public PairOfStrings(String leftElement,String rightElement){
        set(leftElement,rightElement);
    }

    public void set(String leftElement,String rightElement){
        this.leftElement=leftElement;
        this.rightElement=rightElement;
    }

    public String getLeftElement() {
        return leftElement;
    }

    public String getRightElement() {
        return rightElement;
    }

    @Override
    public int compareTo(PairOfStrings o) {
        // Compare left elements first; break ties with the right elements.
        // The contents must be compared with compareTo, not with
        // reference (==/!=) comparison as in the original version.
        int cmp = this.leftElement.compareTo(o.leftElement);
        return (cmp != 0) ? cmp : this.rightElement.compareTo(o.rightElement);
    }

    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(leftElement);
        dataOutput.writeUTF(rightElement);
    }

    @Override
    public void readFields(DataInput dataInput) throws IOException {
        this.leftElement=dataInput.readUTF();
        this.rightElement=dataInput.readUTF();
    }

    @Override
    public String toString() {
        return "PairOfStrings[" + getLeftElement() + "," + getRightElement() + "]";
    }
}
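The ordering that compareTo defines (left element first, then right) is what brings identical (fromState, toState) keys together during the shuffle. A standalone sketch of that comparison logic, using hypothetical state names:

```java
import java.util.Arrays;
import java.util.Comparator;

public class PairOrderDemo {
    // Mirrors PairOfStrings.compareTo: compare left elements first,
    // and break ties with the right elements
    static final Comparator<String[]> PAIR_ORDER = (a, b) -> {
        int cmp = a[0].compareTo(b[0]);
        return (cmp != 0) ? cmp : a[1].compareTo(b[1]);
    };

    public static void main(String[] args) {
        String[][] pairs = {{"SL", "SE"}, {"SE", "SL"}, {"SE", "SE"}};
        Arrays.sort(pairs, PAIR_ORDER);
        // → [[SE, SE], [SE, SL], [SL, SE]]
        System.out.println(Arrays.deepToString(pairs));
    }
}
```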

Combiner task

The combiner locally aggregates the mapper output on each node to reduce the amount of data shuffled across the network. It produces partial counts for each ("from state", "to state") pair, i.e. <(State_1, State_2), count> key-value pairs.

Combiner code

package com.deng.MarkovState;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MarkovStateTransitionModelCombiner extends Reducer<PairOfStrings, IntWritable,PairOfStrings,IntWritable> {
    @Override
    protected void reduce(PairOfStrings key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Locally sum the 1s emitted by the mapper for this (fromState, toState) key
        int partialSum = 0;
        for (IntWritable value : values) {
            partialSum += value.get();
        }
        context.write(key, new IntWritable(partialSum));
    }
}

Reducer task

Sum all counts for each ("from state", "to state") pair.

Reducer code

package com.deng.MarkovState;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MarkovStateTransitionModelReducer extends Reducer<PairOfStrings, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(PairOfStrings key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum all (partial) counts for this (fromState, toState) key
        int finalCount = 0;
        for (IntWritable value : values) {
            finalCount += value.get();
        }
        String fromState = key.getLeftElement();
        String toState = key.getRightElement();
        String outputKey = fromState + "," + toState + ",";
        context.write(new Text(outputKey), new IntWritable(finalCount));
    }

}
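The reducer's final summation and output format can be sketched without Hadoop as follows; the partial counts are hypothetical values that combiners might deliver for one key:

```java
public class ReducerSumDemo {
    // Sum partial counts and format the output line the way the reducer does:
    // key "fromState,toState," then a tab, then the total count
    static String reduce(String fromState, String toState, int[] partialCounts) {
        int finalCount = 0;
        for (int c : partialCounts) {
            finalCount += c;
        }
        return fromState + "," + toState + "," + "\t" + finalCount;
    }

    public static void main(String[] args) {
        // Hypothetical partial sums for the (SL, SE) key from three combiners
        System.out.println(reduce("SL", "SE", new int[]{3, 2, 4}));
    }
}
```

The trailing comma in the key means each output line reads `fromState,toState,<tab>count`, which the next phase can split on commas.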

The complete driver for all phases is shown below.

It chains the jobs together, running each stage only if the previous one succeeded.

package com.deng.MarkovState;

import com.deng.util.FileUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MarkovStateDriver {
    public static void main(String[] args) throws Exception {
        FileUtil.deleteDirs("output");
        FileUtil.deleteDirs("output2");
        FileUtil.deleteDirs("MarkovState");
        Configuration conf=new Configuration();
        String[] otherArgs=new String[]{"input/smart_email_training.txt","output"};
        Job secondSortJob=new Job(conf,"Markov");
        FileInputFormat.setInputPaths(secondSortJob,new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(secondSortJob,new Path(otherArgs[1]));
        secondSortJob.setJarByClass(MarkovStateDriver.class);
        secondSortJob.setMapperClass(SecondarySortProjectionMapper.class);
        secondSortJob.setReducerClass(SecondarySortProjectionReducer.class);
        secondSortJob.setMapOutputKeyClass(CompositeKey.class);
        secondSortJob.setMapOutputValueClass(PairOfLongInt.class);
        secondSortJob.setOutputKeyClass(NullWritable.class);
        secondSortJob.setOutputValueClass(Text.class);
        secondSortJob.setCombinerKeyGroupingComparatorClass(CompositeKeyComparator.class);
        secondSortJob.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);
        if (secondSortJob.waitForCompletion(true)) {
            Job stateTransition=new Job(conf,"MarkovStateTransition");
            FileInputFormat.setInputPaths(stateTransition,new Path("output/part-r-00000"));
            FileOutputFormat.setOutputPath(stateTransition,new Path("output2"));
            stateTransition.setJarByClass(MarkovStateDriver.class);
            stateTransition.setMapperClass(StateTrainitionMapper.class);
            stateTransition.setNumReduceTasks(0);
            stateTransition.setOutputKeyClass(Text.class);
            stateTransition.setOutputValueClass(Text.class);
            if (stateTransition.waitForCompletion(true)) {
                Job markovState=new Job(conf,"MarkState");
                markovState.setJarByClass(MarkovStateDriver.class);
                markovState.setMapperClass(MarkovStateTransitionModelMapper.class);
                markovState.setReducerClass(MarkovStateTransitionModelReducer.class);
            //    markovState.setPartitionerClass(MarkovStateTransitionModelPartitioner.class);
            //    markovState.setNumReduceTasks(81);
                markovState.setMapOutputKeyClass(PairOfStrings.class);
                markovState.setMapOutputValueClass(IntWritable.class);
                markovState.setOutputKeyClass(Text.class);
                markovState.setOutputValueClass(IntWritable.class);
            //    markovState.setCombinerKeyGroupingComparatorClass(MarkovStateKeyComparator.class);
                FileInputFormat.setInputPaths(markovState,new Path("output2/part-m-00000"));
                FileOutputFormat.setOutputPath(markovState,new Path("MarkovState"));
                System.exit(markovState.waitForCompletion(true)?0:1);
            }
        }
    }
}

The results of the run are as follows:

[figure: run output]

This gives us the counts of state-transition instances. The next post will cover how to generate the Markov model from them.
