MapReduce: Smart Email Marketing with a Markov Model (Part 3)

路人张的鱼生

Article link: https://blog.csdn.net/zhangdy12307/article/details/100564532

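Continuing from the previous post, MapReduce: Smart Email Marketing with a Markov Model (Part 2), we take the already-processed data shown in the figure below and generate the state sequences we need.
[Figure: the processed data]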

Generating the State Sequence with MapReduce

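This section shows how to convert the transaction sequence produced by the previous MapReduce stage into a state sequence.
Transaction sequence:

customer-id ($Date_1$, $Amount_1$); ($Date_2$, $Amount_2$); …; ($Date_N$, $Amount_N$)

State sequence:
customer-id, $State_1, State_2, ..., State_N$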

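The mapper of the MapReduce job converts each transaction sequence into a chain of Markov states. Each state is named by a two-letter code, and the letters are defined in Table 1.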

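Table 1: Letters indicating Markov chain states

Time since the previous transaction	Amount compared with the previous transaction
S: small	L: significantly less
M: medium	E: about the same
L: large	G: significantly greater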

Combining one letter from each column gives 9 states, listed in Table 2.

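Table 2: Two-letter Markov chain state names and definitions

State name	Time since the previous transaction: amount compared with the previous transaction
SL	small: significantly less
SE	small: about the same
SG	small: significantly greater
ML	medium: significantly less
ME	medium: about the same
MG	medium: significantly greater
LL	large: significantly less
LE	large: about the same
LG	large: significantly greater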

The Markov model therefore has 9 states, which gives a $9 \times 9$ transition matrix.
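To make the state space concrete, here is a minimal sketch that enumerates the 9 states from the two letter sets in Table 1 and maps a state name to a row or column index of the $9 \times 9$ matrix. The class name `MarkovStates`, its method names, and the row-major ordering (time letter first, then amount letter) are assumptions made for illustration; they are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class MarkovStates {
    // Letters from Table 1: time since the previous transaction
    static final String TIME_LETTERS = "SML";
    // Letters from Table 1: amount compared with the previous transaction
    static final String AMOUNT_LETTERS = "LEG";

    // Lists all two-letter states in the order used by Table 2.
    static List<String> allStates() {
        List<String> states = new ArrayList<>();
        for (char t : TIME_LETTERS.toCharArray()) {
            for (char a : AMOUNT_LETTERS.toCharArray()) {
                states.add("" + t + a);
            }
        }
        return states;
    }

    // Maps a state name to an index in [0, 8]; the ordering is an assumption.
    static int indexOf(String state) {
        if (state == null || state.length() != 2) {
            throw new IllegalArgumentException("Not a two-letter state: " + state);
        }
        int t = TIME_LETTERS.indexOf(state.charAt(0));
        int a = AMOUNT_LETTERS.indexOf(state.charAt(1));
        if (t < 0 || a < 0) {
            throw new IllegalArgumentException("Unknown state: " + state);
        }
        return t * AMOUNT_LETTERS.length() + a;
    }

    public static void main(String[] args) {
        System.out.println(allStates());   // [SL, SE, SG, ML, ME, MG, LL, LE, LG]
        System.out.println(indexOf("MG")); // 5
    }
}
```

Any fixed ordering works, as long as the same one is used for both the rows and the columns of the transition matrix.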

Mapper task

The mapper processes the data produced by the previous MapReduce stage and generates the state sequences.

Mapper code

package com.deng.MarkovState;

import com.deng.util.DateUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class StateTrainitionMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    // Number of milliseconds in one day
    private static final long A_DAY = 24 * 60 * 60 * 1000L;

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input line: customerID,date1,amount1,date2,amount2,...
        String[] tokens = value.toString().split(",");
        // At least two transactions are needed to form one state
        if (tokens.length < 5) {
            return;
        }
        String customerID = tokens[0];
        StringBuilder sequence = new StringBuilder();
        // tokens[i] is the current amount and tokens[i-1] its date;
        // tokens[i-2] is the prior amount and tokens[i-3] its date
        for (int i = 4; i < tokens.length; i += 2) {
            int amount, priorAmount;
            long date, priorDate;
            try {
                amount = Integer.parseInt(tokens[i]);
                priorAmount = Integer.parseInt(tokens[i - 2]);
                date = DateUtil.getDateAsMilliSeconds(tokens[i - 1]);
                priorDate = DateUtil.getDateAsMilliSeconds(tokens[i - 3]);
            } catch (Exception e) {
                // Skip a malformed record instead of emitting states from bad values
                e.printStackTrace();
                return;
            }
            long daysDiff = (date - priorDate) / A_DAY;
            // Time since the previous transaction:
            // fewer than 30 days is small, fewer than 60 days is medium, otherwise large
            String dd;
            if (daysDiff < 30) {
                dd = "S";
            } else if (daysDiff < 60) {
                dd = "M";
            } else {
                dd = "L";
            }
            // Amount compared with the previous transaction, as defined in Table 1:
            // below 90% of the prior amount is significantly less,
            // below 110% is about the same, otherwise significantly greater
            String ad;
            if (amount < 0.9 * priorAmount) {
                ad = "L";
            } else if (amount < 1.1 * priorAmount) {
                ad = "E";
            } else {
                ad = "G";
            }
            sequence.append(dd).append(ad).append(",");
        }
        // Drop the trailing comma
        sequence.deleteCharAt(sequence.length() - 1);
        context.write(NullWritable.get(), new Text(customerID + "," + sequence));
    }
}
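The conversion can be tried without a Hadoop cluster. The sketch below applies the thresholds from the mapper (30 and 60 days, 90% and 110%) and the letter definitions from Table 1 to a single input line. It is an illustration under assumptions: the class `StateSequenceSketch`, the sample line, and the ISO `yyyy-MM-dd` date format are all made up here, and `java.time` stands in for the author's `DateUtil`, whose actual date format is not shown in this post.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Optional;

// Hypothetical stand-alone sketch, not part of the original project.
final class StateSequenceSketch {
    // Time since the previous transaction: S (small), M (medium), L (large)
    static String timeLetter(long days) {
        if (days < 30) return "S";
        if (days < 60) return "M";
        return "L";
    }

    // Amount compared with the previous transaction: L (less), E (same), G (greater)
    static String amountLetter(int priorAmount, int amount) {
        if (amount < 0.9 * priorAmount) return "L";
        if (amount < 1.1 * priorAmount) return "E";
        return "G";
    }

    // Converts "id,date1,amount1,date2,amount2,..." into "id,state1,state2,...".
    // Returns empty when the line holds fewer than two transactions.
    static Optional<String> toStateSequence(String line) {
        String[] t = line.split(",");
        if (t.length < 5) {
            return Optional.empty();
        }
        StringBuilder out = new StringBuilder(t[0]);
        for (int i = 4; i < t.length; i += 2) {
            // Assumed ISO dates; the real input format may differ
            long days = ChronoUnit.DAYS.between(LocalDate.parse(t[i - 3]), LocalDate.parse(t[i - 1]));
            int priorAmount = Integer.parseInt(t[i - 2]);
            int amount = Integer.parseInt(t[i]);
            out.append(',').append(timeLetter(days)).append(amountLetter(priorAmount, amount));
        }
        return Optional.of(out.toString());
    }

    public static void main(String[] args) {
        // Made-up sample: gaps of 14, 45 and 92 days; amounts 100 -> 50 -> 52 -> 120
        String line = "C001,2013-01-01,100,2013-01-15,50,2013-03-01,52,2013-06-01,120";
        System.out.println(toStateSequence(line).orElse("skipped")); // C001,SL,ME,LG
    }
}
```

A customer with N transactions yields N-1 states, because each state describes the step from one transaction to the next.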

Reducer task

This stage can be completed entirely in the mapper, so no reducer is needed.

The driver is as follows

package com.deng.MarkovState;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MarkovStateDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The secondary-sort job from the previous post must already have run:
        // its result, output/part-r-00000, is the input of this job.
        // The output directory output2 must not exist when the job starts.
        Job stateTransition = Job.getInstance(conf, "MarkovStateTransition");
        FileInputFormat.setInputPaths(stateTransition, new Path("output/part-r-00000"));
        FileOutputFormat.setOutputPath(stateTransition, new Path("output2"));
        stateTransition.setJarByClass(MarkovStateDriver.class);
        stateTransition.setMapperClass(StateTrainitionMapper.class);
        // Map-only job: the mapper output is written directly to output2
        stateTransition.setNumReduceTasks(0);
        // Output types match what the mapper emits
        stateTransition.setOutputKeyClass(NullWritable.class);
        stateTransition.setOutputValueClass(Text.class);
        System.exit(stateTransition.waitForCompletion(true) ? 0 : 1);
    }
}

The output is as follows:
[Figure: output of the job]
The next post will show how to generate the Markov state transition matrix with MapReduce.