Write MapReduce jobs over two datasets, one with device id + user id and one with device id + clicked ad + click time, to find each user id's 3 latest ad click records per day

Problem

Write MapReduce jobs over two datasets: one is device id + user id, the other is device id + clicked ad + click time. Find, for each user id, the 3 latest ad click records per day.
File 1: diviceid,userid
File 2: diviceid,adname,cdate

Assume one user id can have multiple device ids, but one device id cannot belong to multiple user ids.

Approach

Since the two datasets have to be joined on diviceid and the clicks then ranked per userid, two MapReduce passes are needed.

The first MapReduce joins on diviceid and produces userid,adname,cdate.

The second MapReduce produces userid,((adname,cdate),(adname,cdate),(adname,cdate)).
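With hypothetical values (note that the generator below writes file 1 as userid,diviceid), the data flows roughly like this:

file 1 (user/device):  83198328,c2eb780d-4754-485a-bbac-143d9fb4cd52
file 2 (device/click): c2eb780d-4754-485a-bbac-143d9fb4cd52,洗脸巾,483
job 1 output (join):   83198328,洗脸巾,483
job 2 output (top 3):  83198328    (洗脸巾,483),(手机5,401),(雨伞,127)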

Data generation

package com.ambitfly.test001;

import java.util.ArrayList;
import java.util.UUID;

/**
 * 1. Write MapReduce jobs over two datasets: one with device id + user id, one with
 *    device id + clicked ad + click time; find each user id's 3 latest ad clicks per day.
 * Generates file 1 as userid,diviceid and file 2 as diviceid,adname,cdate.
 */
public class GenerateData {
    public static void main(String[] args) {
        // Generate some sample data.
        // Device ids and user ids: one user can have several device ids,
        // but a device id is assumed to belong to only one user.
        ArrayList<String> diviceIdList = new ArrayList<String>();
        // Ad names
        String[] ads = {"手机5","手机6","睫毛膏","雨伞","足球","剃须刀","jAVA基础课程","洗脸巾","卫生纸"};

        for (int i = 0; i < 10; i++) {
            int userid = (int)(Math.random()*100000000);
            int num = (int)(Math.random()*10);
            int divicenum = num > 8 ? 2 : 1;
            while (divicenum >= 1){
                String diviceId = UUID.randomUUID().toString();
                System.out.println(userid+","+diviceId);
                diviceIdList.add(diviceId);
                divicenum--;
            }
        }
        // Two blank lines separate the two datasets; save each block to its own file
        // (the user/device file must have "user" in its name for the first job).
        System.out.println();
        System.out.println();
        // Random ad clicks for each device id
        for (String diviceId : diviceIdList) {
            int num = (int)(Math.random()*10);
            while (num >= 1){
                String ad = ads[(int)(Math.random()*9)];
                int time = (int)(Math.random()*1000);
                System.out.println(diviceId+","+ad+","+time);
                num--;
            }
        }
    }
}
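GenerateData prints both datasets to stdout, separated by two blank lines, so they still have to be split into two input files by hand. A minimal sketch that writes them straight to two hypothetical files instead (the paths below are assumptions; the user file keeps "user" in its name because the first mapper relies on that):

package com.ambitfly.test001;

import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.UUID;

// A minimal sketch, assuming the hypothetical paths input/user.txt and input/click.txt
// (the input directory must already exist). Same generation logic as GenerateData.
public class GenerateDataToFiles {
    public static void main(String[] args) throws Exception {
        String[] ads = {"手机5","手机6","睫毛膏","雨伞","足球","剃须刀","jAVA基础课程","洗脸巾","卫生纸"};
        ArrayList<String> diviceIdList = new ArrayList<String>();

        try (PrintWriter userOut = new PrintWriter("input/user.txt");      // userid,diviceid
             PrintWriter clickOut = new PrintWriter("input/click.txt")) {  // diviceid,adname,cdate
            for (int i = 0; i < 10; i++) {
                int userid = (int) (Math.random() * 100000000);
                int num = (int) (Math.random() * 10);
                int divicenum = num > 8 ? 2 : 1;
                while (divicenum >= 1) {
                    String diviceId = UUID.randomUUID().toString();
                    userOut.println(userid + "," + diviceId);
                    diviceIdList.add(diviceId);
                    divicenum--;
                }
            }
            for (String diviceId : diviceIdList) {
                int num = (int) (Math.random() * 10);
                while (num >= 1) {
                    clickOut.println(diviceId + "," + ads[(int) (Math.random() * 9)] + "," + (int) (Math.random() * 1000));
                    num--;
                }
            }
        }
    }
}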

First MapReduce

The mapper reads the file name of its input split to tell the two datasets apart and tags each output record accordingly (the user/device file's name must therefore contain the substring "user").

The reducer uses the tags to join the two sides on diviceid and writes userid,adname,cdate.
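For one device id, the reducer of job 1 therefore sees something like the following (hypothetical values) and swaps the device id for the user id on output:

key:    c2eb780d-4754-485a-bbac-143d9fb4cd52
values: user_divice:83198328
        divice_click:洗脸巾,483
        divice_click:手机5,401
output: 83198328,洗脸巾,483
        83198328,手机5,401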

UserClickTop10OneMapper

package com.ambitfly.test001;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

import java.io.IOException;

public class UserClickTop10OneMapper extends Mapper<LongWritable, Text,Text,Text> {
    private String filename;
    Text keyOut = new Text();
    Text valueOut = new Text();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Get the name of the file this split belongs to
        InputSplit inputSplit = context.getInputSplit();
        filename = ((FileSplit)inputSplit).getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        if(filename.contains("user")){
            // User/device file: key = diviceid, value = tagged userid
            String[] user_divice = line.split(",");
            // 83198328,c2eb780d-4754-485a-bbac-143d9fb4cd52
            keyOut.set(user_divice[1]);
            valueOut.set("user_divice"+":"+user_divice[0]);
            context.write(keyOut,valueOut);
        }else {
            // Click file: key = diviceid, value = tagged "adname,cdate"
            String[] divice_click = line.split(",");
            // c2eb780d-4754-485a-bbac-143d9fb4cd52,洗脸巾,483
            keyOut.set(divice_click[0]);
            valueOut.set("divice_click"+":"+divice_click[1]+","+divice_click[2]);
            context.write(keyOut,valueOut);

        }

    }
}

UserClickTop10OneReducer

package com.ambitfly.test001;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.ArrayList;

public class UserClickTop10OneReducer extends Reducer<Text,Text,Text, NullWritable> {
    Text keyOut = new Text();
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // Collect the userid and all click records that share this device id
        String userid = "null";
        ArrayList<String> divice_clickList = new ArrayList<String>();
        for (Text value : values) {
            String line = value.toString();
            if(line.startsWith("user_divice")){
                userid = line.replace("user_divice:", "");
            }else {
                divice_clickList.add(line.replace("divice_click:", ""));
            }
        }

        // Emit one joined record per click: userid,adname,cdate
        for (String line : divice_clickList) {
            keyOut.set(userid+","+line);
            context.write(keyOut,NullWritable.get());
        }


    }
}

UserClickTop10OneDriver

package com.ambitfly.test001;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.File;
import java.io.IOException;

public class UserClickTop10OneDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(UserClickTop10OneDriver.class);

        job.setMapperClass(UserClickTop10OneMapper.class);
        job.setReducerClass(UserClickTop10OneReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);


        FileInputFormat.addInputPath(job, new Path("E:\\code\\hadoop3\\target\\classes\\data\\test001\\input"));

        // Delete the local output directory if it already exists (local-mode convenience)
        File file = new File("E:\\code\\hadoop3\\target\\classes\\data\\test001\\output1");
        if(file.exists()){
            for (File listFile : file.listFiles()) {
                listFile.delete();
            }
            file.delete();
        }
        FileOutputFormat.setOutputPath(job, new Path("E:\\code\\hadoop3\\target\\classes\\data\\test001\\output1"));
        // Submit the job and wait for completion
        boolean b = job.waitForCompletion(true);
        System.exit(b?0:1);

    }
}

Second MapReduce

Its input is the output of the first MapReduce.

The mapper emits userid as the key and adname,cdate as the value.

The reducer selects the top N with a fixed-length array and writes the result.
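For one userid, the reducer of job 2 receives all of that user's adname,cdate pairs and keeps the 3 with the largest cdate (hypothetical values):

key:    83198328
values: 洗脸巾,483
        手机5,401
        雨伞,127
        足球,88
output: 83198328    (洗脸巾,483),(手机5,401),(雨伞,127)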

UserClickTop10TwoMapper

package com.ambitfly.test001;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class UserClickTop10TwoMapper extends Mapper<LongWritable, Text,Text,Text> {
    Text keyOut = new Text();
    Text valueOut = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        String line = value.toString();
        String[] user_click = line.split(",");
        keyOut.set(user_click[0]);
        valueOut.set(user_click[1]+","+user_click[2]);
        context.write(keyOut, valueOut);
    }
}

Test program: top N with a fixed-length array

package com.ambitfly.test001;

import java.util.Arrays;

public class TopNTest {
    public static void main(String[] args) {
        int topn = 3;
        Integer[] top3 = new Integer[topn];

        Integer[] arr = {3,4,6,8,10,5,7,20,34,9,18,14};
        for (Integer num : arr) {
            // Walk the slots from the largest down; insert num at the first slot
            // it beats, shifting smaller entries one position to the right.
            for (int i = 0; i < topn; i++) {
                if (top3[i] == null) {
                    top3[i] = num;
                    break;
                } else if (num > top3[i]) {
                    for (int j = topn - 1; j > i; j--) {
                        top3[j] = top3[j - 1];
                    }
                    top3[i] = num;
                    break;
                }
            }
        }

        System.out.println(Arrays.toString(top3));


    }
}
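For comparison, the same top-N selection can be sketched with a min-heap (java.util.PriorityQueue) instead of a fixed-length array. This is only an alternative illustration, not what the reducer below uses:

package com.ambitfly.test001;

import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical alternative sketch: keep the N largest values with a min-heap.
public class TopNHeapTest {
    public static void main(String[] args) {
        int topn = 3;
        // Min-heap: the smallest of the kept values sits at the head and is evicted first.
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(topn);

        Integer[] arr = {3, 4, 6, 8, 10, 5, 7, 20, 34, 9, 18, 14};
        for (Integer num : arr) {
            if (heap.size() < topn) {
                heap.offer(num);
            } else if (num > heap.peek()) {
                heap.poll();      // drop the current smallest kept value
                heap.offer(num);  // keep the larger one
            }
        }

        // Drain the heap into descending order for printing.
        Integer[] top3 = new Integer[heap.size()];
        for (int i = top3.length - 1; i >= 0; i--) {
            top3[i] = heap.poll();
        }
        System.out.println(Arrays.toString(top3)); // expected: [34, 20, 18]
    }
}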

UserClickTop10TwoReducer

package com.ambitfly.test001;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;


import java.io.IOException;

public class UserClickTop10TwoReducer extends Reducer<Text, Text,Text,Text> {
    Text valueOut = new Text();
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        int topn = 3;
        String[] top3 = new String[topn];
        for (Text value : values) {
            // Each value looks like "adname,cdate", e.g. "jAVA基础课程,836"
            int time2 = Integer.parseInt(value.toString().split(",")[1].trim());
            // Same fixed-length-array insertion as in TopNTest, keeping the 3 latest clicks
            for (int i = 0; i < topn; i++) {
                if (top3[i] == null) {
                    top3[i] = value.toString();
                    break;
                } else {
                    int time1 = Integer.parseInt(top3[i].split(",")[1].trim());
                    if (time2 > time1) {
                        for (int j = topn - 1; j > i; j--) {
                            top3[j] = top3[j - 1];
                        }
                        top3[i] = value.toString();
                        break;
                    }
                }
            }
        }
        // Format the result as (adname,cdate),(adname,cdate),... and skip unused slots
        // when a user has fewer than 3 clicks
        StringBuilder valueO = new StringBuilder();
        for (String value : top3) {
            if (value == null) {
                continue;
            }
            valueO.append("(");
            valueO.append(value);
            valueO.append("),");
        }

        // Drop the trailing comma
        valueO.delete(valueO.length()-1,valueO.length());

        valueOut.set(valueO.toString());
        context.write(key,valueOut);

    }
}

UserClickTop10TwoDriver

package com.ambitfly.test001;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.File;
import java.io.IOException;

public class UserClickTop10TwoDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(UserClickTop10TwoDriver.class);

        job.setMapperClass(UserClickTop10TwoMapper.class);
        job.setReducerClass(UserClickTop10TwoReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);

        FileInputFormat.addInputPath(job, new Path("E:\\code\\hadoop3\\target\\classes\\data\\test001\\output1"));

        // Delete the local output directory if it already exists (local-mode convenience)
        File file = new File("E:\\code\\hadoop3\\target\\classes\\data\\test001\\output2");
        if(file.exists()){
            for (File listFile : file.listFiles()) {
                listFile.delete();
            }
            file.delete();
        }
        FileOutputFormat.setOutputPath(job, new Path("E:\\code\\hadoop3\\target\\classes\\data\\test001\\output2"));
        // Submit the job and wait for completion
        boolean b = job.waitForCompletion(true);
        System.exit(b?0:1);
    }
}
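Here the two drivers are run by hand, one after the other. A minimal sketch of chaining them in a single, hypothetical driver (assuming the same local paths as above) only starts the top-N job once the join job has succeeded:

package com.ambitfly.test001;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// A minimal sketch, using the same mapper/reducer settings as the two drivers above.
// Note: output1/output2 must not already exist (or delete them first as shown above).
public class UserClickTop10ChainDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String base = "E:\\code\\hadoop3\\target\\classes\\data\\test001";

        Job job1 = Job.getInstance(conf, "join");
        job1.setJarByClass(UserClickTop10ChainDriver.class);
        job1.setMapperClass(UserClickTop10OneMapper.class);
        job1.setReducerClass(UserClickTop10OneReducer.class);
        job1.setMapOutputKeyClass(Text.class);
        job1.setMapOutputValueClass(Text.class);
        job1.setOutputKeyClass(Text.class);
        job1.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job1, new Path(base + "\\input"));
        FileOutputFormat.setOutputPath(job1, new Path(base + "\\output1"));

        // Only start the top-N job if the join job finished successfully.
        if (!job1.waitForCompletion(true)) {
            System.exit(1);
        }

        Job job2 = Job.getInstance(conf, "topn");
        job2.setJarByClass(UserClickTop10ChainDriver.class);
        job2.setMapperClass(UserClickTop10TwoMapper.class);
        job2.setReducerClass(UserClickTop10TwoReducer.class);
        job2.setMapOutputKeyClass(Text.class);
        job2.setMapOutputValueClass(Text.class);
        job2.setOutputKeyClass(Text.class);
        job2.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job2, new Path(base + "\\output1"));
        FileOutputFormat.setOutputPath(job2, new Path(base + "\\output2"));

        System.exit(job2.waitForCompletion(true) ? 0 : 1);
    }
}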
