Production Internship Day 4 - Mobile Phone Traffic Analysis Project 1 (Code)

屋顶橙子味cheng

Last modified on 2024-06-19 23:07:55

Tags: hadoop, big data

First published on 2024-06-17 14:34:35

Article link: https://blog.csdn.net/m0_62223331/article/details/139655005

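1. Hadoop Writable interface
Write a Java class that represents one access record and implements Hadoop's Writable interface, so that it can be serialized and deserialized inside a MapReduce job. It holds the phone number, upstream traffic, downstream traffic, and total traffic, and provides constructors and getters for these fields. Two details matter: the no-argument constructor is required because Hadoop creates instances by reflection and then fills them in with readFields, and readFields must read the fields in exactly the order that write wrote them.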

package com.example;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class Access implements Writable {
    private String phone;
    private long upFlow;
    private long downFlow;
    private long sumFlow;

    // No-arg constructor: Hadoop instantiates the class by reflection, then calls readFields
    public Access() {}

    public Access(String phone, long upFlow, long downFlow) {
        this.phone = phone;
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    // Serialize the fields; readFields must read them back in this exact order
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(phone);
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Deserialize the fields in the same order write() wrote them
    @Override
    public void readFields(DataInput in) throws IOException {
        this.phone = in.readUTF();
        this.upFlow = in.readLong();
        this.downFlow = in.readLong();
        this.sumFlow = in.readLong();
    }

    @Override
    public String toString() {
        return phone + "\t" + upFlow + "\t" + downFlow + "\t" + sumFlow;
    }

    public String getPhone() {
        return phone;
    }

    public long getUpFlow() {
        return upFlow;
    }

    public long getDownFlow() {
        return downFlow;
    }

    public long getSumFlow() {
        return sumFlow;
    }
}
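To check that write and readFields stay symmetric without starting a cluster, the sketch below round-trips a record through a byte array using only java.io. AccessRecord and AccessRoundTrip are hypothetical names for this illustration: AccessRecord copies the fields and serialization order of Access but does not implement Writable, so it compiles without Hadoop on the classpath. The phone number and traffic values are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for com.example.Access: same fields and serialization
// order, but no Writable interface, so it compiles without Hadoop
class AccessRecord {
    String phone;
    long upFlow;
    long downFlow;
    long sumFlow;

    AccessRecord() {}

    AccessRecord(String phone, long upFlow, long downFlow) {
        this.phone = phone;
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    // Same order as Access.write
    void write(DataOutput out) throws IOException {
        out.writeUTF(phone);
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Same order as Access.readFields
    void readFields(DataInput in) throws IOException {
        phone = in.readUTF();
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }
}

class AccessRoundTrip {
    // Serialize to bytes, then read them into a fresh no-arg instance,
    // roughly what happens to a map output value before it reaches the reducer
    static AccessRecord roundTrip(AccessRecord original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                original.write(out);
            }
            AccessRecord copy = new AccessRecord();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AccessRecord copy = roundTrip(new AccessRecord("13726230503", 2481, 24681));
        System.out.println(copy.phone + "\t" + copy.upFlow + "\t" + copy.downFlow + "\t" + copy.sumFlow);
    }
}
```

Running main prints the same phone and traffic values; the total, 2481 + 24681 = 27162, is computed once by the constructor and carried through serialization unchanged. Swapping any two reads in readFields would make the copy disagree with the original, which is the kind of bug this check catches.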

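2. Map phase

Define a Mapper class that parses each line of input text into a key-value pair and writes it to the MapReduce context. In the map method, the line is split into whitespace-separated fields; the phone number is taken from the second field, and the upstream and downstream traffic from the third-to-last and second-to-last fields. The result is emitted with the phone number as the key and an Access object as the value.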

package com.example;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

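public class FlowMapper extends Mapper<LongWritable, Text, Text, Access> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Trim first so leading whitespace does not produce an empty fields[0]
        String[] fields = value.toString().trim().split("\\s+");
        // fields[1] is the phone number; the traffic columns are counted from the end of the line
        if (fields.length < 5) {
            return; // skip blank or truncated lines
        }

        String phone = fields[1];
        long upFlow;
        long downFlow;
        try {
            upFlow = Long.parseLong(fields[fields.length - 3]);   // upstream traffic
            downFlow = Long.parseLong(fields[fields.length - 2]); // downstream traffic
        } catch (NumberFormatException e) {
            return; // skip lines whose traffic columns are not numeric (e.g. a header)
        }

        context.write(new Text(phone), new Access(phone, upFlow, downFlow));
    }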
}
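The field extraction in FlowMapper can be exercised on its own. The sketch below is a hypothetical, Hadoop-free helper (FlowLineParser is not part of the project) that applies the same indices as the mapper: the phone number from the second field, and upstream and downstream traffic from the third-to-last and second-to-last fields. It returns Optional.empty() for lines that are too short or not numeric. The sample line is made up in an assumed tab-separated layout, with the last column assumed to be a status code; the real input file is not shown in this post.

```java
import java.util.Optional;

// Hypothetical, Hadoop-free helper mirroring FlowMapper's field extraction
class FlowLineParser {
    // Parsed fields of one input line
    static final class Parsed {
        final String phone;
        final long upFlow;
        final long downFlow;

        Parsed(String phone, long upFlow, long downFlow) {
            this.phone = phone;
            this.upFlow = upFlow;
            this.downFlow = downFlow;
        }
    }

    static Optional<Parsed> parse(String line) {
        // Split on runs of whitespace; empty columns between two tabs collapse away
        String[] fields = line.trim().split("\\s+");
        // Need fields[1] plus the three trailing columns (up, down, last column)
        if (fields.length < 5) {
            return Optional.empty();
        }
        try {
            long up = Long.parseLong(fields[fields.length - 3]);
            long down = Long.parseLong(fields[fields.length - 2]);
            return Optional.of(new Parsed(fields[1], up, down));
        } catch (NumberFormatException e) {
            return Optional.empty(); // e.g. a header line
        }
    }

    public static void main(String[] args) {
        // Made-up sample line; the 24 and 27 columns are not used by the mapper
        String line = "1363157985066\t13726230503\t00-FD-07-A4-72-B8:CMCC\t120.196.100.82"
                + "\ti02.c.aliimg.com\t\t24\t27\t2481\t24681\t200";
        parse(line).ifPresent(p -> System.out.println(p.phone + " up=" + p.upFlow + " down=" + p.downFlow));
        System.out.println(parse("too short").isPresent());
    }
}
```

Because "\\s+" merges the empty column after the host name, the number of fields can vary from line to line; indexing the traffic columns from the end keeps them stable as long as the trailing columns are always present.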

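3. Reduce phase

Define a Reducer class that aggregates all values sharing the same key and writes the result. The key is the phone number and each value is an Access object carrying upstream and downstream traffic. The reduce method iterates over all values, accumulates the upstream and downstream traffic, and computes the total. Finally, it writes the phone number with its total upstream, total downstream, and total traffic as one tab-separated line; because the output key is NullWritable, TextOutputFormat writes only that line, with no key in front of it.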

package com.example;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class FlowReducer extends Reducer<Text, Access, NullWritable, Text> {
    @Override
    protected void reduce(Text key, Iterable<Access> values, Context context) throws IOException, InterruptedException {
        long upSum = 0;
        long downSum = 0;

        // Hadoop reuses one Access instance across iterations, so read the values here
        // instead of keeping references to the objects
        for (Access access : values) {
            upSum += access.getUpFlow();
            downSum += access.getDownFlow();
        }

        long sumFlow = upSum + downSum;
        // Output line: phone \t total up \t total down \t total flow
        context.write(NullWritable.get(), new Text(key.toString() + "\t" + upSum + "\t" + downSum + "\t" + sumFlow));
    }
}
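Between map and reduce, Hadoop groups all map outputs that share a key and hands each reducer its keys in sorted order. The sketch below imitates that shuffle and FlowReducer's summation in memory, as an illustration only: FlowReduceSimulation and its Record type are hypothetical, and a TreeMap stands in for Hadoop's sort (for phone numbers made of ASCII digits, String order and Text byte order agree). The input records are made-up values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the shuffle plus FlowReducer (no Hadoop involved)
class FlowReduceSimulation {
    // One map output record: key = phone, value = its upstream/downstream traffic
    static final class Record {
        final String phone;
        final long up;
        final long down;

        Record(String phone, long up, long down) {
            this.phone = phone;
            this.up = up;
            this.down = down;
        }
    }

    // Groups records by phone (sorted, like the shuffle) and sums traffic like FlowReducer
    static List<String> reduceAll(List<Record> mapOutput) {
        Map<String, long[]> sums = new TreeMap<>(); // phone -> {upSum, downSum}
        for (Record r : mapOutput) {
            long[] s = sums.computeIfAbsent(r.phone, k -> new long[2]);
            s[0] += r.up;
            s[1] += r.down;
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, long[]> e : sums.entrySet()) {
            long up = e.getValue()[0];
            long down = e.getValue()[1];
            // Same tab-separated layout that FlowReducer writes
            lines.add(e.getKey() + "\t" + up + "\t" + down + "\t" + (up + down));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Record> mapOutput = Arrays.asList(
                new Record("13726230503", 2481, 24681),
                new Record("13560439658", 1116, 954),
                new Record("13726230503", 100, 200));
        reduceAll(mapOutput).forEach(System.out::println);
    }
}
```

The two records for 13726230503 collapse into one line with 2581 up, 24881 down and 27462 in total, and 13560439658 comes first because keys are sorted.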

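4. Partitioner

Define a Hadoop MapReduce partitioner that assigns each key-value pair to a partition based on the leading digits of the phone number. The PhonePartitioner class extends Hadoop's Partitioner class and overrides getPartition, which receives the key, the value, and the total number of partitions, and returns the partition number that corresponds to the phone number's prefix.
Partitioning rules:
If the phone number starts with "13", return partition 0.
If the phone number starts with "15", return partition 1.
Otherwise, return partition 2.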

package com.example;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

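public class PhonePartitioner extends Partitioner<Text, Access> {
    // Can return 0, 1 or 2, so the job needs at least 3 reduce tasks (FlowJob sets 3);
    // a value >= numPartitions makes Hadoop throw an "Illegal partition" error
    @Override
    public int getPartition(Text key, Access value, int numPartitions) {
        String phone = key.toString();

        if (phone.startsWith("13")) {
            return 0; // numbers starting with 13
        } else if (phone.startsWith("15")) {
            return 1; // numbers starting with 15
        } else {
            return 2; // all other prefixes
        }
    }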
}
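The partition rule is a pure function of the phone number, so it is easy to check in isolation. PhonePartitionDemo below is a hypothetical, Hadoop-free copy of the rule in PhonePartitioner, plus an optional clamped variant that is not in the original. The clamp addresses one constraint: Hadoop throws an "Illegal partition" error if getPartition returns a value outside [0, numPartitions), so the original rule only works with at least 3 reduce tasks. The phone numbers are made up.

```java
// Hypothetical, Hadoop-free copy of the PhonePartitioner rule
class PhonePartitionDemo {
    // Same prefix rule as PhonePartitioner.getPartition
    static int partitionFor(String phone) {
        if (phone.startsWith("13")) {
            return 0;
        } else if (phone.startsWith("15")) {
            return 1;
        }
        return 2;
    }

    // Optional variant (not in the original): clamp into [0, numPartitions) so that
    // fewer than 3 reduce tasks cannot produce an out-of-range partition.
    // Assumes numPartitions >= 1.
    static int clampedPartitionFor(String phone, int numPartitions) {
        return Math.min(partitionFor(phone), numPartitions - 1);
    }

    public static void main(String[] args) {
        String[] phones = {"13726230503", "15013685858", "18211575961"};
        for (String phone : phones) {
            System.out.println(phone + " -> partition " + partitionFor(phone)
                    + ", clamped for 2 reducers -> " + clampedPartitionFor(phone, 2));
        }
    }
}
```

Clamping sends every prefix that does not fit to the last reducer when fewer than 3 are configured, which mixes categories in one output file; whether that is acceptable depends on the analysis, so treat it as one option rather than a required fix.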

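5. Main

This is the default class generated by an IntelliJ IDEA project template; it is not part of the MapReduce job, whose entry point is FlowJob.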

package com.example;

// IntelliJ IDEA project template class
public class Main {
    public static void main(String[] args) {
        // Print a greeting on its own line
        System.out.println("Hello and welcome!");

        // Print the loop counter from 1 to 5
        for (int i = 1; i <= 5; i++) {
            System.out.println("i = " + i);
        }
    }
}

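6. FlowJob

Write the Hadoop MapReduce driver that processes the phone traffic data. It configures the job, sets the Mapper, Reducer, and Partitioner classes and the number of reduce tasks, declares the map output and final output types, specifies the input and output paths, and launches the job.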

package com.example;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

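public class FlowJob {
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: FlowJob <input path> <output path>");
            System.exit(2);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Phone Flow Job");

        // Classes that make up the job
        job.setJarByClass(FlowJob.class);
        job.setMapperClass(FlowMapper.class);
        job.setReducerClass(FlowReducer.class);
        job.setPartitionerClass(PhonePartitioner.class);
        // One reduce task per partition returned by PhonePartitioner (0, 1, 2)
        job.setNumReduceTasks(3);

        // Map output: phone -> Access
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Access.class);
        // Final output: NullWritable key, so only the Text line is written
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        job.setOutputFormatClass(TextOutputFormat.class);

        // args[0]: input path; args[1]: output directory, which must not already exist
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }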
}
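With setNumReduceTasks(3) and PhonePartitioner, each reduce task writes one file in the output directory; with the new MapReduce API these are named part-r-00000, part-r-00001 and part-r-00002. The sketch below predicts which file a phone number's result line ends up in. FlowJobOutputLayout is a hypothetical helper that re-implements the partition rule instead of calling PhonePartitioner, and the phone numbers are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: predicts the output file for each phone under FlowJob's settings
class FlowJobOutputLayout {
    // Matches job.setNumReduceTasks(3) in FlowJob
    static final int NUM_REDUCE_TASKS = 3;

    // Same prefix rule as PhonePartitioner.getPartition
    static int partitionFor(String phone) {
        if (phone.startsWith("13")) {
            return 0;
        } else if (phone.startsWith("15")) {
            return 1;
        }
        return 2;
    }

    // Reduce task i writes part-r-<i padded to 5 digits>
    static String fileName(int partition) {
        return String.format("part-r-%05d", partition);
    }

    // Maps each output file to the phones whose lines it would contain.
    // Phones are assumed distinct, since the reducer emits one line per phone.
    static Map<String, List<String>> layout(List<String> phones) {
        Map<String, List<String>> files = new TreeMap<>();
        for (int i = 0; i < NUM_REDUCE_TASKS; i++) {
            files.put(fileName(i), new ArrayList<>());
        }
        for (String phone : phones) {
            files.get(fileName(partitionFor(phone))).add(phone);
        }
        // Within one reduce task keys arrive sorted, so each file is sorted by phone
        for (List<String> lines : files.values()) {
            Collections.sort(lines);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> phones = Arrays.asList("15013685858", "13726230503", "18211575961", "13560439658");
        layout(phones).forEach((file, lines) -> System.out.println(file + " -> " + lines));
    }
}
```

With the default, non-lazy output format, each reduce task creates its file even when it receives no keys, so an empty part-r-00002 just means that no phone number fell outside the 13 and 15 prefixes. The job itself would be launched with something like `hadoop jar <your jar> com.example.FlowJob <input> <output>`, where the jar name and paths are placeholders.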
