Production Internship Day 4 - Mobile Traffic Analysis Project 1 (Code)

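1. Hadoop's Writable interface
This section defines a Java class that represents one access record and implements Hadoop's Writable interface, so that it can be serialized and deserialized in a MapReduce job. The class holds the phone number, upstream traffic, downstream traffic, and total traffic, and provides constructors and getters for these fields. Hadoop creates instances by reflection during deserialization, which is why the no-argument constructor is required.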

package com.example;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class Access implements Writable {
    private String phone;
    private long upFlow;
    private long downFlow;
    private long sumFlow;

    public Access() {}

    public Access(String phone, long upFlow, long downFlow) {
        this.phone = phone;
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

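    // Serialization: the order of the fields here defines the wire format.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(phone);
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Deserialization: fields must be read in exactly the order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        this.phone = in.readUTF();
        this.upFlow = in.readLong();
        this.downFlow = in.readLong();
        this.sumFlow = in.readLong();
    }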

    @Override
    public String toString() {
        return phone + "\t" + upFlow + "\t" + downFlow + "\t" + sumFlow;
    }

    public String getPhone() {
        return phone;
    }

    public long getUpFlow() {
        return upFlow;
    }

    public long getDownFlow() {
        return downFlow;
    }

    public long getSumFlow() {
        return sumFlow;
    }
}
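
To see why the field order in write and readFields has to match, the round trip can be tried outside Hadoop. The following is a minimal sketch, not the project's code: AccessRoundTrip is a hypothetical stand-in for Access that uses the JDK's DataOutputStream and DataInputStream directly, so it runs without the Hadoop libraries, and the phone number is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for Access: same four fields and the same
// write/read order, but with no Hadoop dependency.
final class AccessRoundTrip {
    String phone;
    long upFlow;
    long downFlow;
    long sumFlow;

    // Mirrors Access.write: phone first, then the three long values.
    void write(DataOutputStream out) throws IOException {
        out.writeUTF(phone);
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Mirrors Access.readFields: reads in exactly the order written.
    void readFields(DataInputStream in) throws IOException {
        phone = in.readUTF();
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    // Serializes one record to a byte array and reads it back into a new object.
    static AccessRoundTrip roundTrip(String phone, long up, long down) {
        AccessRoundTrip original = new AccessRoundTrip();
        original.phone = phone;
        original.upFlow = up;
        original.downFlow = down;
        original.sumFlow = up + down;
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                original.write(out);
            }
            AccessRoundTrip copy = new AccessRoundTrip();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling AccessRoundTrip.roundTrip("13800000000", 100, 200) should return a copy whose fields equal those of the original. If readFields read the long values before the string, the bytes would be interpreted incorrectly.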

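2. Map phase

Define a Mapper class that parses each line of input text into a key-value pair and writes it to the MapReduce context. The map method splits the line on whitespace, then extracts the phone number (the second field), the upstream traffic (the third field from the end), and the downstream traffic (the second field from the end). It emits the phone number as the key and an Access object as the value.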

package com.example;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class FlowMapper extends Mapper<LongWritable, Text, Text, Access> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
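        // Split the line on runs of whitespace (tabs or spaces).
        String[] fields = value.toString().split("\\s+");
        // The phone number is the second field.
        String phone = fields[1];
        // Up and down traffic are indexed from the end of the line,
        // so a varying number of middle fields does not shift them.
        long upFlow = Long.parseLong(fields[fields.length - 3]);
        long downFlow = Long.parseLong(fields[fields.length - 2]);

        // Emit phone -> Access; the Access constructor computes the total.
        context.write(new Text(phone), new Access(phone, upFlow, downFlow));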
    }
}
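
The field extraction in map can be checked on its own with plain Java. The sketch below is illustrative only: FlowLineParser is a hypothetical helper, and the sample line in the usage note is made up (the real log layout is not shown in this post); only the indexing rules are taken from FlowMapper.

```java
// Hypothetical helper that applies the same indexing rules as FlowMapper.map.
final class FlowLineParser {
    // Splits a log line on runs of whitespace, as the mapper does.
    private static String[] split(String line) {
        return line.split("\\s+");
    }

    // The phone number is the second field.
    static String phoneOf(String line) {
        return split(line)[1];
    }

    // Upstream traffic is the third field from the end.
    static long upFlowOf(String line) {
        String[] fields = split(line);
        return Long.parseLong(fields[fields.length - 3]);
    }

    // Downstream traffic is the second field from the end.
    static long downFlowOf(String line) {
        String[] fields = split(line);
        return Long.parseLong(fields[fields.length - 2]);
    }
}
```

For an assumed line such as "1700000000000\t13812345678\t192.168.0.1\texample.com\t1200\t3400\t200", phoneOf gives 13812345678, upFlowOf gives 1200, and downFlowOf gives 3400. A line with too few fields or non-numeric traffic values would throw an exception, both here and in the mapper.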

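3. Reduce phase

Define a Reducer class that aggregates all values sharing the same key and outputs the result. The key is the phone number, and each value is an Access object carrying the upstream and downstream traffic. The reduce method iterates over the values, accumulates the upstream and downstream traffic, and computes the total. It then writes one tab-separated line containing the phone number, total upstream, total downstream, and total traffic to the output context. Because the output key is NullWritable, only that line appears in the output file.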

package com.example;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class FlowReducer extends Reducer<Text, Access, NullWritable, Text> {
    @Override
    protected void reduce(Text key, Iterable<Access> values, Context context) throws IOException, InterruptedException {
        long upSum = 0;
        long downSum = 0;

        for (Access access : values) {
            upSum += access.getUpFlow();
            downSum += access.getDownFlow();
        }

        long sumFlow = upSum + downSum;
        context.write(NullWritable.get(), new Text(key.toString() + "\t" + upSum + "\t" + downSum + "\t" + sumFlow));
    }
}
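
The accumulation in reduce can be reproduced with plain Java to check the output format. This is a minimal sketch under assumptions: FlowReduceSketch is a hypothetical class, and each access is represented as a two-element array {upFlow, downFlow} instead of an Access object.

```java
// Hypothetical stand-in for the body of FlowReducer.reduce.
final class FlowReduceSketch {
    // records[i] = {upFlow, downFlow} for one access by the same phone number.
    static String summarize(String phone, long[][] records) {
        long upSum = 0;
        long downSum = 0;
        for (long[] record : records) {
            upSum += record[0];
            downSum += record[1];
        }
        long sumFlow = upSum + downSum;
        // Same tab-separated layout as the reducer's output line.
        return phone + "\t" + upSum + "\t" + downSum + "\t" + sumFlow;
    }
}
```

With two made-up records {100, 200} and {50, 25} for the phone number 13812345678, summarize returns the phone number followed by 150, 225, and 375, separated by tabs.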

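4. Partitioner

Define a Hadoop MapReduce partitioner that assigns key-value pairs to partitions based on the leading digits of the phone number. The PhonePartitioner class extends Hadoop's Partitioner class and overrides getPartition to define the partitioning logic. The method receives the key, the value, and the total number of partitions, and returns the partition number that matches the phone number's prefix.
Result:
If the phone number starts with "13", it returns partition 0.
If the phone number starts with "15", it returns partition 1.
Otherwise, it returns partition 2.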

package com.example;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class PhonePartitioner extends Partitioner<Text, Access> {
    @Override
    public int getPartition(Text key, Access value, int numPartitions) {
        String phone = key.toString();

        if (phone.startsWith("13")) {
            return 0;
        } else if (phone.startsWith("15")) {
            return 1;
        } else {
            return 2;
        }
    }
}
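
The prefix rules can be tried without a cluster. The following sketch copies the branching from getPartition into a hypothetical plain-Java class, PartitionSketch; the phone numbers in the usage note are made up.

```java
// Hypothetical plain-Java copy of the branching in PhonePartitioner.getPartition.
final class PartitionSketch {
    static int partitionFor(String phone) {
        if (phone.startsWith("13")) {
            return 0;   // numbers beginning with 13
        } else if (phone.startsWith("15")) {
            return 1;   // numbers beginning with 15
        } else {
            return 2;   // every other prefix
        }
    }
}
```

Here partitionFor("13812345678") is 0, partitionFor("15912345678") is 1, and partitionFor("18612345678") is 2. Since the method can return 2, the job needs at least three reduce tasks, which matches the setNumReduceTasks(3) call in FlowJob.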

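5. Main

This is the default Main class that IntelliJ IDEA generates when the project is created. The MapReduce job does not use it; the job's entry point is FlowJob in section 6.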

package com.example;

// Press Shift twice to open the Search Everywhere dialog and type `show whitespaces`,
// then press Enter. You can now see whitespace characters in your code.
public class Main {
    public static void main(String[] args) {
        // Press Alt+Enter with your caret at the highlighted text to see how
        // IntelliJ IDEA suggests fixing it.
        System.out.printf("Hello and welcome!");

        // Press Shift+F10 or click the green arrow button in the gutter to run the code.
        for (int i = 1; i <= 5; i++) {

            // Press Shift+F9 to start debugging your code. We have set one breakpoint
            // for you, but you can always add more by pressing Ctrl+F8.
            System.out.println("i = " + i);
        }
    }
}

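6. FlowJob

Write the Hadoop MapReduce job driver that processes the phone data. It configures the job parameters, sets the Mapper, Reducer, and Partitioner classes, specifies the input and output paths, and starts the job.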

package com.example;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class FlowJob {
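    public static void main(String[] args) throws Exception {
        // Expect exactly two arguments: the input path and the output path.
        if (args.length != 2) {
            System.err.println("Usage: FlowJob <input path> <output path>");
            System.exit(2);
        }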
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Phone Flow Job");

        job.setJarByClass(FlowJob.class);
        job.setMapperClass(FlowMapper.class);
        job.setReducerClass(FlowReducer.class);
        job.setPartitionerClass(PhonePartitioner.class);
        job.setNumReduceTasks(3);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Access.class);
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
