1. MapReduce Programming Conventions

Mapper phase
  extends Mapper<LongWritable, Text, Text, IntWritable>
    LongWritable, Text: input K1/V1
    Text, IntWritable: output K2/V2
  map()
    implements our business logic
    runs inside a MapTask
    called once per K1/V1 pair
Reducer phase
  extends Reducer<Text, IntWritable, Text, IntWritable>
    Text, IntWritable: Reducer input types = Mapper output types
    Text, IntWritable: Reducer output types
  reduce()
    completes the business logic
    runs inside a ReduceTask
    called once per distinct key
Driver
  wires the job together
    the main class to run
    the Mapper/Reducer classes
    the Map output K/V types
    the Reduce output K/V types
    the input and output: the output path must be configured and must not already exist (a minimal skeleton following these conventions is sketched below)
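As a point of reference, here is a minimal skeleton that follows the conventions above end to end (a hypothetical word-count job; the class name and argument handling are made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

public class WordCountDriver {
    public static class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final Text word = new Text();
        private static final IntWritable ONE = new IntWritable(1);
        @Override   // called once per input K1/V1 (offset, line)
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String w : value.toString().split("\\s+")) {
                if (w.isEmpty()) continue;
                word.set(w);
                context.write(word, ONE);           // emit K2/V2
            }
        }
    }

    public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override   // called once per distinct key
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(WordCountDriver.class);                // main class
        job.setMapperClass(WcMapper.class);                      // Mapper/Reducer
        job.setReducerClass(WcReducer.class);
        job.setMapOutputKeyClass(Text.class);                    // Map output K/V types
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);                       // Reduce output K/V types
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));   // input
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output: must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}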
2. Serialization

XXXWritable, Text: serializable types
Serialization matters in Hadoop, Spark, Flink alike:
  serialization: in-memory object ==> byte array
  deserialization: byte array ==> in-memory object
A distributed compute framework needs serialization/deserialization for network transfer.
Java has its own serialization (java.io.Serializable);
Hadoop implements its own serialization through Writable:
  compact, fast, extensible, interoperable
Requirement: the key/value we need is not covered by Hadoop's built-in types
  ==> write a custom serializable class
public interface WritableComparable<T> extends Writable, Comparable<T> {
  Writable is the top-level interface for custom serializable classes
Requirement: per phone number, compute the upstream and downstream traffic and the total traffic
  ==>
  phone number: the 2nd field
  upstream traffic: the 3rd field from the end
  downstream traffic: the 2nd field from the end
  total traffic: upstream + downstream
Mapper: <LongWritable, Text, Text, Access>
  Access: phone, up, down, sum
Reducer: <Text, Access, NullWritable, Access>
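For orientation, the field positions described above assume log lines shaped roughly like the made-up example below (not the actual dataset; only the 2nd field and the last three fields matter here):

1363157985066  13726230503  00-FD-07-A4-72-B8:CMCC  120.196.100.82  ...  2481  24681  200
  2nd field = phone (13726230503), 3rd from the end = up (2481), 2nd from the end = down (24681), total = 2481 + 24681 = 27162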
java.lang.NoSuchMethodException:
  com.ruozedata.bigdata.hadoop.mapreduce.ser.Access.<init>()
  a constructor problem: the class must have a no-arg constructor
Steps to implement a custom serializable class
  1) implements Writable
  2) keep a no-arg constructor
  3) implement write() and readFields()
  4) the order the fields are written out must match the order they are read back in (see the round-trip check sketched below)
  5) optional: toString()
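Step 4) can be verified with a quick local round trip: write the object to a buffer, read it back into a fresh instance, and compare. A minimal sketch, assuming the Access class shown further down in these notes (and that the test class sits in the same package):

import org.apache.hadoop.io.DataInputBuffer;
import org.apache.hadoop.io.DataOutputBuffer;

public class AccessRoundTrip {
    public static void main(String[] args) throws Exception {
        Access original = new Access("13800000000", 100L, 200L);

        // serialization: in-memory object ==> byte array
        DataOutputBuffer out = new DataOutputBuffer();
        original.write(out);

        // deserialization: byte array ==> in-memory object;
        // readFields() must consume the fields in exactly the order write() produced them
        DataInputBuffer in = new DataInputBuffer();
        in.reset(out.getData(), out.getLength());
        Access copy = new Access();   // this is why the no-arg constructor is required
        copy.readFields(in);

        System.out.println(copy);     // phone \t up \t down \t sum
    }
}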
Tracing the job-submission code: how the input is turned into InputSplits.

int maps = writeSplits(job, submitJobDir);
InputFormat<?, ?> input =
    ReflectionUtils.newInstance(job.getInputFormatClass(), conf);
At this point the concrete InputFormat in use is already known.
In MapReduce, reading data always goes through an InputFormat --
it is just some subclass of InputFormat.
One InputSplit is processed by one Mapper; by default a split corresponds to one Block.
  e.g. 200M ==> 128M + 72M
long minSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(job));
  // defaults to 1
long maxSize = getMaxSplitSize(job);
  // defaults to Long.MAX_VALUE
isSplitable: can the input file be split?
  splittable:     200M ==> 128M + 72M ==> 2 MapTasks
  not splittable: 200M ==> 1 MapTask
long blockSize = file.getBlockSize(); // 128M
long splitSize = computeSplitSize(blockSize, minSize, maxSize);
  return Math.max(minSize, Math.min(maxSize, blockSize));
  = max(1, min(Long.MAX_VALUE, 128M))
  = max(1, 128M)
  = 128M
long bytesRemaining = length; // e.g. 150M
while (((double) bytesRemaining)/splitSize > SPLIT_SLOP)
  SPLIT_SLOP = 1.1, i.e. the last split is allowed to be up to 10% larger than splitSize
  150M / 128M ≈ 1.17 > 1.1 ==> cut a 128M split; the remaining 22M becomes the last split
  a 129M file: 129M / 128M ≈ 1.01 < 1.1 ==> only 1 InputSplit of 129M, not 128M + 1M
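Putting the numbers together, the splitting loop behaves like the sketch below (a simplification of FileInputFormat.getSplits written only to make the arithmetic concrete, not the real implementation):

import java.util.ArrayList;
import java.util.List;

public class SplitMath {
    private static final double SPLIT_SLOP = 1.1;  // the last split may be up to 10% larger

    static List<Long> splitSizes(long length, long splitSize) {
        List<Long> splits = new ArrayList<>();
        long bytesRemaining = length;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits.add(bytesRemaining);             // whatever is left becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long M = 1024L * 1024L;
        print(splitSizes(200 * M, 128 * M), M);     // 128M 72M  ==> 2 MapTasks
        print(splitSizes(150 * M, 128 * M), M);     // 128M 22M
        print(splitSizes(129 * M, 128 * M), M);     // 129M      ==> only 1 split
    }

    private static void print(List<Long> splits, long unit) {
        StringBuilder sb = new StringBuilder();
        for (long s : splits) sb.append(s / unit).append("M ");
        System.out.println(sb.toString().trim());
    }
}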
FileInputFormat.setInputPaths(job, new Path(input));
  input: can be a single file or a directory
  walk the directory ==> Path ==> BlockLocation <== which nodes hold your input data
  for each file, get its size
long splitSize = computeSplitSize(blockSize, minSize, maxSize);
  Math.max(minSize, Math.min(maxSize, blockSize));
  = max(1, min(Long.MAX_VALUE, 128M))
By default maxSize > blockSize, so splitSize = blockSize.
  You could lower maxSize below the block size to get smaller splits -- but in practice leave it alone:
  e.g. maxSize = 100M with a 128M block ==> splits of 100M + 28M
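If smaller splits were ever wanted, the knob is the max split size (the standard FileInputFormat helper; as noted above, leave it at the default). A hypothetical driver fragment:

// with a 128M block this makes every block arrive as a 100M split plus a 28M split
FileInputFormat.setMaxInputSplitSize(job, 100L * 1024 * 1024);
// equivalent configuration property: mapreduce.input.fileinputformat.split.maxsize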
TextInputFormat extends FileInputFormat {
isSplitable
RecordReader<LongWritable, Text> createRecordReader
}
TextInputFormat is an implementation (subclass) of FileInputFormat.
It reads the data line by line:
  K: LongWritable -- the offset of this line within the file
  V: Text -- the content of this line
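For instance, with a hypothetical two-line input file, map() would be called with:

hello world        ==> key = 0  (offset of the line),   value = "hello world"
hadoop mapreduce   ==> key = 12 (11 chars + 1 newline),  value = "hadoop mapreduce"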
MySQLReadDriver
  definitely runs locally,
  but it cannot run on the server as-is!!!
  locally: the mysql driver is already on the classpath
  ==> the mysql driver jar has to be shipped to the server
MySQLReadDriverV2: this version is the one recommended for production
  extends Configured implements Tool, so ToolRunner can parse generic options such as -libjars
  put the mysql jar on a path Hadoop can actually see ***** (see the commands at the end of these notes)
Traffic statistics
package com.ccj.pxj.phone;
import org.apache.hadoop.io.Writable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
public class Access implements Writable {
private String phone;
private long up;
private long down;
private long sum;
public String getPhone() {
return phone;
}
public void setPhone(String phone) {
this.phone = phone;
}
public long getUp() {
return up;
}
public void setUp(long up) {
this.up = up;
}
public long getDown() {
return down;
}
public void setDown(long down) {
this.down = down;
}
public Access(String phone, long up, long down) {
this.phone = phone;
this.up = up;
this.down = down;
this.sum=up+down;
}
public Access() {
}
@Override
public void write(DataOutput out) throws IOException {
out.writeUTF(phone);
out.writeLong(up);
out.writeLong(down);
out.writeLong(sum);
}
@Override
public void readFields(DataInput in) throws IOException {
this.phone= in.readUTF();
this.up= in.readLong();
this.down=in.readLong();
this.sum=in.readLong();
}
@Override
public String toString() {
return
phone + '\t' +
up +
"\t" + down +
"\t" + sum ;
}
}
package com.ccj.pxj.phone;
import com.ccj.pxj.phone.utils.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class SerDriver {
public static void main(String[] args) throws Exception {
String input = "data";
String output = "out";
// 1) Get the Job object
Configuration configuration = new Configuration();
Job job = Job.getInstance(configuration);
FileUtils.deleteOutput(configuration, output);
// 2) Set the main class this job should run
job.setJarByClass(SerDriver.class);
// 3) Set the Mapper and Reducer
job.setMapperClass(MyMaper.class);
job.setReducerClass(MyReduce.class);
// 4) Set the Mapper output key/value types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Access.class);
// 5) Set the Reducer output key/value types
job.setOutputKeyClass(NullWritable.class);
job.setOutputValueClass(Access.class);
// 6) Set the input and output paths
FileInputFormat.setInputPaths(job, new Path(input));
FileOutputFormat.setOutputPath(job, new Path(output));
// 7) Submit the job
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);
}
public static class MyMaper extends Mapper<LongWritable, Text,Text,Access>{
@Override
protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
String[] data = value.toString().split("\t");
String phone = data[1];
// upstream traffic
long up = Long.parseLong(data[data.length - 3]);
// downstream traffic
long down = Long.parseLong(data[data.length - 2]);
context.write(new Text(phone),new Access(phone,up,down));
}
}
public static class MyReduce extends Reducer<Text,Access, NullWritable,Access>{
@Override
protected void reduce(Text key, Iterable<Access> values, Context context) throws IOException, InterruptedException {
long ups=0;
long downs=0;
for (Access value : values) {
ups+=value.getUp();
downs+=value.getDown();
}
context.write(NullWritable.get(),new Access(key.toString(),ups,downs));
}
}
}
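FileUtils.deleteOutput above is a small project-local helper that is not listed in these notes; it simply deletes the output directory if it already exists, since the job refuses to start otherwise. A minimal sketch of what it might look like (an assumption, not the original code):

package com.ccj.pxj.phone.utils;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileUtils {
    // delete the output path (recursively) if it exists, so the job can be re-run
    public static void deleteOutput(Configuration configuration, String output) throws Exception {
        FileSystem fileSystem = FileSystem.get(configuration);
        Path outputPath = new Path(output);
        if (fileSystem.exists(outputPath)) {
            fileSystem.delete(outputPath, true);
        }
    }
}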
MySQL
package com.ccj.wfy.mysql.mr;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
public class DeptWritable implements DBWritable, Writable {
private int deptno;
private String dname;
private String loc;
public int getDeptno() {
return deptno;
}
public void setDeptno(int deptno) {
this.deptno = deptno;
}
public String getDname() {
return dname;
}
public void setDname(String dname) {
this.dname = dname;
}
public String getLoc() {
return loc;
}
public DeptWritable() {
}
public DeptWritable(int deptno, String dname, String loc) {
this.deptno = deptno;
this.dname = dname;
this.loc = loc;
}
public void setLoc(String loc) {
this.loc = loc;
}
@Override
public void write(DataOutput out) throws IOException {
out.writeInt(deptno);
out.writeUTF(dname);
out.writeUTF(loc);
}
@Override
public void readFields(DataInput in) throws IOException {
this.deptno= in.readInt();
this.dname=in.readUTF();
this.loc=in.readUTF();
}
@Override
public void write(PreparedStatement statement) throws SQLException {
statement.setInt(1,deptno);
statement.setString(2,dname);
statement.setString(3,loc);
}
@Override
public void readFields(ResultSet result) throws SQLException {
deptno=result.getInt(1);
dname=result.getString(2);
loc=result.getString(3);
}
@Override
public String toString() {
return deptno + "\t" + dname + "\t" + loc;
}
}
package com.ccj.wfy.mysql.mr;
import com.ccj.pxj.phone.utils.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class MySQLReadDriver {
public static void main(String[] args) throws Exception {
String output = "out";
// 1) Get the Job object
Configuration configuration = new Configuration();
// configuration.set(DBConfiguration.DRIVER_CLASS_PROPERTY, "com.mysql.jdbc.Driver");
DBConfiguration.configureDB(configuration, "com.mysql.jdbc.Driver", "jdbc:mysql://localhost:3306/mrtest", "root", "");
Job job = Job.getInstance(configuration);
FileUtils.deleteOutput(configuration, output);
// 2) Set the main class this job should run
job.setJarByClass(MySQLReadDriver.class);
// 3) Set the Mapper
job.setMapperClass(MyMapper.class);
// 4) Set the Mapper output key/value types
job.setMapOutputKeyClass(NullWritable.class);
job.setMapOutputValueClass(DeptWritable.class);
// 6) Set the input (MySQL table) and the output path
String[] fields = {"deptno", "dname", "loc"};
DBInputFormat.setInput(job, DeptWritable.class, "dept", null, null, fields);
FileOutputFormat.setOutputPath(job, new Path(output));
// 7) Submit the job
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);
}
public static class MyMapper extends Mapper<LongWritable, DeptWritable, NullWritable, DeptWritable> {
@Override
protected void map(LongWritable key, DeptWritable value, Context context) throws IOException, InterruptedException {
context.write(NullWritable.get(), value);
}
}
}
package com.ccj.wfy.mysql.mr;
import com.ccj.pxj.phone.utils.FileUtils;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.ToolRunner;
import java.io.IOException;
public class MySQLReadDriverV2 extends Configured implements Tool {
public static void main(String[] args) throws Exception {
Configuration configuration = new Configuration();
int run = ToolRunner.run(configuration, new MySQLReadDriverV2(), args);
System.exit(run);
}
@Override
public int run(String[] strings) throws Exception {
String output = "out1";
// 1) Get the Job object
Configuration configuration = super.getConf();
// configuration.set(DBConfiguration.DRIVER_CLASS_PROPERTY, "com.mysql.jdbc.Driver");
DBConfiguration.configureDB(configuration, "com.mysql.jdbc.Driver", "jdbc:mysql://localhost:3306/mrtest", "root", "");
Job job = Job.getInstance(configuration);
FileUtils.deleteOutput(configuration, output);
// 2) Set the main class this job should run
job.setJarByClass(MySQLReadDriverV2.class);
// 3) Set the Mapper
job.setMapperClass(MyMapper.class);
// 4) Set the Mapper output key/value types
job.setMapOutputKeyClass(NullWritable.class);
job.setMapOutputValueClass(DeptWritable.class);
// 6) Set the input (MySQL table) and the output path
String[] fields = {"deptno", "dname", "loc"};
DBInputFormat.setInput(job, DeptWritable.class, "dept", null, null, fields);
FileOutputFormat.setOutputPath(job, new Path(output));
// 7) Submit the job
boolean result = job.waitForCompletion(true);
return result ? 0 : 1;
}
public static class MyMapper extends Mapper<LongWritable, DeptWritable, NullWritable, DeptWritable> {
@Override
protected void map(LongWritable key, DeptWritable value, Context context) throws IOException, InterruptedException {
context.write(NullWritable.get(), value);
}
}
}
pom configuration
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.cc.pxj.wfy</groupId>
<artifactId>phoneWcRuoZe</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<hadoop.version>2.6.0-cdh5.16.2</hadoop.version>
</properties>
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<!-- add the Hadoop dependency -->
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/junit/junit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.17</version>
</dependency>
</dependencies>
<build>
<pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
<plugins>
<!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
</plugin>
<!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.1</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
<!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
<plugin>
<artifactId>maven-site-plugin</artifactId>
<version>3.7.1</version>
</plugin>
<plugin>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>3.0.0</version>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
Supplement: making the MySQL driver available when running on the server:
[pxj@pxj /home/pxj/lib]$export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/home/pxj/app/hive-1.1.0-cdh5.16.2/lib/mysql-connector-java-5.1.27-bin.jar
[pxj@pxj /home/pxj/app/hive-1.1.0-cdh5.16.2/lib]$hadoop jar /home/pxj/lib/phoneWcRuoZe-1.0-SNAPSHOT.jar com.ccj.wfy.mysql.mr.MySQLReadDriverV2 -libjars ~/lib/mysql-connector-java-5.1.27-bin.jar
Author: pxj (潘陈)
Date: 2020-01-07 1:41:32