大数据电信客服-数据分析（三）

最新推荐文章于 2024-07-02 10:49:28 发布

lambda-小张

最新推荐文章于 2024-07-02 10:49:28 发布

阅读量2.9k

点赞数 1

分类专栏：大数据电信客服项目文章标签：大数据 hadoop mapreduce 1024程序员节

本文链接：https://blog.csdn.net/m0_55834564/article/details/127470903

版权

大数据电信客服项目专栏收录该内容

3 篇文章 3 订阅

订阅专栏

我们的数据已经完整的采集到了HBase集群中，这次我们需要对采集到的数据进行分析，统计出我们想要的结果。注意，在分析的过程中，我们不一定会采取一个业务指标对应一个mapreduce-job的方式，如果情景允许，我们会采取一个mapreduce分析多个业务指标的方式来进行任务。具体何时采用哪种方式，我们后续会详细探讨。

分析模块流程如图3所示：

业务指标：

a) 用户每天主叫通话个数统计，通话时间统计。

b) 用户每月通话记录统计，通话时间统计。

c) 用户之间亲密关系统计。（通话次数与通话时间体现用户亲密关系）

一、需求分析

根据需求目标，设计出下述（3.2.2）表结构。我们需要按照时间范围（年月日），结合MapReduce统计出所属时间范围内所有手机号码的通话次数总和以及通话时长总和。

思路：

a) 维度，即某个角度，某个视角，按照时间维度来统计通话，比如我想统计2017年所有月份所有日子的通话记录，那这个维度我们大概可以表述为2017年*月*日

b) 通过Mapper将数据按照不同维度聚合给Reducer

c) 通过Reducer拿到按照各个维度聚合过来的数据，进行汇总，输出

d) 根据业务需求，将Reducer的输出通过Outputformat把数据

数据输入：HBase

数据输出：Mysql

HBase中数据源结构：

标签	举例&说明
rowkey	hashregion_call1_datetime_call2_flag_duration 01_15837312345_20170527081033_13766889900_1_0180
family	f1列族：存放主叫信息 f2列族：存放被叫信息
call1	第一个手机号码
call2	第二个手机号码
date_time	通话建立的时间，例如：20171017081520
date_time_ts	date_time对应的时间戳形式
duration	通话时长(单位：秒)
flag	标记call1是主叫还是被叫（call1的身份与call2的身份互斥）

a) 已知目标，那么需要结合目标思考已有数据是否能够支撑目标实现；

b) 根据目标数据结构，构建Mysql表结构，建表；

c) 思考代码需要涉及到哪些功能模块，建立不同功能模块对应的包结构。

d) 描述数据，一定是基于某个维度（视角）的，所以构建维度类。比如按照“年”与“手机号码”的组合作为key聚合所有的数据，便可以统计这个手机号码，这一年的相关结果。

e) 自定义OutputFormat用于对接Mysql，使数据输出。

f) 创建相关工具类。

二、Mysql表结构设计

我们将分析的结果数据保存到Mysql中，以方便Web端进行查询展示。

1) 表7：db_telecom.tb_contacts

用于存放用户手机号码与联系人姓名。

表7 db_telecom.tb_contacts

列	备注	类型
id	自增主键	int(11) NOT NULL
telephone	手机号码	varchar(255) NOT NULL
name	联系人姓名	varchar(255) NOT NULL

2) 表8：db_telecom.tb_call

用于存放某个时间维度下通话次数与通话时长的总和。

表8 db_telecom.tb_call

列	备注	类型
id_date_contact	复合主键（联系人维度id，时间维度id）	varchar(255) NOT NULL
id_date_dimension	时间维度id	int(11) NOT NULL
id_contact	查询人的电话号码	int(11) NOT NULL
call_sum	通话次数总和	int(11) NOT NULL DEFAULT 0
call_duration_sum	通话时长总和	int(11) NOT NULL DEFAULT 0

3) 表9：db_telecom.tb_dimension_date

用于存放时间维度的相关数据

表9 db_telecom.tb_dimension_date

列	备注	类型
id	自增主键	int(11) NOT NULL
year	年，当前通话信息所在年	int(11) NOT NULL
month	月，当前通话信息所在月，如果按照年来统计信息，则month为-1。	int(11) NOT NULL
day	日，当前通话信息所在日，如果是按照月来统计信息，则day为-1。	int(11) NOT NULL

4) 表10：db_telecom.tb_intimacy

用于存放所有用户用户关系的结果数据。（作业中使用）

表10 db_telecom.tb_intimacy

列	备注	类型
id	自增主键	int(11) NOT NULL
intimacy_rank	好友亲密度排名	int(11) NOT NULL
id_contact1	联系人1，当前所查询人	int(11) NOT NULL
id_contact2	联系人2，与联系人为好友	int(11) NOT NULL
call_count	两联系人通话次数	int(11) NOT NULL DEFAULT 0
call_duration_count	两联系人通话持续时间	int(11) NOT NULL DEFAULT 0

三、环境准备

1) 新建module：ct_analysis

pom文件配置：

<dependencies>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>ct-common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>3.1.0</version>
        </dependency>

    </dependencies>

2) 创建包结构，根包：com.atguigu(不同颜色代表不同层级的递进)

表11

analysis	constants	utils
analysis.converter
analysis.format
analysis.kv
analysis.mapper
analysis.reducer
analysis.runner
analysis.converter.impl
analysis.kv.base
analysis.kv.impl

3) 类表

表12

类名	备注
CountDurationMapper	数据分析的Mapper类，继承自TableMapper
CountDurationReducer	数据分析的Reducer类，继承自Reduccer
CountDurationRunner	数据分析的驱动类，组装Job
MySQLOutputFormat	自定义Outputformat，对接Mysql
BaseDimension	维度（key）基类
BaseValue	值（value）基类
ComDimension	时间维度+联系人维度的组合维度
ContactDimension	联系人维度
DateDimension	时间维度
CountDurationValue	通话次数与通话时长的封装
JDBCUtil	连接Mysql的工具类
JDBCCacheBean	单例JDBCConnection
IConverter	转化接口，用于根据传入的维度对象，得到该维度对象对应的数据库主键id
DimensionConverter	IConverter实现类，负责实际的维度转id功能
LRUCache	用于缓存已知的维度id，减少对mysql的操作次数，提高效率
Constants	常量类

四、需求实现

ct.analysis

创建AnalysisData

public class AnalysisData {
    public static void main(String[] args) throws Exception {
//        int result = ToolRunner.run(new AnalysisTextTool(), args);
        int result = ToolRunner.run(new AnalysisBeanTool(), args);

    }

ct.analysis.io

创建MySQLBeanOutputFormat


import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/**
 * @Author 
 * @Date 2021/2/26 20:55
 * @Version 1.0
 * MySQL数据格式化输出对象
 */
public class MySQLBeanOutputFormat extends OutputFormat<AnalysisKey, AnalysisValue> {

    protected static class MySQLRecordWriter extends RecordWriter<AnalysisKey, AnalysisValue> {

        private Connection connection=null;
        Map<String,Integer> userMap=new HashMap<>();
        Map<String,Integer> dateMap=new HashMap<>();

        public MySQLRecordWriter(){
            //获取资源
            connection= JDBCUtil.getConnection();
            PreparedStatement pstat = null;
            ResultSet rs =null;
            try {
                //读取用户，时间的数据
                String queryUserSql="select id,tel from ct_user";
                pstat=connection.prepareStatement(queryUserSql);
                rs=pstat.executeQuery();
                while (rs.next()){
                    Integer id = rs.getInt(1);
                    String tel=rs.getString(2);
                    userMap.put(tel,id);
                }
                rs.close();

                String queryDateSql="select id,year,month,day from ct_date";
                pstat=connection.prepareStatement(queryDateSql);
                rs=pstat.executeQuery();
                while (rs.next()){
                    Integer id = rs.getInt(1);
                    String year=rs.getString(2);
                    String month=rs.getString(3);
                    String day=rs.getString(4);
                    if (month.length()==1){
                        month="0"+month;
                    }
                    if (day.length()==1){
                        day="0"+day;
                    }
                    dateMap.put(year+month+day,id);
                }
            }catch (Exception e){
                e.printStackTrace();
            }finally {
                if(pstat!=null){
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
                if (rs!=null){
                    try {
                        rs.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
            }



        }

        //输出数据
        @Override
        public void write(AnalysisKey key, AnalysisValue value) throws IOException, InterruptedException {

            PreparedStatement pstat=null;
            try {
                String insertSQL="insert into ct_call(telid,dateid,sumcall,sumduration) values(?,?,?,?)";
                pstat=connection.prepareStatement(insertSQL);

                pstat.setInt(1,userMap.get(key.getTel()));
                pstat.setInt(2,dateMap.get(key.getDate()));
                pstat.setInt(3,Integer.parseInt(value.getSumCall()));
                pstat.setInt(4,Integer.parseInt(value.getSumDuration()));
                pstat.executeUpdate();
            } catch (SQLException e) {
                e.printStackTrace();
            }finally {

                if (pstat!=null){
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }


            }
        }

        //释放资源
        @Override
        public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
            if (connection!=null){
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }

        }

    }

    @Override
    public RecordWriter<AnalysisKey, AnalysisValue> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        return new MySQLRecordWriter();
    }

    @Override
    public void checkOutputSpecs(JobContext jobContext) throws IOException, InterruptedException {

    }


    private FileOutputCommitter committer = null;

    public static Path getOutputPath(JobContext job) {
        String name = job.getConfiguration().get("mapreduce.output.fileoutputformat.outputdir");
        return name == null ? null : new Path(name);
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        if (this.committer == null) {
            Path output = getOutputPath(taskAttemptContext);
            this.committer = new FileOutputCommitter(output, taskAttemptContext);
        }

        return this.committer;
    }
}

创建 MySQLRedisTextOutputFormat


import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import redis.clients.jedis.Jedis;

import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * @Author 
 * @Date 2021/2/26 20:55
 * @Version 1.0
 * MySQL数据格式化输出对象
 */
public class MySQLRedisTextOutputFormat extends OutputFormat<Text,Text> {

    protected static class MySQLRecordWriter extends RecordWriter<Text, Text> {

        private Connection connection=null;
        Jedis jedis=null;

        public MySQLRecordWriter(){
            //获取资源
            connection= JDBCUtil.getConnection();
            Jedis jedis = new Jedis("192.168.86.101", 6379);
            jedis.auth("123456");
        }

        //输出数据
        @Override
        public void write(Text key, Text value) throws IOException, InterruptedException {

            String[] values=value.toString().split("_");
            String sumCall=values[0];
            String sumDuration=values[1];
            PreparedStatement pstat=null;
            try {
                String insertSQL="insert into ct_call(telid,dateid,sumcall,duration) values(?,?,?,?)";
                pstat=connection.prepareStatement(insertSQL);
                String k = key.toString();
                String[] ks=k.split("_");

                String tel=ks[0];
                String date=ks[1];

                pstat.setInt(1,Integer.parseInt(jedis.hget("ct_user",tel)));
                pstat.setInt(2,Integer.parseInt(jedis.hget("ct_date",date)));
                pstat.setInt(3,Integer.parseInt(sumCall));
                pstat.setInt(4,Integer.parseInt(sumDuration));
                pstat.executeUpdate();
            } catch (SQLException e) {
                e.printStackTrace();
            }finally {
                if (pstat!=null){
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }


            }
        }

        //释放资源
        @Override
        public void close(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
            if (connection!=null){
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if (jedis!=null){
                jedis.close();
            }

        }

    }

    @Override
    public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        return new MySQLRecordWriter();
    }

    @Override
    public void checkOutputSpecs(JobContext jobContext) throws IOException, InterruptedException {

    }


    private FileOutputCommitter committer = null;

    public static Path getOutputPath(JobContext job) {
        String name = job.getConfiguration().get("mapreduce.output.fileoutputformat.outputdir");
        return name == null ? null : new Path(name);
    }

    @Override
    public OutputCommitter getOutputCommitter(TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
        if (this.committer == null) {
            Path output = getOutputPath(taskAttemptContext);
            this.committer = new FileOutputCommitter(output, taskAttemptContext);
        }

        return this.committer;
    }
}

创建 MySQLTextOutputFormat


import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


import java.io.IOException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

/**
 * MySQL数据格式化输入对象
 */
public class MySQLTextOutputFormat extends OutputFormat<Text, Text> {

    protected static class MySQLRecordWriter extends RecordWriter<Text, Text> {

        private Connection connection = null;
        Map<String, Integer> userMap = new HashMap<String, Integer>();
        Map<String, Integer> dateMap = new HashMap<String, Integer>();

        public MySQLRecordWriter() {
            // 获取资源
            connection = JDBCUtil.getConnection();
            PreparedStatement pstat = null;
            ResultSet rs = null;

            try {

                String queryUserSql = "select id, tel from ct_user";
                pstat = connection.prepareStatement(queryUserSql);
                rs = pstat.executeQuery();
                while ( rs.next() ) {
                    Integer id = rs.getInt(1);
                    String tel = rs.getString(2);
                    userMap.put(tel, id);
                }

                rs.close();

                String queryDateSql = "select id, year, month, day from ct_date";
                pstat = connection.prepareStatement(queryDateSql);
                rs = pstat.executeQuery();
                while ( rs.next() ) {
                    Integer id = rs.getInt(1);
                    String year = rs.getString(2);
                    String month = rs.getString(3);
                    if ( month.length() == 1 ) {
                        month = "0" + month;
                    }
                    String day = rs.getString(4);
                    if ( day.length() == 1 ) {
                        day = "0" + day;
                    }
                    dateMap.put(year + month + day, id);
                }

            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if ( rs != null ) {
                    try {
                        rs.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
                if ( pstat != null ) {
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        /**
         * 输出数据
         * @param key
         * @param value
         * @throws IOException
         * @throws InterruptedException
         */
        public void write(Text key, Text value) throws IOException, InterruptedException {

            String[] values = value.toString().split("_");
            String sumCall = values[0];
            String sumDuration = values[1];

            PreparedStatement pstat = null;
            try {
                String insertSQL = "insert into ct_call ( telid, dateid, sumcall, sumduration ) values ( ?, ?, ?, ? )";
                pstat = connection.prepareStatement(insertSQL);

                String k = key.toString();
                String[] ks = k.split("_");

                String tel = ks[0];
                String date = ks[1];

                pstat.setInt(1, userMap.get(tel));
                pstat.setInt(2, dateMap.get(date));
                pstat.setInt(3, Integer.parseInt(sumCall) );
                pstat.setInt(4, Integer.parseInt(sumDuration));
                pstat.executeUpdate();
            } catch (SQLException e) {
                e.printStackTrace();
            } finally {
                if ( pstat != null ) {
                    try {
                        pstat.close();
                    } catch (SQLException e) {
                        e.printStackTrace();
                    }
                }
            }
        }

        /**
         * 释放资源
         * @param context
         * @throws IOException
         * @throws InterruptedException
         */
        public void close(TaskAttemptContext context) throws IOException, InterruptedException {
            if ( connection != null ) {
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext context) throws IOException, InterruptedException {
        return new MySQLRecordWriter();
    }

    public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {

    }
    private FileOutputCommitter committer = null;
    public static Path getOutputPath(JobContext job) {
        String name = job.getConfiguration().get(FileOutputFormat.OUTDIR);
        return name == null ? null: new Path(name);
    }
    public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException, InterruptedException {
        if (committer == null) {
            Path output = getOutputPath(context);
            committer = new FileOutputCommitter(output, context);
        }
        return committer;
    }
}

ct.analysis.kv

创建AnalysisKey



import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * 自定义分析数据Key
 */
public class AnalysisKey implements WritableComparable<AnalysisKey> {

    public AnalysisKey() {

    }

    public AnalysisKey( String tel, String date ) {
        this.tel = tel;
        this.date = date;
    }

    private String tel;
    private String date;
    private String tel2;

    public String getTel() {
        return tel;
    }

    public void setTel(String tel) {
        this.tel = tel;
    }

    public String getDate() {
        return date;
    }

    public void setDate(String date) {
        this.date = date;
    }

    /**
     * 比较:tel, date
     * @param key
     * @return
     */
    public int compareTo(AnalysisKey key) {
        int result = tel.compareTo(key.getTel());

        if ( result == 0 ) {
            result = date.compareTo(key.getDate());
        }

        return result;
    }

    /**
     * 写数据
     * @param out
     * @throws IOException
     */
    public void write(DataOutput out) throws IOException {
        out.writeUTF(tel);
        out.writeUTF(date);
    }

    /**
     * 读数据
     * @param in
     * @throws IOException
     */
    public void readFields(DataInput in) throws IOException {
        tel = in.readUTF();
        date = in.readUTF();
    }
}

创建 AnalysisValue


import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class AnalysisValue implements Writable {

    private String sumCall;
    private String sumDuration;

    public String getSumCall() {
        return sumCall;
    }

    public void setSumCall(String sumCall) {
        this.sumCall = sumCall;
    }

    public String getSumDuration() {
        return sumDuration;
    }

    public void setSumDuration(String sumDuration) {
        this.sumDuration = sumDuration;
    }

    public AnalysisValue(){}

    public AnalysisValue(String sumCall, String sumDuration){
        this.sumCall=sumCall;
        this.sumDuration=sumDuration;
    }

    //序列化
    @Override
    public void write(DataOutput dataOutput) throws IOException {
        dataOutput.writeUTF(sumCall);
        dataOutput.writeUTF(sumDuration);
    }

    //反序列化
    @Override
    public void readFields(DataInput dataInput) throws IOException {
        sumCall=dataInput.readUTF();
        sumDuration=dataInput.readUTF();
    }
}

ct.analysis.mapper

创建AnalysisBeanMapper


import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

import java.io.IOException;

public class AnalysisBeanMapper extends TableMapper<AnalysisKey, Text> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {

        // 5_19154926260_20180802160747_16574556259_0054_1
        String rowkey= Bytes.toString(key.get());
        String[] values=rowkey.split("_");

        String call1=values[1];
        String call2=values[3];
        String callTime=values[2];
        String duration=values[4];

        String year=callTime.substring(0,4);
        String month=callTime.substring(0,6);
        String day=callTime.substring(0,8);

        //主叫用户-年
        context.write(new AnalysisKey(call1,year),new Text(duration));
        //主叫用户-月
        context.write(new AnalysisKey(call1,month),new Text(duration));
        //主叫用户-日
        context.write(new AnalysisKey(call1,day),new Text(duration));

        //被叫用户-年
        context.write(new AnalysisKey(call2,year),new Text(duration));
        //被叫用户-月
        context.write(new AnalysisKey(call2,month),new Text(duration));
        //被叫用户-日
        context.write(new AnalysisKey(call2,day),new Text(duration));

    }
}

创建 AnalysisTextMapper



import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
/**
 * @program: IntelliJ IDEA
 * @description: 分析数据Mapper
 * @author:
 * @create: 2022-10-23 10:36
 */
public class AnalysisTextMapper extends TableMapper<Text, Text> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {

        String rowkey = Bytes.toString(key.get());
        // 5_19154926260_20180802160747_16574556259_0054_1

        String[] values = rowkey.split("_");

        String call1 = values[1];
        String call2 = values[3];
        String calltime = values[2];
        String duration = values[4];

        String year = calltime.substring(0, 4);
        String month = calltime.substring(0, 6);
        String date = calltime.substring(0, 8);

        // 主叫用户 - 年
        context.write(new Text(call1 + "_" + year), new Text(duration));
        // 主叫用户 - 月
        context.write(new Text(call1 + "_" + month), new Text(duration));
        // 主叫用户 - 日
        context.write(new Text(call1 + "_" + date), new Text(duration));

        // 被叫用户 - 年
        context.write(new Text(call2 + "_" + year), new Text(duration));
        // 被叫用户 - 月
        context.write(new Text(call2 + "_" + month), new Text(duration));
        // 被叫用户 - 日
        context.write(new Text(call2 + "_" + date), new Text(duration));
    }
}

ct.analysis.reducer

创建AnalysisBeanReducer


import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * @Author
 * @Date 2021/2/26 19:18
 * @Version 1.0
 * 分析数据的reducer
 */
public class AnalysisBeanReducer extends Reducer<AnalysisKey, Text, AnalysisKey, AnalysisValue> {
    @Override
    protected void reduce(AnalysisKey key, Iterable<Text> values, Context context) throws IOException, InterruptedException {

        int sumCall = 0;
        int sumDuration = 0;
        for (Text value : values) {
            int duration = Integer.parseInt(value.toString());
            sumDuration = sumDuration + duration;

            sumCall++;
        }

        context.write(key, new AnalysisValue(""+sumCall,""+sumDuration));
    }
}

创建 AnalysisTextReducer



import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * 分析数据Reducer
 */
public class AnalysisTextReducer extends Reducer<Text, Text, Text, Text >{
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {


        int call_sum = 0;
        int call_duration_sum = 0;

        for (Text value : values) {
            int duration = Integer.parseInt(value.toString());
            call_duration_sum = call_duration_sum + duration;

            call_sum++;
        }

        context.write(key, new Text(call_sum + "_" + call_duration_sum));
    }
}

ct.analysis.tool

创建AnalysisBeanTool


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.util.Tool;

/**
 * @Author 
 * @Date 2021/2/26 16:51
 * @Version 1.0
 * 分析数据的工具类
 */
public class AnalysisBeanTool implements Tool {
    @Override
    public int run(String[] strings) throws Exception {

        Job job = Job.getInstance();
        job.setJarByClass(AnalysisBeanTool.class);

        Scan scan =new Scan();
        //只要一半的数据（主叫），避免重复
        scan.addFamily(Bytes.toBytes(Names.CF_CALLER.getValue()));

        //mapper
        TableMapReduceUtil.initTableMapperJob(
                Names.TABLE.getValue(),
                scan,
                AnalysisBeanMapper.class,
                AnalysisKey.class,
                Text.class,
                job
        );

        //reducer
        job.setReducerClass(AnalysisBeanReducer.class);
        job.setOutputKeyClass(AnalysisKey.class);
        job.setOutputValueClass(AnalysisValue.class);

        //outputformat
        job.setOutputFormatClass(MySQLBeanOutputFormat.class);
        boolean flg = job.waitForCompletion(true);

        if (flg){
            return JobStatus.State.SUCCEEDED.getValue();
        }else {
            return JobStatus.State.FAILED.getValue();
        }

    }

    @Override
    public void setConf(Configuration configuration) {

    }

    @Override
    public Configuration getConf() {
        return null;
    }
}

创建 AnalysisTextTool


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobStatus;
import org.apache.hadoop.util.Tool;

/**
 * @Author 
 * @Date 2021/2/26 16:51
 * @Version 1.0
 * 分析数据的工具类
 */
public class AnalysisTextTool implements Tool {
    @Override
    public int run(String[] strings) throws Exception {

        Job job = Job.getInstance();
        job.setJarByClass(AnalysisTextTool.class);

        Scan scan =new Scan();
        //只要一半的数据（主叫），避免重复
        scan.addFamily(Bytes.toBytes(Names.CF_CALLER.getValue()));

        //mapper
        TableMapReduceUtil.initTableMapperJob(
                Names.TABLE.getValue(),
                scan,
                AnalysisBeanMapper.class,
                Text.class,
                Text.class,
                job
        );

        //reducer
        job.setReducerClass(AnalysisBeanReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        //outputformat
        job.setOutputFormatClass(MySQLBeanOutputFormat.class);
        boolean flg = job.waitForCompletion(true);

        if (flg){
            return JobStatus.State.SUCCEEDED.getValue();
        }else {
            return JobStatus.State.FAILED.getValue();
        }

//        return 0;
    }

    @Override
    public void setConf(Configuration configuration) {

    }

    @Override
    public Configuration getConf() {
        return null;
    }
}

在 ct.common.util

修改JDBCUtil


import java.sql.Connection;
import java.sql.DriverManager;

public class JDBCUtil {

    private static final String MYSQL_DRIVER_CLASS = "com.mysql.jdbc.Driver";
    private static final String MYSQL_URL = "jdbc:mysql://hadoop01:3306/ct221023?useUnicode=true&characterEncoding=UTF-8";
    private static final String MYSQL_USERNAME = "root";
    private static final String MYSQL_PASSWORD = "123456";

    public static Connection getConnection(){
        Connection conn=null;
        try{
            Class.forName(MYSQL_DRIVER_CLASS);
            conn= DriverManager.getConnection(MYSQL_URL,MYSQL_USERNAME,MYSQL_PASSWORD);
        }catch (Exception e){
            e.printStackTrace();
        }
        return conn;

    }
}

创建项目ct.cache

pom.xml

 <dependencies>
        <dependency>
            <groupId>redis.clients</groupId>
            <artifactId>jedis</artifactId>
            <version>3.1.0</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.46</version>
        </dependency>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>ct-common</artifactId>
            <version>1.0-SNAPSHOT</version>
        </dependency>
    </dependencies>

创建Bootstrap

package ct.cache;


import com.sun.tools.doclets.formats.html.SourceToHTMLConverter;
import redis.clients.jedis.Jedis;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * 启动缓存客户端，向redis中增加缓存数据
 */
public class Bootstrap {
    public static void main(String[] args) {

        // 读取Mysql中的数据
        Map<String, Integer> userMap = new HashMap<String, Integer>();
        Map<String, Integer> dateMap = new HashMap<String, Integer>();

        // 读取用户，时间数据

        Connection connection = null;
        PreparedStatement pstat = null;
        ResultSet rs = null;

        try {

            connection = JDBCUtil.getConnection();

            String queryUserSql = "select id, tel from ct_user";
            pstat = connection.prepareStatement(queryUserSql);
            rs = pstat.executeQuery();
            while ( rs.next() ) {
                Integer id = rs.getInt(1);
                String tel = rs.getString(2);
                userMap.put(tel, id);
            }

            rs.close();

            String queryDateSql = "select id, year, month, day from ct_date";
            pstat = connection.prepareStatement(queryDateSql);
            rs = pstat.executeQuery();
            while ( rs.next() ) {
                Integer id = rs.getInt(1);
                String year = rs.getString(2);
                String month = rs.getString(3);
                if ( month.length() == 1 ) {
                    month = "0" + month;
                }
                String day = rs.getString(4);
                if ( day.length() == 1 ) {
                    day = "0" + day;
                }
                dateMap.put(year + month + day, id);
            }

        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            if ( rs != null ) {
                try {
                    rs.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if ( pstat != null ) {
                try {
                    pstat.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
            if ( connection != null ) {
                try {
                    connection.close();
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        }

//        System.out.println(userMap.size());
//        System.out.println(dateMap.size());

        // 向redis中存储数据
        Jedis jedis = new Jedis("192.168.86.101", 6379);
        jedis.auth("123456");

        Iterator<String> keyIterator = userMap.keySet().iterator();
        while ( keyIterator.hasNext() ) {
            String key = keyIterator.next();
            Integer value = userMap.get(key);
            jedis.hset("ct_user", key, "" + value);
        }

        keyIterator = dateMap.keySet().iterator();
        while ( keyIterator.hasNext() ) {
            String key = keyIterator.next();
            Integer value = dateMap.get(key);
            jedis.hset("ct_date", key, "" + value);
        }

    }
}

五、运行测试

1) 将mysql驱动包放入到hadoop根目录的lib目录下

$ cp -a /opt/module/mysql-libs/mysql-connector-java-5.1.27/mysql-connector-java-5.1.27-bin.jar ./lib

2) 提交任务

$ bin/yarn jar ~/call/jars/ct_analysis-1.0-SNAPSHOT.jar

com.atguigu.analysis.runner.CountDurationRunner -libjars ./lib/*

观察Mysql中的结果。

lambda-小张

关注

1
点赞
踩
8

收藏

觉得还不错? 一键收藏
打赏
1
评论
大数据电信客服-数据分析（三）

我们的数据已经完整的采集到了HBase集群中，这次我们需要对采集到的数据进行分析，统计出我们想要的结果。注意，在分析的过程中，我们不一定会采取一个业务指标对应一个mapreduce-job的方式，如果情景允许，我们会采取一个mapreduce分析多个业务指标的方式来进行任务。具体何时采用哪种方式，我们后续会详细探讨。a) 用户每天主叫通话个数统计，通话时间统计。b) 用户每月通话记录统计，通话时间统计。c) 用户之间亲密关系统计。（通话次数与通话时间体现用户亲密关系）
复制链接

扫一扫