[博学谷 Study Notes] A Solid Summary, Shared with Care | MoMo End-to-End Case Study

Note: if you find this blog helpful, please like and bookmark it. I post about AI and big data every week, mostly original content: Python/Java/Scala/SQL code, CV, NLP, recommender systems, Spark, Flink, Kafka, HBase, Hive, Flume, and more, plus walkthroughs of papers from top conferences. It's all hands-on material; let's improve together.
Today I'm continuing the series with the MoMo end-to-end case study.
#博学谷 IT Learning Tech Support


Preface

This is an end-to-end case study modeled on a real MoMo-style messaging workload.
The project architecture consists of two pipelines:
1. Offline: Flume + Kafka + HBase + Hive/Phoenix
2. Real-time: Flume + Kafka + Flink + MySQL + FineBI


I. Apache Flume

Flume is a top-level Apache open-source project, originally developed by Cloudera and later donated to Apache. It is a tool dedicated to data collection: its whole job is moving data from one end to the other.
Once Flume starts, it runs as an agent instance, and an agent is generally made up of three components:

    1. Source: connects to the data source and pulls data from it; Flume ships with many source types.
    2. Sink: the destination; delivers the data collected by the source to a concrete target; Flume ships with many sink types.
    3. Channel: a pipe that acts as a buffer; the source writes events into the channel and the sink drains them before delivering; Flume ships with many channel types.
    Collection requirement:
    Monitor the file /export/data/momo_data/MOMO_DATA.dat; whenever new content appears, write it to Kafka. The setup should also support future growth: tail a single file now, with the option to watch whole directories later (hence the TAILDIR source).
vim momo_tailDir_kafka.conf

Add the following content:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /export/data/flume/taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /export/data/momo_data/MOMO_DATA.dat
a1.sources.r1.maxBatchCount = 10

a1.channels.c1.type = memory

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = MOMO_MSG
a1.sinks.k1.kafka.bootstrap.servers = node1.itcast.cn:9092,node2.itcast.cn:9092,node3.itcast.cn:9092
a1.sinks.k1.kafka.flumeBatchSize = 10
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
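
A note on the channel: the memory channel buffers events in RAM, which is fast but loses any in-flight events if the agent crashes; a file channel would trade throughput for durability.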

II. Kafka

Flume plays the Kafka producer role here; consumer 1 is the HBase writer and consumer 2 is the Flink job.

Create the MOMO_MSG topic in Kafka:
./kafka-topics.sh --create --zookeeper node1:2181,node2:2181,node3:2181 --topic MOMO_MSG --partitions 6 --replication-factor 2
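
On Kafka 2.2 and later, kafka-topics.sh also accepts --bootstrap-server node1:9092,node2:9092,node3:9092 in place of --zookeeper, and Kafka 3.x removes the --zookeeper flag entirely.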

Start the Flume agent to begin collecting data:
cd /export/server/flume/bin
./flume-ng agent -n a1  -c ../conf  -f ../conf/momo_tailDir_kafka.conf  -Dflume.root.logger=INFO,console 

Verify that data is being collected:

cd /export/server/kafka/bin/
./kafka-console-consumer.sh --bootstrap-server node1:9092,node2:9092,node3:9092 --topic MOMO_MSG

III. MoMo Case: Receiving Messages and Writing to HBase

1. Create the table in HBase

create_namespace 'MOMO_CHAT'
create 'MOMO_CHAT:MOMO_MSG',{NAME=>'C1',COMPRESSION=>'GZ'},{NUMREGIONS=>6,SPLITALGO=>'HexStringSplit'}
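
Here NUMREGIONS=>6 with SPLITALGO=>'HexStringSplit' pre-splits the table into six regions so writes spread across region servers from the start (the MD5-prefixed rowkey below is what makes keys land evenly in those splits), and COMPRESSION=>'GZ' keeps the chat text compact on disk.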

2. Rowkey design

MD5HASH_senderAccount_receiverAccount_messageTime (as an epoch timestamp)
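
For example, with made-up values: a message from account 13800000001 to 13900000002 sent at 2022-01-01 12:30:00 would produce a rowkey like 5d41402a_13800000001_13900000002_1641011400000, where 5d41402a stands in for the first 8 hex characters of the MD5 of the two accounts. The hashed prefix avoids write hotspots across the pre-split regions, while messages between the same pair of accounts stay adjacent and can be read back with a prefix scan.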

3. Create a Maven project and add the dependencies:

    <repositories><!-- repositories -->
        <repository>
            <id>aliyun</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
            <releases><enabled>true</enabled></releases>
            <snapshots>
                <enabled>false</enabled>
                <updatePolicy>never</updatePolicy>
            </snapshots>
        </repository>
    </repositories>


    <dependencies>

        <!-- HBase client -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>2.1.0</version>
        </dependency>
        <!-- Kafka client -->
        <dependency>
            <groupId>org.apache.kafka</groupId>
            <artifactId>kafka-clients</artifactId>
            <version>2.4.1</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.1</version>
                <configuration>
                    <target>1.8</target>
                    <source>1.8</source>
                </configuration>
            </plugin>
        </plugins>
    </build>

Implementation:

package com.itheima.momo_chat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.MD5Hash;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.text.SimpleDateFormat;
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

public class MomoChatConsumerToHBase {

    private static Connection hbaseConn;
    private static Table table;

    static{
        try {
            // 2.1 Create the HBase connection via the connection factory
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.zookeeper.quorum","node1:2181,node2:2181,node3:2181");
            hbaseConn = ConnectionFactory.createConnection(conf);

            // 2.2 Get the Table handle: MOMO_CHAT:MOMO_MSG
            table = hbaseConn.getTable(TableName.valueOf("MOMO_CHAT:MOMO_MSG"));
        }catch (Exception e){
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws Exception {

        //1. Consume message data from Kafka: topic MOMO_MSG
        //1.1 Create the Kafka consumer
        Properties props = new Properties();
        props.setProperty("bootstrap.servers","node1:9092,node2:9092,node3:9092");
        props.setProperty("group.id","MOMO_G1");
        props.setProperty("enable.auto.commit","true");
        props.setProperty("auto.commit.interval.ms","1000");
        props.setProperty("key.deserializer","org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer","org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);

        //1.2 Subscribe to the topic
        consumer.subscribe(Arrays.asList("MOMO_MSG"));

        // 1.3 Poll message data from Kafka
        while(true){

            ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofSeconds(1));

            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {

                String msg = consumerRecord.value();

                System.out.println(msg);

                //2. Write to HBase: only messages with all 20 \001-separated fields
                if(msg != null && !"".equals(msg.trim()) && msg.split("\001").length == 20){


                    // 2.3 Write the row
                    // 2.3.1 Build the rowkey: MD5HASH_senderAccount_receiverAccount_timestamp
                    byte[] rowkey = getRowkey(msg);

                    // 2.3.2 Assemble one row as a Put
                    String[] fields = msg.split("\001");

                    Put put = new Put(rowkey);

                    put.addColumn("C1".getBytes(),"msg_time".getBytes(),fields[0].getBytes());
                    put.addColumn("C1".getBytes(),"sender_nickyname".getBytes(),fields[1].getBytes());
                    put.addColumn("C1".getBytes(),"sender_account".getBytes(),fields[2].getBytes());
                    put.addColumn("C1".getBytes(),"sender_sex".getBytes(),fields[3].getBytes());
                    put.addColumn("C1".getBytes(),"sender_ip".getBytes(),fields[4].getBytes());
                    put.addColumn("C1".getBytes(),"sender_os".getBytes(),fields[5].getBytes());
                    put.addColumn("C1".getBytes(),"sender_phone_type".getBytes(),fields[6].getBytes());
                    put.addColumn("C1".getBytes(),"sender_network".getBytes(),fields[7].getBytes());
                    put.addColumn("C1".getBytes(),"sender_gps".getBytes(),fields[8].getBytes());
                    put.addColumn("C1".getBytes(),"receiver_nickyname".getBytes(),fields[9].getBytes());
                    put.addColumn("C1".getBytes(),"receiver_ip".getBytes(),fields[10].getBytes());
                    put.addColumn("C1".getBytes(),"receiver_account".getBytes(),fields[11].getBytes());
                    put.addColumn("C1".getBytes(),"receiver_os".getBytes(),fields[12].getBytes());
                    put.addColumn("C1".getBytes(),"receiver_phone_type".getBytes(),fields[13].getBytes());
                    put.addColumn("C1".getBytes(),"receiver_network".getBytes(),fields[14].getBytes());
                    put.addColumn("C1".getBytes(),"receiver_gps".getBytes(),fields[15].getBytes());
                    put.addColumn("C1".getBytes(),"receiver_sex".getBytes(),fields[16].getBytes());
                    put.addColumn("C1".getBytes(),"msg_type".getBytes(),fields[17].getBytes());
                    put.addColumn("C1".getBytes(),"distance".getBytes(),fields[18].getBytes());
                    put.addColumn("C1".getBytes(),"message".getBytes(),fields[19].getBytes());

                    table.put(put);
                }

            }

        }

    }
    private static SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    // Builds the rowkey: MD5HASH_senderAccount_receiverAccount_timestamp (epoch millis)
    private static byte[] getRowkey(String msg) throws Exception{

        //1. Split the message
        String[] fields = msg.split("\001");

        //2. Extract the sender account, receiver account, and message time
        String msgTime = fields[0];
        String sender_account = fields[2];
        String receiver_account = fields[11];

        // 3. Assemble the rowkey
        // 8-character MD5 prefix of senderAccount_receiverAccount
        String md5Hash = MD5Hash.getMD5AsHex((sender_account+"_"+receiver_account).getBytes()).substring(0,8);

        // Convert the time string to an epoch timestamp
        long time = format.parse(msgTime).getTime();

        return (md5Hash+"_"+sender_account+"_"+receiver_account +"_"+time).getBytes();
    }
}
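
To spot-check what the consumer wrote, here is a minimal read-back sketch (the class name MomoChatReadCheck is mine; the connection settings mirror the consumer above):

package com.itheima.momo_chat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MomoChatReadCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "node1:2181,node2:2181,node3:2181");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("MOMO_CHAT:MOMO_MSG"))) {
            // Peek at a handful of rows to confirm the consumer is writing
            Scan scan = new Scan();
            scan.setLimit(5);
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    String rowkey = Bytes.toString(result.getRow());
                    String message = Bytes.toString(
                            result.getValue(Bytes.toBytes("C1"), Bytes.toBytes("message")));
                    System.out.println(rowkey + " => " + message);
                }
            }
        }
    }
}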

IV. MoMo Case: Integrating with Phoenix

-- create a view over the HBase table

create view MOMO_CHAT.MOMO_MSG(
    "id" varchar primary key,
    C1."msg_time" varchar,
    C1."sender_nickyname" varchar,
    C1."sender_account" varchar,
    C1."sender_sex" varchar,
    C1."sender_ip" varchar,
    C1."sender_os" varchar,
    C1."sender_phone_type" varchar,
    C1."sender_network" varchar,
    C1."sender_gps" varchar,
    C1."receiver_nickyname" varchar,
    C1."receiver_ip" varchar,
    C1."receiver_account" varchar,
    C1."receiver_os" varchar,
    C1."receiver_phone_type" varchar,
    C1."receiver_network" varchar,
    C1."receiver_gps" varchar,
    C1."receiver_sex" varchar,
    C1."msg_type" varchar,
    C1."distance" varchar,
    C1."message" varchar
);
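
With the view defined, the same HBase data can be queried through Phoenix's JDBC driver. A minimal sketch, assuming the Phoenix client jar is on the classpath and Phoenix uses the same ZooKeeper quorum (the aggregate query is only an illustration):

package com.itheima.momo_chat;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQueryDemo {
    public static void main(String[] args) throws Exception {
        // Lower-case column names from the view definition must stay double-quoted
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:node1,node2,node3:2181");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select \"sender_account\", count(1) " +
                     "from MOMO_CHAT.MOMO_MSG group by \"sender_account\" limit 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getLong(2));
            }
        }
    }
}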

V. MoMo Case: Integrating with Hive

create database if not exists MOMO_CHAT;
use MOMO_CHAT;
create external table MOMO_CHAT.MOMO_MSG (
    id string,
    msg_time string,
    sender_nickyname string,
    sender_account string,
    sender_sex string,
    sender_ip string,
    sender_os string,
    sender_phone_type string,
    sender_network string,
    sender_gps string,
    receiver_nickyname string,
    receiver_ip string,
    receiver_account string,
    receiver_os string,
    receiver_phone_type string,
    receiver_network string,
    receiver_gps string,
    receiver_sex string,
    msg_type string,
    distance string,
    message string
)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
with serdeproperties('hbase.columns.mapping'=':key,C1:msg_time,
C1:sender_nickyname,
C1:sender_account,
C1:sender_sex,
C1:sender_ip,
C1:sender_os,
C1:sender_phone_type,
C1:sender_network,
C1:sender_gps,
C1:receiver_nickyname,
C1:receiver_ip,
C1:receiver_account,
C1:receiver_os,
C1:receiver_phone_type,
C1:receiver_network,
C1:receiver_gps,
C1:receiver_sex,
C1:msg_type,
C1:distance,
C1:message')
tblproperties('hbase.table.name'='MOMO_CHAT:MOMO_MSG');
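
With this mapping in place, the HBase data can be analyzed with ordinary HiveQL, for example: select sender_account, count(*) from MOMO_CHAT.MOMO_MSG group by sender_account; (an illustrative query, not from the original).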

VI. MoMo Case: Real-Time Statistics with Flink

1- Total message count, in real time
2- Messages sent per region, in real time
3- Messages received per region, in real time
4- Messages sent per user, in real time
5- Messages received per user, in real time

1. Create a Maven project and add the dependencies:

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-runtime-web_2.11</artifactId>
            <version>1.10.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-shaded-hadoop-2-uber</artifactId>
            <version>2.7.5-10.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-kafka_2.11</artifactId>
            <version>1.10.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.4</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.10.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.11</artifactId>
            <version>1.10.0</version>
        </dependency>

        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.36</version>
        </dependency>
        
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.41</version>
        </dependency>

2. The POJO class:

package com.itheima.momo_chat.pojo;

public class MoMoCountBean {

    private Integer id;
    private Long moMoTotalCount;
    private String moMoProvince;
    private String moMoUsername;
    private Long moMo_MsgCount;
    private String groupType;

    public String getGroupType() {
        return groupType;
    }

    public void setGroupType(String groupType) {
        this.groupType = groupType;
    }

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public Long getMoMoTotalCount() {
        return moMoTotalCount;
    }

    public void setMoMoTotalCount(Long moMoTotalCount) {
        this.moMoTotalCount = moMoTotalCount;
    }

    public String getMoMoProvince() {
        return moMoProvince;
    }

    public void setMoMoProvince(String moMoProvince) {
        this.moMoProvince = moMoProvince;
    }

    public String getMoMoUsername() {
        return moMoUsername;
    }

    public void setMoMoUsername(String moMoUsername) {
        this.moMoUsername = moMoUsername;
    }

    public Long getMoMo_MsgCount() {
        return moMo_MsgCount;
    }

    public void setMoMo_MsgCount(Long moMo_MsgCount) {
        this.moMo_MsgCount = moMo_MsgCount;
    }
}

3. The sink class that writes to MySQL:

package com.itheima.momo_chat.steam;


import com.itheima.momo_chat.pojo.MoMoCountBean;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.sql.*;


public class MysqlSink extends RichSinkFunction<MoMoCountBean> {
    private Statement stat;
    private Connection connection;

    //private String sql;
    private String status;

    public MysqlSink() {
    }

    public MysqlSink(String status) {
        this.status = status;
    }

    /**
     * Establish the connection in open(), so a connection is not created and released on every invoke()
     *
     * @param parameters
     * @throws Exception
     */
    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        connection = getConnection();

        stat = connection.createStatement();
    }

    @Override
    public void close() throws Exception {
        super.close();
        // release the statement first, then the connection
        if (stat != null) {
            stat.close();
        }
        if (connection != null) {
            connection.close();
        }
    }

    /**
     * invoke() is called once for every incoming record
     *
     * @param value
     * @param context
     * @throws Exception
     */
    @Override
    public void invoke(MoMoCountBean value, Context context) throws Exception {

        if(status.equals("1")){
            String sql = "select * from momo_count where momo_groupType = '1'";
            ResultSet resultSet = stat.executeQuery(sql);
            boolean flag = resultSet.next();

            if(flag) {
                sql = "update momo_count set momo_totalcount= '"+value.getMoMoTotalCount()+ "' where momo_groupType = '1'";
            }else {
                sql = "insert into momo_count( momo_totalcount,momo_groupType) values ("+value.getMoMoTotalCount()+",'1') ";
            }
            stat.executeUpdate(sql);


        }else if (status.equals("2")){
            String sql = "select * from momo_count where momo_groupType = '2' and momo_province= '"+value.getMoMoProvince()+"' ";
            ResultSet resultSet = stat.executeQuery(sql);
            boolean flag = resultSet.next();

            if(flag) {
                sql = "update momo_count set momo_msgcount= '"+value.getMoMo_MsgCount()+ "' where momo_groupType = '2' and momo_province= '"+value.getMoMoProvince()+"' ";
            }else {
                sql = "insert into momo_count( momo_province,momo_msgcount,momo_groupType) values ('"+value.getMoMoProvince()+"',"+value.getMoMo_MsgCount()+",'2') ";
            }
            stat.executeUpdate(sql);
        }else if (status.equals("3")){
            String sql = "select * from momo_count where momo_groupType = '3' and momo_province= '"+value.getMoMoProvince()+"' ";
            ResultSet resultSet = stat.executeQuery(sql);
            boolean flag = resultSet.next();

            if(flag) {
                sql = "update momo_count set momo_msgcount= '"+value.getMoMo_MsgCount()+ "' where momo_groupType = '3' and momo_province= '"+value.getMoMoProvince()+"' ";
            }else {
                sql = "insert into momo_count( momo_province,momo_msgcount,momo_groupType) values ('"+value.getMoMoProvince()+"',"+value.getMoMo_MsgCount()+",'3') ";
            }
            stat.executeUpdate(sql);

        }else if (status.equals("4")){
            String sql = "select * from momo_count where momo_groupType = '4' and momo_username= '"+value.getMoMoUsername()+"' ";
            ResultSet resultSet = stat.executeQuery(sql);
            boolean flag = resultSet.next();

            if(flag) {
                sql = "update momo_count set momo_msgcount= '"+value.getMoMo_MsgCount()+ "' where momo_groupType = '4' and momo_username= '"+value.getMoMoUsername()+"' ";
            }else {
                sql = "insert into momo_count( momo_username,momo_msgcount,momo_groupType) values ('"+value.getMoMoUsername()+"',"+value.getMoMo_MsgCount()+",'4') ";
            }
            stat.executeUpdate(sql);


        }else if (status.equals("5")){

            String sql = "select * from momo_count where momo_groupType = '5' and momo_username= '"+value.getMoMoUsername()+"' ";
            ResultSet resultSet = stat.executeQuery(sql);
            boolean flag = resultSet.next();

            if(flag) {
                sql = "update momo_count set momo_msgcount= '"+value.getMoMo_MsgCount()+ "' where momo_groupType = '5' and momo_username= '"+value.getMoMoUsername()+"' ";
            }else {
                sql = "insert into momo_count( momo_username,momo_msgcount,momo_groupType) values ('"+value.getMoMoUsername()+"',"+value.getMoMo_MsgCount()+",'5') ";
            }
            stat.executeUpdate(sql);

        }

    }

    private static Connection getConnection() {
        Connection con = null;
        try {
            Class.forName("com.mysql.jdbc.Driver");
            con = DriverManager.getConnection("jdbc:mysql://node1:3306/momo?useUnicode=true&characterEncoding=UTF-8", "root", "123456");
        } catch (Exception e) {
            System.out.println("-----------mysql get connection has exception , msg = "+ e.getMessage());
        }
        return con;
    }
}
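
One caveat: the sink above builds SQL by string concatenation, which repeats the same select-then-update/insert pattern five times and breaks if a value ever contains a quote. A hedged sketch of the groupType '2' branch rewritten with a PreparedStatement (a method meant to sit inside MysqlSink, which already imports java.sql.*):

    // Minimal sketch, not from the original: upsert the per-province send count
    // by attempting the update first and inserting only when no row matched.
    private void upsertProvinceCount(MoMoCountBean value) throws Exception {
        String update = "update momo_count set momo_msgcount = ? "
                + "where momo_grouptype = '2' and momo_province = ?";
        try (PreparedStatement ps = connection.prepareStatement(update)) {
            ps.setLong(1, value.getMoMo_MsgCount());
            ps.setString(2, value.getMoMoProvince());
            if (ps.executeUpdate() == 0) { // no existing row yet
                String insert = "insert into momo_count(momo_province, momo_msgcount, momo_grouptype) "
                        + "values (?, ?, '2')";
                try (PreparedStatement ins = connection.prepareStatement(insert)) {
                    ins.setString(1, value.getMoMoProvince());
                    ins.setLong(2, value.getMoMo_MsgCount());
                    ins.executeUpdate();
                }
            }
        }
    }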

4. The Flink job that computes the metrics and writes to MySQL:

package com.itheima.momo_chat.steam;
import com.itheima.momo_chat.pojo.MoMoCountBean;
import com.itheima.momo_chat.utils.HttpClientUtils;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import java.util.Properties;
public class MomoFlinkSteam {
    public static void main(String[] args) throws Exception {
        //1. Create the Flink stream execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        //2. Wire up the three components
        // 2.1 Source: consume the MOMO_MSG topic from Kafka
        Properties props = new Properties();
        props.setProperty("bootstrap.servers","node1:9092,node2:9092,node3:9092");
        props.setProperty("group.id","MOMO_G2");
        FlinkKafkaConsumer<String> kafkaConsumer =
                new FlinkKafkaConsumer<String>("MOMO_MSG", new SimpleStringSchema(), props);
        DataStreamSource<String> streamSource = env.addSource(kafkaConsumer);

        // 2.2 Transformations: process the message data in real time
        // Requirement 1: total message count, in real time
        totalMsgCount(streamSource);

        // Requirement 2: messages sent per region, in real time
        totalProvinceSenderMsgCount(streamSource);

        // Requirement 3: messages received per region, in real time
        totalProvinceReceiverMsgCount(streamSource);

        // Requirement 4: messages sent per user, in real time
        totalUserSenderMsgCount(streamSource);

        // Requirement 5: messages received per user, in real time
        totalUserReceiverMsgCount(streamSource);

        //3. Launch the Flink job
        env.execute("FlinkMoMo");
    }


    private static void totalMsgCount(DataStreamSource<String> streamSource) {
        SingleOutputStreamOperator<Tuple1<Long>> streamOperator = streamSource.map(new MapFunction<String, Tuple1<Long>>() {
            @Override
            public Tuple1<Long> map(String msg) throws Exception {
                return new Tuple1<>(1L);
            }
        }).keyBy(0).sum(0);

        SingleOutputStreamOperator<MoMoCountBean> operator = streamOperator.map(new MapFunction<Tuple1<Long>, MoMoCountBean>() {
            @Override
            public MoMoCountBean map(Tuple1<Long> tuple1) throws Exception {
                Long totalMsgCount = tuple1.f0;
                MoMoCountBean moMoCountBean = new MoMoCountBean();
                moMoCountBean.setMoMoTotalCount(totalMsgCount);
                return moMoCountBean;
            }
        });

        // 2.3 Sink: write the results to MySQL
        operator.addSink(new MysqlSink("1"));
    }

    private static void totalProvinceSenderMsgCount(DataStreamSource<String> streamSource) {
        SingleOutputStreamOperator<String> filterOperator = streamSource.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String msg) throws Exception {
                return msg != null && !"".equals(msg.trim()) && msg.split("\001").length == 20;
            }
        });
        SingleOutputStreamOperator<Tuple2<String, Long>> sumOperator = filterOperator.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String msg) throws Exception {
                String[] fields = msg.split("\001");
                String[] latAndLng = fields[8].split(",");
                String lng = latAndLng[0].trim();
                String lat = latAndLng[1].trim();

                String province = HttpClientUtils.findByLatAndLng(lat, lng);

                return new Tuple2<>(province, 1L);
            }
        }).keyBy(0).sum(1);

        SingleOutputStreamOperator<MoMoCountBean> operator = sumOperator.map(new MapFunction<Tuple2<String, Long>, MoMoCountBean>() {
            @Override
            public MoMoCountBean map(Tuple2<String, Long> tuple2) throws Exception {
                String province = tuple2.f0;
                Long msgCount = tuple2.f1;
                MoMoCountBean moMoCountBean = new MoMoCountBean();
                moMoCountBean.setMoMoProvince(province);
                moMoCountBean.setMoMo_MsgCount(msgCount);
                return moMoCountBean;
            }
        });

        operator.addSink(new MysqlSink("2"));
    }

    private static void totalProvinceReceiverMsgCount(DataStreamSource<String> streamSource) {
        SingleOutputStreamOperator<String> filterOperator = streamSource.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String msg) throws Exception {
                return msg != null && !"".equals(msg.trim()) && msg.split("\001").length == 20;
            }
        });
        SingleOutputStreamOperator<Tuple2<String, Long>> sumOperator = filterOperator.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String msg) throws Exception {
                String[] fields = msg.split("\001");
                String[] latAndLng = fields[15].split(",");
                String lng = latAndLng[0].trim();
                String lat = latAndLng[1].trim();

                String province = HttpClientUtils.findByLatAndLng(lat, lng);

                return new Tuple2<>(province, 1L);
            }
        }).keyBy(0).sum(1);

        SingleOutputStreamOperator<MoMoCountBean> operator = sumOperator.map(new MapFunction<Tuple2<String, Long>, MoMoCountBean>() {
            @Override
            public MoMoCountBean map(Tuple2<String, Long> tuple2) throws Exception {
                String province = tuple2.f0;
                Long msgCount = tuple2.f1;
                MoMoCountBean moMoCountBean = new MoMoCountBean();
                moMoCountBean.setMoMoProvince(province);
                moMoCountBean.setMoMo_MsgCount(msgCount);
                return moMoCountBean;
            }
        });

        operator.addSink(new MysqlSink("3"));
    }

    private static void totalUserSenderMsgCount(DataStreamSource<String> streamSource) {
        SingleOutputStreamOperator<String> filterOperator = streamSource.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String msg) throws Exception {
                return msg != null && !"".equals(msg.trim()) && msg.split("\001").length == 20;
            }
        });
        SingleOutputStreamOperator<Tuple2<String, Long>> sumOperator = filterOperator.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String msg) throws Exception {
                String[] fields = msg.split("\001");
                String senderName = fields[1];

                return new Tuple2<>(senderName, 1L);
            }
        }).keyBy(0).sum(1);

        SingleOutputStreamOperator<MoMoCountBean> operator = sumOperator.map(new MapFunction<Tuple2<String, Long>, MoMoCountBean>() {
            @Override
            public MoMoCountBean map(Tuple2<String, Long> tuple2) throws Exception {
                String senderName = tuple2.f0;
                Long msgCount = tuple2.f1;
                MoMoCountBean moMoCountBean = new MoMoCountBean();
                moMoCountBean.setMoMoUsername(senderName);
                moMoCountBean.setMoMo_MsgCount(msgCount);
                return moMoCountBean;
            }
        });

        operator.addSink(new MysqlSink("4"));
    }

    private static void totalUserReceiverMsgCount(DataStreamSource<String> streamSource) {
        SingleOutputStreamOperator<String> filterOperator = streamSource.filter(new FilterFunction<String>() {
            @Override
            public boolean filter(String msg) throws Exception {
                return msg != null && !"".equals(msg.trim()) && msg.split("\001").length == 20;
            }
        });
        SingleOutputStreamOperator<Tuple2<String, Long>> sumOperator = filterOperator.map(new MapFunction<String, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> map(String msg) throws Exception {
                String[] fields = msg.split("\001");
                String receiverName = fields[9];

                return new Tuple2<>(receiverName, 1L);
            }
        }).keyBy(0).sum(1);

        SingleOutputStreamOperator<MoMoCountBean> operator = sumOperator.map(new MapFunction<Tuple2<String, Long>, MoMoCountBean>() {
            @Override
            public MoMoCountBean map(Tuple2<String, Long> tuple2) throws Exception {
                String receiverName = tuple2.f0;
                Long msgCount = tuple2.f1;
                MoMoCountBean moMoCountBean = new MoMoCountBean();
                moMoCountBean.setMoMoUsername(receiverName);
                moMoCountBean.setMoMo_MsgCount(msgCount);
                return moMoCountBean;
            }
        });

        operator.addSink(new MysqlSink("5"));
    }
}
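
The job above imports com.itheima.momo_chat.utils.HttpClientUtils, which the original post never shows (the httpclient and fastjson dependencies in the pom exist for it). A minimal sketch of what such a utility could look like, assuming an Amap-style reverse-geocoding API; the endpoint, key, and response fields are assumptions, not from the original:

package com.itheima.momo_chat.utils;

import com.alibaba.fastjson.JSONObject;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class HttpClientUtils {

    // Hypothetical reverse-geocoding endpoint and key: substitute a real
    // service and your own API key before running.
    private static final String URL_TEMPLATE =
            "https://restapi.amap.com/v3/geocode/regeo?key=YOUR_KEY&location=%s,%s";

    private static final CloseableHttpClient CLIENT = HttpClients.createDefault();

    // Resolve (lat, lng) to a province name; falls back to "unknown" on failure.
    public static String findByLatAndLng(String lat, String lng) {
        try {
            // Amap expects "lng,lat" order in the location parameter
            HttpGet get = new HttpGet(String.format(URL_TEMPLATE, lng, lat));
            try (CloseableHttpResponse response = CLIENT.execute(get)) {
                String body = EntityUtils.toString(response.getEntity(), "UTF-8");
                return JSONObject.parseObject(body)
                        .getJSONObject("regeocode")
                        .getJSONObject("addressComponent")
                        .getString("province");
            }
        } catch (Exception e) {
            return "unknown";
        }
    }
}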

VII. MoMo Case: The MySQL Table

Before running the Flink job from section VI, first create the MySQL table that backs all five requirements:

CREATE DATABASE /*!32312 IF NOT EXISTS*/`momo` /*!40100 DEFAULT CHARACTER SET utf8mb4 */;

USE `momo`;

/*Table structure for table `momo_count` */

CREATE TABLE `momo_count` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `momo_totalcount` bigint(20) DEFAULT '0' COMMENT '总消息量',
  `momo_province` varchar(20) DEFAULT '-1' COMMENT '省份',
  `momo_username` varchar(20) DEFAULT '-1' COMMENT '用户名',
  `momo_msgcount` bigint(20) DEFAULT '0' COMMENT '消息量',
  `momo_grouptype` varchar(20) DEFAULT '-1' COMMENT '统计类型:1 总消息量 2 各省份发送量 3 各省份接收量 4 各用户发送量 5各用户接收量',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
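
All five metrics land in this single table and are distinguished by momo_grouptype, so FineBI (or any SQL client) can read each slice directly, for example: select momo_province, momo_msgcount from momo_count where momo_grouptype = '2' (an illustrative query).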


VIII. Real-Time Visualization with FineBI

Finally, connect FineBI to the MySQL table to build the live dashboard.


Summary

This case study runs from data collection all the way to front-end visualization:
1. Offline: Flume + Kafka + HBase + Hive/Phoenix
2. Real-time: Flume + Kafka + Flink + MySQL + FineBI
