Flink Real-Time Data Preprocessing: Architecture and Topic-Split Writes to HDFS (Text and Parquet Formats)

I. Architecture

  • ① Flume collects the tracking (event) logs into Kafka (a minimal Flume configuration sketch follows the diagram below)
  • ② Pull the data from Kafka and perform filtering, dimension enrichment, and topic splitting
  • ③ When enriching with geo-location information, cache the query results in Redis so that later lookups hit the cache first
  • ④ Use side outputs to split the stream by topic and write each topic back to Kafka
  • ⑤ Store the topic-split data in HDFS (text and Parquet)

(Architecture diagram)
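For step ①, a minimal Flume agent sketch (an illustrative addition, not from the original post): it tails the tracking-log files with a TAILDIR source and ships them to the Kafka topic that the Flink job reads (homework, per the approach below); agent/source/sink names, file paths and capacities are placeholders.

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/flume/taildir_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/app/access_log.*
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = node-1.51doit.cn:9092,node-2.51doit.cn:9092,node-3.51doit.cn:9092
a1.sinks.k1.kafka.topic = homework
a1.sinks.k1.channel = c1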

II. Data and Approach

2019-02-27 15:42:53,473 --> {"u":{"id":"1P6Ri2apN","account":"0wsvag","email":"wuuDQD@T3fLe.com","phoneNbr":"66230495244","birthday":"1998-06-6","isRegistered":true,"isLogin":true,"addr":"YzObIHBTS","gender":"F","phone":{"imei":"7ErIg1SkfBwTbWoq","osName":"macos","osVer":"10.05","resolution":"1366*768","androidId":"","manufacture":"apple","deviceId":"nperIB0T"},"app":{"appid":"com.51doit.mall","appVer":"1.8.2","release_ch":"应用超市","promotion_ch":"腾讯"},"loc":{"areacode":341182103,"longtitude":118.39359635566875,"latitude":32.662502568345516,"carrier":"中国移动","netType":"WIFI","cid_sn":"SivrSjJ89dOk","ip":"30.243.144.92"},"sessionId":"fvJpyZ6jZlrS"},"logType":"pgview","commit_time":1551253373473,"event":{"pgid":"10-01-32847660","title":"包教包会3入门","skuid":"32847660","url":"/a/b/32847660.html"}}  
2019-02-27 15:42:53,382 --> {"u":{"id":"","account":"4p40VJ","email":"","phoneNbr":"","birthday":"","isRegistered":false,"isLogin":false,"addr":"","gender":"","phone":{"imei":"HNrN2J8qHMQsVJYJ","osName":"android","osVer":"10.0","resolution":"1356*768","androidId":"1q3gb1BKNcVv","manufacture":"联想","deviceId":""},"app":{"appid":"com.51doit.mall","appVer":"2.2.8","release_ch":"安软市场","promotion_ch":"百度"},"loc":{"areacode":220403,"longtitude":125.15014857862096,"latitude":42.98636494637781,"carrier":"中国移动","netType":"4G","cid_sn":"KXAAEuG4HRsX","ip":"109.21.229.238"},"sessionId":"k54ORGX5ITeS"},"logType":"favor","commit_time":1551253373382,"event":{"skuid":"32847658"}}  
  1. Pull the data from Kafka, topic: homework
  2. Write a function that parses each line into JSON, converts it into a bean, and joins the geo-location dimension (RichMapFunction, open/close)
  3. Use side outputs to tag the records: a flow tag and an activity tag
Activity tag:
        logType: act_join (join an activity)
Flow tag:
        logType: pgview (page view), search, thumbup (like), click_ad (ad click), favor (favorite), addcart (add to cart)
  4. Take the two side-output streams and convert the beans back to JSON
  5. Write the JSON back to Kafka, to two topics, flow and act, which are read from the configuration file
  6. Store the data in HDFS

III. Code

1. Main pipeline code

What the program does: real-time ETL (pulling, filtering, joining dimensions, selecting fields, masking sensitive fields, splitting the data, converting formats/types); side outputs are used to split the data by topic before it is written to the storage systems.

  • (1) Use a RichMapFunction to join the dimension data (this can be optimized with async I/O; a sketch follows the ToJSONMapFunction code in section 2 below)
  • (2) Split the data with side (bypass) outputs
  • (3) Write the data back to Kafka
  	  - Kafka has high throughput and can support exactly-once delivery
  	  - FlinkKafkaProducer implements a two-phase commit by extending TwoPhaseCommitSinkFunction
  	  - It also implements the CheckpointedFunction and CheckpointListener interfaces, so the transaction is committed only after the checkpoint succeeds; if the checkpoint fails, the transaction is rolled back (see the configuration note after this list)
  • (4) The main stream can additionally be written to HDFS with a bulk-format sink, in Parquet format
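A practical note on the EXACTLY_ONCE Kafka sink used below (an addition to the original post): Flink's transactional producer defaults transaction.timeout.ms to 1 hour, while the Kafka broker's transaction.max.timeout.ms defaults to 15 minutes, so the job fails at startup unless the producer timeout is lowered (or the broker limit raised). Also, any job that consumes the flow/act topics should read with isolation.level=read_committed, otherwise it will also see data from aborted transactions. Because the producer is built from parameters.getProperties(), the timeout can simply be added to conf.properties:

transaction.timeout.ms=900000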

  • Global configuration file conf.properties
checkpoint.interval=30000
bootstrap.servers=node-1.51doit.cn:9092,node-2.51doit.cn:9092,node-3.51doit.cn:9092
group.id=g10
auto.offset.reset=earliest
kafka.topics=homework
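The code below also reads the following keys from the same file (flow.topic, activity.topic, mainstream.hdfs.out.path, checkpoint.path and the redis.* keys); the values shown here are placeholders for illustration:

checkpoint.path=hdfs://node-1.51doit.cn:9000/flink/checkpoints
flow.topic=flow
activity.topic=act
mainstream.hdfs.out.path=hdfs://node-1.51doit.cn:9000/data/mainstream
redis.host=node-1.51doit.cn
redis.password=123456
redis.db=0

Main class PreETLAndTopicSplit: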
import cn._51doit.flink.day06.FlinkUtils;
import cn._51doit.flink.day06.KafkaStringSerializationSchema;
import com.alibaba.fastjson.JSON;
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.bucketassigners.DateTimeBucketAssigner;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;
import java.time.ZoneId;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

public class PreETLAndTopicSplit {

    public static void main(String[] args) throws Exception{

        ParameterTool parameters = ParameterTool.fromPropertiesFile(args[0]);

        DataStream<String> lines = FlinkUtils.createKafkaStream(parameters, SimpleStringSchema.class);

        //set the global job parameters so they can be read inside rich functions
        FlinkUtils.env.getConfig().setGlobalJobParameters(parameters);

        SingleOutputStreamOperator<LogBean> beanDataStream = lines.map(new ToJSONMapFunction());

        SingleOutputStreamOperator<LogBean> filteredStream = beanDataStream.filter(new FilterFunction<LogBean>() {
            @Override
            public boolean filter(LogBean bean) throws Exception {
                return bean != null;
            }
        });

        //Split the data
        //The old API used split() and then select(); here side outputs are used instead
        //Tag for flow (traffic) records
        OutputTag<LogBean> flowOutputTag = new OutputTag<LogBean>("flow-output") {};

        //Tag for activity records
        OutputTag<LogBean> activityOutputTag = new OutputTag<LogBean>("activity-output") {};


        SingleOutputStreamOperator<LogBean> mainStream = filteredStream.process(new ProcessFunction<LogBean, LogBean>() {

            @Override
            public void processElement(LogBean bean, Context ctx, Collector<LogBean> out) throws Exception {

                //Decide by the log type carried in each record
                String logType = bean.getLogType();

                if (logType.startsWith("act")) {
                    //activity record: tag it for the activity side output
                    ctx.output(activityOutputTag, bean);
                } else {
                    //flow (traffic) record: tag it for the flow side output
                    ctx.output(flowOutputTag, bean);
                }
                //every record is also emitted to the main stream
                out.collect(bean);
            }
        });

        DataStream<LogBean> flowStream = mainStream.getSideOutput(flowOutputTag);

        DataStream<LogBean> activityStream = mainStream.getSideOutput(activityOutputTag);

        String flowTopic = parameters.getRequired("flow.topic");

        Properties properties = parameters.getProperties();

        FlinkKafkaProducer<String> kafkaProducer1 = new FlinkKafkaProducer<String>(
                flowTopic, //target topic
                new KafkaStringSerializationSchema(flowTopic), //serialization schema used when writing to Kafka
                properties, //Kafka connection parameters
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE //write to Kafka with EXACTLY_ONCE semantics
        );

        //Convert the beans to JSON strings and write them to Kafka
        flowStream.map(new MapFunction<LogBean, String>() {
            @Override
            public String map(LogBean bean) throws Exception {
                return JSON.toJSONString(bean);
            }
        }).addSink(kafkaProducer1);

        String activityTopic = parameters.getRequired("activity.topic");

        FlinkKafkaProducer<String> kafkaProducer2 = new FlinkKafkaProducer<String>(
                activityTopic, //target topic
                new KafkaStringSerializationSchema(activityTopic), //serialization schema used when writing to Kafka
                properties, //Kafka connection parameters
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE //write to Kafka with EXACTLY_ONCE semantics
        );

        activityStream.map(new MapFunction<LogBean, String>() {
            @Override
            public String map(LogBean bean) throws Exception {
                return JSON.toJSONString(bean);
            }
        }).addSink(kafkaProducer2);

        String path = parameters.getRequired("mainstream.hdfs.out.path");

//        StreamingFileSink<String> streamingFileSink = StreamingFileSink
//                .forRowFormat(new Path(path), new SimpleStringEncoder<String>("UTF-8"))
//                .withRollingPolicy(
//                        DefaultRollingPolicy.builder()
//                                .withRolloverInterval(TimeUnit.SECONDS.toMillis(30)) //
//                                .withInactivityInterval(TimeUnit.SECONDS.toMillis(10))
//                                .withMaxPartSize(1024 * 1024 * 1024)
//                                .build())
//                .build();

        //Text (row-format) variant: write the main stream to HDFS as plain text
//        mainStream.map(new MapFunction<LogBean, String>() {
//            @Override
//            public String map(LogBean bean) throws Exception {
//                return JSON.toJSONString(bean);
//            }
//        }).addSink(streamingFileSink);

        //Bucket assigner: one output directory per minute, in the given time zone
        DateTimeBucketAssigner<LogBean> bucketAssigner = new DateTimeBucketAssigner<>(
                "yyyy-MM-dd--HH-mm",
                ZoneId.of("Asia/Shanghai"));

        //Build a StreamingFileSink that writes in bulk mode, using the columnar Parquet format
        StreamingFileSink<LogBean> streamingFileSink = StreamingFileSink.
                forBulkFormat(
                        new Path(path), //output directory
                        ParquetAvroWriters.forReflectRecord(LogBean.class) //Parquet writer, schema derived from LogBean by reflection
                )
                .withBucketAssigner(bucketAssigner).build();

        //Attach the Parquet sink to the main stream
        mainStream.addSink(streamingFileSink);

        FlinkUtils.env.execute();
    }
}
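The KafkaStringSerializationSchema imported above is not shown in the original post. A minimal sketch of what such a class can look like, assuming it implements Flink's KafkaSerializationSchema and simply writes each record's UTF-8 bytes to the topic passed in the constructor:

import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.nio.charset.StandardCharsets;

public class KafkaStringSerializationSchema implements KafkaSerializationSchema<String> {

    private final String topic;

    public KafkaStringSerializationSchema(String topic) {
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
        //no key; the value is the JSON string as UTF-8 bytes
        return new ProducerRecord<>(topic, element.getBytes(StandardCharsets.UTF_8));
    }
}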
2. The ToJSONMapFunction class used by the main code
  • Parses each line into standard JSON
  • Converts it into a LogBean
  • Queries the Amap (Gaode) API by longitude/latitude for the geo location and stores the result in Redis
  • Enriches the LogBean with the geo-location information
  • The event map can also be obtained like this: Map<String, Object> event = jsonObject.getJSONObject("event").getInnerMap()
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.alibaba.fastjson.JSONException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.configuration.Configuration;
import redis.clients.jedis.Jedis;
import java.util.HashMap;

public class ToJSONMapFunction extends RichMapFunction<String, LogBean> {

    private transient Jedis jedis = null;

    private transient HttpClient httpClient = null;

    @Override
    public void open(Configuration parameters) throws Exception {
        //Create the HTTP client and the Redis connection
        ParameterTool params = (ParameterTool) getRuntimeContext()
                .getExecutionConfig()
                .getGlobalJobParameters(); //get the global job parameters
        String host = params.getRequired("redis.host");
        String password = params.getRequired("redis.password");
        int db = params.getInt("redis.db", 0);
        jedis = new Jedis(host, 6379, 5000);
        jedis.auth(password);
        jedis.select(db);

        //HTTP client used to call the Amap (Gaode) API
        httpClient = new HttpClient();
    }

    @Override
    public LogBean map(String line) throws Exception {
        //reconnect if the Redis connection has been dropped
        if(!jedis.isConnected()) {
            jedis.connect();
        }

        LogBean logBean = null;
        try {
            String[] fields = line.split(" --> ");
            String dateTime = fields[0];
            String dt = dateTime.split(" ")[0];
            String json = fields[1];
            //parse the JSON with FastJSON
            JSONObject jsonObj = JSON.parseObject(json);

            JSONObject uObj = jsonObj.getJSONObject("u");
            JSONObject phoneObj = uObj.getJSONObject("phone");
            JSONObject locObj = uObj.getJSONObject("loc");
            JSONObject appObj = uObj.getJSONObject("app");

            // flat fields of the user object
            String id = uObj.getString("id");
            String account = uObj.getString("account");
            String sessionId = uObj.getString("sessionId");

            // phone/device info
            String imei = phoneObj.getString("imei");
            String osName = phoneObj.getString("osName");
            String osVer = phoneObj.getString("osVer");
            String resolution = phoneObj.getString("resolution");
            String androidId = phoneObj.getString("androidId");
            String manufacture = phoneObj.getString("manufacture");
            String deviceId = phoneObj.getString("deviceId");

            // location info
            String areacode = locObj.getString("areacode");
            double longtitude = locObj.getDouble("longtitude");
            double latitude = locObj.getDouble("latitude");

            //look up the province, city, district and business areas by longitude/latitude
            String[] areaInfo = GeoUtils.getAreaInfoByLongitudeAndLatitude(httpClient, jedis, longtitude, latitude);

            String province = areaInfo[0];
            String city = areaInfo[1];
            String district = areaInfo[2];
            String bizNames = areaInfo[3];

            String carrier = locObj.getString("carrier");
            String netType = locObj.getString("netType");
            String cid_sn = locObj.getString("cid_sn");
            String ip = locObj.getString("ip");

            // app fields
            String appid = appObj.getString("appid");
            String appVer = appObj.getString("appVer");
            String release_ch = appObj.getString("release_ch");
            String promotion_ch = appObj.getString("promotion_ch");

            //log (event) type
            String logType = jsonObj.getString("logType");
            //commit time
            long commit_time = jsonObj.getLong("commit_time");

            JSONObject eventObj = jsonObj.getJSONObject("event");
            // build a HashMap to hold the event fields
            HashMap<String, String> eventMap = new HashMap<>();
            // copy every key/value pair of the event object
            for (String k : eventObj.keySet()) {
                String v = eventObj.getString(k);
                // add it to the map
                eventMap.put(k, v);
            }
            // assemble the bean and return it
            logBean = new LogBean(id,
                    account,
                    sessionId,
                    imei,
                    osName,
                    osVer,
                    resolution,
                    androidId,
                    manufacture,
                    deviceId,
                    areacode,
                    longtitude,
                    latitude,
                    province,
                    city,
                    district,
                    bizNames,
                    carrier,
                    netType,
                    cid_sn,
                    ip,
                    appid,
                    appVer,
                    release_ch,
                    promotion_ch,
                    logType,
                    commit_time,
                    dt,
                    eventMap
            );

        } catch (JSONException e) {
            e.printStackTrace();
            System.out.println("XXXXXXXX ====>" + line);
            //logger.error("parse json error -->" + line)
            //写入到HDFS的指定目录、Hbase
        }
        return logBean;
    }

    @Override
    public void close() throws Exception {
        //close the HTTP and Redis connections
        jedis.close();
        httpClient = null;
    }
}
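As mentioned in section III (1), the blocking Redis/HTTP lookup inside map() can be optimized with Flink's async I/O. A minimal sketch under the same assumptions (LogBean and the parsing/enrichment logic above); the class name, thread-pool size and timeout values are placeholders, not part of the original post:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncToLogBeanFunction extends RichAsyncFunction<String, LogBean> {

    private transient ExecutorService executor;

    @Override
    public void open(Configuration parameters) throws Exception {
        //a small private pool for the blocking Redis/HTTP calls
        executor = Executors.newFixedThreadPool(10);
    }

    @Override
    public void asyncInvoke(String line, ResultFuture<LogBean> resultFuture) {
        CompletableFuture
                .supplyAsync(() -> parse(line), executor) //run the blocking work off the task thread
                .thenAccept(bean -> resultFuture.complete(Collections.singletonList(bean)));
    }

    private LogBean parse(String line) {
        //same logic as ToJSONMapFunction.map(): split the line, parse the JSON,
        //enrich via GeoUtils and build a LogBean
        return null; //placeholder; nulls are dropped by the filter in the main class
    }

    @Override
    public void close() throws Exception {
        executor.shutdown();
    }
}

//usage in the main class (replaces lines.map(new ToJSONMapFunction())):
//SingleOutputStreamOperator<LogBean> beanDataStream =
//        AsyncDataStream.unorderedWait(lines, new AsyncToLogBeanFunction(), 5, TimeUnit.SECONDS, 100);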
3. The FlinkUtils helper used by the main code
  • Creates the Kafka source, i.e. the consumer side of the job
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.utils.ParameterTool;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

public class FlinkUtils {

    public static final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    public static <T> DataStream<T> createKafkaStream(ParameterTool parameters, Class<? extends DeserializationSchema<T>> clazz) throws Exception{

        //With checkpointing enabled, the Kafka offsets are stored in Flink's checkpointed operator state
        env.enableCheckpointing(parameters.getLong("checkpoint.interval", 300000));
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
        //keep the checkpoint data even after the job is cancelled
        env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        String checkPointPath = parameters.get("checkpoint.path");
        if(checkPointPath != null) {
            env.setStateBackend(new FsStateBackend(checkPointPath));
        }
        int restartAttempts = parameters.getInt("restart.attempts", 30);
        int delayBetweenAttempts = parameters.getInt("delay.between.attempts", 30000);
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(restartAttempts, delayBetweenAttempts));
        Properties properties = parameters.getProperties();
        String topics = parameters.getRequired("kafka.topics");
        List<String> topicList = Arrays.asList(topics.split(","));
        FlinkKafkaConsumer<T> flinkKafkaConsumer = new FlinkKafkaConsumer<T>(topicList, clazz.newInstance(), properties);
        //whether to also commit offsets back to Kafka's __consumer_offsets topic on checkpoints (default true); disabled here because the offsets in the checkpointed state are authoritative
        flinkKafkaConsumer.setCommitOffsetsOnCheckpoints(false);
        return env.addSource(flinkKafkaConsumer);
    }
}
4. The GeoUtils class used by ToJSONMapFunction
  • Encodes the longitude/latitude with GeoHash
  • Looks the GeoHash code up in Redis
  • If it is not found, calls the Amap (Gaode) API via HttpClient
  • Saves the returned information back to Redis
import ch.hsr.geohash.GeoHash;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;
import org.apache.commons.lang3.StringUtils;
import redis.clients.jedis.Jedis;
import java.util.ArrayList;

public class GeoUtils {
    public static final String key = "9823xxxxxxxxxxxxxx31d";

    public static String[] getAreaInfoByLongitudeAndLatitude(HttpClient httpClient, Jedis jedis, double longitude, double latitude) {
        String province = "";
        String city = "";
        String district = "";
        String bizNames = "";
        //encode the longitude/latitude as a GeoHash string

        try {
            GeoHash geoHash = GeoHash.withCharacterPrecision(latitude, longitude, 8);
            String base32Code = geoHash.toBase32();
            //look the GeoHash code up in Redis
            String areaInfo = jedis.get(base32Code);
            //cached value format: {wx4sqk42 -> province,city,district|bizArea1,bizArea2}
            //if cached, use the geo and business-area info directly
            if (areaInfo != null) {
                String[] fields = areaInfo.split("[|]");
                String area = fields[0];
                //check whether business-area info is present
                if (fields.length > 1) {
                    bizNames = fields[1];
                }
                String[] pcd = area.split(",");
                province = pcd[0];
                city = pcd[1];
                district = pcd[2];
            } else {
                //not cached: call the Amap (Gaode) reverse-geocoding API over the network
                //build a GET request
                GetMethod getMethod = new GetMethod("https://restapi.amap.com/v3/geocode/regeo?key="+ key +"&location=" + longitude + "," + latitude);
                //send the request
                int status = httpClient.executeMethod(getMethod);

                if (status == 200) {
                    //read the response body as a JSON string
                    String jsonStr = getMethod.getResponseBodyAsString();
                    //parse it into a JSON object
                    JSONObject jsonObj = JSON.parseObject(jsonStr);
                    //extract the location info
                    JSONObject regeocode = jsonObj.getJSONObject("regeocode");

                    if (regeocode != null && !regeocode.isEmpty()) {

                        JSONObject address = regeocode.getJSONObject("addressComponent");
                        //province, city, district and business-area info
                        province = address.getString("province");
                        city = address.getString("city");
                        district = address.getString("district");

                        ArrayList<String> lb = new ArrayList<>();

                        //array of business areas (there may be several)
                        JSONArray businessAreas = address.getJSONArray("businessAreas");

                        for (int i = 0; i < businessAreas.size(); i++) {

                            JSONObject businessArea = null;
                            try {
                                businessArea = businessAreas.getJSONObject(i);
                            } catch (Exception e) {
                                //e.printStackTrace();
                            }
                            if (businessArea != null) {

                                String businessName = businessArea.getString("name");

                                String longitudeAndLatitude = businessArea.getString("location");

                                String[] fds = longitudeAndLatitude.split(",");

                                lb.add(businessName);

                                //encode the business area's own longitude/latitude as a GeoHash
                                GeoHash geohash = GeoHash.withCharacterPrecision(Double.parseDouble(fds[1]), Double.parseDouble(fds[0]), 8);

                                //cache this business area under its own GeoHash code as well,
                                //so the local knowledge base in Redis keeps growing
                                jedis.set(geohash.toBase32(), province + "," + city + "," + district + "|" + businessName);
                            }
                        }
                        bizNames = StringUtils.join(lb.toArray(), ",");

                        jedis.set(base32Code, province + "," + city + "," + district + "|" + bizNames);
                    }

                }

            }
        } catch (Exception e) {
            e.printStackTrace();
        }

        return new String[]{province, city, district, bizNames};
    }
}
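Why GeoHash makes a good cache key: nearby coordinates fall into the same precision-8 cell (roughly 38 m x 19 m), so repeated events from the same spot reuse the cached lookup instead of calling the API again. A small illustrative check, not part of the original code; the coordinates are arbitrary:

import ch.hsr.geohash.GeoHash;

public class GeoHashCellDemo {
    public static void main(String[] args) {
        //two points a few metres apart usually share the same precision-8 GeoHash
        GeoHash a = GeoHash.withCharacterPrecision(32.66250, 118.39359, 8);
        GeoHash b = GeoHash.withCharacterPrecision(32.66252, 118.39361, 8);
        System.out.println(a.toBase32() + " / " + b.toBase32()); //both strings are used as Redis keys
        System.out.println(a.toBase32().equals(b.toBase32()));   //usually true for points this close
    }
}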
5. LogBean
import java.util.HashMap;
public class LogBean {
    private String id;
    private String account;
    private String sessionId;
    private String imei;
    private String osName;
    private String osVer;
    private String resolution; //screen resolution
    private String androidId = "";
    private String manufacture; //phone manufacturer
    private String deviceId = "";
    private String areacode;
    private double longtitude;
    private double latitude;
    private String province;
    private String city;
    private String district;
    private String bizNames; //business area names
    private String carrier; //mobile carrier (SIM)
    private String netType;
    private String cid_sn; //cell (base station) ID
    private String ip;
    private String appid;
    private String appVer;
    private String release_ch;
    private String promotion_ch;
    private String logType;
    private long commit_time;
    private String dt; //partition field, e.g. 2019-05-08
    private HashMap<String, String> event;

    public LogBean(String id, String account, String sessionId, String imei, String osName, String osVer, String resolution, String androidId, String manufacture, String deviceId,
            String areacode, double longtitude, double latitude, String province, String city, String district, String bizNames, String carrier, String netType, String cid_sn,
            String ip, String appid, String appVer, String release_ch, String promotion_ch, String logType, long commit_time, String dt, HashMap<String, String> event) {
        this.id = id;
        this.account = account;
        this.sessionId = sessionId;
        this.imei = imei;
        this.osName = osName;
        this.osVer = osVer;
        this.resolution = resolution;
        this.androidId = androidId;
        this.manufacture = manufacture;
        this.deviceId = deviceId;
        this.areacode = areacode;
        this.longtitude = longtitude;
        this.latitude = latitude;
        this.province = province;
        this.city = city;
        this.district = district;
        this.bizNames = bizNames;
        this.carrier = carrier;
        this.netType = netType;
        this.cid_sn = cid_sn;
        this.ip = ip;
        this.appid = appid;
        this.appVer = appVer;
        this.release_ch = release_ch;
        this.promotion_ch = promotion_ch;
        this.logType = logType;
        this.commit_time = commit_time;
        this.dt = dt;
        this.event = event;
    }
    
    public String getId() {
        return id;
    }
    public void setId(String id) {
        this.id = id;
    }
    public String getAccount() {
        return account;
    }
    public void setAccount(String account) {
        this.account = account;
    }
    public String getSessionId() {
        return sessionId;
    }
    public void setSessionId(String sessionId) {
        this.sessionId = sessionId;
    }
    public String getImei() {
        return imei;
    }
    public void setImei(String imei) {
        this.imei = imei;
    }
    public String getOsName() {
        return osName;
    }
    public void setOsName(String osName) {
        this.osName = osName;
    }
    public String getOsVer() {
        return osVer;
    }
    public void setOsVer(String osVer) {
        this.osVer = osVer;
    }
    public String getResolution() {
        return resolution;
    }
    public void setResolution(String resolution) {
        this.resolution = resolution;
    }
    public String getAndroidId() {
        return androidId;
    }
    public void setAndroidId(String androidId) {
        this.androidId = androidId;
    }
    public String getManufacture() {
        return manufacture;
    }
    public void setManufacture(String manufacture) {
        this.manufacture = manufacture;
    }
    public String getDeviceId() {
        return deviceId;
    }
    public void setDeviceId(String deviceId) {
        this.deviceId = deviceId;
    }
    public String getAreacode() {
        return areacode;
    }
    public void setAreacode(String areacode) {
        this.areacode = areacode;
    }
    public double getLongtitude() {
        return longtitude;
    }
    public void setLongtitude(double longtitude) {
        this.longtitude = longtitude;
    }
    public double getLatitude() {
        return latitude;
    }
    public void setLatitude(double latitude) {
        this.latitude = latitude;
    }
    public String getProvince() {
        return province;
    }
    public void setProvince(String province) {
        this.province = province;
    }
    public String getCity() {
        return city;
    }
    public void setCity(String city) {
        this.city = city;
    }
    public String getDistrict() {
        return district;
    }
    public void setDistrict(String district) {
        this.district = district;
    }
    public String getBizNames() {
        return bizNames;
    }
    public void setBizNames(String bizNames) {
        this.bizNames = bizNames;
    }
    public String getCarrier() {
        return carrier;
    }
    public void setCarrier(String carrier) {
        this.carrier = carrier;
    }
    public String getNetType() {
        return netType;
    }
    public void setNetType(String netType) {
        this.netType = netType;
    }
    public String getCid_sn() {
        return cid_sn;
    }
    public void setCid_sn(String cid_sn) {
        this.cid_sn = cid_sn;
    }
    public String getIp() {
        return ip;
    }
    public void setIp(String ip) {
        this.ip = ip;
    }
    public String getAppid() {
        return appid;
    }
    public void setAppid(String appid) {
        this.appid = appid;
    }
    public String getAppVer() {
        return appVer;
    }
    public void setAppVer(String appVer) {
        this.appVer = appVer;
    }
    public String getRelease_ch() {
        return release_ch;
    }
    public void setRelease_ch(String release_ch) {
        this.release_ch = release_ch;
    }
    public String getPromotion_ch() {
        return promotion_ch;
    }
    public void setPromotion_ch(String promotion_ch) {
        this.promotion_ch = promotion_ch;
    }
    public String getLogType() {
        return logType;
    }
    public void setLogType(String logType) {
        this.logType = logType;
    }
    public long getCommit_time() {
        return commit_time;
    }
    public void setCommit_time(long commit_time) {
        this.commit_time = commit_time;
    }
    public String getDt() {
        return dt;
    }
    public void setDt(String dt) {
        this.dt = dt;
    }
    public HashMap<String, String> getEvent() {
        return event;
    }
    public void setEvent(HashMap<String, String> event) {
        this.event = event;
    }

    @Override
    public String toString() {
        return "LogBean{" + "id='" + id + '\'' + ", account='" + account + '\'' + ", sessionId='" + sessionId + '\'' + ", imei='" + imei + '\'' + ", osName='" + osName + '\'' + ", osVer='" + osVer + '\'' + ", resolution='" + resolution + '\'' + ", androidId='" + androidId + '\'' + ", manufacture='" + manufacture + '\'' + ", deviceId='" + deviceId + '\'' + ", areacode='" + areacode + '\'' + ", longtitude=" + longtitude + ", latitude=" + latitude + ", province='" + province + '\'' + ", city='" + city + '\'' + ", district='" + district + '\'' + ", bizNames='" + bizNames + '\'' + ", carrier='" + carrier + '\'' + ", netType='" + netType + '\'' + ", cid_sn='" + cid_sn + '\'' + ", ip='" + ip + '\'' + ", appid='" + appid + '\'' + ", appVer='" + appVer + '\'' + ", release_ch='" + release_ch + '\'' + ", promotion_ch='" + promotion_ch + '\'' + ", logType='" + logType + '\'' + ", commit_time=" + commit_time + ", dt='" + dt + '\'' + ", event=" + event + '}';
    }
}

IV. Key Techniques

  • Calling the Amap (Gaode) API with HttpClient
  • Connecting to and using Redis with Jedis (the Java client)
  • Parsing strings with FastJSON
  • GeoHash encoding as the cache key for geo lookups
  • Using the Kafka source and Kafka sink in Flink
  • Flink's ProcessFunction with side outputs
  • Flink's StreamingFileSink, writing to HDFS as text and as Parquet
//Run as the root user so the job has permission to write to HDFS (set this before the job starts, e.g. at the top of main())
System.setProperty("HADOOP_USER_NAME", "root");

V. Related Maven Dependencies

		<!-- Dependency for connecting to Redis -->
		<dependency>
			<groupId>org.apache.bahir</groupId>
			<artifactId>flink-connector-redis_2.11</artifactId>
			<version>1.1-SNAPSHOT</version>
		</dependency>


		<!-- Java HTTP client; the code above uses the Commons HttpClient 3.x API
		     (org.apache.commons.httpclient.HttpClient / GetMethod), which lives in this artifact -->
		<dependency>
			<groupId>commons-httpclient</groupId>
			<artifactId>commons-httpclient</artifactId>
			<version>3.1</version>
		</dependency>


		<!-- GeoHash encoding library -->
		<dependency>
			<groupId>ch.hsr</groupId>
			<artifactId>geohash</artifactId>
			<version>1.3.0</version>
		</dependency>
		

		<!-- JSON parsing (FastJSON) -->
		<dependency>
			<groupId>com.alibaba</groupId>
			<artifactId>fastjson</artifactId>
			<version>1.2.57</version>
		</dependency>


		<!-- Dependencies for writing data to HDFS in Parquet format -->
		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-parquet_2.12</artifactId>
			<version>${flink.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-avro</artifactId>
			<version>${flink.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.parquet</groupId>
			<artifactId>parquet-avro</artifactId>
			<version>1.11.0</version>
		</dependency>

		<dependency>
			<groupId>org.apache.parquet</groupId>
			<artifactId>parquet-hadoop</artifactId>
			<version>1.11.0</version>
		</dependency>
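The listing above only covers the post's "related" dependencies. The code also needs the Flink streaming API and the Kafka connector; a sketch of the coordinates, assuming a Scala 2.12 build (matching flink-parquet_2.12 above) and the same ${flink.version} property:

		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-streaming-java_2.12</artifactId>
			<version>${flink.version}</version>
		</dependency>

		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-connector-kafka_2.12</artifactId>
			<version>${flink.version}</version>
		</dependency>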
