Flink 1.11.2 - Event-Time Bucketed File Storage

This article describes the event-time bucketing mechanism for file storage in Flink 1.11.2. Using the ToHdfs job, the EventTimeBucketer, and the PageLog bean, it shows how to split a stream by event time and write it to HDFS, covers converting the JSON records consumed from Kafka, and includes a sample pom.xml configuration.


Reference: https://developer.aliyun.com/article/719786

ToHdfs

package com.toHdfs;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.bean.PageLog;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;
import java.util.Properties;
public class ToHdfs {
   
    public static void main(String[] args) throws Exception {

        // Create the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        Properties properties = new Properties();
        properties.setProperty("bootstrap.servers", "192.168.1.162:9092");
        properties.setProperty("zookeeper.connect", "192.168.1.162:2181");
        properties.setProperty("group.id", "FromKafka001");
        properties.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        properties.setProperty("auto.offset.reset", "latest");
        FlinkKafkaConsumer011<String> kafkaSource =
                new FlinkKafkaConsumer011<>("pageLog", new SimpleStringSchema(), properties);
        DataStreamSource<String> dsJsonStr = env.addSource(kafkaSource);
        SingleOutputStreamOperator<PageLog> dsStr = dsJsonStr.map(new MapFunction<String, PageLog>() {

            @Override
            public PageLog map(String in) throws Exception {

                // Parse each JSON record from Kafka into a PageLog bean
                System.out.println(in);
                JSONObject js = JSON.parseObject(in);
                PageLog pageLog = new PageLog();
                pageLog.setApp_id(js.getString("app_id"));
                pageLog.setDevId(js.getString("device_id"));
                pageLog.setPageId(js.getString("page_id"));
                pageLog.setuId(js.getLong("uid"));
                return pageLog;
            }
        });
        BucketingSink<PageLog> sink = new BucketingSink<>("hdfs://192.168.1.162:8020/kafkaTohdfs/");
        // Bucket by each record's event time so output is partitioned across days
        sink.setBucketer(new EventTimeBucketer<PageLog>("yyyy/MM/dd"));
        sink.setWriter(new StringWriter<>());
        dsStr.addSink(sink);
        env.execute("ToHdfs");
    }
}
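
EventTimeBucketer

The article's EventTimeBucketer class was cut off in the source, so what follows is a minimal sketch, not the original implementation. It plugs into BucketingSink through the Bucketer interface from flink-connector-filesystem and derives the bucket directory from the record's own event time rather than the processing-time clock that the built-in DateTimeBucketer uses. The getEventTime() accessor on PageLog is an assumption (see the PageLog sketch below).

package com.toHdfs;

import com.bean.PageLog;
import org.apache.flink.streaming.connectors.fs.Clock;
import org.apache.flink.streaming.connectors.fs.bucketing.Bucketer;
import org.apache.hadoop.fs.Path;

import java.text.SimpleDateFormat;
import java.util.Date;

public class EventTimeBucketer<T extends PageLog> implements Bucketer<T> {

    // Date pattern for the bucket sub-directory, e.g. "yyyy/MM/dd" -> one directory per day
    private final String formatString;

    public EventTimeBucketer(String formatString) {
        this.formatString = formatString;
    }

    @Override
    public Path getBucketPath(Clock clock, Path basePath, T element) {
        // Derive the bucket from the record's event time (epoch millis, assumed accessor),
        // so a record lands in the partition for the day it occurred, not the day it arrived
        SimpleDateFormat dateFormat = new SimpleDateFormat(formatString);
        String bucket = dateFormat.format(new Date(element.getEventTime()));
        return new Path(basePath, bucket);
    }
}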
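
PageLog

The PageLog bean is also missing from the captured article. Below is a sketch reconstructed from the setters the map function calls; the eventTime field and its accessors are an assumption added so EventTimeBucketer has a timestamp to bucket on. In practice the map function would also populate eventTime from a timestamp field in the incoming JSON; the source listing does not show that field, so it is left out here.

package com.bean;

import java.io.Serializable;

public class PageLog implements Serializable {

    private String app_id;
    private String devId;
    private String pageId;
    private Long uId;
    // Assumed field: event timestamp in epoch millis, used by EventTimeBucketer
    private long eventTime;

    public String getApp_id() { return app_id; }
    public void setApp_id(String app_id) { this.app_id = app_id; }

    public String getDevId() { return devId; }
    public void setDevId(String devId) { this.devId = devId; }

    public String getPageId() { return pageId; }
    public void setPageId(String pageId) { this.pageId = pageId; }

    public Long getuId() { return uId; }
    public void setuId(Long uId) { this.uId = uId; }

    public long getEventTime() { return eventTime; }
    public void setEventTime(long eventTime) { this.eventTime = eventTime; }

    @Override
    public String toString() {
        // StringWriter writes toString(), so emit one CSV line per record
        return app_id + "," + devId + "," + pageId + "," + uId + "," + eventTime;
    }
}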
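
pom.xml

The summary promises a pom.xml example, but it did not survive in the source either. What follows is a minimal sketch of the dependencies the code above compiles against, assuming the Flink 1.11.2 / Scala 2.11 combination the article targets; the fastjson version is likewise an assumption.

<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>1.11.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.11</artifactId>
        <version>1.11.2</version>
    </dependency>
    <!-- Kafka 0.11 connector, provides FlinkKafkaConsumer011 -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-kafka-0.11_2.11</artifactId>
        <version>1.11.2</version>
    </dependency>
    <!-- Filesystem connector, provides BucketingSink and the Bucketer interface -->
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-connector-filesystem_2.11</artifactId>
        <version>1.11.2</version>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.75</version>
    </dependency>
    <!-- Hadoop client jars (hdfs:// access, org.apache.hadoop.fs.Path) are usually
         provided by the cluster; add hadoop-client in "provided" scope for local builds -->
</dependencies>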